cs.LG - 2023-07-26

Fluorescent Neuronal Cells v2: Multi-Task, Multi-Format Annotations for Deep Learning in Microscopy

  • paper_url: http://arxiv.org/abs/2307.14243
  • repo_url: None
  • paper_authors: Luca Clissa, Antonio Macaluso, Roberto Morelli, Alessandra Occhinegro, Emiliana Piscitiello, Ludovico Taddei, Marco Luppi, Roberto Amici, Matteo Cerri, Timna Hitrec, Lorenzo Rinaldi, Antonio Zoccoli
  • for: fluorescence microscopy image analysis and deep learning research in life sciences
  • methods: diverse markers for rodent neuronal cells’ nuclei and cytoplasm, ground-truth annotations for semantic segmentation, object detection, and counting
  • results: facilitating methodological advancements in computer vision approaches and catalyzing breakthroughs in fluorescence microscopy analysis for life sciences research.
    Abstract Fluorescent Neuronal Cells v2 is a collection of fluorescence microscopy images and the corresponding ground-truth annotations, designed to foster innovative research in the domains of Life Sciences and Deep Learning. This dataset encompasses three image collections in which rodent neuronal cells' nuclei and cytoplasm are stained with diverse markers to highlight their anatomical or functional characteristics. Alongside the images, we provide ground-truth annotations for several learning tasks, including semantic segmentation, object detection, and counting. The contribution is two-fold. First, given the variety of annotations and their accessible formats, we envision our work facilitating methodological advancements in computer vision approaches for segmentation, detection, feature learning, unsupervised and self-supervised learning, transfer learning, and related areas. Second, by enabling extensive exploration and benchmarking, we hope Fluorescent Neuronal Cells v2 will catalyze breakthroughs in fluorescence microscopy analysis and promote cutting-edge discoveries in life sciences. The data are available at: https://amsacta.unibo.it/id/eprint/7347
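
Since the dataset pairs segmentation masks with detection and counting annotations, one can sanity-check the counting labels by deriving counts directly from a mask. A minimal sketch (the toy mask below is synthetic; the real masks ship with the dataset):

```python
import numpy as np
from scipy import ndimage

# Synthetic stand-in for a ground-truth segmentation mask: nonzero pixels
# belong to stained cells (the real masks are provided with the dataset).
mask = np.zeros((64, 64), dtype=np.uint8)
mask[5:15, 5:15] = 1
mask[30:40, 30:42] = 1

# Each connected component is one cell instance, so the component count
# doubles as a counting annotation derived from the segmentation mask.
labeled, n_cells = ndimage.label(mask)
print(f"cell count: {n_cells}")  # -> 2
```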

Evolving Multi-Objective Neural Network Controllers for Robot Swarms

  • paper_url: http://arxiv.org/abs/2307.14237
  • repo_url: None
  • paper_authors: Karl Mason, Sabine Hauert
  • for: Developing multi-objective controllers for swarms of robots.
  • methods: An evolutionary neural network approach that trains swarm robot controllers in a low-fidelity Python simulator and tests them in a high-fidelity Webots simulation.
  • results: The proposed approach effectively controls each robot, the swarm exhibits different behaviours as the objective weightings are adjusted, and controllers evolved in the low-fidelity simulator transfer to the high-fidelity environment and scale to more robots without further retraining.
    Abstract Many swarm robotics tasks consist of multiple conflicting objectives. This research proposes a multi-objective evolutionary neural network approach to developing controllers for swarms of robots. The swarm robot controllers are trained in a low-fidelity Python simulator and then tested in a high-fidelity simulated environment using Webots. Simulations are then conducted to test the scalability of the evolved multi-objective robot controllers to environments with a larger number of robots. The results presented demonstrate that the proposed approach can effectively control each of the robots. The robot swarm exhibits different behaviours as the weighting for each objective is adjusted. The results also confirm that multi-objective neural network controllers evolved in a low-fidelity simulator can be transferred to high-fidelity simulated environments and that the controllers can scale to environments with a larger number of robots without further retraining needed.
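
The abstract describes swarm behaviour shifting as per-objective weights change. A minimal sketch of a weighted-sum fitness inside a simple evolutionary loop (the objectives, weights, and stand-in fitness values are illustrative assumptions, not the paper's Webots setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def scalarize(weights_vec, objectives):
    # Scalarize conflicting objectives with a weighted sum; changing the
    # weights changes which swarm behaviour the evolution favours.
    return sum(w * f for w, f in zip(weights_vec, objectives))

def evolve(fitness_fn, n_params=32, pop=50, gens=100, sigma=0.1):
    # Simple evolutionary loop over neural-network weights flattened
    # into a single parameter vector.
    best = rng.normal(size=n_params)
    for _ in range(gens):
        candidates = best + sigma * rng.normal(size=(pop, n_params))
        scores = np.array([fitness_fn(c) for c in candidates])
        best = candidates[scores.argmax()]
    return best

def fitness(params):
    # Hypothetical objectives standing in for simulator rollouts:
    # coverage (maximize) and collisions (minimize).
    coverage = -np.linalg.norm(params - 1.0)
    collisions = -np.linalg.norm(params + 1.0)
    return scalarize([0.7, 0.3], [coverage, collisions])

controller_weights = evolve(fitness)
```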

Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences

  • paper_url: http://arxiv.org/abs/2307.14225
  • repo_url: None
  • paper_authors: Scott Sanner, Krisztian Balog, Filip Radlinski, Ben Wedin, Lucas Dixon
  • for: Making recommendations from language-based preference expressions.
  • methods: Prompting large language models (LLMs) and comparing against state-of-the-art item-based collaborative filtering (CF) methods.
  • results: LLM prompting provides competitive near cold-start recommendation performance for pure language-based preferences without task-specific supervision (zero-shot) or with only a few labels (few-shot), and language-based preference representations are more explainable and scrutable than item-based ones.
    Abstract Traditional recommender systems leverage users' item preference history to recommend novel content that users may like. However, modern dialog interfaces that allow users to express language-based preferences offer a fundamentally different modality for preference input. Inspired by recent successes of prompting paradigms for large language models (LLMs), we study their use for making recommendations from both item-based and language-based preferences in comparison to state-of-the-art item-based collaborative filtering (CF) methods. To support this investigation, we collect a new dataset consisting of both item-based and language-based preferences elicited from users along with their ratings on a variety of (biased) recommended items and (unbiased) random items. Among numerous experimental results, we find that LLMs provide competitive recommendation performance for pure language-based preferences (no item preferences) in the near cold-start case in comparison to item-based CF methods, despite having no supervised training for this specific task (zero-shot) or only a few labels (few-shot). This is particularly promising as language-based preference representations are more explainable and scrutable than item-based or vector-based representations.
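
A hedged sketch of how a zero-shot, language-based preference prompt might be assembled; the template and the commented-out `complete` call are hypothetical stand-ins, since the paper's prompts are not reproduced here:

```python
def build_recommendation_prompt(preference_text, candidate_items, k=3):
    # Zero-shot prompt: only a natural-language preference statement,
    # no item-rating history, mirroring the "pure language-based" setting.
    items = "\n".join(f"- {item}" for item in candidate_items)
    return (
        "A user describes their preferences as follows:\n"
        f'"{preference_text}"\n\n'
        "Candidate items:\n"
        f"{items}\n\n"
        f"Rank the {k} items this user is most likely to enjoy, best first."
    )

prompt = build_recommendation_prompt(
    "I love slow-burn sci-fi with strong world-building, nothing violent.",
    ["Solaris", "Mad Max: Fury Road", "Arrival", "The Expanse"],
)
# response = complete(prompt)  # hypothetical LLM API call
```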

Online Modeling and Monitoring of Dependent Processes under Resource Constraints

  • paper_url: http://arxiv.org/abs/2307.14208
  • repo_url: None
  • paper_authors: Tanapol Kosolwattana, Huazheng Wang, Ying Lin
  • for: Monitoring a population of dependent processes under limited resources, which is critical for abnormal event detection.
  • methods: An online collaborative learning method that adaptively allocates resources between exploiting high-risk processes and exploring the dependent dynamics.
  • results: Theoretical analysis and experiments demonstrate the efficiency of the method.
    Abstract Monitoring a population of dependent processes under limited resources is critical for abnormal events detection. A novel online collaborative learning method is proposed to adaptively allocate the resources for exploitation of high-risk processes and exploration of dependent dynamics. Efficiency of the proposed method is proved through theoretical analysis and experiments.

Application of Random Forest and Support Vector Machine for Investigation of Pressure Filtration Performance, a Zinc Plant Filter Cake Modeling

  • paper_url: http://arxiv.org/abs/2307.14199
  • repo_url: None
  • paper_authors: Masoume Kazemi, Davood Moradkhani, Alireza Abbas Alipour
  • for: Modeling the pressure filtration step of zinc production and predicting the moisture of the filter cake.
  • methods: Random Forest Regression (RFR) and Support Vector Regression (SVR) models that take continuous variables (extracted features) from lab samples as inputs.
  • results: The RFR model predicts cake moisture more accurately than the SVR model.
    Abstract The hydrometallurgical method of zinc production involves leaching zinc from ore and then separating the solid residue from the liquid solution by pressure filtration. This separation process is very important since the solid residue contains some moisture that can reduce the amount of zinc recovered. This study modeled the pressure filtration process through Random Forest (RF) and Support Vector Machine (SVM). The models take continuous variables (extracted features) from the lab samples as inputs. Thus, regression models namely Random Forest Regression (RFR) and Support Vector Regression (SVR) were chosen. A total dataset was obtained during the pressure filtration process in two conditions: 1) Polypropylene (S1) and 2) Polyester fabrics (S2). To predict the cake moisture, solids concentration (0.2 and 0.38), temperature (35 and 65 centigrade), pH (2, 3.5, and 5), pressure, cake thickness (14, 20, 26, and 34 mm), air-blow time (2, 10 and 15 min) and filtration time were applied as input variables. The models' predictive accuracy was evaluated by the coefficient of determination (R2) parameter. The results revealed that the RFR model is superior to the SVR model for cake moisture prediction.
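
The pipeline described above maps directly onto scikit-learn. A minimal sketch comparing the two regressors by R^2 (the data below are synthetic stand-ins for the lab features listed in the abstract):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
# Synthetic stand-in for the lab data: columns such as solids concentration,
# temperature, pH, pressure, cake thickness, air-blow time, filtration time.
X = rng.uniform(size=(200, 7))
y = 0.3 * X[:, 0] + 0.2 * X[:, 2] + rng.normal(scale=0.05, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("RFR", RandomForestRegressor(random_state=0)),
                    ("SVR", SVR(kernel="rbf"))]:
    model.fit(X_tr, y_tr)
    print(name, "R2:", r2_score(y_te, model.predict(X_te)))
```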

Efficient Learning of Discrete-Continuous Computation Graphs

  • paper_url: http://arxiv.org/abs/2307.14193
  • repo_url: https://github.com/nec-research/dccg
  • paper_authors: David Friede, Mathias Niepert
  • for: Proposing new strategies for training hybrid discrete-continuous models on complex machine learning tasks.
  • methods: Stochastic computation graphs with multiple sequential discrete components built via stochastic softmax tricks, trained with an increased Gumbel noise scale and with dropout residual connections tailored to discrete-continuous graphs.
  • results: The new strategies allow training complex discrete-continuous models that standard stochastic softmax tricks cannot, and these models generalize better on several benchmark datasets.
    Abstract Numerous models for supervised and reinforcement learning benefit from combinations of discrete and continuous model components. End-to-end learnable discrete-continuous models are compositional, tend to generalize better, and are more interpretable. A popular approach to building discrete-continuous computation graphs is that of integrating discrete probability distributions into neural networks using stochastic softmax tricks. Prior work has mainly focused on computation graphs with a single discrete component on each of the graph's execution paths. We analyze the behavior of more complex stochastic computations graphs with multiple sequential discrete components. We show that it is challenging to optimize the parameters of these models, mainly due to small gradients and local minima. We then propose two new strategies to overcome these challenges. First, we show that increasing the scale parameter of the Gumbel noise perturbations during training improves the learning behavior. Second, we propose dropout residual connections specifically tailored to stochastic, discrete-continuous computation graphs. With an extensive set of experiments, we show that we can train complex discrete-continuous models which one cannot train with standard stochastic softmax tricks. We also show that complex discrete-stochastic models generalize better than their continuous counterparts on several benchmark datasets.
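
The first proposed strategy, increasing the scale of the Gumbel noise perturbations during training, can be illustrated by a small generalization of the standard Gumbel-softmax sample; the scale knob `beta` is the quantity the abstract suggests increasing (a sketch, not the authors' implementation):

```python
import torch

def scaled_gumbel_softmax(logits, tau=1.0, beta=1.0):
    # Standard Gumbel-softmax draws g ~ Gumbel(0, 1); scaling the
    # perturbation by beta > 1 injects stronger exploration noise,
    # which helps graphs with several sequential discrete components
    # escape small gradients and local minima.
    u = torch.rand_like(logits).clamp_min(1e-10)
    g = -torch.log(-torch.log(u))  # Gumbel(0, 1) samples
    return torch.softmax((logits + beta * g) / tau, dim=-1)

logits = torch.randn(4, 10, requires_grad=True)
y = scaled_gumbel_softmax(logits, tau=0.5, beta=2.0)  # differentiable sample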

A comparison of machine learning surrogate models of street-scale flooding in Norfolk, Virginia

  • paper_url: http://arxiv.org/abs/2307.14185
  • repo_url: None
  • paper_authors: Diana McSpadden, Steven Goldenberg, Binata Roy, Malachi Schram, Jonathan L. Goodall, Heather Richter
  • for: Assessing street-scale flooding in low-lying coastal cities such as Norfolk, Virginia, where rainfall- and tide-driven flooding strains transportation and sewer systems and can damage property.
  • methods: Using data from Norfolk rainfall events (2016-2018), compares a previous random forest surrogate model with two deep learning models: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU).
  • results: Underscores the importance of model architectures that communicate prediction uncertainty and effectively integrate relevant, multi-modal features.
    Abstract Low-lying coastal cities, exemplified by Norfolk, Virginia, face the challenge of street flooding caused by rainfall and tides, which strain transportation and sewer systems and can lead to property damage. While high-fidelity, physics-based simulations provide accurate predictions of urban pluvial flooding, their computational complexity renders them unsuitable for real-time applications. Using data from Norfolk rainfall events between 2016 and 2018, this study compares the performance of a previous surrogate model based on a random forest algorithm with two deep learning models: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). This investigation underscores the importance of using a model architecture that supports the communication of prediction uncertainty and the effective integration of relevant, multi-modal features.
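
A minimal sketch of the kind of recurrent surrogate compared in the study, mapping a forcing sequence to a flood estimate (the layer sizes and feature count are illustrative assumptions, not the study's configuration):

```python
import torch
import torch.nn as nn

class FloodSurrogate(nn.Module):
    def __init__(self, n_features=8, hidden=64, cell="lstm"):
        super().__init__()
        rnn = nn.LSTM if cell == "lstm" else nn.GRU
        self.rnn = rnn(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # predicted water depth

    def forward(self, x):             # x: (batch, time, n_features)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])  # predict from the last time step

model = FloodSurrogate(cell="gru")
depth = model(torch.randn(16, 24, 8))  # 24-step rainfall/tide history
```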

Learning Disentangled Discrete Representations

  • paper_url: http://arxiv.org/abs/2307.14151
  • repo_url: https://github.com/david-friede/lddr
  • paper_authors: David Friede, Christian Reimers, Heiner Stuckenschmidt, Mathias Niepert
  • for: Investigating how structured discrete latent spaces improve the quality of disentangled representations.
  • methods: A tailored categorical variational autoencoder (VAE); analytical and empirical results show that the grid structure of categorical distributions mitigates the rotational invariance problem of multivariate Gaussian distributions, acting as an efficient inductive prior for disentangled representations.
  • results: Discrete VAEs learn better disentangled representations, and the paper introduces the first unsupervised model selection strategy that favors disentangled models.
    Abstract Recent successes in image generation, model-based reinforcement learning, and text-to-image generation have demonstrated the empirical advantages of discrete latent representations, although the reasons behind their benefits remain unclear. We explore the relationship between discrete latent spaces and disentangled representations by replacing the standard Gaussian variational autoencoder (VAE) with a tailored categorical variational autoencoder. We show that the underlying grid structure of categorical distributions mitigates the problem of rotational invariance associated with multivariate Gaussian distributions, acting as an efficient inductive prior for disentangled representations. We provide both analytical and empirical findings that demonstrate the advantages of discrete VAEs for learning disentangled representations. Furthermore, we introduce the first unsupervised model selection strategy that favors disentangled representations.

Toward Design of Synthetic Active Inference Agents by Mere Mortals

  • paper_url: http://arxiv.org/abs/2307.14145
  • repo_url: None
  • paper_authors: Bert de Vries
  • for: Realizing effective active inference agents on edge devices
  • methods: A software toolbox supporting non-expert engineers to develop working active inference agents
  • results: Accelerating the democratization of active inference agents on edge devices
    Abstract The theoretical properties of active inference agents are impressive, but how do we realize effective agents in working hardware and software on edge devices? This is an interesting problem because the computational load for policy exploration explodes exponentially, while the computational resources are very limited for edge devices. In this paper, we discuss the necessary features for a software toolbox that supports a competent non-expert engineer to develop working active inference agents. We introduce a toolbox-in-progress that aims to accelerate the democratization of active inference agents in a similar way as TensorFlow propelled applications of deep learning technology.

Piecewise-Stationary Combinatorial Semi-Bandit with Causally Related Rewards

  • paper_url: http://arxiv.org/abs/2307.14138
  • repo_url: None
  • paper_authors: Behzad Nourani-Koliji, Steven Bilaj, Amir Rezaei Balef, Setareh Maghsudi
  • for: Solving the piecewise stationary combinatorial semi-bandit problem with causally related rewards, where changes in the base arms' distributions, in the causal relationships between rewards, or both, alter the reward generation process.
  • methods: An optimistic policy based on the Upper Confidence Bound (UCB) algorithm, equipped with a change-point detector based on the Generalized Likelihood Ratio (GLR) test and a novel group restart strategy for structured environments.
  • results: Theoretically, a regret upper bound is established that reflects the effects of the number of structural and distribution changes on performance; numerical experiments in real-world scenarios show the applicability and superior performance of the method compared to state-of-the-art benchmarks.
    Abstract We study the piecewise stationary combinatorial semi-bandit problem with causally related rewards. In our nonstationary environment, variations in the base arms' distributions, causal relationships between rewards, or both, change the reward generation process. In such an environment, an optimal decision-maker must follow both sources of change and adapt accordingly. The problem becomes aggravated in the combinatorial semi-bandit setting, where the decision-maker only observes the outcome of the selected bundle of arms. The core of our proposed policy is the Upper Confidence Bound (UCB) algorithm. We assume the agent relies on an adaptive approach to overcome the challenge. More specifically, it employs a change-point detector based on the Generalized Likelihood Ratio (GLR) test. Besides, we introduce the notion of group restart as a new alternative restarting strategy in the decision making process in structured environments. Finally, our algorithm integrates a mechanism to trace the variations of the underlying graph structure, which captures the causal relationships between the rewards in the bandit setting. Theoretically, we establish a regret upper bound that reflects the effects of the number of structural- and distribution changes on the performance. The outcome of our numerical experiments in real-world scenarios exhibits applicability and superior performance of our proposal compared to the state-of-the-art benchmarks.
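
A sketch of a generic Gaussian-mean GLR change-point test of the kind the policy builds on, assuming unit-variance rewards; the paper's exact statistic and thresholds may differ:

```python
import numpy as np

def glr_change_detected(rewards, threshold):
    # Scan all split points s; the GLR statistic for a mean shift in a
    # unit-variance Gaussian stream is max_s s*(t-s)/(2t) * (mu1 - mu2)^2.
    x = np.asarray(rewards, dtype=float)
    t = len(x)
    for s in range(1, t):
        mu1, mu2 = x[:s].mean(), x[s:].mean()
        stat = s * (t - s) / (2.0 * t) * (mu1 - mu2) ** 2
        if stat > threshold:
            return True  # trigger a (group) restart of the UCB indices
    return False

stream = np.concatenate([np.random.normal(0.0, 1, 100),
                         np.random.normal(1.5, 1, 50)])
print(glr_change_detected(stream, threshold=3 * np.log(len(stream))))
```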

A Survey on Generative Modeling with Limited Data, Few Shots, and Zero Shot

  • paper_url: http://arxiv.org/abs/2307.14397
  • repo_url: https://github.com/sutd-visual-computing-group/awesome-generative-modeling-under-data-constraints
  • paper_authors: Milad Abdollahzadeh, Touba Malekzadeh, Christopher T. H. Teo, Keshigeyan Chandrasegaran, Guimeng Liu, Ngai-Man Cheung
  • for: Surveying generative modeling under data constraints (GM-DC) -- limited data, few shots, and zero shot -- which is important when data acquisition is challenging, e.g., in healthcare applications.
  • methods: Two taxonomies, one of GM-DC tasks and one of GM-DC approaches, together with a study of the interactions between different tasks and approaches.
  • results: A framework organizing GM-DC tasks and approaches, and a discussion of research gaps, research trends, and potential avenues for future exploration.
    Abstract In machine learning, generative modeling aims to learn to generate new data statistically similar to the training data distribution. In this paper, we survey learning generative models under limited data, few shots and zero shot, referred to as Generative Modeling under Data Constraint (GM-DC). This is an important topic when data acquisition is challenging, e.g. healthcare applications. We discuss background, challenges, and propose two taxonomies: one on GM-DC tasks and another on GM-DC approaches. Importantly, we study interactions between different GM-DC tasks and approaches. Furthermore, we highlight research gaps, research trends, and potential avenues for future exploration. Project website: https://gmdc-survey.github.io.

Developing and Evaluating Tiny to Medium-Sized Turkish BERT Models

  • paper_url: http://arxiv.org/abs/2307.14134
  • repo_url: None
  • paper_authors: Himmet Toprak Kesgin, Muzaffer Kaan Yuce, Mehmet Fatih Amasyali
  • for: Bridging the research gap in less-resourced languages by developing and evaluating tiny to medium-sized Turkish BERT models.
  • methods: Models trained on a diverse dataset of over 75GB of text from multiple sources and tested on several tasks, including mask prediction, sentiment analysis, news classification, and zero-shot classification.
  • results: Despite their smaller size, the models exhibit robust performance, including on zero-shot tasks, while ensuring computational efficiency and faster execution times.
    Abstract This study introduces and evaluates tiny, mini, small, and medium-sized uncased Turkish BERT models, aiming to bridge the research gap in less-resourced languages. We trained these models on a diverse dataset encompassing over 75GB of text from multiple sources and tested them on several tasks, including mask prediction, sentiment analysis, news classification, and zero-shot classification. Despite their smaller size, our models exhibited robust performance, including on zero-shot tasks, while ensuring computational efficiency and faster execution times. Our findings provide valuable insights into the development and application of smaller language models, especially in the context of the Turkish language.
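
Mask prediction with such models follows the standard Hugging Face pattern. A sketch; the model identifier below is a placeholder, since the released checkpoint names are not given in the abstract:

```python
from transformers import pipeline

# Placeholder model id; substitute the released tiny/mini/small/medium
# Turkish BERT checkpoint once known.
fill = pipeline("fill-mask", model="some-org/turkish-tiny-bert-uncased")

for pred in fill("ankara türkiye'nin [MASK] şehridir."):
    print(pred["token_str"], pred["score"])
```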

GraphRNN Revisited: An Ablation Study and Extensions for Directed Acyclic Graphs

  • paper_url: http://arxiv.org/abs/2307.14109
  • repo_url: None
  • paper_authors: Taniya Das, Mark Koch, Maya Ravichandran, Nikhil Khatri
  • for: Learning graph generative models
  • methods: Using the deep learning-based GraphRNN architecture and evaluating against baseline models with an ablation study
  • results: Finds that the BFS traversal suggested by You et al. to collapse representations of isomorphic graphs contributes significantly to model performance, and extends GraphRNN to generate directed acyclic graphs by replacing the BFS traversal with a topological sort, demonstrating improved performance on a real-world dataset.
    Abstract GraphRNN is a deep learning-based architecture proposed by You et al. for learning generative models for graphs. We replicate the results of You et al. using a reproduced implementation of the GraphRNN architecture and evaluate this against baseline models using new metrics. Through an ablation study, we find that the BFS traversal suggested by You et al. to collapse representations of isomorphic graphs contributes significantly to model performance. Additionally, we extend GraphRNN to generate directed acyclic graphs by replacing the BFS traversal with a topological sort. We demonstrate that this method improves significantly over a directed-multiclass variant of GraphRNN on a real-world dataset.
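
The DAG extension swaps the BFS node ordering for a topological one. With networkx the two orderings can be produced side by side (a sketch of the ordering step only, not the full GraphRNN pipeline):

```python
import networkx as nx

dag = nx.DiGraph([(0, 1), (0, 2), (1, 3), (2, 3)])

# BFS ordering (what the original GraphRNN uses, on the undirected graph):
bfs_order = list(nx.bfs_tree(dag.to_undirected(), source=0))

# Topological ordering (the proposed replacement for DAG generation):
topo_order = list(nx.topological_sort(dag))

# Feeding nodes to the RNN in topo_order guarantees every edge points
# from an earlier to a later node in the generated sequence.
print(bfs_order, topo_order)
```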

Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks

  • paper_url: http://arxiv.org/abs/2307.14085
  • repo_url: None
  • paper_authors: Siyu Chen, Mengdi Wang, Zhuoran Yang
  • for: Learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
  • methods: Reinforcement learning (RL) combined with maximum likelihood estimation (MLE) of the follower's quantal response model, where the follower solves an entropy-regularized policy optimization problem induced by the leader's announced policy.
  • results: Sample-efficient algorithms for both online and offline settings that achieve sublinear regret without observing the follower's reward, and that are computationally efficient in the linear and myopic settings.
    Abstract We study reinforcement learning (RL) for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure. In specific, at the outset of the game, the leader announces her policy to the follower and commits to it. The follower observes the leader's policy and, in turn, adopts a quantal response policy by solving an entropy-regularized policy optimization problem induced by leader's policy. The goal of the leader is to find her optimal policy, which yields the optimal expected total return, by interacting with the follower and learning from data. A key challenge of this problem is that the leader cannot observe the follower's reward, and needs to infer the follower's quantal response model from his actions against leader's policies. We propose sample-efficient algorithms for both the online and offline settings, in the context of function approximation. Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision making problem, and we show that they achieve sublinear regret upper bounds. Moreover, we quantify the uncertainty of these estimators and leverage the uncertainty to implement optimistic and pessimistic algorithms for online and offline settings. Besides, when specialized to the linear and myopic setting, our algorithms are also computationally efficient. Our theoretical analysis features a novel performance-difference lemma which incorporates the error of quantal response model, which might be of independent interest.
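
The follower's entropy-regularized best response has a closed form: the softmax (quantal response) over action values. A tiny numeric sketch with illustrative Q-values:

```python
import numpy as np

def quantal_response(q_values, eta=1.0):
    # Solution of max_pi <pi, q> + (1/eta) * entropy(pi):
    # pi(a) is proportional to exp(eta * q(a)).
    z = eta * np.asarray(q_values, dtype=float)
    z -= z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()

print(quantal_response([1.0, 0.5, -0.2], eta=2.0))
```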

Learning to simulate partially known spatio-temporal dynamics with trainable difference operators

  • paper_url: http://arxiv.org/abs/2307.14395
  • repo_url: None
  • paper_authors: Xiang Huang, Zhuoyuan Li, Hongsheng Liu, Zidong Wang, Hongye Zhou, Bin Dong, Bei Hua
  • for: Simulating spatio-temporal dynamics with neural networks; most existing methods are purely data-driven black-box models with limited accuracy and interpretability, so this paper proposes PDE-Net++, a hybrid architecture that explicitly embeds partial prior knowledge of the underlying PDEs by combining trainable difference operators with a black-box model.
  • methods: Two options for the trainable difference operators: the trainable flipping difference layer (TFDL) and the trainable dynamic difference layer (TDDL).
  • results: Numerical experiments show that PDE-Net++ has better prediction accuracy and extrapolation performance than black-box models.
    Abstract Recently, using neural networks to simulate spatio-temporal dynamics has received a lot of attention. However, most existing methods adopt pure data-driven black-box models, which have limited accuracy and interpretability. By combining trainable difference operators with black-box models, we propose a new hybrid architecture explicitly embedded with partial prior knowledge of the underlying PDEs named PDE-Net++. Furthermore, we introduce two distinct options called the trainable flipping difference layer (TFDL) and the trainable dynamic difference layer (TDDL) for the difference operators. Numerous numerical experiments have demonstrated that PDE-Net++ has superior prediction accuracy and better extrapolation performance than black-box models.
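
The core idea of a trainable difference operator can be sketched as a small convolution whose kernel starts from a classical finite-difference stencil and is then learned (an illustrative reading of the approach, not the paper's exact TFDL/TDDL layers):

```python
import torch
import torch.nn as nn

class TrainableDifference(nn.Module):
    def __init__(self):
        super().__init__()
        # 5-point Laplacian stencil as the physics-informed initialization;
        # training is free to refine it toward the observed dynamics.
        stencil = torch.tensor([[0., 1., 0.],
                                [1., -4., 1.],
                                [0., 1., 0.]])
        self.kernel = nn.Parameter(stencil.view(1, 1, 3, 3))

    def forward(self, u):  # u: (batch, 1, H, W) field snapshot
        return nn.functional.conv2d(u, self.kernel, padding=1)

op = TrainableDifference()
laplacian_like = op(torch.randn(2, 1, 32, 32))  # trainable approx. of Δu
```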

Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation

  • paper_url: http://arxiv.org/abs/2307.14068
  • repo_url: None
  • paper_authors: Long Liu, Bo Zhou, Zhipeng Zhao, Zening Liu
  • for: Addressing the challenges of multi-source unsupervised domain adaptation (MUDA), where aligning overall feature distributions can introduce negative effects from redundant features and a performance gap remains relative to supervised methods.
  • methods: Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation (D3AAMDA): a multi-source dynamic modulation mechanism that controls the alignment level between each source domain and the target domain according to their distribution differences, leveraging locally advantageous source features, plus a Multi-source Active Boundary Sample Selection (MABS) strategy that uses a guided dynamic boundary loss to design an efficient query function for selecting important samples.
  • results: Extensive comparisons on commonly used domain adaptation datasets demonstrate the superiority of the method.
    Abstract Multi-source unsupervised domain adaptation (MUDA) aims to transfer knowledge from related source domains to an unlabeled target domain. While recent MUDA methods have shown promising results, most focus on aligning the overall feature distributions across source domains, which can lead to negative effects due to redundant features within each domain. Moreover, there is a significant performance gap between MUDA and supervised methods. To address these challenges, we propose a novel approach called Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation (D3AAMDA). Firstly, we establish a multi-source dynamic modulation mechanism during the training process based on the degree of distribution differences between source and target domains. This mechanism controls the alignment level of features between each source domain and the target domain, effectively leveraging the local advantageous feature information within the source domains. Additionally, we propose a Multi-source Active Boundary Sample Selection (MABS) strategy, which utilizes a guided dynamic boundary loss to design an efficient query function for selecting important samples. This strategy achieves improved generalization to the target domain with minimal sampling costs. We extensively evaluate our proposed method on commonly used domain adaptation datasets, comparing it against existing UDA and ADA methods. The experimental results unequivocally demonstrate the superiority of our approach.

Hypergraph Isomorphism Computation

  • paper_url: http://arxiv.org/abs/2307.14394
  • repo_url: None
  • paper_authors: Yifan Feng, Jiashu Han, Shihui Ying, Yue Gao
  • for: Solving the hypergraph isomorphism problem and improving the performance of hypergraph kernel methods.
  • methods: A hypergraph Weisfeiler-Lehman test algorithm and a general hypergraph Weisfeiler-Lehman kernel framework with two implemented instances.
  • results: Significant improvements in hypergraph classification and over 80x faster runtime than other typical kernel-based methods on complex hypergraph structures.
    Abstract The isomorphism problem is a fundamental problem in network analysis, which involves capturing both low-order and high-order structural information. In terms of extracting low-order structural information, graph isomorphism algorithms analyze the structural equivalence to reduce the solver space dimension, which demonstrates its power in many applications, such as protein design, chemical pathways, and community detection. For the more commonly occurring high-order relationships in real-life scenarios, the problem of hypergraph isomorphism, which effectively captures these high-order structural relationships, cannot be straightforwardly addressed using graph isomorphism methods. Besides, the existing hypergraph kernel methods may suffer from high memory consumption or inaccurate sub-structure identification, thus yielding sub-optimal performance. In this paper, to address the abovementioned problems, we first propose the hypergraph Weisfeiler-Lehman test algorithm for the hypergraph isomorphism test problem by generalizing the Weisfeiler-Lehman test algorithm from graphs to hypergraphs. Secondly, based on the presented algorithm, we propose a general hypergraph Weisfeiler-Lehman kernel framework and implement two instances, which are the Hypergraph Weisfeiler-Lehman Subtree Kernel and the Hypergraph Weisfeiler-Lehman Hyperedge Kernel. In order to fulfill our research objectives, a comprehensive set of experiments was meticulously designed, including seven graph classification datasets and 12 hypergraph classification datasets. Results on hypergraph classification datasets show significant improvements compared to other typical kernel-based methods, which demonstrates the effectiveness of the proposed methods. In our evaluation, we found that our proposed methods outperform the second-best method in terms of runtime, running over 80 times faster when handling complex hypergraph structures.
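
One refinement round of a hypergraph Weisfeiler-Lehman test can be written compactly: hyperedge labels absorb the multiset of member-node labels, and node labels absorb the multiset of incident-hyperedge labels (a simplified sketch of the test's refinement step, not the full kernel framework):

```python
def hwl_refine(node_labels, hyperedges, rounds=3):
    # node_labels: dict node -> hashable label
    # hyperedges: list of frozensets of nodes
    edge_labels = [0] * len(hyperedges)
    for _ in range(rounds):
        # Hyperedges aggregate the sorted multiset of member-node labels.
        edge_labels = [hash((edge_labels[i],
                             tuple(sorted(node_labels[v] for v in e))))
                       for i, e in enumerate(hyperedges)]
        # Nodes aggregate the sorted multiset of incident-hyperedge labels.
        node_labels = {v: hash((node_labels[v],
                                tuple(sorted(edge_labels[i]
                                             for i, e in enumerate(hyperedges)
                                             if v in e))))
                       for v in node_labels}
    return node_labels, edge_labels

nodes = {v: 1 for v in range(4)}
edges = [frozenset({0, 1, 2}), frozenset({2, 3})]
print(hwl_refine(nodes, edges))
```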

Machine Learning Applications In Healthcare: The State Of Knowledge and Future Directions

  • paper_url: http://arxiv.org/abs/2307.14067
  • repo_url: None
  • paper_authors: Mrinmoy Roy, Sarwar J. Minar, Porarthi Dhar, A T M Omor Faruq
  • for: Gathering and presenting Machine Learning (ML) applications in different areas of healthcare -- community-level work, risk management/preventive care, healthcare operation management, remote care, and early detection -- to provide quick access to necessary information and reduce clinicians' knowledge gap about ML applications in healthcare.
  • methods: A comprehensive review of existing literature that identifies and categorizes ML applications in healthcare, providing relevant references with descriptions in tabular form for quick access.
  • results: A comprehensive overview of ML applications in healthcare, including their potential benefits and limitations, intended to motivate healthcare professionals towards more ML-based healthcare systems.
    Abstract Detection of easily missed hidden patterns with fast processing power makes machine learning (ML) indispensable to today's healthcare system. Though many ML applications have already been discovered and many are still under investigation, only a few have been adopted by current healthcare systems. As a result, there exists an enormous opportunity for ML in the healthcare system, but distributed information and the scarcity of properly arranged, easily explainable documentation in the related sectors are major impediments that make ML applications difficult for healthcare professionals. This study aimed to gather ML applications in different areas of healthcare concisely and more effectively so that necessary information can be accessed immediately with relevant references. We divided our study into five major groups: community level work, risk management/preventive care, healthcare operation management, remote care, and early detection. Dividing these groups into subgroups, we provided relevant references with descriptions in tabular form for quick access. Our objective is to inform people about ML applicability in the healthcare industry, reduce the knowledge gap of clinicians about ML applications, and motivate healthcare professionals towards more machine-learning-based healthcare systems.

Pre-Training with Diffusion models for Dental Radiography segmentation

  • paper_url: http://arxiv.org/abs/2307.14066
  • repo_url: None
  • paper_authors: Jérémy Rousseau, Christian Alaka, Emma Covili, Hippolyte Mayard, Laura Misrachi, Willy Au
  • for: Improving the label efficiency of medical radiography segmentation, specifically for dental radiography.
  • methods: A straightforward pre-training method based on Denoising Diffusion Probabilistic Models (DDPM): a U-Net is first pre-trained with the DDPM training objective and then fine-tuned on the segmentation task, with no architectural modifications between pre-training and downstream tasks.
  • results: Experiments on dental radiograph segmentation show the method is competitive with state-of-the-art pre-training methods.
    Abstract Medical radiography segmentation, and specifically dental radiography, is highly limited by the cost of labeling which requires specific expertise and labor-intensive annotations. In this work, we propose a straightforward pre-training method for semantic segmentation leveraging Denoising Diffusion Probabilistic Models (DDPM), which have shown impressive results for generative modeling. Our straightforward approach achieves remarkable performance in terms of label efficiency and does not require architectural modifications between pre-training and downstream tasks. We propose to first pre-train a Unet by exploiting the DDPM training objective, and then fine-tune the resulting model on a segmentation task. Our experimental results on the segmentation of dental radiographs demonstrate that the proposed method is competitive with state-of-the-art pre-training methods.
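
The DDPM pre-training objective applied to the U-Net is the standard noise-prediction loss. A compact sketch of one training step (the stand-in eps-predictor and the linear beta schedule are illustrative, not the paper's architecture):

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def ddpm_loss(model, x0):
    # Sample a timestep, corrupt x0 with the closed-form forward process,
    # and train the network to predict the injected noise.
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps
    return F.mse_loss(model(x_t, t), eps)

# Stand-in "U-Net": any eps-predictor with signature model(x_t, t).
model = lambda x, t: torch.zeros_like(x)
loss = ddpm_loss(model, torch.randn(8, 1, 64, 64))
```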

Topologically-Regularized Multiple Instance Learning for Red Blood Cell Disease Classification

  • paper_url: http://arxiv.org/abs/2307.14025
  • repo_url: None
  • paper_authors: Salome Kazeminia, Ario Sadafi, Asya Makhro, Anna Bogdanova, Carsten Marr, Bastian Rieck
  • for: Diagnosing rare anemia disorders from microscopic images of red blood cells.
  • methods: Multi-scale topological features extracted from bags of single red blood cell images, used to regularize multiple-instance learning.
  • results: On a dataset of 71 patients with rare anemia disorders and 521 microscopic images, topological regularization improves automated classification performance by more than 3%.
    Abstract Diagnosing rare anemia disorders using microscopic images is challenging for skilled specialists and machine-learning methods alike. Due to thousands of disease-relevant cells in a single blood sample, this constitutes a complex multiple-instance learning (MIL) problem. While the spatial neighborhood of red blood cells is not meaningful per se, the topology, i.e., the geometry of blood samples as a whole, contains informative features to remedy typical MIL issues, such as vanishing gradients and overfitting when training on limited data. We thus develop a topology-based approach that extracts multi-scale topological features from bags of single red blood cell images. The topological features are used to regularize the model, enforcing the preservation of characteristic topological properties of the data. Applied to a dataset of 71 patients suffering from rare anemia disorders with 521 microscopic images of red blood cells, our experiments show that topological regularization is an effective method that leads to more than 3% performance improvements for the automated classification of rare anemia disorders based on single-cell images. This is the first approach that uses topological properties for regularizing the MIL process.
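
Multi-scale topological features of a bag can be extracted with a persistent-homology library such as ripser. A hedged sketch; the summary used here (total persistence per homology dimension) is one simple choice, not necessarily the paper's features:

```python
import numpy as np
from ripser import ripser

def topological_summary(points):
    # Persistence diagrams of the point cloud (H0 and H1 by default).
    dgms = ripser(points)["dgms"]
    # Total persistence per dimension, ignoring the infinite H0 bar.
    return [float(np.sum(d[np.isfinite(d[:, 1]), 1]
                         - d[np.isfinite(d[:, 1]), 0])) for d in dgms]

bag = np.random.rand(100, 2)  # stand-in for per-cell feature vectors
print(topological_summary(bag))  # e.g. [total H0 persistence, total H1]
```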

Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?

  • paper_url: http://arxiv.org/abs/2307.14023
  • repo_url: None
  • paper_authors: Tokio Kajitsuka, Issei Sato
  • for: Investigating the expressive capacity of Transformer models.
  • methods: Clarifying the connection between the softmax function and the Boltzmann operator.
  • results: A single self-attention layer with low-rank weight matrices can perfectly capture the context of an entire input sequence; consequently, a single-layer Transformer has memorization capacity for finite samples, and a Transformer consisting of one self-attention layer with two feed-forward neural networks is a universal approximator for continuous functions on a compact domain.
    Abstract Existing analyses of the expressive capacity of Transformer models have required excessively deep layers for data memorization, leading to a discrepancy with the Transformers actually used in practice. This is primarily due to the interpretation of the softmax function as an approximation of the hardmax function. By clarifying the connection between the softmax function and the Boltzmann operator, we prove that a single layer of self-attention with low-rank weight matrices possesses the capability to perfectly capture the context of an entire input sequence. As a consequence, we show that single-layer Transformer has a memorization capacity for finite samples, and that Transformers consisting of one self-attention layer with two feed-forward neural networks are universal approximators for continuous functions on a compact domain.
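
The softmax/Boltzmann connection at the heart of the argument can be stated compactly. Writing $\mathrm{softmax}$ for the usual normalized exponential, the Boltzmann operator with inverse temperature $\alpha$ is

```latex
\mathrm{boltz}_{\alpha}(\mathbf{x})
  = \frac{\sum_{i} x_i \, e^{\alpha x_i}}{\sum_{j} e^{\alpha x_j}}
  = \mathbf{x}^{\top} \mathrm{softmax}(\alpha \mathbf{x}),
\qquad
\lim_{\alpha \to \infty} \mathrm{boltz}_{\alpha}(\mathbf{x}) = \max_i x_i .
```

So softmax attention only coincides with hardmax in the infinite-temperature limit; as we read the abstract, analyzing attention through the Boltzmann operator directly, rather than as a hardmax approximation, is what yields the single-layer result.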

MCMC-Correction of Score-Based Diffusion Models for Model Composition

  • paper_url: http://arxiv.org/abs/2307.14012
  • repo_url: https://github.com/jackonelli/mcmc_corr_score_diffusion
  • paper_authors: Anders Sjöberg, Jakob Lindqvist, Magnus Önnheim, Mats Jirstrand, Lennart Svensson
  • for: Enabling score-parameterized diffusion models to be combined with various Markov chain Monte Carlo (MCMC) methods for model composition.
  • methods: Retaining the score parameterization and computing the energy-based Metropolis-Hastings acceptance probability through line integration of the score function.
  • results: On a 2D experiment, the method achieves performance similar to, or arguably better than, the energy parameterization while allowing reuse of existing pre-trained diffusion models.
    Abstract Diffusion models can be parameterised in terms of either a score or an energy function. The energy parameterisation has better theoretical properties, mainly that it enables an extended sampling procedure with a Metropolis--Hastings correction step, based on the change in total energy in the proposed samples. However, it seems to yield slightly worse performance, and more importantly, due to the widespread popularity of score-based diffusion, there is limited availability of off-the-shelf pre-trained energy-based models. This limitation undermines the purpose of model composition, which aims to combine pre-trained models to sample from new distributions. Our proposal, however, suggests retaining the score parameterization and instead computing the energy-based acceptance probability through line integration of the score function. This allows us to re-use existing diffusion models and still combine the reverse process with various Markov-Chain Monte Carlo (MCMC) methods. We evaluate our method on a 2D experiment and find that it achieves similar or arguably better performance than the energy parameterisation.
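
The mechanism can be stated as a short identity: since the score of a model with density $p_\theta \propto e^{-E_\theta}$ satisfies $s_\theta = -\nabla_x E_\theta$, the energy difference that a Metropolis-Hastings correction needs is recoverable from the score alone via a line integral (the straight-line path below is one natural choice; the paper's exact quadrature may differ):

```latex
E_\theta(x') - E_\theta(x)
  = -\int_0^1 s_\theta\bigl(x + t\,(x' - x)\bigr) \cdot (x' - x)\, \mathrm{d}t,
\qquad
s_\theta(x) = \nabla_x \log p_\theta(x) = -\nabla_x E_\theta(x).
```

In practice the integral is approximated numerically (e.g., with a trapezoidal rule over a few score evaluations), so existing score-based checkpoints can be reused without training a separate energy network.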

Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG

  • paper_url: http://arxiv.org/abs/2307.14389
  • repo_url: https://github.com/yorgoon/diffe
  • paper_authors: Soowon Kim, Young-Eun Lee, Seo-Hyun Lee, Seong-Whan Lee
  • for: Decoding imagined speech from EEG to enable brain-computer communication.
  • methods: Denoising diffusion probabilistic models (DDPMs) combined with a conditional autoencoder (Diff-E) for processing EEG signals.
  • results: Significantly higher accuracy in decoding imagined speech from EEG than traditional machine learning techniques and baseline models.
    Abstract Decoding EEG signals for imagined speech is a challenging task due to the high-dimensional nature of the data and low signal-to-noise ratio. In recent years, denoising diffusion probabilistic models (DDPMs) have emerged as promising approaches for representation learning in various domains. Our study proposes a novel method for decoding EEG signals for imagined speech using DDPMs and a conditional autoencoder named Diff-E. Results indicate that Diff-E significantly improves the accuracy of decoding EEG signals for imagined speech compared to traditional machine learning techniques and baseline models. Our findings suggest that DDPMs can be an effective tool for EEG signal decoding, with potential implications for the development of brain-computer interfaces that enable communication through imagined speech.

Fast algorithms for k-submodular maximization subject to a matroid constraint

  • paper_url: http://arxiv.org/abs/2307.13996
  • repo_url: None
  • paper_authors: Shuxian Niu, Qian Liu, Yang Zhou, Min Li
  • for: Maximizing $k$-submodular functions under a matroid constraint with a threshold-decreasing algorithm, which reduces the query complexity of the greedy algorithm with little loss in approximation ratio.
  • methods: A $(\frac{1}{2} - \epsilon)$-approximation algorithm for monotone $k$-submodular function maximization and a $(\frac{1}{3} - \epsilon)$-approximation algorithm for the non-monotone case, with complexity $O(\frac{n(k\cdot EO + IO)}{\epsilon} \log \frac{r}{\epsilon})$.
  • results: Since a total size constraint is a special (uniform) matroid, fast algorithms for maximizing $k$-submodular functions subject to a total size constraint follow as corollaries.
    Abstract In this paper, we apply a Threshold-Decreasing Algorithm to maximize $k$-submodular functions under a matroid constraint, which reduces the query complexity of the algorithm compared to the greedy algorithm with little loss in approximation ratio. We give a $(\frac{1}{2} - \epsilon)$-approximation algorithm for monotone $k$-submodular function maximization, and a $(\frac{1}{3} - \epsilon)$-approximation algorithm for the non-monotone case, with complexity $O(\frac{n(k\cdot EO + IO)}{\epsilon} \log \frac{r}{\epsilon})$, where $r$ denotes the rank of the matroid, and $IO, EO$ denote the number of oracles to evaluate whether a subset is an independent set and to compute the function value of $f$, respectively. Since the constraint of total size can be viewed as a special matroid, called the uniform matroid, we present fast algorithms for maximizing $k$-submodular functions subject to a total size constraint as corollaries.

Take Your Pick: Enabling Effective Personalized Federated Learning within Low-dimensional Feature Space

  • paper_url: http://arxiv.org/abs/2307.13995
  • repo_url: None
  • paper_authors: Guogang Zhu, Xuefeng Liu, Shaojie Tang, Jianwei Niu, Xinghao Wu, Jiaxing Shen
  • for: Personalized federated learning (PFL) for application scenarios where clients' data lie in different domains.
  • methods: FedPick, a PFL framework that operates in the low-dimensional feature space produced by the global encoder, adaptively selecting task-relevant features for each client based on its local data distribution; this is more accessible and interpretable than personalizing parameters in the high-dimensional, non-linear parameter space.
  • results: Extensive experiments show FedPick effectively selects task-relevant features for each client and improves model performance in cross-domain federated learning.
    Abstract Personalized federated learning (PFL) is a popular framework that allows clients to have different models to address application scenarios where clients' data are in different domains. The typical model of a client in PFL features a global encoder trained by all clients to extract universal features from the raw data and personalized layers (e.g., a classifier) trained using the client's local data. Nonetheless, due to the differences between the data distributions of different clients (aka, domain gaps), the universal features produced by the global encoder largely encompass numerous components irrelevant to a certain client's local task. Some recent PFL methods address the above problem by personalizing specific parameters within the encoder. However, these methods encounter substantial challenges attributed to the high dimensionality and non-linearity of neural network parameter space. In contrast, the feature space exhibits a lower dimensionality, providing greater intuitiveness and interpretability as compared to the parameter space. To this end, we propose a novel PFL framework named FedPick. FedPick achieves PFL in the low-dimensional feature space by selecting task-relevant features adaptively for each client from the features generated by the global encoder based on its local data distribution. It presents a more accessible and interpretable implementation of PFL compared to those methods working in the parameter space. Extensive experimental results show that FedPick could effectively select task-relevant features for each client and improve model performance in cross-domain FL.

BovineTalk: Machine Learning for Vocalization Analysis of Dairy Cattle under Negative Affective States

  • paper_url: http://arxiv.org/abs/2307.13994
  • repo_url: None
  • paper_authors: Dinu Gavojdian, Teddy Lazebnik, Madalina Mincu, Ariel Oren, Ioana Nicolae, Anna Zamansky
  • for: The goal of this study is to develop and validate non-invasive indicators of affective states in livestock, so that they can be integrated into on-farm assessment protocols.
  • methods: The study uses vocal indicators of cattle, applying two computational frameworks, one based on deep learning and one on explainable machine learning, to classify low- and high-frequency calls and to recognize individual cows by voice.
  • results: The two frameworks reached 87.2% and 89.4% accuracy for call classification, and 68.9% and 72.5% accuracy for individual cow voice recognition.
    Abstract There is a critical need to develop and validate non-invasive animal-based indicators of affective states in livestock species, in order to integrate them into on-farm assessment protocols, potentially via the use of precision livestock farming (PLF) tools. One such promising approach is the use of vocal indicators. The acoustic structure of vocalizations and their functions were extensively studied in important livestock species, such as pigs, horses, poultry and goats, yet cattle remain understudied in this context to date. Cows were shown to produce two types of vocalizations: low-frequency calls (LF), produced with the mouth closed or partially closed for close-distance contact, and high-frequency calls (HF), emitted with the mouth open for long-distance communication, with the latter considered to be largely associated with negative affective states. Moreover, cattle vocalizations were shown to contain information on individuality across a wide range of contexts, both negative and positive. Nowadays, dairy cows face a series of negative challenges and stressors in a typical production cycle, making vocalizations during negative affective states of special interest for research. One contribution of this study is providing the largest pre-processed (cleaned of noise) dataset to date of lactating adult multiparous dairy cows during negative affective states induced by visual isolation challenges. Here we present two computational frameworks - deep learning based and explainable machine learning based - to classify high- and low-frequency cattle calls and to recognize individual cow voices. Our models in these two frameworks reached 87.2% and 89.4% accuracy for LF and HF classification, with 68.9% and 72.5% accuracy rates for individual cow identification, respectively.
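
As a rough sketch of the explainable side of such a pipeline, each clip can be summarized with classical acoustic features and fed to an interpretable classifier; the MFCC statistics and random forest below are stand-ins, not the paper's actual feature set or models.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def call_features(path: str, n_mfcc: int = 20) -> np.ndarray:
    """Summarize one vocalization clip as the mean/std of its MFCCs."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_call_classifier(wav_paths, labels):
    """labels: 0 = low-frequency (LF) call, 1 = high-frequency (HF) call."""
    X = np.stack([call_features(p) for p in wav_paths])
    return RandomForestClassifier(n_estimators=300, random_state=0).fit(X, labels)
```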

Differentiable short-time Fourier transform with respect to the hop length

  • paper_url: http://arxiv.org/abs/2308.02421
  • repo_url: https://github.com/maxime-leiber/dstft
  • paper_authors: Maxime Leiber, Yosra Marnissi, Axel Barrau, Mohammed El Badaoui
  • for: Proposes a differentiable short-time Fourier transform (STFT) that allows gradient-descent optimization of a continuous hop length, i.e., the temporal positions of the frames.
  • methods: Makes the hop length and frame temporal positions continuous parameters optimized by gradient descent, giving finer control over the temporal placement of frames.
  • results: On a simulated example, the proposed method provides better control over temporal positioning and integrates easily with existing algorithms and neural networks.
    Abstract In this paper, we propose a differentiable version of the short-time Fourier transform (STFT) that allows for gradient-based optimization of the hop length or the frame temporal position by making these parameters continuous. Our approach provides improved control over the temporal positioning of frames, as the continuous nature of the hop length allows for a more finely-tuned optimization. Furthermore, our contribution enables the use of optimization methods such as gradient descent, which are more computationally efficient than conventional discrete optimization methods. Our differentiable STFT can also be easily integrated into existing algorithms and neural networks. We present a simulated illustration to demonstrate the efficacy of our approach and to garner interest from the research community.
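
The core trick can be pictured in a few lines of PyTorch: if frame positions are continuous and frames are read off the signal by linear interpolation, gradients flow from the spectrogram back to the positions, and hence to the hop length. This is a minimal sketch of the idea with assumed names and a fixed Hann window; the authors' implementation is at the repo_url above.

```python
import torch

def stft_at_positions(x: torch.Tensor, positions: torch.Tensor, win_len: int = 256):
    """STFT with continuous (learnable) frame start positions.
    x: 1-D signal; positions: (F,) float tensor, may require grad.
    Frames are gathered by linear interpolation so that gradients
    reach `positions` (sketch of the differentiable-hop idea)."""
    offs = torch.arange(win_len, dtype=x.dtype)
    t = positions.unsqueeze(1) + offs                 # (F, win_len) fractional indices
    i0 = t.floor().long().clamp(0, x.numel() - 2)
    frac = (t - i0.to(t.dtype)).clamp(0.0, 1.0)       # differentiable w.r.t. t
    frames = (1 - frac) * x[i0] + frac * x[i0 + 1]    # linear interpolation
    frames = frames * torch.hann_window(win_len, dtype=x.dtype)
    return torch.fft.rfft(frames, dim=-1)

# e.g. positions = torch.linspace(0., 4096., 16).requires_grad_(); then backprop a loss
```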

METAVerse: Meta-Learning Traversability Cost Map for Off-Road Navigation

  • paper_url: http://arxiv.org/abs/2307.13991
  • repo_url: None
  • paper_authors: Junwon Seo, Taekyung Kim, Seongyong Ahn, Kiho Kwak
  • for: This work aims to build an autonomous navigation system that accurately predicts terrain traversability across diverse environments.
  • methods: It uses a meta-learning framework to train a global model, supervised in a self-supervised manner by vehicle-terrain interaction feedback, that generates a continuous-valued cost map predicting traversability from sparse LiDAR point clouds.
  • results: The global model, trained on driving data from diverse terrains and adapted online during deployment, reduces prediction uncertainty and enables safe and stable navigation on unseen, unadapted terrain.
    Abstract Autonomous navigation in off-road conditions requires an accurate estimation of terrain traversability. However, traversability estimation in unstructured environments is subject to high uncertainty due to the variability of numerous factors that influence vehicle-terrain interaction. Consequently, it is challenging to obtain a generalizable model that can accurately predict traversability in a variety of environments. This paper presents METAVerse, a meta-learning framework for learning a global model that accurately and reliably predicts terrain traversability across diverse environments. We train the traversability prediction network to generate a dense and continuous-valued cost map from a sparse LiDAR point cloud, leveraging vehicle-terrain interaction feedback in a self-supervised manner. Meta-learning is utilized to train a global model with driving data collected from multiple environments, effectively minimizing estimation uncertainty. During deployment, online adaptation is performed to rapidly adapt the network to the local environment by exploiting recent interaction experiences. To conduct a comprehensive evaluation, we collect driving data from various terrains and demonstrate that our method can obtain a global model that minimizes uncertainty. Moreover, by integrating our model with a model predictive controller, we demonstrate that the reduced uncertainty results in safe and stable navigation in unstructured and unknown terrains.
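
The meta-learning loop can be pictured with a generic first-order (Reptile-style) sketch: adapt a copy of the cost-map network to each environment's interaction data, then move the global parameters toward the adapted ones. The update rule, loss, and batch format below are stand-in assumptions; the paper's exact meta-learning procedure may differ.

```python
import copy
import torch

def meta_update(model, env_batches, inner_lr=1e-3, meta_lr=0.1, inner_steps=5):
    """One Reptile-style meta-step over per-environment batches of
    (lidar_features, traversability_cost_targets)."""
    init = [p.detach().clone() for p in model.parameters()]
    delta = [torch.zeros_like(p) for p in init]
    for lidar, cost in env_batches:                     # one batch per environment
        fast = copy.deepcopy(model)                     # adapt a throwaway copy
        opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            opt.zero_grad()
            torch.nn.functional.mse_loss(fast(lidar), cost).backward()
            opt.step()
        for d, fp, ip in zip(delta, fast.parameters(), init):
            d += (fp.detach() - ip) / len(env_batches)  # average adaptation direction
    with torch.no_grad():
        for p, d in zip(model.parameters(), delta):
            p.add_(meta_lr * d)   # move global weights toward the adapted ones
```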

Differentiable adaptive short-time Fourier transform with respect to the window length

  • paper_url: http://arxiv.org/abs/2308.02418
  • repo_url: https://github.com/maxime-leiber/dstft
  • paper_authors: Maxime Leiber, Yosra Marnissi, Axel Barrau, Mohammed El Badaoui
  • for: This paper proposes a gradient-based method for on-the-fly optimization of the STFT window length, per frame and per frequency bin.
  • methods: It builds on a differentiable STFT in which the window length is a continuous parameter, so the transform can be optimized by gradient descent.
  • results: Experiments validate the method and show it adapts well to time-frequency representations containing both transient and stationary components.
    Abstract This paper presents a gradient-based method for on-the-fly optimization for both per-frame and per-frequency window length of the short-time Fourier transform (STFT), related to previous work in which we developed a differentiable version of STFT by making the window length a continuous parameter. The resulting differentiable adaptive STFT possesses commendable properties, such as the ability to adapt in the same time-frequency representation to both transient and stationary components, while being easily optimized by gradient descent. We validate the performance of our method in vibration analysis.
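
One classic way to make the window length differentiable, sketched below, is a truncated Gaussian window whose learnable width acts as an effective window length tunable by gradient descent. The paper goes further and adapts the length per frame and per frequency bin; this single-parameter sketch only shows the basic mechanism.

```python
import torch

def gaussian_window_spectrogram(x: torch.Tensor, log_sigma: torch.Tensor,
                                win_len: int = 512, hop: int = 128):
    """Spectrogram differentiable w.r.t. an effective window length: the
    learnable width sigma of a truncated Gaussian window controls how much
    of each win_len-sample frame is actually 'seen' (illustrative sketch)."""
    n = torch.arange(win_len, dtype=x.dtype)
    sigma = log_sigma.exp()                                  # positivity via log-param
    w = torch.exp(-0.5 * ((n - win_len / 2) / sigma) ** 2)   # effective length ~ sigma
    frames = x.unfold(0, win_len, hop) * w                   # (num_frames, win_len)
    return torch.fft.rfft(frames, dim=-1).abs()

# log_sigma = torch.tensor(3.0, requires_grad=True); gradients reach log_sigma via w
```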

This is not correct! Negation-aware Evaluation of Language Generation Systems

  • paper_url: http://arxiv.org/abs/2307.13989
  • repo_url: https://github.com/dmlls/cannot-dataset
  • paper_authors: Miriam Anschütz, Diego Miguel Lozano, Georg Groh
  • for: This paper addresses the insufficiently evaluated impact of negations on large language models, proposing a negation-aware evaluation metric, NegBLEURT.
  • methods: It builds a rule-based sentence negation tool, uses it to create the CANNOT negation evaluation dataset, and fine-tunes models and evaluation metrics on this dataset to improve their sensitivity to negation.
  • results: On existing benchmarks, the fine-tuned models and metrics show large improvements on negated sentences without degrading performance on other perturbations.
    Abstract Large language models underestimate the impact of negations on how much they change the meaning of a sentence. Therefore, learned evaluation metrics based on these models are insensitive to negations. In this paper, we propose NegBLEURT, a negation-aware version of the BLEURT evaluation metric. For that, we designed a rule-based sentence negation tool and used it to create the CANNOT negation evaluation dataset. Based on this dataset, we fine-tuned a sentence transformer and an evaluation metric to improve their negation sensitivity. Evaluating these models on existing benchmarks shows that our fine-tuned models outperform existing metrics on the negated sentences by far while preserving their base models' performances on other perturbations.
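
A toy rule-based negator gives the flavor of how such negation data can be generated automatically; the single auxiliary-verb rule below is far simpler than the paper's tool (see the repo_url above) and is purely illustrative.

```python
import re

AUX = r"\b(is|are|was|were|can|could|will|would|should|do|does|did|has|have|had)\b"

def negate(sentence: str) -> str:
    """Toy negation rule: insert 'not' after the first auxiliary verb,
    falling back to a sentence-level wrapper when no auxiliary is found."""
    out, n = re.subn(AUX, lambda m: m.group(0) + " not", sentence, count=1, flags=re.I)
    return out if n else "It is not true that " + sentence[0].lower() + sentence[1:]

# negate("The model is robust to noise.") -> "The model is not robust to noise."
```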

Controlling the Latent Space of GANs through Reinforcement Learning: A Case Study on Task-based Image-to-Image Translation

  • paper_url: http://arxiv.org/abs/2307.13978
  • repo_url: None
  • paper_authors: Mahyar Abbasian, Taha Rajabzadeh, Ahmadreza Moradipari, Seyed Amir Hossein Aqajari, Hongsheng Lu, Amir Rahmani
  • for: This work addresses the challenge of controlling the generation process of generative networks by combining a reinforcement learning (RL) agent with a latent-space GAN (l-GAN) to produce desired outputs.
  • methods: It integrates an actor-critic RL agent with a carefully designed reward policy, enabling the agent to navigate the latent space and generate outputs for specified tasks.
  • results: Experiments on the MNIST dataset, including an illustrative arithmetic-addition task, demonstrate the effectiveness of the approach.
    Abstract Generative Adversarial Networks (GAN) have emerged as a formidable AI tool to generate realistic outputs based on training datasets. However, the challenge of exerting control over the generation process of GANs remains a significant hurdle. In this paper, we propose a novel methodology to address this issue by integrating a reinforcement learning (RL) agent with a latent-space GAN (l-GAN), thereby facilitating the generation of desired outputs. More specifically, we have developed an actor-critic RL agent with a meticulously designed reward policy, enabling it to acquire proficiency in navigating the latent space of the l-GAN and generating outputs based on specified tasks. To substantiate the efficacy of our approach, we have conducted a series of experiments employing the MNIST dataset, including arithmetic addition as an illustrative task. The outcomes of these experiments serve to validate our methodology. Our pioneering integration of an RL agent with a GAN model represents a novel advancement, holding great potential for enhancing generative networks in the future.

Mathematical Modeling of BCG-based Bladder Cancer Treatment Using Socio-Demographics

  • paper_url: http://arxiv.org/abs/2307.15084
  • repo_url: None
  • paper_authors: Elizaveta Savchenko, Ariel Rosenfeld, Svetlana Bunimovich-Mendrazitsky
  • for: This study aims to improve the effectiveness of BCG therapy by providing a personalized treatment model based on patients' socio-demographic information.
  • methods: The researchers adopt an established BCG treatment model and integrate a machine learning component that temporally adjusts and reconfigures key parameters to personalize the treatment.
  • results: On real clinical data, the personalized model improves on the original model by 14.8% on average in predicting the number of cancer cells at the end of treatment.
    Abstract Cancer is one of the most widespread diseases around the world, with millions of new patients each year. Bladder cancer is one of the most prevalent types of cancer, affecting individuals of all backgrounds with no obvious prototypical patient. The current standard of care for BC follows a routine weekly Bacillus Calmette-Guerin (BCG) immunotherapy protocol which is applied to all patients alike. The clinical outcomes associated with BCG treatment vary significantly among patients due to the biological and clinical complexity of the interaction between the immune system, treatments, and cancer cells. In this study, we take advantage of the patient's socio-demographics to offer a personalized mathematical model that describes the clinical dynamics associated with BCG-based treatment. To this end, we adopt a well-established BCG treatment model and integrate a machine learning component to temporally adjust and reconfigure key parameters within the model, thus promoting its personalization. Using real clinical data, we show that our personalized model favorably compares with the original one in predicting the number of cancer cells at the end of the treatment, with a 14.8% improvement on average.

Understanding Deep Neural Networks via Linear Separability of Hidden Layers

  • paper_url: http://arxiv.org/abs/2307.13962
  • repo_url: None
  • paper_authors: Chao Zhang, Xinyu Chen, Wensheng Li, Lixue Liu, Wei Wu, Dacheng Tao
  • for: Studies the characteristics of deep neural networks, in particular the degree of linear separability of hidden-layer outputs.
  • methods: Proposes Minkowski difference based linear separability measures (MD-LSMs) to evaluate the linear separability of hidden-layer outputs.
  • results: Finds a synchronicity between the linear separability of hidden-layer outputs and training performance: if a weight update increases the linear separability of the hidden-layer outputs, the updated network trains better, and vice versa; also examines how the activation function and network size (width and depth) affect this separability.
    Abstract In this paper, we measure the linear separability of hidden layer outputs to study the characteristics of deep neural networks. In particular, we first propose Minkowski difference based linear separability measures (MD-LSMs) to evaluate the linear separability degree of two points sets. Then, we demonstrate that there is a synchronicity between the linear separability degree of hidden layer outputs and the network training performance, i.e., if the updated weights can enhance the linear separability degree of hidden layer outputs, the updated network will achieve a better training performance, and vice versa. Moreover, we study the effect of activation function and network size (including width and depth) on the linear separability of hidden layers. Finally, we conduct the numerical experiments to validate our findings on some popular deep networks including multilayer perceptron (MLP), convolutional neural network (CNN), deep belief network (DBN), ResNet, VGGNet, AlexNet, vision transformer (ViT) and GoogLeNet.

Flexible Differentially Private Vertical Federated Learning with Adaptive Feature Embeddings

  • paper_url: http://arxiv.org/abs/2308.02362
  • repo_url: None
  • paper_authors: Yuxi Mi, Hongquan Liu, Yewei Xia, Yiheng Sun, Jihong Guan, Shuigeng Zhou
  • for: Balances privacy protection against task utility in vertical federated learning, where shared feature embeddings risk leaking sensitive information.
  • methods: Proposes a flexible and generic approach that decouples the two goals and addresses them in turn: norm clipping first guarantees privacy, after which the scale and distribution of the feature embeddings are adaptively adjusted to improve task performance while preserving the established privacy mechanism.
  • results: Experiments show the proposed VFL-AFE framework effectively defends against privacy leakage while retaining task performance, and adapts to different datasets and models.
    Abstract The emergence of vertical federated learning (VFL) has stimulated concerns about the imperfection in privacy protection, as shared feature embeddings may reveal sensitive information under privacy attacks. This paper studies the delicate equilibrium between data privacy and task utility goals of VFL under differential privacy (DP). To address the generality issue of prior arts, this paper advocates a flexible and generic approach that decouples the two goals and addresses them successively. Specifically, we initially derive a rigorous privacy guarantee by applying norm clipping on shared feature embeddings, which is applicable across various datasets and models. Subsequently, we demonstrate that task utility can be optimized via adaptive adjustments on the scale and distribution of feature embeddings in an accuracy-appreciative way, without compromising established DP mechanisms. We concretize our observation into the proposed VFL-AFE framework, which exhibits effectiveness against privacy attacks and the capacity to retain favorable task utility, as substantiated by extensive experiments.
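
The privacy half of the recipe is the standard Gaussian mechanism applied to embeddings, sketched below: clip each sample's embedding to a fixed L2 norm (bounding sensitivity) and add noise proportional to the clipping bound. The noise scale sigma is a placeholder; calibrating it to a concrete (epsilon, delta) guarantee is omitted here.

```python
import torch

def clip_and_noise(emb: torch.Tensor, clip: float = 1.0, sigma: float = 0.5):
    """Per-sample L2 norm clipping of shared feature embeddings followed by
    Gaussian noise; emb has shape (batch, dim). A sketch of the DP step the
    paper builds on, without the privacy-accounting details."""
    norms = emb.norm(dim=1, keepdim=True).clamp(min=1e-12)
    clipped = emb * (clip / norms).clamp(max=1.0)   # every row now has norm <= clip
    return clipped + sigma * clip * torch.randn_like(clipped)
```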

Entropy Neural Estimation for Graph Contrastive Learning

  • paper_url: http://arxiv.org/abs/2307.13944
  • repo_url: https://github.com/kunzhan/M-ILBO
  • paper_authors: Yixuan Ma, Xiaolin Zhang, Peng Zhang, Kun Zhan
  • for: This paper aims to extract distinguishable high-level representations of graph nodes.
  • methods: The authors estimate the dataset's entropy with a neural network and use it to learn high-level node representations; they propose a subset sampling strategy that contrasts views of the graph, and optimize the network with two objectives simultaneously.
  • results: Extensive experiments on seven graph benchmarks show performance competitive with state-of-the-art methods.
    Abstract Contrastive learning on graphs aims at extracting distinguishable high-level representations of nodes. In this paper, we theoretically illustrate that the entropy of a dataset can be approximated by maximizing the lower bound of the mutual information across different views of a graph, i.e., entropy is estimated by a neural network. Based on this finding, we propose a simple yet effective subset sampling strategy to contrast pairwise representations between views of a dataset. In particular, we randomly sample nodes and edges from a given graph to build the input subset for a view. Two views are fed into a parameter-shared Siamese network to extract the high-dimensional embeddings and estimate the information entropy of the entire graph. For the learning process, we propose to optimize the network using two objectives, simultaneously. Concretely, the input of the contrastive loss function consists of positive and negative pairs. Our selection strategy of pairs is different from previous works and we present a novel strategy to enhance the representation ability of the graph encoder by selecting nodes based on cross-view similarities. We enrich the diversity of the positive and negative pairs by selecting highly similar samples and totally different data with the guidance of cross-view similarity scores, respectively. We also introduce a cross-view consistency constraint on the representations generated from the different views. This objective guarantees the learned representations are consistent across views from the perspective of the entire graph. We conduct extensive experiments on seven graph benchmarks, and the proposed approach achieves competitive performance compared to the current state-of-the-art methods. The source code will be publicly released once this paper is accepted.
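
The mutual-information lower bound behind such objectives is typically an InfoNCE-style contrastive loss between embeddings of two views; a minimal sketch, assuming node embeddings z1 and z2 from two sampled views are already computed, follows.

```python
import torch
import torch.nn.functional as F

def infonce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """InfoNCE between two views: minimizing this maximizes a lower bound
    on the mutual information across views (the neural entropy/MI estimate
    such methods rely on). z1, z2: (N, D) embeddings of the same N nodes."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                              # (N, N) similarities
    targets = torch.arange(z1.size(0), device=z1.device)    # positives on the diagonal
    return F.cross_entropy(logits, targets)
```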

Topology-aware Robust Optimization for Out-of-distribution Generalization

  • paper_url: http://arxiv.org/abs/2307.13943
  • repo_url: https://github.com/joffery/tro
  • paper_authors: Fengchun Qiao, Xi Peng
  • for: This work aims to improve the robustness of machine learning models to out-of-distribution data in high-stakes applications.
  • methods: It proposes Topology-aware Robust Optimization (TRO), built on the topological structure of distributions, with two objectives: (1) Topology Learning, which explores the data manifold to uncover the distributional topology; (2) Learning on Topology, which exploits that topology to constrain robust optimization for tightly bounded generalization risks.
  • results: Experiments demonstrate the effectiveness of TRO, which significantly outperforms existing methods on classification, regression, and semantic segmentation; moreover, the data-driven distributional topology is consistent with domain knowledge, improving the interpretability of the approach.
    Abstract Out-of-distribution (OOD) generalization is a challenging machine learning problem yet highly desirable in many high-stake applications. Existing methods suffer from overly pessimistic modeling with low generalization confidence. As generalizing to arbitrary test distributions is impossible, we hypothesize that further structure on the topology of distributions is crucial in developing strong OOD resilience. To this end, we propose topology-aware robust optimization (TRO) that seamlessly integrates distributional topology in a principled optimization framework. More specifically, TRO solves two optimization objectives: (1) Topology Learning which explores data manifold to uncover the distributional topology; (2) Learning on Topology which exploits the topology to constrain robust optimization for tightly-bounded generalization risks. We theoretically demonstrate the effectiveness of our approach and empirically show that it significantly outperforms the state of the arts in a wide range of tasks including classification, regression, and semantic segmentation. Moreover, we empirically find the data-driven distributional topology is consistent with domain knowledge, enhancing the explainability of our approach.

Improving Semi-Supervised Semantic Segmentation with Dual-Level Siamese Structure Network

  • paper_url: http://arxiv.org/abs/2307.13938
  • repo_url: https://github.com/kunzhan/DSSN
  • paper_authors: Zhibo Tain, Xiaolin Zhang, Peng Zhang, Kun Zhan
  • for: Improves the use of unlabeled data in semi-supervised semantic segmentation, reducing the cost of annotating training examples.
  • methods: Proposes a dual-level Siamese structure network (DSSN) that performs pixel-wise contrastive alignment in both the low-level image space and the high-level feature space to fully exploit the available unlabeled data, and introduces a novel class-aware pseudo-label selection strategy to address the limitation of methods that perform no selection or apply one predefined threshold to all classes.
  • results: Achieves state-of-the-art results on the PASCAL VOC 2012 and Cityscapes benchmarks, outperforming other SSS algorithms by a significant margin.
    Abstract Semi-supervised semantic segmentation (SSS) is an important task that utilizes both labeled and unlabeled data to reduce expenses on labeling training examples. However, the effectiveness of SSS algorithms is limited by the difficulty of fully exploiting the potential of unlabeled data. To address this, we propose a dual-level Siamese structure network (DSSN) for pixel-wise contrastive learning. By aligning positive pairs with a pixel-wise contrastive loss using strong augmented views in both low-level image space and high-level feature space, the proposed DSSN is designed to maximize the utilization of available unlabeled data. Additionally, we introduce a novel class-aware pseudo-label selection strategy for weak-to-strong supervision, which addresses the limitations of most existing methods that do not perform selection or apply a predefined threshold for all classes. Specifically, our strategy selects the top high-confidence prediction of the weak view for each class to generate pseudo labels that supervise the strong augmented views. This strategy is capable of taking into account the class imbalance and improving the performance of long-tailed classes. Our proposed method achieves state-of-the-art results on two datasets, PASCAL VOC 2012 and Cityscapes, outperforming other SSS algorithms by a significant margin.
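
The class-aware selection step can be sketched independently of the network: for every class, keep only the most confident weak-view predictions as pseudo labels for the strong view, so rare classes are not starved by one global threshold. The per-class top-k rule is an illustrative instantiation of the strategy, not necessarily the paper's exact criterion.

```python
import torch

def class_aware_pseudo_labels(weak_probs: torch.Tensor, top_k: int = 64):
    """weak_probs: (N, C) softmax outputs of the weak view (N pixels).
    Returns pseudo labels and a mask marking which pixels supervise the
    strong view: per class, only the top-k most confident predictions."""
    conf, labels = weak_probs.max(dim=1)
    mask = torch.zeros_like(conf, dtype=torch.bool)
    for c in range(weak_probs.size(1)):
        idx = (labels == c).nonzero(as_tuple=True)[0]
        if idx.numel():
            keep = idx[conf[idx].topk(min(top_k, idx.numel())).indices]
            mask[keep] = True
    return labels, mask
```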

trajdata: A Unified Interface to Multiple Human Trajectory Datasets

  • paper_url: http://arxiv.org/abs/2307.13924
  • repo_url: https://github.com/nvlabs/trajdata
  • paper_authors: Boris Ivanovic, Guanyu Song, Igor Gilitschenski, Marco Pavone
  • for: This work provides a unified interface to multiple human trajectory datasets, so that methods can be trained and evaluated across them.
  • methods: It offers a simple, uniform, and efficient data representation and API for exploring existing trajectory datasets.
  • results: It provides a comprehensive empirical analysis that helps researchers better understand existing trajectory datasets and proposes suggestions for future datasets based on these insights.
    Abstract The field of trajectory forecasting has grown significantly in recent years, partially owing to the release of numerous large-scale, real-world human trajectory datasets for autonomous vehicles (AVs) and pedestrian motion tracking. While such datasets have been a boon for the community, they each use custom and unique data formats and APIs, making it cumbersome for researchers to train and evaluate methods across multiple datasets. To remedy this, we present trajdata: a unified interface to multiple human trajectory datasets. At its core, trajdata provides a simple, uniform, and efficient representation and API for trajectory and map data. As a demonstration of its capabilities, in this work we conduct a comprehensive empirical evaluation of existing trajectory datasets, providing users with a rich understanding of the data underpinning much of current pedestrian and AV motion forecasting research, and proposing suggestions for future datasets from these insights. trajdata is permissively licensed (Apache 2.0) and can be accessed online at https://github.com/NVlabs/trajdata

HyperFed: Hyperbolic Prototypes Exploration with Consistent Aggregation for Non-IID Data in Federated Learning

  • paper_url: http://arxiv.org/abs/2307.14384
  • repo_url: None
  • paper_authors: Xinting Liao, Weiming Liu, Chaochao Chen, Pengyang Zhou, Huabin Zhu, Yanchao Tan, Jun Wang, Yue Qi
  • for: Improving the performance of federated learning (FL) on non-IID data.
  • methods: Uses three modules: Hyperbolic Prototype Tammes Initialization (HPTI), Hyperbolic Prototype Learning (HPL), and Consistent Aggregation (CA).
  • results: Extensive studies on four datasets demonstrate that HyperFed effectively improves FL performance under non-IID settings.
    Abstract Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-identical and independent data distributions (non-IID) among clients hinder the performance of FL due to three issues, i.e., (1) the class statistics shifting, (2) the insufficient hierarchical information utilization, and (3) the inconsistency in aggregating clients. To address the above issues, we propose HyperFed which contains three main modules, i.e., hyperbolic prototype Tammes initialization (HPTI), hyperbolic prototype learning (HPL), and consistent aggregation (CA). Firstly, HPTI in the server constructs uniformly distributed and fixed class prototypes, and shares them with clients to match class statistics, further guiding consistent feature representation for local clients. Secondly, HPL in each client captures the hierarchical information in local data with the supervision of shared class prototypes in the hyperbolic model space. Additionally, CA in the server mitigates the impact of the inconsistent deviations from clients to server. Extensive studies of four datasets prove that HyperFed is effective in enhancing the performance of FL under the non-IID set.

Simulation-based Inference for Cardiovascular Models

  • paper_url: http://arxiv.org/abs/2307.13918
  • repo_url: None
  • paper_authors: Antoine Wehenkel, Jens Behrmann, Andrew C. Miller, Guillermo Sapiro, Ozan Sener, Marco Cuturi, Jörn-Henrik Jacobsen
  • for: This paper studies in-silico simulation tools for cardiovascular systems and the inverse problem of mapping waveforms back to plausible physiological parameters.
  • methods: It uses simulation-based inference (SBI) to solve the inverse problem via statistical inference, providing a multi-dimensional representation of uncertainty.
  • results: The study shows that SBI yields reliable estimates for five biomarkers of clinical interest and captures practically relevant information that standard sensitivity analyses miss, such as distinct uncertainty regimes of parameter estimation.
    Abstract Over the past decades, hemodynamics simulators have steadily evolved and have become tools of choice for studying cardiovascular systems in-silico. While such tools are routinely used to simulate whole-body hemodynamics from physiological parameters, solving the corresponding inverse problem of mapping waveforms back to plausible physiological parameters remains both promising and challenging. Motivated by advances in simulation-based inference (SBI), we cast this inverse problem as statistical inference. In contrast to alternative approaches, SBI provides posterior distributions for the parameters of interest, providing a multi-dimensional representation of uncertainty for individual measurements. We showcase this ability by performing an in-silico uncertainty analysis of five biomarkers of clinical interest comparing several measurement modalities. Beyond the corroboration of known facts, such as the feasibility of estimating heart rate, our study highlights the potential of estimating new biomarkers from standard-of-care measurements. SBI reveals practically relevant findings that cannot be captured by standard sensitivity analyses, such as the existence of sub-populations for which parameter estimation exhibits distinct uncertainty regimes. Finally, we study the gap between in-vivo and in-silico with the MIMIC-III waveform database and critically discuss how cardiovascular simulations can inform real-world data analysis.

BayesDAG: Gradient-Based Posterior Sampling for Causal Discovery

  • paper_url: http://arxiv.org/abs/2307.13917
  • repo_url: None
  • paper_authors: Yashas Annadani, Nick Pawlowski, Joel Jennings, Stefan Bauer, Cheng Zhang, Wenbo Gong
  • for: This work infers the posterior distribution over causal models, quantifying epistemic uncertainty and benefiting downstream tasks.
  • methods: It introduces a scalable Bayesian causal discovery framework based on stochastic gradient Markov chain Monte Carlo (SG-MCMC) that alleviates existing computational challenges; the method samples DAGs directly from the posterior without any DAG regularization, simultaneously draws function-parameter samples, and applies to both linear and nonlinear causal models.
  • results: Empirical evaluations on synthetic and real-world data show the method outperforms state-of-the-art baselines.
    Abstract Bayesian causal discovery aims to infer the posterior distribution over causal models from observed data, quantifying epistemic uncertainty and benefiting downstream tasks. However, computational challenges arise due to joint inference over combinatorial space of Directed Acyclic Graphs (DAGs) and nonlinear functions. Despite recent progress towards efficient posterior inference over DAGs, existing methods are either limited to variational inference on node permutation matrices for linear causal models, leading to compromised inference accuracy, or continuous relaxation of adjacency matrices constrained by a DAG regularizer, which cannot ensure resulting graphs are DAGs. In this work, we introduce a scalable Bayesian causal discovery framework based on stochastic gradient Markov Chain Monte Carlo (SG-MCMC) that overcomes these limitations. Our approach directly samples DAGs from the posterior without requiring any DAG regularization, simultaneously draws function parameter samples and is applicable to both linear and nonlinear causal models. To enable our approach, we derive a novel equivalence to the permutation-based DAG learning, which opens up possibilities of using any relaxed gradient estimator defined over permutations. To our knowledge, this is the first framework applying gradient-based MCMC sampling for causal discovery. Empirical evaluations on synthetic and real-world datasets demonstrate our approach's effectiveness compared to state-of-the-art baselines.

Online learning in bandits with predicted context

  • paper_url: http://arxiv.org/abs/2307.13916
  • repo_url: None
  • paper_authors: Yongyi Guo, Susan Murphy
  • for: solve the contextual bandit problem with non-diminishing context error
  • methods: extend the measurement error model in classical statistics to the online decision-making setting
  • results: achieve sublinear regret compared to the appropriate benchmark
    Abstract We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context and the error variance (or an estimator of this variance). This setting is motivated by a wide range of applications where the true context for decision-making is unobserved, and only a prediction of the context by a potentially complex machine learning algorithm is available. When the context error is non-diminishing, classical bandit algorithms fail to achieve sublinear regret. We propose the first online algorithm in this setting with sublinear regret compared to the appropriate benchmark. The key idea is to extend the measurement error model in classical statistics to the online decision-making setting, which is nontrivial due to the policy being dependent on the noisy context observations.

Graph Neural Networks-based Hybrid Framework For Predicting Particle Crushing Strength

  • paper_url: http://arxiv.org/abs/2307.13909
  • repo_url: https://github.com/doujiang-zheng/gnn-for-particle-crushing
  • paper_authors: Tongya Zheng, Tianli Zhang, Qingzheng Guan, Wenjie Huang, Zunlei Feng, Mingli Song, Chun Chen
  • for: This paper aims to characterize the mechanical behaviors of particle crushing through the connectivity of particle fragments with Graph Neural Networks (GNNs) and to facilitate the research progress of machine learning for particle crushing by generating a large-scale dataset.
  • methods: The authors use a hybrid framework based on GNNs to predict particle crushing strength in a particle fragment view, and compare their hybrid framework against traditional machine learning methods and the plain MLP to verify its effectiveness.
  • results: The authors generate a dataset with 45,000 numerical simulations and 900 particle types, and their hybrid framework achieves better performance than traditional machine learning methods and the plain MLP. They also discuss the usefulness of different features through gradient attribution explanation w.r.t the predictions.
    Abstract Graph Neural Networks have emerged as an effective machine learning tool for multi-disciplinary tasks such as pharmaceutical molecule classification and chemical reaction prediction, because they can model non-euclidean relationships between different entities. Particle crushing, as a significant field of civil engineering, describes the breakage of granular materials caused by the breakage of particle fragment bonds under the modeling of numerical simulations, which motivates us to characterize the mechanical behaviors of particle crushing through the connectivity of particle fragments with Graph Neural Networks (GNNs). However, there lacks an open-source large-scale particle crushing dataset for research due to the expensive costs of laboratory tests or numerical simulations. Therefore, we firstly generate a dataset with 45,000 numerical simulations and 900 particle types to facilitate the research progress of machine learning for particle crushing. Secondly, we devise a hybrid framework based on GNNs to predict particle crushing strength in a particle fragment view with the advances of state of the art GNNs. Finally, we compare our hybrid framework against traditional machine learning methods and the plain MLP to verify its effectiveness. The usefulness of different features is further discussed through the gradient attribution explanation w.r.t the predictions. Our data and code are released at https://github.com/doujiang-zheng/GNN-For-Particle-Crushing.

Robustness Verification of Deep Neural Networks using Star-Based Reachability Analysis with Variable-Length Time Series Input

  • paper_url: http://arxiv.org/abs/2307.13907
  • repo_url: None
  • paper_authors: Neelanjana Pal, Diego Manzanas Lopez, Taylor T Johnson
  • for: This work explores data-driven, neural network (NN) based anomaly detection and predictive maintenance, focusing on the verification of NN-based analytics of time-series data.
  • methods: The paper uses variable-length inputs to streamline input manipulation and enhance the generalizability of the network architecture, and applies star-based reachability analysis to verify the robustness of the NNs, with several performance measures quantifying the effect of input noise.
  • results: The NN-based analytics are found to be highly robust, producing stable and reliable predictions on time-series data under varying input-noise conditions.
    Abstract Data-driven, neural network (NN) based anomaly detection and predictive maintenance are emerging research areas. NN-based analytics of time-series data offer valuable insights into past behaviors and estimates of critical parameters like remaining useful life (RUL) of equipment and state-of-charge (SOC) of batteries. However, input time series data can be exposed to intentional or unintentional noise when passing through sensors, necessitating robust validation and verification of these NNs. This paper presents a case study of the robustness verification approach for time series regression NNs (TSRegNN) using set-based formal methods. It focuses on utilizing variable-length input data to streamline input manipulation and enhance network architecture generalizability. The method is applied to two data sets in the Prognostics and Health Management (PHM) application areas: (1) SOC estimation of a Lithium-ion battery and (2) RUL estimation of a turbine engine. The NNs' robustness is checked using star-based reachability analysis, and several performance measures evaluate the effect of bounded perturbations in the input on network outputs, i.e., future outcomes. Overall, the paper offers a comprehensive case study for validating and verifying NN-based analytics of time-series data in real-world applications, emphasizing the importance of robustness testing for accurate and reliable predictions, especially considering the impact of noise on future outcomes.

  • paper_url: http://arxiv.org/abs/2307.13903
  • repo_url: None
  • paper_authors: Shiliang Zuo
  • for: Learning a Lipschitz function $f$ chosen by an adversary, from corrupted binary signals.
  • methods: Uses a natural yet powerful "sanity check" technique to design corruption-robust algorithms.
  • results: For the symmetric loss, the learner achieves regret $O(C\log T)$ for $d = 1$ and $O_d(C\log T + T^{(d-1)/d})$ for $d > 1$; for the pricing loss, the learner achieves regret $\widetilde{O} (T^{d/(d+1)} + C\cdot T^{1/(d+1)})$.
    Abstract I study the problem of learning a Lipschitz function with corrupted binary signals. The learner tries to learn a Lipschitz function $f$ that the adversary chooses. In each round, the adversary selects a context vector $x_t$ in the input space, and the learner makes a guess to the true function value $f(x_t)$ and receives a binary signal indicating whether the guess was high or low. In a total of $C$ rounds, the signal may be corrupted, though the value of $C$ is unknown to the learner. The learner's goal is to incur a small cumulative loss. I present a natural yet powerful technique sanity check, which proves useful in designing corruption-robust algorithms. I design algorithms which (treating the Lipschitz parameter $L$ as constant): for the symmetric loss, the learner achieves regret $O(C\log T)$ with $d = 1$ and $O_d(C\log T + T^{(d-1)/d})$ with $d > 1$; for the pricing loss the learner achieves regret $\widetilde{O} (T^{d/(d+1)} + C\cdot T^{1/(d+1)})$.

Regularizing Neural Networks with Meta-Learning Generative Models

  • paper_url: http://arxiv.org/abs/2307.13899
  • repo_url: None
  • paper_authors: Shin’ya Yamaguchi, Daiki Chijiwa, Sekitoshi Kanai, Atsutoshi Kumagai, Hisashi Kashima
  • for: This paper aims to improve generative data augmentation in deep learning.
  • methods: It proposes a new strategy, meta generative regularization (MGR), which uses generated samples in a regularization term for the feature extractor rather than in the loss function, avoiding the performance degradation of naive generative data augmentation.
  • results: Experiments show that MGR avoids this degradation and is particularly effective in small-dataset settings, stably outperforming baselines.
    Abstract This paper investigates methods for improving generative data augmentation for deep learning. Generative data augmentation leverages the synthetic samples produced by generative models as an additional dataset for classification with small dataset settings. A key challenge of generative data augmentation is that the synthetic data contain uninformative samples that degrade accuracy. This is because the synthetic samples do not perfectly represent class categories in real data and uniform sampling does not necessarily provide useful samples for tasks. In this paper, we present a novel strategy for generative data augmentation called meta generative regularization (MGR). To avoid the degradation of generative data augmentation, MGR utilizes synthetic samples in the regularization term for feature extractors instead of in the loss function, e.g., cross-entropy. These synthetic samples are dynamically determined to minimize the validation losses through meta-learning. We observed that MGR can avoid the performance degradation of naive generative data augmentation and boost the baselines. Experiments on six datasets showed that MGR is effective particularly when datasets are smaller and stably outperforms baselines.
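
One plausible reading of the recipe, sketched below: only real data drive the cross-entropy, while synthetic samples enter solely through a regularization term on the feature extractor. The cosine pull toward class prototypes is an assumed stand-in for the paper's meta-learned regularizer, and the meta-learning of the synthetic samples themselves is not shown.

```python
import torch.nn.functional as F

def mgr_style_loss(encoder, head, x_real, y_real, x_syn, y_syn, prototypes, lam=0.1):
    """Sketch of a meta-generative-regularization-style objective:
    cross-entropy on real data plus a feature-space regularizer that
    routes synthetic samples through the encoder only (never the head)."""
    ce = F.cross_entropy(head(encoder(x_real)), y_real)
    feats = F.normalize(encoder(x_syn), dim=1)
    protos = F.normalize(prototypes[y_syn], dim=1)   # (B, D) class prototypes
    reg = (1 - (feats * protos).sum(dim=1)).mean()   # cosine pull (assumption)
    return ce + lam * reg
```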

Efficient Estimation of the Local Robustness of Machine Learning Models

  • paper_url: http://arxiv.org/abs/2307.13885
  • repo_url: None
  • paper_authors: Tessa Han, Suraj Srinivas, Himabindu Lakkaraju
  • for: This paper aims at assessing the robustness of machine learning models to noisy input data.
  • methods: It develops the first analytical estimators of local robustness, based on local linear function approximation and the multivariate normal CDF, to efficiently compute the local robustness of multi-class discriminative models.
  • results: These estimators accurately and efficiently compute the local robustness of standard deep learning models, and prove useful for tasks such as measuring robustness bias and identifying examples in a dataset that are vulnerable to noise perturbation.
    Abstract Machine learning models often need to be robust to noisy input data. The effect of real-world noise (which is often random) on model predictions is captured by a model's local robustness, i.e., the consistency of model predictions in a local region around an input. However, the na\"ive approach to computing local robustness based on Monte-Carlo sampling is statistically inefficient, leading to prohibitive computational costs for large-scale applications. In this work, we develop the first analytical estimators to efficiently compute local robustness of multi-class discriminative models using local linear function approximation and the multivariate Normal CDF. Through the derivation of these estimators, we show how local robustness is connected to concepts such as randomized smoothing and softmax probability. We also confirm empirically that these estimators accurately and efficiently compute the local robustness of standard deep learning models. In addition, we demonstrate these estimators' usefulness for various tasks involving local robustness, such as measuring robustness bias and identifying examples that are vulnerable to noise perturbation in a dataset. By developing these analytical estimators, this work not only advances conceptual understanding of local robustness, but also makes its computation practical, enabling the use of local robustness in critical downstream applications.
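
The linearization-plus-Gaussian-CDF idea can be sketched directly: under Gaussian input noise, the pairwise class margins are approximately jointly Gaussian around their values at x, so local robustness reduces to a single multivariate normal CDF evaluation. The sketch below assumes a CPU tensor and a small number of classes, and illustrates the estimator family rather than reproducing the paper's exact estimators.

```python
import numpy as np
import torch
from scipy.stats import multivariate_normal

def local_robustness(model, x, y, sigma, num_classes):
    """Estimate P(argmax model(x + e) == y) for e ~ N(0, sigma^2 I) by
    linearizing the margins g_j = f_y - f_j at x, so that
    g(x + e) ~ N(g(x), sigma^2 G G^T) with G the margin Jacobian."""
    x = x.clone().requires_grad_(True)
    logits = model(x.unsqueeze(0)).squeeze(0)
    others = [j for j in range(num_classes) if j != y]
    g, G = [], []
    for j in others:
        m = logits[y] - logits[j]
        (grad,) = torch.autograd.grad(m, x, retain_graph=True)
        g.append(m.item())
        G.append(grad.detach().flatten().numpy())
    G = np.stack(G)
    cov = sigma**2 * G @ G.T + 1e-9 * np.eye(len(others))   # jitter for stability
    # all margins stay positive iff a N(0, cov) draw is <= g (Gaussian symmetry)
    return multivariate_normal(mean=np.zeros(len(others)), cov=cov).cdf(np.array(g))
```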

ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis

  • paper_url: http://arxiv.org/abs/2307.13883
  • repo_url: None
  • paper_authors: Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton
  • for: This paper characterizes decomposition-based program synthesis strategies and their compositional generalization across different levels of task complexity.
  • methods: It proposes a decomposition strategy based on execution subgoals: the model predicts the execution subgoal of each step and solves the problem step by step, informed by program execution.
  • results: Compared to baselines, this decomposition strategy generalizes compositionally better and achieves higher program synthesis performance.
    Abstract When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks. While it is difficult to measure whether neural program synthesis methods have similar capabilities, we can measure whether they compositionally generalize, that is, whether a model that has been trained on the simpler subtasks is subsequently able to solve more complex tasks. In this paper, we characterize several different forms of compositional generalization that are desirable in program synthesis, forming a meta-benchmark which we use to create generalization tasks for two popular datasets, RobustFill and DeepCoder. We then propose ExeDec, a novel decomposition-based synthesis strategy that predicts execution subgoals to solve problems step-by-step informed by program execution at each step. ExeDec has better synthesis performance and greatly improved compositional generalization ability compared to baselines.

Good Lattice Training: Physics-Informed Neural Networks Accelerated by Number Theory

  • paper_url: http://arxiv.org/abs/2307.13869
  • repo_url: None
  • paper_authors: Takashi Matsubara, Takaharu Yaguchi
  • for: Solving partial differential equations (PDEs).
  • methods: Uses physics-informed neural networks (PINNs) accelerated by good lattice training (GLT).
  • results: Requires 2-20x fewer collocation points than uniformly random or Latin hypercube sampling while achieving competitive performance.
    Abstract Physics-informed neural networks (PINNs) offer a novel and efficient approach to solving partial differential equations (PDEs). Their success lies in the physics-informed loss, which trains a neural network to satisfy a given PDE at specific points and to approximate the solution. However, the solutions to PDEs are inherently infinite-dimensional, and the distance between the output and the solution is defined by an integral over the domain. Therefore, the physics-informed loss only provides a finite approximation, and selecting appropriate collocation points becomes crucial to suppress the discretization errors, although this aspect has often been overlooked. In this paper, we propose a new technique called good lattice training (GLT) for PINNs, inspired by number theoretic methods for numerical analysis. GLT offers a set of collocation points that are effective even with a small number of points and for multi-dimensional spaces. Our experiments demonstrate that GLT requires 2--20 times fewer collocation points (resulting in lower computational cost) than uniformly random sampling or Latin hypercube sampling, while achieving competitive performance.
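
Good lattice points come from number-theoretic numerical integration: n points x_i = frac(i·g/n) of a rank-1 lattice fill the unit cube far more evenly than i.i.d. samples. The Korobov-form generator below is a minimal sketch; finding a good generating vector (the "good" in good lattice training) is the essential ingredient and is typically done by search.

```python
import numpy as np

def korobov_lattice(n: int, dim: int, a: int) -> np.ndarray:
    """Rank-1 (Korobov) lattice: points frac(i * (1, a, a^2, ...) / n),
    i = 0..n-1, in [0, 1)^dim. `a` is the generator parameter."""
    g = np.array([pow(a, k, n) for k in range(dim)])   # generating vector mod n
    i = np.arange(n).reshape(-1, 1)
    return (i * g % n) / n                             # (n, dim) collocation points

pts = korobov_lattice(1021, 2, 76)   # n = 1021 (prime); a = 76 is just an example
```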

Learning sources of variability from high-dimensional observational studies

  • paper_url: http://arxiv.org/abs/2307.13868
  • repo_url: https://github.com/ebridge2/cdcorr
  • paper_authors: Eric W. Bridgeford, Jaewon Chung, Brian Gilbert, Sambit Panda, Adam Li, Cencheng Shen, Alexandra Badea, Brian Caffo, Joshua T. Vogelstein
  • for: This paper studies causal inference, i.e., the question of whether the presence of a variable influences an observed outcome.
  • methods: It proposes a new approach that generalizes causal estimands to outcomes of any dimension or any measurable space, and formulates causal estimands for nominal variables as causal discrepancy tests.
  • results: The proposed method, Causal CDcorr, improves both finite-sample validity and power over existing strategies in data analysis, and the open-source code is available on GitHub.
    Abstract Causal inference studies whether the presence of a variable influences an observed outcome. As measured by quantities such as the "average treatment effect," this paradigm is employed across numerous biological fields, from vaccine and drug development to policy interventions. Unfortunately, the majority of these methods are often limited to univariate outcomes. Our work generalizes causal estimands to outcomes with any number of dimensions or any measurable space, and formulates traditional causal estimands for nominal variables as causal discrepancy tests. We propose a simple technique for adjusting universally consistent conditional independence tests and prove that these tests are universally consistent causal discrepancy tests. Numerical experiments illustrate that our method, Causal CDcorr, leads to improvements in both finite sample validity and power when compared to existing strategies. Our methods are all open source and available at github.com/ebridge2/cdcorr.

Pretrained Deep 2.5D Models for Efficient Predictive Modeling from Retinal OCT

  • paper_url: http://arxiv.org/abs/2307.13865
  • repo_url: None
  • paper_authors: Taha Emre, Marzieh Oghbaie, Arunava Chakravarty, Antoine Rivail, Sophie Riedl, Julia Mai, Hendrik P. N. Scholl, Sobha Sivaprasad, Daniel Rueckert, Andrew Lotery, Ursula Schmidt-Erfurth, Hrvoje Bogunović
  • for: improving predictive modeling of disease progression from retinal OCT volumes.
  • methods: 2.5D architectures combining convolutional neural networks (CNNs), long short-term memory (LSTM), and Transformers, further enhanced with recent non-contrastive 2D pretraining.
  • results: predicts progression to wet age-related macular degeneration (AMD) within a six-month period on two large longitudinal OCT datasets, improving both performance and data efficiency.
    Abstract In the field of medical imaging, 3D deep learning models play a crucial role in building powerful predictive models of disease progression. However, the size of these models presents significant challenges, both in terms of computational resources and data requirements. Moreover, achieving high-quality pretraining of 3D models proves to be even more challenging. To address these issues, hybrid 2.5D approaches provide an effective solution for utilizing 3D volumetric data efficiently using 2D models. Combining 2D and 3D techniques offers a promising avenue for optimizing performance while minimizing memory requirements. In this paper, we explore 2.5D architectures based on a combination of convolutional neural networks (CNNs), long short-term memory (LSTM), and Transformers. In addition, leveraging the benefits of recent non-contrastive pretraining approaches in 2D, we enhanced the performance and data efficiency of 2.5D techniques even further. We demonstrate the effectiveness of architectures and associated pretraining on a task of predicting progression to wet age-related macular degeneration (AMD) within a six-month period on two large longitudinal OCT datasets.
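A minimal sketch of the generic 2.5D pattern the abstract describes: a shared 2D CNN encodes each OCT B-scan, and an LSTM aggregates the slice features into a volume-level prediction. The toy backbone and layer sizes below are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal 2.5D sketch (PyTorch): a shared 2D CNN encodes each slice,
# and an LSTM aggregates slice features into a volume-level prediction.
import torch
import torch.nn as nn

class TwoPointFiveD(nn.Module):
    def __init__(self, feat_dim=128, hidden=64, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(          # per-slice 2D feature extractor
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, volume):                 # volume: (B, S, 1, H, W)
        b, s = volume.shape[:2]
        feats = self.encoder(volume.flatten(0, 1))   # (B*S, feat_dim)
        feats = feats.view(b, s, -1)                 # (B, S, feat_dim)
        _, (h, _) = self.lstm(feats)                 # h: (1, B, hidden)
        return self.head(h[-1])                      # (B, n_classes)

logits = TwoPointFiveD()(torch.randn(2, 10, 1, 64, 64))  # 10 slices per volume
```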

Learning to Design Analog Circuits to Meet Threshold Specifications

  • paper_url: http://arxiv.org/abs/2307.13861
  • repo_url: https://github.com/indylab/circuit-synthesis
  • paper_authors: Dmitrii Krylov, Pooya Khajeh, Junhan Ouyang, Thomas Reeves, Tongkai Liu, Hiba Ajmal, Hamidreza Aghasi, Roy Fox
  • for: automated design of analog and radio-frequency circuits via supervised or reinforcement learning from simulation data, as an alternative to manual expert design.
  • methods: generates from simulation data a dataset on which a system can be trained via supervised learning to design circuits that meet threshold specifications.
  • results: consistently reaches a success rate better than 90% at a 5% error margin while improving data efficiency by up to an order of magnitude; a demo is available at circuits.streamlit.app.
    Abstract Automated design of analog and radio-frequency circuits using supervised or reinforcement learning from simulation data has recently been studied as an alternative to manual expert design. It is straightforward for a design agent to learn an inverse function from desired performance metrics to circuit parameters. However, it is more common for a user to have threshold performance criteria rather than an exact target vector of feasible performance measures. In this work, we propose a method for generating from simulation data a dataset on which a system can be trained via supervised learning to design circuits to meet threshold specifications. Moreover, we perform the most extensive evaluation of automated analog circuit design to date, experimenting on a significantly more diverse set of circuits than prior work, covering linear, nonlinear, and autonomous circuit configurations, and show that our method consistently reaches a success rate better than 90% at a 5% error margin, while also improving data efficiency by upward of an order of magnitude. A demo of this system is available at circuits.streamlit.app
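One plausible way to build such a dataset is sketched below: for every simulated design, sample threshold specifications that the design provably satisfies, yielding (specification, parameters) training pairs for a supervised inverse model. The names and the relaxation scheme are hypothetical; the paper's construction may differ.

```python
# Hypothetical sketch: turn simulated (circuit params -> performance metrics)
# pairs into a supervised dataset mapping threshold specs to circuit
# parameters that satisfy them. Not necessarily the paper's construction.
import numpy as np

def make_threshold_dataset(params, metrics, n_specs_per_point=4, seed=0):
    """params: (n, p) simulated designs; metrics: (n, m) measured performances.
    Assumes higher metric values are better, so a design satisfies any spec
    it dominates component-wise."""
    rng = np.random.default_rng(seed)
    specs, targets = [], []
    for x, y in zip(params, metrics):
        for _ in range(n_specs_per_point):
            # Sample a threshold spec this design provably meets by relaxing
            # each achieved metric with a random margin.
            spec = y * rng.uniform(0.7, 1.0, size=y.shape)
            specs.append(spec)
            targets.append(x)
    return np.asarray(specs), np.asarray(targets)

# A regressor trained on (spec -> params) can then propose a design for a
# user-provided threshold specification, to be verified in simulation.
```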

On the unreasonable vulnerability of transformers for image restoration – and an easy fix

  • paper_url: http://arxiv.org/abs/2307.13856
  • repo_url: None
  • paper_authors: Shashank Agnihotri, Kanchana Vaishnavi Gandikota, Julia Grabinski, Paramanand Chandramouli, Margret Keuper
  • for: investigates the adversarial robustness of Vision Transformer (ViT) based models for image restoration.
  • methods: evaluates robustness with Projected Gradient Descent (PGD) and the recently proposed CosPGD attack, which is tailored to pixel-wise prediction tasks.
  • results: on GoPro image deblurring, these models prove highly susceptible to adversarial attacks; adversarial training yields a significant robustness gain for Restormer, while results for NAFNet and the Baseline network are less promising.
    Abstract Following their success in visual recognition tasks, Vision Transformers (ViTs) are being increasingly employed for image restoration. As a few recent works claim that ViTs for image classification also have better robustness properties, we investigate whether the improved adversarial robustness of ViTs extends to image restoration. We consider the recently proposed Restormer model, as well as NAFNet and the "Baseline network", which are both simplified versions of a Restormer. We use Projected Gradient Descent (PGD) and CosPGD, a recently proposed adversarial attack tailored to pixel-wise prediction tasks, for our robustness evaluation. Our experiments are performed on real-world images from the GoPro dataset for image deblurring. Our analysis indicates that, contrary to what image-classification works advocate for ViTs, these models are highly susceptible to adversarial attacks. We attempt to improve their robustness through adversarial training. While this yields a significant increase in robustness for Restormer, results on other networks are less promising. Interestingly, the design choices in NAFNet and Baselines, which were based on iid performance and not on robust generalization, seem to be at odds with model robustness. Thus, we investigate this further and find a fix.
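For reference, a minimal PGD loop against a restoration model looks like the sketch below: ascend the reconstruction loss within an L-infinity ball around the degraded input. This is plain PGD; CosPGD additionally weights the pixel-wise losses, which is omitted here, and `model` and the data tensors are placeholders.

```python
# Minimal PGD sketch (PyTorch) against an image-restoration model: maximize
# the reconstruction loss within an L-infinity ball around the blurry input.
import torch

def pgd_attack(model, blurry, sharp, eps=8 / 255, alpha=2 / 255, steps=10):
    x_adv = blurry.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = torch.nn.functional.mse_loss(model(x_adv), sharp)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                 # ascend the loss
            x_adv = blurry + (x_adv - blurry).clamp(-eps, eps)  # project to ball
            x_adv = x_adv.clamp(0, 1)                           # valid pixel range
    return x_adv.detach()
```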

Exploring the Sharpened Cosine Similarity

  • paper_url: http://arxiv.org/abs/2307.13855
  • repo_url: None
  • paper_authors: Skyler Wu, Fred Lu, Edward Raff, James Holt
  • for: explores replacing convolutional layers in image classification models with the Sharpened Cosine Similarity (SCS).
  • methods: studies SCS's parameter behavior and benchmarks it as a drop-in replacement for convolutions in multiple CNN architectures on CIFAR-10.
  • results: SCS may not yield significant accuracy gains, but it may learn more interpretable representations and, in some circumstances, confer a slight increase in adversarial robustness.
    Abstract Convolutional layers have long served as the primary workhorse for image classification. Recently, an alternative to convolution was proposed using the Sharpened Cosine Similarity (SCS), which in theory may serve as a better feature detector. While multiple sources report promising results, there has not been to date a full-scale empirical analysis of neural network performance using these new layers. In our work, we explore SCS's parameter behavior and potential as a drop-in replacement for convolutions in multiple CNN architectures benchmarked on CIFAR-10. We find that while SCS may not yield significant increases in accuracy, it may learn more interpretable representations. We also find that, in some circumstances, SCS may confer a slight increase in adversarial robustness.
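A sketch of an SCS layer following the commonly cited formulation scs(s, k) = sign(s·k) · (|s·k| / ((‖s‖+q)(‖k‖+q)))^p. The parameterization of p and q varies between implementations, so treat the details below as one reasonable choice rather than the definitive layer.

```python
# Sketch of a Sharpened Cosine Similarity layer (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharpenedCosineSimilarity(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.1)
        self.p = nn.Parameter(torch.ones(out_ch))   # per-filter sharpening exponent
        self.q = nn.Parameter(torch.zeros(1))       # norm offset
        self.k, self.eps = kernel_size, eps

    def forward(self, x):
        q = F.softplus(self.q)                              # keep offset positive
        patches = F.unfold(x, self.k, padding=self.k // 2)  # (B, C*k*k, L)
        w = self.weight.flatten(1)                          # (O, C*k*k)
        dot = w @ patches                                   # (B, O, L)
        sn = patches.norm(dim=1, keepdim=True) + q          # patch norms  (B, 1, L)
        wn = w.norm(dim=1)[None, :, None] + q               # filter norms (1, O, 1)
        cos = dot / (sn * wn)
        p = F.softplus(self.p)[None, :, None]               # exponent > 0
        out = torch.sign(cos) * cos.abs().clamp_min(self.eps) ** p
        return out.view(x.shape[0], -1, x.shape[2], x.shape[3])

y = SharpenedCosineSimilarity(3, 8)(torch.randn(1, 3, 32, 32))  # (1, 8, 32, 32)
```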

WebArena: A Realistic Web Environment for Building Autonomous Agents

  • paper_url: http://arxiv.org/abs/2307.13854
  • repo_url: https://github.com/web-arena-x/webarena
  • paper_authors: Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig
  • for: building a highly realistic and reproducible environment for commanding and controlling autonomous agents via natural language.
  • methods: fully functional websites from four common domains, enriched with tools and external knowledge bases, plus agents built on models such as GPT-4 that integrate techniques like reasoning before acting.
  • results: solving complex, long-horizon tasks remains challenging; the best GPT-4-based agent achieves an end-to-end task success rate of only 10.59%, highlighting the need for further development of robust agents.
    Abstract With generative AI advances, the exciting potential for autonomous agents to manage daily tasks via natural language commands has emerged. However, current agents are primarily created and tested in simplified synthetic environments, substantially limiting real-world scenario representation. In this paper, we build an environment for agent command and control that is highly realistic and reproducible. Specifically, we focus on agents that perform tasks on websites, and we create an environment with fully functional websites from four common domains: e-commerce, social forum discussions, collaborative software development, and content management. Our environment is enriched with tools (e.g., a map) and external knowledge bases (e.g., user manuals) to encourage human-like task-solving. Building upon our environment, we release a set of benchmark tasks focusing on evaluating the functional correctness of task completions. The tasks in our benchmark are diverse, long-horizon, and are designed to emulate tasks that humans routinely perform on the internet. We design and implement several autonomous agents, integrating recent techniques such as reasoning before acting. The results demonstrate that solving complex tasks is challenging: our best GPT-4-based agent only achieves an end-to-end task success rate of 10.59%. These results highlight the need for further development of robust agents, that current state-of-the-art LMs are far from perfect performance in these real-life tasks, and that WebArena can be used to measure such progress. Our code, data, environment reproduction resources, and video demonstrations are publicly available at https://webarena.dev/.

SplitFed resilience to packet loss: Where to split, that is the question

  • paper_url: http://arxiv.org/abs/2307.13851
  • repo_url: None
  • paper_authors: Chamani Shiranthika, Zahra Hafezi Kafshgari, Parvaneh Saeedi, Ivan V. Bajić
  • for: studies the resilience of Split Federated Learning (SplitFed or SFL), the hybrid of Federated Learning (FL) and Split Learning (SL), to packet loss.
  • methods: examines the effect of packet loss on communication links across SFL aggregation strategies, splitting the model at a shallow and a deep point.
  • results: experiments on a human embryo image segmentation model show a statistically significant accuracy advantage for the deeper split point.
    Abstract Decentralized machine learning has broadened its scope recently with the invention of Federated Learning (FL), Split Learning (SL), and their hybrids like Split Federated Learning (SplitFed or SFL). The goal of SFL is to reduce the computational power required by each client in FL and parallelize SL while maintaining privacy. This paper investigates the robustness of SFL against packet loss on communication links. The performance of various SFL aggregation strategies is examined by splitting the model at two points -- shallow split and deep split -- and testing whether the split point makes a statistically significant difference to the accuracy of the final model. Experiments are carried out on a segmentation model for human embryo images and indicate the statistically significant advantage of a deeper split point.
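The experimental setup is easy to picture with a toy split model: the client computes up to the cut layer, the activation tensor crosses a lossy link, and the server finishes the forward pass. The backbone and the element-wise loss model below are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch (PyTorch) of how the split point interacts with packet loss in SFL.
import torch
import torch.nn as nn

full = nn.Sequential(                     # toy segmentation-style backbone
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1),
)

def split_forward(x, cut, loss_rate=0.1):
    client, server = full[:cut], full[cut:]
    act = client(x)
    # Simulate packet loss on the uplink: lost activation values become zero.
    mask = (torch.rand_like(act) > loss_rate).float()
    return server(act * mask)

x = torch.randn(2, 1, 32, 32)
shallow = split_forward(x, cut=2)   # split after the first conv block
deep = split_forward(x, cut=4)      # deeper split: more client compute, and
                                    # corruption enters closer to the output
```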

MAEA: Multimodal Attribution for Embodied AI

  • paper_url: http://arxiv.org/abs/2307.13850
  • repo_url: None
  • paper_authors: Vidhi Jain, Jayant Sravan Tamarapalli, Sahiti Yerramilli, Yonatan Bisk
  • for: understanding multimodal perception for embodied AI, whose inputs may contain highly complementary as well as redundant information.
  • methods: disentangles attributions for visual, language, and previous-action inputs across different policies trained on the ALFRED dataset.
  • results: presents MAEA, a framework that computes global attributions per modality for any differentiable policy, and shows how attributions enable lower-level behavior analysis of EAI policies.
    Abstract Understanding multimodal perception for embodied AI is an open question because such inputs may contain highly complementary as well as redundant information for the task. A relevant direction for multimodal policies is understanding the global trends of each modality at the fusion layer. To this end, we disentangle the attributions for visual, language, and previous action inputs across different policies trained on the ALFRED dataset. Attribution analysis can be utilized to rank and group the failure scenarios, investigate modeling and dataset biases, and critically analyze multimodal EAI policies for robustness and user trust before deployment. We present MAEA, a framework to compute global attributions per modality of any differentiable policy. In addition, we show how attributions enable lower-level behavior analysis in EAI policies for language and visual attributions.
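A gradient-based sketch of the general recipe: aggregate |input × gradient| per modality into a single global score for each input stream. MAEA's exact attribution method may differ, and `policy` here is a placeholder that consumes a dict of modality tensors.

```python
# Minimal per-modality attribution sketch (PyTorch): input-times-gradient,
# aggregated into one global score per modality. Illustrative only.
import torch

def modality_attributions(policy, inputs):
    """inputs: dict of modality name -> tensor; policy consumes the dict."""
    inputs = {k: v.clone().requires_grad_(True) for k, v in inputs.items()}
    action_logits = policy(inputs)
    action_logits.max(dim=-1).values.sum().backward()
    return {k: (v * v.grad).abs().sum().item() for k, v in inputs.items()}
```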

Relationship between Batch Size and Number of Steps Needed for Nonconvex Optimization of Stochastic Gradient Descent using Armijo Line Search

  • paper_url: http://arxiv.org/abs/2307.13831
  • repo_url: None
  • paper_authors: Yuki Tsukada, Hideaki Iiduka
  • for: convergence analysis of stochastic gradient descent (SGD) for training deep learning models.
  • methods: analyzes SGD with a learning rate given by an Armijo line search for nonconvex optimization.
  • results: the upper bound on the expected squared norm of the full gradient becomes small when the number of steps and the batch size are large; the number of steps needed is a monotone decreasing convex function of the batch size; and a critical batch size minimizes the stochastic first-order oracle (SFO) complexity, all supported by numerical results.
    Abstract Stochastic gradient descent (SGD) is the simplest deep learning optimizer with which to train deep neural networks. While SGD can use various learning rates, such as constant or diminishing rates, previous numerical results showed that SGD performs better than other deep learning optimizers when it uses learning rates given by line search methods. In this paper, we perform a convergence analysis on SGD with a learning rate given by an Armijo line search for nonconvex optimization. The analysis indicates that the upper bound of the expectation of the squared norm of the full gradient becomes small when the number of steps and the batch size are large. Next, we show that, for SGD with the Armijo-line-search learning rate, the number of steps needed for nonconvex optimization is a monotone decreasing convex function of the batch size; that is, the number of steps needed for nonconvex optimization decreases as the batch size increases. Furthermore, we show that the stochastic first-order oracle (SFO) complexity, which is the stochastic gradient computation cost, is a convex function of the batch size; that is, there exists a critical batch size that minimizes the SFO complexity. Finally, we provide numerical results that support our theoretical results. The numerical results indicate that the number of steps needed for training deep neural networks decreases as the batch size increases and that there exist the critical batch sizes that can be estimated from the theoretical results.
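A compact sketch of the analyzed update: backtrack the step size on the current mini-batch until the Armijo sufficient-decrease condition holds. The constants below (c and the backtracking factor) are conventional choices, not the paper's.

```python
# Sketch of SGD with an Armijo backtracking line search on a mini-batch loss
# (NumPy). The step size shrinks until the sufficient-decrease condition
# f(w - lr*g) <= f(w) - c * lr * ||g||^2 holds on the current batch.
import numpy as np

def sgd_armijo_step(w, batch_loss, batch_grad, lr0=1.0, c=1e-4, beta=0.5):
    """batch_loss(w) -> float and batch_grad(w) -> array, on the same batch."""
    f0, g = batch_loss(w), batch_grad(w)
    g_sq = float(g @ g)
    lr = lr0
    while batch_loss(w - lr * g) > f0 - c * lr * g_sq and lr > 1e-10:
        lr *= beta                      # backtrack
    return w - lr * g, lr

# Toy quadratic example: f(w) = 0.5 * ||w||^2, gradient = w.
w = np.array([3.0, -4.0])
w, lr = sgd_armijo_step(w, lambda v: 0.5 * v @ v, lambda v: v)
```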

Offline Reinforcement Learning with On-Policy Q-Function Regularization

  • paper_url: http://arxiv.org/abs/2307.13824
  • repo_url: None
  • paper_authors: Laixi Shi, Robert Dadashi, Yuejie Chi, Pablo Samuel Castro, Matthieu Geist
  • for: tackling the extrapolation error in offline reinforcement learning (RL) induced by the distribution shift between the history dataset and the desired policy.
  • methods: regularizes the learned policy toward the Q-function of the behavior policy, estimated with a SARSA-style estimate, rather than toward the behavior policy itself.
  • results: proposes two algorithms that exploit the estimated Q-function through regularization and demonstrates strong performance on the D4RL benchmarks.
    Abstract The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. A large portion of prior work tackles this challenge by implicitly/explicitly regularizing the learning policy towards the behavior policy, which is hard to estimate reliably in practice. In this work, we propose to regularize towards the Q-function of the behavior policy instead of the behavior policy itself, under the premise that the Q-function can be estimated more reliably and easily by a SARSA-style estimate and handles the extrapolation error more straightforwardly. We propose two algorithms taking advantage of the estimated Q-function through regularizations, and demonstrate they exhibit strong performance on the D4RL benchmarks.
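A rough sketch of the loss structure: first fit a behavior Q-function with a SARSA-style target, then add a term that keeps the learned critic close to it. Where exactly the regularizer enters differs between the paper's two algorithms; this shows one plausible placement, with `q`, `q_b`, `policy`, and the batch tensors as placeholders.

```python
# Illustrative sketch (PyTorch) of regularizing toward the behavior Q-function.
import torch
import torch.nn.functional as F

def behavior_q_loss(q_b, batch, gamma=0.99):
    s, a, r, s2, a2 = batch              # consecutive (s, a, r, s', a') from data
    with torch.no_grad():
        target = r + gamma * q_b(s2, a2)  # SARSA target: next action from data
    return F.mse_loss(q_b(s, a), target)

def regularized_critic_loss(q, q_b, policy, batch, gamma=0.99, alpha=1.0):
    s, a, r, s2, _ = batch
    with torch.no_grad():
        target = r + gamma * q(s2, policy(s2))     # learned-policy backup
    td = F.mse_loss(q(s, a), target)
    reg = F.mse_loss(q(s, a), q_b(s, a).detach())  # stay near behavior Q
    return td + alpha * reg
```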

Fitting Auditory Filterbanks with Multiresolution Neural Networks

  • paper_url: http://arxiv.org/abs/2307.13821
  • repo_url: https://github.com/lostanlen/lostanlen2023waspaa
  • paper_authors: Vincent Lostanlen, Daniel Haider, Han Han, Mathieu Lagrange, Peter Balazs, Martin Ehler
  • for: overcoming the dilemma between nonparametric and parametric approaches in waveform-based deep learning.
  • methods: introduces the multiresolution neural network (MuReNN), which trains separate convolutional operators over the octave subbands of a discrete wavelet transform (DWT), with receptive fields dilated accordingly.
  • results: after knowledge distillation from established auditory filterbanks (Gammatone for speech, CQT for music, third-octave for urban sounds), MuReNN reaches state-of-the-art performance in goodness of fit on a hold-out set and in Heisenberg time-frequency localization.
    Abstract Waveform-based deep learning faces a dilemma between nonparametric and parametric approaches. On one hand, convolutional neural networks (convnets) may approximate any linear time-invariant system; yet, in practice, their frequency responses become more irregular as their receptive fields grow. On the other hand, a parametric model such as LEAF is guaranteed to yield Gabor filters, hence an optimal time-frequency localization; yet, this strong inductive bias comes at the detriment of representational capacity. In this paper, we aim to overcome this dilemma by introducing a neural audio model, named multiresolution neural network (MuReNN). The key idea behind MuReNN is to train separate convolutional operators over the octave subbands of a discrete wavelet transform (DWT). Since the scale of DWT atoms grows exponentially between octaves, the receptive fields of the subsequent learnable convolutions in MuReNN are dilated accordingly. For a given real-world dataset, we fit the magnitude response of MuReNN to that of a well-established auditory filterbank: Gammatone for speech, CQT for music, and third-octave for urban sounds, respectively. This is a form of knowledge distillation (KD), in which the filterbank ''teacher'' is engineered by domain knowledge while the neural network ''student'' is optimized from data. We compare MuReNN to the state of the art in terms of goodness of fit after KD on a hold-out set and in terms of Heisenberg time-frequency localization. Compared to convnets and Gabor convolutions, we find that MuReNN reaches state-of-the-art performance on all three optimization problems.

Gradient-Based Spectral Embeddings of Random Dot Product Graphs

  • paper_url: http://arxiv.org/abs/2307.13818
  • repo_url: https://github.com/marfiori/efficient-ase
  • paper_authors: Marcelo Fiori, Bernardo Marenco, Federico Larroca, Paola Bermolen, Gonzalo Mateos
  • for: proposes gradient-based spectral embeddings for the Random Dot Product Graph (RDPG), addressing shortcomings of existing embedding algorithms such as Adjacency Spectral Embedding (ASE).
  • methods: advocates first-order gradient descent for the embedding problem and develops a novel feasible optimization method on the manifold of matrices with orthogonal columns, which preserves interpretability for directed graphs.
  • results: reproducible experiments on synthetic and real network data show the framework is scalable, robust to missing edge data, and able to track slowly-varying latent positions from streaming graphs.
    Abstract The Random Dot Product Graph (RDPG) is a generative model for relational data, where nodes are represented via latent vectors in low-dimensional Euclidean space. RDPGs crucially postulate that edge formation probabilities are given by the dot product of the corresponding latent positions. Accordingly, the embedding task of estimating these vectors from an observed graph is typically posed as a low-rank matrix factorization problem. The workhorse Adjacency Spectral Embedding (ASE) enjoys solid statistical properties, but it is formally solving a surrogate problem and can be computationally intensive. In this paper, we bring to bear recent advances in non-convex optimization and demonstrate their impact to RDPG inference. We advocate first-order gradient descent methods to better solve the embedding problem, and to organically accommodate broader network embedding applications of practical relevance. Notably, we argue that RDPG embeddings of directed graphs loose interpretability unless the factor matrices are constrained to have orthogonal columns. We thus develop a novel feasible optimization method in the resulting manifold. The effectiveness of the graph representation learning framework is demonstrated on reproducible experiments with both synthetic and real network data. Our open-source algorithm implementations are scalable, and unlike the ASE they are robust to missing edge data and can track slowly-varying latent positions from streaming graphs.
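The undirected core idea is a few lines of NumPy: gradient descent on ‖A − XXᵀ‖²_F over the latent positions X. The paper's refinements (orthogonality constraints for digraphs, masking of missing edges, streaming updates) are omitted from this sketch, and the step size is an arbitrary choice.

```python
# Sketch of first-order RDPG embedding (NumPy): gradient descent on the
# squared reconstruction error over latent positions X.
import numpy as np

def gradient_ase(A, d, lr=0.01, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = rng.normal(scale=1 / np.sqrt(n), size=(n, d))
    for _ in range(steps):
        R = X @ X.T - A              # residual
        X -= lr * 4 * (R @ X)        # gradient of ||A - X X^T||_F^2
    return X

A = (np.random.rand(50, 50) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T       # symmetric adjacency, no self-loops
X_hat = gradient_ase(A, d=2)
```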

How to Scale Your EMA

  • paper_url: http://arxiv.org/abs/2307.13813
  • repo_url: https://github.com/ZulqarnainZilli/-9-Email-Marketing-Tips-For-Content-Marketers
  • paper_authors: Dan Busbridge, Jason Ramapuram, Pierre Ablin, Tatiana Likhomanenko, Eeshan Gunesh Dhekane, Xavier Suau, Russ Webb
  • for: This paper aims to improve the practicality of machine learning by preserving training dynamics across batch sizes, enabling the trade-off between batch size and wall-clock time.
  • methods: The paper proposes a scaling rule for optimization in the presence of model Exponential Moving Averages (EMAs), which can improve the robustness and generalization properties of supervised learning, stabilize pseudo-labeling, and provide a learning signal for Self-Supervised Learning (SSL).
  • results: The paper demonstrates the validity of the scaling rule across a range of architectures, optimizers, and data modalities, and shows that the rule enables training of EMA-based pseudo-labeling and SSL methods at small and large batch sizes. Additionally, the paper achieves a 6x wall-clock time reduction for training BYOL up to batch size 24,576 without sacrificing performance.
    Abstract Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule, for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important tool for practical machine learning is the model Exponential Moving Average (EMA), which is a model copy that does not receive gradient information, but instead follows its target model with some momentum. This model EMA can improve the robustness and generalization properties of supervised learning, stabilize pseudo-labeling, and provide a learning signal for Self-Supervised Learning (SSL). Prior works have treated the model EMA separately from optimization, leading to different training dynamics across batch sizes and lower model performance. In this work, we provide a scaling rule for optimization in the presence of model EMAs and demonstrate its validity across a range of architectures, optimizers, and data modalities. We also show the rule's validity where the model EMA contributes to the optimization of the target model, enabling us to train EMA-based pseudo-labeling and SSL methods at small and large batch sizes. For SSL, we enable training of BYOL up to batch size 24,576 without sacrificing performance, optimally a 6$\times$ wall-clock time reduction.
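A sketch of the rule in code: alongside the usual linear learning-rate scaling, the EMA momentum is exponentiated by the batch-size scaling factor, so the averaged model tracks its target at the same rate per epoch. The EMA class is a generic implementation, not the paper's code.

```python
# Sketch of a model EMA plus the scaling rule: when the batch size is scaled
# by kappa, use lr * kappa (linear rule) and rho ** kappa (EMA rule).
import copy
import torch

class ModelEMA:
    def __init__(self, model, rho=0.999):
        self.ema = copy.deepcopy(model).eval()
        for p in self.ema.parameters():
            p.requires_grad_(False)
        self.rho = rho

    @torch.no_grad()
    def update(self, model):
        for pe, pm in zip(self.ema.parameters(), model.parameters()):
            pe.mul_(self.rho).add_(pm, alpha=1 - self.rho)

def scaled_hparams(base_lr, base_rho, kappa):
    return base_lr * kappa, base_rho ** kappa

lr, rho = scaled_hparams(0.1, 0.9999, kappa=8)  # e.g., batch 256 -> 2048
```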

When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review

  • paper_url: http://arxiv.org/abs/2307.14382
  • repo_url: None
  • paper_authors: Maxime Fontana, Michael Spratling, Miaojing Shi
  • for: reviews how Multi-Task Learning (MTL) can be utilized under different partial supervision settings.
  • methods: surveys the parameter sharing techniques used to transfer knowledge between tasks.
  • results: presents the optimization problems and challenges introduced by multiple objectives, describes how tasks can be grouped by analyzing task relationships, and summarizes available datasets, tools, and benchmarking results.
    Abstract Multi-Task Learning (MTL) aims to learn multiple tasks simultaneously while exploiting their mutual relationships. By using shared resources to simultaneously calculate multiple outputs, this learning paradigm has the potential to have lower memory requirements and inference times compared to the traditional approach of using separate methods for each task. Previous work in MTL has mainly focused on fully-supervised methods, as task relationships can not only be leveraged to lower the level of data-dependency of those methods but they can also improve performance. However, MTL introduces a set of challenges due to a complex optimisation scheme and a higher labeling requirement. This review focuses on how MTL could be utilised under different partial supervision settings to address these challenges. First, this review analyses how MTL traditionally uses different parameter sharing techniques to transfer knowledge in between tasks. Second, it presents the different challenges arising from such a multi-objective optimisation scheme. Third, it introduces how task groupings can be achieved by analysing task relationships. Fourth, it focuses on how partially supervised methods applied to MTL can tackle the aforementioned challenges. Lastly, this review presents the available datasets, tools and benchmarking results of such methods.

EdgeConvEns: Convolutional Ensemble Learning for Edge Intelligence

  • paper_url: http://arxiv.org/abs/2307.14381
  • repo_url: None
  • paper_authors: Ilkay Sikdokur, İnci M. Baytaş, Arda Yurdakul
  • for: proposes an edge-based deep learning method that moves computationally expensive training into the edge network while addressing data privacy concerns.
  • methods: trains heterogeneous weak models independently on FPGA devices with various computational capacities and learns to ensemble them; learned data representations are transferred to a central server, where the ensemble model is trained to boost overall prediction performance.
  • results: extensive experiments show that EdgeConvEns outperforms the state of the art with fewer communications and less data across various training scenarios.
    Abstract Deep edge intelligence aims to deploy deep learning models that demand computationally expensive training in the edge network with limited computational power. Moreover, many deep edge intelligence applications require handling distributed data that cannot be transferred to a central server due to privacy concerns. Decentralized learning methods, such as federated learning, offer solutions where models are learned collectively by exchanging learned weights. However, they often require complex models that edge devices may not handle and multiple rounds of network communication to achieve state-of-the-art performances. This study proposes a convolutional ensemble learning approach, coined EdgeConvEns, that facilitates training heterogeneous weak models on edge and learning to ensemble them where data on edge are heterogeneously distributed. Edge models are implemented and trained independently on Field-Programmable Gate Array (FPGA) devices with various computational capacities. Learned data representations are transferred to a central server where the ensemble model is trained with the learned features received from the edge devices to boost the overall prediction performance. Extensive experiments demonstrate that the EdgeConvEns can outperform the state-of-the-art performance with fewer communications and less data in various training scenarios.

Source Condition Double Robust Inference on Functionals of Inverse Problems

  • paper_url: http://arxiv.org/abs/2307.13793
  • repo_url: None
  • paper_authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara
  • for: estimating parameters defined as linear functionals of solutions to linear inverse problems.
  • methods: exploits a doubly robust representation that depends on the solution to a dual linear inverse problem, whose solution can be thought of as a generalization of the inverse propensity function.
  • results: provides the first source condition double robust inference method that ensures asymptotic normality around the parameter of interest as long as either the primal or the dual inverse problem is sufficiently well-posed, without knowing which is the more well-posed one; the result is enabled by novel guarantees for iterated Tikhonov regularized adversarial estimators over general hypothesis spaces.
    Abstract We consider estimation of parameters defined as linear functionals of solutions to linear inverse problems. Any such parameter admits a doubly robust representation that depends on the solution to a dual linear inverse problem, where the dual solution can be thought as a generalization of the inverse propensity function. We provide the first source condition double robust inference method that ensures asymptotic normality around the parameter of interest as long as either the primal or the dual inverse problem is sufficiently well-posed, without knowledge of which inverse problem is the more well-posed one. Our result is enabled by novel guarantees for iterated Tikhonov regularized adversarial estimators for linear inverse problems, over general hypothesis spaces, which are developments of independent interest.

Histogram Layer Time Delay Neural Networks for Passive Sonar Classification

  • paper_url: http://arxiv.org/abs/2307.13788
  • repo_url: https://github.com/peeples-lab/hltdnn
  • paper_authors: Jarin Ritu, Ethan Barnes, Riley Martell, Alexandra Van Dine, Joshua Peeples
  • for: The paper is written to improve the accuracy of underwater acoustic target detection in remote marine sensing operations.
  • methods: The proposed method combines a time delay neural network and histogram layer to incorporate statistical contexts for improved feature learning and underwater acoustic target classification.
  • results: The proposed method outperforms the baseline model, demonstrating the utility in incorporating statistical contexts for passive sonar target recognition.
    Abstract Underwater acoustic target detection in remote marine sensing operations is challenging due to complex sound wave propagation. Despite the availability of reliable sonar systems, target recognition remains a difficult problem. Various methods address improved target recognition. However, most struggle to disentangle the high-dimensional, non-linear patterns in the observed target recordings. In this work, a novel method combines a time delay neural network and histogram layer to incorporate statistical contexts for improved feature learning and underwater acoustic target classification. The proposed method outperforms the baseline model, demonstrating the utility in incorporating statistical contexts for passive sonar target recognition. The code for this work is publicly available.
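A sketch of the histogram-layer idea: soft-bin feature values with learnable bin centers and widths (RBF memberships), then average-pool the memberships into local normalized counts. Bin counts and pooling size below are illustrative; the paper's implementation may differ in detail.

```python
# Sketch of a histogram layer (PyTorch) for 1D feature sequences.
import torch
import torch.nn as nn

class HistogramLayer(nn.Module):
    def __init__(self, n_bins=16, pool=8):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(-1, 1, n_bins))
        self.widths = nn.Parameter(torch.full((n_bins,), 4.0))
        self.pool = nn.AvgPool1d(pool)

    def forward(self, x):                       # x: (B, 1, T) feature sequence
        d = x.unsqueeze(-1) - self.centers      # (B, 1, T, n_bins)
        memb = torch.exp(-(self.widths ** 2) * d ** 2)  # RBF bin memberships
        memb = memb.squeeze(1).transpose(1, 2)  # (B, n_bins, T)
        return self.pool(memb)                  # local soft histograms

h = HistogramLayer()(torch.randn(2, 1, 64))    # (2, 16, 8)
```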

The GANfather: Controllable generation of malicious activity to improve defence systems

  • paper_url: http://arxiv.org/abs/2307.13787
  • repo_url: None
  • paper_authors: Ricardo Ribeiro Pereira, Jacopo Bono, João Tiago Ascensão, David Aparício, Pedro Ribeiro, Pedro Bizarro
  • for: proposes a Generative Adversarial Network (GAN) based method to generate samples with properties of malicious activity, without label requirements.
  • methods: introduces an extra objective into the standard GAN loss to reward the generation of malicious samples, optionally encouraging the generator to bypass pre-existing detection systems and thereby reveal defensive weaknesses.
  • results: in two real-world use cases, the method moves cumulative amounts close to 350 thousand dollars through a network of accounts without detection by an existing anti-money-laundering system, and recommends a target item to a broad user base with as few as 30 synthetic attackers; in both cases, a new defence system is trained to capture the synthetic attacks.
    Abstract Machine learning methods to aid defence systems in detecting malicious activity typically rely on labelled data. In some domains, such labelled data is unavailable or incomplete. In practice this can lead to low detection rates and high false positive rates, which characterise for example anti-money laundering systems. In fact, it is estimated that 1.7--4 trillion euros are laundered annually and go undetected. We propose The GANfather, a method to generate samples with properties of malicious activity, without label requirements. We propose to reward the generation of malicious samples by introducing an extra objective to the typical Generative Adversarial Networks (GANs) loss. Ultimately, our goal is to enhance the detection of illicit activity using the discriminator network as a novel and robust defence system. Optionally, we may encourage the generator to bypass pre-existing detection systems. This setup then reveals defensive weaknesses for the discriminator to correct. We evaluate our method in two real-world use cases, money laundering and recommendation systems. In the former, our method moves cumulative amounts close to 350 thousand dollars through a network of accounts without being detected by an existing system. In the latter, we recommend the target item to a broad user base with as few as 30 synthetic attackers. In both cases, we train a new defence system to capture the synthetic attacks.

Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations

  • paper_url: http://arxiv.org/abs/2307.14380
  • repo_url: None
  • paper_authors: Daniel Kałuża, Andrzej Janusz, Dominik Ślęzak
  • for: improving the quality of label assignment in active learning with sparse and noisy annotations.
  • methods: proposes two novel annotation unification algorithms that utilize unlabeled parts of the sample space and require little to no intersection between samples annotated by different experts.
  • results: experiments on four public datasets show the proposed methods are robust and superior to state-of-the-art algorithms and simple majority voting, both in estimating annotator reliability and in assigning actual labels.
    Abstract Supervised classification algorithms are used to solve a growing number of real-life problems around the globe. Their performance is strictly connected with the quality of labels used in training. Unfortunately, acquiring good-quality annotations for many tasks is infeasible or too expensive to be done in practice. To tackle this challenge, active learning algorithms are commonly employed to select only the most relevant data for labeling. However, this is possible only when the quality and quantity of labels acquired from experts are sufficient. Unfortunately, in many applications, a trade-off between annotating individual samples by multiple annotators to increase label quality vs. annotating new samples to increase the total number of labeled instances is necessary. In this paper, we address the issue of faulty data annotations in the context of active learning. In particular, we propose two novel annotation unification algorithms that utilize unlabeled parts of the sample space. The proposed methods require little to no intersection between samples annotated by different experts. Our experiments on four public datasets indicate the robustness and superiority of the proposed methods in both, the estimation of the annotator's reliability, and the assignment of actual labels, against the state-of-the-art algorithms and the simple majority voting.

Accuracy Amplification in Differentially Private Logistic Regression: A Pre-Training Approach

  • paper_url: http://arxiv.org/abs/2307.13771
  • repo_url: None
  • paper_authors: Mohammad Hoseinpour, Milad Hoseinpour, Ali Aghagolzadeh
  • for: boosting the accuracy of differentially private machine learning (DP-ML) models, specifically a DP logistic regression model.
  • methods: pre-trains the model on a public dataset with no privacy concerns, then fine-tunes it via DP logistic regression on the private dataset.
  • results: numerical results show that adding a pre-training module significantly improves the accuracy of the DP logistic regression.
    Abstract Machine learning (ML) models can memorize training datasets. As a result, training ML models over private datasets can violate the privacy of individuals. Differential privacy (DP) is a rigorous privacy notion to preserve the privacy of underlying training datasets in ML models. Yet, training ML models in a DP framework usually degrades the accuracy of ML models. This paper aims to boost the accuracy of a DP-ML model, specifically a logistic regression model, via a pre-training module. In more detail, we initially pre-train our model on a public training dataset that there is no privacy concern about it. Then, we fine-tune our model via the DP logistic regression with the private dataset. In the numerical results, we show that adding a pre-training module significantly improves the accuracy of the DP logistic regression.

ClusterSeq: Enhancing Sequential Recommender Systems with Clustering based Meta-Learning

  • paper_url: http://arxiv.org/abs/2307.13766
  • repo_url: None
  • paper_authors: Mohammmadmahdi Maheri, Reza Abdollahzadeh, Bardia Mohammadi, Mina Rafiei, Jafar Habibi, Hamid R. Rabiee
  • for: addressing the user cold-start problem in sequential recommender systems, where limited interactions make user preferences hard to determine accurately.
  • methods: ClusterSeq, a meta-learning clustering-based sequential recommender that leverages dynamic information in the user sequence and the collective knowledge of users within the same cluster, preserving the preferences of "minor users" without them being overshadowed by "major users".
  • results: on various benchmark datasets, ClusterSeq outperforms state-of-the-art meta-learning recommenders, with a substantial improvement of 16-39% in Mean Reciprocal Rank (MRR).
    Abstract In practical scenarios, the effectiveness of sequential recommendation systems is hindered by the user cold-start problem, which arises due to limited interactions for accurately determining user preferences. Previous studies have attempted to address this issue by combining meta-learning with user and item-side information. However, these approaches face inherent challenges in modeling user preference dynamics, particularly for "minor users" who exhibit distinct preferences compared to more common or "major users." To overcome these limitations, we present a novel approach called ClusterSeq, a Meta-Learning Clustering-Based Sequential Recommender System. ClusterSeq leverages dynamic information in the user sequence to enhance item prediction accuracy, even in the absence of side information. This model preserves the preferences of minor users without being overshadowed by major users, and it capitalizes on the collective knowledge of users within the same cluster. Extensive experiments conducted on various benchmark datasets validate the effectiveness of ClusterSeq. Empirical results consistently demonstrate that ClusterSeq outperforms several state-of-the-art meta-learning recommenders. Notably, compared to existing meta-learning methods, our proposed approach achieves a substantial improvement of 16-39% in Mean Reciprocal Rank (MRR).

Implicitly Normalized Explicitly Regularized Density Estimation

  • paper_url: http://arxiv.org/abs/2307.13763
  • repo_url: None
  • paper_authors: Mark Kozdoba, Binyamin Perets, Shie Mannor
  • for: proposes a new non-parametric density estimation method based on regularizing a Sobolev norm of the density, which, unlike kernel density estimation, makes the bias of the model clear and interpretable.
  • methods: the associated kernel has no closed analytic form but can be approximated by sampling; the resulting optimization problem is non-convex, and standard gradient methods perform poorly, yet appropriate initialization and natural gradients yield well-performing solutions.
  • results: since the approach produces unnormalized densities, Fisher-divergence-based score matching is adapted for cross-validation; on the comprehensive recent Anomaly Detection benchmark suite ADBench, the method ranks second best among more than 15 algorithms.
    Abstract We propose a new approach to non-parametric density estimation, that is based on regularizing a Sobolev norm of the density. This method is provably different from Kernel Density Estimation, and makes the bias of the model clear and interpretable. While there is no closed analytic form for the associated kernel, we show that one can approximate it using sampling. The optimization problem needed to determine the density is non-convex, and standard gradient methods do not perform well. However, we show that with an appropriate initialization and using natural gradients, one can obtain well performing solutions. Finally, while the approach provides unnormalized densities, which prevents the use of log-likelihood for cross validation, we show that one can instead adapt Fisher Divergence based Score Matching methods for this task. We evaluate the resulting method on the comprehensive recent Anomaly Detection benchmark suite, ADBench, and find that it ranks second best, among more than 15 algorithms.
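For concreteness, the generic Fisher-divergence (score matching) objective mentioned above can be written with Hutchinson's trick for the divergence term, as in this sketch; it illustrates the standard tool, not the authors' full estimator.

```python
# Sketch of a score matching objective (PyTorch): for an unnormalized
# log-density log_p, minimize
#   E[ 0.5 * ||grad_x log_p(x)||^2 + div_x grad_x log_p(x) ],
# estimating the divergence with a random Hutchinson probe vector.
import torch

def score_matching_loss(log_p, x):
    """x: (B, d) batch; log_p: callable returning (B,) unnormalized log-densities."""
    x = x.clone().requires_grad_(True)
    score = torch.autograd.grad(log_p(x).sum(), x, create_graph=True)[0]
    v = torch.randn_like(x)                         # Hutchinson probe
    jvp = torch.autograd.grad((score * v).sum(), x, create_graph=True)[0]
    div = (jvp * v).sum(-1)                         # estimates tr(Jacobian)
    return (0.5 * score.pow(2).sum(-1) + div).mean()
```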

UPREVE: An End-to-End Causal Discovery Benchmarking System

  • paper_url: http://arxiv.org/abs/2307.13757
  • repo_url: None
  • paper_authors: Suraj Jyothi Unni, Paras Sheth, Kaize Ding, Huan Liu, K. Selcuk Candan
  • for: making the discovery of causal relationships in complex socio-behavioral systems more accessible for informed decision-making.
  • methods: UPREVE, a user-friendly web-based graphical user interface (GUI) that runs multiple algorithms simultaneously, visualizes causal relationships, and evaluates the accuracy of learned causal graphs.
  • results: empowers researchers and practitioners in social computing and behavioral-cultural modeling to explore and understand causal relationships effectively, yielding valuable insights for better decision-making.
    Abstract Discovering causal relationships in complex socio-behavioral systems is challenging but essential for informed decision-making. We present Upload, PREprocess, Visualize, and Evaluate (UPREVE), a user-friendly web-based graphical user interface (GUI) designed to simplify the process of causal discovery. UPREVE allows users to run multiple algorithms simultaneously, visualize causal relationships, and evaluate the accuracy of learned causal graphs. With its accessible interface and customizable features, UPREVE empowers researchers and practitioners in social computing and behavioral-cultural modeling (among others) to explore and understand causal relationships effectively. Our proposed solution aims to make causal discovery more accessible and user-friendly, enabling users to gain valuable insights for better decision-making.

Solution Path of Time-varying Markov Random Fields with Discrete Regularization

  • paper_url: http://arxiv.org/abs/2307.13750
  • repo_url: None
  • paper_authors: Salar Fattahi, Andres Gomez
  • for: inferring sparse time-varying Markov random fields (MRFs) with different discrete and temporal regularizations on the parameters.
  • methods: departs from maximum-likelihood estimation and solves a new class of constrained optimization problems with exact, discrete regularization to promote sparsity; despite being nonconvex and discrete, the problem is solved efficiently and parametrically for all sparsity levels.
  • results: the entire solution path for all sparsity levels is obtained in O(pT^3) time; the method achieves provably small estimation error for Gaussian and discrete time-varying MRFs with as few as one sample per time, and recovers the complete solution path for instances with over 30 million variables in less than 12 minutes on a standard laptop.
    Abstract We study the problem of inferring sparse time-varying Markov random fields (MRFs) with different discrete and temporal regularizations on the parameters. Due to the intractability of discrete regularization, most approaches for solving this problem rely on the so-called maximum-likelihood estimation (MLE) with relaxed regularization, which neither results in ideal statistical properties nor scale to the dimensions encountered in realistic settings. In this paper, we address these challenges by departing from the MLE paradigm and resorting to a new class of constrained optimization problems with exact, discrete regularization to promote sparsity in the estimated parameters. Despite the nonconvex and discrete nature of our formulation, we show that it can be solved efficiently and parametrically for all sparsity levels. More specifically, we show that the entire solution path of the time-varying MRF for all sparsity levels can be obtained in $\mathcal{O}(pT^3)$, where $T$ is the number of time steps and $p$ is the number of unknown parameters at any given time. The efficient and parametric characterization of the solution path renders our approach highly suitable for cross-validation, where parameter estimation is required for varying regularization values. Despite its simplicity and efficiency, we show that our proposed approach achieves provably small estimation error for different classes of time-varying MRFs, namely Gaussian and discrete MRFs, with as few as one sample per time. Utilizing our algorithm, we can recover the complete solution path for instances of time-varying MRFs featuring over 30 million variables in less than 12 minutes on a standard laptop computer. Our code is available at \url{https://sites.google.com/usc.edu/gomez/data}.

mL-BFGS: A Momentum-based L-BFGS for Distributed Large-Scale Neural Network Optimization

  • paper_url: http://arxiv.org/abs/2307.13744
  • repo_url: None
  • paper_authors: Yue Niu, Zalan Fabian, Sunwoo Lee, Mahdi Soltanolkotabi, Salman Avestimehr
  • for: proposes a lightweight quasi-Newton optimizer that makes L-BFGS practical for large-scale distributed deep neural network (DNN) optimization.
  • methods: mL-BFGS introduces a nearly cost-free momentum scheme into the L-BFGS update to reduce stochastic noise in the Hessian approximation and stabilize convergence, and approximates a block-wise Hessian to distribute compute and memory costs across all computing nodes.
  • results: in training benchmark neural models, mL-BFGS achieves noticeable iteration-wise and wall-clock speedup over SGD, Adam, and other quasi-Newton baselines.
    Abstract Quasi-Newton methods still face significant challenges in training large-scale neural networks due to additional compute costs in the Hessian related computations and instability issues in stochastic training. A well-known method, L-BFGS that efficiently approximates the Hessian using history parameter and gradient changes, suffers convergence instability in stochastic training. So far, attempts that adapt L-BFGS to large-scale stochastic training incur considerable extra overhead, which offsets its convergence benefits in wall-clock time. In this paper, we propose mL-BFGS, a lightweight momentum-based L-BFGS algorithm that paves the way for quasi-Newton (QN) methods in large-scale distributed deep neural network (DNN) optimization. mL-BFGS introduces a nearly cost-free momentum scheme into L-BFGS update and greatly reduces stochastic noise in the Hessian, therefore stabilizing convergence during stochastic optimization. For model training at a large scale, mL-BFGS approximates a block-wise Hessian, thus enabling distributing compute and memory costs across all computing nodes. We provide a supporting convergence analysis for mL-BFGS in stochastic settings. To investigate mL-BFGS potential in large-scale DNN training, we train benchmark neural models using mL-BFGS and compare performance with baselines (SGD, Adam, and other quasi-Newton methods). Results show that mL-BFGS achieves both noticeable iteration-wise and wall-clock speedup.

ARB: Advanced Reasoning Benchmark for Large Language Models

  • paper_url: http://arxiv.org/abs/2307.13692
  • repo_url: None
  • paper_authors: Tomohiro Sawada, Daniel Paleka, Alexander Havrilla, Pranav Tadepalli, Paula Vidas, Alexander Kranias, John J. Nay, Kshitij Gupta, Aran Komatsuzaki
  • for: introduces a new benchmark to test the advanced reasoning capabilities of large language models (LLMs) across multiple fields.
  • methods: ARB comprises advanced reasoning problems in mathematics, physics, biology, chemistry, and law, including a challenging subset of math and physics problems requiring advanced symbolic reasoning and domain knowledge; a rubric-based evaluation approach lets GPT-4 score its own intermediate reasoning steps.
  • results: recent models such as GPT-4 and Claude score well below 50% on the more demanding tasks, and a human evaluation of the symbolic subset shows promising agreement between annotators and GPT-4 rubric scores.
    Abstract Large Language Models (LLMs) have demonstrated remarkable performance on various quantitative reasoning and knowledge benchmarks. However, many of these benchmarks are losing utility as LLMs get increasingly high scores, despite not yet reaching expert performance in these domains. We introduce ARB, a novel benchmark composed of advanced reasoning problems in multiple fields. ARB presents a more challenging test than prior benchmarks, featuring problems in mathematics, physics, biology, chemistry, and law. As a subset of ARB, we introduce a challenging set of math and physics problems which require advanced symbolic reasoning and domain knowledge. We evaluate recent models such as GPT-4 and Claude on ARB and demonstrate that current models score well below 50% on more demanding tasks. In order to improve both automatic and assisted evaluation capabilities, we introduce a rubric-based evaluation approach, allowing GPT-4 to score its own intermediate reasoning steps. Further, we conduct a human evaluation of the symbolic subset of ARB, finding promising agreement between annotators and GPT-4 rubric evaluation scores.

High Probability Analysis for Non-Convex Stochastic Optimization with Clipping

  • paper_url: http://arxiv.org/abs/2307.13680
  • repo_url: None
  • paper_authors: Shaojie Li, Yong Liu
  • for: This paper focuses on the gradient clipping technique in stochastic optimization, providing a high-probability analysis and performance bounds for it.
  • methods: The analysis covers stochastic gradient descent and its variants with momentum and adaptive stepsizes, combined with gradient clipping.
  • results: High-probability optimization and generalization bounds for stochastic optimization algorithms with gradient clipping, derived under a heavy-tailed assumption that gradients only have bounded $\alpha$-th moments for some $\alpha \in (1, 2]$, which yields stronger theoretical guarantees.
    Abstract Gradient clipping is a commonly used technique to stabilize the training process of neural networks. A growing body of studies has shown that gradient clipping is a promising technique for dealing with the heavy-tailed behavior that emerged in stochastic optimization as well. While gradient clipping is significant, its theoretical guarantees are scarce. Most theoretical guarantees only provide an in-expectation analysis and only focus on optimization performance. In this paper, we provide high probability analysis in the non-convex setting and derive the optimization bound and the generalization bound simultaneously for popular stochastic optimization algorithms with gradient clipping, including stochastic gradient descent and its variants of momentum and adaptive stepsizes. With the gradient clipping, we study a heavy-tailed assumption that the gradients only have bounded $\alpha$-th moments for some $\alpha \in (1, 2]$, which is much weaker than the standard bounded second-moment assumption. Overall, our study provides a relatively complete picture for the theoretical guarantee of stochastic optimization algorithms with clipping.
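For reference, the clipping operator analyzed in this line of work rescales any stochastic gradient whose norm exceeds a threshold `c`; a minimal NumPy version of the clipped SGD update:

```python
import numpy as np

def clip_by_norm(g, c):
    """Rescale g so that ||g|| <= c; identity when the norm is already small."""
    norm = np.linalg.norm(g)
    return g if norm <= c else g * (c / norm)

def clipped_sgd_step(x, stochastic_grad, lr, c):
    """One clipped SGD step; the paper's high-probability bounds cover updates
    of this form (and momentum/adaptive-stepsize variants) when gradients only
    have bounded alpha-th moments for some alpha in (1, 2]."""
    return x - lr * clip_by_norm(stochastic_grad, c)
```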

RED CoMETS: An ensemble classifier for symbolically represented multivariate time series

  • paper_url: http://arxiv.org/abs/2307.13679
  • repo_url: https://github.com/zy18811/red-comets
  • paper_authors: Luca A. Bennett, Zahraa S. Abdallah
  • For: The paper is written for researchers and practitioners working in the field of multivariate time series classification, particularly in finance, healthcare, engineering, and other related fields.
  • Methods: The paper proposes a novel ensemble classifier called RED CoMETS, which builds upon the success of Co-eye and extends its capabilities to handle multivariate time series data. The proposed method uses a combination of random enhanced co-eye and symbolic representation to improve the accuracy and efficiency of multivariate time series classification.
  • Results: The paper demonstrates the performance of RED CoMETS on benchmark datasets from the UCR archive, achieving competitive accuracy compared to state-of-the-art techniques in multivariate settings. Specifically, it achieves the highest reported accuracy in the literature for the 'HandMovementDirection' dataset. Additionally, the proposed method significantly reduces computation time compared to Co-eye, making it an efficient and effective choice for multivariate time series classification.
    Abstract Multivariate time series classification is a rapidly growing research field with practical applications in finance, healthcare, engineering, and more. The complexity of classifying multivariate time series data arises from its high dimensionality, temporal dependencies, and varying lengths. This paper introduces a novel ensemble classifier called RED CoMETS (Random Enhanced Co-eye for Multivariate Time Series), which addresses these challenges. RED CoMETS builds upon the success of Co-eye, an ensemble classifier specifically designed for symbolically represented univariate time series, and extends its capabilities to handle multivariate data. The performance of RED CoMETS is evaluated on benchmark datasets from the UCR archive, where it demonstrates competitive accuracy when compared to state-of-the-art techniques in multivariate settings. Notably, it achieves the highest reported accuracy in the literature for the 'HandMovementDirection' dataset. Moreover, the proposed method significantly reduces computation time compared to Co-eye, making it an efficient and effective choice for multivariate time series classification.
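Co-eye-style ensembles operate on symbolic representations of time series, and SAX (Symbolic Aggregate approXimation) is the standard choice in this family. As background for readers, here is a minimal SAX symbolizer; the breakpoints are the usual equiprobable Gaussian cut points, and this is illustrative rather than the paper's code:

```python
import numpy as np

# Equiprobable N(0, 1) breakpoints for small alphabet sizes.
BREAKPOINTS = {3: [-0.4307, 0.4307], 4: [-0.6745, 0.0, 0.6745]}

def sax(series, n_segments, alphabet_size=4):
    """Z-normalize, reduce with piecewise aggregate approximation, discretize."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)
    usable = (len(x) // n_segments) * n_segments
    paa = x[:usable].reshape(n_segments, -1).mean(axis=1)   # segment means
    symbols = np.digitize(paa, BREAKPOINTS[alphabet_size])
    return "".join("abcd"[s] for s in symbols)

word = sax(np.sin(np.linspace(0, 2 * np.pi, 128)), n_segments=8)
print(word)   # an 8-letter word capturing the rise-and-fall shape of the sine
```

Per the abstract, RED CoMETS ensembles over many such randomized symbolic views, extended across the dimensions of a multivariate series.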

FedDRL: A Trustworthy Federated Learning Model Fusion Method Based on Staged Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.13716
  • repo_url: None
  • paper_authors: Leiming Chen, Cihao Dong, Sibo Qiao, Ziling Huang, Kai Wang, Yuming Nie, Zhaoxiang Hou, Cheewei Tan
  • for: addressing the problems of uneven client model quality and maliciously uploaded models in traditional federated learning
  • methods: model fusion via reinforcement learning in two stages: the first stage filters out malicious models and selects trusted client models to participate in the fusion; the second stage adaptively adjusts the weights of the trusted client models and fuses the optimal global model
  • results: across five model fusion scenarios, our algorithm outperforms the baseline algorithms in reliability while maintaining accuracy
    Abstract Traditional federated learning uses the number of samples to calculate the weights of each client model and uses this fixed weight value to fuse the global model. However, in practical scenarios, each client's device and data heterogeneity leads to differences in the quality of each client's model. Thus the contribution to the global model is not wholly determined by the sample size. In addition, if clients intentionally upload low-quality or malicious models, using these models for aggregation will lead to a severe decrease in global model accuracy. Traditional federated learning algorithms do not address these issues. To solve this problem, we propose FedDRL, a model fusion approach using reinforcement learning based on a two-stage approach. In the first stage, our method filters out malicious models and selects trusted client models to participate in the model fusion. In the second stage, the FedDRL algorithm adaptively adjusts the weights of the trusted client models and aggregates the optimal global model. We also define five model fusion scenarios and compare our method with two baseline algorithms in those scenarios. The experimental results show that our algorithm has higher reliability than other algorithms while maintaining accuracy.
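The two-stage fusion is easy to picture once the reinforcement learning agent has produced per-client trust weights: stage one drops untrusted models, stage two takes a trust-weighted parameter average. In this sketch the thresholding rule and the `val_scores` standing in for the learned weights are assumptions, not the paper's RL policy:

```python
def fuse_models(client_states, val_scores, threshold=0.5):
    """Stage 1: filter out untrusted clients; Stage 2: trust-weighted average.

    client_states: list of dicts mapping layer name -> numpy array.
    val_scores:    per-client scores in [0, 1] (stand-in for RL-learned weights).
    Assumes at least one client passes the threshold.
    """
    trusted = [(s, v) for s, v in zip(client_states, val_scores) if v >= threshold]
    total = sum(v for _, v in trusted)
    return {key: sum(v * s[key] for s, v in trusted) / total
            for key in trusted[0][0]}
```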

Towards an AI Accountability Policy

  • paper_url: http://arxiv.org/abs/2307.13658
  • repo_url: None
  • paper_authors: Przemyslaw Grabowicz, Nicholas Perello, Yair Zick
  • for: This white paper responds to the "AI Accountability Policy Request for Comments" issued by the National Telecommunications and Information Administration (NTIA) of the United States.
  • methods: The white paper offers a set of interconnected recommendations for formulating an AI accountability policy.
  • results: The recommendations aim to ensure that AI technologies are applied in line with ethical and legal requirements, to safeguard citizens' rights and privacy, and to promote accountable and trustworthy AI.
    Abstract This white paper is a response to the "AI Accountability Policy Request for Comments" by the National Telecommunications and Information Administration of the United States. The question numbers for which comments were requested are provided in superscripts at the end of key sentences answering the respective questions. The white paper offers a set of interconnected recommendations for an AI accountability policy.

GNN4FR: A Lossless GNN-based Federated Recommendation Framework

  • paper_url: http://arxiv.org/abs/2308.01197
  • repo_url: None
  • paper_authors: Guowei Wu, Weike Pan, Zhong Ming
  • for: providing a privacy-preserving federated recommendation framework based on Graph Neural Networks (GNNs), which can train a global graph without leaking each user's private interaction data
  • methods: the framework is instantiated with LightGCN, and its equivalence to the non-federated version is proven
  • results: full-graph training that preserves complete high-order structure information, making the training process equivalent to its non-federated counterpart
    Abstract Graph neural networks (GNNs) have gained wide popularity in recommender systems due to their capability to capture higher-order structure information among the nodes of users and items. However, these methods need to collect personal interaction data between a user and the corresponding items and then model them in a central server, which would violate privacy laws such as the GDPR. So far, no existing work can construct a global graph without leaking each user's private interaction data (i.e., his or her subgraph). In this paper, we are the first to design a novel lossless federated recommendation framework based on GNN, which achieves full-graph training with complete high-order structure information, enabling the training process to be equivalent to the corresponding un-federated counterpart. In addition, we use LightGCN to instantiate an example of our framework and show its equivalence.
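LightGCN, used to instantiate the framework, has an unusually compact propagation rule: repeated multiplication by the symmetrically normalized adjacency matrix, followed by averaging the layer outputs, with no feature transforms or nonlinearities. A minimal non-federated version, for reference when reading the equivalence claim:

```python
import numpy as np
import scipy.sparse as sp

def lightgcn_propagate(adj, emb0, num_layers=3):
    """adj: symmetric (user+item) x (user+item) adjacency; emb0: (n, d) embeddings."""
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = np.power(np.maximum(deg, 1e-12), -0.5)
    norm_adj = sp.diags(d_inv_sqrt) @ adj @ sp.diags(d_inv_sqrt)
    layers = [emb0]
    for _ in range(num_layers):
        layers.append(norm_adj @ layers[-1])   # E(k+1) = D^-1/2 A D^-1/2 E(k)
    return np.mean(layers, axis=0)             # final embedding: mean over layers
```

The federated version's goal, per the abstract, is to reproduce exactly these embeddings while each user's interaction subgraph stays local.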

Safety Margins for Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.13642
  • repo_url: None
  • paper_authors: Alexander Grushin, Walt Woods, Alvaro Velasquez, Simon Khan
  • for: This paper is written for autonomous controllers in freight transportation applications, to help identify when unsafe situations are about to occur and draw timely human oversight.
  • methods: The paper uses a definition of true criticality as the mean reduction in reward given some number of random actions, and computes proxy criticality metrics that can be compared to the true criticality in real-time.
  • results: The paper demonstrates how to leverage these proxy metrics to generate safety margins, which directly tie the consequences of potentially incorrect actions to an anticipated loss in overall performance. The approach is evaluated on learned policies from APE-X and A3C within an Atari environment, and shows how safety margins decrease as agents approach failure states.
    Abstract Any autonomous controller will be unsafe in some situations. The ability to quantitatively identify when these unsafe situations are about to occur is crucial for drawing timely human oversight in, e.g., freight transportation applications. In this work, we demonstrate that the true criticality of an agent's situation can be robustly defined as the mean reduction in reward given some number of random actions. Proxy criticality metrics that are computable in real-time (i.e., without actually simulating the effects of random actions) can be compared to the true criticality, and we show how to leverage these proxy metrics to generate safety margins, which directly tie the consequences of potentially incorrect actions to an anticipated loss in overall performance. We evaluate our approach on learned policies from APE-X and A3C within an Atari environment, and demonstrate how safety margins decrease as agents approach failure states. The integration of safety margins into programs for monitoring deployed agents allows for the real-time identification of potentially catastrophic situations.
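The paper's definition of true criticality, the mean reduction in return when an agent takes some number of random actions before resuming its policy, translates directly into a Monte Carlo estimate. The sketch below assumes a hypothetical environment interface that can be snapshotted: `env.clone()`, `observation()`, `sample_action()`, and the 3-tuple `step` are simplifications, though Atari emulators do expose equivalent state save/restore:

```python
import numpy as np

def true_criticality(env, policy, n_random=3, n_rollouts=32, horizon=500):
    """Mean return drop from acting randomly for n_random steps vs. the policy."""
    def rollout(e, forced_random):
        total, obs = 0.0, e.observation()
        for t in range(horizon):
            a = e.sample_action() if t < forced_random else policy(obs)
            obs, reward, done = e.step(a)
            total += reward
            if done:
                break
        return total

    base = np.mean([rollout(env.clone(), 0) for _ in range(n_rollouts)])
    perturbed = np.mean([rollout(env.clone(), n_random) for _ in range(n_rollouts)])
    return base - perturbed   # larger gap = more critical state
```

A proxy metric is any quantity computable in real time without these extra rollouts; the safety margin then maps the proxy value to an anticipated loss in return, calibrated against estimates like the one above.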

DBGSA: A Novel Data Adaptive Bregman Clustering Algorithm

  • paper_url: http://arxiv.org/abs/2307.14375
  • repo_url: None
  • paper_authors: Ying Xiao, Hou-biao Li, Yu-pu Zhang
  • for: improving the accuracy of clustering algorithms on non-convex datasets
  • methods: a data-driven Bregman divergence parameter optimization clustering algorithm (DBGSA) that combines the Universal Gravitational Algorithm (UGA) to bring similar points closer together in the dataset; a gravitational coefficient equation gradually reduces the influence factor as iterations progress, and Bregman divergence generalized power mean information loss minimization identifies the cluster centers
  • results: extensive experiments on four simulated datasets and six real datasets show that DBGSA improves the accuracy of various clustering algorithms by an average of 63.8% compared to similar approaches; a three-dimensional grid search over parameter values within threshold conditions confirms that the parameter set provided by the model is optimal, demonstrating the algorithm's high accuracy and robustness
    Abstract With the development of Big data technology, data analysis has become increasingly important. Traditional clustering algorithms such as K-means are highly sensitive to the initial centroid selection and perform poorly on non-convex datasets. In this paper, we address these problems by proposing a data-driven Bregman divergence parameter optimization clustering algorithm (DBGSA), which combines the Universal Gravitational Algorithm to bring similar points closer in the dataset. We construct a gravitational coefficient equation with a special property that gradually reduces the influence factor as the iteration progresses. Furthermore, we introduce the Bregman divergence generalized power mean information loss minimization to identify cluster centers and build a hyperparameter identification optimization model, which effectively solves the problems of manual adjustment and uncertainty in the improved dataset. Extensive experiments are conducted on four simulated datasets and six real datasets. The results demonstrate that DBGSA significantly improves the accuracy of various clustering algorithms by an average of 63.8% compared to other similar approaches like enhanced clustering algorithms and improved datasets. Additionally, a three-dimensional grid search was established to compare the effects of different parameter values within threshold conditions, and it was discovered that the parameter set provided by our model is optimal. This finding provides strong evidence of the high accuracy and robustness of the algorithm.
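The gravitational idea can be sketched as pulling each point toward its current cluster center with a coefficient that fades over iterations; the exponential decay form and step scaling below are illustrative assumptions, not the paper's exact gravitational coefficient equation:

```python
import numpy as np

def gravity_step(X, centers, labels, t, G0=0.5, decay=0.1):
    """Pull each point a fraction of the way toward its assigned center."""
    G = G0 * np.exp(-decay * t)    # influence factor shrinks as t grows (assumed form)
    pulled = X.copy()
    for i in range(len(X)):
        pulled[i] += G * (centers[labels[i]] - X[i])
    return pulled
```

The idea is that after a few such steps the dataset becomes more cluster-friendly before the Bregman-divergence-based center identification runs.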

Turning hazardous volatile matter compounds into fuel by catalytic steam reforming: An evolutionary machine learning approach

  • paper_url: http://arxiv.org/abs/2308.05750
  • repo_url: None
  • paper_authors: Alireza Shafizadeh, Hossein Shahbeik, Mohammad Hossein Nadian, Vijai Kumar Gupta, Abdul-Sattar Nizami, Su Shiung Lam, Wanxi Peng, Junting Pan, Meisam Tabatabaei, Mortaza Aghbashlo
  • for: developing a machine-learning-based research framework for modeling, understanding, and optimizing the materials and reaction conditions in catalytic steam reforming
  • methods: input features are obtained from X-ray diffraction analysis and a literature-compiled database of catalyst characteristics and reaction conditions; six machine learning models are trained, and the reaction conditions are optimized with the particle swarm optimization algorithm
  • results: ensemble machine learning gives the best prediction performance (R2 > 0.976), and tar conversion above 77.2% is achieved for the conversion and product distribution in the 637.44-725.62 °C temperature range
    Abstract Chemical and biomass processing systems release volatile matter compounds into the environment daily. Catalytic reforming can convert these compounds into valuable fuels, but developing stable and efficient catalysts is challenging. Machine learning can handle complex relationships in big data and optimize reaction conditions, making it an effective solution for addressing the mentioned issues. This study is the first to develop a machine-learning-based research framework for modeling, understanding, and optimizing the catalytic steam reforming of volatile matter compounds. Toluene catalytic steam reforming is used as a case study to show how chemical/textural analyses (e.g., X-ray diffraction analysis) can be used to obtain input features for machine learning models. Literature is used to compile a database covering a variety of catalyst characteristics and reaction conditions. The process is thoroughly analyzed, mechanistically discussed, modeled by six machine learning models, and optimized using the particle swarm optimization algorithm. Ensemble machine learning provides the best prediction performance (R2 > 0.976) for toluene conversion and product distribution. The optimal tar conversion (higher than 77.2%) is obtained at temperatures between 637.44 and 725.62 {\deg}C, with a steam-to-carbon molar ratio of 5.81-7.15 and a catalyst BET surface area 476.03-638.55 m2/g. The feature importance analysis satisfactorily reveals the effects of input descriptors on model prediction. Operating conditions (50.9%) and catalyst properties (49.1%) are equally important in modeling. The developed framework can expedite the search for optimal catalyst characteristics and reaction conditions, not only for catalytic chemical processing but also for related research areas.
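Once the ensemble model serves as a fast surrogate for the reactor, finding the reported optimum reduces to maximizing the surrogate over bounded operating conditions with particle swarm optimization. A compact PSO of the kind used here; the hyperparameters are common defaults, not taken from the paper:

```python
import numpy as np

def pso_maximize(surrogate, low, high, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5):
    """Maximize surrogate(x) for x within [low, high] (e.g. temperature,
    steam-to-carbon ratio, catalyst BET surface area)."""
    low, high = np.asarray(low, float), np.asarray(high, float)
    pos = np.random.uniform(low, high, (n_particles, len(low)))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([surrogate(p) for p in pos])
    gbest = pbest[pbest_val.argmax()].copy()
    for _ in range(iters):
        r1, r2 = np.random.rand(*pos.shape), np.random.rand(*pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, low, high)      # keep particles inside the bounds
        vals = np.array([surrogate(p) for p in pos])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[pbest_val.argmax()].copy()
    return gbest, pbest_val.max()
```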

Scaling machine learning-based chemical plant simulation: A method for fine-tuning a model to induce stable fixed points

  • paper_url: http://arxiv.org/abs/2307.13621
  • repo_url: None
  • paper_authors: Malte Esders, Gimmy Alex Fernandez Ramirez, Michael Gastegger, Satya Swarup Samal
  • for: fitting machine learning models directly to chemical plant data
  • methods: a structured approach in which each unit of the plant is represented by one machine learning model; after fitting to the data, the models are connected into a flowsheet-like directed graph
  • results: the approach works well for smaller plants, but for larger plants the complex dynamics arising from large, nested cycles in the flowsheet destabilize the cycle solver; the paper analyzes this problem in depth and presents a way to fine-tune the machine learning models so that solving the cycles becomes robust again
    Abstract Idealized first-principles models of chemical plants can be inaccurate. An alternative is to fit a Machine Learning (ML) model directly to plant sensor data. We use a structured approach: Each unit within the plant gets represented by one ML model. After fitting the models to the data, the models are connected into a flowsheet-like directed graph. We find that for smaller plants, this approach works well, but for larger plants, the complex dynamics arising from large and nested cycles in the flowsheet lead to instabilities in the cycle solver. We analyze this problem in depth and show that it is not merely a specialized concern but rather a more pervasive challenge that will likely occur whenever ML is applied to larger plants. To address this problem, we present a way to fine-tune ML models such that solving cycles with the usual methods becomes robust again.
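The instability in question lives in the recycle loops: each cycle in the flowsheet is closed by a fixed-point iteration on a "tear" stream, and the iteration diverges when the composed ML models are not well behaved. A minimal damped successive-substitution solver makes this concrete; `one_pass`, which evaluates every unit model around the cycle once, is a placeholder for the plant-specific traversal:

```python
import numpy as np

def solve_cycle(one_pass, tear0, damping=0.5, tol=1e-8, max_iter=500):
    """Find x such that one_pass(x) == x; damping trades speed for stability."""
    x = np.asarray(tear0, dtype=float)
    for _ in range(max_iter):
        x_new = np.asarray(one_pass(x))
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = damping * x + (1 - damping) * x_new   # relaxed update
    raise RuntimeError("cycle solver did not converge")
```

Intuitively, the fine-tuning described in the paper nudges the learned unit models so that the composed map behaves like a contraction near the solution, the standard condition under which such an iteration converges.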

AI and ethics in insurance: a new solution to mitigate proxy discrimination in risk modeling

  • paper_url: http://arxiv.org/abs/2307.13616
  • repo_url: None
  • paper_authors: Marguerite Sauce, Antoine Chancel, Antoine Ly
  • For: The paper aims to address the issue of indirect discrimination in insurance pricing and risk selection practices, using a mathematical approach based on linear algebra to reduce the risks of discrimination.
  • Methods: The paper proposes an innovative method that has not been previously discussed in the literature, which uses mathematical concepts of linear algebra to reduce the risks of indirect discrimination in insurance.
  • Results: The paper demonstrates the effectiveness of the proposed method in a concrete case of risk selection in life insurance, showing its simplicity of use and promising performance.
    Abstract The development of Machine Learning is experiencing growing interest from the general public, and in recent years there have been numerous press articles questioning its objectivity: racism, sexism, and so on. Driven by the growing attention of regulators on the ethical use of data in insurance, the actuarial community must rethink pricing and risk selection practices for fairer insurance. Equity is a philosophical concept that has many different definitions in every jurisdiction that influence each other without currently reaching consensus. In Europe, the Charter of Fundamental Rights defines guidelines on discrimination, and the use of sensitive personal data in algorithms is regulated. While the simple removal of the protected variables prevents any so-called 'direct' discrimination, models are still able to 'indirectly' discriminate between individuals thanks to latent interactions between variables, which bring better performance (and therefore a better quantification of risk, segmentation of prices, and so on). After introducing the key concepts related to discrimination, we illustrate the complexity of quantifying them. We then propose an innovative method, not yet met in the literature, to reduce the risks of indirect discrimination thanks to mathematical concepts of linear algebra. This technique is illustrated in a concrete case of risk selection in life insurance, demonstrating its simplicity of use and its promising performance.
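One classical linear-algebra construction for suppressing proxy effects, shown here purely to illustrate the flavor of the approach rather than as the authors' exact method, is to project the feature matrix onto the orthogonal complement of the protected variables, so that no linear combination of the remaining features can reconstruct them:

```python
import numpy as np

def strip_protected(X, S):
    """Remove from X every component lying in the column space of S.

    X: (n, p) rating features; S: (n, k) protected variables (e.g. one-hot sex).
    The residual is orthogonal to S, blocking linear proxy discrimination.
    """
    P = S @ np.linalg.pinv(S)    # orthogonal projector onto col(S)
    return X - P @ X
```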

Team Intro to AI team8 at CoachAI Badminton Challenge 2023: Advanced ShuttleNet for Shot Predictions

  • paper_url: http://arxiv.org/abs/2307.13715
  • repo_url: None
  • paper_authors: Shih-Hong Chen, Pin-Hsuan Chou, Yong-Fu Liu, Chien-An Han
  • for: improving the performance of the existing ShuttleNet framework in predicting badminton shot types and locations by leveraging past strokes
  • methods: using past strokes to improve the predictive performance of the ShuttleNet framework
  • results: significantly better results than the baseline in the CoachAI Badminton Challenge at IJCAI 2023, ultimately taking first place in the competition; the code has been made available
    Abstract In this paper, our objective is to improve the performance of the existing framework ShuttleNet in predicting badminton shot types and locations by leveraging past strokes. We participated in the CoachAI Badminton Challenge at IJCAI 2023 and achieved significantly better results compared to the baseline. Ultimately, our team achieved the first position in the competition and we made our code available.

Forecasting, capturing and activation of carbon-dioxide (CO$_2$): Integration of Time Series Analysis, Machine Learning, and Material Design

  • paper_url: http://arxiv.org/abs/2307.14374
  • repo_url: None
  • paper_authors: Suchetana Sadhukhan, Vivek Kumar Yadav
  • for: a time series analysis of daily industry-specific, country-wise CO$_2$ emissions for European countries (EU27 & UK, Italy, Germany, Spain) and India from January 2019 to February 2023
  • methods: near-real-time activity data from the Carbon Monitor research initiative, with the year 2020 excluded to avoid the disruption caused by the COVID-19 pandemic; principal component analysis (PCA) identifies the main contributors to emissions, and a 7-day moving average of the data is used for further analysis to improve prediction quality
  • results: the Power, Industry, and Ground Transport sectors account for a significant portion of the variance; Long Short-Term Memory (LSTM) models on the 7-day moving averaged data effectively forecast emissions and can inform policy decisions, mitigation strategies, and climate change efforts, with stable and convergent training and $R^2$ values of 0.8242-0.995 across countries and sectors; scandium and boron/aluminium-based thin films are also proposed as exceptionally efficient CO$_2$ capture materials (binding energies from -3.0 to -3.5 eV), surpassing graphene and boron nitride sheets in this respect
    Abstract This study provides a comprehensive time series analysis of daily industry-specific, country-wise CO$_2$ emissions from January 2019 to February 2023. The research focuses on the Power, Industry, Ground Transport, Domestic Aviation, and International Aviation sectors in European countries (EU27 & UK, Italy, Germany, Spain) and India, utilizing near-real-time activity data from the Carbon Monitor research initiative. To identify regular emission patterns, the data from the year 2020 is excluded due to the disruptive effects caused by the COVID-19 pandemic. The study then performs a principal component analysis (PCA) to determine the key contributors to CO$_2$ emissions. The analysis reveals that the Power, Industry, and Ground Transport sectors account for a significant portion of the variance in the dataset. A 7-day moving averaged dataset is employed for further analysis to facilitate robust predictions. This dataset captures both short-term and long-term trends and enhances the quality of the data for prediction purposes. The study utilizes Long Short-Term Memory (LSTM) models on the 7-day moving averaged dataset to effectively predict emissions and provide insights for policy decisions, mitigation strategies, and climate change efforts. During the training phase, the stability and convergence of the LSTM models are ensured, which guarantees their reliability in the testing phase. The evaluation of the loss function indicates this reliability. The model achieves high efficiency, as demonstrated by $R^2$ values ranging from 0.8242 to 0.995 for various countries and sectors. Furthermore, there is a proposal for utilizing scandium and boron/aluminium-based thin films as exceptionally efficient materials for capturing CO$_2$ (with a binding energy range from -3.0 to -3.5 eV). These materials are shown to surpass the affinity of graphene and boron nitride sheets in this regard.
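The forecasting pipeline, 7-day smoothing followed by a windowed LSTM, is straightforward to reproduce in outline; the window length and layer sizes below are illustrative choices, not the paper's settings:

```python
import numpy as np
import torch
import torch.nn as nn

def smooth(series, w=7):
    """Trailing w-day moving average, applied before model fitting."""
    return np.convolve(series, np.ones(w) / w, mode="valid")

def make_windows(series, window=30):
    """Slice a 1-D series into (window -> next day) training pairs."""
    xs = np.stack([series[i:i + window] for i in range(len(series) - window)])
    ys = series[window:]
    xs = torch.tensor(xs, dtype=torch.float32).unsqueeze(-1)   # (N, window, 1)
    return xs, torch.tensor(ys, dtype=torch.float32).unsqueeze(-1)

class EmissionLSTM(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, window, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # next-day emission estimate
```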

eess.IV - 2023-07-26

Artifact Restoration in Histology Images with Diffusion Probabilistic Models

  • paper_url: http://arxiv.org/abs/2307.14262
  • repo_url: https://github.com/zhenqi-he/artifusion
  • paper_authors: Zhenqi He, Junjun He, Jin Ye, Yiqing Shen
  • for: restoration of histological whole slide images (WSIs), to reduce the examination difficulty for pathologists and Computer-Aided Diagnosis (CAD) systems.
  • methods: innovative denoising diffusion probabilistic model called ArtiFusion, which formulates the artifact region restoration as a gradual denoising process and uses a novel Swin-Transformer denoising architecture and time token scheme to capture local-global correlations.
  • results: effective restoration of artifact-corrupted histological WSIs, preserving tissue structures and stain style in artifact-free regions, demonstrated through extensive evaluations.
    Abstract Histological whole slide images (WSIs) can be usually compromised by artifacts, such as tissue folding and bubbles, which will increase the examination difficulty for both pathologists and Computer-Aided Diagnosis (CAD) systems. Existing approaches to restoring artifact images are confined to Generative Adversarial Networks (GANs), where the restoration process is formulated as an image-to-image transfer. Those methods are prone to suffer from mode collapse and unexpected mistransfer in the stain style, leading to unsatisfied and unrealistic restored images. Innovatively, we make the first attempt at a denoising diffusion probabilistic model for histological artifact restoration, namely ArtiFusion.Specifically, ArtiFusion formulates the artifact region restoration as a gradual denoising process, and its training relies solely on artifact-free images to simplify the training complexity.Furthermore, to capture local-global correlations in the regional artifact restoration, a novel Swin-Transformer denoising architecture is designed, along with a time token scheme. Our extensive evaluations demonstrate the effectiveness of ArtiFusion as a pre-processing method for histology analysis, which can successfully preserve the tissue structures and stain style in artifact-free regions during the restoration. Code is available at https://github.com/zhenqi-he/ArtiFusion.

Visual Saliency Detection in Advanced Driver Assistance Systems

  • paper_url: http://arxiv.org/abs/2308.03770
  • repo_url: None
  • paper_authors: Francesco Rundo, Michael Sebastian Rundo, Concetto Spampinato
  • for: developing an intelligent driving system that detects driver drowsiness and classifies the observed scene by saliency
  • methods: a specialized 3D deep network for semantic segmentation, pretrained and tailored for frames captured by an automotive-grade external camera, runs on an embedded platform with the STA1295 core (ARM A7 dual-core) and a hardware accelerator; a biosensor embedded in the steering wheel monitors driver drowsiness via the PhotoPlethysmoGraphy (PPG) signal, and a 1D temporal deep convolutional network classifies the PPG time series to assess the driver's attentiveness
  • results: experimental results show that the system effectively detects driver drowsiness and scene saliency, and accurately assesses the driver's attention level
    Abstract Visual Saliency refers to the innate human mechanism of focusing on and extracting important features from the observed environment. Recently, there has been a notable surge of interest in the field of automotive research regarding the estimation of visual saliency. While operating a vehicle, drivers naturally direct their attention towards specific objects, employing brain-driven saliency mechanisms that prioritize certain elements over others. In this investigation, we present an intelligent system that combines a drowsiness detection system for drivers with a scene comprehension pipeline based on saliency. To achieve this, we have implemented a specialized 3D deep network for semantic segmentation, which has been pretrained and tailored for processing the frames captured by an automotive-grade external camera. The proposed pipeline was hosted on an embedded platform utilizing the STA1295 core, featuring ARM A7 dual-cores, and embeds an hardware accelerator. Additionally, we employ an innovative biosensor embedded on the car steering wheel to monitor the driver drowsiness, gathering the PhotoPlethysmoGraphy (PPG) signal of the driver. A dedicated 1D temporal deep convolutional network has been devised to classify the collected PPG time-series, enabling us to assess the driver level of attentiveness. Ultimately, we compare the determined attention level of the driver with the corresponding saliency-based scene classification to evaluate the overall safety level. The efficacy of the proposed pipeline has been validated through extensive experimental results.

Non-Linear Self Augmentation Deep Pipeline for Cancer Treatment outcome Prediction

  • paper_url: http://arxiv.org/abs/2307.14398
  • repo_url: None
  • paper_authors: Francesco Rundo, Concetto Spampinato, Michael Rundo
  • for: exploring how immunotherapy can treat tumors, improving patient survival and reducing the toxicity of conventional chemotherapy
  • methods: a novel strategy that combines a non-linear cellular architecture with a deep downstream classifier to select and enhance 2D features extracted from chest-abdomen CT images, improving the prediction of treatment outcomes
  • results: experiments show an overall accuracy of approximately 93%, indicating strong effectiveness
    Abstract Immunotherapy emerges as a promising approach for treating cancer. Encouraging findings have validated the efficacy of immunotherapy medications in addressing tumors, resulting in prolonged survival rates and notable reductions in toxicity compared to conventional chemotherapy methods. However, the pool of eligible patients for immunotherapy remains relatively small, indicating a lack of comprehensive understanding regarding the physiological mechanisms responsible for favorable treatment response in certain individuals while others experience limited benefits. To tackle this issue, the authors present an innovative strategy that harnesses a non-linear cellular architecture in conjunction with a deep downstream classifier. This approach aims to carefully select and enhance 2D features extracted from chest-abdomen CT images, thereby improving the prediction of treatment outcomes. The proposed pipeline has been meticulously designed to seamlessly integrate with an advanced embedded Point of Care system. In this context, the authors present a compelling case study focused on Metastatic Urothelial Carcinoma (mUC), a particularly aggressive form of cancer. Performance evaluation of the proposed approach underscores its effectiveness, with an impressive overall accuracy of approximately 93%.

Tackling Scattering and Reflective Flare in Mobile Camera Systems: A Raw Image Dataset for Enhanced Flare Removal

  • paper_url: http://arxiv.org/abs/2307.14180
  • repo_url: None
  • paper_authors: Fengbo Lan, Chang Wen Chen
  • for: improving image quality in mobile camera systems, particularly by addressing scattering and reflective flare
  • methods: a raw image dataset covering diverse mobile devices and camera settings, which can be segmented into patches to suit a wide range of capture conditions
  • results: experiments show that networks trained on synthesized data struggle with the complex lighting of real images, and that raw image data offers significant advantages over images processed by a phone's internal ISP for removing scattering and reflective flare
    Abstract The increasing prevalence of mobile devices has led to significant advancements in mobile camera systems and improved image quality. Nonetheless, mobile photography still grapples with challenging issues such as scattering and reflective flare. The absence of a comprehensive real image dataset tailored for mobile phones hinders the development of effective flare mitigation techniques. To address this issue, we present a novel raw image dataset specifically designed for mobile camera systems, focusing on flare removal. Capitalizing on the distinct properties of raw images, this dataset serves as a solid foundation for developing advanced flare removal algorithms. It encompasses a wide variety of real-world scenarios captured with diverse mobile devices and camera settings. The dataset comprises over 2,000 high-quality full-resolution raw image pairs for scattering flare and 1,100 for reflective flare, which can be further segmented into up to 30,000 and 2,200 paired patches, respectively, ensuring broad adaptability across various imaging conditions. Experimental results demonstrate that networks trained with synthesized data struggle to cope with complex lighting settings present in this real image dataset. We also show that processing data through a mobile phone's internal ISP compromises image quality while using raw image data presents significant advantages for addressing the flare removal problem. Our dataset is expected to enable an array of new research in flare removal and contribute to substantial improvements in mobile image quality, benefiting mobile photographers and end-users alike.

Memory-Efficient Graph Convolutional Networks for Object Classification and Detection with Event Cameras

  • paper_url: http://arxiv.org/abs/2307.14124
  • repo_url: None
  • paper_authors: Kamil Jeziorek, Andrea Pinna, Tomasz Kryjak
  • for: This paper focuses on developing an efficient graph convolutional network (GCN) for processing event data in its original sparse form, with the goal of achieving high accuracy while minimizing computational and memory costs.
  • methods: The authors compare different graph convolution operations and evaluate their performance in terms of execution time, number of trainable model parameters, data format requirements, and training outcomes. They also implement an object detection architecture and evaluate its performance on the N-Caltech101 dataset.
  • results: The authors achieve a 450-fold reduction in the number of parameters for the feature extraction module and a 4.5-fold reduction in the size of the data representation while maintaining a classification accuracy of 52.3%, which is 6.3% higher compared to the operation used in state-of-the-art approaches. They also achieve an object detection accuracy of 53.7% mAP@0.5 and an execution rate of 82 graphs per second.
    Abstract Recent advances in event camera research emphasize processing data in its original sparse form, which allows the use of its unique features such as high temporal resolution, high dynamic range, low latency, and resistance to image blur. One promising approach for analyzing event data is through graph convolutional networks (GCNs). However, current research in this domain primarily focuses on optimizing computational costs, neglecting the associated memory costs. In this paper, we consider both factors together in order to achieve satisfying results and relatively low model complexity. For this purpose, we performed a comparative analysis of different graph convolution operations, considering factors such as execution time, the number of trainable model parameters, data format requirements, and training outcomes. Our results show a 450-fold reduction in the number of parameters for the feature extraction module and a 4.5-fold reduction in the size of the data representation while maintaining a classification accuracy of 52.3%, which is 6.3% higher compared to the operation used in state-of-the-art approaches. To further evaluate performance, we implemented the object detection architecture and evaluated its performance on the N-Caltech101 dataset. The results showed an accuracy of 53.7 % mAP@0.5 and reached an execution rate of 82 graphs per second.

Periocular biometrics: databases, algorithms and directions

  • paper_url: http://arxiv.org/abs/2307.14111
  • repo_url: None
  • paper_authors: Fernando Alonso-Fernandez, Josef Bigun
  • for: reviewing the state of the art in periocular biometrics and its future research directions
  • methods: the survey covers multiple lines of work, including feature extraction from the periocular region, gender classification, ethnicity classification, and the impact of gender transformation or plastic surgery on recognition performance
  • results: the paper identifies the most relevant issues and future trends, including the use of periocular features to improve recognition accuracy and the effect of gender transformation or plastic surgery on recognition performance
    Abstract Periocular biometrics has been established as an independent modality due to concerns on the performance of iris or face systems in uncontrolled conditions. Periocular refers to the facial region in the eye vicinity, including eyelids, lashes and eyebrows. It is available over a wide range of acquisition distances, representing a trade-off between the whole face (which can be occluded at close distances) and the iris texture (which do not have enough resolution at long distances). Since the periocular region appears in face or iris images, it can be used also in conjunction with these modalities. Features extracted from the periocular region have been also used successfully for gender classification and ethnicity classification, and to study the impact of gender transformation or plastic surgery in the recognition performance. This paper presents a review of the state of the art in periocular biometric research, providing an insight of the most relevant issues and giving a thorough coverage of the existing literature. Future research trends are also briefly discussed.

Video Decoding Energy Estimation Using Processor Events

  • paper_url: http://arxiv.org/abs/2307.14000
  • repo_url: None
  • paper_authors: Christian Herglotz, André Kaup
  • for: studying the processing energy of software video decoders
  • methods: processor events such as instruction counts or cache misses are counted with dedicated profiling software and used to accurately estimate the processing energy of software video decoders
  • results: the decoding energy of recent video coding standards, including HEVC and VP9, can be estimated with a mean estimation error below 6%
    Abstract In this paper, we show that processor events like instruction counts or cache misses can be used to accurately estimate the processing energy of software video decoders. Therefore, we perform energy measurements on an ARM-based evaluation platform and count processor level events using a dedicated profiling software. Measurements are performed for various codecs and decoder implementations to prove the general viability of our observations. Using the estimation method proposed in this paper, the true decoding energy for various recent video coding standards including HEVC and VP9 can be estimated with a mean estimation error that is smaller than 6%.
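The estimator implied here is essentially a linear map from event counts to energy, fit once per platform against measured decoding runs. A least-squares sketch; the event counts and joule values are placeholder numbers for illustration only:

```python
import numpy as np

# One row per measured decoding run: [instructions, cache_misses, 1.0]
# (the trailing 1.0 absorbs a constant offset). Placeholder values only.
events = np.array([[2.1e9, 3.4e6, 1.0],
                   [5.8e9, 9.1e6, 1.0],
                   [1.3e9, 2.0e6, 1.0]])
measured_joules = np.array([1.9, 5.2, 1.2])      # placeholder measurements

coef, *_ = np.linalg.lstsq(events, measured_joules, rcond=None)

def estimate_energy(instructions, cache_misses):
    """Predict decoding energy for a new run from its profiled event counts."""
    return float(coef @ np.array([instructions, cache_misses, 1.0]))
```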

Hybrid Representation-Enhanced Sampling for Bayesian Active Learning in Musculoskeletal Segmentation of Lower Extremities

  • paper_url: http://arxiv.org/abs/2307.13986
  • repo_url: None
  • paper_authors: Ganping Li, Yoshito Otake, Mazen Soufi, Masashi Taniguchi, Masahide Yagi, Noriaki Ichihashi, Keisuke Uemura, Masaki Takao, Nobuhiko Sugano, Yoshinobu Sato
  • for: reducing the time and cost of manual annotation in medical image segmentation tasks
  • methods: a Bayesian active learning framework based on a Bayesian U-net, with a hybrid representation-enhanced sampling strategy that selects uncertain samples of high density and diversity for manual revision
  • results: experiments on MRI and CT images from two lower extremity (LE) datasets, comparing different acquisition rules and methods, show that the proposed approach is superior or non-inferior on both datasets, and the quantitative results demonstrate the advantage of the hybrid criteria in musculoskeletal segmentation
    Abstract Purpose: Obtaining manual annotations to train deep learning (DL) models for auto-segmentation is often time-consuming. Uncertainty-based Bayesian active learning (BAL) is a widely-adopted method to reduce annotation efforts. Based on BAL, this study introduces a hybrid representation-enhanced sampling strategy that integrates density and diversity criteria to save manual annotation costs by efficiently selecting the most informative samples. Methods: The experiments are performed on two lower extremity (LE) datasets of MRI and CT images by a BAL framework based on Bayesian U-net. Our method selects uncertain samples with high density and diversity for manual revision, optimizing for maximal similarity to unlabeled instances and minimal similarity to existing training data. We assess the accuracy and efficiency using Dice and a proposed metric called reduced annotation cost (RAC), respectively. We further evaluate the impact of various acquisition rules on BAL performance and design an ablation study for effectiveness estimation. Results: The proposed method showed superiority or non-inferiority to other methods on both datasets across two acquisition rules, and quantitative results reveal the pros and cons of the acquisition rules. Our ablation study in volume-wise acquisition shows that the combination of density and diversity criteria outperforms solely using either of them in musculoskeletal segmentation. Conclusion: Our sampling method is proven efficient in reducing annotation costs in image segmentation tasks. The combination of the proposed method and our BAL framework provides a semi-automatic way for efficient annotation of medical image datasets.
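The hybrid criterion can be made concrete as a greedy scoring rule: prefer uncertain samples that are representative of the unlabeled pool (density) while far from what is already labeled or selected (diversity). The additive score below is one plausible reading of the stated optimization goal, not the paper's exact algorithm:

```python
import numpy as np

def select_batch(pool_feats, uncertainty, labeled_feats, k):
    """Greedily pick k pool indices; assumes at least one labeled sample."""
    def cos_sim(a, B):
        a = a / (np.linalg.norm(a) + 1e-12)
        B = B / (np.linalg.norm(B, axis=1, keepdims=True) + 1e-12)
        return B @ a

    # Density: mean similarity to the whole unlabeled pool.
    density = np.array([cos_sim(f, pool_feats).mean() for f in pool_feats])
    chosen, reference = [], [f for f in labeled_feats]
    for _ in range(k):
        # Diversity: penalize closeness to labeled data and earlier picks.
        diversity = np.array([-cos_sim(f, np.stack(reference)).max()
                              for f in pool_feats])
        score = uncertainty * density + diversity
        score[chosen] = -np.inf                 # never pick the same sample twice
        idx = int(score.argmax())
        chosen.append(idx)
        reference.append(pool_feats[idx])
    return chosen
```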

Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models

  • paper_url: http://arxiv.org/abs/2307.13981
  • repo_url: None
  • paper_authors: Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, Kede Ma
  • for: examining the role of blind video quality assessment (BVQA) in monitoring and improving the end-users' viewing experience in real-world video-enabled media applications
  • methods: minimalistic BVQA models built only from basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor
  • results: a computational analysis of eight VQA datasets shows that nearly all suffer from the easy dataset problem to varying degrees, some even admitting blind image quality assessment (BIQA) solutions; comparing the quality prediction of different model variants and ablating the basic building blocks supports these conclusions, which cast doubt on current BVQA progress while pointing to good practices for constructing next-generation VQA datasets and models
    Abstract Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets via designing minimalistic BVQA models. By minimalistic, we restrict our family of BVQA models to build only upon basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy dataset problem of varying severity, some of which even admit blind image quality assessment (BIQA) solutions. We additionally justify our claims by contrasting our model generalizability on these VQA datasets, and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA, and meanwhile shed light on good practices of constructing next-generation VQA datasets and models.
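The "minimalistic" family is easiest to grasp as a skeleton: aggressive spatiotemporal downsampling, a small spatial analyzer applied per frame, trivial temporal pooling, and a linear quality regressor. The module below is one reading of that description with the simplest possible instantiations, not the released code:

```python
import torch
import torch.nn as nn

class MinimalBVQA(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.spatial = nn.Sequential(              # simplest spatial quality analyzer
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.regressor = nn.Linear(feat_dim, 1)    # quality regressor

    def forward(self, video):                      # video: (T, 3, H, W) float tensor
        frames = video[::4, :, ::2, ::2]           # aggressive spatiotemporal downsampling
        feats = self.spatial(frames)               # (T', feat_dim), frames as a batch
        pooled = feats.mean(dim=0)                 # trivial temporal pooling
        return self.regressor(pooled)              # scalar quality score
```

If even a skeleton like this tracks human scores on a dataset, the dataset is likely "easy" in the paper's sense.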

A real-time material breakage detection for offshore wind turbines based on improved neural network algorithm

  • paper_url: http://arxiv.org/abs/2307.13765
  • repo_url: None
  • paper_authors: Yantong Liu
  • for: improving the reliability of offshore wind turbines for sustainable energy generation
  • methods: an improved version of the YOLOv8 object detection model, augmented with a Convolutional Block Attention Module (CBAM) for better feature recognition and an optimized loss function; the approach is rigorously tested on 5,432 images from the Saemangeum offshore wind farm and a publicly available dataset
  • results: a substantial improvement in defect detection stability, marking a significant step forward for sustainable energy practice
    Abstract The integrity of offshore wind turbines, pivotal for sustainable energy generation, is often compromised by surface material defects. Despite the availability of various detection techniques, limitations persist regarding cost-effectiveness, efficiency, and applicability. Addressing these shortcomings, this study introduces a novel approach leveraging an advanced version of the YOLOv8 object detection model, supplemented with a Convolutional Block Attention Module (CBAM) for improved feature recognition. The optimized loss function further refines the learning process. Employing a dataset of 5,432 images from the Saemangeum offshore wind farm and a publicly available dataset, our method underwent rigorous testing. The findings reveal a substantial enhancement in defect detection stability, marking a significant stride towards efficient turbine maintenance. This study's contributions illuminate the path for future research, potentially revolutionizing sustainable energy practices.
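CBAM itself is a small, well-documented module: channel attention computed from average- and max-pooled descriptors through a shared MLP, followed by spatial attention from pooled channel maps. A standard PyTorch rendering of the generic module (where to place it inside YOLOv8 is the paper's design choice):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(                  # shared MLP for channel attention
            nn.Conv2d(channels, channels // reduction, 1, bias=False), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, kernel_size,
                                 padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # Channel attention: pooled descriptors through the shared MLP.
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: channel-wise mean and max maps through a 7x7 conv.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```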

cs.SD - 2023-07-25

A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis

  • paper_url: http://arxiv.org/abs/2307.13346
  • repo_url: None
  • paper_authors: Li Xiao, Xiuping Yang, Xinhong Li, Weiping Tu, Xiong Chen, Weiyan Yi, Jie Lin, Yuhong Yang, Yanzhen Ren
  • for: This paper aims to identify the obstruction site of the upper airways in patients with Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) by analyzing snoring sounds.
  • methods: The paper proposes a snore-based sleep body position recognition dataset (SSBPR) consisting of 7570 snoring recordings, which includes six distinct labels for sleep body position. The authors use machine learning algorithms to analyze the acoustic features of snoring sounds and identify the sleep body position.
  • results: The experimental results show that snoring sounds exhibit certain acoustic features that can be used effectively to identify body posture during sleep in real-world scenarios.
    Abstract Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a chronic breathing disorder caused by a blockage in the upper airways. Snoring is a prominent symptom of OSAHS, and previous studies have attempted to identify the obstruction site of the upper airways by snoring sounds. Despite some progress, the classification of the obstruction site remains challenging in real-world clinical settings due to the influence of sleep body position on upper airways. To address this challenge, this paper proposes a snore-based sleep body position recognition dataset (SSBPR) consisting of 7570 snoring recordings, which comprises six distinct labels for sleep body position: supine, supine but left lateral head, supine but right lateral head, left-side lying, right-side lying and prone. Experimental results show that snoring sounds exhibit certain acoustic features that enable their effective utilization for identifying body posture during sleep in real-world scenarios.

On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer

  • paper_url: http://arxiv.org/abs/2307.13343
  • repo_url: None
  • paper_authors: Md Asif Jalal, Pablo Peso Parada, Jisi Zhang, Karthikeyan Saravanan, Mete Ozay, Myoungji Han, Jung In Lee, Seokyeong Jung
  • for: enhancing speaker privacy while preserving speech recognition accuracy
  • methods: gradient reversal based speaker adversarial layers that anonymize acoustic embeddings on the device, with the rest of the model executed in the cloud
  • results: a 6.2% relative improvement in speech recognition (Word Error Rate reduction) and a 33% relative reduction in speaker recognition accuracy
    Abstract Smart devices serviced by large-scale AI models necessitate user data transfer to the cloud for inference. For speech applications, this means transferring private user information, e.g., speaker identity. Our paper proposes a privacy-enhancing framework that targets speaker identity anonymization while preserving speech recognition accuracy for our downstream task, Automatic Speech Recognition (ASR). The proposed framework attaches flexible gradient reversal based speaker adversarial layers to target layers within an ASR model, where speaker adversarial training anonymizes acoustic embeddings generated by the targeted layers to remove speaker identity. We propose on-device deployment by execution of initial layers of the ASR model, and transmitting anonymized embeddings to the cloud, where the rest of the model is executed while preserving privacy. Experimental results show that our method efficiently reduces speaker recognition relative accuracy by 33%, and improves ASR performance by achieving 6.2% relative Word Error Rate (WER) reduction.
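The gradient reversal layer at the heart of this scheme is an identity in the forward pass that flips and scales gradients in the backward pass, so the ASR loss and the speaker-adversarial loss pull the shared layers in opposite directions. The canonical PyTorch implementation:

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)                       # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None     # reversed, scaled gradient

def grad_reverse(x, lambd=1.0):
    """Insert between a target ASR layer and the speaker classifier head."""
    return GradReverse.apply(x, lambd)
```

Training the speaker classifier through this layer drives the embeddings toward carrying no speaker information, which is what makes them safe to transmit to the cloud.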

CQNV: A combination of coarsely quantized bitstream and neural vocoder for low rate speech coding

  • paper_url: http://arxiv.org/abs/2307.13295
  • repo_url: None
  • paper_authors: Youqiang Zheng, Li Xiao, Weiping Tu, Yuhong Yang, Xinmeng Xu
  • for: improving the quality of low-bitrate speech codecs
  • methods: CQNV, a novel framework that combines the coarsely quantized parameters of a traditional parametric codec with a neural vocoder, reducing the bitrate without sacrificing quality
  • results: compared with Lyra and Encodec, the proposed method achieves higher reconstructed speech quality at a bitrate of 1.1 kbps than they do at 3 kbps
    Abstract Recently, speech codecs based on neural networks have proven to perform better than traditional methods. However, redundancy in traditional parameter quantization is visible within the codec architecture of combining the traditional codec with the neural vocoder. In this paper, we propose a novel framework named CQNV, which combines the coarsely quantized parameters of a traditional parametric codec to reduce the bitrate with a neural vocoder to improve the quality of the decoded speech. Furthermore, we introduce a parameters processing module into the neural vocoder to enhance the application of the bitstream of traditional speech coding parameters to the neural vocoder, further improving the reconstructed speech's quality. In the experiments, both subjective and objective evaluations demonstrate the effectiveness of the proposed CQNV framework. Specifically, our proposed method can achieve higher quality reconstructed speech at 1.1 kbps than Lyra and Encodec at 3 kbps.

cs.CV - 2023-07-25

Mystique: Deconstructing SVG Charts for Layout Reuse

  • paper_url: http://arxiv.org/abs/2307.13567
  • repo_url: None
  • paper_authors: Chen Chen, Bongshin Lee, Yunhai Wang, Yunjeong Chang, Zhicheng Liu
  • for: This paper proposes a method for deconstructing rectangle-based charts so that existing charts can be reused with new data.
  • methods: The paper uses a mixed-initiative approach that extracts the axes and legend and decomposes a chart's layout into four semantic components: mark groups, spatial relationships, data encodings, and graphical constraints.
  • results: On 150 rectangle-based SVG charts, the method achieves above 85% accuracy for axis and legend extraction and 96% accuracy for layout deconstruction. In a chart reuse study, participants could easily apply existing charts to new data.
    Abstract To facilitate the reuse of existing charts, previous research has examined how to obtain a semantic understanding of a chart by deconstructing its visual representation into reusable components, such as encodings. However, existing deconstruction approaches primarily focus on chart styles, handling only basic layouts. In this paper, we investigate how to deconstruct chart layouts, focusing on rectangle-based ones, as they cover not only 17 chart types but also advanced layouts (e.g., small multiples, nested layouts). We develop an interactive tool, called Mystique, adopting a mixed-initiative approach to extract the axes and legend, and deconstruct a chart's layout into four semantic components: mark groups, spatial relationships, data encodings, and graphical constraints. Mystique employs a wizard interface that guides chart authors through a series of steps to specify how the deconstructed components map to their own data. On 150 rectangle-based SVG charts, Mystique achieves above 85% accuracy for axis and legend extraction and 96% accuracy for layout deconstruction. In a chart reproduction study, participants could easily reuse existing charts on new datasets. We discuss the current limitations of Mystique and future research directions.

Model Calibration in Dense Classification with Adaptive Label Perturbation

  • paper_url: http://arxiv.org/abs/2307.13539
  • repo_url: https://github.com/carlisle-liu/aslp
  • paper_authors: Jiawei Liu, Changkun Ye, Shan Wang, Ruikai Cui, Jing Zhang, Kaihao Zhang, Nick Barnes
  • for: producing trustworthy deep neural networks whose confidence reflects the likelihood of correctness, as required in safety-related applications
  • methods: Adaptive Stochastic Label Perturbation (ASLP), which learns a unique label perturbation level for each training image via the proposed Self-Calibrating Binary Cross Entropy (SC-BCE) loss, unifying stochastic label perturbation (e.g., DisturbLabel) and label smoothing (sketched below) to correct calibration
  • results: ASLP significantly improves the calibration of dense binary classification models on both in-distribution and out-of-distribution data while preserving classification accuracy on known data
    Abstract For safety-related applications, it is crucial to produce trustworthy deep neural networks whose prediction is associated with confidence that can represent the likelihood of correctness for subsequent decision-making. Existing dense binary classification models are prone to being over-confident. To improve model calibration, we propose Adaptive Stochastic Label Perturbation (ASLP) which learns a unique label perturbation level for each training image. ASLP employs our proposed Self-Calibrating Binary Cross Entropy (SC-BCE) loss, which unifies label perturbation processes including stochastic approaches (like DisturbLabel), and label smoothing, to correct calibration while maintaining classification rates. ASLP follows Maximum Entropy Inference of classic statistical mechanics to maximise prediction entropy with respect to missing information. It performs this while: (1) preserving classification accuracy on known data as a conservative solution, or (2) specifically improves model calibration degree by minimising the gap between the prediction accuracy and expected confidence of the target training label. Extensive results demonstrate that ASLP can significantly improve calibration degrees of dense binary classification models on both in-distribution and out-of-distribution data. The code is available on https://github.com/Carlisle-Liu/ASLP.
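
The actual SC-BCE loss unifies DisturbLabel-style stochastic perturbation and label smoothing; the sketch below shows only the smoothing view, with a per-image perturbation level `alpha` standing in for the level ASLP learns (shapes and values are illustrative):

```python
import torch
import torch.nn.functional as F

def soft_bce(logits, targets, alpha):
    """BCE against targets pulled toward 0.5 by each sample's own alpha:
    alpha=0 recovers plain BCE, alpha=1 maximizes target entropy."""
    soft = targets * (1 - alpha) + 0.5 * alpha
    return F.binary_cross_entropy_with_logits(logits, soft)

logits = torch.randn(4, 1, 32, 32)                    # dense binary predictions
targets = (torch.rand(4, 1, 32, 32) > 0.5).float()
alpha = torch.rand(4, 1, 1, 1)                        # one perturbation level per image
print(soft_bce(logits, targets, alpha).item())
```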

Not with my name! Inferring artists’ names of input strings employed by Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.13527
  • repo_url: https://github.com/ictlab-unict/not-with-my-name
  • paper_authors: Roberto Leotta, Oliver Giudice, Luca Guarnera, Sebastiano Battiato
  • for: investigating whether Diffusion Models reproduce an artist's style when the artist's name is used in the input string, which would imply training on the artist's (copyrighted) works
  • methods: a dedicated Siamese Neural Network that estimates the probability that an artist's name appeared in the input string of a generated image, evaluated on original and DALL-E 2 generated images of five renowned artists
  • results: the approach is an effective starting point and can serve as a prior for predicting the complete input string of an investigated image
    Abstract Diffusion Models (DM) are highly effective at generating realistic, high-quality images. However, these models lack creativity and merely compose outputs based on their training data, guided by a textual input provided at creation time. Is it acceptable to generate images reminiscent of an artist, employing his name as input? This implies that if the DM is able to replicate an artist's work, then it was trained on some or all of his artworks, thus violating copyright. In this paper, a preliminary study to infer the probability of use of an artist's name in the input string of a generated image is presented. To this aim, we focused only on images generated by the famous DALL-E 2 and collected images (both original and generated) of five renowned artists. Finally, a dedicated Siamese Neural Network was employed to have a first kind of probability. Experimental results demonstrate that our approach is an optimal starting point and can be employed as a prior for predicting a complete input string of an investigated image. Dataset and code are available at: https://github.com/ictlab-unict/not-with-my-name .

HeightFormer: Explicit Height Modeling without Extra Data for Camera-only 3D Object Detection in Bird’s Eye View

  • paper_url: http://arxiv.org/abs/2307.13510
  • repo_url: None
  • paper_authors: Yiming Wu, Ruixiang Li, Zequn Qin, Xinhai Zhao, Xi Li
  • for: height-based bird's eye view (BEV) representation for autonomous driving
  • methods: explicitly modeling heights in the BEV space using a self-recursive approach
  • results: achieves state-of-the-art (SOTA) performance compared to camera-only methods without using extra data like LiDAR
    Abstract Vision-based Bird's Eye View (BEV) representation is an emerging perception formulation for autonomous driving. The core challenge is to construct BEV space with multi-camera features, which is a one-to-many ill-posed problem. Diving into all previous BEV representation generation methods, we found that most of them fall into two types: modeling depths in image views or modeling heights in the BEV space, mostly in an implicit way. In this work, we propose to explicitly model heights in the BEV space, which needs no extra data like LiDAR and can fit arbitrary camera rigs and types compared to modeling depths. Theoretically, we give proof of the equivalence between height-based methods and depth-based methods. Considering the equivalence and some advantages of modeling heights, we propose HeightFormer, which models heights and uncertainties in a self-recursive way. Without any extra data, the proposed HeightFormer could estimate heights in BEV accurately. Benchmark results show that the performance of HeightFormer achieves SOTA compared with those camera-only methods.

NormAUG: Normalization-guided Augmentation for Domain Generalization

  • paper_url: http://arxiv.org/abs/2307.13492
  • repo_url: None
  • paper_authors: Lei Qi, Hongpeng Yang, Yinghuan Shi, Xin Geng
  • for: improving the generalization of deep models under domain shift between training and test sets
  • methods: NormAUG (Normalization-guided Augmentation), a simple yet effective method whose auxiliary path normalizes features with batch statistics of single domains or random combinations of domains (sketched below), introducing diverse information at the feature level
  • results: extensive experiments on multiple benchmark datasets validate the effectiveness of the method, with a test-time ensemble over the auxiliary path further boosting performance
    Abstract Deep learning has made significant advancements in supervised learning. However, models trained in this setting often face challenges due to domain shift between training and test sets, resulting in a significant drop in performance during testing. To address this issue, several domain generalization methods have been developed to learn robust and domain-invariant features from multiple training domains that can generalize well to unseen test domains. Data augmentation plays a crucial role in achieving this goal by enhancing the diversity of the training data. In this paper, inspired by the observation that normalizing an image with different statistics generated by different batches with various domains can perturb its feature, we propose a simple yet effective method called NormAUG (Normalization-guided Augmentation). Our method includes two paths: the main path and the auxiliary (augmented) path. During training, the auxiliary path includes multiple sub-paths, each corresponding to batch normalization for a single domain or a random combination of multiple domains. This introduces diverse information at the feature level and improves the generalization of the main path. Moreover, our NormAUG method effectively reduces the existing upper boundary for generalization based on theoretical perspectives. During the test stage, we leverage an ensemble strategy to combine the predictions from the auxiliary path of our model, further boosting performance. Extensive experiments are conducted on multiple benchmark datasets to validate the effectiveness of our proposed method.
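
A minimal sketch of the auxiliary-path idea, assuming one BatchNorm branch per domain (the paper's exact sub-path composition and fusion with the main path are not reproduced here):

```python
import torch
import torch.nn as nn

class DomainBN(nn.Module):
    """One BatchNorm per source domain: normalizing the same features with
    different batch statistics perturbs them at the feature level."""
    def __init__(self, channels, n_domains):
        super().__init__()
        self.bns = nn.ModuleList(nn.BatchNorm2d(channels) for _ in range(n_domains))

    def forward(self, x, domain_idx):
        return self.bns[domain_idx](x)

x = torch.randn(8, 64, 16, 16)
aug = DomainBN(64, n_domains=3)
views = [aug(x, d) for d in range(3)]   # three differently-normalized views of x
```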

Cos R-CNN for Online Few-shot Object Detection

  • paper_url: http://arxiv.org/abs/2307.13485
  • repo_url: None
  • paper_authors: Gratianus Wesley Putra Data, Henry Howard-Jenkins, David Murray, Victor Prisacariu
  • for: a simple exemplar-based R-CNN formulation for online few-shot object detection: localizing and classifying novel categories from few examples without fine-tuning
  • methods: detection framed as a learning-to-compare task: unseen classes are represented by exemplar images, and a cosine-based classification head (sketched below) matches objects to exemplars, encouraging similar classes to cluster in embedding space without manually tuned distance-metric hyperparameters
  • results: best results on the 5-way ImageNet few-shot detection benchmark, beating the online 1/5/10-shot scenarios by more than 8/3/1%, and up to 20% better on novel classes in online 20-way few-shot VOC across all shots
    Abstract We propose Cos R-CNN, a simple exemplar-based R-CNN formulation that is designed for online few-shot object detection. That is, it is able to localise and classify novel object categories in images with few examples without fine-tuning. Cos R-CNN frames detection as a learning-to-compare task: unseen classes are represented as exemplar images, and objects are detected based on their similarity to these exemplars. The cosine-based classification head allows for dynamic adaptation of classification parameters to the exemplar embedding, and encourages the clustering of similar classes in embedding space without the need for manual tuning of distance-metric hyperparameters. This simple formulation achieves best results on the recently proposed 5-way ImageNet few-shot detection benchmark, beating the online 1/5/10-shot scenarios by more than 8/3/1%, as well as performing up to 20% better in online 20-way few-shot VOC across all shots on novel classes.
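
The cosine-based classification head reduces to scaled cosine similarity between region features and exemplar embeddings; a sketch with an assumed temperature `scale` (the backbone and RoI extraction are omitted):

```python
import torch
import torch.nn.functional as F

def cosine_logits(roi_feats, exemplar_feats, scale=10.0):
    """Score each region against per-class exemplar embeddings by cosine similarity."""
    r = F.normalize(roi_feats, dim=1)        # (num_rois, d)
    e = F.normalize(exemplar_feats, dim=1)   # (num_classes, d)
    return scale * r @ e.t()                 # (num_rois, num_classes)

rois = torch.randn(100, 256)                 # hypothetical region features
exemplars = torch.randn(5, 256)              # 5-way episode, one exemplar per class
print(cosine_logits(rois, exemplars).shape)  # torch.Size([100, 5])
```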

Weakly-supervised 3D Pose Transfer with Keypoints

  • paper_url: http://arxiv.org/abs/2307.13459
  • repo_url: https://github.com/jinnan-chen/3d-pose-transfer
  • paper_authors: Jinnan Chen, Chen Li, Gim Hee Lee
  • for: addressing three challenges in 3D pose transfer: the lack of paired training data of different characters performing the same pose, disentangling pose from shape in the target mesh, and applying transfer to meshes with different topologies
  • methods: a novel weakly-supervised keypoint-based framework that uses a topology-agnostic keypoint detector with inverse kinematics to compute transformations between source and target meshes; it requires supervision only on keypoints, is shape-invariant for the target (extracting pose-only information), and adds a cycle reconstruction for self-supervised pose transfer without ground-truth deformed meshes
  • results: superior performance over state-of-the-art unsupervised approaches and comparable performance to fully supervised ones on benchmark human and animal datasets; tests on the more challenging Mixamo dataset show correct handling of meshes with different topologies and complex clothes, and cross-dataset evaluation shows strong generalization
    Abstract The main challenges of 3D pose transfer are: 1) Lack of paired training data with different characters performing the same pose; 2) Disentangling pose and shape information from the target mesh; 3) Difficulty in applying to meshes with different topologies. We thus propose a novel weakly-supervised keypoint-based framework to overcome these difficulties. Specifically, we use a topology-agnostic keypoint detector with inverse kinematics to compute transformations between the source and target meshes. Our method only requires supervision on the keypoints, can be applied to meshes with different topologies and is shape-invariant for the target which allows extraction of pose-only information from the target meshes without transferring shape information. We further design a cycle reconstruction to perform self-supervised pose transfer without the need for ground truth deformed mesh with the same pose and shape as the target and source, respectively. We evaluate our approach on benchmark human and animal datasets, where we achieve superior performance compared to the state-of-the-art unsupervised approaches and even comparable performance with the fully supervised approaches. We test on the more challenging Mixamo dataset to verify our approach's ability in handling meshes with different topologies and complex clothes. Cross-dataset evaluation further shows the strong generalization ability of our approach.

An Explainable Model-Agnostic Algorithm for CNN-based Biometrics Verification

  • paper_url: http://arxiv.org/abs/2307.13428
  • repo_url: None
  • paper_authors: Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Jose M. Buades, Prayag Tiwari, Josef Bigun
  • for: adapting the Local Interpretable Model-Agnostic Explanations (LIME) AI method to a biometric verification setting
  • methods: explainability is obtained via cosine similarity between feature vectors of perturbed versions of the input image (sketched below), instead of the softmax output used in the original LIME
  • results: the method is showcased for face biometrics with two CNN models based on MobileNetv2 and ResNet50
    Abstract This paper describes an adaptation of the Local Interpretable Model-Agnostic Explanations (LIME) AI method to operate under a biometric verification setting. LIME was initially proposed for networks with the same output classes used for training, and it employs the softmax probability to determine which regions of the image contribute the most to classification. However, in a verification setting, the classes to be recognized have not been seen during training. In addition, instead of using the softmax output, face descriptors are usually obtained from a layer before the classification layer. The model is adapted to achieve explainability via cosine similarity between feature vectors of perturbated versions of the input image. The method is showcased for face biometrics with two CNN models based on MobileNetv2 and ResNet50.
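
A sketch of the core scoring idea, with hand-picked regions and a random projection standing in for the CNN's descriptor layer (the paper perturbs images LIME-style; the region choice and masking scheme here are illustrative assumptions):

```python
import numpy as np

def region_importance(embed, image, regions):
    """Mask each region, re-embed, and score importance as the drop in cosine
    similarity between perturbed and original feature vectors."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    base, scores = embed(image), {}
    for name, sl in regions.items():
        masked = image.copy()
        masked[sl] = image[sl].mean()                   # flatten the region to its mean
        scores[name] = 1.0 - cos(base, embed(masked))   # bigger drop -> more important
    return scores

rng = np.random.default_rng(0)
W = rng.normal(size=(64 * 64, 128))            # random projection as descriptor stand-in
embed = lambda img: img.reshape(-1) @ W
img = rng.random((64, 64))
print(region_importance(embed, img, {"eyes": np.s_[16:24, :], "mouth": np.s_[44:52, :]}))
```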

A signal processing interpretation of noise-reduction convolutional neural networks

  • paper_url: http://arxiv.org/abs/2307.13425
  • repo_url: None
  • paper_authors: Luis A. Zavala-Mondragón, Peter H. N. de With, Fons van der Sommen
  • for: providing a unified, self-contained theoretical framework that explains the internal operation of encoding-decoding CNNs for noise reduction
  • methods: building intuition on the theory of deep convolutional framelets and connecting basic principles from signal processing to deep learning
  • results: the framework explains diverse encoding-decoding CNN architectures and offers guidance for designing robust and efficient novel architectures
    Abstract Encoding-decoding CNNs play a central role in data-driven noise reduction and can be found within numerous deep-learning algorithms. However, the development of these CNN architectures is often done in ad-hoc fashion and theoretical underpinnings for important design choices is generally lacking. Up to this moment there are different existing relevant works that strive to explain the internal operation of these CNNs. Still, these ideas are either scattered and/or may require significant expertise to be accessible for a bigger audience. In order to open up this exciting field, this article builds intuition on the theory of deep convolutional framelets and explains diverse ED CNN architectures in a unified theoretical framework. By connecting basic principles from signal processing to the field of deep learning, this self-contained material offers significant guidance for designing robust and efficient novel CNN architectures.

Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation

  • paper_url: http://arxiv.org/abs/2307.13412
  • repo_url: None
  • paper_authors: Stylianos I. Venieris, Javier Fernandez-Marques, Nicholas D. Lane
  • for: improving the performance and energy efficiency of FPGA-based CNN inference
  • methods: unzipFPGA, a CNN inference system with an on-chip weights generator that produces weights on-the-fly to relieve memory-bound layers, an automated hardware-aware methodology that tailors the weights generation mechanism to the target CNN-device pair, and an input selective processing element (PE) design that balances load in suboptimally mapped layers
  • results: an average 2.57x performance efficiency gain over highly optimised GPU designs under the same power constraints, and up to 3.94x higher performance density than state-of-the-art FPGA-based CNN accelerators
    Abstract The unprecedented accuracy of convolutional neural networks (CNNs) across a broad range of AI tasks has led to their widespread deployment in mobile and embedded settings. In a pursuit for high-performance and energy-efficient inference, significant research effort has been invested in the design of FPGA-based CNN accelerators. In this context, single computation engines constitute a popular approach to support diverse CNN modes without the overhead of fabric reconfiguration. Nevertheless, this flexibility often comes with significantly degraded performance on memory-bound layers and resource underutilisation due to the suboptimal mapping of certain layers on the engine's fixed configuration. In this work, we investigate the implications in terms of CNN engine design for a class of models that introduce a pre-convolution stage to decompress the weights at run time. We refer to these approaches as on-the-fly. This paper presents unzipFPGA, a novel CNN inference system that counteracts the limitations of existing CNN engines. The proposed framework comprises a novel CNN hardware architecture that introduces a weights generator module that enables the on-chip on-the-fly generation of weights, alleviating the negative impact of limited bandwidth on memory-bound layers. We further enhance unzipFPGA with an automated hardware-aware methodology that tailors the weights generation mechanism to the target CNN-device pair, leading to an improved accuracy-performance balance. Finally, we introduce an input selective processing element (PE) design that balances the load between PEs in suboptimally mapped layers. The proposed framework yields hardware designs that achieve an average of 2.57x performance efficiency gain over highly optimised GPU designs for the same power constraints and up to 3.94x higher performance density over a diverse range of state-of-the-art FPGA-based CNN accelerators.

Scoring Cycling Environments Perceived Safety using Pairwise Image Comparisons

  • paper_url: http://arxiv.org/abs/2307.13397
  • repo_url: https://github.com/mncosta/scoring_pairwise
  • paper_authors: Miguel Costa, Manuel Marques, Felix Wilhelm Siebert, Carlos Lima Azevedo, Filipe Moura
  • for: analyzing and understanding how people perceive cycling safety, and how the built environment and cycling contexts shape that perception
  • methods: pairwise comparisons of real-world images, in which respondents repeatedly pick which of two road environments they perceive as safer for cycling; several methods for rating cycling environments from pairwise comparisons are compared (see the scoring sketch below), and environments are classified as perceived safe or unsafe
  • results: the resulting scores can help urban planners improve the effectiveness of interventions and cycling promotion campaigns; the approach supports continuous assessment of changing cycling environments, short-term evaluation of measures, and efficient deployment in different locations or contexts
    Abstract Today, many cities seek to transition to more sustainable transportation systems. Cycling is critical in this transition for shorter trips, including first-and-last-mile links to transit. Yet, if individuals perceive cycling as unsafe, they will not cycle and choose other transportation modes. This study presents a novel approach to identifying how the perception of cycling safety can be analyzed and understood and the impact of the built environment and cycling contexts on such perceptions. We base our work on other perception studies and pairwise comparisons, using real-world images to survey respondents. We repeatedly show respondents two road environments and ask them to select the one they perceive as safer for cycling. We compare several methods capable of rating cycling environments from pairwise comparisons and classify cycling environments perceived as safe or unsafe. Urban planning can use this score to improve interventions' effectiveness and improve cycling promotion campaigns. Furthermore, this approach facilitates the continuous assessment of changing cycling environments, allows for a short-term evaluation of measures, and is efficiently deployed in different locations or contexts.
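
The abstract compares several rating methods without naming them; a Bradley-Terry model, one standard choice for scoring items from pairwise choices and swapped in here purely for illustration, fits a "perceived safety" strength per environment:

```python
import numpy as np

def bradley_terry(wins, iters=200):
    """MM fit of Bradley-Terry strengths; wins[i][j] = times environment i
    was chosen as safer for cycling than environment j."""
    w = np.asarray(wins, dtype=float)
    n = len(w)
    p = np.ones(n)
    for _ in range(iters):
        for i in range(n):
            den = sum((w[i, j] + w[j, i]) / (p[i] + p[j]) for j in range(n) if j != i)
            p[i] = w[i].sum() / max(den, 1e-12)
        p /= p.sum()
    return p   # higher = perceived as safer

wins = [[0, 8, 9], [2, 0, 6], [1, 4, 0]]   # toy counts from repeated image pairs
print(bradley_terry(wins))
```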

Towards Unifying Anatomy Segmentation: Automated Generation of a Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines

  • paper_url: http://arxiv.org/abs/2307.13375
  • repo_url: https://github.com/alexanderjaus/atlasdataset
  • paper_authors: Alexander Jaus, Constantin Seibold, Kelsey Hermann, Alexandra Walter, Kristina Giske, Johannes Haubold, Jens Kleesiek, Rainer Stiefelhagen
  • for: automated generation of a full-body CT anatomy segmentation dataset via nnU-Net-based pseudo-labeling and anatomy-guided pseudo-label refinement
  • methods: combining multiple fragmented knowledge bases to generate 142 voxel-level labels for 533 whole-body CT volumes with comprehensive, expert-approved anatomical coverage, without manual annotation during label aggregation
  • results: the approach requires no manually annotated data and reaches an 85% dice score on the BTCV dataset without using its training set; plausibility is confirmed by scalable automated checks, labor-intensive expert evaluation, and medical validity checks, and a trained unified segmentation model predicting 142 structures on CT data is released
    Abstract In this study, we present a method for generating automated anatomy segmentation datasets using a sequential process that involves nnU-Net-based pseudo-labeling and anatomy-guided pseudo-label refinement. By combining various fragmented knowledge bases, we generate a dataset of whole-body CT scans with $142$ voxel-level labels for 533 volumes providing comprehensive anatomical coverage which experts have approved. Our proposed procedure does not rely on manual annotation during the label aggregation stage. We examine its plausibility and usefulness using three complementary checks: Human expert evaluation which approved the dataset, a Deep Learning usefulness benchmark on the BTCV dataset in which we achieve 85% dice score without using its training dataset, and medical validity checks. This evaluation procedure combines scalable automated checks with labor-intensive high-quality expert checks. Besides the dataset, we release our trained unified anatomical segmentation model capable of predicting $142$ anatomical structures on CT data.

Kefa: A Knowledge Enhanced and Fine-grained Aligned Speaker for Navigation Instruction Generation

  • paper_url: http://arxiv.org/abs/2307.13368
  • repo_url: https://github.com/haitianzeng/KEFA
  • paper_authors: Haitian Zeng, Xiaohan Wang, Wenguan Wang, Yi Yang
  • for: improving navigation instruction generation for vision-and-language navigation
  • methods: a Knowledge Refinement Module that enhances feature representations with external knowledge facts, and an Adaptive Temporal Alignment method that enforces fine-grained alignment between generated instructions and observation sequences
  • results: state-of-the-art instruction generation performance on the R2R and UrbanWalk datasets, for both indoor and outdoor scenes
    Abstract We introduce a novel speaker model \textsc{Kefa} for navigation instruction generation. The existing speaker models in Vision-and-Language Navigation suffer from the large domain gap of vision features between different environments and insufficient temporal grounding capability. To address the challenges, we propose a Knowledge Refinement Module to enhance the feature representation with external knowledge facts, and an Adaptive Temporal Alignment method to enforce fine-grained alignment between the generated instructions and the observation sequences. Moreover, we propose a new metric SPICE-D for navigation instruction evaluation, which is aware of the correctness of direction phrases. The experimental results on R2R and UrbanWalk datasets show that the proposed KEFA speaker achieves state-of-the-art instruction generation performance for both indoor and outdoor scenes.

3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding

  • paper_url: http://arxiv.org/abs/2307.13363
  • repo_url: None
  • paper_authors: Zehan Wang, Haifeng Huang, Yang Zhao, Linjun Li, Xize Cheng, Yichen Zhu, Aoxiong Yin, Zhou Zhao
  • for: localizing a target object in a 3D point cloud from a free-form language description
  • methods: a relation-aware one-stage framework, the 3D Relative Position-aware Network (3DRP-Net), with a 3D Relative Position Multi-head Attention (3DRP-MA) module that analyzes relative relations between object pairs from different directions, and a soft-labeling strategy that alleviates the spatial ambiguity caused by redundant points
  • results: the method outperforms all state-of-the-art methods overall on three benchmarks (ScanRefer and Nr3D/Sr3D)
    Abstract 3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description. Typically, the sentences describing the target object tend to provide information about its relative relation between other objects and its position within the whole scene. In this work, we propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3DRP-Net), which can effectively capture the relative spatial relationships between objects and enhance object attributes. Specifically, 1) we propose a 3D Relative Position Multi-head Attention (3DRP-MA) module to analyze relative relations from different directions in the context of object pairs, which helps the model to focus on the specific object relations mentioned in the sentence. 2) We designed a soft-labeling strategy to alleviate the spatial ambiguity caused by redundant points, which further stabilizes and enhances the learning process through a constant and discriminative distribution. Extensive experiments conducted on three benchmarks (i.e., ScanRefer and Nr3D/Sr3D) demonstrate that our method outperforms all the state-of-the-art methods in general. The source code will be released on GitHub.

Of Mice and Pose: 2D Mouse Pose Estimation from Unlabelled Data and Synthetic Prior

  • paper_url: http://arxiv.org/abs/2307.13361
  • repo_url: None
  • paper_authors: Jose Sosa, Sharn Perry, Jane Alty, David Hogg
  • for: tracking and measuring animal behaviour in fields such as ecology, biology, and neuroscience, where large volumes of recordings exist but lack the annotations many computer vision techniques require
  • methods: estimating 2D mouse pose from unlabelled images by adapting a recent self-supervised human pose method that combines single images with a set of unpaired typical 2D poses in a GAN framework; the method is adapted to the limb structure of the mouse, and the empirical pose prior is generated from a synthetic 3D mouse model, avoiding manual annotation
  • results: on a new mouse video dataset, predictions are evaluated against a manually obtained ground truth and compared with a supervised state-of-the-art animal pose method, showing promising results despite the lack of paired training data; qualitative results on horse images indicate the setting can adapt to other species
    Abstract Numerous fields, such as ecology, biology, and neuroscience, use animal recordings to track and measure animal behaviour. Over time, a significant volume of such data has been produced, but some computer vision techniques cannot explore it due to the lack of annotations. To address this, we propose an approach for estimating 2D mouse body pose from unlabelled images using a synthetically generated empirical pose prior. Our proposal is based on a recent self-supervised method for estimating 2D human pose that uses single images and a set of unpaired typical 2D poses within a GAN framework. We adapt this method to the limb structure of the mouse and generate the empirical prior of 2D poses from a synthetic 3D mouse model, thereby avoiding manual annotation. In experiments on a new mouse video dataset, we evaluate the performance of the approach by comparing pose predictions to a manually obtained ground truth. We also compare predictions with those from a supervised state-of-the-art method for animal pose estimation. The latter evaluation indicates promising results despite the lack of paired training data. Finally, qualitative results using a dataset of horse images show the potential of the setting to adapt to other animal species.

Prior Based Online Lane Graph Extraction from Single Onboard Camera Image

  • paper_url: http://arxiv.org/abs/2307.13344
  • repo_url: https://github.com/ybarancan/lanewae
  • paper_authors: Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc Van Gool
  • for: online Bird's-Eye-View lane graph extraction from a single onboard camera image, which is crucial for widespread and reliable autonomous navigation
  • methods: prior information, extracted from the dataset with a transformer-based Wasserstein Autoencoder, enhances the initial lane graph estimate by optimizing the latent space vector (sketched below) so that the estimate stays consistent with the prior distribution
  • results: on the NuScenes and Argoverse benchmarks, the proposed method significantly improves over state-of-the-art methods
    Abstract The local road network information is essential for autonomous navigation. This information is commonly obtained from offline HD-Maps in terms of lane graphs. However, the local road network at a given moment can be drastically different than the one given in the offline maps; due to construction works, accidents etc. Moreover, the autonomous vehicle might be at a location not covered in the offline HD-Map. Thus, online estimation of the lane graph is crucial for widespread and reliable autonomous navigation. In this work, we tackle online Bird's-Eye-View lane graph extraction from a single onboard camera image. We propose to use prior information to increase quality of the estimations. The prior is extracted from the dataset through a transformer based Wasserstein Autoencoder. The autoencoder is then used to enhance the initial lane graph estimates. This is done through optimization of the latent space vector. The optimization encourages the lane graph estimation to be logical by discouraging it to diverge from the prior distribution. We test the method on two benchmark datasets, NuScenes and Argoverse. The results show that the proposed method significantly improves the performance compared to state-of-the-art methods.
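
A minimal sketch of latent-space refinement under a prior, with an L2 penalty standing in for the Wasserstein autoencoder's prior term and a toy decoder in place of the trained one (both are assumptions):

```python
import torch

def refine_with_prior(decoder, z_init, initial_estimate, steps=100, lam=0.1, lr=0.05):
    """Optimize the latent vector so the decoded lane graph stays close to the
    initial estimate while remaining likely under the prior."""
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (torch.nn.functional.mse_loss(decoder(z), initial_estimate)
                + lam * z.pow(2).mean())      # prior term: keep z near the latent origin
        loss.backward()
        opt.step()
    return decoder(z).detach()

decoder = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 32))
estimate = torch.randn(1, 32)                 # flattened initial lane-graph estimate (toy)
refined = refine_with_prior(decoder, torch.zeros(1, 16), estimate)
```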

Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks

  • paper_url: http://arxiv.org/abs/2307.13337
  • repo_url: None
  • paper_authors: Cheeun Hong, Kyoung Mu Lee
  • for: addressing the distribution mismatch problem in image super-resolution (SR) networks, which can cause severe accuracy loss under low-bit quantization
  • methods: ODM, a quantization-aware training framework that regularizes the variance of features during training, applying the regularizer only when its gradients cooperate with those of the reconstruction loss (sketched below), and introduces distribution offsets that scale or shift channel-wise features in layers with a significant mismatch
  • results: ODM effectively outperforms existing SR quantization approaches with similar or fewer computations, demonstrating the importance of reducing the distribution mismatch problem
    Abstract Quantization is a promising approach to reduce the high computational complexity of image super-resolution (SR) networks. However, compared to high-level tasks like image classification, low-bit quantization leads to severe accuracy loss in SR networks. This is because feature distributions of SR networks are significantly divergent for each channel or input image, and is thus difficult to determine a quantization range. Existing SR quantization works approach this distribution mismatch problem by dynamically adapting quantization ranges to the variant distributions during test time. However, such dynamic adaptation incurs additional computational costs that limit the benefits of quantization. Instead, we propose a new quantization-aware training framework that effectively Overcomes the Distribution Mismatch problem in SR networks without the need for dynamic adaptation. Intuitively, the mismatch can be reduced by directly regularizing the variance in features during training. However, we observe that variance regularization can collide with the reconstruction loss during training and adversely impact SR accuracy. Thus, we avoid the conflict between two losses by regularizing the variance only when the gradients of variance regularization are cooperative with that of reconstruction. Additionally, to further reduce the distribution mismatch, we introduce distribution offsets to layers with a significant mismatch, which either scales or shifts channel-wise features. Our proposed algorithm, called ODM, effectively reduces the mismatch in distributions with minimal computational overhead. Experimental results show that ODM effectively outperforms existing SR quantization approaches with similar or fewer computations, demonstrating the importance of reducing the distribution mismatch problem. Our code is available at https://github.com/Cheeun/ODM.
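
ODM applies variance regularization only when its gradients cooperate with those of the reconstruction loss; the sketch below assumes a positive dot product as the cooperation test (the paper's exact criterion may differ), with toy losses standing in for the real ones:

```python
import torch

w = torch.randn(10, requires_grad=True)
feat = torch.tanh(w)
rec_loss = (feat - 0.3).pow(2).mean()          # stand-in reconstruction loss
var_loss = feat.var()                          # variance regularizer
g_rec, = torch.autograd.grad(rec_loss, w, retain_graph=True)
g_var, = torch.autograd.grad(var_loss, w, retain_graph=True)
if torch.dot(g_rec, g_var) > 0:    # gradients agree: regularizing won't fight reconstruction
    loss = rec_loss + 0.1 * var_loss
else:                              # gradients conflict: skip the regularizer this step
    loss = rec_loss
loss.backward()
```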

Unmasking Anomalies in Road-Scene Segmentation

  • paper_url: http://arxiv.org/abs/2307.13316
  • repo_url: https://github.com/shyam671/Mask2Anomaly-Unmasking-Anomalies-in-Road-Scene-Segmentation
  • paper_authors: Shyam Nandan Rai, Fabio Cermelli, Dario Fontanel, Carlo Masone, Barbara Caputo
  • for: improving the accuracy of anomaly segmentation in road scenes
  • methods: a shift from per-pixel classification to mask classification, with a global masked attention module that focuses separately on foreground and background regions, mask contrastive learning that maximizes the margin between anomalies and known classes, and a mask refinement solution that reduces false positives
  • results: new state-of-the-art results across several benchmarks at both per-pixel and component-level evaluations, notably reducing the average false positive rate by 60% with respect to the previous state of the art
    Abstract Anomaly segmentation is a critical task for driving applications, and it is approached traditionally as a per-pixel classification problem. However, reasoning individually about each pixel without considering their contextual semantics results in high uncertainty around the objects' boundaries and numerous false positives. We propose a paradigm change by shifting from a per-pixel classification to a mask classification. Our mask-based method, Mask2Anomaly, demonstrates the feasibility of integrating an anomaly detection method in a mask-classification architecture. Mask2Anomaly includes several technical novelties that are designed to improve the detection of anomalies in masks: i) a global masked attention module to focus individually on the foreground and background regions; ii) a mask contrastive learning that maximizes the margin between an anomaly and known classes; and iii) a mask refinement solution to reduce false positives. Mask2Anomaly achieves new state-of-the-art results across a range of benchmarks, both in the per-pixel and component-level evaluations. In particular, Mask2Anomaly reduces the average false positives rate by 60% wrt the previous state-of-the-art. Github page: https://github.com/shyam671/Mask2Anomaly-Unmasking-Anomalies-in-Road-Scene-Segmentation.

Mitigating Cross-client GANs-based Attack in Federated Learning

  • paper_url: http://arxiv.org/abs/2307.13314
  • repo_url: None
  • paper_authors: Hong Huang, Xinyu Lei, Tao Xiang
  • for: improving the security of federated learning (FL) schemes by mitigating the cross-client generative adversarial networks (C-GANs) attack, which can reconstruct samples from other clients
  • methods: Federated Ensemble Data-free Knowledge Distillation (Fed-EDKD): each client submits a local model to the server to obtain an ensemble global model, and data-free knowledge distillation then transfers knowledge from the ensemble global model to a compressed model, reducing the adversary's control over the global model
  • results: experimental results demonstrate that Fed-EDKD significantly mitigates the C-GANs attack while incurring only a slight accuracy degradation of FL
    Abstract Machine learning makes multimedia data (e.g., images) more attractive, however, multimedia data is usually distributed and privacy sensitive. Multiple distributed multimedia clients can resort to federated learning (FL) to jointly learn a global shared model without requiring to share their private samples with any third-party entities. In this paper, we show that FL suffers from the cross-client generative adversarial networks (GANs)-based (C-GANs) attack, in which a malicious client (i.e., adversary) can reconstruct samples with the same distribution as the training samples from other clients (i.e., victims). Since a benign client's data can be leaked to the adversary, this attack brings the risk of local data leakage for clients in many security-critical FL applications. Thus, we propose Fed-EDKD (i.e., Federated Ensemble Data-free Knowledge Distillation) technique to improve the current popular FL schemes to resist C-GANs attack. In Fed-EDKD, each client submits a local model to the server for obtaining an ensemble global model. Then, to avoid model expansion, Fed-EDKD adopts data-free knowledge distillation techniques to transfer knowledge from the ensemble global model to a compressed model. By this way, Fed-EDKD reduces the adversary's control capability over the global model, so Fed-EDKD can effectively mitigate C-GANs attack. Finally, the experimental results demonstrate that Fed-EDKD significantly mitigates C-GANs attack while only incurring a slight accuracy degradation of FL.

CT-Net: Arbitrary-Shaped Text Detection via Contour Transformer

  • paper_url: http://arxiv.org/abs/2307.13310
  • repo_url: None
  • paper_authors: Zhiwen Shao, Yuchen Su, Yong Zhou, Fanrong Meng, Hancheng Zhu, Bing Liu, Rui Yao
  • for: a Contour Transformer-based framework for arbitrary-shaped scene text detection that improves both accuracy and efficiency
  • methods: a contour initialization module generates coarse text contours without any post-processing, and contour refinement modules adaptively refine them in an iterative manner, capturing context for progressive global contour deformation; an adaptive training strategy lets the contour transformers learn more potential deformation paths, and a re-score mechanism suppresses false positives
  • results: extensive experiments on four challenging datasets show accuracy and efficiency over state-of-the-art methods, e.g., F-measures of 86.1 at 11.2 FPS on CTW1500 and 87.8 at 10.1 FPS on Total-Text
    Abstract Contour based scene text detection methods have rapidly developed recently, but still suffer from inaccurate frontend contour initialization, multi-stage error accumulation, or deficient local information aggregation. To tackle these limitations, we propose a novel arbitrary-shaped scene text detection framework named CT-Net by progressive contour regression with contour transformers. Specifically, we first employ a contour initialization module that generates coarse text contours without any post-processing. Then, we adopt contour refinement modules to adaptively refine text contours in an iterative manner, which are beneficial for context information capturing and progressive global contour deformation. Besides, we propose an adaptive training strategy to enable the contour transformers to learn more potential deformation paths, and introduce a re-score mechanism that can effectively suppress false positives. Extensive experiments are conducted on four challenging datasets, which demonstrate the accuracy and efficiency of our CT-Net over state-of-the-art methods. Particularly, CT-Net achieves F-measure of 86.1 at 11.2 frames per second (FPS) and F-measure of 87.8 at 10.1 FPS for CTW1500 and Total-Text datasets, respectively.

Mini-PointNetPlus: a local feature descriptor in deep learning model for 3d environment perception

  • paper_url: http://arxiv.org/abs/2307.13300
  • repo_url: None
  • paper_authors: Chuanyu Luo, Nuo Cheng, Sikun Ma, Jun Xiang, Xiaohan Li, Shengguang Lei, Pu Li
  • for: improving the performance of deep learning models for 3D environment perception by proposing a novel local feature descriptor, mini-PointNetPlus
  • methods: point cloud data is converted into pillars/voxels and processed with a 2D/3D convolutional neural network (CNN); the proposed descriptor separately projects the data points to the individual features considered, yielding a permutation-invariant representation that fully utilizes the features (see the max-pooling sketch below)
  • results: a considerable performance improvement for 3D perception compared to the pioneering PointNet, as proven in experiments
    Abstract Common deep learning models for 3D environment perception often use pillarization/voxelization methods to convert point cloud data into pillars/voxels and then process it with a 2D/3D convolutional neural network (CNN). The pioneer work PointNet has been widely applied as a local feature descriptor, a fundamental component in deep learning models for 3D perception, to extract features of a point cloud. This is achieved by using a symmetric max-pooling operator which provides unique pillar/voxel features. However, by ignoring most of the points, the max-pooling operator causes an information loss, which reduces the model performance. To address this issue, we propose a novel local feature descriptor, mini-PointNetPlus, as an alternative for plug-and-play to PointNet. Our basic idea is to separately project the data points to the individual features considered, each leading to a permutation invariant. Thus, the proposed descriptor transforms an unordered point cloud to a stable order. The vanilla PointNet is proved to be a special case of our mini-PointNetPlus. Due to fully utilizing the features by the proposed descriptor, we demonstrate in experiment a considerable performance improvement for 3D perception.
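
The symmetric max-pooling the abstract discusses is easy to see in a toy PointNet-style descriptor; mini-PointNetPlus itself is not reproduced here, only the permutation-invariance property it builds on:

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Per-point MLP followed by symmetric max-pooling: the point order cannot
    matter, but non-maximal points are discarded (the information loss at issue)."""
    def __init__(self, in_dim=3, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, feat_dim))

    def forward(self, pts):                       # pts: (batch, n_points, 3)
        return self.mlp(pts).max(dim=1).values    # (batch, feat_dim)

pts = torch.randn(2, 128, 3)
net = TinyPointNet()
print(torch.allclose(net(pts), net(pts[:, torch.randperm(128)])))  # True
```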

High-Resolution Volumetric Reconstruction for Clothed Humans

  • paper_url: http://arxiv.org/abs/2307.13282
  • repo_url: None
  • paper_authors: Sicong Tang, Guangyuan Wang, Qing Ran, Lingzhi Li, Li Shen, Ping Tan
  • for: reconstructing clothed humans from a sparse set of RGB images
  • methods: a volumetric representation with 3D convolutions, a coarse-to-fine strategy with voxel culling and subspace sparse convolution, and image-based rendering that blends input images with learned weights for texture
  • results: reduces the mean point-to-surface (P2S) error of state-of-the-art methods by more than 50%, reaching approximately 2mm accuracy at a 512 volume resolution, with higher PSNR for images rendered from the textured model
    Abstract We present a novel method for reconstructing clothed humans from a sparse set of, e.g., 1 to 6 RGB images. Despite impressive results from recent works employing deep implicit representation, we revisit the volumetric approach and demonstrate that better performance can be achieved with proper system design. The volumetric representation offers significant advantages in leveraging 3D spatial context through 3D convolutions, and the notorious quantization error is largely negligible with a reasonably large yet affordable volume resolution, e.g., 512. To handle memory and computation costs, we propose a sophisticated coarse-to-fine strategy with voxel culling and subspace sparse convolution. Our method starts with a discretized visual hull to compute a coarse shape and then focuses on a narrow band nearby the coarse shape for refinement. Once the shape is reconstructed, we adopt an image-based rendering approach, which computes the colors of surface points by blending input images with learned weights. Extensive experimental results show that our method significantly reduces the mean point-to-surface (P2S) precision of state-of-the-art methods by more than 50% to achieve approximately 2mm accuracy with a 512 volume resolution. Additionally, images rendered from our textured model achieve a higher peak signal-to-noise ratio (PSNR) compared to state-of-the-art methods.

GaitFormer: Revisiting Intrinsic Periodicity for Gait Recognition

  • paper_url: http://arxiv.org/abs/2307.13259
  • repo_url: None
  • paper_authors: Qian Wu, Ruixuan Xiao, Kaixin Xu, Jingcheng Ni, Boxun Li, Ziyao Xu
  • for: improving gait recognition accuracy by analyzing video-level human silhouettes rather than appearance information, exploiting the intrinsic periodicity of gait
  • methods: a plug-and-play Temporal Periodic Alignment (TPA) strategy with two components: Adaptive Fourier-transform Position Encoding (AFPE), which adaptively converts features and discrete-time signals into embeddings sensitive to periodic walking patterns (sketched below), and a Temporal Aggregation Module (TAM), which separates embeddings into trend and seasonal components and extracts meaningful temporal correlations while filtering out random noise
  • results: a simple and effective baseline built on TPA achieves state-of-the-art performance on three popular public datasets (CASIA-B, OU-MVLP, and GREW)
    Abstract Gait recognition aims to distinguish different walking patterns by analyzing video-level human silhouettes, rather than relying on appearance information. Previous research on gait recognition has primarily focused on extracting local or global spatial-temporal representations, while overlooking the intrinsic periodic features of gait sequences, which, when fully utilized, can significantly enhance performance. In this work, we propose a plug-and-play strategy, called Temporal Periodic Alignment (TPA), which leverages the periodic nature and fine-grained temporal dependencies of gait patterns. The TPA strategy comprises two key components. The first component is Adaptive Fourier-transform Position Encoding (AFPE), which adaptively converts features and discrete-time signals into embeddings that are sensitive to periodic walking patterns. The second component is the Temporal Aggregation Module (TAM), which separates embeddings into trend and seasonal components, and extracts meaningful temporal correlations to identify primary components, while filtering out random noise. We present a simple and effective baseline method for gait recognition, based on the TPA strategy. Extensive experiments conducted on three popular public datasets (CASIA-B, OU-MVLP, and GREW) demonstrate that our proposed method achieves state-of-the-art performance on multiple benchmark tests.
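
A simplified take on the AFPE idea: sin/cos encodings of frame indices at gait-period frequencies. The real module adapts these frequencies from the data; the fixed candidate periods here are an assumption:

```python
import torch

def periodic_position_encoding(t, periods):
    """Sin/cos encodings at one frequency per assumed gait period, making the
    embedding sensitive to where a frame falls within a walking cycle."""
    t = t.float().unsqueeze(-1)                       # (seq_len, 1)
    freqs = 2 * torch.pi / torch.tensor(periods)      # (n_periods,)
    return torch.cat([torch.sin(t * freqs), torch.cos(t * freqs)], dim=-1)

enc = periodic_position_encoding(torch.arange(30), [10.0, 15.0, 20.0, 30.0])
print(enc.shape)   # torch.Size([30, 8]): 30 frames, 2 features per period
```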

Conditional Cross Attention Network for Multi-Space Embedding without Entanglement in Only a SINGLE Network

  • paper_url: http://arxiv.org/abs/2307.13254
  • repo_url: None
  • paper_authors: Chull Hwan Song, Taebaek Hwang, Jooyoung Yoon, Shunghyun Choi, Yeong Hyeon Gu
  • for: building an effective model that predicts multiple specific attributes of an object (e.g., shape, color, length) for real-world scenarios, rather than a single label
  • methods: a Conditional Cross-Attention Network that induces disentangled multi-space embeddings for various specific attributes with only a single backbone, using a cross-attention mechanism to fuse and switch the information of conditions, i.e., specific attributes (sketched below); the vision transformer is applied to fine-grained image retrieval for the first time
  • results: consistent state-of-the-art performance on the FashionAI, DARN, DeepFashion, and Zappos50K benchmark datasets, unlike previous methods whose performance varied across benchmarks
    Abstract Many studies in vision tasks have aimed to create effective embedding spaces for single-label object prediction within an image. However, in reality, most objects possess multiple specific attributes, such as shape, color, and length, with each attribute composed of various classes. To apply models in real-world scenarios, it is essential to be able to distinguish between the granular components of an object. Conventional approaches to embedding multiple specific attributes into a single network often result in entanglement, where fine-grained features of each attribute cannot be identified separately. To address this problem, we propose a Conditional Cross-Attention Network that induces disentangled multi-space embeddings for various specific attributes with only a single backbone. Firstly, we employ a cross-attention mechanism to fuse and switch the information of conditions (specific attributes), and we demonstrate its effectiveness through a diverse visualization example. Secondly, we leverage the vision transformer for the first time to a fine-grained image retrieval task and present a simple yet effective framework compared to existing methods. Unlike previous studies where performance varied depending on the benchmark dataset, our proposed method achieved consistent state-of-the-art performance on the FashionAI, DARN, DeepFashion, and Zappos50K benchmark datasets.
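
A minimal sketch of conditioning a single backbone's features on an attribute via cross-attention; the embedding dimension, head count, and condition set are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ConditionCrossAttention(nn.Module):
    """An attribute-condition embedding queries shared image patch features,
    yielding one disentangled embedding space per condition."""
    def __init__(self, dim=64, n_conditions=3):
        super().__init__()
        self.cond = nn.Embedding(n_conditions, dim)    # e.g. shape / color / length
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, patch_feats, cond_idx):          # patch_feats: (B, N, dim)
        q = self.cond(cond_idx).unsqueeze(1)           # (B, 1, dim) condition query
        out, _ = self.attn(q, patch_feats, patch_feats)
        return out.squeeze(1)                          # (B, dim) per-condition embedding

feats = torch.randn(2, 49, 64)                         # backbone patch tokens (toy)
model = ConditionCrossAttention()
color_emb = model(feats, torch.tensor([1, 1]))         # embeddings in the "color" space
```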

Keyword-Aware Relative Spatio-Temporal Graph Networks for Video Question Answering

  • paper_url: http://arxiv.org/abs/2307.13250
  • repo_url: None
  • paper_authors: Yi Cheng, Hehe Fan, Dongyun Lin, Ying Sun, Mohan Kankanhalli, Joo-Hwee Lim
  • for: capturing and reasoning over the complex spatial and temporal relations between objects in video question answering (VideoQA), and exploiting question keywords more effectively.
  • methods: a Keyword-aware Relative Spatio-Temporal (KRST) graph network: an attention mechanism makes question features keyword-aware during question encoding, the keyword-aware features then guide video graph construction, and relative relation modeling better captures the spatio-temporal dynamics among object nodes.
  • results: extensive experiments on the TGIF-QA, MSVD-QA, and MSRVTT-QA datasets demonstrate the superiority of KRST over multiple state-of-the-art methods.
    Abstract The main challenge in video question answering (VideoQA) is to capture and understand the complex spatial and temporal relations between objects based on given questions. Existing graph-based methods for VideoQA usually ignore keywords in questions and employ a simple graph to aggregate features without considering relative relations between objects, which may lead to inferior performance. In this paper, we propose a Keyword-aware Relative Spatio-Temporal (KRST) graph network for VideoQA. First, to make question features aware of keywords, we employ an attention mechanism to assign high weights to keywords during question encoding. The keyword-aware question features are then used to guide video graph construction. Second, because relations are relative, we integrate the relative relation modeling to better capture the spatio-temporal dynamics among object nodes. Moreover, we disentangle the spatio-temporal reasoning into an object-level spatial graph and a frame-level temporal graph, which reduces the impact of spatial and temporal relation reasoning on each other. Extensive experiments on the TGIF-QA, MSVD-QA and MSRVTT-QA datasets demonstrate the superiority of our KRST over multiple state-of-the-art methods.
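
A toy sketch of keyword-aware question encoding under stated assumptions: tokens flagged as keywords get their attention scores boosted before pooling, so they dominate the question representation that guides graph construction. The fixed additive boost is an illustrative stand-in for the learned attention in the paper.

```python
import torch

def keyword_aware_pool(token_feats: torch.Tensor,
                       keyword_mask: torch.Tensor,
                       boost: float = 2.0) -> torch.Tensor:
    """token_feats: (B, L, D); keyword_mask: (B, L) with 1 at keyword tokens."""
    # Base relevance score per token (dot product with the mean context).
    context = token_feats.mean(dim=1, keepdim=True)            # (B, 1, D)
    scores = (token_feats * context).sum(-1)                   # (B, L)
    scores = scores + boost * keyword_mask                     # up-weight keywords
    weights = torch.softmax(scores, dim=-1).unsqueeze(-1)      # (B, L, 1)
    return (weights * token_feats).sum(dim=1)                  # (B, D)

if __name__ == "__main__":
    feats = torch.randn(2, 10, 64)
    mask = torch.zeros(2, 10)
    mask[:, 3] = 1.0  # suppose token 3 is the keyword ("cat", "jump", ...)
    q = keyword_aware_pool(feats, mask)
    print(q.shape)    # (2, 64), used to guide video graph construction
```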

Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition

  • paper_url: http://arxiv.org/abs/2307.13244
  • repo_url: https://github.com/alibabaresearch/advancedliteratemachinery
  • paper_authors: Cheng Da, Peng Wang, Cong Yao
  • for: scene text recognition (STR), an active research topic in computer vision; the authors tackle this challenging problem by incorporating linguistic knowledge into the model.
  • methods: a vision STR model built upon the Vision Transformer (ViT) and a tailored Adaptive Addressing and Aggregation (A$^3$) module, plus a Multi-Granularity Prediction strategy that injects information from the language modality by adding subword representations (BPE and WordPiece) to the output space alongside the conventional character-level representation.
  • results: the proposed MGP-STR algorithm achieves an average recognition accuracy of 94% on standard scene text recognition benchmarks, as well as state-of-the-art results on widely used handwritten benchmarks and more challenging scene text datasets.
    Abstract Due to the enormous technical challenges and wide range of applications, scene text recognition (STR) has been an active research topic in computer vision for years. To tackle this tough problem, numerous innovative methods have been successively proposed, and incorporating linguistic knowledge into STR models has recently become a prominent trend. In this work, we first draw inspiration from the recent progress in Vision Transformer (ViT) to construct a conceptually simple yet functionally powerful vision STR model, which is built upon ViT and a tailored Adaptive Addressing and Aggregation (A$^3$) module. It already outperforms most previous state-of-the-art models for scene text recognition, including both pure vision models and language-augmented methods. To integrate linguistic knowledge, we further propose a Multi-Granularity Prediction strategy to inject information from the language modality into the model in an implicit way, \ie, subword representations (BPE and WordPiece) widely used in NLP are introduced into the output space, in addition to the conventional character level representation, while no independent language model (LM) is adopted. To produce the final recognition results, two strategies for effectively fusing the multi-granularity predictions are devised. The resultant algorithm (termed MGP-STR) is able to push the performance envelope of STR to an even higher level. Specifically, MGP-STR achieves an average recognition accuracy of $94\%$ on standard benchmarks for scene text recognition. Moreover, it also achieves state-of-the-art results on widely-used handwritten benchmarks as well as more challenging scene text datasets, demonstrating the generality of the proposed MGP-STR algorithm. The source code and models will be available at: \url{https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/OCR/MGP-STR}.
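
A deliberately simple sketch of one plausible fusion rule for the character-, BPE-, and WordPiece-level outputs: pick the decoding from the most confident head. MGP-STR's actual fusion strategies are more elaborate; this only illustrates the multi-granularity idea.

```python
from typing import List, Tuple

def fuse_multi_granularity(candidates: List[Tuple[str, float]]) -> str:
    """candidates: (decoded_text, confidence) pairs, one per prediction head."""
    best_text, _ = max(candidates, key=lambda c: c[1])
    return best_text

if __name__ == "__main__":
    char_pred = ("coffee", 0.87)      # character-level head
    bpe_pred = ("coffee", 0.93)       # BPE subword head
    wordpiece_pred = ("coffe", 0.61)  # WordPiece head (here, a miss)
    print(fuse_multi_granularity([char_pred, bpe_pred, wordpiece_pred]))
    # -> "coffee", taken from the most confident head
```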

Fashion Matrix: Editing Photos by Just Talking

  • paper_url: http://arxiv.org/abs/2307.13240
  • repo_url: https://github.com/zheng-chong/fashionmatric
  • paper_authors: Zheng Chong, Xujie Zhang, Fuwei Zhao, Zhenyu Xie, Xiaodan Liang
  • for: exploring how Large Language Models (LLMs) can power intelligent systems for image editing in the fashion domain.
  • methods: a hierarchical AI system named Fashion Matrix that edits images via verbal instructions: an LLM serves as the foundational support and engages in iterative interactions with the user; Semantic Segmentation Models (e.g., Grounded-SAM, MattingAnything) delineate the specific editing masks from the instructions, and Visual Foundation Models (e.g., Stable Diffusion, ControlNet) generate the edited images from text prompts and masks.
  • results: experiments demonstrate that Fashion Matrix effectively exploits the collaborative potential of functionally diverse pre-trained models for fashion editing.
    Abstract The utilization of Large Language Models (LLMs) for the construction of AI systems has garnered significant attention across diverse fields. The extension of LLMs to the domain of fashion holds substantial commercial potential but also inherent challenges due to the intricate semantic interactions in fashion-related generation. To address this issue, we developed a hierarchical AI system called Fashion Matrix dedicated to editing photos by just talking. This system facilitates diverse prompt-driven tasks, encompassing garment or accessory replacement, recoloring, addition, and removal. Specifically, Fashion Matrix employs LLM as its foundational support and engages in iterative interactions with users. It employs a range of Semantic Segmentation Models (e.g., Grounded-SAM, MattingAnything, etc.) to delineate the specific editing masks based on user instructions. Subsequently, Visual Foundation Models (e.g., Stable Diffusion, ControlNet, etc.) are leveraged to generate edited images from text prompts and masks, thereby facilitating the automation of fashion editing processes. Experiments demonstrate the outstanding ability of Fashion Matrix to explore the collaborative potential of functionally diverse pre-trained models in the domain of fashion editing.
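
An illustrative orchestration of the hierarchical pipeline described above. Every function is a hypothetical stand-in (LLM instruction parsing, Grounded-SAM masking, Stable Diffusion inpainting); none of these names come from the paper's code.

```python
from dataclasses import dataclass

@dataclass
class EditPlan:
    target_region: str   # e.g. "jacket"
    operation: str       # replace / recolor / add / remove
    prompt: str          # text prompt for the generative model

def parse_instruction_with_llm(instruction: str) -> EditPlan:
    """Hypothetical stand-in for the LLM that turns a free-form user request
    into a structured edit plan (the system iterates with the user)."""
    return EditPlan(target_region="jacket", operation="replace",
                    prompt="a red leather jacket")

def segment_region(image, region_name: str):
    """Hypothetical stand-in for a segmentation model (e.g. Grounded-SAM)
    returning a binary mask for the named garment or accessory."""
    raise NotImplementedError

def inpaint(image, mask, prompt: str):
    """Hypothetical stand-in for a visual foundation model (e.g. Stable
    Diffusion inpainting) regenerating the masked region from the prompt."""
    raise NotImplementedError

def edit_photo(image, instruction: str):
    plan = parse_instruction_with_llm(instruction)
    mask = segment_region(image, plan.target_region)
    return inpaint(image, mask, plan.prompt)
```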

Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation

  • paper_url: http://arxiv.org/abs/2307.13236
  • repo_url: None
  • paper_authors: Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya Zhang
  • for: Audio-visual segmentation (AVS) task, specifically to segment sounding objects in video frames using audio cues.
  • methods: Introduces a multimodal transformer architecture that enables deep fusion and aggregation of audio-visual features, as well as an audio-aware query-enhanced transformer decoder that explicitly focuses on the segmentation of pinpointed sounding objects based on audio signals.
  • results: Outperforms previous methods and demonstrates better generalization ability in multi-sound and open-set scenarios.
    Abstract The goal of the audio-visual segmentation (AVS) task is to segment the sounding objects in the video frames using audio cues. However, current fusion-based methods have the performance limitations due to the small receptive field of convolution and inadequate fusion of audio-visual features. To overcome these issues, we propose a novel \textbf{Au}dio-aware query-enhanced \textbf{TR}ansformer (AuTR) to tackle the task. Unlike existing methods, our approach introduces a multimodal transformer architecture that enables deep fusion and aggregation of audio-visual features. Furthermore, we devise an audio-aware query-enhanced transformer decoder that explicitly helps the model focus on the segmentation of the pinpointed sounding objects based on audio signals, while disregarding silent yet salient objects. Experimental results show that our method outperforms previous methods and demonstrates better generalization ability in multi-sound and open-set scenarios.

Strivec: Sparse Tri-Vector Radiance Fields

  • paper_url: http://arxiv.org/abs/2307.13226
  • repo_url: https://github.com/zerg-overmind/strivec
  • paper_authors: Quankai Gao, Qiangeng Xu, Hao Su, Ulrich Neumann, Zexiang Xu
  • for: a new neural representation that models a 3D scene as a radiance field built on sparsely distributed, compactly factorized local tensor feature grids.
  • methods: builds upon the recent TensoRF work on tensor decomposition, but uses a cloud of local tensors and the classic CANDECOMP/PARAFAC (CP) decomposition to factorize each tensor into three vectors that express local feature distributions along the spatial axes and compactly encode a local neural field.
  • results: better rendering quality with significantly fewer parameters than previous methods, including TensoRF and Instant-NGP.
    Abstract We propose Strivec, a novel neural representation that models a 3D scene as a radiance field with sparsely distributed and compactly factorized local tensor feature grids. Our approach leverages tensor decomposition, following the recent work TensoRF, to model the tensor grids. In contrast to TensoRF which uses a global tensor and focuses on their vector-matrix decomposition, we propose to utilize a cloud of local tensors and apply the classic CANDECOMP/PARAFAC (CP) decomposition to factorize each tensor into triple vectors that express local feature distributions along spatial axes and compactly encode a local neural field. We also apply multi-scale tensor grids to discover the geometry and appearance commonalities and exploit spatial coherence with the tri-vector factorization at multiple local scales. The final radiance field properties are regressed by aggregating neural features from multiple local tensors across all scales. Our tri-vector tensors are sparsely distributed around the actual scene surface, discovered by a fast coarse reconstruction, leveraging the sparsity of a 3D scene. We demonstrate that our model can achieve better rendering quality while using significantly fewer parameters than previous methods, including TensoRF and Instant-NGP.
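
A small numpy illustration of the CP (CANDECOMP/PARAFAC) idea behind the tri-vector factorization: a rank-R local tensor is stored as three per-axis vector sets and reconstructed as a sum of outer products, which is where the parameter savings over a dense grid come from. Sizes and rank below are illustrative.

```python
import numpy as np

def cp_reconstruct(u: np.ndarray, v: np.ndarray, w: np.ndarray) -> np.ndarray:
    """u, v, w: (R, X), (R, Y), (R, Z) tri-vectors of a rank-R tensor.
    Returns the dense (X, Y, Z) tensor sum_r u_r (outer) v_r (outer) w_r."""
    return np.einsum("rx,ry,rz->xyz", u, v, w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = Y = Z = 16
    R = 4  # rank of the local tensor
    u = rng.normal(size=(R, X))
    v = rng.normal(size=(R, Y))
    w = rng.normal(size=(R, Z))
    dense = cp_reconstruct(u, v, w)
    print(dense.shape)                  # (16, 16, 16)
    n_dense = X * Y * Z                 # 4096 values for a dense local grid
    n_cp = R * (X + Y + Z)              # 192 values for the tri-vectors
    print(f"compression: {n_dense / n_cp:.1f}x")
```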

Image Segmentation Keras : Implementation of Segnet, FCN, UNet, PSPNet and other models in Keras

  • paper_url: http://arxiv.org/abs/2307.13215
  • repo_url: https://github.com/divamgupta/image-segmentation-keras
  • paper_authors: Divam Gupta
  • for: a comprehensive semantic segmentation library containing implementations of popular models such as SegNet, FCN, UNet, and PSPNet.
  • methods: implementations of multiple segmentation models, evaluated and compared to give researchers and practitioners a powerful toolset for diverse segmentation challenges.
  • results: evaluations and comparisons on several datasets, offering reference results that help practitioners choose an appropriate segmentation model.
    Abstract Semantic segmentation plays a vital role in computer vision tasks, enabling precise pixel-level understanding of images. In this paper, we present a comprehensive library for semantic segmentation, which contains implementations of popular segmentation models like SegNet, FCN, UNet, and PSPNet. We also evaluate and compare these models on several datasets, offering researchers and practitioners a powerful toolset for tackling diverse segmentation challenges.
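
A usage sketch assumed from the repository's README; the paths, class count, and model choice below are placeholders.

```python
from keras_segmentation.models.unet import vgg_unet

# Build a UNet with a VGG encoder for a 51-class segmentation problem.
model = vgg_unet(n_classes=51, input_height=416, input_width=608)

# Train from folders of images and per-pixel annotation maps.
model.train(
    train_images="dataset1/images_prepped_train/",
    train_annotations="dataset1/annotations_prepped_train/",
    checkpoints_path="/tmp/vgg_unet_1",
    epochs=5,
)

# Predict a segmentation map for a single test image.
out = model.predict_segmentation(
    inp="dataset1/images_prepped_test/0016E5_07965.png",
    out_fname="/tmp/out.png",
)
```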

GeoTransformer: Fast and Robust Point Cloud Registration with Geometric Transformer

  • paper_url: http://arxiv.org/abs/2308.03768
  • repo_url: https://github.com/qinzheng93/geotransformer
  • paper_authors: Zheng Qin, Hao Yu, Changjian Wang, Yulan Guo, Yuxing Peng, Slobodan Ilic, Dewen Hu, Kai Xu
  • for: point cloud registration, specifically accurate correspondence extraction without relying on keypoint detection.
  • methods: the Geometric Transformer (GeoTransformer) learns geometric features for superpoint matching by encoding pair-wise distances and triplet-wise angles, making matching invariant to rigid transformation and robust in low-overlap scenarios.
  • results: correspondences accurate enough that no RANSAC is needed when estimating the alignment transformation; on the challenging 3DLoMatch benchmark, the inlier ratio improves by 18-31 percentage points and registration recall by over 7 points.
    Abstract We study the problem of extracting accurate correspondences for point cloud registration. Recent keypoint-free methods have shown great potential through bypassing the detection of repeatable keypoints which is difficult to do especially in low-overlap scenarios. They seek correspondences over downsampled superpoints, which are then propagated to dense points. Superpoints are matched based on whether their neighboring patches overlap. Such sparse and loose matching requires contextual features capturing the geometric structure of the point clouds. We propose Geometric Transformer, or GeoTransformer for short, to learn geometric feature for robust superpoint matching. It encodes pair-wise distances and triplet-wise angles, making it invariant to rigid transformation and robust in low-overlap cases. The simplistic design attains surprisingly high matching accuracy such that no RANSAC is required in the estimation of alignment transformation, leading to $100$ times acceleration. Extensive experiments on rich benchmarks encompassing indoor, outdoor, synthetic, multiway and non-rigid demonstrate the efficacy of GeoTransformer. Notably, our method improves the inlier ratio by $18{\sim}31$ percentage points and the registration recall by over $7$ points on the challenging 3DLoMatch benchmark. Our code and models are available at \url{https://github.com/qinzheng93/GeoTransformer}.
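
A compact sketch of the rigid-invariant cues named in the abstract: pair-wise distances between superpoints and, for each pair, the angle formed with a third point (here, the nearest neighbor). GeoTransformer discretizes and embeds these cues; this snippet only computes the raw quantities and checks their invariance.

```python
import numpy as np

def pairwise_distances(pts: np.ndarray) -> np.ndarray:
    """pts: (N, 3) superpoint coordinates -> (N, N) rigid-invariant distances."""
    diff = pts[:, None, :] - pts[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def triplet_angles(pts: np.ndarray) -> np.ndarray:
    """Angle at point i between the direction to j and the direction to i's
    nearest neighbor; also rigid-invariant. Returns an (N, N) matrix."""
    d = pairwise_distances(pts)
    np.fill_diagonal(d, np.inf)
    nn = np.argmin(d, axis=1)                      # nearest neighbor of each i
    to_nn = pts[nn] - pts                          # (N, 3)
    to_j = pts[None, :, :] - pts[:, None, :]       # (N, N, 3)
    eps = 1e-8
    cos = np.einsum("ijk,ik->ij", to_j, to_nn)
    cos /= np.linalg.norm(to_j, axis=-1) * np.linalg.norm(to_nn, axis=-1)[:, None] + eps
    return np.arccos(np.clip(cos, -1.0, 1.0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(32, 3))
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal transform
    moved = pts @ q.T + np.array([1.0, -2.0, 0.5])
    print(np.allclose(pairwise_distances(pts), pairwise_distances(moved)))  # True
    print(np.allclose(triplet_angles(pts), triplet_angles(moved)))          # True
```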

An Investigation into Glomeruli Detection in Kidney H&E and PAS Images using YOLO

  • paper_url: http://arxiv.org/abs/2307.13199
  • repo_url: https://github.com/AlexeyAB/darknet
  • paper_authors: Kimia Hemmatirad, Morteza Babaie, Jeffrey Hodgin, Liron Pantanowitz, H. R. Tizhoosh
  • for: assisting pathologists in detecting glomeruli in human kidney images with computerized solutions, specifically the YOLO-v4 object detector.
  • methods: YOLO-v4 trained on whole slide images, fine-tuned with a private dataset from the University of Michigan, and tested on that dataset across two different stains (H&E and PAS).
  • results: high average specificity and sensitivity across all experiments; compared against existing segmentation methods on the same datasets, the YOLO-v4 model outperforms them.
    Abstract Context: Analyzing digital pathology images is necessary to draw diagnostic conclusions by investigating tissue patterns and cellular morphology. However, manual evaluation can be time-consuming, expensive, and prone to inter- and intra-observer variability. Objective: To assist pathologists using computerized solutions, automated tissue structure detection and segmentation must be proposed. Furthermore, generating pixel-level object annotations for histopathology images is expensive and time-consuming. As a result, detection models with bounding box labels may be a feasible solution. Design: This paper studies YOLO-v4 (You-Only-Look-Once), a real-time object detector for microscopic images. YOLO uses a single neural network to predict several bounding boxes and class probabilities for objects of interest. YOLO can enhance detection performance by training on whole slide images. YOLO-v4 has been used in this paper for glomeruli detection in human kidney images. Multiple experiments have been designed and conducted based on different training data of two public datasets and a private dataset from the University of Michigan for fine-tuning the model. The model was tested on the private dataset from the University of Michigan, serving as an external validation of two different stains, namely hematoxylin and eosin (H&E) and periodic acid-Schiff (PAS). Results: Average specificity and sensitivity for all experiments, and comparison of existing segmentation methods on the same datasets are discussed. Conclusions: Automated glomeruli detection in human kidney images is possible using modern AI models. The design and validation for different stains still depend on the variability of public multi-stain datasets.

Does Progress On Object Recognition Benchmarks Improve Real-World Generalization?

  • paper_url: http://arxiv.org/abs/2307.13136
  • repo_url: None
  • paper_authors: Megan Richards, Polina Kirichenko, Diane Bouchacourt, Mark Ibrahim
  • for: assessing whether progress driven by training foundation models on ever larger data improves real-world generalization beyond ImageNet-style benchmarks.
  • methods: two datasets of objects from households across the globe, with an extensive empirical evaluation spanning nearly 100 vision models.
  • results: standard training leaves large performance gaps across regions, and even foundation CLIP models show substantial geographic disparities; simply retraining the last layer on more representative, curated data reduces these disparities.
    Abstract For more than a decade, researchers have measured progress in object recognition on ImageNet-based generalization benchmarks such as ImageNet-A, -C, and -R. Recent advances in foundation models, trained on orders of magnitude more data, have begun to saturate these standard benchmarks, but remain brittle in practice. This suggests standard benchmarks, which tend to focus on predefined or synthetic changes, may not be sufficient for measuring real world generalization. Consequently, we propose studying generalization across geography as a more realistic measure of progress using two datasets of objects from households across the globe. We conduct an extensive empirical evaluation of progress across nearly 100 vision models up to most recent foundation models. We first identify a progress gap between standard benchmarks and real-world, geographical shifts: progress on ImageNet results in up to 2.5x more progress on standard generalization benchmarks than real-world distribution shifts. Second, we study model generalization across geographies by measuring the disparities in performance across regions, a more fine-grained measure of real world generalization. We observe all models have large geographic disparities, even foundation CLIP models, with differences of 7-20% in accuracy between regions. Counter to modern intuition, we discover progress on standard benchmarks fails to improve geographic disparities and often exacerbates them: geographic disparities between the least performant models and today's best models have more than tripled. Our results suggest scaling alone is insufficient for consistent robustness to real-world distribution shifts. Finally, we highlight in early experiments how simple last layer retraining on more representative, curated data can complement scaling as a promising direction of future work, reducing geographic disparity on both benchmarks by over two-thirds.
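
A minimal sketch of the last-layer retraining experiment mentioned above: freeze a pre-trained backbone and refit only the classification head on more representative, curated data. The backbone choice and training-loop details are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V2")
for p in model.parameters():
    p.requires_grad = False                       # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 1000)  # fresh, trainable last layer

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def retrain_last_layer(loader, epochs: int = 5):
    """loader should yield (image_batch, label_batch) drawn from
    geographically representative data (placeholder)."""
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
```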

simPLE: a visuotactile method learned in simulation to precisely pick, localize, regrasp, and place objects

  • paper_url: http://arxiv.org/abs/2307.13133
  • repo_url: None
  • paper_authors: Maria Bauza, Antonia Bronars, Yifan Hou, Ian Taylor, Nikhil Chavan-Dafle, Alberto Rodriguez
  • for: solving precise and general robotic pick-and-place.
  • methods: simPLE (simulation to Pick Localize and PLacE), which learns task-aware grasping, visuotactile perception, and regrasp planning in simulation, given only the object CAD model and no prior experience, so the robot can precisely pick, regrasp, and place objects of many shapes.
  • results: on a dual-arm robot with visuotactile sensing, simPLE places 15 diverse objects into structured arrangements with 1 mm clearance, succeeding over 90% of the time for 6 objects and over 80% of the time for 11 objects.
    Abstract Existing robotic systems have a clear tension between generality and precision. Deployed solutions for robotic manipulation tend to fall into the paradigm of one robot solving a single task, lacking precise generalization, i.e., the ability to solve many tasks without compromising on precision. This paper explores solutions for precise and general pick-and-place. In precise pick-and-place, i.e. kitting, the robot transforms an unstructured arrangement of objects into an organized arrangement, which can facilitate further manipulation. We propose simPLE (simulation to Pick Localize and PLacE) as a solution to precise pick-and-place. simPLE learns to pick, regrasp and place objects precisely, given only the object CAD model and no prior experience. We develop three main components: task-aware grasping, visuotactile perception, and regrasp planning. Task-aware grasping computes affordances of grasps that are stable, observable, and favorable to placing. The visuotactile perception model relies on matching real observations against a set of simulated ones through supervised learning. Finally, we compute the desired robot motion by solving a shortest path problem on a graph of hand-to-hand regrasps. On a dual-arm robot equipped with visuotactile sensing, we demonstrate pick-and-place of 15 diverse objects with simPLE. The objects span a wide range of shapes and simPLE achieves successful placements into structured arrangements with 1mm clearance over 90% of the time for 6 objects, and over 80% of the time for 11 objects. Videos are available at http://mcube.mit.edu/research/simPLE.html .
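
An illustrative sketch of the regrasp-planning step described in the abstract: grasps become graph nodes, feasible hand-to-hand regrasps become weighted edges, and the plan is a shortest path from the pick grasp to a grasp that allows the desired placement. The graph contents here are made up.

```python
import networkx as nx

g = nx.Graph()
# Nodes: grasp identifiers; edges: feasible regrasps with a motion cost.
g.add_edge("pick_grasp", "left_hand_A", weight=1.0)
g.add_edge("pick_grasp", "left_hand_B", weight=2.5)
g.add_edge("left_hand_A", "right_hand_C", weight=1.2)
g.add_edge("left_hand_B", "place_grasp", weight=0.8)
g.add_edge("right_hand_C", "place_grasp", weight=0.6)

plan = nx.shortest_path(g, "pick_grasp", "place_grasp", weight="weight")
print(plan)  # e.g. ['pick_grasp', 'left_hand_A', 'right_hand_C', 'place_grasp']
```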

Deep Learning Approaches for Data Augmentation in Medical Imaging: A Review

  • paper_url: http://arxiv.org/abs/2307.13125
  • repo_url: https://github.com/Arminsbss/tumor-classification
  • paper_authors: Aghiles Kebaili, Jérôme Lapuyade-Lahorgue, Su Ruan
  • for: how deep generative models can augment medical image analysis, where training data are limited and acquisition can be costly and constrained by privacy regulations.
  • methods: a review of three families of deep generative models (Variational Autoencoders, Generative Adversarial Networks, and Diffusion Models) that can generate more realistic and diverse data conforming to the true data distribution, thereby improving deep learning performance in medical imaging.
  • results: an assessment of these models across downstream tasks, including classification, segmentation, and cross-modal translation, evaluating their strengths and limitations and suggesting directions for future research.
    Abstract Deep learning has become a popular tool for medical image analysis, but the limited availability of training data remains a major challenge, particularly in the medical field where data acquisition can be costly and subject to privacy regulations. Data augmentation techniques offer a solution by artificially increasing the number of training samples, but these techniques often produce limited and unconvincing results. To address this issue, a growing number of studies have proposed the use of deep generative models to generate more realistic and diverse data that conform to the true distribution of the data. In this review, we focus on three types of deep generative models for medical image augmentation: variational autoencoders, generative adversarial networks, and diffusion models. We provide an overview of the current state of the art in each of these models and discuss their potential for use in different downstream tasks in medical imaging, including classification, segmentation, and cross-modal translation. We also evaluate the strengths and limitations of each model and suggest directions for future research in this field. Our goal is to provide a comprehensive review about the use of deep generative models for medical image augmentation and to highlight the potential of these models for improving the performance of deep learning algorithms in medical image analysis.

Automatic Infant Respiration Estimation from Video: A Deep Flow-based Algorithm and a Novel Public Benchmark

  • paper_url: http://arxiv.org/abs/2307.13110
  • repo_url: https://github.com/ostadabbas/infant-respiration-estimation
  • paper_authors: Sai Kumar Reddy Manne, Shaotong Zhu, Sarah Ostadabbas, Michael Wan
  • for: developing a deep-learning method that estimates respiratory rate and waveform from plain video footage in natural settings, toward fully automatic, continuous, and contactless respiratory monitoring for infants.
  • methods: AIRFlowNet, which combines video-extracted optical flow input with spatiotemporal convolutional processing tuned to the infant domain, trained with a novel spectral bandpass loss function on a new public annotated infant respiration dataset (AIR-125) of 125 videos from eight infant subjects.
  • results: AIRFlowNet significantly outperforms other state-of-the-art methods in respiratory rate estimation, achieving a mean absolute error of about 2.9 breaths per minute.
    Abstract Respiration is a critical vital sign for infants, and continuous respiratory monitoring is particularly important for newborns. However, neonates are sensitive and contact-based sensors present challenges in comfort, hygiene, and skin health, especially for preterm babies. As a step toward fully automatic, continuous, and contactless respiratory monitoring, we develop a deep-learning method for estimating respiratory rate and waveform from plain video footage in natural settings. Our automated infant respiration flow-based network (AIRFlowNet) combines video-extracted optical flow input and spatiotemporal convolutional processing tuned to the infant domain. We support our model with the first public annotated infant respiration dataset with 125 videos (AIR-125), drawn from eight infant subjects, set varied pose, lighting, and camera conditions. We include manual respiration annotations and optimize AIRFlowNet training on them using a novel spectral bandpass loss function. When trained and tested on the AIR-125 infant data, our method significantly outperforms other state-of-the-art methods in respiratory rate estimation, achieving a mean absolute error of $\sim$2.9 breaths per minute, compared to $\sim$4.7--6.2 for other public models designed for adult subjects and more uniform environments.
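
A hedged sketch of a spectral bandpass loss in the spirit described above: penalize spectral energy of the predicted waveform outside a plausible infant breathing band. The band limits and normalization are assumptions; the paper's exact loss may differ.

```python
import torch

def spectral_bandpass_loss(waveform: torch.Tensor, fps: float,
                           band=(0.4, 1.5)) -> torch.Tensor:
    """waveform: (B, T) predicted respiration signal sampled at `fps` Hz.
    band: plausible breathing frequencies in Hz (~24-90 breaths/min)."""
    spec = torch.fft.rfft(waveform, dim=-1).abs()             # (B, T//2+1)
    freqs = torch.fft.rfftfreq(waveform.shape[-1], d=1.0 / fps)
    outside = (freqs < band[0]) | (freqs > band[1])           # off-band bins
    return (spec[:, outside] ** 2).sum() / (spec ** 2).sum().clamp_min(1e-8)

if __name__ == "__main__":
    t = torch.arange(300) / 30.0                          # 10 s of video at 30 fps
    clean = torch.sin(2 * torch.pi * 0.8 * t)[None]       # 48 breaths/min
    noisy = clean + 0.5 * torch.sin(2 * torch.pi * 6.0 * t)[None]
    print(spectral_bandpass_loss(clean, 30.0))            # near zero
    print(spectral_bandpass_loss(noisy, 30.0))            # clearly larger
```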

General-Purpose Multi-Modal OOD Detection Framework

  • paper_url: http://arxiv.org/abs/2307.13069
  • repo_url: None
  • paper_authors: Viet Duong, Qiong Wu, Zhengyi Zhou, Eric Zavesky, Jiahe Chen, Xiangzhou Liu, Wen-Ling Hsu, Huajie Shao
  • for: simultaneously detecting multiple, distinct OOD scenarios in a fine-grained manner, to improve the safety and reliability of ML systems.
  • methods: a general-purpose weakly-supervised OOD detection framework, WOOD, that combines a binary classifier with a contrastive learning component to reap the benefits of both; a Hinge loss constrains the similarity between ID and OOD samples.
  • results: on multiple real-world datasets, the proposed WOOD model detects OOD samples more accurately than state-of-the-art methods; notably, it maintains high accuracy across three different OOD scenarios simultaneously.
    Abstract Out-of-distribution (OOD) detection identifies test samples that differ from the training data, which is critical to ensuring the safety and reliability of machine learning (ML) systems. While a plethora of methods have been developed to detect uni-modal OOD samples, only a few have focused on multi-modal OOD detection. Current contrastive learning-based methods primarily study multi-modal OOD detection in a scenario where both a given image and its corresponding textual description come from a new domain. However, real-world deployments of ML systems may face more anomaly scenarios caused by multiple factors like sensor faults, bad weather, and environmental changes. Hence, the goal of this work is to simultaneously detect from multiple different OOD scenarios in a fine-grained manner. To reach this goal, we propose a general-purpose weakly-supervised OOD detection framework, called WOOD, that combines a binary classifier and a contrastive learning component to reap the benefits of both. In order to better distinguish the latent representations of in-distribution (ID) and OOD samples, we adopt the Hinge loss to constrain their similarity. Furthermore, we develop a new scoring metric to integrate the prediction results from both the binary classifier and contrastive learning for identifying OOD samples. We evaluate the proposed WOOD model on multiple real-world datasets, and the experimental results demonstrate that the WOOD model outperforms the state-of-the-art methods for multi-modal OOD detection. Importantly, our approach is able to achieve high accuracy in OOD detection in three different OOD scenarios simultaneously. The source code will be made publicly available upon publication.
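
A sketch of the two ingredients named above with assumed details: a hinge loss that pushes ID and OOD embeddings apart, and a score blending the binary classifier's output with a contrastive similarity. The blending rule and margin are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def hinge_separation_loss(id_emb: torch.Tensor, ood_emb: torch.Tensor,
                          margin: float = 0.5) -> torch.Tensor:
    """Penalize ID/OOD cosine similarity above -margin, pushing pairs apart."""
    sim = F.cosine_similarity(id_emb[:, None, :], ood_emb[None, :, :], dim=-1)
    return F.relu(sim + margin).mean()

def ood_score(classifier_prob_ood: torch.Tensor,
              sim_to_id_prototype: torch.Tensor,
              alpha: float = 0.5) -> torch.Tensor:
    """Higher = more likely OOD: blend classifier output with (1 - similarity)."""
    return alpha * classifier_prob_ood + (1 - alpha) * (1 - sim_to_id_prototype)

if __name__ == "__main__":
    id_emb, ood_emb = torch.randn(8, 128), torch.randn(8, 128)
    print(hinge_separation_loss(id_emb, ood_emb))
    print(ood_score(torch.tensor([0.9, 0.1]), torch.tensor([0.2, 0.95])))
```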

On the characteristics of natural hydraulic dampers: An image-based approach to study the fluid flow behaviour inside the human meniscal tissue

  • paper_url: http://arxiv.org/abs/2307.13060
  • repo_url: None
  • paper_authors: J. Waghorne, F. P. Bonomo, A. Rabbani, D. Bell, O. Barrera
  • for: understanding fluid flow behaviour inside the human meniscal tissue and its relation to structure, to inform disease management, treatment development, and biomaterial design.
  • methods: a novel approach combining Computational Fluid Dynamics with Image Analysis (CFD-IA), analyzing fluid flow through the internal architecture of the human meniscus reconstructed from high-resolution 3D micro-computed tomography scans.
  • results: statistically significant correlations between structural parameters (tortuosity, connectivity, porosity, pore size) and fluid flow parameters; some channels reach Re values of 1400 at an inlet velocity of 1.6 m/s, a transition from Darcy's regime to a non-Darcian regime occurs around an inlet velocity of 0.02 m/s, and location-dependent permeability ranges from 20-32 Darcy; fluid velocity correlates strongly with tortuosity at high inlet velocities and with channel diameter at low inlet velocities.
    Abstract The meniscal tissue is a layered material with varying properties influenced by collagen content and arrangement. Understanding the relationship between structure and properties is crucial for disease management, treatment development, and biomaterial design. The internal layer of the meniscus is softer and more deformable than the outer layers, thanks to interconnected collagen channels that guide fluid flow. To investigate these relationships, we propose a novel approach that combines Computational Fluid Dynamics (CFD) with Image Analysis (CFD-IA). We analyze fluid flow in the internal architecture of the human meniscus across a range of inlet velocities (0.1mm/s to 1.6m/s) using high-resolution 3D micro-computed tomography scans. Statistical correlations are observed between architectural parameters (tortuosity, connectivity, porosity, pore size) and fluid flow parameters (Re number distribution, permeability). Some channels exhibit Re values of 1400 at an inlet velocity of 1.6m/s, and a transition from Darcy's regime to a non-Darcian regime occurs around an inlet velocity of 0.02m/s. Location-dependent permeability ranges from 20-32 Darcy. Regression modelling reveals a strong correlation between fluid velocity and tortuosity at high inlet velocities, as well as with channel diameter at low inlet velocities. At higher inlet velocities, flow paths deviate more from the preferential direction, resulting in a decrease in the concentration parameter by an average of 0.4. This research provides valuable insights into the fluid flow behaviour within the meniscus and its structural influences.
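
A quick numeric illustration of the reported flow quantities: channel Reynolds number Re = ρvD/μ and Darcy's-law permeability k = QμL/(AΔP). The fluid properties below are water-like placeholders, not the paper's values.

```python
rho = 1000.0   # fluid density, kg/m^3 (assumption)
mu = 1.0e-3    # dynamic viscosity, Pa*s (assumption)

def reynolds(velocity_m_s: float, diameter_m: float) -> float:
    """Re = rho * v * D / mu for a channel of diameter D."""
    return rho * velocity_m_s * diameter_m / mu

def darcy_permeability(Q: float, L: float, A: float, dP: float):
    """Q: flow rate (m^3/s), L: sample length (m), A: area (m^2), dP: Pa.
    Returns permeability in m^2 and in Darcy (1 Darcy ~ 9.87e-13 m^2)."""
    k = Q * mu * L / (A * dP)
    return k, k / 9.87e-13

if __name__ == "__main__":
    # e.g. a ~0.9 mm channel at the 1.6 m/s inlet velocity mentioned above
    print(round(reynolds(1.6, 0.0009)))  # ~1440, the order of the Re ~ 1400 reported
```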

A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models

  • paper_url: http://arxiv.org/abs/2307.12980
  • repo_url: https://github.com/JindongGu/Awesome-Prompting-on-Vision-Language-Model
  • paper_authors: Jindong Gu, Zhen Han, Shuo Chen, Ahmad Beirami, Bailan He, Gengyuan Zhang, Ruotong Liao, Yao Qin, Volker Tresp, Philip Torr
  • for: a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models: multimodal-to-text generation models (e.g., Flamingo), image-text matching models (e.g., CLIP), and text-to-image generation models (e.g., Stable Diffusion).
  • methods: a review of prompting methods for vision-language models, covering manually created natural language instructions and automatically generated prompts in the form of natural language instructions or vector representations.
  • results: a summary and discussion of prompt engineering results, including prediction based solely on prompts without updating model parameters and the easier application of large pre-trained models to real-world tasks; the survey also discusses commonalities and differences between prompting vision-language, language, and vision models, along with challenges, future directions, and research opportunities.
    Abstract Prompt engineering is a technique that involves augmenting a large pre-trained model with task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be created manually as natural language instructions or generated automatically as either natural language instructions or vector representations. Prompt engineering enables the ability to perform predictions based solely on prompts without updating model parameters, and the easier application of large pre-trained models in real-world tasks. In past years, Prompt engineering has been well-studied in natural language processing. Recently, it has also been intensively studied in vision-language modeling. However, there is currently a lack of a systematic overview of prompt engineering on pre-trained vision-language models. This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models: multimodal-to-text generation models (e.g. Flamingo), image-text matching models (e.g. CLIP), and text-to-image generation models (e.g. Stable Diffusion). For each type of model, a brief model summary, prompting methods, prompting-based applications, and the corresponding responsibility and integrity issues are summarized and discussed. Furthermore, the commonalities and differences between prompting on vision-language models, language models, and vision models are also discussed. The challenges, future directions, and research opportunities are summarized to foster future research on this topic.
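
A concrete prompting example on an image-text matching model (CLIP via Hugging Face transformers): classification happens purely by scoring hand-written prompts, with no parameter updates. The image path and label set are placeholders.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # placeholder path
prompts = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)   # (1, 3) prompt scores
print(dict(zip(prompts, probs[0].tolist())))
```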

DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting

  • paper_url: http://arxiv.org/abs/2307.12972
  • repo_url: https://github.com/IDEA-Research/3D-deformable-attention
  • paper_authors: Hongyang Li, Hao Zhang, Zhaoyang Zeng, Shilong Liu, Feng Li, Tianhe Ren, Lei Zhang
  • for: improving 3D detection accuracy from 2D image features by lifting multi-view 2D features into a unified 3D space.
  • methods: a new operator, 3D DeFormable Attention (DFA3D), for 2D-to-3D feature lifting; it alleviates the depth ambiguity problem and supports progressive, layer-by-layer feature refinement.
  • results: experiments show a consistent improvement of +1.41% mAP on the nuScenes dataset, rising to +15.1% mAP when high-quality depth information is available.
    Abstract In this paper, we propose a new operator, called 3D DeFormable Attention (DFA3D), for 2D-to-3D feature lifting, which transforms multi-view 2D image features into a unified 3D space for 3D object detection. Existing feature lifting approaches, such as Lift-Splat-based and 2D attention-based, either use estimated depth to get pseudo LiDAR features and then splat them to a 3D space, which is a one-pass operation without feature refinement, or ignore depth and lift features by 2D attention mechanisms, which achieve finer semantics while suffering from a depth ambiguity problem. In contrast, our DFA3D-based method first leverages the estimated depth to expand each view's 2D feature map to 3D and then utilizes DFA3D to aggregate features from the expanded 3D feature maps. With the help of DFA3D, the depth ambiguity problem can be effectively alleviated from the root, and the lifted features can be progressively refined layer by layer, thanks to the Transformer-like architecture. In addition, we propose a mathematically equivalent implementation of DFA3D which can significantly improve its memory efficiency and computational speed. We integrate DFA3D into several methods that use 2D attention-based feature lifting with only a few modifications in code and evaluate on the nuScenes dataset. The experiment results show a consistent improvement of +1.41\% mAP on average, and up to +15.1\% mAP improvement when high-quality depth information is available, demonstrating the superiority, applicability, and huge potential of DFA3D. The code is available at https://github.com/IDEA-Research/3D-deformable-attention.git.
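
A sketch of the lifting step described above: a view's 2D feature map is expanded along depth bins using an estimated per-pixel depth distribution (an outer product over depth). The DFA3D aggregation itself is omitted; shapes and names are illustrative.

```python
import torch

def lift_2d_to_3d(feat2d: torch.Tensor, depth_probs: torch.Tensor) -> torch.Tensor:
    """feat2d: (B, C, H, W); depth_probs: (B, D, H, W), softmaxed over D.
    Returns (B, C, D, H, W): each pixel's feature spread over depth bins."""
    return torch.einsum("bchw,bdhw->bcdhw", feat2d, depth_probs)

if __name__ == "__main__":
    feat = torch.randn(1, 64, 16, 44)
    depth = torch.softmax(torch.randn(1, 48, 16, 44), dim=1)  # 48 depth bins
    print(lift_2d_to_3d(feat, depth).shape)  # (1, 64, 48, 16, 44)
```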

Volcanic ash delimitation using Artificial Intelligence based on Pix2Pix

  • paper_url: http://arxiv.org/abs/2307.12970
  • repo_url: None
  • paper_authors: Christian Carrillo, Gissela Torres, Christian Mejia-Escobar
  • for: proposing a deep-learning-based method for delimiting volcanic ash clouds, to help prevent and mitigate the impact of eruptions.
  • methods: the Pix2Pix model, a generative-adversarial-network technique that translates multispectral satellite images into black-and-white ash cloud images.
  • results: tests show the method delineates ash clouds accurately and can be applied to any region, making it a useful tool for risk management.
    Abstract Volcanic eruptions emit ash that can be harmful to human health and cause damage to infrastructure, economic activities and the environment. The delimitation of ash clouds allows to know their behavior and dispersion, which helps in the prevention and mitigation of this phenomenon. Traditional methods take advantage of specialized software programs to process the bands or channels that compose the satellite images. However, their use is limited to experts and demands a lot of time and significant computational resources. In recent years, Artificial Intelligence has been a milestone in the computational treatment of complex problems in different areas. In particular, Deep Learning techniques allow automatic, fast and accurate processing of digital images. The present work proposes the use of the Pix2Pix model, a type of generative adversarial network that, once trained, learns the mapping of input images to output images. The architecture of such a network consisting of a generator and a discriminator provides the versatility needed to produce black and white ash cloud images from multispectral satellite images. The evaluation of the model, based on loss and accuracy plots, a confusion matrix, and visual inspection, indicates a satisfactory solution for accurate ash cloud delineation, applicable in any area of the world and becomes a useful tool in risk management.
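
A compact sketch of the standard Pix2Pix objective the method builds on: a conditional adversarial term plus an L1 reconstruction term. Network definitions are omitted; `g` maps a multispectral input to an ash-cloud map and `d` judges (input, map) pairs. The lambda = 100 weighting follows the original Pix2Pix paper, not necessarily this one.

```python
import torch
import torch.nn.functional as F

def pix2pix_losses(g, d, x, y, lam: float = 100.0):
    """x: multispectral satellite image batch; y: ground-truth ash masks."""
    fake = g(x)
    # Generator: fool the discriminator and stay close to the target in L1.
    pred_fake = d(x, fake)
    g_loss = F.binary_cross_entropy_with_logits(
        pred_fake, torch.ones_like(pred_fake)) + lam * F.l1_loss(fake, y)
    # Discriminator: real pairs -> 1, generated pairs -> 0.
    pred_real = d(x, y)
    pred_fake = d(x, fake.detach())
    d_loss = 0.5 * (
        F.binary_cross_entropy_with_logits(pred_real, torch.ones_like(pred_real))
        + F.binary_cross_entropy_with_logits(pred_fake, torch.zeros_like(pred_fake)))
    return g_loss, d_loss
```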

Learning Dense Correspondences between Photos and Sketches

  • paper_url: http://arxiv.org/abs/2307.12967
  • repo_url: https://github.com/cogtoolslab/photo-sketch-correspondence
  • paper_authors: Xuanchen Lu, Xiaolong Wang, Judith E Fan
  • for: supporting the ability of artificial systems to understand visual images at different levels of abstraction, with a focus on sketch-photo correspondence.
  • methods: a new sketch-photo correspondence benchmark, $\textit{PSC6k}$, containing 150K annotations of 6250 sketch-photo pairs across 125 object categories, and a self-supervised method for learning dense correspondences that uses a spatial transformer network to estimate the warp flow between latent representations of a sketch and photo.
  • results: the approach outperforms several strong baselines and produces predictions quantitatively consistent with other warp-based methods, though the benchmark also reveals systematic differences between model and human predictions.
    Abstract Humans effortlessly grasp the connection between sketches and real-world objects, even when these sketches are far from realistic. Moreover, human sketch understanding goes beyond categorization -- critically, it also entails understanding how individual elements within a sketch correspond to parts of the physical world it represents. What are the computational ingredients needed to support this ability? Towards answering this question, we make two contributions: first, we introduce a new sketch-photo correspondence benchmark, $\textit{PSC6k}$, containing 150K annotations of 6250 sketch-photo pairs across 125 object categories, augmenting the existing Sketchy dataset with fine-grained correspondence metadata. Second, we propose a self-supervised method for learning dense correspondences between sketch-photo pairs, building upon recent advances in correspondence learning for pairs of photos. Our model uses a spatial transformer network to estimate the warp flow between latent representations of a sketch and photo extracted by a contrastive learning-based ConvNet backbone. We found that this approach outperformed several strong baselines and produced predictions that were quantitatively consistent with other warp-based methods. However, our benchmark also revealed systematic differences between predictions of the suite of models we tested and those of humans. Taken together, our work suggests a promising path towards developing artificial systems that achieve more human-like understanding of visual images at different levels of abstraction. Project page: https://photo-sketch-correspondence.github.io
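
A minimal sketch of the warp step: a predicted flow field applied to a latent feature map with grid_sample, which is how a spatial transformer realizes a dense sketch-to-photo correspondence. The flow here is random noise standing in for the network's estimate.

```python
import torch
import torch.nn.functional as F

def warp(feats: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """feats: (B, C, H, W); flow: (B, H, W, 2) offsets in [-1, 1] coordinates."""
    b, _, h, w = feats.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)  # identity grid
    return F.grid_sample(feats, base + flow, align_corners=True)

if __name__ == "__main__":
    feats = torch.randn(1, 64, 32, 32)        # latent map of a sketch
    flow = 0.05 * torch.randn(1, 32, 32, 2)   # would come from the STN
    print(warp(feats, flow).shape)            # (1, 64, 32, 32)
```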

Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment

  • paper_url: http://arxiv.org/abs/2307.12964
  • repo_url: None
  • paper_authors: Sarah Ibrahimi, Xiaohang Sun, Pichao Wang, Amanmeet Garg, Ashutosh Sanan, Mohamed Omar
  • for: This paper focuses on the task of text-to-video retrieval, specifically addressing the issue of neglecting audio information in previous methods.
  • methods: The proposed method, TEFAL, uses two independent cross-modal attention blocks to enable the text to attend to the audio and video representations separately, producing both audio and video representations conditioned on the text query.
  • results: The proposed method achieves better than state-of-the-art performance consistently across four benchmark datasets (MSR-VTT, LSMDC, VATEX, and Charades), demonstrating its efficacy in capturing complementary audio and video information pertinent to the text query.
    Abstract Text-to-video retrieval systems have recently made significant progress by utilizing pre-trained models trained on large-scale image-text pairs. However, most of the latest methods primarily focus on the video modality while disregarding the audio signal for this task. Nevertheless, a recent advancement by ECLIPSE has improved long-range text-to-video retrieval by developing an audiovisual video representation. Nonetheless, the objective of the text-to-video retrieval task is to capture the complementary audio and video information that is pertinent to the text query rather than simply achieving better audio and video alignment. To address this issue, we introduce TEFAL, a TExt-conditioned Feature ALignment method that produces both audio and video representations conditioned on the text query. Instead of using only an audiovisual attention block, which could suppress the audio information relevant to the text query, our approach employs two independent cross-modal attention blocks that enable the text to attend to the audio and video representations separately. Our proposed method's efficacy is demonstrated on four benchmark datasets that include audio: MSR-VTT, LSMDC, VATEX, and Charades, and achieves better than state-of-the-art performance consistently across the four datasets. This is attributed to the additional text-query-conditioned audio representation and the complementary information it adds to the text-query-conditioned video representation.

HOOD: Real-Time Robust Human Presence and Out-of-Distribution Detection with Low-Cost FMCW Radar

  • paper_url: http://arxiv.org/abs/2308.02396
  • repo_url: None
  • paper_authors: Sabri Mustafa Kahya, Muhammet Sami Yavuz, Eckehard Steinbach
  • for: real-time, robust human presence detection in indoor environments, where moving and stationary clutter makes detection with millimeter-wave frequency-modulated continuous-wave (FMCW) radar challenging.
  • methods: HOOD, based on a 60 GHz short-range FMCW radar, casts presence detection as an out-of-distribution (OOD) detection problem and solves both simultaneously with a single reconstruction-based pipeline operating on radar macro and micro range-Doppler images (RDIs); in humans' absence, moving or stationary clutter is detected as OOD and the scene is predicted as "no presence."
  • results: an average AUROC of 94.36% on a dataset collected with a 60 GHz short-range FMCW radar; HOOD outperforms state-of-the-art (SOTA) OOD detection methods on common OOD detection metrics. Real-time experiments are available at: https://muskahya.github.io/HOOD
    Abstract Human presence detection in indoor environments using millimeter-wave frequency-modulated continuous-wave (FMCW) radar is challenging due to the presence of moving and stationary clutters in indoor places. This work proposes "HOOD" as a real-time robust human presence and out-of-distribution (OOD) detection method by exploiting 60 GHz short-range FMCW radar. We approach the presence detection application as an OOD detection problem and solve the two problems simultaneously using a single pipeline. Our solution relies on a reconstruction-based architecture and works with radar macro and micro range-Doppler images (RDIs). HOOD aims to accurately detect the "presence" of humans in the presence or absence of moving and stationary disturbers. Since it is also an OOD detector, it aims to detect moving or stationary clutters as OOD in humans' absence and predicts the current scene's output as "no presence." HOOD is an activity-free approach that performs well in different human scenarios. On our dataset collected with a 60 GHz short-range FMCW Radar, we achieve an average AUROC of 94.36%. Additionally, our extensive evaluations and experiments demonstrate that HOOD outperforms state-of-the-art (SOTA) OOD detection methods in terms of common OOD detection metrics. Our real-time experiments are available at: https://muskahya.github.io/HOOD
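
The reconstruction-based idea can be sketched as follows: an autoencoder trained only on "presence" range-Doppler images yields high reconstruction error on clutter-only inputs, flagging them as OOD. The architecture, input size, and threshold below are illustrative assumptions, not HOOD's actual design.

```python
import torch
import torch.nn as nn

class RDIAutoencoder(nn.Module):
    """Toy autoencoder over flattened range-Doppler images (RDIs)."""
    def __init__(self, rdi_dim: int = 64 * 64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(rdi_dim, 256), nn.ReLU(), nn.Linear(256, 32))
        self.dec = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, rdi_dim))

    def forward(self, x):
        return self.dec(self.enc(x))

def ood_score(model, rdi):
    """Reconstruction error; large values suggest clutter / 'no presence' (OOD)."""
    with torch.no_grad():
        return ((model(rdi) - rdi) ** 2).mean(dim=-1)

model = RDIAutoencoder()          # assume it was trained on human-presence RDIs
x = torch.rand(4, 64 * 64)        # a batch of flattened RDIs
threshold = 0.1                   # calibrated on validation data in practice
print(ood_score(model, x) > threshold)  # True -> predict "no presence"
```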

Dyn-E: Local Appearance Editing of Dynamic Neural Radiance Fields

  • paper_url: http://arxiv.org/abs/2307.12909
  • repo_url: None
  • paper_authors: Shangzhan Zhang, Sida Peng, Yinji ShenTu, Qing Shuai, Tianrun Chen, Kaicheng Yu, Hujun Bao, Xiaowei Zhou
  • for: Proposes a novel framework for editing the local appearance of dynamic neural radiance fields (NeRFs) by manipulating pixels in a single frame of the training video.
  • methods: Introduces a local surface representation of the edited region that can be inserted into and rendered along with the original NeRF, and warped to arbitrary other frames through a learned invertible motion representation network.
  • results: Extensive evaluation on various scenes shows spatially and temporally consistent editing results, and the approach applies to different variants of dynamic NeRF representations.
    Abstract Recently, the editing of neural radiance fields (NeRFs) has gained considerable attention, but most prior works focus on static scenes while research on the appearance editing of dynamic scenes is relatively lacking. In this paper, we propose a novel framework to edit the local appearance of dynamic NeRFs by manipulating pixels in a single frame of training video. Specifically, to locally edit the appearance of dynamic NeRFs while preserving unedited regions, we introduce a local surface representation of the edited region, which can be inserted into and rendered along with the original NeRF and warped to arbitrary other frames through a learned invertible motion representation network. By employing our method, users without professional expertise can easily add desired content to the appearance of a dynamic scene. We extensively evaluate our approach on various scenes and show that our approach achieves spatially and temporally consistent editing results. Notably, our approach is versatile and applicable to different variants of dynamic NeRF representations.

cs.AI - 2023-07-25

Argument Attribution Explanations in Quantitative Bipolar Argumentation Frameworks (Technical Report)

  • paper_url: http://arxiv.org/abs/2307.13582
  • repo_url: None
  • paper_authors: Xiang Yin, Nico Potyka, Francesca Toni
  • for: Explaining the quantitative reasoning outcomes of argumentation frameworks under gradual semantics, specifically the strength of topic arguments in Quantitative Bipolar Argumentation Frameworks (QBAFs).
  • methods: Proposes a novel theory of Argument Attribution Explanations (AAEs) that carries the spirit of feature attribution from machine learning over to argumentation: AAEs quantify the influence of arguments on topic arguments of interest.
  • results: Demonstrates the applicability of AAEs through two case studies, on fake news detection and movie recommender systems.
    Abstract Argumentative explainable AI has been advocated by several in recent years, with an increasing interest on explaining the reasoning outcomes of Argumentation Frameworks (AFs). While there is a considerable body of research on qualitatively explaining the reasoning outcomes of AFs with debates/disputes/dialogues in the spirit of extension-based semantics, explaining the quantitative reasoning outcomes of AFs under gradual semantics has not received much attention, despite widespread use in applications. In this paper, we contribute to filling this gap by proposing a novel theory of Argument Attribution Explanations (AAEs) by incorporating the spirit of feature attribution from machine learning in the context of Quantitative Bipolar Argumentation Frameworks (QBAFs): whereas feature attribution is used to determine the influence of features towards outputs of machine learning models, AAEs are used to determine the influence of arguments towards topic arguments of interest. We study desirable properties of AAEs, including some new ones and some partially adapted from the literature to our setting. To demonstrate the applicability of our AAEs in practice, we conclude by carrying out two case studies in the scenarios of fake news detection and movie recommender systems.
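
To make the idea concrete, the sketch below computes a removal-style attribution on a tiny QBAF under a toy gradual semantics: an argument's attribution is the change in the topic argument's final strength when that argument is removed. Both the semantics and the attribution rule are simplified assumptions for illustration, not the paper's exact AAE definition.

```python
def strength(args, base, attacks, supports, iters=50):
    """Toy gradual semantics: iterate strengths toward a fixpoint, clipped to [0, 1]."""
    s = dict(base)
    for _ in range(iters):
        s = {a: min(1.0, max(0.0,
                base[a]
                + sum(s[b] for b in supports.get(a, []) if b in s)
                - sum(s[b] for b in attacks.get(a, []) if b in s)))
             for a in args}
    return s

def attribution(topic, arg, args, base, attacks, supports):
    """AAE-style score: how much `arg` contributes to the topic's strength."""
    full = strength(args, base, attacks, supports)[topic]
    rest = [a for a in args if a != arg]
    reduced = strength(rest, {a: base[a] for a in rest}, attacks, supports)[topic]
    return full - reduced

# tiny QBAF: b supports topic t, c attacks t
args = ["t", "b", "c"]
base = {"t": 0.5, "b": 0.8, "c": 0.4}
attacks, supports = {"t": ["c"]}, {"t": ["b"]}
for a in ["b", "c"]:
    print(a, round(attribution("t", a, args, base, attacks, supports), 3))
# the supporter b gets a positive attribution, the attacker c a negative one
```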

Reinterpreting survival analysis in the universal approximator age

  • paper_url: http://arxiv.org/abs/2307.13579
  • repo_url: https://github.com/sdittmer/survival_analysis_sumo_plus_plus
  • paper_authors: Sören Dittmer, Michael Roberts, Jacobus Preller, AIX COVNET, James H. F. Rudd, John A. D. Aston, Carola-Bibiane Schönlieb
  • for: Providing the tools needed to fully harness the potential of survival analysis in deep learning.
  • methods: Introduces a new loss function, evaluation metrics, and the first universal approximating network that provably produces survival curves without numeric integration, and relates survival analysis to classification and regression.
  • results: A large numerical study shows that the proposed loss function and model outperform other approaches.
    Abstract Survival analysis is an integral part of the statistical toolbox. However, while most domains of classical statistics have embraced deep learning, survival analysis only recently gained some minor attention from the deep learning community. This recent development is likely in part motivated by the COVID-19 pandemic. We aim to provide the tools needed to fully harness the potential of survival analysis in deep learning. On the one hand, we discuss how survival analysis connects to classification and regression. On the other hand, we provide technical tools. We provide a new loss function, evaluation metrics, and the first universal approximating network that provably produces survival curves without numeric integration. We show that the loss function and model outperform other approaches using a large numerical study.
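
One generic way a network can emit a survival curve directly, with no numeric integration, is to predict non-negative discrete hazards per time bin and take a cumulative sum inside an exponential. The sketch below illustrates that construction only; it is a standard discrete-hazard model under assumed bin counts, not necessarily the paper's architecture.

```python
import torch
import torch.nn as nn

class DiscreteHazardNet(nn.Module):
    """Predicts a monotone survival curve S(t_1..t_K) from covariates x."""
    def __init__(self, in_dim: int, n_bins: int = 20):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, n_bins))

    def forward(self, x):
        hazards = nn.functional.softplus(self.net(x))   # non-negative per-bin hazards
        # S(t_k) = exp(-sum_{j<=k} h_j): non-increasing in k by construction
        return torch.exp(-torch.cumsum(hazards, dim=-1))

model = DiscreteHazardNet(in_dim=10)
S = model(torch.randn(3, 10))                 # (3, 20) survival curves
assert (S[:, 1:] <= S[:, :-1] + 1e-6).all()   # curves are non-increasing
```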

A Dual-mode Local Search Algorithm for Solving the Minimum Dominating Set Problem

  • paper_url: http://arxiv.org/abs/2307.16815
  • repo_url: None
  • paper_authors: Enqiang Zhu, Yu Zhang, Shengzhi Wang, Darren Strash, Chanjuan Liu
  • for: Solving the minimum dominating set (MinDS) problem on graphs: finding a smallest vertex set $D$ such that every vertex not in $D$ is adjacent to at least one vertex in $D$.
  • methods: Proposes DmDS, an efficient dual-mode local search algorithm that probabilistically chooses between two distinct vertex-swapping schemes. It further introduces a frequency-based vertex selection criterion to resolve the tie-breaking cases that hamper other algorithms, and a new strategy to improve initial solution quality via a greedy construction integrated with perturbation.
  • results: On seven datasets comprising 346 instances (or families) with up to tens of millions of vertices, DmDS achieves the best accuracy on almost all instances and finds much better solutions than state-of-the-art MinDS algorithms on a broad range of large real-world graphs.
    Abstract Given a graph, the minimum dominating set (MinDS) problem is to identify a smallest set $D$ of vertices such that every vertex not in $D$ is adjacent to at least one vertex in $D$. The MinDS problem is a classic $\mathcal{NP}$-hard problem and has been extensively studied because of its many disparate applications in network analysis. To solve this problem efficiently, many heuristic approaches have been proposed to obtain a good solution within an acceptable time limit. However, existing MinDS heuristic algorithms are always limited by various tie-breaking cases when selecting vertices, which slows down the effectiveness of the algorithms. In this paper, we design an efficient local search algorithm for the MinDS problem, named DmDS -- a dual-mode local search framework that probabilistically chooses between two distinct vertex-swapping schemes. We further address limitations of other algorithms by introducing vertex selection criterion based on the frequency of vertices added to solutions to address tie-breaking cases, and a new strategy to improve the quality of the initial solution via a greedy-based strategy integrated with perturbation. We evaluate DmDS against the state-of-the-art algorithms on seven datasets, consisting of 346 instances (or families) with up to tens of millions of vertices. Experimental results show that DmDS obtains the best performance in accuracy for almost all instances and finds much better solutions than state-of-the-art MinDS algorithms on a broad range of large real-world graphs.
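
The flavor of frequency-based tie-breaking can be shown on the greedy construction step: among vertices that dominate the most not-yet-dominated vertices, prefer the one least frequently added to past solutions. The sketch below is a simplified illustration of that criterion, not the full DmDS dual-mode search.

```python
from collections import defaultdict

def greedy_dominating_set(adj, freq):
    """adj: vertex -> set of neighbours; freq: how often each vertex
    appeared in previous solutions (used to break ties)."""
    undominated = set(adj)
    solution = set()
    while undominated:
        # gain = number of newly dominated vertices; ties broken by low frequency
        best = max(adj, key=lambda v: (len((adj[v] | {v}) & undominated), -freq[v]))
        solution.add(best)
        undominated -= adj[best] | {best}
        freq[best] += 1
    return solution

# path graph 0-1-2-3-4: {1, 3} is a minimum dominating set
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(greedy_dominating_set(adj, defaultdict(int)))  # {1, 3}
```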

The Impact of Imperfect XAI on Human-AI Decision-Making

  • paper_url: http://arxiv.org/abs/2307.13566
  • repo_url: None
  • paper_authors: Katelyn Morrison, Philipp Spitzer, Violet Turri, Michelle Feng, Niklas Kühl, Adam Perer
  • for: Understanding how imperfect (incorrect) explanations affect human-AI decision-making, in order to improve human-AI collaboration.
  • methods: A robust, mixed-methods user study with 136 participants evaluating how incorrect explanations influence decision-making behavior in a bird species identification task, accounting for participants' level of expertise and the explanation's level of assertiveness.
  • results: Imperfect XAI and humans' level of expertise influence reliance on AI and human-AI team performance, and explanations can deceive decision-makers during collaboration; the findings yield design guidelines for human-AI collaboration systems.
    Abstract Explainability techniques are rapidly being developed to improve human-AI decision-making across various cooperative work settings. Consequently, previous research has evaluated how decision-makers collaborate with imperfect AI by investigating appropriate reliance and task performance with the aim of designing more human-centered computer-supported collaborative tools. Several human-centered explainable AI (XAI) techniques have been proposed in hopes of improving decision-makers' collaboration with AI; however, these techniques are grounded in findings from previous studies that primarily focus on the impact of incorrect AI advice. Few studies acknowledge the possibility for the explanations to be incorrect even if the AI advice is correct. Thus, it is crucial to understand how imperfect XAI affects human-AI decision-making. In this work, we contribute a robust, mixed-methods user study with 136 participants to evaluate how incorrect explanations influence humans' decision-making behavior in a bird species identification task taking into account their level of expertise and an explanation's level of assertiveness. Our findings reveal the influence of imperfect XAI and humans' level of expertise on their reliance on AI and human-AI team performance. We also discuss how explanations can deceive decision-makers during human-AI collaboration. Hence, we shed light on the impacts of imperfect XAI in the field of computer-supported cooperative work and provide guidelines for designers of human-AI collaboration systems.

Decision-Focused Learning: Foundations, State of the Art, Benchmark and Future Opportunities

  • paper_url: http://arxiv.org/abs/2307.13565
  • repo_url: https://github.com/predopt/predopt-benchmarks
  • paper_authors: Jayanta Mandi, James Kotary, Senne Berden, Maxime Mulamba, Victor Bucarey, Tias Guns, Ferdinando Fioretto
  • for: Surveying decision-focused learning (DFL), an emerging machine learning paradigm that integrates prediction and optimization in an end-to-end system to support decision-making under uncertainty.
  • methods: Reviews the techniques devised to integrate machine learning and optimization models, introduces a taxonomy of DFL methods distinguished by their unique characteristics, and proposes suitable benchmark datasets and tasks for DFL.
  • results: Conducts an extensive empirical evaluation of DFL methods and distills insights into current and potential future research directions.
    Abstract Decision-focused learning (DFL) is an emerging paradigm in machine learning which trains a model to optimize decisions, integrating prediction and optimization in an end-to-end system. This paradigm holds the promise to revolutionize decision-making in many real-world applications which operate under uncertainty, where the estimation of unknown parameters within these decision models often becomes a substantial roadblock. This paper presents a comprehensive review of DFL. It provides an in-depth analysis of the various techniques devised to integrate machine learning and optimization models, introduces a taxonomy of DFL methods distinguished by their unique characteristics, and conducts an extensive empirical evaluation of these methods proposing suitable benchmark dataset and tasks for DFL. Finally, the study provides valuable insights into current and potential future avenues in DFL research.
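
A minimal contrast between prediction-focused and decision-focused training can be shown on a toy newsvendor problem, where the optimal order quantity is a closed-form (hence differentiable) function of the forecast, so the predictor can be trained directly on the downstream decision cost. All costs and data below are illustrative assumptions.

```python
import torch

cu, co = 4.0, 1.0                    # underage / overage unit costs
z = torch.distributions.Normal(0, 1).icdf(torch.tensor(cu / (cu + co)))

def decision_cost(q, d):
    """Newsvendor task loss for order quantity q and realized demand d."""
    return cu * torch.relu(d - q) + co * torch.relu(q - d)

# demand depends linearly on a feature x; the model predicts its mean
x = torch.randn(512, 1)
d = 10 + 3 * x.squeeze() + torch.randn(512)
model = torch.nn.Linear(1, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.05)

for _ in range(300):
    mu = model(x).squeeze()
    q = mu + z * 1.0                   # closed-form optimal order given forecast (sigma=1 assumed)
    loss = decision_cost(q, d).mean()  # train on the *decision* cost, not on MSE
    opt.zero_grad(); loss.backward(); opt.step()

print(loss.item())  # average task cost after decision-focused training
```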

On Solving the Rubik’s Cube with Domain-Independent Planners Using Standard Representations

  • paper_url: http://arxiv.org/abs/2307.13552
  • repo_url: None
  • paper_authors: Bharath Muppasani, Vishal Pallagani, Biplav Srivastava, Forest Agostinelli
  • for: Representing the Rubik's Cube puzzle in the PDDL language, making the domain accessible to PDDL planners, competitions, and knowledge-engineering tools, and more human-readable.
  • methods: Compares the PDDL representation against existing approaches, including the DeepCubeA solver with its custom representation, the Scorpion planner with the State-Action-Space+ (SAS+) representation, and FastDownward with the FF heuristic.
  • results: In one comparable experiment, DeepCubeA (trained with 12 RC actions) solves all problems of varying complexity, but only 78.5% of its plans are optimal; Scorpion with SAS+ and pattern-database heuristics solves 61.50% of problems optimally; FastDownward with the PDDL representation and FF heuristic solves 56.50% of problems, of which 79.64% of the generated plans are optimal.
    Abstract Rubik's Cube (RC) is a well-known and computationally challenging puzzle that has motivated AI researchers to explore efficient alternative representations and problem-solving methods. The ideal situation for planning here is that a problem be solved optimally and efficiently represented in a standard notation using a general-purpose solver and heuristics. The fastest solver today for RC is DeepCubeA with a custom representation, and another approach is with Scorpion planner with State-Action-Space+ (SAS+) representation. In this paper, we present the first RC representation in the popular PDDL language so that the domain becomes more accessible to PDDL planners, competitions, and knowledge engineering tools, and is more human-readable. We then bridge across existing approaches and compare performance. We find that in one comparable experiment, DeepCubeA (trained with 12 RC actions) solves all problems with varying complexities, albeit only 78.5% are optimal plans. For the same problem set, Scorpion with SAS+ representation and pattern database heuristics solves 61.50% problems optimally, while FastDownward with PDDL representation and FF heuristic solves 56.50% problems, out of which 79.64% of the plans generated were optimal. Our study provides valuable insights into the trade-offs between representational choice and plan optimality that can help researchers design future strategies for challenging domains combining general-purpose solving methods (planning, reinforcement learning), heuristics, and representations (standard or custom).

A Planning Ontology to Represent and Exploit Planning Knowledge for Performance Efficiency

  • paper_url: http://arxiv.org/abs/2307.13549
  • repo_url: None
  • paper_authors: Bharath Muppasani, Vishal Pallagani, Biplav Srivastava, Raghava Mutharaju, Michael N. Huhns, Vignesh Narayanan
  • for: Addressing automated planning, where the objective is to find a sequence of actions that moves an agent from an initial state of the world to a desired goal state.
  • methods: Constructs a planning ontology from International Planning Competition (IPC) data on planning domains and planners, on the hypothesis that this data carries essential information for identifying suitable planners and improving their performance.
  • results: Experiments in two use cases show the ontology can lead to the selection of promising planners and improve their performance using macros (a form of action-ordering constraint extracted from the ontology); the ontology and associated resources are released to the community.
    Abstract Ontologies are known for their ability to organize rich metadata, support the identification of novel insights via semantic queries, and promote reuse. In this paper, we consider the problem of automated planning, where the objective is to find a sequence of actions that will move an agent from an initial state of the world to a desired goal state. We hypothesize that given a large number of available planners and diverse planning domains; they carry essential information that can be leveraged to identify suitable planners and improve their performance for a domain. We use data on planning domains and planners from the International Planning Competition (IPC) to construct a planning ontology and demonstrate via experiments in two use cases that the ontology can lead to the selection of promising planners and improving their performance using macros - a form of action ordering constraints extracted from planning ontology. We also make the planning ontology and associated resources available to the community to promote further research.
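
A sketch of how such an ontology could be queried to pick a promising planner for a domain, assuming rdflib and an invented mini-vocabulary; the actual ontology's classes and properties will differ.

```python
from rdflib import Graph, Literal, Namespace

PLAN = Namespace("http://example.org/planning#")   # hypothetical vocabulary
g = Graph()

# (planner, domain, IPC score) facts, as they might be extracted from IPC data
for planner, domain, score in [("lama", "logistics", 0.92),
                               ("lama", "blocksworld", 0.71),
                               ("scorpion", "logistics", 0.88)]:
    run = PLAN[f"{planner}_{domain}"]
    g.add((run, PLAN.planner, Literal(planner)))
    g.add((run, PLAN.domain, Literal(domain)))
    g.add((run, PLAN.score, Literal(score)))

# select the best-performing planner for a target domain
q = """
PREFIX plan: <http://example.org/planning#>
SELECT ?planner WHERE {
  ?run plan:planner ?planner ; plan:domain "logistics" ; plan:score ?s .
} ORDER BY DESC(?s) LIMIT 1
"""
print([str(row.planner) for row in g.query(q)])  # ['lama']
```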

Group Activity Recognition in Computer Vision: A Comprehensive Review, Challenges, and Future Perspectives

  • paper_url: http://arxiv.org/abs/2307.13541
  • repo_url: None
  • paper_authors: Chuanchuan Wang, Ahmad Sufril Azlan Mohamed
  • for: Advancing group activity recognition, specifically recognition through global interactivity and group relationships.
  • methods: Comprehensively reviews the literature and recognition approaches, from traditional methodologies to the latest methods based on spatial structure, descriptors, non-deep learning, hierarchical recurrent neural networks (HRNN), relationship models, and attention mechanisms, and presents the relational network and relational architectures for each module.
  • results: Compares recognition methods against state-of-the-art technologies, summarizes existing challenges with guidance for newcomers, and reviews emerging perspectives and future directions.
    Abstract Group activity recognition is a hot topic in computer vision. Recognizing activities through group relationships plays a vital role in group activity recognition. It holds practical implications in various scenarios, such as video analysis, surveillance, automatic driving, and understanding social activities. The model's key capabilities encompass efficiently modeling hierarchical relationships within a scene and accurately extracting distinctive spatiotemporal features from groups. Given this technology's extensive applicability, identifying group activities has garnered significant research attention. This work examines the current progress in technology for recognizing group activities, with a specific focus on global interactivity and activities. Firstly, we comprehensively review the pertinent literature and various group activity recognition approaches, from traditional methodologies to the latest methods based on spatial structure, descriptors, non-deep learning, hierarchical recurrent neural networks (HRNN), relationship models, and attention mechanisms. Subsequently, we present the relational network and relational architectures for each module. Thirdly, we investigate methods for recognizing group activity and compare their performance with state-of-the-art technologies. We summarize the existing challenges and provide comprehensive guidance for newcomers to understand group activity recognition. Furthermore, we review emerging perspectives in group activity recognition to explore new directions and possibilities.

Spectrum-guided Multi-granularity Referring Video Object Segmentation

  • paper_url: http://arxiv.org/abs/2307.13537
  • repo_url: https://github.com/bo-miao/sgmg
  • paper_authors: Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Ajmal Mian
  • for: Addressing the feature drift problem in existing referring video object segmentation (R-VOS) techniques to improve segmentation quality.
  • methods: Proposes Spectrum-guided Multi-granularity (SgMg), which performs direct segmentation on the encoded features and employs visual details to further optimize the masks, together with Spectrum-guided Cross-modal Fusion (SCF), which performs intra-frame global interactions in the spectral domain for effective multimodal representation.
  • results: Achieves state-of-the-art performance on four video benchmark datasets, outperforming the nearest competitor by 2.8 points on Ref-YouTube-VOS; the extended SgMg additionally enables multi-object R-VOS, running about 3 times faster while maintaining satisfactory performance.
    Abstract Current referring video object segmentation (R-VOS) techniques extract conditional kernels from encoded (low-resolution) vision-language features to segment the decoded high-resolution features. We discovered that this causes significant feature drift, which the segmentation kernels struggle to perceive during the forward computation. This negatively affects the ability of segmentation kernels. To address the drift problem, we propose a Spectrum-guided Multi-granularity (SgMg) approach, which performs direct segmentation on the encoded features and employs visual details to further optimize the masks. In addition, we propose Spectrum-guided Cross-modal Fusion (SCF) to perform intra-frame global interactions in the spectral domain for effective multimodal representation. Finally, we extend SgMg to perform multi-object R-VOS, a new paradigm that enables simultaneous segmentation of multiple referred objects in a video. This not only makes R-VOS faster, but also more practical. Extensive experiments show that SgMg achieves state-of-the-art performance on four video benchmark datasets, outperforming the nearest competitor by 2.8% points on Ref-YouTube-VOS. Our extended SgMg enables multi-object R-VOS, runs about 3 times faster while maintaining satisfactory performance. Code is available at https://github.com/bo-miao/SgMg.
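
The spectral-domain fusion idea can be loosely sketched as filtering video features in the frequency domain with a text-derived gate. The PyTorch sketch below illustrates the mechanism only; the gating scheme and shapes are assumptions, not the paper's exact SCF module.

```python
import torch
import torch.nn as nn

class SpectralGate(nn.Module):
    """Gate the temporal spectrum of video features with a text embedding."""
    def __init__(self, n_frames: int, text_dim: int):
        super().__init__()
        self.gate = nn.Linear(text_dim, n_frames // 2 + 1)  # one gate per frequency bin

    def forward(self, video, text):
        # video: (B, T, C); text: (B, text_dim)
        spec = torch.fft.rfft(video, dim=1)                 # (B, T//2+1, C), complex
        g = torch.sigmoid(self.gate(text)).unsqueeze(-1)    # (B, T//2+1, 1)
        return torch.fft.irfft(spec * g, n=video.size(1), dim=1)

m = SpectralGate(n_frames=16, text_dim=512)
out = m(torch.randn(2, 16, 256), torch.randn(2, 512))
print(out.shape)  # torch.Size([2, 16, 256])
```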

Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection

  • paper_url: http://arxiv.org/abs/2307.13529
  • repo_url: None
  • paper_authors: Yichao Cao, Xiu Su, Qingfei Tang, Feng Yang, Shan You, Xiaobo Lu, Chang Xu
  • for: Improving human-object interaction (HOI) detection accuracy by having visual models capture the complex interactive relationships between humans and objects.
  • methods: Presents RmLR, a systematic and unified framework that enhances HOI detection with structured text knowledge. It analyzes the interaction-information loss in two-stage HOI detectors and proposes a re-mining strategy for more comprehensive visual representations, plus fine-grained sentence- and word-level alignment and knowledge-transfer strategies that resolve the many-to-many matching problem between multiple interactions and multiple texts, alleviating matching confusion when several interactions occur simultaneously.
  • results: HOI reasoning over visual features augmented with textual knowledge substantially improves interaction understanding, achieving state-of-the-art performance on public benchmarks; component-wise analyses show how each part contributes.
    Abstract Human-Object Interaction (HOI) detection is a challenging computer vision task that requires visual models to address the complex interactive relationship between humans and objects and predict HOI triplets. Despite the challenges posed by the numerous interaction combinations, they also offer opportunities for multimodal learning of visual texts. In this paper, we present a systematic and unified framework (RmLR) that enhances HOI detection by incorporating structured text knowledge. Firstly, we qualitatively and quantitatively analyze the loss of interaction information in the two-stage HOI detector and propose a re-mining strategy to generate more comprehensive visual representation.Secondly, we design more fine-grained sentence- and word-level alignment and knowledge transfer strategies to effectively address the many-to-many matching problem between multiple interactions and multiple texts.These strategies alleviate the matching confusion problem that arises when multiple interactions occur simultaneously, thereby improving the effectiveness of the alignment process. Finally, HOI reasoning by visual features augmented with textual knowledge substantially improves the understanding of interactions. Experimental results illustrate the effectiveness of our approach, where state-of-the-art performance is achieved on public benchmarks. We further analyze the effects of different components of our approach to provide insights into its efficacy.

FacTool: Factuality Detection in Generative AI – A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

  • paper_url: http://arxiv.org/abs/2307.13528
  • repo_url: https://github.com/gair-nlp/factool
  • paper_authors: I-Chun Chern, Steffi Chern, Shiqi Chen, Weizhe Yuan, Kehua Feng, Chunting Zhou, Junxian He, Graham Neubig, Pengfei Liu
  • for: Detecting factual errors in text generated by large language models.
  • methods: Proposes a task- and domain-agnostic, tool-augmented framework for factuality detection of LLM-generated text.
  • results: Experiments on four tasks (knowledge-based QA, code generation, mathematical reasoning, and scientific literature review) demonstrate the efficacy of the proposed method.
    Abstract The emergence of generative pre-trained models has facilitated the synthesis of high-quality text, but it has also posed challenges in identifying factual errors in the generated text. In particular: (1) A wider range of tasks now face an increasing risk of containing factual errors when handled by generative models. (2) Generated texts tend to be lengthy and lack a clearly defined granularity for individual facts. (3) There is a scarcity of explicit evidence available during the process of fact checking. With the above challenges in mind, in this paper, we propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models (e.g., ChatGPT). Experiments on four different tasks (knowledge-based QA, code generation, mathematical reasoning, and scientific literature review) show the efficacy of the proposed method. We release the code of FacTool associated with ChatGPT plugin interface at https://github.com/GAIR-NLP/factool .
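
At a high level, tool-augmented factuality checking of this kind decomposes into claim extraction, query generation, evidence collection, and verification. The skeleton below shows that flow with stubbed components; the function names and stages are a generic reading of the framework, not FacTool's actual API (see the linked repo for that).

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    claim: str
    factual: bool
    evidence: list

def extract_claims(text: str) -> list[str]:
    """Stub: in practice an LLM splits the text into atomic, checkable claims."""
    return [s.strip() for s in text.split(".") if s.strip()]

def generate_queries(claim: str) -> list[str]:
    """Stub: an LLM turns a claim into search / tool queries."""
    return [claim]

def collect_evidence(query: str) -> list[str]:
    """Stub: call a search engine, code executor, or calculator here."""
    return [f"evidence for: {query}"]

def verify(claim: str, evidence: list) -> bool:
    """Stub: an LLM judges whether the evidence supports the claim."""
    return bool(evidence)

def check(text: str) -> list[Verdict]:
    verdicts = []
    for claim in extract_claims(text):
        evidence = [e for q in generate_queries(claim) for e in collect_evidence(q)]
        verdicts.append(Verdict(claim, verify(claim, evidence), evidence))
    return verdicts

print(check("The Eiffel Tower is in Paris. It was built in 1989."))
```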

An Empirical Study on Fairness Improvement with Multiple Protected Attributes

  • paper_url: http://arxiv.org/abs/2308.01923
  • repo_url: None
  • paper_authors: Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Mark Harman
  • for: Studying fairness improvement with respect to multiple protected attributes, whereas existing research mostly improves fairness regarding a single protected attribute at a time even though many users have several.
  • methods: An extensive study of 11 state-of-the-art fairness improvement methods, analyzing their effectiveness across different datasets, metrics, and ML models when multiple protected attributes are considered.
  • results: Improving fairness for a single protected attribute can largely decrease fairness regarding unconsidered attributes, in up to 88.3% of scenarios (57.5% on average). Accuracy loss is similar for single and multiple attributes, but the effect on precision and recall with multiple attributes is about 5 and 8 times that of a single attribute, so reporting only accuracy as the ML performance metric is inadequate.
    Abstract Existing research mostly improves the fairness of Machine Learning (ML) software regarding a single protected attribute at a time, but this is unrealistic given that many users have multiple protected attributes. This paper conducts an extensive study of fairness improvement regarding multiple protected attributes, covering 11 state-of-the-art fairness improvement methods. We analyze the effectiveness of these methods with different datasets, metrics, and ML models when considering multiple protected attributes. The results reveal that improving fairness for a single protected attribute can largely decrease fairness regarding unconsidered protected attributes. This decrease is observed in up to 88.3% of scenarios (57.5% on average). More surprisingly, we find little difference in accuracy loss when considering single and multiple protected attributes, indicating that accuracy can be maintained in the multiple-attribute paradigm. However, the effect on precision and recall when handling multiple protected attributes is about 5 times and 8 times that of a single attribute. This has important implications for future fairness research: reporting only accuracy as the ML performance metric, which is currently common in the literature, is inadequate.
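
The headline finding is easy to reproduce in spirit: measure a fairness metric per protected attribute, not only for the attribute being mitigated. The sketch below computes the statistical parity difference for two attributes with NumPy; the data and the bias pattern are synthetic.

```python
import numpy as np

def statistical_parity_diff(y_pred, group):
    """|P(y=1 | group A) - P(y=1 | group B)| for a binary protected attribute."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

rng = np.random.default_rng(0)
n = 10_000
sex = rng.integers(0, 2, n)                   # the attribute a method mitigates
race = rng.integers(0, 2, n)                  # an unconsidered attribute
y_pred = (rng.random(n) < (0.4 + 0.1 * race)).astype(float)  # bias w.r.t. race remains

for name, g in [("sex", sex), ("race", race)]:
    print(name, round(statistical_parity_diff(y_pred, g), 3))
# fairness can look fine on `sex` while remaining poor on `race`
```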

Zshot: An Open-source Framework for Zero-Shot Named Entity Recognition and Relation Extraction

  • paper_url: http://arxiv.org/abs/2307.13497
  • repo_url: None
  • paper_authors: Gabriele Picco, Marcos Martínez Galindo, Alberto Purpura, Leopold Fuchs, Vanessa López, Hoang Thanh Lam
  • for: Providing a comprehensive framework for comparing state-of-the-art zero-shot learning (ZSL) methods on standard benchmark datasets, for zero-shot named entity recognition and relation extraction.
  • methods: Builds on large pretrained language models and offers an extensible, evaluable API designed to support production use within the standard SpaCy NLP pipeline.
  • results: Presents Zshot, a novel ZSL framework that includes numerous enhancements such as pipeline ensembling to boost accuracy and visualization utilities available as a SpaCy extension.
    Abstract The Zero-Shot Learning (ZSL) task pertains to the identification of entities or relations in texts that were not seen during training. ZSL has emerged as a critical research area due to the scarcity of labeled data in specific domains, and its applications have grown significantly in recent years. With the advent of large pretrained language models, several novel methods have been proposed, resulting in substantial improvements in ZSL performance. There is a growing demand, both in the research community and industry, for a comprehensive ZSL framework that facilitates the development and accessibility of the latest methods and pretrained models.In this study, we propose a novel ZSL framework called Zshot that aims to address the aforementioned challenges. Our primary objective is to provide a platform that allows researchers to compare different state-of-the-art ZSL methods with standard benchmark datasets. Additionally, we have designed our framework to support the industry with readily available APIs for production under the standard SpaCy NLP pipeline. Our API is extendible and evaluable, moreover, we include numerous enhancements such as boosting the accuracy with pipeline ensembling and visualization utilities available as a SpaCy extension.

Duet: efficient and scalable hybriD neUral rElation undersTanding

  • paper_url: http://arxiv.org/abs/2307.13494
  • repo_url: https://github.com/GIS-PuppetMaster/Duet
  • paper_authors: Kaixin Zhang, Hongzhi Wang, Yabin Lu, Ziqi Li, Chang Shu, Yu Yan, Donghua Yang
  • for: Cardinality estimation, especially on high-cardinality and high-dimensional tables, to make learned cardinality estimators practical.
  • methods: Introduces predicate information into the autoregressive model and proposes Duet, a stable, efficient, and scalable hybrid method that estimates cardinality directly without sampling or any non-differentiable process, reducing inference complexity from O(n) to O(1) compared with Naru and UAE.
  • results: Experiments show Duet achieves all of its design goals, with lower inference cost on CPU than most learned methods incur on GPU, and higher accuracy on high-cardinality, high-dimensional tables.
    Abstract Learned cardinality estimation methods have achieved high precision compared to traditional methods. Among learned methods, query-driven approaches face the data and workload drift problem for a long time. Although both query-driven and hybrid methods are proposed to avoid this problem, even the state-of-the-art of them suffer from high training and estimation costs, limited scalability, instability, and long-tailed distribution problem on high cardinality and high-dimensional tables, which seriously affects the practical application of learned cardinality estimators. In this paper, we prove that most of these problems are directly caused by the widely used progressive sampling. We solve this problem by introducing predicates information into the autoregressive model and propose Duet, a stable, efficient, and scalable hybrid method to estimate cardinality directly without sampling or any non-differentiable process, which can not only reduces the inference complexity from O(n) to O(1) compared to Naru and UAE but also achieve higher accuracy on high cardinality and high-dimensional tables. Experimental results show that Duet can achieve all the design goals above and be much more practical and even has a lower inference cost on CPU than that of most learned methods on GPU.
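
The autoregressive idea behind such estimators can be written in one line: cardinality ≈ N · Π_i P(pred_i | pred_1..pred_{i-1}), with each conditional selectivity produced by a learned model. The sketch below uses a hard-coded stub in place of the neural model; the predicates and selectivities are made up.

```python
def conditional_selectivity(predicate, previous):
    """Stub for the learned autoregressive model: returns
    P(predicate holds | previous predicates hold)."""
    toy = {"age > 30": 0.4, "city = 'Oslo'": 0.05, "income > 50k": 0.6}
    return toy[predicate]  # a real model would condition on `previous`

def estimate_cardinality(n_rows, predicates):
    sel = 1.0
    for i, p in enumerate(predicates):
        sel *= conditional_selectivity(p, predicates[:i])
    return n_rows * sel

print(estimate_cardinality(1_000_000, ["age > 30", "city = 'Oslo'", "income > 50k"]))
# 1e6 * 0.4 * 0.05 * 0.6 = 12000.0
```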

Integrating processed-based models and machine learning for crop yield prediction

  • paper_url: http://arxiv.org/abs/2307.13466
  • repo_url: None
  • paper_authors: Michiel G. J. Kallenberg, Bernardo Maestrini, Ron van Bree, Paul Ravensbergen, Christos Pylianidis, Frits van Evert, Ioannis N. Athanasiadis
  • for: Predicting potato yield.
  • methods: A hybrid meta-modeling approach: a crop growth model generates synthetic data for (pre)training a convolutional neural network, which is then fine-tuned with observational data.
  • results: In silico, the meta-model outperforms a purely data-driven baseline; on real-world data it is competitive with the crop growth model on commercial fields (n=77), but on field trials (n=303) both models are outperformed by a simple linear regression with a hand-picked feature set, so further validation on extensive real-world datasets is recommended.
    Abstract Crop yield prediction typically involves the utilization of either theory-driven process-based crop growth models, which have proven to be difficult to calibrate for local conditions, or data-driven machine learning methods, which are known to require large datasets. In this work we investigate potato yield prediction using a hybrid meta-modeling approach. A crop growth model is employed to generate synthetic data for (pre)training a convolutional neural net, which is then fine-tuned with observational data. When applied in silico, our meta-modeling approach yields better predictions than a baseline comprising a purely data-driven approach. When tested on real-world data from field trials (n=303) and commercial fields (n=77), the meta-modeling approach yields competitive results with respect to the crop growth model. In the latter set, however, both models perform worse than a simple linear regression with a hand-picked feature set and dedicated preprocessing designed by domain experts. Our findings indicate the potential of meta-modeling for accurate crop yield prediction; however, further advancements and validation using extensive real-world datasets is recommended to solidify its practical effectiveness.
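
The meta-modeling recipe itself is simple to express: pretrain on abundant, cheap synthetic samples from the crop growth model, then fine-tune the same network on scarce observations, typically with a lower learning rate. A PyTorch sketch under those assumptions (random tensors stand in for simulator output and field data):

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))  # yield regressor
loss_fn = nn.MSELoss()

def fit(x, y, lr, epochs):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(x).squeeze(), y)
        loss.backward(); opt.step()

# 1) pretrain on synthetic data generated by the crop growth model
x_syn, y_syn = torch.randn(50_000, 8), torch.randn(50_000)
fit(x_syn, y_syn, lr=1e-3, epochs=20)

# 2) fine-tune on scarce field observations with a smaller learning rate
x_obs, y_obs = torch.randn(300, 8), torch.randn(300)
fit(x_obs, y_obs, lr=1e-4, epochs=50)
```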

Unlocking the Emotional World of Visual Media: An Overview of the Science, Research, and Impact of Understanding Emotion

  • paper_url: http://arxiv.org/abs/2307.13463
  • repo_url: None
  • paper_authors: James Z. Wang, Sicheng Zhao, Chenyan Wu, Reginald B. Adams, Michelle G. Newman, Tal Shafir, Rachelle Tsachor
  • for: Surveying the development of artificial emotional intelligence in computing and robotics, and how it is reshaping research on understanding emotion in visual media.
  • methods: A comprehensive, multidisciplinary overview drawing on insights from psychology, engineering, and the arts, covering the psychological foundations of emotion and the computational principles behind understanding emotions from images and videos.
  • results: Identifies the current technological challenges and limitations of automatically understanding evoked or expressed emotion in visual media, delineates pivotal directions for future inquiry, and examines the ethical ramifications and potential societal impacts of emotion-understanding technologies.
    Abstract The emergence of artificial emotional intelligence technology is revolutionizing the fields of computers and robotics, allowing for a new level of communication and understanding of human behavior that was once thought impossible. While recent advancements in deep learning have transformed the field of computer vision, automated understanding of evoked or expressed emotions in visual media remains in its infancy. This foundering stems from the absence of a universally accepted definition of "emotion", coupled with the inherently subjective nature of emotions and their intricate nuances. In this article, we provide a comprehensive, multidisciplinary overview of the field of emotion analysis in visual media, drawing on insights from psychology, engineering, and the arts. We begin by exploring the psychological foundations of emotion and the computational principles that underpin the understanding of emotions from images and videos. We then review the latest research and systems within the field, accentuating the most promising approaches. We also discuss the current technological challenges and limitations of emotion analysis, underscoring the necessity for continued investigation and innovation. We contend that this represents a "Holy Grail" research problem in computing and delineate pivotal directions for future inquiry. Finally, we examine the ethical ramifications of emotion-understanding technologies and contemplate their potential societal impacts. Overall, this article endeavors to equip readers with a deeper understanding of the domain of emotion analysis in visual media and to inspire further research and development in this captivating and rapidly evolving field.

Fundamental causal bounds of quantum random access memories

  • paper_url: http://arxiv.org/abs/2307.13460
  • repo_url: None
  • paper_authors: Yunfei Wang, Yuri Alexeev, Liang Jiang, Frederic T. Chong, Junyu Liu
  • for: Exploring the fundamental limits that quantum physics places on quantum random access memory (QRAM), to understand the long-term performance of quantum computing applications in data science.
  • methods: Derives intrinsic causality bounds on rapid quantum memories using relativistic quantum field theory and Lieb-Robinson bounds in quantum many-body systems, considering a hardware-efficient QRAM design in hybrid quantum acoustic systems.
  • results: Assuming clock cycle times of approximately 10^-3 seconds and a lattice spacing of about 1 micrometer, QRAM can accommodate up to O(10^7) logical qubits in 1 dimension, O(10^15) to O(10^20) in various 2D architectures, and O(10^24) in 3 dimensions; the causality bound broadly applies to other quantum hardware systems and suggests quantum memory designs for performance enhancement.
    Abstract Quantum devices should operate in adherence to quantum physics principles. Quantum random access memory (QRAM), a fundamental component of many essential quantum algorithms for tasks such as linear algebra, data search, and machine learning, is often proposed to offer $\mathcal{O}(\log N)$ circuit depth for $\mathcal{O}(N)$ data size, given $N$ qubits. However, this claim appears to breach the principle of relativity when dealing with a large number of qubits in quantum materials interacting locally. In our study we critically explore the intrinsic bounds of rapid quantum memories based on causality, employing the relativistic quantum field theory and Lieb-Robinson bounds in quantum many-body systems. In this paper, we consider a hardware-efficient QRAM design in hybrid quantum acoustic systems. Assuming clock cycle times of approximately $10^{-3}$ seconds and a lattice spacing of about 1 micrometer, we show that QRAM can accommodate up to $\mathcal{O}(10^7)$ logical qubits in 1 dimension, $\mathcal{O}(10^{15})$ to $\mathcal{O}(10^{20})$ in various 2D architectures, and $\mathcal{O}(10^{24})$ in 3 dimensions. We contend that this causality bound broadly applies to other quantum hardware systems. Our findings highlight the impact of fundamental quantum physics constraints on the long-term performance of quantum computing applications in data science and suggest potential quantum memory designs for performance enhancement.

Monte-Carlo Tree Search for Multi-Agent Pathfinding: Preliminary Results

  • paper_url: http://arxiv.org/abs/2307.13453
  • repo_url: None
  • paper_authors: Yelisey Pitanov, Alexey Skrynnik, Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov
  • for: Studying multi-agent pathfinding (MAPF): given a graph where each agent has unique start and goal vertices, find a set of collision-free paths, one per agent, such that every agent reaches its goal.
  • methods: Applies Monte-Carlo Tree Search (MCTS), which has excelled in domains such as game playing but had not been well studied for MAPF. The proposed MCTS variant computes rewards from individual paths that guide agents toward their goals while leaving them free to deviate to avoid collisions, and uses a dedicated decomposition technique to reduce the branching factor of the tree search.
  • results: The method outperforms a baseline planning algorithm that invokes heuristic search (e.g., A*) at each re-planning step.
    Abstract In this work we study a well-known and challenging problem of Multi-agent Pathfinding, when a set of agents is confined to a graph, each agent is assigned a unique start and goal vertices and the task is to find a set of collision-free paths (one for each agent) such that each agent reaches its respective goal. We investigate how to utilize Monte-Carlo Tree Search (MCTS) to solve the problem. Although MCTS was shown to demonstrate superior performance in a wide range of problems like playing antagonistic games (e.g. Go, Chess etc.), discovering faster matrix multiplication algorithms etc., its application to the problem at hand was not well studied before. To this end we introduce an original variant of MCTS, tailored to multi-agent pathfinding. The crux of our approach is how the reward, that guides MCTS, is computed. Specifically, we use individual paths to assist the agents with the the goal-reaching behavior, while leaving them freedom to get off the track if it is needed to avoid collisions. We also use a dedicated decomposition technique to reduce the branching factor of the tree search procedure. Empirically we show that the suggested method outperforms the baseline planning algorithm that invokes heuristic search, e.g. A*, at each re-planning step.
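
For readers unfamiliar with MCTS, the sketch below shows its generic selection / expansion / rollout / backpropagation loop with UCB1 selection. The `env` interface is hypothetical, and the random rollout is where the paper's path-guided reward shaping would go.

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = {}, 0, 0.0

def ucb(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)

def mcts(root, env, iterations=1000):
    for _ in range(iterations):
        node = root
        # 1) selection: descend via UCB through fully expanded nodes
        while node.children and len(node.children) == len(env.actions(node.state)):
            node = max(node.children.values(), key=lambda ch: ucb(ch, node))
        # 2) expansion: add one untried action
        untried = [a for a in env.actions(node.state) if a not in node.children]
        if untried:
            a = random.choice(untried)
            node.children[a] = node = Node(env.step(node.state, a), parent=node)
        # 3) rollout (here: random; the paper shapes this with individual agent paths)
        reward = env.rollout(node.state)
        # 4) backpropagation
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```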

A behavioural transformer for effective collaboration between a robot and a non-stationary human

  • paper_url: http://arxiv.org/abs/2307.13447
  • repo_url: None
  • paper_authors: Ruaridh Mon-Williams, Theodoros Stouraitis, Sethu Vijayakumar
  • for: This paper aims to address the challenges of human-robot collaboration in non-stationary environments, where human behavior changes over time.
  • methods: The authors propose a principled meta-learning framework and develop a conditional transformer called Behaviour-Transform (BeTrans) to adapt to new human agents with non-stationary behaviors.
  • results: BeTrans effectively collaborates with simulated human agents and adapts faster to non-stationary simulated human agents than state-of-the-art techniques.
    Abstract A key challenge in human-robot collaboration is the non-stationarity created by humans due to changes in their behaviour. This alters environmental transitions and hinders human-robot collaboration. We propose a principled meta-learning framework to explore how robots could better predict human behaviour, and thereby deal with issues of non-stationarity. On the basis of this framework, we developed Behaviour-Transform (BeTrans). BeTrans is a conditional transformer that enables a robot agent to adapt quickly to new human agents with non-stationary behaviours, due to its notable performance with sequential data. We trained BeTrans on simulated human agents with different systematic biases in collaborative settings. We used an original customisable environment to show that BeTrans effectively collaborates with simulated human agents and adapts faster to non-stationary simulated human agents than SOTA techniques.

On the Learning Dynamics of Attention Networks

  • paper_url: http://arxiv.org/abs/2307.13421
  • repo_url: https://github.com/vashisht-rahul/on-the-learning-dynamics-of-attention-networks
  • paper_authors: Rahul Vashisht, Harish G. Ramaswamy
  • for: Investigating how the three standard loss functions used to train attention models (soft attention, hard attention, and latent variable marginal likelihood (LVML) attention) shape learning dynamics and final results.
  • methods: Trains attention models under the three loss paradigms and analyzes them in a simple setting, deriving closed-form expressions for the parameter trajectory under gradient flow when the focus model is fixed.
  • results: The paradigms produce distinct dynamics: with the soft attention loss, the focus model improves quickly at initialization and sputters later, while the hard attention loss behaves in the opposite fashion. Based on these observations, a simple hybrid approach that combines the advantages of the different losses is proposed and demonstrated on semi-synthetic and real-world datasets.
    Abstract Attention models are typically learned by optimizing one of three standard loss functions that are variously called -- soft attention, hard attention, and latent variable marginal likelihood (LVML) attention. All three paradigms are motivated by the same goal of finding two models -- a `focus' model that `selects' the right \textit{segment} of the input and a `classification' model that processes the selected segment into the target label. However, they differ significantly in the way the selected segments are aggregated, resulting in distinct dynamics and final results. We observe a unique signature of models learned using these paradigms and explain this as a consequence of the evolution of the classification model under gradient descent when the focus model is fixed. We also analyze these paradigms in a simple setting and derive closed-form expressions for the parameter trajectory under gradient flow. With the soft attention loss, the focus model improves quickly at initialization and splutters later on. On the other hand, hard attention loss behaves in the opposite fashion. Based on our observations, we propose a simple hybrid approach that combines the advantages of the different loss functions and demonstrates it on a collection of semi-synthetic and real-world datasets
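
The two extremes are easy to see side by side: soft attention aggregates all segments with softmax weights and is differentiable end-to-end, while hard attention commits to one sampled segment and needs score-function gradients (e.g., REINFORCE) for the focus model. A PyTorch sketch of just the aggregation step, with made-up shapes:

```python
import torch

scores = torch.randn(4, 10)            # focus-model scores for 10 input segments
values = torch.randn(4, 10, 64)        # segment representations

# soft attention: convex combination of all segments (differentiable end-to-end)
w = torch.softmax(scores, dim=-1)                       # (4, 10)
soft_out = torch.einsum("bn,bnd->bd", w, values)        # (4, 64)

# hard attention: sample one segment per example; gradients for the focus
# model must come from a score-function estimator such as REINFORCE
dist = torch.distributions.Categorical(logits=scores)
idx = dist.sample()                                     # (4,)
hard_out = values[torch.arange(4), idx]                 # (4, 64)
log_prob = dist.log_prob(idx)                           # used in REINFORCE updates
```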

Synthesis of Procedural Models for Deterministic Transition Systems

  • paper_url: http://arxiv.org/abs/2307.14368
  • repo_url: None
  • paper_authors: Javier Segovia-Aguas, Jonathan Ferrer-Mestres, Sergio Jiménez
  • for: Proposing a general approach for synthesizing procedural models of the state transitions of a given discrete system.
  • methods: An inductive approach: given example state transitions represented as (pre-state, action, post-state) tuples, a combinatorial search is run over well-structured terminating programs for a Random-Access Machine (RAM) with a minimalist instruction set and finite memory, guided by functions that assess candidate programs' complexity and their fitness to the examples.
  • results: The approach accepts different target languages for modeling state transitions; tasks such as synthesizing STRIPS action models or the update rule of a cellular automaton fit as particular instances.
    Abstract This paper introduces a general approach for synthesizing procedural models of the state-transitions of a given discrete system. The approach is general in that it accepts different target languages for modeling the state-transitions of a discrete system; different model acquisition tasks with different target languages, such as the synthesis of STRIPS action models, or the update rule of a cellular automaton, fit as particular instances of our general approach. We follow an inductive approach to synthesis meaning that a set of examples of state-transitions, represented as (pre-state, action, post-state) tuples, are given as input. The goal is to synthesize a structured program that, when executed on a given pre-state, outputs its associated post-state. Our synthesis method implements a combinatorial search in the space of well-structured terminating programs that can be built using a Random-Access Machine (RAM), with a minimalist instruction set, and a finite amount of memory. The combinatorial search is guided with functions that asses the complexity of the candidate programs, as well as their fitness to the given input set of examples.
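
The combinatorial search can be illustrated with a drastically simplified instruction set over a single integer register: enumerate instruction sequences in order of increasing length and return the first program consistent with all (pre-state, post-state) examples of an action. The DSL below is an assumption for illustration, far smaller than the paper's RAM programs.

```python
from itertools import product

OPS = {"inc": lambda x: x + 1, "dec": lambda x: x - 1, "dbl": lambda x: 2 * x}

def run(program, state):
    for op in program:
        state = OPS[op](state)
    return state

def synthesize(examples, max_len=4):
    """Shortest instruction sequence consistent with all (pre, post) examples."""
    for length in range(1, max_len + 1):
        for program in product(OPS, repeat=length):
            if all(run(program, pre) == post for pre, post in examples):
                return program
    return None

# observed state transitions of one action: post = 2 * (pre + 1)
print(synthesize([(1, 4), (3, 8), (0, 2)]))  # ('inc', 'dbl')
```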

A short review of the main concerns in A.I. development and application within the public sector supported by NLP and TM

  • paper_url: http://arxiv.org/abs/2308.02042
  • repo_url: None
  • paper_authors: Carlos Ferreira
  • for: Capturing research trends on data privacy, ethics, interpretability, explainability, trustworthiness, and fairness in public-sector AI applications.
  • methods: Uses fundamental Natural Language Processing (NLP) and Text Mining (TM) concepts to query and analyze conference papers published in the ACM Digital Library and IEEE Xplore over the last two years, which saved analysis time and retrieved papers containing relevant information.
  • results: Fairness was the most frequent concern; data privacy was the least prominent topic (although embedded in most articles), while trustworthiness was the most prominent.
    Abstract Artificial Intelligence is not a new subject, and business, industry and public sectors have used it in different ways and contexts and considering multiple concerns. This work reviewed research papers published in ACM Digital Library and IEEE Xplore conference proceedings in the last two years supported by fundamental concepts of Natural Language Processing (NLP) and Text Mining (TM). The objective was to capture insights regarding data privacy, ethics, interpretability, explainability, trustworthiness, and fairness in the public sector. The methodology has saved analysis time and could retrieve papers containing relevant information. The results showed that fairness was the most frequent concern. The least prominent topic was data privacy (although embedded in most articles), while the most prominent was trustworthiness. Finally, gathering helpful insights about those concerns regarding A.I. applications in the public sector was also possible.

Towards Bridging the Digital Language Divide

  • paper_url: http://arxiv.org/abs/2307.13405
  • repo_url: None
  • paper_authors: Gábor Bella, Paula Helm, Gertraud Koch, Fausto Giunchiglia
  • for: Extending AI language technology to under-resourced languages.
  • methods: Examines linguistic bias, the hardwired yet usually involuntary and hidden representational preference of multilingual language processing systems for certain languages, and revises research and development methodologies to reduce it.
  • results: Presents a new initiative for building diversity-aware language resources that aims to reduce linguistic bias through both technological design and methodology, based on eye-level collaboration with local communities.
    Abstract It is a well-known fact that current AI-based language technology -- language models, machine translation systems, multilingual dictionaries and corpora -- focuses on the world's 2-3% most widely spoken languages. Recent research efforts have attempted to expand the coverage of AI technology to `under-resourced languages.' The goal of our paper is to bring attention to a phenomenon that we call linguistic bias: multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden representational preference towards certain languages. Linguistic bias is manifested in uneven per-language performance even in the case of similar test conditions. We show that biased technology is often the result of research and development methodologies that do not do justice to the complexity of the languages being represented, and that can even become ethically problematic as they disregard valuable aspects of diversity as well as the needs of the language communities themselves. As our attempt at building diversity-aware language resources, we present a new initiative that aims at reducing linguistic bias through both technological design and methodology, based on an eye-level collaboration with local communities.

Predicting Code Coverage without Execution

  • paper_url: http://arxiv.org/abs/2307.13383
  • repo_url: https://github.com/microsoft/coverage-eval
  • paper_authors: Michele Tufano, Shubham Chandel, Anisha Agarwal, Neel Sundaresan, Colin Clement
  • for: This paper proposes a novel task, Code Coverage Prediction, to evaluate how well large language models (LLMs) understand code execution.
  • methods: The task amortizes the cost of computing code coverage with machine learning, requiring only the source code context; the COVERAGEEVAL dataset is curated by executing tests and code from the HumanEval dataset and collecting code coverage information.
  • results: The paper reports the performance of four state-of-the-art LLMs (OpenAI's GPT-4 and GPT-3.5-Turbo, Google's BARD, and Anthropic's Claude) on the task, and argues that code coverage is valuable both as a metric and as a pre-training data source for overall LLM performance on software engineering tasks.
    Abstract Code coverage is a widely used metric for quantifying the extent to which program elements, such as statements or branches, are executed during testing. Calculating code coverage is resource-intensive, requiring code building and execution with additional overhead for the instrumentation. Furthermore, computing coverage of any snippet of code requires the whole program context. Using Machine Learning to amortize this expensive process could lower the cost of code coverage by requiring only the source code context, and the task of code coverage prediction can be a novel benchmark for judging the ability of models to understand code. We propose a novel benchmark task called Code Coverage Prediction for Large Language Models (LLMs). We formalize this task to evaluate the capability of LLMs in understanding code execution by determining which lines of a method are executed by a given test case and inputs. We curate and release a dataset we call COVERAGEEVAL by executing tests and code from the HumanEval dataset and collecting code coverage information. We report the performance of four state-of-the-art LLMs used for code-related tasks, including OpenAI's GPT-4 and GPT-3.5-Turbo, Google's BARD, and Anthropic's Claude, on the Code Coverage Prediction task. Finally, we argue that code coverage as a metric and pre-training data source are valuable for overall LLM performance on software engineering tasks.
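
Ground-truth labels for such a task can be collected with Python's tracing hooks; the sketch below (ours, not the COVERAGEEVAL pipeline; `clip` is a toy stand-in for a HumanEval solution) marks which lines a given test input executes:

```python
import inspect
import sys

def collect_coverage(fn, *args):
    """Record which source lines of fn execute for the given inputs."""
    executed, code = set(), fn.__code__
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is code:
            executed.add(frame.f_lineno)
        return tracer
    sys.settrace(tracer)
    try:
        fn(*args)
    finally:
        sys.settrace(None)
    return executed

def clip(x, lo, hi):          # toy method standing in for a HumanEval solution
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

lines, start = inspect.getsourcelines(clip)
hit = collect_coverage(clip, 5, 0, 3)     # test case where x is above hi
for offset, src in enumerate(lines):
    mark = ">" if start + offset in hit else " "
    print(mark, src.rstrip())             # ">" marks executed lines
```

A model solving the prediction task would have to produce these per-line marks from the source and test alone, without running anything.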

Empower Your Model with Longer and Better Context Comprehension

  • paper_url: http://arxiv.org/abs/2307.13365
  • repo_url: https://github.com/yileijin/attention-transition
  • paper_authors: Yifei Gao, Lei Wang, Jun Fang, Longhua Hu, Jun Cheng
  • for: improving LLMs' comprehension of longer and more complex contexts so they can be applied more effectively in real-world scenarios
  • methods: a new technique called Attention Transition, which strengthens information transfer within the model so that it comprehends longer contexts without additional training or any impact on generation fluency
  • results: experiments on the XSum dataset with LLaMa-7b, evaluated by GPT-4, show substantial improvements, demonstrating the effectiveness of Attention Transition
    Abstract Recently, with the emergence of numerous Large Language Models (LLMs), the implementation of AI has entered a new era. Irrespective of these models' own capacity and structure, there is a growing demand for LLMs to possess enhanced comprehension of longer and more complex contexts at relatively small model sizes. Models often encounter an upper limit when processing sequences of sentences that extend beyond their comprehension capacity, resulting in off-topic or even chaotic responses. While several recent works attempt to address this issue in various ways, they rarely focus on "why models are unable to compensate or strengthen their capabilities on their own". In this paper, we thoroughly investigate the nature of information transfer within LLMs and propose a novel technique called Attention Transition. This technique empowers models to achieve longer and better context comprehension with minimal additional training or impact on generation fluency. Our experiments are conducted on the challenging XSum dataset using the LLaMa-7b model with context token lengths ranging from 800 to 1900. Results demonstrate that we achieve substantial improvements compared with the original generation results, as evaluated by GPT4.

Do humans and Convolutional Neural Networks attend to similar areas during scene classification: Effects of task and image type

  • paper_url: http://arxiv.org/abs/2307.13345
  • repo_url: None
  • paper_authors: Romy Müller, Marcel Duerschmidt, Julian Ullrich, Carsten Knoll, Sascha Weber, Steffen Seitz
  • for: This study investigates which factors determine whether deep learning models such as Convolutional Neural Networks (CNNs) attend to image areas similar to those humans attend to. Previous studies have focused mainly on technological factors, and little is known about the role of factors affecting human attention.
  • methods: Human attention maps were elicited with tasks of varying intentionality, ranging from spontaneous gaze during categorization over intentional gaze-pointing up to manual area selection. The type of image to be categorized was varied as well: singular salient objects, indoor scenes consisting of object arrangements, or landscapes without distinct objects defining the category.
  • results: The influence of the human task strongly depended on image type. For objects, human manual selection produced maps most similar to CNN attention, while the specific eye-movement task had little impact; for indoor scenes, spontaneous gaze produced the least similarity; for landscapes, similarity was equally low across all human tasks. These results highlight the importance of taking human factors into account when comparing the attention of humans and CNNs.
    Abstract Deep Learning models like Convolutional Neural Networks (CNN) are powerful image classifiers, but what factors determine whether they attend to similar image areas as humans do? While previous studies have focused on technological factors, little is known about the role of factors that affect human attention. In the present study, we investigated how the tasks used to elicit human attention maps interact with image characteristics in modulating the similarity between humans and CNN. We varied the intentionality of human tasks, ranging from spontaneous gaze during categorization over intentional gaze-pointing up to manual area selection. Moreover, we varied the type of image to be categorized, using either singular, salient objects, indoor scenes consisting of object arrangements, or landscapes without distinct objects defining the category. The human attention maps generated in this way were compared to the CNN attention maps revealed by explainable artificial intelligence (Grad-CAM). The influence of human tasks strongly depended on image type: For objects, human manual selection produced maps that were most similar to CNN, while the specific eye movement task has little impact. For indoor scenes, spontaneous gaze produced the least similarity, while for landscapes, similarity was equally low across all human tasks. To better understand these results, we also compared the different human attention maps to each other. Our results highlight the importance of taking human factors into account when comparing the attention of humans and CNN.
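
For the model side of such comparisons, a minimal Grad-CAM-style sketch (our illustration, not the authors' code; the ResNet-50 backbone and random input are stand-ins for their CNNs and scene images):

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

# Untrained weights keep the sketch self-contained; swap in pretrained
# weights for meaningful attention maps.
model = resnet50(weights=None).eval()
feats = {}

def fwd_hook(module, inputs, output):
    feats["maps"] = output        # feature maps of the last conv block
    output.retain_grad()          # keep their gradient after backward

model.layer4.register_forward_hook(fwd_hook)

x = torch.randn(1, 3, 224, 224)    # stand-in for a preprocessed scene image
score = model(x)[0].max()          # score of the predicted class
score.backward()

g = feats["maps"].grad
w = g.mean(dim=(2, 3), keepdim=True)            # channel-wise pooled gradients
cam = F.relu((w * feats["maps"]).sum(dim=1))    # weighted sum over channels
cam = F.interpolate(cam[None], size=x.shape[-2:], mode="bilinear")[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
print(cam.shape)   # a 224x224 map comparable pixel-wise to a human attention map
```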

Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions

  • paper_url: http://arxiv.org/abs/2307.13339
  • repo_url: None
  • paper_authors: Skyler Wu, Eric Meng Shen, Charumathi Badrinath, Jiaqi Ma, Himabindu Lakkaraju
  • for: explaining why chain-of-thought (CoT) prompting empirically improves the accuracy of large language models (LLMs) on various question answering tasks
  • methods: gradient-based feature attribution methods that produce saliency scores capturing the influence of input tokens on model output
  • results: CoT prompting does not increase the magnitude of saliency scores attributed to semantically relevant tokens in the prompt, but it increases the robustness of saliency scores to question perturbations and variations in model output
    Abstract Chain-of-thought (CoT) prompting has been shown to empirically improve the accuracy of large language models (LLMs) on various question answering tasks. While understanding why CoT prompting is effective is crucial to ensuring that this phenomenon is a consequence of desired model behavior, little work has addressed this; nonetheless, such an understanding is a critical prerequisite for responsible model deployment. We address this question by leveraging gradient-based feature attribution methods which produce saliency scores that capture the influence of input tokens on model output. Specifically, we probe several open-source LLMs to investigate whether CoT prompting affects the relative importances they assign to particular input tokens. Our results indicate that while CoT prompting does not increase the magnitude of saliency scores attributed to semantically relevant tokens in the prompt compared to standard few-shot prompting, it increases the robustness of saliency scores to question perturbations and variations in model output.
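
A minimal gradient-times-input saliency sketch in the spirit of the attribution methods described (our illustration; GPT-2 and the toy CoT prompt are stand-ins for the open-source LLMs and prompts actually probed):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Q: Roger has 5 balls and buys 2 more. How many? A: Let's think step by step."
ids = tok(prompt, return_tensors="pt").input_ids

# Differentiate w.r.t. the input embeddings instead of discrete token ids.
emb = model.get_input_embeddings()(ids).detach().requires_grad_(True)
logits = model(inputs_embeds=emb).logits
score = logits[0, -1].max()        # logit of the most likely next token
score.backward()

# Gradient-x-input saliency: one score per input token.
saliency = (emb.grad * emb).sum(dim=-1).abs()[0]
for t, s in zip(tok.convert_ids_to_tokens(ids[0].tolist()), saliency.tolist()):
    print(f"{t:>12s}  {s:.4f}")
```

Comparing these scores under a standard few-shot prompt versus a CoT prompt, and under perturbed questions, is the kind of probe the paper reports.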

The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation

  • paper_url: http://arxiv.org/abs/2307.13332
  • repo_url: None
  • paper_authors: Philip Amortila, Nan Jiang, Csaba Szepesvári
  • for: This paper studies approximation factors in reinforcement learning (RL), i.e., how the misspecification error of function approximation blows up in value function estimation.
  • methods: Linear off-policy value function estimation is analyzed across a broad spectrum of settings, such as the weighted $L_2$-norm (with the offline state distribution as the weighting), the $L_\infty$ norm, the presence vs. absence of state aliasing, and full vs. partial coverage of the state space.
  • results: The optimal asymptotic approximation factors (up to constants) are established for all of these settings; the bounds identify two instance-dependent factors for the $L_2(\mu)$ norm and only one for the $L_\infty$ norm, which dictate the hardness of off-policy evaluation under misspecification.
    Abstract Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation. Yet, the nature of such \emph{approximation factors} -- especially their optimal form in a given learning problem -- is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain. We study the approximation factor in a broad spectrum of settings, such as with the weighted $L_2$-norm (where the weighting is the offline state distribution), the $L_\infty$ norm, the presence vs. absence of state aliasing, and full vs. partial coverage of the state space. We establish the optimal asymptotic approximation factors (up to constants) for all of these settings. In particular, our bounds identify two instance-dependent factors for the $L_2(\mu)$ norm and only one for the $L_\infty$ norm, which are shown to dictate the hardness of off-policy evaluation under misspecification.
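
One common way to state such a guarantee (our notation, consistent with the abstract's terminology but not necessarily the paper's exact definition):

```latex
% \hat v : the value estimate returned by the method;   v^{\pi} : the target value function;
% \Phi   : the linear feature class;   \mu : the offline state distribution.
\big\| \hat v - v^{\pi} \big\|_{L_2(\mu)}
  \;\le\; \alpha \cdot \inf_{\theta} \big\| \Phi\theta - v^{\pi} \big\|_{L_2(\mu)}
  \;+\; \varepsilon_{\mathrm{stat}}
```

The smallest $\alpha$ achievable as the statistical error $\varepsilon_{\mathrm{stat}}$ vanishes is the asymptotic approximation factor, which the paper characterizes up to constants in each setting listed above.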

2-Level Reinforcement Learning for Ships on Inland Waterways

  • paper_url: http://arxiv.org/abs/2307.16769
  • repo_url: https://github.com/marwaltz/tud_rl
  • paper_authors: Martin Waltz, Niklas Paulig, Ostap Okhrin
  • for: controlling autonomous surface vehicles (ASVs) on inland waterways (IWs) with deep reinforcement learning (DRL)
  • methods: a modularized two-level framework with a high-level local path planning (LPP) unit and a low-level path following (PF) unit, each consisting of a DRL agent. The LPP agent plans a path under consideration of nearby vessels, traffic rules, and the geometry of the waterway, while the PF agent performs low-level actuator control while accounting for shallow water influences and the environmental forces of winds, waves, and currents.
  • results: both agents are thoroughly validated in simulation, using the lower Elbe in northern Germany as an example case and real AIS trajectories to model the behavior of other ships
    Abstract This paper proposes a realistic modularized framework for controlling autonomous surface vehicles (ASVs) on inland waterways (IWs) based on deep reinforcement learning (DRL). The framework comprises two levels: a high-level local path planning (LPP) unit and a low-level path following (PF) unit, each consisting of a DRL agent. The LPP agent is responsible for planning a path under consideration of nearby vessels, traffic rules, and the geometry of the waterway. We thereby leverage a recently proposed spatial-temporal recurrent neural network architecture, which is transferred to continuous action spaces. The PF agent is responsible for low-level actuator control while accounting for shallow water influences on the marine craft and the environmental forces winds, waves, and currents. Both agents are thoroughly validated in simulation, employing the lower Elbe in northern Germany as an example case and using real AIS trajectories to model the behavior of other ships.
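
The two-level interplay can be sketched as follows (hypothetical interfaces; in the paper both policies are trained DRL networks and the vessel dynamics include shallow-water and environmental effects):

```python
import math

def lpp_policy(own_state, nearby_vessels, waterway):
    """High-level agent: returns the next local waypoint (x, y)."""
    return (own_state["x"] + 50.0, own_state["y"])          # dummy waypoint

def pf_policy(own_state, waypoint):
    """Low-level agent: returns a bounded rudder command toward the waypoint."""
    heading_to_wp = math.atan2(waypoint[1] - own_state["y"],
                               waypoint[0] - own_state["x"])
    return max(-0.35, min(0.35, heading_to_wp - own_state["heading"]))

own_state = {"x": 0.0, "y": 0.0, "heading": 0.0}
for step in range(200):
    if step % 20 == 0:   # LPP replans at a lower frequency than PF acts
        waypoint = lpp_policy(own_state, nearby_vessels=[], waterway=None)
    rudder = pf_policy(own_state, waypoint)
    # (apply rudder to the marine-craft dynamics and read back the state here)
    own_state["heading"] += 0.1 * rudder
    own_state["x"] += math.cos(own_state["heading"])
    own_state["y"] += math.sin(own_state["heading"])
```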

Learning Autonomous Ultrasound via Latent Task Representation and Robotic Skills Adaptation

  • paper_url: http://arxiv.org/abs/2307.13323
  • repo_url: None
  • paper_authors: Xutian Deng, Junnan Jiang, Wen Cheng, Miao Li
  • for: improving the autonomy, accuracy, and efficiency of robotic ultrasound scanning
  • methods: a latent task representation that merges multimodal ultrasound skills (clinically demonstrated images, probe orientations, and contact forces) into a low-dimensional probability model via a fully self-supervised framework, combined with online robotic skills adaptation
  • results: experiments show the proposed method can generate complex ultrasound strategies for diverse populations and achieves significantly better quantitative results than the previous method
    Abstract As medical ultrasound is becoming a prevailing examination approach nowadays, robotic ultrasound systems can facilitate the scanning process and prevent professional sonographers from repetitive and tedious work. Despite the recent progress, it is still a challenge to enable robots to autonomously accomplish the ultrasound examination, which is largely due to the lack of a proper task representation method, and also an adaptation approach to generalize learned skills across different patients. To solve these problems, we propose the latent task representation and the robotic skills adaptation for autonomous ultrasound in this paper. During the offline stage, the multimodal ultrasound skills are merged and encapsulated into a low-dimensional probability model through a fully self-supervised framework, which takes clinically demonstrated ultrasound images, probe orientations, and contact forces into account. During the online stage, the probability model will select and evaluate the optimal prediction. For unstable singularities, the adaptive optimizer fine-tunes them to near and stable predictions in high-confidence regions. Experimental results show that the proposed approach can generate complex ultrasound strategies for diverse populations and achieve significantly better quantitative results than our previous method.

Towards Integrated Traffic Control with Operating Decentralized Autonomous Organization

  • paper_url: http://arxiv.org/abs/2308.03769
  • repo_url: None
  • paper_authors: Shengyue Yao, Jingru Yu, Yi Yu, Jia Xu, Xingyuan Dai, Honghai Li, Fei-Yue Wang, Yilun Lin
  • for: enabling integrated control of intelligent traffic systems (ITS) that considers both optimality and scalability across plentiful heterogeneous intelligent agents
  • methods: an integrated control method based on the framework of a Decentralized Autonomous Organization (DAO), which reaches a global consensus on energy consumption efficiency (ECE) while optimizing the local objectives of all involved agents through a consensus and incentive mechanism; an operation algorithm is further proposed to address the structural rigidity of DAOs by identifying critical agents to execute the smart contract
  • results: numerical experiments show that the controlled agents reach consensus on the global objective faster, with improved local objectives, compared to existing decentralized control methods, indicating great potential for integrated control in ITS
    Abstract With the growing complexity of intelligent traffic systems (ITS), an integrated control of ITS capable of considering plentiful heterogeneous intelligent agents is desired. However, existing control methods based on centralized or decentralized schemes have not demonstrated the ability to consider optimality and scalability simultaneously. To address this issue, we propose an integrated control method based on the framework of Decentralized Autonomous Organization (DAO). The proposed method achieves a global consensus on energy consumption efficiency (ECE) while optimizing the local objectives of all involved intelligent agents, through a consensus and incentive mechanism. Furthermore, an operation algorithm is proposed to address the issue of structural rigidity in DAO. Specifically, the proposed operation approach identifies critical agents to execute the smart contract in the DAO, which ultimately extends the capability of DAO-based control. In addition, a numerical experiment is designed to examine the performance of the proposed method. The experimental results indicate that the controlled agents can reach a consensus faster on the global objective, with improved local objectives, by the proposed method, compared to existing decentralized control methods. In general, the proposed method shows great potential for developing an integrated control system in the ITS.

Word Sense Disambiguation as a Game of Neurosymbolic Darts

  • paper_url: http://arxiv.org/abs/2307.16663
  • repo_url: None
  • paper_authors: Tiansi Dong, Rafet Sifa
  • for: This study proposes a novel neurosymbolic methodology for Word Sense Disambiguation (WSD), one of the hardest tasks in natural language understanding and knowledge engineering.
  • methods: A neurosymbolic sense embedding based on a configuration of nested balls in n-dimensional space: the centre point of a ball well-preserves the word embedding, while inclusion relations among balls precisely encode symbolic hypernym relations among senses, enabling simple logic deduction among sense embeddings. A Transformer is trained to map a contextualized word embedding to its sense ball embedding.
  • results: Using pre-trained n-ball embeddings, experiments on the benchmark WSD corpus achieve F1 scores ranging from 90.1% to 100.0%, suggesting the potential to break the glass ceiling of deep-learning approaches for WSD.
    Abstract Word Sense Disambiguation (WSD) is one of the hardest tasks in natural language understanding and knowledge engineering. The glass ceiling of 80% F1 score was recently achieved through supervised deep learning, enriched by a variety of knowledge graphs. Here, we propose a novel neurosymbolic methodology that is able to push the F1 score above 90%. The core of our methodology is a neurosymbolic sense embedding, in terms of a configuration of nested balls in n-dimensional space. The centre point of a ball well-preserves the word embedding, which partially fixes the locations of the balls. Inclusion relations among balls precisely encode symbolic hypernym relations among senses, and enable simple logic deduction among sense embeddings, which could not be realised before. We trained a Transformer to learn the mapping from a contextualized word embedding to its sense ball embedding, just like playing the game of darts (a game of shooting darts into a dartboard). A series of experiments are conducted by utilizing pre-trained n-ball embeddings, which cover around 70% of the training data and 75% of the testing data in the benchmark WSD corpus. The F1 scores in experiments range from 90.1% to 100.0% in all six groups of test datasets (each group has 4 testing sets with different sizes of n-ball embeddings). Our novel neurosymbolic methodology has the potential to break the ceiling of deep-learning approaches for WSD. Limitations and extensions of our current work are listed.
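
To illustrate the geometric encoding, a minimal sketch of n-ball containment (our toy 2-D configuration; actual n-ball embeddings live in much higher dimensions and are fitted to real sense hierarchies):

```python
import numpy as np

# A sense is an n-ball (center, radius); containment encodes hypernymy.
def contains(outer, inner, eps=1e-9):
    """ball_outer contains ball_inner iff ||c_o - c_i|| + r_i <= r_o."""
    (c_o, r_o), (c_i, r_i) = outer, inner
    return np.linalg.norm(c_o - c_i) + r_i <= r_o + eps

# Toy configuration: animal ⊃ dog ⊃ poodle (hypothetical geometry).
animal = (np.array([0.0, 0.0]), 10.0)
dog    = (np.array([3.0, 0.0]), 4.0)
poodle = (np.array([4.0, 0.5]), 1.0)

print(contains(animal, dog))     # True:  dog IS-A animal
print(contains(dog, poodle))     # True:  poodle IS-A dog
print(contains(animal, poodle))  # True:  follows by transitivity of inclusion
print(contains(dog, animal))     # False: hypernymy is not symmetric
```

Because inclusion is transitive, chains of IS-A facts come for free from the geometry, which is the "simple logic deduction" the abstract refers to.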

Imperceptible Physical Attack against Face Recognition Systems via LED Illumination Modulation

  • paper_url: http://arxiv.org/abs/2307.13294
  • repo_url: None
  • paper_authors: Junbin Fang, Canjian Jiang, You Jiang, Puxi Lin, Zhaojie Chen, Yujing Sun, Siu-Ming Yiu, Zoe L. Jiang
  • for: This study proposes a practical, executable, inconspicuous, and low-computational adversarial attack based on LED illumination modulation, targeting data-driven face recognition systems.
  • methods: The attack generates luminance changes imperceptible to human eyes through fast intensity modulation of scene LED illumination, and exploits the rolling shutter effect of CMOS image sensors to implant luminance perturbations into the captured face images.
  • results: DoS attack success rates against the well-known face detection models Dlib, MTCNN, and RetinaFace reach 97.67%, 100%, and 100%, respectively, while dodging attack success rates against the face verification models Dlib, FaceNet, and ArcFace all reach 100%.
    Abstract Although face recognition is starting to play an important role in our daily life, we should note that data-driven face recognition vision systems are vulnerable to adversarial attacks. However, the current two categories of adversarial attacks, namely digital attacks and physical attacks, both have drawbacks: the former are impractical, while the latter are conspicuous, computationally heavy, and hard to execute. To address these issues, we propose a practical, executable, inconspicuous and low-computational adversarial attack based on LED illumination modulation. To fool the systems, the proposed attack generates luminance changes imperceptible to human eyes through fast intensity modulation of scene LED illumination, and uses the rolling shutter effect of CMOS image sensors in face recognition systems to implant luminance information perturbations into the captured face images. In summary, we present a denial-of-service (DoS) attack for face detection and a dodging attack for face verification. We also evaluate their effectiveness against the well-known face detection models Dlib, MTCNN and RetinaFace, and the face verification models Dlib, FaceNet, and ArcFace. The extensive experiments show that the success rates of DoS attacks against face detection models reach 97.67%, 100%, and 100%, respectively, and the success rates of dodging attacks against all face verification models reach 100%.
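
A simple simulation of the underlying rolling-shutter mechanism (our illustration; the modulation frequency, row readout time, and modulation depth are made-up parameters):

```python
import numpy as np

# Each row of a CMOS rolling-shutter sensor is exposed at a slightly
# different time, so a fast temporal LED modulation becomes spatial stripes.
H, W = 480, 640
face = np.full((H, W), 0.5)           # stand-in for a captured face image

f_mod = 3000.0                         # LED modulation frequency (Hz)
t_row = 30e-6                          # row readout time (s)
depth = 0.05                           # modulation depth, imperceptible to eyes

row_times = np.arange(H) * t_row
luminance = 1.0 + depth * np.sign(np.sin(2 * np.pi * f_mod * row_times))

captured = face * luminance[:, None]   # each row sees a different luminance
print(captured[:12, 0])                # stripe pattern emerges down the rows
```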

Reinforcement Learning-based Adaptation and Scheduling Methods for Multi-source DASH

  • paper_url: http://arxiv.org/abs/2308.11621
  • repo_url: https://github.com/ntnghia1908/Master_Thesis
  • paper_authors: Nghia T. Nguyen, Long Luu, Phuong L. Vo, Thi Thanh Sang Nguyen, Cuong T. Do, Ngoc-thanh Nguyen
  • for: optimizing the quality of experience (QoE) of multi-source video streaming, where video chunks may arrive out of order due to differing network path conditions
  • methods: two RL algorithms for streaming from multiple sources: RL-based adaptation with greedy scheduling (RLAGS) and RL-based adaptation and scheduling (RLAS), together with a simulation environment for training and evaluation
  • results: extensive simulations with real-trace data validate the efficiency of the proposed algorithms
    Abstract Dynamic adaptive streaming over HTTP (DASH) has been widely used in video streaming recently. In DASH, the client downloads video chunks in order from a server. The rate adaptation function at the video client enhances the user's quality-of-experience (QoE) by choosing a suitable quality level for each video chunk to download based on the network condition. Today's networks, such as content delivery networks, edge caching networks, and content-centric networks, usually replicate video contents on multiple cache nodes. We study video streaming from multiple sources in this work. In multi-source streaming, video chunks may arrive out of order due to different conditions of the network paths. Hence, to guarantee a high QoE, the video client needs not only rate adaptation but also chunk scheduling. Reinforcement learning (RL) has emerged as the state-of-the-art control method in various fields in recent years. This paper proposes two algorithms for streaming from multiple sources: RL-based adaptation with greedy scheduling (RLAGS) and RL-based adaptation and scheduling (RLAS). We also build a simulation environment for training and evaluation. The efficiency of the proposed algorithms is proved via extensive simulations with real-trace data.

Curvature-based Transformer for Molecular Property Prediction

  • paper_url: http://arxiv.org/abs/2307.13275
  • repo_url: None
  • paper_authors: Yili Chen, Zhengyu Li, Zheng Wan, Hui Yu, Xian Wei
  • for: improving molecular property prediction for artificial intelligence-based drug design
  • methods: introducing a discretization of Ricci curvature into Graph Transformer models to strengthen their ability to extract structural information from molecular graph data; the curvature information is added to the node features as a positional encoding during attention-score calculation, without changing the original network architecture
  • results: experiments on chemical molecular datasets including PCQM4M-LST and MoleculeNet, compared against models such as Uni-Mol and Graphormer, show state-of-the-art results, and demonstrate that the discretized Ricci curvature reflects structural and functional relationships while describing the local geometry of molecular graphs
    Abstract The prediction of molecular properties is one of the most important and challenging tasks in the field of artificial intelligence-based drug design. Among the current mainstream methods, the most commonly used feature representations for training DNN models are based on SMILES and molecular graphs; although these representations are concise and effective, they also limit the ability to capture spatial information. In this work, we propose the Curvature-based Transformer to improve the ability of Graph Transformer neural network models to extract structural information from molecular graph data by introducing a discretization of Ricci curvature. To embed the curvature in the model, we add the curvature information of the graph to the node features as a positional encoding during the attention-score calculation. This method can introduce curvature information from graph data without changing the original network architecture, and it has the potential to be extended to other models. We performed experiments on chemical molecular datasets including PCQM4M-LST and MoleculeNet, compared with models such as Uni-Mol and Graphormer, and the results show that this method can achieve state-of-the-art results. It is shown that the discretized Ricci curvature also reflects the structural and functional relationships while describing the local geometry of the graph molecular data.
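
As an illustration of the idea, the following sketch (ours; it substitutes the simple Forman curvature 4 - deg(u) - deg(v) for the paper's discretized Ricci curvature, and uses a toy graph) adds a curvature-derived positional encoding to node features before attention:

```python
import torch
import torch.nn as nn

# Toy molecular graph as an edge list over 4 atoms.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
n, d_model = 4, 16

deg = torch.zeros(n)
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

node_curv = torch.zeros(n)
for u, v in edges:                    # aggregate edge curvature per node
    c = 4.0 - deg[u] - deg[v]         # Forman curvature, a simple stand-in
    node_curv[u] += c / deg[u]
    node_curv[v] += c / deg[v]

# Project the scalar curvature to model width and add it to node features,
# so the attention-score calculation sees curvature as a positional encoding.
curv_proj = nn.Linear(1, d_model)
x = torch.randn(n, d_model)                    # node (atom) features
x = x + curv_proj(node_curv.unsqueeze(-1))     # curvature positional encoding
attn = torch.softmax(x @ x.T / d_model ** 0.5, dim=-1)
print(attn.shape)   # (4, 4) curvature-aware attention over atoms
```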

Unbiased Weight Maximization

  • paper_url: http://arxiv.org/abs/2307.13270
  • repo_url: None
  • paper_authors: Stephen Chung
  • for: This report studies a biologically plausible method for training artificial neural networks (ANNs): treating each unit as a stochastic reinforcement learning (RL) agent, so that the network is regarded as a team of agents, which aligns more closely with biologically observed forms of synaptic plasticity.
  • methods: The REINFORCE local learning rule is considered alongside Weight Maximization, which replaces each hidden unit's reward signal with the norm of its outgoing weights, so that each hidden unit maximizes the norm of its outgoing weights instead of a global reward signal.
  • results: A theoretical analysis of Weight Maximization and a proposed variant, Unbiased Weight Maximization, yield an unbiased learning rule that increases learning speed and improves asymptotic performance; to the authors' knowledge, this is the first learning rule for a network of Bernoulli-logistic units that is unbiased and scales well with the number of units in terms of learning speed.
    Abstract A biologically plausible method for training an Artificial Neural Network (ANN) involves treating each unit as a stochastic Reinforcement Learning (RL) agent, thereby considering the network as a team of agents. Consequently, all units can learn via REINFORCE, a local learning rule modulated by a global reward signal, which aligns more closely with biologically observed forms of synaptic plasticity. Nevertheless, this learning method is often slow and scales poorly with network size due to inefficient structural credit assignment, since a single reward signal is broadcast to all units without considering individual contributions. Weight Maximization, a proposed solution, replaces a unit's reward signal with the norm of its outgoing weight, thereby allowing each hidden unit to maximize the norm of the outgoing weight instead of the global reward signal. In this research report, we analyze the theoretical properties of Weight Maximization and propose a variant, Unbiased Weight Maximization. This new approach provides an unbiased learning rule that increases learning speed and improves asymptotic performance. Notably, to our knowledge, this is the first learning rule for a network of Bernoulli-logistic units that is unbiased and scales well with the number of network's units in terms of learning speed.
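
A loose sketch of the Weight Maximization signal on a toy parity task (our simplification; the paper's Unbiased Weight Maximization modifies this rule to remove bias, which is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
W1 = rng.normal(0, 0.1, (n_hid, n_in))   # input -> hidden weights
w2 = rng.normal(0, 0.1, n_hid)           # hidden -> output weights
lr = 0.05

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(2000):
    x = rng.integers(0, 2, n_in).astype(float)
    p_h = sigmoid(W1 @ x)
    h = (rng.random(n_hid) < p_h).astype(float)   # stochastic hidden units
    p_y = sigmoid(w2 @ h)
    y = float(rng.random() < p_y)                 # stochastic output unit

    target = float(x.sum() % 2)                   # toy task: parity of inputs
    reward = 1.0 if y == target else -1.0

    # Output unit: plain REINFORCE with the global reward.
    w2_new = w2 + lr * reward * (y - p_y) * h

    # Hidden units: the global reward is replaced by the change in the
    # squared norm of each unit's outgoing weight (Weight Maximization).
    r_hidden = w2_new ** 2 - w2 ** 2
    W1 += lr * (r_hidden * (h - p_h))[:, None] * x[None, :]
    w2 = w2_new
```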

LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition

  • paper_url: http://arxiv.org/abs/2307.13269
  • repo_url: https://github.com/sail-sg/lorahub
  • paper_authors: Chengsong Huang, Qian Liu, Bill Yuchen Lin, Tianyu Pang, Chao Du, Min Lin
  • for: investigating the composability of LoRA (low-rank adaptation) modules for cross-task generalization, i.e., achieving adaptable performance on unseen tasks
  • methods: LoraHub, a strategic framework for purposively assembling LoRA modules trained on diverse given tasks; with just a few examples from a novel task, multiple LoRA modules can be fluidly combined without human expertise
  • results: on the Big-Bench Hard (BBH) benchmark, LoraHub can mimic the performance of in-context learning in few-shot scenarios without requiring in-context examples alongside each inference input; notably, the composition requires neither additional model parameters nor gradients
    Abstract Low-rank adaptations (LoRA) are often employed to fine-tune large language models (LLMs) for new tasks. This paper investigates LoRA composability for cross-task generalization and introduces LoraHub, a strategic framework devised for the purposive assembly of LoRA modules trained on diverse given tasks, with the objective of achieving adaptable performance on unseen tasks. With just a few examples from a novel task, LoraHub enables the fluid combination of multiple LoRA modules, eradicating the need for human expertise. Notably, the composition requires neither additional model parameters nor gradients. Our empirical results, derived from the Big-Bench Hard (BBH) benchmark, suggest that LoraHub can effectively mimic the performance of in-context learning in few-shot scenarios, excluding the necessity of in-context examples alongside each inference input. A significant contribution of our research is the fostering of a community for LoRA, where users can share their trained LoRA modules, thereby facilitating their application to new tasks. We anticipate this resource will widen access to and spur advancements in general intelligence as well as LLMs in production. Code will be available at https://github.com/sail-sg/lorahub.
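
The core composition step can be pictured as follows (a sketch under our assumptions: LoraHub reports a gradient-free composition, which we approximate with plain random search over the module weights on a toy linear model):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, k = 32, 4, 5                      # width, LoRA rank, number of modules

W0 = rng.normal(size=(d, d))            # frozen base weight
loras = [(rng.normal(size=(d, r)), rng.normal(size=(r, d))) for _ in range(k)]
X = rng.normal(size=(16, d))            # few-shot examples from the new task
Y = rng.normal(size=(16, d))

def loss(weights):
    # Weighted sum of low-rank updates; no new parameters, no gradients.
    delta = sum(w * B @ A for w, (B, A) in zip(weights, loras))
    pred = X @ (W0 + delta).T
    return float(((pred - Y) ** 2).mean())

# Gradient-free search over composition weights.
best_w, best_l = np.zeros(k), loss(np.zeros(k))
for _ in range(300):
    cand = best_w + rng.normal(scale=0.1, size=k)
    cand_l = loss(cand)
    if cand_l < best_l:
        best_w, best_l = cand, cand_l
print(best_w.round(3), round(best_l, 4))
```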

Federated Split Learning with Only Positive Labels for resource-constrained IoT environment

  • paper_url: http://arxiv.org/abs/2307.13266
  • repo_url: None
  • paper_authors: Praveen Joshi, Chandra Thapa, Mohammed Hasanuzzaman, Ted Scully, Haithem Afli
  • for: improving data privacy for IoT devices and the efficiency of model training in resource-constrained environments
  • methods: splitfed learning with positive labels (SFPL), which applies a random shuffling function to the smashed data received from clients before supplying it to the server for model training, and incorporates local batch normalization in the client-side model portion during inference
  • results: SFPL outperforms SFL in model performance, specifically:
    + by factors of 51.54 and 32.57 for ResNet-56 and ResNet-32, respectively, on the CIFAR-100 dataset
    + by factors of 9.23 and 8.52 for ResNet-32 and ResNet-8, respectively, on the CIFAR-10 dataset
    Abstract Distributed collaborative machine learning (DCML) is a promising method in the Internet of Things (IoT) domain for training deep learning models, as data is distributed across multiple devices. A key advantage of this approach is that it improves data privacy by removing the necessity for the centralized aggregation of raw data but also empowers IoT devices with low computational power. Among various techniques in a DCML framework, federated split learning, known as splitfed learning (SFL), is the most suitable for efficient training and testing when devices have limited computational capabilities. Nevertheless, when resource-constrained IoT devices have only positive labeled data, multiclass classification deep learning models in SFL fail to converge or provide suboptimal results. To overcome these challenges, we propose splitfed learning with positive labels (SFPL). SFPL applies a random shuffling function to the smashed data received from clients before supplying it to the server for model training. Additionally, SFPL incorporates the local batch normalization for the client-side model portion during the inference phase. Our results demonstrate that SFPL outperforms SFL: (i) by factors of 51.54 and 32.57 for ResNet-56 and ResNet-32, respectively, with the CIFAR-100 dataset, and (ii) by factors of 9.23 and 8.52 for ResNet-32 and ResNet-8, respectively, with CIFAR-10 dataset. Overall, this investigation underscores the efficacy of the proposed SFPL framework in DCML.

Structural Credit Assignment with Coordinated Exploration

  • paper_url: http://arxiv.org/abs/2307.13256
  • repo_url: None
  • paper_authors: Stephen Chung
  • for: training artificial neural networks (ANNs) with a biologically plausible method
  • methods: each unit is treated as a stochastic reinforcement learning (RL) agent trained with the REINFORCE local learning rule, modulated by a global reward signal, which aligns more closely with biologically observed forms of synaptic plasticity; coordinated exploration among units is proposed via Boltzmann machines or a recurrent network
  • results: coordinated exploration significantly exceeds independent exploration in training speed and can even surpass straight-through estimator (STE) backpropagation
    Abstract A biologically plausible method for training an Artificial Neural Network (ANN) involves treating each unit as a stochastic Reinforcement Learning (RL) agent, thereby considering the network as a team of agents. Consequently, all units can learn via REINFORCE, a local learning rule modulated by a global reward signal, which aligns more closely with biologically observed forms of synaptic plasticity. However, this learning method tends to be slow and does not scale well with the size of the network. This inefficiency arises from two factors impeding effective structural credit assignment: (i) all units independently explore the network, and (ii) a single reward is used to evaluate the actions of all units. Accordingly, methods aimed at improving structural credit assignment can generally be classified into two categories. The first category includes algorithms that enable coordinated exploration among units, such as MAP propagation. The second category encompasses algorithms that compute a more specific reward signal for each unit within the network, like Weight Maximization and its variants. In this research report, our focus is on the first category. We propose the use of Boltzmann machines or a recurrent network for coordinated exploration. We show that the negative phase, which is typically necessary to train Boltzmann machines, can be removed. The resulting learning rules are similar to the reward-modulated Hebbian learning rule. Experimental results demonstrate that coordinated exploration significantly exceeds independent exploration in training speed for multiple stochastic and discrete units based on REINFORCE, even surpassing straight-through estimator (STE) backpropagation.

GaPro: Box-Supervised 3D Point Cloud Instance Segmentation Using Gaussian Processes as Pseudo Labelers

  • paper_url: http://arxiv.org/abs/2307.13251
  • repo_url: https://github.com/vinairesearch/gapro
  • paper_authors: Tuan Duc Ngo, Binh-Son Hua, Khoi Nguyen
  • for: 3D point cloud instance segmentation (3DIS) under weak supervision, since annotating dense ground-truth instance masks is tedious and expensive
  • methods: GaPro, a two-step approach that generates pseudo labels from axis-aligned 3D bounding box annotations and trains a 3DIS network on the resulting labels; a Gaussian Process generates the pseudo instance masks and resolves ambiguities where boxes overlap, yielding masks with uncertainty values, and a self-training strategy further improves performance
  • results: GaPro outperforms previous weakly supervised 3D instance segmentation methods and is competitive with state-of-the-art fully supervised ones; moreover, various fully supervised methods can be adapted to the weak supervision task by training on the generated pseudo labels
    Abstract Instance segmentation on 3D point clouds (3DIS) is a longstanding challenge in computer vision, where state-of-the-art methods are mainly based on full supervision. As annotating ground truth dense instance masks is tedious and expensive, solving 3DIS with weak supervision has become more practical. In this paper, we propose GaPro, a new instance segmentation for 3D point clouds using axis-aligned 3D bounding box supervision. Our two-step approach involves generating pseudo labels from box annotations and training a 3DIS network with the resulting labels. Additionally, we employ the self-training strategy to improve the performance of our method further. We devise an effective Gaussian Process to generate pseudo instance masks from the bounding boxes and resolve ambiguities when they overlap, resulting in pseudo instance masks with their uncertainty values. Our experiments show that GaPro outperforms previous weakly supervised 3D instance segmentation methods and has competitive performance compared to state-of-the-art fully supervised ones. Furthermore, we demonstrate the robustness of our approach, where we can adapt various state-of-the-art fully supervised methods to the weak supervision task by using our pseudo labels for training. The source code and trained models are available at https://github.com/VinAIResearch/GaPro.

RoSAS: Deep Semi-Supervised Anomaly Detection with Contamination-Resilient Continuous Supervision

  • paper_url: http://arxiv.org/abs/2307.13239
  • repo_url: https://github.com/xuhongzuo/rosas
  • paper_authors: Hongzuo Xu, Yijie Wang, Guansong Pang, Songlei Jian, Ning Liu, Yongjun Wang
  • for: addressing two limitations of semi-supervised anomaly detection methods: 1) unlabeled anomalies (i.e., anomaly contamination) may mislead the learning process when all unlabeled data are employed as inliers for training; 2) only discrete supervision information (such as binary or ordinal labels) is exploited, leading to suboptimal learning of anomaly scores that essentially take on a continuous distribution
  • methods: contamination-resilient continuous supervisory signals: a mass interpolation method diffuses the abnormality of labeled anomalies, creating new data samples labeled with continuous abnormal degrees, while the contaminated area is covered by new samples generated via combinations of correctly labeled data; a feature learning-based objective serves as an optimization constraint to regularize the network and enhance robustness to anomaly contamination
  • results: on 11 real-world datasets, the approach significantly outperforms state-of-the-art competitors by 20%-30% in AUC-PR and obtains more robust and superior performance across different anomaly contamination levels and varying numbers of labeled anomalies
    Abstract Semi-supervised anomaly detection methods leverage a few anomaly examples to yield drastically improved performance compared to unsupervised models. However, they still suffer from two limitations: 1) unlabeled anomalies (i.e., anomaly contamination) may mislead the learning process when all the unlabeled data are employed as inliers for model training; 2) only discrete supervision information (such as binary or ordinal data labels) is exploited, which leads to suboptimal learning of anomaly scores that essentially take on a continuous distribution. Therefore, this paper proposes a novel semi-supervised anomaly detection method, which devises \textit{contamination-resilient continuous supervisory signals}. Specifically, we propose a mass interpolation method to diffuse the abnormality of labeled anomalies, thereby creating new data samples labeled with continuous abnormal degrees. Meanwhile, the contaminated area can be covered by new data samples generated via combinations of data with correct labels. A feature learning-based objective is added to serve as an optimization constraint to regularize the network and further enhance the robustness w.r.t. anomaly contamination. Extensive experiments on 11 real-world datasets show that our approach significantly outperforms state-of-the-art competitors by 20%-30% in AUC-PR and obtains more robust and superior performance in settings with different anomaly contamination levels and varying numbers of labeled anomalies. The source code is available at https://github.com/xuhongzuo/rosas/.
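
A rough numpy sketch of the mass-interpolation idea as we read the abstract (the array shapes, mixing scheme, and coverage step are illustrative, not RoSAS's exact procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
x_anom = rng.normal(3.0, 0.5, size=(10, 8))     # a few labeled anomalies
x_unlab = rng.normal(0.0, 1.0, size=(200, 8))   # unlabeled data (may be contaminated)

# Interpolate labeled anomalies toward unlabeled samples; the mixing
# coefficient becomes a continuous abnormal degree instead of a 0/1 label.
lam = rng.uniform(0.0, 1.0, size=64)
i = rng.integers(0, len(x_anom), size=64)
j = rng.integers(0, len(x_unlab), size=64)
x_new = lam[:, None] * x_anom[i] + (1 - lam)[:, None] * x_unlab[j]
y_new = lam                                     # continuous supervision targets

# Combinations of correctly labeled samples help cover contaminated regions.
p, q = rng.integers(0, len(x_anom), size=(2, 32))
x_cov = 0.5 * (x_anom[p] + x_anom[q])
y_cov = np.ones(32)                             # still fully abnormal

print(x_new.shape, float(y_new.min()), float(y_new.max()))
```

A scoring network regressed onto these continuous targets then learns a graded anomaly score rather than a hard binary boundary.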

Multilevel Large Language Models for Everyone

  • paper_url: http://arxiv.org/abs/2307.13221
  • repo_url: None
  • paper_authors: Yuanhao Gong
  • for: linking large language models together so that, based on the user's personal input and information from the internet, they achieve more complex high-level functionality
  • methods: inspired by the multilevel structure of the human brain cortex, generic and field-specific large language models are connected into global-level, field-level, and user-level models, with user-level models running on local machines for efficient response and privacy protection
  • results: the proposed multilevel models reduce redundancy and perform better than single-level models, and can be applied to natural language processing, computer vision tasks, professional assistants, business, and healthcare
    Abstract Large language models have made significant progress in the past few years. However, they are either generic or field-specific, splitting the community into different groups. In this paper, we unify these large language models into a larger map, where the generic and specific models are linked together and can improve each other, based on the user's personal input and information from the internet. The idea of linking several large language models together is inspired by the functionality of the human brain. Specific regions of the brain cortex are dedicated to certain low-level functionality, and these regions can jointly work together to achieve more complex high-level functionality. Such behavior of the human brain cortex sheds light on the design of multilevel large language models that contain global-level, field-level and user-level models. The user-level models run on local machines to achieve efficient response and protect the user's privacy. Such multilevel models reduce some redundancy and perform better than single-level models. The proposed multilevel idea can be applied in various applications, such as natural language processing, computer vision tasks, professional assistants, business and healthcare.

One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction

  • paper_url: http://arxiv.org/abs/2307.13220
  • repo_url: https://github.com/wangziblake/pisf
  • paper_authors: Zi Wang, Xiaotong Yu, Chengyan Wang, Weibo Chen, Jiazheng Wang, Ying-Hua Chu, Hongwei Sun, Rushuai Li, Peiyong Li, Fan Yang, Haiwei Han, Taishan Kang, Jianzhong Lin, Chen Yang, Shufu Chang, Zhang Shi, Sha Hua, Yan Li, Juan Hu, Liuhong Zhu, Jianjun Zhou, Meijing Lin, Jiefeng Guo, Congbo Cai, Zhong Chen, Di Guo, Xiaobo Qu
  • for: reducing the prolonged scan time of magnetic resonance imaging (MRI) while using deep learning (DL) for image reconstruction across multiple imaging scenarios
  • methods: a Physics-Informed Synthetic data learning framework (PISF) that enables generalizable DL for multi-scenario MRI reconstruction using solely one trained model; the reconstruction of a 2D image is separated into many 1D basic problems, starting with 1D data synthesis to facilitate generalization
  • results: training DL models on synthetic data, integrated with enhanced learning techniques, achieves comparable or even better in vivo MRI reconstruction than models trained on matched realistic datasets, reducing the demand for real-world MRI data by up to 96%; PISF also shows impressive generalizability in multi-vendor multi-center imaging, and its adaptability to patients has been verified through evaluations by 10 experienced doctors
    Abstract Magnetic resonance imaging (MRI) is a principal radiological modality that provides radiation-free, abundant, and diverse information about the whole human body for medical diagnosis, but suffers from prolonged scan time. The scan time can be significantly reduced through k-space undersampling but the introduced artifacts need to be removed in image reconstruction. Although deep learning (DL) has emerged as a powerful tool for image reconstruction in fast MRI, its potential in multiple imaging scenarios remains largely untapped. This is because not only collecting large-scale and diverse realistic training data is generally costly and privacy-restricted, but also existing DL methods are hard to handle the practically inevitable mismatch between training and target data. Here, we present a Physics-Informed Synthetic data learning framework for Fast MRI, called PISF, which is the first to enable generalizable DL for multi-scenario MRI reconstruction using solely one trained model. For a 2D image, the reconstruction is separated into many 1D basic problems and starts with the 1D data synthesis, to facilitate generalization. We demonstrate that training DL models on synthetic data, integrated with enhanced learning techniques, can achieve comparable or even better in vivo MRI reconstruction compared to models trained on a matched realistic dataset, reducing the demand for real-world MRI data by up to 96%. Moreover, our PISF shows impressive generalizability in multi-vendor multi-center imaging. Its excellent adaptability to patients has been verified through 10 experienced doctors' evaluations. PISF provides a feasible and cost-effective way to markedly boost the widespread usage of DL in various fast MRI applications, while freeing from the intractable ethical and practical considerations of in vivo human data acquisitions.

Adversarial Deep Hedging: Learning to Hedge without Price Process Modeling

  • paper_url: http://arxiv.org/abs/2307.13217
  • repo_url: None
  • paper_authors: Masanori Hirano, Kentaro Minami, Kentaro Imajo
  • for: extending the deep hedging framework, which can handle realistic market conditions such as market frictions that are challenging to address within the traditional mathematical finance framework, without relying on an explicit underlying asset price process model
  • methods: a new framework, adversarial deep hedging, inspired by adversarial learning: a hedger, which models the hedging strategy, and a generator, which models the underlying asset process, are trained in an adversarial manner, enabling a robust hedger to be learned without explicitly modeling the underlying asset process
  • results: numerical experiments show the proposed method achieves competitive performance against models that assume explicit underlying asset processes across various real market data
    Abstract Deep hedging is a deep-learning-based framework for derivative hedging in incomplete markets. The advantage of deep hedging lies in its ability to handle various realistic market conditions, such as market frictions, which are challenging to address within the traditional mathematical finance framework. Since deep hedging relies on market simulation, the underlying asset price process model is crucial. However, existing literature on deep hedging often relies on traditional mathematical finance models, e.g., Brownian motion and stochastic volatility models, and discovering effective underlying asset models for deep hedging learning has been a challenge. In this study, we propose a new framework called adversarial deep hedging, inspired by adversarial learning. In this framework, a hedger and a generator, which respectively model the hedging strategy and the underlying asset process, are trained in an adversarial manner. The proposed method makes it possible to learn a robust hedger without explicitly modeling the underlying asset process. Through numerical experiments, we demonstrate that our proposed method achieves competitive performance against models that assume explicit underlying asset processes across various real market data.
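
A minimal sketch of the adversarial training loop (our illustration; the network sizes, the squared-error hedging objective, and the return model are placeholder assumptions rather than the paper's setup):

```python
import torch
import torch.nn as nn

T, B = 30, 256                      # time steps, batch of simulated paths
gen = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, T))     # noise -> log-returns
hedger = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))  # (t, price) -> position
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
opt_h = torch.optim.Adam(hedger.parameters(), lr=1e-3)

def sample_paths():
    # Generator output as log-returns, cumulated into positive price paths.
    return torch.exp(0.02 * gen(torch.randn(B, 8))).cumprod(dim=1)

def hedging_loss(paths):
    # Squared replication error of a European call with strike 1.0.
    pnl = torch.zeros(paths.shape[0])
    for t in range(T - 1):
        s = paths[:, t]
        feat = torch.stack([torch.full_like(s, t / T), s], dim=1)
        pos = hedger(feat).squeeze(1)
        pnl = pnl + pos * (paths[:, t + 1] - s)
    payoff = torch.clamp(paths[:, -1] - 1.0, min=0.0)
    return ((payoff - pnl) ** 2).mean()

for step in range(1000):
    loss = hedging_loss(sample_paths())
    opt_h.zero_grad(); loss.backward(); opt_h.step()     # hedger minimizes
    loss_g = -hedging_loss(sample_paths())
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()   # generator maximizes
```

The minimax structure pushes the hedger toward strategies that remain effective on price dynamics it was never explicitly given.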

FedMEKT: Distillation-based Embedding Knowledge Transfer for Multimodal Federated Learning

  • paper_url: http://arxiv.org/abs/2307.13214
  • repo_url: None
  • paper_authors: Huy Q. Le, Minh N. H. Nguyen, Chu Myaet Thwal, Yu Qiao, Chaoning Zhang, Choong Seon Hong
  • for: a multimodal federated learning (FL) framework in which multiple clients collaboratively train a generalized global model without sharing their private data
  • methods: a semi-supervised learning approach that leverages representations from different modalities, with a distillation-based multimodal embedding knowledge transfer mechanism (FedMEKT) that lets the server and clients exchange the joint knowledge of their learning models extracted from a small multimodal proxy dataset; the framework comprises local multimodal autoencoder learning, generalized multimodal autoencoder construction, and generalized classifier learning
  • results: extensive experiments on three multimodal human activity recognition datasets show that FedMEKT achieves superior global encoder performance on linear evaluation while guaranteeing user privacy for personal data and model parameters and demanding less communication cost than other baselines
    Abstract Federated learning (FL) enables a decentralized machine learning paradigm for multiple clients to collaboratively train a generalized global model without sharing their private data. Most existing works simply propose typical FL systems for single-modal data, thus limiting its potential on exploiting valuable multimodal data for future personalized applications. Furthermore, the majority of FL approaches still rely on the labeled data at the client side, which is limited in real-world applications due to the inability of self-annotation from users. In light of these limitations, we propose a novel multimodal FL framework that employs a semi-supervised learning approach to leverage the representations from different modalities. Bringing this concept into a system, we develop a distillation-based multimodal embedding knowledge transfer mechanism, namely FedMEKT, which allows the server and clients to exchange the joint knowledge of their learning models extracted from a small multimodal proxy dataset. Our FedMEKT iteratively updates the generalized global encoders with the joint embedding knowledge from the participating clients. Thereby, to address the modality discrepancy and labeled data constraint in existing FL systems, our proposed FedMEKT comprises local multimodal autoencoder learning, generalized multimodal autoencoder construction, and generalized classifier learning. Through extensive experiments on three multimodal human activity recognition datasets, we demonstrate that FedMEKT achieves superior global encoder performance on linear evaluation and guarantees user privacy for personal data and model parameters while demanding less communication cost than other baselines.

Gait Cycle-Inspired Learning Strategy for Continuous Prediction of Knee Joint Trajectory from sEMG

  • paper_url: http://arxiv.org/abs/2307.13209
  • repo_url: None
  • paper_authors: Xueming Fu, Hao Zheng, Luyan Liu, Wenjuan Zhong, Haowen Liu, Wenxuan Xiong, Yuyang Zhang, Yifeng Chen, Dong Wei, Mingjie Dong, Yefeng Zheng, Mingming Zhang
  • for: Predicting lower-limb motion intent is vital for controlling exoskeleton robots and prosthetic limbs.
  • methods: This paper proposes a model integrating two gait cycle-inspired learning strategies to mitigate the challenges of predicting human knee joint trajectory: decoupling knee joint angles into motion patterns and amplitudes learned by separate network entities, and extracting muscle principal activation masks to filter gait-irrelevant components from raw sEMG.
  • results: Experimental results show the model predicts knee angles 50 ms ahead of time with an average root mean square error (RMSE) of 3.03 (0.49) degrees. To the authors' knowledge this is the best performance reported in the relevant literature, reducing RMSE by at least 9.5%.
    Abstract Predicting lower limb motion intent is vital for controlling exoskeleton robots and prosthetic limbs. Surface electromyography (sEMG) has attracted increasing attention in recent years as it enables ahead-of-time prediction of motion intentions before actual movement. However, the estimation performance of human joint trajectory remains a challenging problem due to inter- and intra-subject variations. The former is related to physiological differences (such as height and weight) and preferred walking patterns of individuals, while the latter is mainly caused by irregular and gait-irrelevant muscle activity. This paper proposes a model integrating two gait cycle-inspired learning strategies to mitigate the challenge of predicting human knee joint trajectory. The first strategy is to decouple knee joint angles into motion patterns and amplitudes; the former exhibit low variability while the latter show high variability among individuals. By learning through separate network entities, the model manages to capture both the common and personalized gait features. In the second, muscle principal activation masks are extracted from gait cycles in a prolonged walk. These masks are used to filter out components unrelated to walking from raw sEMG and provide auxiliary guidance to capture more gait-related features. Experimental results indicate that our model could predict knee angles with an average root mean square error (RMSE) of 3.03 (0.49) degrees, 50 ms ahead of time. To our knowledge this is the best performance reported in the relevant literature, with RMSE reduced by at least 9.5%.
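
A hedged sketch of the first strategy follows: the prediction is decoupled into a low-variability motion pattern and a subject-specific amplitude, learned by separate network heads. The layer sizes and the multiplicative recombination are our illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DecoupledKneePredictor(nn.Module):
    """One branch learns the low-variability motion pattern, another the
    subject-specific amplitude; sizes and recombination are illustrative."""
    def __init__(self, n_channels=8, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(n_channels, hidden, batch_first=True)
        self.pattern_head = nn.Linear(hidden, 1)    # normalized gait pattern
        self.amplitude_head = nn.Linear(hidden, 1)  # per-subject scaling

    def forward(self, semg):                 # semg: (batch, time, channels)
        feats, _ = self.encoder(semg)
        pattern = torch.tanh(self.pattern_head(feats))            # bounded shape
        amplitude = torch.nn.functional.softplus(self.amplitude_head(feats))
        return pattern * amplitude           # predicted knee angle trajectory

model = DecoupledKneePredictor()
angles = model(torch.randn(4, 200, 8))       # 4 trials, 200 samples, 8 sEMG channels
print(angles.shape)                          # torch.Size([4, 200, 1])
```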

Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis

  • paper_url: http://arxiv.org/abs/2307.14364
  • repo_url: None
  • paper_authors: Yang Jiao, Kai Yang, Dongjin Song
  • for: Addressing three challenges in distributionally robust optimization: asynchronous updating in distributed environments, effective use of the prior distribution, and properly adjusting the degree of robustness across scenarios.
  • methods: Proposes the asynchronous distributed ASPIRE algorithm with the EASE method, and develops a new uncertainty set, the constrained D-norm uncertainty set, to effectively leverage the prior distribution and control the robustness level.
  • results: Theoretical analysis guarantees convergence and characterizes the iteration complexity; empirical studies show fast convergence, robustness against data heterogeneity and malicious attacks, and a controllable tradeoff between robustness and performance.
    Abstract Distributionally Robust Optimization (DRO), which aims to find an optimal decision that minimizes the worst-case cost over an ambiguity set of probability distributions, has been widely applied in diverse applications, e.g., network behavior analysis, risk management, etc. However, existing DRO techniques face three key challenges: 1) how to deal with asynchronous updating in a distributed environment; 2) how to leverage the prior distribution effectively; 3) how to properly adjust the degree of robustness according to different scenarios. To this end, we propose an asynchronous distributed algorithm, named Asynchronous Single-looP alternatIve gRadient projEction (ASPIRE), with the itErative Active SEt method (EASE) to tackle the federated distributionally robust optimization (FDRO) problem. Furthermore, a new uncertainty set, i.e., the constrained D-norm uncertainty set, is developed to effectively leverage the prior distribution and flexibly control the degree of robustness. Finally, our theoretical analysis shows that the proposed algorithm is guaranteed to converge, and its iteration complexity is also analyzed. Extensive empirical studies on real-world datasets demonstrate that the proposed method not only achieves fast convergence and remains robust against data heterogeneity as well as malicious attacks, but also trades off robustness with performance.

Blockchain-based Optimized Client Selection and Privacy Preserved Framework for Federated Learning

  • paper_url: http://arxiv.org/abs/2308.04442
  • repo_url: None
  • paper_authors: Attia Qammar, Abdenacer Naouri, Jianguo Ding, Huansheng Ning
  • for: Proposes a blockchain-based optimized client selection and privacy-preserving framework for federated learning, addressing the single-point-of-failure risk of the conventional client-server FL structure and the accuracy loss caused by random client selection.
  • methods: Designs three kinds of smart contracts: 1) client registration, 2) forward bidding to select optimized clients for FL model training, and 3) payment settlement and reward. In addition, fully homomorphic encryption with the CKKS scheme protects local model updates before transmission.
  • results: Evaluation on a benchmark dataset against state-of-the-art studies shows a higher accuracy rate and a privacy-preserving FL framework with a decentralized nature.
    Abstract Federated learning is a distributed mechanism that trains large-scale neural network models with the participation of multiple clients while data remains on their devices, sharing only the local model updates. With this feature, federated learning is considered a secure solution for data privacy issues. However, the typical FL structure relies on a client-server design, which exposes it to single-point-of-failure (SPoF) attacks, and the random selection of clients for model training compromises model accuracy. Furthermore, adversaries attempt inference attacks, i.e., attacks on privacy that lead to gradient leakage. In this context, we propose a blockchain-based optimized client selection and privacy-preserving framework. We design three kinds of smart contracts: 1) registration of clients, 2) forward bidding to select optimized clients for FL model training, and 3) payment settlement and reward smart contracts. Moreover, fully homomorphic encryption with the Cheon, Kim, Kim, and Song (CKKS) scheme is applied before transmitting the local model updates to the server. Finally, we evaluate our proposed method on a benchmark dataset and compare it with state-of-the-art studies. Consequently, we achieve a higher accuracy rate in a privacy-preserving FL framework with a decentralized nature.
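
The CKKS step can be illustrated with the open-source TenSEAL library (not necessarily the authors' implementation); the encryption parameters and the flattened-update representation below are illustrative choices, not values from the paper.

```python
import tenseal as ts

# CKKS context; polynomial degree and coefficient moduli are illustrative.
context = ts.context(ts.SCHEME_TYPE.CKKS,
                     poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()

# Two clients encrypt their (flattened) local model updates.
update_a = ts.ckks_vector(context, [0.12, -0.05, 0.33])
update_b = ts.ckks_vector(context, [0.08, 0.01, -0.20])

# The server can aggregate ciphertexts without ever seeing the plaintexts.
aggregate = (update_a + update_b) * 0.5
print(aggregate.decrypt())  # approx [0.10, -0.02, 0.065]; needs the secret key
```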

Knowledge-enhanced Neuro-Symbolic AI for Cybersecurity and Privacy

  • paper_url: http://arxiv.org/abs/2308.02031
  • repo_url: None
  • paper_authors: Aritran Piplai, Anantaa Kotal, Seyedreza Mohseni, Manas Gaur, Sudip Mittal, Anupam Joshi
  • for: Explores how combining neural networks with symbolic knowledge graphs can improve explainability and safety in AI systems.
  • methods: Integrates neural networks, which excel at exploring complex data spaces, with symbolic knowledge graphs that represent domain knowledge, allowing AI systems to reason, learn, and generalize in a manner understandable to experts.
  • results: Shows how cybersecurity and privacy, two domains that demand both explainability and high accuracy in complex environments, can benefit from Neuro-Symbolic AI.
    Abstract Neuro-Symbolic Artificial Intelligence (AI) is an emerging and quickly advancing field that combines the subsymbolic strengths of (deep) neural networks and explicit, symbolic knowledge contained in knowledge graphs to enhance explainability and safety in AI systems. This approach addresses a key criticism of current generation systems, namely their inability to generate human-understandable explanations for their outcomes and ensure safe behaviors, especially in scenarios with \textit{unknown unknowns} (e.g. cybersecurity, privacy). The integration of neural networks, which excel at exploring complex data spaces, and symbolic knowledge graphs, which represent domain knowledge, allows AI systems to reason, learn, and generalize in a manner understandable to experts. This article describes how applications in cybersecurity and privacy, two most demanding domains in terms of the need for AI to be explainable while being highly accurate in complex environments, can benefit from Neuro-Symbolic AI.

Counterfactual Explanation Policies in RL

  • paper_url: http://arxiv.org/abs/2307.13192
  • repo_url: None
  • paper_authors: Shripad V. Deshmukh, Srivatsan R, Supriti Vijay, Jayakumar Subramanian, Chirag Agarwal
  • for: Makes RL policies explainable by analyzing them contrastively, i.e., identifying minimal changes to a policy that would improve or worsen its performance to a desired level.
  • methods: Incorporates counterfactuals into supervised learning in RL, with the target outcome regulated by the desired return, and establishes a theoretical connection to trust-region-based policy optimization methods.
  • results: Experiments on five RL environments with diverse state and action spaces show that COUNTERPOL generates useful counterfactual explanations for (un)learning skills while staying close to the original policy.
    Abstract As Reinforcement Learning (RL) agents are increasingly employed in diverse decision-making problems using reward preferences, it becomes important to ensure that the policies learned by these frameworks, which map observations to a probability distribution over possible actions, are explainable. However, there is little to no work on systematically understanding these complex policies in a contrastive manner, i.e., what minimal changes to the policy would improve/worsen its performance to a desired level. In this work, we present COUNTERPOL, the first framework to analyze RL policies using counterfactual explanations in the form of minimal changes to the policy that lead to the desired outcome. We do so by incorporating counterfactuals in supervised learning in RL, with the target outcome regulated using the desired return. We establish a theoretical connection between COUNTERPOL and widely used trust-region-based policy optimization methods in RL. Extensive empirical analysis shows the efficacy of COUNTERPOL in generating explanations for (un)learning skills while keeping close to the original policy. Our results on five different RL environments with diverse state and action spaces demonstrate the utility of counterfactual explanations, paving the way for new frontiers in designing and developing counterfactual policies.

Digital Emotion Regulation on Social Media

  • paper_url: http://arxiv.org/abs/2307.13187
  • repo_url: None
  • paper_authors: Akriti Verma, Shama Islam, Valeh Moghaddam, Adnan Anwar
  • for: Overviews how digital technology is purposefully employed to modify affective states, to support the ethical design, development, and deployment of technology.
  • methods: Analyzes state-of-the-art literature on how different social media applications are utilized at different stages of the emotion regulation process.
  • results: Identifies how different social media applications are used at different stages of emotion regulation and synthesizes recent research on emotion regulation interventions for social media.
    Abstract Emotion regulation is the process of consciously altering one's affective state, that is the underlying emotional state such as happiness, confidence, guilt, anger etc. The ability to effectively regulate emotions is necessary for functioning efficiently in everyday life. Today, the pervasiveness of digital technology is being purposefully employed to modify our affective states, a process known as digital emotion regulation. Understanding digital emotion regulation can help support the rise of ethical technology design, development, and deployment. This article presents an overview of digital emotion regulation in social media applications, as well as a synthesis of recent research on emotion regulation interventions for social media. We share our findings from analysing state-of-the-art literature on how different social media applications are utilised at different stages in the process of emotion regulation.

Opinion Mining Using Population-tuned Generative Language Models

  • paper_url: http://arxiv.org/abs/2307.13173
  • repo_url: None
  • paper_authors: Allmin Susaiyah, Abhinay Pandya, Aki Härmä
  • for: Mining opinions from text collections.
  • methods: Fine-tunes generative language models, trained on data collected from different populations, using specifically tailored content with fully annotated opinions.
  • results: The approach learns and transfers opinions to semantic classes while maintaining the proportion of polarization, and an insight mining system scales up the discovery of opinion insights from a real text corpus.
    Abstract We present a novel method for mining opinions from text collections using generative language models trained on data collected from different populations. We describe the basic definitions, methodology and a generic algorithm for opinion insight mining. We demonstrate the performance of our method in an experiment where a pre-trained generative model is fine-tuned using specifically tailored content with unnatural and fully annotated opinions. We show that our approach can learn and transfer the opinions to the semantic classes while maintaining the proportion of polarisation. Finally, we demonstrate the usage of an insight mining system to scale up the discovery of opinion insights from a real text corpus.

Investigating the Robustness of Sequential Recommender Systems Against Training Data Perturbations: an Empirical Study

  • paper_url: http://arxiv.org/abs/2307.13165
  • repo_url: None
  • paper_authors: Filippo Betello, Federico Siciliano, Pushkar Mishra, Fabrizio Silvestri
  • for: Studies the robustness of Sequential Recommender Systems (SRSs) to training data perturbations, specifically the effect of removing items at different positions within a temporally ordered sequence.
  • methods: Evaluates two different SRS models on multiple datasets, measuring performance with Normalized Discounted Cumulative Gain (NDCG) and a Rank Sensitivity List metric.
  • results: Removing items at the end of a sequence significantly degrades performance, with NDCG decreasing by up to 60%, while removing items from the beginning or middle has no significant effect; these findings highlight the importance of the position of perturbed items in the training data and shall inform the design of more robust SRSs.
    Abstract Sequential Recommender Systems (SRSs) have been widely used to model user behavior over time, but their robustness in the face of perturbations to training data is a critical issue. In this paper, we conduct an empirical study to investigate the effects of removing items at different positions within a temporally ordered sequence. We evaluate two different SRS models on multiple datasets, measuring their performance using Normalized Discounted Cumulative Gain (NDCG) and Rank Sensitivity List metrics. Our results demonstrate that removing items at the end of the sequence significantly impacts performance, with NDCG decreasing up to 60%, while removing items from the beginning or middle has no significant effect. These findings highlight the importance of considering the position of the perturbed items in the training data and shall inform the design of more robust SRSs.
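
A minimal sketch of the perturbation protocol and the NDCG metric used in the study; the function names and toy data are ours, not the paper's code.

```python
import numpy as np

def ndcg_at_k(ranked_items, relevant, k=10):
    """Standard NDCG@k: DCG of the ranking divided by the ideal DCG."""
    gains = [1.0 if item in relevant else 0.0 for item in ranked_items[:k]]
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

def perturb_sequence(seq, position, n_remove=1):
    """Remove items from the beginning, middle, or end of a training sequence."""
    if position == "beginning":
        return seq[n_remove:]
    if position == "middle":
        mid = len(seq) // 2
        return seq[:mid] + seq[mid + n_remove:]
    return seq[:-n_remove]                       # "end"

seq = ["i1", "i2", "i3", "i4", "i5", "i6"]
for pos in ("beginning", "middle", "end"):
    print(pos, perturb_sequence(seq, pos))
print(ndcg_at_k(["i9", "i5", "i7"], relevant={"i5"}, k=3))  # approx 0.63
```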

Improving Primary Healthcare Workflow Using Extreme Summarization of Scientific Literature Based on Generative AI

  • paper_url: http://arxiv.org/abs/2307.15715
  • repo_url: None
  • paper_authors: Gregor Stiglic, Leon Kopitar, Lucija Gosak, Primoz Kocbek, Zhe He, Prithwish Chakraborty, Pablo Meyer, Jiang Bian
  • for: Investigates the potential of generative AI to reduce the cognitive load on primary care professionals who must keep up with the latest scientific literature.
  • methods: Uses generative AI techniques based on large-scale language models to summarize abstracts of scientific papers.
  • results: Using generative AI for literature review is efficient and effective, significantly reducing the time needed to answer questions about abstract content; however, the accuracy of the extracted knowledge drops when the full abstract is unavailable. This disruptive technology could greatly reduce the time professionals spend keeping up with the literature, but further development is needed to help them comprehend the knowledge accurately.
    Abstract Primary care professionals struggle to keep up to date with the latest scientific literature critical in guiding evidence-based practice related to their daily work. To help solve the above-mentioned problem, we employed generative artificial intelligence techniques based on large-scale language models to summarize abstracts of scientific papers. Our objective is to investigate the potential of generative artificial intelligence in diminishing the cognitive load experienced by practitioners, thus exploring its ability to alleviate mental effort and burden. The study participants were provided with two use cases related to preventive care and behavior change, simulating a search for new scientific literature. The study included 113 university students from Slovenia and the United States randomized into three distinct study groups. The first group was assigned to the full abstracts. The second group was assigned to the short abstracts generated by AI. The third group had the option to select a full abstract in addition to the AI-generated short summary. Each use case study included ten retrieved abstracts. Our research demonstrates that the use of generative AI for literature review is efficient and effective. The time needed to answer questions related to the content of abstracts was significantly lower in groups two and three compared to the first group using full abstracts. The results, however, also show significantly lower accuracy in extracted knowledge in cases where full abstract was not available. Such a disruptive technology could significantly reduce the time required for healthcare professionals to keep up with the most recent scientific literature; nevertheless, further developments are needed to help them comprehend the knowledge accurately.

Why Don’t You Clean Your Glasses? Perception Attacks with Dynamic Optical Perturbations

  • paper_url: http://arxiv.org/abs/2307.13131
  • repo_url: None
  • paper_authors: Yi Han, Matthew Chan, Eric Wengrowski, Zhuohuan Li, Nils Ole Tippenhauer, Mani Srivastava, Saman Zonouz, Luis Garcia
  • for: Studies physical-world adversarial attacks on the machine learning models underlying camera-based autonomous systems.
  • methods: Presents EvilEye, a man-in-the-middle perception attack that uses transparent displays to generate dynamic physical adversarial examples, exploiting the camera's optics to induce misclassifications under a variety of illumination conditions.
  • results: Experiments show that EvilEye's adversarial perturbations are far more robust across varying environmental light conditions than existing physical perturbation frameworks, achieving a high attack success rate across a variety of objects while bypassing state-of-the-art physical adversarial detection frameworks.
    Abstract Camera-based autonomous systems that emulate human perception are increasingly being integrated into safety-critical platforms. Consequently, an established body of literature has emerged that explores adversarial attacks targeting the underlying machine learning models. Adapting adversarial attacks to the physical world is desirable for the attacker, as this removes the need to compromise digital systems. However, the real world poses challenges related to the "survivability" of adversarial manipulations given environmental noise in perception pipelines and the dynamicity of autonomous systems. In this paper, we take a sensor-first approach. We present EvilEye, a man-in-the-middle perception attack that leverages transparent displays to generate dynamic physical adversarial examples. EvilEye exploits the camera's optics to induce misclassifications under a variety of illumination conditions. To generate dynamic perturbations, we formalize the projection of a digital attack into the physical domain by modeling the transformation function of the captured image through the optical pipeline. Our extensive experiments show that EvilEye's generated adversarial perturbations are much more robust across varying environmental light conditions relative to existing physical perturbation frameworks, achieving a high attack success rate (ASR) while bypassing state-of-the-art physical adversarial detection frameworks. We demonstrate that the dynamic nature of EvilEye enables attackers to adapt adversarial examples across a variety of objects with a significantly higher ASR compared to state-of-the-art physical world attack frameworks. Finally, we discuss mitigation strategies against the EvilEye attack.

A Hybrid Machine Learning Model for Classifying Gene Mutations in Cancer using LSTM, BiLSTM, CNN, GRU, and GloVe

  • paper_url: http://arxiv.org/abs/2307.14361
  • repo_url: None
  • paper_authors: Sanad Aburass, Osama Dorgham, Jamil Al Shaqsi
  • for: Classifying gene mutations in cancer using Kaggle's Personalized Medicine: Redefining Cancer Treatment dataset.
  • methods: An ensemble model combining LSTM, BiLSTM, CNN, GRU, and GloVe embeddings.
  • results: The model outperforms well-known transformers (BERT, Electra, Roberta, XLNet, DistilBERT) and their LSTM ensembles in accuracy, precision, recall, F1 score, and Mean Squared Error, while also requiring less training time.
    Abstract This study presents an ensemble model combining LSTM, BiLSTM, CNN, GRU, and GloVe to classify gene mutations using Kaggle's Personalized Medicine: Redefining Cancer Treatment dataset. The results were compared against well-known transformers like as BERT, Electra, Roberta, XLNet, Distilbert, and their LSTM ensembles. Our model outperformed all other models in terms of accuracy, precision, recall, F1 score, and Mean Squared Error. Surprisingly, it also needed less training time, resulting in a perfect combination of performance and efficiency. This study demonstrates the utility of ensemble models for difficult tasks such as gene mutation classification.
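
A minimal Keras sketch of one way to combine the named components; the layer sizes, vocabulary, and concatenation-based fusion are our assumptions, and loading pre-trained GloVe weights into the Embedding layer is elided.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB, EMB_DIM, SEQ_LEN, N_CLASSES = 20000, 100, 300, 9

tokens = layers.Input(shape=(SEQ_LEN,), dtype="int32")
# In the paper this layer would be initialized with pre-trained GloVe vectors.
emb = layers.Embedding(VOCAB, EMB_DIM)(tokens)

# Parallel recurrent and convolutional branches over the same embeddings.
lstm   = layers.LSTM(64)(emb)
bilstm = layers.Bidirectional(layers.LSTM(64))(emb)
gru    = layers.GRU(64)(emb)
cnn    = layers.GlobalMaxPooling1D()(layers.Conv1D(64, 5, activation="relu")(emb))

merged = layers.concatenate([lstm, bilstm, gru, cnn])
out = layers.Dense(N_CLASSES, activation="softmax")(
    layers.Dense(128, activation="relu")(merged))

model = Model(tokens, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```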

Deep Bradley-Terry Rating: Quantifying Properties from Comparisons

  • paper_url: http://arxiv.org/abs/2307.13709
  • repo_url: None
  • paper_authors: Satoru Fujii
  • for: Quantifying real-world properties that cannot be directly observed, where prior work has relied on graded human scores as target labels for training.
  • methods: Proposes Deep Bradley-Terry Rating (DBTR), a machine learning framework that integrates the Bradley-Terry model into a neural network structure and generalizes it to asymmetric environments with unfairness.
  • results: Experimental analysis shows that DBTR successfully learns to quantify and estimate the desired properties.
    Abstract Many properties in the real world can't be directly observed, making them difficult to learn. To deal with this challenging problem, prior works have primarily focused on estimating those properties by using graded human scores as the target label in the training. Meanwhile, rating algorithms based on the Bradley-Terry model are extensively studied to evaluate the competitiveness of players based on their match history. In this paper, we introduce the Deep Bradley-Terry Rating (DBTR), a novel machine learning framework designed to quantify and evaluate properties of unknown items. Our method seamlessly integrates the Bradley-Terry model into the neural network structure. Moreover, we generalize this architecture further to asymmetric environments with unfairness, a condition more commonly encountered in real-world settings. Through experimental analysis, we demonstrate that DBTR successfully learns to quantify and estimate desired properties.
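
The Bradley-Terry model scores each item with a latent rating r and models P(i beats j) = sigmoid(r_i - r_j); the sketch below plugs a neural rating network into that likelihood. The architecture and toy data are illustrative, not the paper's DBTR implementation.

```python
import torch
import torch.nn as nn

rating_net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(rating_net.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

# Toy comparison data: features of item i, item j, and whether i won.
x_i, x_j = torch.randn(256, 16), torch.randn(256, 16)
i_won = torch.randint(0, 2, (256, 1)).float()

for _ in range(100):
    # Bradley-Terry likelihood: P(i beats j) = sigmoid(r_i - r_j)
    logits = rating_net(x_i) - rating_net(x_j)
    loss = bce(logits, i_won)
    opt.zero_grad(); loss.backward(); opt.step()

print(rating_net(x_i[:3]))  # learned ratings for the first three items
```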

Getting pwn’d by AI: Penetration Testing with Large Language Models

  • paper_url: http://arxiv.org/abs/2308.00121
  • repo_url: https://github.com/ipa-lab/hackingBuddyGPT
  • paper_authors: Andreas Happe, Jürgen Cito
  • for: Explores using large language models such as GPT-3.5 to augment penetration testers with AI sparring partners.
  • methods: Studies two use cases, high-level task planning for security testing assignments and low-level vulnerability hunting, implementing a closed feedback loop in which the LLM analyzes the state of a vulnerable virtual machine (connected through SSH) and suggests concrete attack vectors that are executed automatically.
  • results: Initial results are promising; the paper details avenues for improvement and closes by deliberating on the ethics of providing AI-based sparring partners.
    Abstract The field of software security testing, more specifically penetration testing, is an activity that requires high levels of expertise and involves many manual testing and analysis steps. This paper explores the potential usage of large-language models, such as GPT3.5, to augment penetration testers with AI sparring partners. We explore the feasibility of supplementing penetration testers with AI models for two distinct use cases: high-level task planning for security testing assignments and low-level vulnerability hunting within a vulnerable virtual machine. For the latter, we implemented a closed-feedback loop between LLM-generated low-level actions with a vulnerable virtual machine (connected through SSH) and allowed the LLM to analyze the machine state for vulnerabilities and suggest concrete attack vectors which were automatically executed within the virtual machine. We discuss promising initial results, detail avenues for improvement, and close deliberating on the ethics of providing AI-based sparring partners.
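
A minimal, hypothetical sketch of such a closed feedback loop, using paramiko for SSH and the OpenAI chat API; the prompts, credentials, and loop logic are our assumptions and are not the hackingBuddyGPT implementation. A pattern like this is only appropriate against an isolated lab VM with explicit authorization.

```python
import paramiko
from openai import OpenAI

llm = OpenAI()  # reads OPENAI_API_KEY from the environment
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("10.0.0.5", username="lowpriv", password="trustno1")  # lab VM only

history = "You are assisting an authorized penetration test of a lab VM."
command = "id"
for _ in range(5):                           # closed feedback loop
    _, stdout, _ = ssh.exec_command(command)
    output = stdout.read().decode()
    history += f"\n$ {command}\n{output}"    # machine state fed back to the LLM
    reply = llm.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": history},
                  {"role": "user", "content": "Suggest the single next shell "
                   "command to look for privilege-escalation vectors. Reply "
                   "with the command only."}])
    command = reply.choices[0].message.content.strip()
ssh.close()
```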

An Explainable Geometric-Weighted Graph Attention Network for Identifying Functional Networks Associated with Gait Impairment

  • paper_url: http://arxiv.org/abs/2307.13108
  • repo_url: https://github.com/favour-nerrise/xgw-gat
  • paper_authors: Favour Nerrise, Qingyu Zhao, Kathleen L. Poston, Kilian M. Pohl, Ehsan Adeli
  • for: Better understanding motor progression in Parkinson's disease (PD) in order to develop more effective and personalized therapeutics.
  • methods: An explainable, geometric, weighted-graph attention neural network (xGW-GAT) that predicts multi-class gait impairment on the MDS-UPDRS from resting-state functional MRI data.
  • results: xGW-GAT identifies functional connectivity patterns associated with gait impairment in PD, offers interpretable explanations of the functional subnetworks associated with motor impairment, and outperforms several existing methods while revealing clinically relevant connectivity patterns.
    Abstract One of the hallmark symptoms of Parkinson's Disease (PD) is the progressive loss of postural reflexes, which eventually leads to gait difficulties and balance problems. Identifying disruptions in brain function associated with gait impairment could be crucial in better understanding PD motor progression, thus advancing the development of more effective and personalized therapeutics. In this work, we present an explainable, geometric, weighted-graph attention neural network (xGW-GAT) to identify functional networks predictive of the progression of gait difficulties in individuals with PD. xGW-GAT predicts the multi-class gait impairment on the MDS Unified PD Rating Scale (MDS-UPDRS). Our computational- and data-efficient model represents functional connectomes as symmetric positive definite (SPD) matrices on a Riemannian manifold to explicitly encode pairwise interactions of entire connectomes, based on which we learn an attention mask yielding individual- and group-level explainability. Applied to our resting-state functional MRI (rs-fMRI) dataset of individuals with PD, xGW-GAT identifies functional connectivity patterns associated with gait impairment in PD and offers interpretable explanations of functional subnetworks associated with motor impairment. Our model successfully outperforms several existing methods while simultaneously revealing clinically-relevant connectivity patterns. The source code is available at https://github.com/favour-nerrise/xGW-GAT .

How to use LLMs for Text Analysis

  • paper_url: http://arxiv.org/abs/2307.13106
  • repo_url: https://github.com/cssmodels/howtousellms
  • paper_authors: Petter Törnberg
  • for: A how-to guide introducing large language models (LLMs) as a versatile text analysis method within the social sciences.
  • methods: Walks through each step of analyzing textual data with LLMs in Python: installing the software, setting up the API, loading the data, developing an analysis prompt, analyzing the text, and validating the results; applicable to tasks ranging from text annotation and classification to sentiment analysis and critical discourse analysis.
  • results: Using the challenging task of identifying populism in political texts as an illustrative example, shows how LLMs move beyond the existing state of the art.
    Abstract This guide introduces Large Language Models (LLM) as a highly versatile text analysis method within the social sciences. As LLMs are easy-to-use, cheap, fast, and applicable on a broad range of text analysis tasks, ranging from text annotation and classification to sentiment analysis and critical discourse analysis, many scholars believe that LLMs will transform how we do text analysis. This how-to guide is aimed at students and researchers with limited programming experience, and offers a simple introduction to how LLMs can be used for text analysis in your own research project, as well as advice on best practices. We will go through each of the steps of analyzing textual data with LLMs using Python: installing the software, setting up the API, loading the data, developing an analysis prompt, analyzing the text, and validating the results. As an illustrative example, we will use the challenging task of identifying populism in political texts, and show how LLMs move beyond the existing state-of-the-art.
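
A minimal sketch of the workflow the guide describes, using the OpenAI Python API; the prompt wording and label set are illustrative, not the paper's.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = ("You are a political science annotator. Label the following text "
          "as 'populist' or 'not populist', answering with the label only.\n\n"
          "Text: {text}")

texts = ["The corrupt elites have betrayed the ordinary people!",
         "The committee will review the budget proposal next week."]

labels = []
for text in texts:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # deterministic labels make validation easier
        messages=[{"role": "user", "content": PROMPT.format(text=text)}])
    labels.append(resp.choices[0].message.content.strip())

print(list(zip(texts, labels)))
# Validate by comparing `labels` against a hand-coded sample before scaling up.
```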

Contrastive Example-Based Control

  • paper_url: http://arxiv.org/abs/2307.13101
  • repo_url: https://github.com/khatch31/laeo
  • paper_authors: Kyle Hatch, Benjamin Eysenbach, Rafael Rafailov, Tianhe Yu, Ruslan Salakhutdinov, Sergey Levine, Chelsea Finn
  • for: Proposes an example-based control method that learns Q-values for offline RL without specifying a reward function.
  • methods: A data-driven approach that learns an implicit model of multi-step transitions from samples of the transition dynamics and examples of high-return states, rather than learning a reward function directly.
  • results: Outperforms baselines that use learned reward functions across a range of state-based and image-based offline control tasks, with improved robustness and scaling as dataset size increases.
    Abstract While many real-world problems might benefit from reinforcement learning, these problems rarely fit into the MDP mold: interacting with the environment is often expensive and specifying reward functions is challenging. Motivated by these challenges, prior work has developed data-driven approaches that learn entirely from samples of the transition dynamics and examples of high-return states. These methods typically learn a reward function from high-return states, use that reward function to label the transitions, and then apply an offline RL algorithm to these transitions. While these methods can achieve good results on many tasks, they can be complex, often requiring regularization and temporal difference updates. In this paper, we propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function. We show that this implicit model can represent the Q-values for the example-based control problem. Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions; additional experiments demonstrate improved robustness and scaling with dataset size.

Comparative Analysis of Drug-GPT and ChatGPT LLMs for Healthcare Insights: Evaluating Accuracy and Relevance in Patient and HCP Contexts

  • paper_url: http://arxiv.org/abs/2307.16850
  • repo_url: None
  • paper_authors: Giorgos Lysandrou, Roma English Owen, Kirsty Mursec, Grant Le Brun, Elizabeth A. L. Fairley
  • for: Compares three Generative Pre-trained Transformer (GPT) solutions in a question-and-answer (Q&A) setting, Drug-GPT 3, Drug-GPT 4, and ChatGPT, in the context of healthcare applications, to determine which delivers the most accurate and relevant answers to prompts about patient experiences with atopic dermatitis (AD) and healthcare professional (HCP) discussions about diabetes.
  • methods: Drug-GPT 3 and Drug-GPT 4 are supported by curated datasets of patient and HCP social media and message board posts; ChatGPT serves as the general-purpose comparison.
  • results: All three models generate relevant and accurate answers, but Drug-GPT 3 and Drug-GPT 4 provide more targeted and in-depth insights, while ChatGPT produces broader, more general responses that may lack the depth and personal insights of the specialized Drug-GPT models.
    Abstract This study presents a comparative analysis of three Generative Pre-trained Transformer (GPT) solutions in a question and answer (Q&A) setting: Drug-GPT 3, Drug-GPT 4, and ChatGPT, in the context of healthcare applications. The objective is to determine which model delivers the most accurate and relevant information in response to prompts related to patient experiences with atopic dermatitis (AD) and healthcare professional (HCP) discussions about diabetes. The results demonstrate that while all three models are capable of generating relevant and accurate responses, Drug-GPT 3 and Drug-GPT 4, which are supported by curated datasets of patient and HCP social media and message board posts, provide more targeted and in-depth insights. ChatGPT, a more general-purpose model, generates broader and more general responses, which may be valuable for readers seeking a high-level understanding of the topics but may lack the depth and personal insights found in the answers generated by the specialized Drug-GPT models. This comparative analysis highlights the importance of considering the language model's perspective, depth of knowledge, and currency when evaluating the usefulness of generated information in healthcare applications.

Making Metadata More FAIR Using Large Language Models

  • paper_url: http://arxiv.org/abs/2307.13085
  • repo_url: None
  • paper_authors: Sowmya S. Sundaram, Mark A. Musen
  • for: Addresses bad metadata in experimental data artifacts, in particular comparing and grouping heterogeneous metadata descriptions.
  • methods: Presents FAIRMetaText, an NLP-informed application that analyzes the natural language descriptions of metadata and provides a mathematical similarity measure between two terms, which can be used to suggest terms for compliance or to group similar terms and identify replaceable ones.
  • results: Qualitative and quantitative evaluation on publicly available research artifacts, through an in-depth study of a variety of large language models (LLMs), demonstrates large gains across metadata-related tasks.
    Abstract With the global increase in experimental data artifacts, harnessing them in a unified fashion leads to a major stumbling block - bad metadata. To bridge this gap, this work presents a Natural Language Processing (NLP) informed application, called FAIRMetaText, that compares metadata. Specifically, FAIRMetaText analyzes the natural language descriptions of metadata and provides a mathematical similarity measure between two terms. This measure can then be utilized for analyzing varied metadata, by suggesting terms for compliance or grouping similar terms for identification of replaceable terms. The efficacy of the algorithm is presented qualitatively and quantitatively on publicly available research artifacts and demonstrates large gains across metadata related tasks through an in-depth study of a wide variety of Large Language Models (LLMs). This software can drastically reduce the human effort in sifting through various natural language metadata while employing several experimental datasets on the same topic.
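
One plausible way to realize such a similarity measure is with sentence embeddings; the sketch below uses the sentence-transformers library, which is our assumption rather than the paper's specific model choice.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

terms = ["body mass index", "BMI", "systolic blood pressure", "weight (kg)"]
embeddings = model.encode(terms, convert_to_tensor=True)

# Pairwise cosine similarity: high values suggest groupable/replaceable terms.
similarity = util.cos_sim(embeddings, embeddings)
for i in range(len(terms)):
    for j in range(i + 1, len(terms)):
        print(f"{terms[i]!r} vs {terms[j]!r}: {similarity[i, j]:.2f}")
```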

Fairness Under Demographic Scarce Regime

  • paper_url: http://arxiv.org/abs/2307.13081
  • repo_url: None
  • paper_authors: Patrik Joslin Kenfack, Samira Ebrahimi Kahou, Ulrich Aïvodji
  • for: Improving the fairness-accuracy tradeoff of models when demographic information is only partially available (the demographic scarce regime).
  • methods: Introduces uncertainty awareness into the proxy attribute classifier and enforces fairness constraints only on samples whose demographic information is inferred with the lowest uncertainty.
  • results: Yields significantly better fairness-accuracy tradeoffs than classic attribute classifiers and, surprisingly, outperforms models trained with constraints on the true sensitive attributes.
    Abstract Most existing works on fairness assume the model has full access to demographic information. However, there exist scenarios where demographic information is partially available because a record was not maintained throughout data collection or due to privacy reasons. This setting is known as the demographic scarce regime. Prior research has shown that training an attribute classifier to replace the missing sensitive attributes (proxy) can still improve fairness. However, the use of proxy-sensitive attributes worsens fairness-accuracy trade-offs compared to true sensitive attributes. To address this limitation, we propose a framework to build attribute classifiers that achieve better fairness-accuracy trade-offs. Our method introduces uncertainty awareness in the attribute classifier and enforces fairness on samples with demographic information inferred with the lowest uncertainty. We show empirically that enforcing fairness constraints on samples with uncertain sensitive attributes is detrimental to fairness and accuracy. Our experiments on two datasets show that the proposed framework yields models with significantly better fairness-accuracy trade-offs compared to classic attribute classifiers. Surprisingly, our framework outperforms models trained with constraints on the true sensitive attributes.
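
A hedged sketch of the selection step: train the proxy attribute classifier, measure predictive uncertainty as softmax entropy, and apply the fairness constraint only to the most confident samples. The entropy criterion and the threshold are illustrative choices, not the paper's exact formulation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(500, 10))       # records with known demographics
a_labeled = rng.integers(0, 2, size=500)     # sensitive attribute
X_unlabeled = rng.normal(size=(2000, 10))    # records missing demographics

# 1) Train the proxy attribute classifier on the labeled subset.
attr_clf = LogisticRegression().fit(X_labeled, a_labeled)

# 2) Predictive uncertainty as entropy of the predicted attribute distribution.
proba = attr_clf.predict_proba(X_unlabeled)
entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)

# 3) Enforce the fairness constraint only on low-uncertainty samples.
confident = entropy < np.quantile(entropy, 0.5)   # illustrative threshold
a_proxy = attr_clf.predict(X_unlabeled[confident])
print(f"fairness constraint applied to {confident.sum()} of {len(X_unlabeled)} samples")
# a_proxy would then feed a fairness-constrained training objective downstream.
```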

Adaptive Certified Training: Towards Better Accuracy-Robustness Tradeoffs

  • paper_url: http://arxiv.org/abs/2307.13078
  • repo_url: None
  • paper_authors: Zhakshylyk Nurlanov, Frank R. Schmidt, Florian Bernard
  • for: Improving the reliability of deep learning models in real-world applications, specifically the accuracy-robustness tradeoff of certified training.
  • methods: Proposes a novel certified training method based on adaptive certified radii, improving both the standard accuracy and the robustness of the model.
  • results: Experiments on MNIST, CIFAR-10, and TinyImageNet show improved accuracy-robustness tradeoffs; on CIFAR-10 and TinyImageNet, the method yields models with up to two times higher robustness, measured as the average certified radius on a test set, at the same levels of standard accuracy as baseline approaches.
    Abstract As deep learning models continue to advance and are increasingly utilized in real-world systems, the issue of robustness remains a major challenge. Existing certified training methods produce models that achieve high provable robustness guarantees at certain perturbation levels. However, the main problem of such models is a dramatically low standard accuracy, i.e. accuracy on clean unperturbed data, that makes them impractical. In this work, we consider a more realistic perspective of maximizing the robustness of a model at certain levels of (high) standard accuracy. To this end, we propose a novel certified training method based on a key insight that training with adaptive certified radii helps to improve both the accuracy and robustness of the model, advancing state-of-the-art accuracy-robustness tradeoffs. We demonstrate the effectiveness of the proposed method on MNIST, CIFAR-10, and TinyImageNet datasets. Particularly, on CIFAR-10 and TinyImageNet, our method yields models with up to two times higher robustness, measured as an average certified radius of a test set, at the same levels of standard accuracy compared to baseline approaches.

LLM-Rec: Personalized Recommendation via Prompting Large Language Models

  • paper_url: http://arxiv.org/abs/2307.15780
  • repo_url: None
  • paper_authors: Hanjia Lyu, Song Jiang, Hanqing Zeng, Qifan Wang, Si Zhang, Ren Chen, Chris Leung, Jiajie Tang, Yinglong Xia, Jiebo Luo
  • for: Improving personalized recommendation performance with large language models (LLMs) through input augmentation.
  • methods: Four prompting strategies: basic prompting, recommendation-driven prompting, engagement-guided prompting, and combined recommendation-driven + engagement-guided prompting.
  • results: Incorporating the augmented input text generated by the LLM improves recommendation performance; the recommendation-driven and engagement-guided strategies elicit the LLM's understanding of global and local item characteristics.
    Abstract We investigate various prompting strategies for enhancing personalized recommendation performance with large language models (LLMs) through input augmentation. Our proposed approach, termed LLM-Rec, encompasses four distinct prompting strategies: (1) basic prompting, (2) recommendation-driven prompting, (3) engagement-guided prompting, and (4) recommendation-driven + engagement-guided prompting. Our empirical experiments show that incorporating the augmented input text generated by LLM leads to improved recommendation performance. Recommendation-driven and engagement-guided prompting strategies are found to elicit LLM's understanding of global and local item characteristics. This finding highlights the importance of leveraging diverse prompts and input augmentation techniques to enhance the recommendation capabilities with LLMs.
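
A sketch of what the four prompting strategies might look like as templates; the exact wording is illustrative, not the paper's prompts.

```python
item = "A cozy mystery novel set in a small coastal town."
engaged = ["A detective story with quirky local characters.",
           "A slow-burn whodunit by the same author."]

prompts = {
    "basic": f"Describe the following item: {item}",
    "recommendation_driven": (
        f"Describe the following item so the description is useful for "
        f"recommending it to users: {item}"),
    "engagement_guided": (
        "Summarize the commonalities between the target item and items the "
        f"user engaged with.\nTarget: {item}\nEngaged: {engaged}"),
    "rec_plus_engagement": (
        f"Write a recommendation-oriented description of: {item}\n"
        f"Emphasize what it shares with items the user liked: {engaged}"),
}
for name, prompt in prompts.items():
    print(f"--- {name} ---\n{prompt}\n")
# Each prompt's LLM output is appended to the item text before training the recommender.
```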

Personalized Category Frequency prediction for Buy It Again recommendations

  • paper_url: http://arxiv.org/abs/2308.01195
  • repo_url: None
  • paper_authors: Amit Pande, Kunal Ghosh, Rankyung Park
  • for: Proposes a Buy It Again recommendation system to help retailers improve user experience and site engagement.
  • methods: A hierarchical PCIC model consisting of a personalized category model (PC model), which generates a personalized list of categories a customer is likely to purchase again, and a personalized item-within-category model (IC model), which ranks items within those categories. Survival models capture the general consumption rate of products, time-series models capture consumption trends, and features derived from both feed a category-grained neural network.
  • results: Compared with twelve baselines on four standard open datasets, PCIC improves NDCG by up to 16% while improving recall by around 2%; it scales to a dataset of 100M guests and 3M items (training in about 8 hours), and an A/B test on a major retailer's site produced significant gains in guest engagement.
    Abstract Buy It Again (BIA) recommendations are crucial for retailers to help improve user experience and site engagement by suggesting items that customers are likely to buy again based on their own repeat purchasing patterns. Most existing BIA studies analyze guests' personalized behavior at item granularity. A category-based model may be more appropriate in such scenarios. We propose a recommendation system called a hierarchical PCIC model that consists of a personalized category model (PC model) and a personalized item model within categories (IC model). The PC model generates a personalized list of categories that customers are likely to purchase again. The IC model ranks items within categories that guests are likely to consume within a category. The hierarchical PCIC model captures the general consumption rate of products using survival models. Trends in consumption are captured using time series models. Features derived from these models are used in training a category-grained neural network. We compare PCIC to twelve existing baselines on four standard open datasets. PCIC improves NDCG up to 16 percent while improving recall by around 2 percent. We were able to scale and train (over 8 hours) PCIC on a large dataset of 100M guests and 3M items, where the repeat categories of a guest outnumber repeat items. PCIC was deployed and A/B tested on the site of a major retailer, leading to significant gains in guest engagement.
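
A minimal sketch of the survival-model ingredient: fitting a Weibull model to a guest's inter-purchase gaps for one category with the lifelines library. The library choice and toy numbers are our assumptions, not the paper's implementation.

```python
import numpy as np
from lifelines import WeibullFitter

# Toy inter-purchase gaps (days) for one guest x category, e.g. "coffee".
gaps = np.array([12, 15, 11, 14, 13, 16], dtype=float)
observed = np.ones_like(gaps)          # 1 = a repurchase was actually observed

wf = WeibullFitter().fit(gaps, event_observed=observed)

days_since_last = 10.0
# Survival probability: chance the guest has NOT yet repurchased by day t.
p_not_yet = wf.survival_function_at_times(days_since_last).iloc[0]
print(f"P(no repurchase by day {days_since_last:.0f}) = {p_not_yet:.2f}")
# 1 - p_not_yet (and its trend over time) can feed the category-grained network.
```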

Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation

  • paper_url: http://arxiv.org/abs/2307.12983
  • repo_url: https://github.com/Improbable-AI/pql
  • paper_authors: Zechu Li, Tao Chen, Zhang-Wei Hong, Anurag Ajay, Pulkit Agrawal
  • for: Improving the wall-clock efficiency of reinforcement learning on complex tasks by exploiting massively parallel GPU-based simulation for data collection and training.
  • methods: A Parallel Q-Learning (PQL) scheme that parallelizes data collection, policy learning, and value learning, designed specifically for massively parallel GPU-based simulation and optimized to run on a single workstation.
  • results: PQL outperforms PPO in wall-clock time while maintaining the superior sample efficiency of off-policy learning; experiments scale Q-learning to tens of thousands of parallel environments and investigate the key factors affecting learning speed.
    Abstract Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data. Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection thousands of times on a commodity GPU. Most prior works used on-policy methods like PPO due to their simplicity and ease of scaling. Off-policy methods are more data efficient but challenging to scale, resulting in a longer wall-clock training time. This paper presents a Parallel $Q$-Learning (PQL) scheme that outperforms PPO in wall-clock time while maintaining superior sample efficiency of off-policy learning. PQL achieves this by parallelizing data collection, policy learning, and value learning. Different from prior works on distributed off-policy learning, such as Apex, our scheme is designed specifically for massively parallel GPU-based simulation and optimized to work on a single workstation. In experiments, we demonstrate that $Q$-learning can be scaled to \textit{tens of thousands of parallel environments} and investigate important factors affecting learning speed. The code is available at https://github.com/Improbable-AI/pql.

3D-LLM: Injecting the 3D World into Large Language Models

  • paper_url: http://arxiv.org/abs/2307.12981
  • repo_url: https://github.com/UMass-Foundation-Model/3D-LLM
  • paper_authors: Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan
  • for: Proposes a new family of 3D language models (3D-LLMs) that take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks.
  • methods: Uses three types of prompting mechanisms to collect over 300k 3D-language data covering tasks such as captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, and navigation. A 3D feature extractor obtains 3D features from rendered multi-view images, 2D VLMs serve as the backbone for training the 3D-LLMs, and a 3D localization mechanism is introduced to better capture 3D spatial information.
  • results: The proposed 3D-LLMs outperform state-of-the-art baselines on the ScanQA dataset, with a BLEU-1 score surpassing the state of the art by 9%. They also outperform 2D VLMs on held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue, and qualitative examples show the model performing tasks beyond the scope of existing LLMs and VLMs.
    Abstract Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi-view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses the state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs. Project Page: https://vis-www.cs.umass.edu/3dllm/.

A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.12968
  • repo_url: https://github.com/ben-eysenbach/ac-connection
  • paper_authors: Benjamin Eysenbach, Matthieu Geist, Sergey Levine, Ruslan Salakhutdinov
  • for: Explains the relationship between two regularization schemes in offline RL: one-step methods, which stop after a single step of policy improvement, and critic regularization methods, which perform many steps of policy improvement with a regularized objective.
  • methods: Shows that applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL.
  • results: Although practical implementations violate the analysis's assumptions and critic regularization is typically applied with smaller coefficients, experiments show the analysis makes accurate, testable predictions about practical offline RL methods (CQL and one-step RL) with commonly used hyperparameters.
    Abstract As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting. One-step methods perform regularization by doing just a single step of policy improvement, while critic regularization methods do many steps of policy improvement with a regularized objective. These methods appear distinct. One-step methods, such as advantage-weighted regression and conditional behavioral cloning, truncate policy iteration after just one step. This ``early stopping'' makes one-step RL simple and stable, but can limit its asymptotic performance. Critic regularization typically requires more compute but has appealing lower-bound guarantees. In this paper, we draw a close connection between these methods: applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL. While practical implementations violate our assumptions and critic regularization is typically applied with smaller regularization coefficients, our experiments nevertheless show that our analysis makes accurate, testable predictions about practical offline RL methods (CQL and one-step RL) with commonly-used hyperparameters. Our results do not imply that every problem can be solved with a single step of policy improvement, but rather that one-step RL might be competitive with critic regularization on RL problems that demand strong regularization.
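
For context, the one-step objective referenced here is commonly written in the advantage-weighted form below (standard in the one-step RL literature; lambda is the regularization temperature):

```latex
% One-step RL: a single KL-regularized policy improvement step over the
% behavior policy \beta, using the behavior policy's own advantage A^\beta:
\pi_{\text{1-step}}(a \mid s) \;\propto\;
  \beta(a \mid s)\,\exp\!\Big(\tfrac{1}{\lambda}\,A^{\beta}(s,a)\Big),
\qquad A^{\beta}(s,a) = Q^{\beta}(s,a) - V^{\beta}(s).
% The paper's claim: a multi-step critic-regularization method run with
% regularization coefficient 1 recovers this same policy.
```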

Enhancing image captioning with depth information using a Transformer-based framework

  • paper_url: http://arxiv.org/abs/2308.03767
  • repo_url: None
  • paper_authors: Aya Mahmoud Ahmed, Mohamed Yousef, Khaled F. Hussain, Yousef Bassyouni Mahdy
  • for: Improving image captioning by investigating whether integrating depth information with RGB images enhances the captioning task and generates better descriptions.
  • methods: A Transformer-based encoder-decoder framework that takes an RGB image and its corresponding depth map (ground truth or estimated) as inputs and combines them to generate a multi-sentence description of a 3D scene, exploring several fusion approaches.
  • results: Improved captioning performance on the NYU-v2 dataset and the Stanford image paragraph captioning dataset, along with a cleaned, more consistent version of the NYU-v2 dataset.
    Abstract Captioning images is a challenging scene-understanding task that connects computer vision and natural language processing. While image captioning models have been successful in producing excellent descriptions, the field has primarily focused on generating a single sentence for 2D images. This paper investigates whether integrating depth information with RGB images can enhance the captioning task and generate better descriptions. For this purpose, we propose a Transformer-based encoder-decoder framework for generating a multi-sentence description of a 3D scene. The RGB image and its corresponding depth map are provided as inputs to our framework, which combines them to produce a better understanding of the input scene. Depth maps could be ground truth or estimated, which makes our framework widely applicable to any RGB captioning dataset. We explored different fusion approaches to fuse RGB and depth images. The experiments are performed on the NYU-v2 dataset and the Stanford image paragraph captioning dataset. During our work with the NYU-v2 dataset, we found inconsistent labeling that prevents the benefit of using depth information to enhance the captioning task. The results were even worse than using RGB images only. As a result, we propose a cleaned version of the NYU-v2 dataset that is more consistent and informative. Our results on both datasets demonstrate that the proposed framework effectively benefits from depth information, whether it is ground truth or estimated, and generates better captions. Code, pre-trained models, and the cleaned version of the NYU-v2 dataset will be made publically available.
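As a rough illustration of the kind of fusion such a framework performs, the sketch below concatenates RGB and depth patch features before a Transformer encoder. The fusion-by-concatenation choice, the dimensions, and the module names are assumptions for illustration; the paper explores several fusion approaches.

```python
import torch
import torch.nn as nn

class RGBDFusionEncoder(nn.Module):
    """Toy encoder that fuses RGB and depth patch features by concatenation."""
    def __init__(self, feat_dim=256, n_layers=4, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(2 * feat_dim, feat_dim)  # fuse the two modalities
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, rgb_feats, depth_feats):
        # rgb_feats, depth_feats: (batch, n_patches, feat_dim)
        fused = self.proj(torch.cat([rgb_feats, depth_feats], dim=-1))
        return self.encoder(fused)  # memory for an autoregressive caption decoder

enc = RGBDFusionEncoder()
memory = enc(torch.randn(2, 196, 256), torch.randn(2, 196, 256))
print(memory.shape)  # torch.Size([2, 196, 256])
```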

RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment

  • paper_url: http://arxiv.org/abs/2307.12950
  • repo_url: https://github.com/facebookresearch/rlcd
  • paper_authors: Kevin Yang, Dan Klein, Asli Celikyilmaz, Nanyun Peng, Yuandong Tian
  • for: developing an alignment method that teaches language models to follow natural language principles without using human feedback.
  • methods: trains a preference model on simulated preference pairs, each containing a high-quality and a low-quality example generated from contrasting positive and negative prompts, and then improves the base unaligned language model via reinforcement learning.
  • results: RLCD outperforms the RLAIF (Bai et al., 2022b) and context distillation (Huang et al., 2022) baselines across three diverse alignment tasks (harmlessness, helpfulness, and story outline generation), at both the 7B and 30B model scales for preference data simulation.
    Abstract We propose Reinforcement Learning from Contrast Distillation (RLCD), a method for aligning language models to follow natural language principles without using human feedback. RLCD trains a preference model using simulated preference pairs that contain both a high-quality and low-quality example, generated using contrasting positive and negative prompts. The preference model is then used to improve a base unaligned language model via reinforcement learning. Empirically, RLCD outperforms RLAIF (Bai et al., 2022b) and context distillation (Huang et al., 2022) baselines across three diverse alignment tasks--harmlessness, helpfulness, and story outline generation--and on both 7B and 30B model scales for preference data simulation.
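A minimal sketch of the pair-simulation idea, assuming a `generate` sampling function and illustrative prompt wording (neither is the paper's exact setup):

```python
# Hedged sketch of RLCD-style preference-pair simulation. `generate` stands in
# for sampling from the base LLM; the prompt wording is invented for illustration.
def make_preference_pair(generate, user_query):
    pos_prompt = f"(helpful, harmless response) {user_query}"
    neg_prompt = f"(unhelpful, harmful response) {user_query}"
    chosen = generate(pos_prompt)    # treated as the preferred output
    rejected = generate(neg_prompt)  # treated as the dispreferred output
    # Unlike RLAIF, no scoring model labels the pair: the contrast between the
    # prompts themselves supplies the preference label for training.
    return {"prompt": user_query, "chosen": chosen, "rejected": rejected}
```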

On Privileged and Convergent Bases in Neural Network Representations

  • paper_url: http://arxiv.org/abs/2307.12941
  • repo_url: None
  • paper_authors: Davis Brown, Nikhil Vyas, Yamini Bansal
  • for: investigating whether the representations learned by neural networks possess a privileged and convergent basis.
  • methods: examines the significance of the feature directions represented by individual neurons, comparing the bases of networks trained with the same parameters but different random initializations.
  • results: neural representations are not fully rotation-invariant, and even wide networks such as WideResNets do not converge to a unique basis; basis correlation increases significantly when a few early layers are frozen identically.
    Abstract In this study, we investigate whether the representations learned by neural networks possess a privileged and convergent basis. Specifically, we examine the significance of feature directions represented by individual neurons. First, we establish that arbitrary rotations of neural representations cannot be inverted (unlike linear networks), indicating that they do not exhibit complete rotational invariance. Subsequently, we explore the possibility of multiple bases achieving identical performance. To do this, we compare the bases of networks trained with the same parameters but with varying random initializations. Our study reveals two findings: (1) Even in wide networks such as WideResNets, neural networks do not converge to a unique basis; (2) Basis correlation increases significantly when a few early layers of the network are frozen identically. Furthermore, we analyze Linear Mode Connectivity, which has been studied as a measure of basis correlation. Our findings give evidence that while Linear Mode Connectivity improves with increased network width, this improvement is not due to an increase in basis correlation.

Rule By Example: Harnessing Logical Rules for Explainable Hate Speech Detection

  • paper_url: http://arxiv.org/abs/2307.12935
  • repo_url: https://github.com/chrisisking/rule-by-example
  • paper_authors: Christopher Clarke, Matthew Hall, Gaurav Mittal, Ye Yu, Sandra Sajeev, Jason Mars, Mei Chen
  • for: addressing the shortcomings of modern content moderation, where rule-based heuristics are fragile and deep learning models lack the transparency and explainability users expect.
  • methods: proposes Rule By Example (RBE), a novel exemplar-based contrastive learning approach that learns rich embedding representations from logical rules.
  • results: RBE outperforms state-of-the-art deep learning classifiers on 3 popular hate speech classification datasets, in both supervised and unsupervised settings, while providing explainable model predictions via rule-grounding.
    Abstract Classic approaches to content moderation typically apply a rule-based heuristic approach to flag content. While rules are easily customizable and intuitive for humans to interpret, they are inherently fragile and lack the flexibility or robustness needed to moderate the vast amount of undesirable content found online today. Recent advances in deep learning have demonstrated the promise of using highly effective deep neural models to overcome these challenges. However, despite the improved performance, these data-driven models lack transparency and explainability, often leading to mistrust from everyday users and a lack of adoption by many platforms. In this paper, we present Rule By Example (RBE): a novel exemplar-based contrastive learning approach for learning from logical rules for the task of textual content moderation. RBE is capable of providing rule-grounded predictions, allowing for more explainable and customizable predictions compared to typical deep learning-based approaches. We demonstrate that our approach is capable of learning rich rule embedding representations using only a few data examples. Experimental results on 3 popular hate speech classification datasets show that RBE is able to outperform state-of-the-art deep learning classifiers as well as the use of rules in both supervised and unsupervised settings while providing explainable model predictions via rule-grounding.

Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning

  • paper_url: http://arxiv.org/abs/2307.12933
  • repo_url: None
  • paper_authors: Chuming Li, Ruonan Jia, Jie Liu, Yinmin Zhang, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang
  • for: proposing a model-based reinforcement learning algorithm with theoretically guaranteed policy improvement for continuous control tasks.
  • methods: extends the policy improvement step of Soft Actor-Critic (SAC) by distilling optimized action sequences from model-based planning into the policy, updating the policy jointly over multiple future time steps.
  • results: MPDP achieves better sample efficiency and asymptotic performance than both model-free and model-based planning algorithms on six MuJoCo continuous control benchmark tasks.
    Abstract Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks due to its high sample efficiency. To save the computation cost of conducting planning online, recent practices tend to distill optimized action sequences into an RL policy during the training phase. Although the distillation can incorporate both the foresight of planning and the exploration ability of RL policies, the theoretical understanding of these methods is yet unclear. In this paper, we extend the policy improvement step of Soft Actor-Critic (SAC) by developing an approach to distill from model-based planning to the policy. We then demonstrate that such an approach of policy improvement has a theoretical guarantee of monotonic improvement and convergence to the maximum value defined in SAC. We discuss effective design choices and implement our theory as a practical algorithm -- Model-based Planning Distilled to Policy (MPDP) -- that updates the policy jointly over multiple future time steps. Extensive experiments show that MPDP achieves better sample efficiency and asymptotic performance than both model-free and model-based planning algorithms on six continuous control benchmark tasks in MuJoCo.

Contextual Bandits and Imitation Learning via Preference-Based Active Queries

  • paper_url: http://arxiv.org/abs/2307.12926
  • repo_url: None
  • paper_authors: Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
  • for: studying contextual bandits and imitation learning in a setting where the learner lacks direct reward information and instead actively queries an expert for noisy preference feedback between two actions; the goal is to minimize both the regret of the executed actions and the number of comparison queries made to the expert.
  • methods: an algorithm that leverages an online regression oracle over a function class representing the expert's preference model, using it both to choose actions and to decide when to query.
  • results: for contextual bandits, the algorithm achieves a regret bound of $O(\min\{\sqrt{T}, d/\Delta\})$, where $T$ is the number of interactions, $d$ is the eluder dimension of the function class, and $\Delta$ is the minimum preference gap of the optimal action over any suboptimal action across all contexts; it does not require knowledge of $\Delta$ and makes only $O(\min\{T, d^2/\Delta^2\})$ expert queries. Similar guarantees extend to the imitation learning setting, where the agent can even learn to outperform a suboptimal expert, highlighting a practical benefit of preference-based feedback.
    Abstract We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward. Instead, the learner can actively query an expert at each round to compare two actions and receive noisy preference feedback. The learner's objective is two-fold: to minimize the regret associated with the executed actions, while simultaneously, minimizing the number of comparison queries made to the expert. In this paper, we assume that the learner has access to a function class that can represent the expert's preference model under appropriate link functions, and provide an algorithm that leverages an online regression oracle with respect to this function class for choosing its actions and deciding when to query. For the contextual bandit setting, our algorithm achieves a regret bound that combines the best of both worlds, scaling as $O(\min\{\sqrt{T}, d/\Delta\})$, where $T$ represents the number of interactions, $d$ represents the eluder dimension of the function class, and $\Delta$ represents the minimum preference of the optimal action over any suboptimal action under all contexts. Our algorithm does not require the knowledge of $\Delta$, and the obtained regret bound is comparable to what can be achieved in the standard contextual bandits setting where the learner observes reward signals at each round. Additionally, our algorithm makes only $O(\min\{T, d^2/\Delta^2\})$ queries to the expert. We then extend our algorithm to the imitation learning setting, where the learning agent engages with an unknown environment in episodes of length $H$ each, and provide similar guarantees for regret and query complexity. Interestingly, our algorithm for imitation learning can even learn to outperform the underlying expert, when it is suboptimal, highlighting a practical benefit of preference-based feedback in imitation learning.

Hierarchical Skeleton Meta-Prototype Contrastive Learning with Hard Skeleton Mining for Unsupervised Person Re-Identification

  • paper_url: http://arxiv.org/abs/2307.12917
  • repo_url: https://github.com/kali-hac/hi-mpc
  • paper_authors: Haocong Rao, Cyril Leung, Chunyan Miao
  • for: proposing an unsupervised person re-identification (re-ID) method, Hierarchical skeleton Meta-Prototype Contrastive learning (Hi-MPC) with Hard Skeleton Mining (HSM), that works with unlabeled 3D skeletons.
  • methods: constructs hierarchical skeleton representations that model coarse-to-fine body and motion features at the joint, component, and limb levels; a hierarchical meta-prototype contrastive learning model clusters and contrasts the most typical skeleton features across levels, while a hard skeleton mining mechanism adaptively infers the informative importance of each skeleton to focus learning on harder samples.
  • results: extensive evaluations on five datasets show that the approach outperforms a wide variety of state-of-the-art skeleton-based methods and generalizes to cross-view person re-ID and RGB-based scenarios with estimated skeletons.
    Abstract With rapid advancements in depth sensors and deep learning, skeleton-based person re-identification (re-ID) models have recently achieved remarkable progress with many advantages. Most existing solutions learn single-level skeleton features from body joints with the assumption of equal skeleton importance, while they typically lack the ability to exploit more informative skeleton features from various levels such as limb level with more global body patterns. The label dependency of these methods also limits their flexibility in learning more general skeleton representations. This paper proposes a generic unsupervised Hierarchical skeleton Meta-Prototype Contrastive learning (Hi-MPC) approach with Hard Skeleton Mining (HSM) for person re-ID with unlabeled 3D skeletons. Firstly, we construct hierarchical representations of skeletons to model coarse-to-fine body and motion features from the levels of body joints, components, and limbs. Then a hierarchical meta-prototype contrastive learning model is proposed to cluster and contrast the most typical skeleton features ("prototypes") from different-level skeletons. By converting original prototypes into meta-prototypes with multiple homogeneous transformations, we induce the model to learn the inherent consistency of prototypes to capture more effective skeleton features for person re-ID. Furthermore, we devise a hard skeleton mining mechanism to adaptively infer the informative importance of each skeleton, so as to focus on harder skeletons to learn more discriminative skeleton representations. Extensive evaluations on five datasets demonstrate that our approach outperforms a wide variety of state-of-the-art skeleton-based methods. We further show the general applicability of our method to cross-view person re-ID and RGB-based scenarios with estimated skeletons.
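The core contrastive step can be pictured as pulling each skeleton feature toward the prototype of its cluster. The sketch below is a generic prototype-contrastive objective, not the exact Hi-MPC loss (which additionally forms meta-prototypes via homogeneous transformations and operates at multiple skeleton levels):

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(features, prototypes, assignments, temperature=0.07):
    """Generic prototype-contrastive objective (illustrative sketch):
    pull each skeleton feature toward its cluster prototype, push it away
    from the others. `assignments` is a LongTensor of cluster ids, shape (N,)."""
    features = F.normalize(features, dim=-1)      # (N, d)
    prototypes = F.normalize(prototypes, dim=-1)  # (K, d)
    logits = features @ prototypes.T / temperature
    return F.cross_entropy(logits, assignments)
```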

Consensus-based Participatory Budgeting for Legitimacy: Decision Support via Multi-agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.12915
  • repo_url: None
  • paper_authors: Srijoni Majumdar, Evangelos Pournaras
  • for: improving the legitimacy of participatory budgeting, a bottom-up democratic process for distributing public funds in which voting outcomes may not always be fair or inclusive.
  • methods: a novel iterative consensus-based participatory budgeting process, with decision support provided by a multi-agent reinforcement learning approach that assists voters in interacting with each other to make viable compromises.
  • results: extensive experiments with real-world participatory budgeting data from Poland show that consensus is reachable, efficient, and robust; the required compromise is comparable to that of existing voting aggregation methods that promote fairness and inclusion without attaining consensus.
    Abstract The legitimacy of bottom-up democratic processes for the distribution of public funds by policy-makers is challenging and complex. Participatory budgeting is such a process, where voting outcomes may not always be fair or inclusive. Deliberation for which project ideas to put for voting and choose for implementation lack systematization and do not scale. This paper addresses these grand challenges by introducing a novel and legitimate iterative consensus-based participatory budgeting process. Consensus is designed to be a result of decision support via an innovative multi-agent reinforcement learning approach. Voters are assisted to interact with each other to make viable compromises. Extensive experimental evaluation with real-world participatory budgeting data from Poland reveal striking findings: Consensus is reachable, efficient and robust. Compromise is required, which is though comparable to the one of existing voting aggregation methods that promote fairness and inclusion without though attaining consensus.

Graph Neural Networks For Mapping Variables Between Programs – Extended Version

  • paper_url: http://arxiv.org/abs/2307.13014
  • repo_url: https://github.com/pmorvalho/ecai23-gnns-for-mapping-variables-between-programs
  • paper_authors: Pedro Orvalho, Jelle Piepenbrock, Mikoláš Janota, Vasco Manquinho
  • for: improving automated program analysis by mapping the sets of variables between two programs, which is useful for tasks such as program equivalence, program analysis, program repair, and clone detection.
  • methods: graph neural networks (GNNs) that map the variables between two programs based on both programs' abstract syntax trees (ASTs).
  • results: the approach correctly maps 83% of an evaluation dataset of 4166 pairs of incorrect/correct programs; while the state of the art in program repair, which depends heavily on program structure, repairs about 72% of incorrect programs, the proposed approach, based solely on variable mappings, repairs about 88.5%.
    Abstract Automated program analysis is a pivotal research domain in many areas of Computer Science -- Formal Methods and Artificial Intelligence, in particular. Due to the undecidability of the problem of program equivalence, comparing two programs is highly challenging. Typically, in order to compare two programs, a relation between both programs' sets of variables is required. Thus, mapping variables between two programs is useful for a panoply of tasks such as program equivalence, program analysis, program repair, and clone detection. In this work, we propose using graph neural networks (GNNs) to map the set of variables between two programs based on both programs' abstract syntax trees (ASTs). To demonstrate the strength of variable mappings, we present three use-cases of these mappings on the task of program repair to fix well-studied and recurrent bugs among novice programmers in introductory programming assignments (IPAs). Experimental results on a dataset of 4166 pairs of incorrect/correct programs show that our approach correctly maps 83% of the evaluation dataset. Moreover, our experiments show that the current state-of-the-art on program repair, greatly dependent on the programs' structure, can only repair about 72% of the incorrect programs. In contrast, our approach, which is solely based on variable mappings, can repair around 88.5%.
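To give a feel for the first step of such a pipeline, the sketch below collects variable occurrences from an AST; in the paper's setting each occurrence would become a graph node whose learned embedding is matched against the other program's variables. The use of Python's `ast` module here is purely illustrative and not the paper's implementation:

```python
import ast

def variable_nodes(src: str):
    """Collect variable names with their AST usage contexts; in a GNN pipeline
    each occurrence would become a node whose embedding is matched across programs."""
    names = {}
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.Name):
            names.setdefault(node.id, []).append(type(node.ctx).__name__)
    return names

print(variable_nodes("total = 0\nfor x in data:\n    total += x"))
# e.g. {'total': ['Store', 'Store'], 'x': ['Store', 'Load'], 'data': ['Load']}
```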

Towards a Visual-Language Foundation Model for Computational Pathology

  • paper_url: http://arxiv.org/abs/2307.12914
  • repo_url: None
  • paper_authors: Ming Y. Lu, Bowen Chen, Drew F. K. Williamson, Richard J. Chen, Ivy Liang, Tong Ding, Guillaume Jaume, Igor Odintsov, Andrew Zhang, Long Phi Le, Georg Gerber, Anil V Parwani, Faisal Mahmood
  • for: proposing a visual-language foundation model for computational pathology that generalizes across diagnostic and other downstream tasks.
  • methods: task-agnostic pretraining on diverse sources of histopathology images and biomedical text, including over 1.17 million image-caption pairs.
  • results: evaluated on a suite of 13 diverse benchmarks, the CONCH model transfers to a wide range of downstream tasks, achieving state-of-the-art performance on histology image classification, segmentation, captioning, and text-to-image and image-to-text retrieval.
    Abstract The accelerated adoption of digital pathology and advances in deep learning have enabled the development of powerful models for various pathology tasks across a diverse array of diseases and patient cohorts. However, model training is often difficult due to label scarcity in the medical domain and the model's usage is limited by the specific task and disease for which it is trained. Additionally, most models in histopathology leverage only image data, a stark contrast to how humans teach each other and reason about histopathologic entities. We introduce CONtrastive learning from Captions for Histopathology (CONCH), a visual-language foundation model developed using diverse sources of histopathology images, biomedical text, and notably over 1.17 million image-caption pairs via task-agnostic pretraining. Evaluated on a suite of 13 diverse benchmarks, CONCH can be transferred to a wide range of downstream tasks involving either or both histopathology images and text, achieving state-of-the-art performance on histology image classification, segmentation, captioning, text-to-image and image-to-text retrieval. CONCH represents a substantial leap over concurrent visual-language pretrained systems for histopathology, with the potential to directly facilitate a wide array of machine learning-based workflows requiring minimal or no further supervised fine-tuning.
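Visual-language pretraining of this kind typically centers on a symmetric image-text contrastive objective. The sketch below shows that standard (CLIP-style) loss as a stand-in for the family of objectives such models build on; it is not CONCH's exact training loss:

```python
import torch
import torch.nn.functional as F

def image_text_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched image-caption pairs."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature                  # (B, B) similarities
    targets = torch.arange(len(logits), device=logits.device)   # matches on diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```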

GridMM: Grid Memory Map for Vision-and-Language Navigation

  • paper_url: http://arxiv.org/abs/2307.12907
  • repo_url: https://github.com/mrzihan/gridmm
  • paper_authors: Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Shuqiang Jiang
  • for: proposing a new method for vision-and-language navigation (VLN), in which an agent navigates to a remote location in a 3D environment following natural language instructions.
  • methods: a top-down egocentric and dynamically growing Grid Memory Map (GridMM) that structures the visited environment, in contrast to recurrent states, topological maps, or top-down semantic maps; an instruction relevance aggregation method captures fine-grained visual clues in each grid region.
  • results: extensive experiments on the REVERIE, R2R, and SOON datasets in discrete environments, and on the R2R-CE dataset in continuous environments, demonstrate the superiority of the proposed method.
    Abstract Vision-and-language navigation (VLN) enables the agent to navigate to a remote location following the natural language instruction in 3D environments. To represent the previously visited environment, most approaches for VLN implement memory using recurrent states, topological maps, or top-down semantic maps. In contrast to these approaches, we build the top-down egocentric and dynamically growing Grid Memory Map (i.e., GridMM) to structure the visited environment. From a global perspective, historical observations are projected into a unified grid map in a top-down view, which can better represent the spatial relations of the environment. From a local perspective, we further propose an instruction relevance aggregation method to capture fine-grained visual clues in each grid region. Extensive experiments are conducted on both the REVERIE, R2R, SOON datasets in the discrete environments, and the R2R-CE dataset in the continuous environments, showing the superiority of our proposed method.
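The key data structure is the top-down grid into which egocentric observations are scattered. Below is a minimal sketch of such a projection, with invented grid dimensions and simple mean-pooled cell features (GridMM instead aggregates features by instruction relevance):

```python
import numpy as np

def project_to_grid(points_xyz, feats, grid_size=14, cell_m=0.5):
    """Scatter per-point visual features into a top-down egocentric grid map.
    points_xyz: (N, 3) points in the agent's frame; feats: (N, F) features."""
    grid = np.zeros((grid_size, grid_size, feats.shape[1]))
    count = np.zeros((grid_size, grid_size, 1))
    half = grid_size // 2
    ix = np.clip((points_xyz[:, 0] / cell_m).astype(int) + half, 0, grid_size - 1)
    iz = np.clip((points_xyz[:, 2] / cell_m).astype(int) + half, 0, grid_size - 1)
    for i, (x, z) in enumerate(zip(ix, iz)):
        grid[x, z] += feats[i]
        count[x, z] += 1
    return grid / np.maximum(count, 1)  # mean-pooled feature per grid cell
```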

cs.CL - 2023-07-25

XDLM: Cross-lingual Diffusion Language Model for Machine Translation

  • paper_url: http://arxiv.org/abs/2307.13560
  • repo_url: None
  • paper_authors: Linyao Chen, Aosong Feng, Boming Yang, Zihui Li
  • for: exploring cross-lingual diffusion models to improve machine translation.
  • methods: a pretraining stage with a new training objective, TLDM, for mastering the mapping between different languages, followed by a fine-tuning stage that builds the translation system on top of the pretrained model.
  • results: XDLM outperforms both diffusion and Transformer baselines on several machine translation benchmarks.
    Abstract Recently, diffusion models have excelled in image generation tasks and have also been applied to natural language processing (NLP) for controllable text generation. However, the application of diffusion models in a cross-lingual setting remains largely unexplored. Additionally, while pretraining with diffusion models has been studied within a single language, the potential of cross-lingual pretraining remains understudied. To address these gaps, we propose XDLM, a novel cross-lingual diffusion model for machine translation, consisting of pretraining and fine-tuning stages. In the pretraining stage, we propose TLDM, a new training objective for mastering the mapping between different languages; in the fine-tuning stage, we build up the translation system based on the pretrained model. We evaluate the result on several machine translation benchmarks and outperform both diffusion and Transformer baselines.

Holistic Exploration on Universal Decompositional Semantic Parsing: Architecture, Data Augmentation, and LLM Paradigm

  • paper_url: http://arxiv.org/abs/2307.13424
  • repo_url: https://github.com/hexuandeng/hexp4uds
  • paper_authors: Hexuan Deng, Xin Zhang, Meishan Zhang, Xuebo Liu, Min Zhang
  • for: a holistic exploration of Universal Decompositional Semantic (UDS) parsing.
  • methods: proposes a cascade model that decomposes the complex parsing task into semantically appropriate subtasks, incorporates syntactic information, and further optimizes the architecture; different data augmentation strategies are also explored.
  • results: the approach outperforms prior models while significantly reducing inference time; experiments with ChatGPT show that it excels at attribute parsing but struggles with relation parsing, and that using ChatGPT for data augmentation yields suboptimal results.
    Abstract In this paper, we conduct a holistic exploration of the Universal Decompositional Semantic (UDS) Parsing. We first introduce a cascade model for UDS parsing that decomposes the complex parsing task into semantically appropriate subtasks. Our approach outperforms the prior models, while significantly reducing inference time. We also incorporate syntactic information and further optimized the architecture. Besides, different ways for data augmentation are explored, which further improve the UDS Parsing. Lastly, we conduct experiments to investigate the efficacy of ChatGPT in handling the UDS task, revealing that it excels in attribute parsing but struggles in relation parsing, and using ChatGPT for data augmentation yields suboptimal results. Our code is available at https://github.com/hexuandeng/HExp4UDS.

Towards Resolving Word Ambiguity with Word Embeddings

  • paper_url: http://arxiv.org/abs/2307.13417
  • repo_url: None
  • paper_authors: Matthias Thurnbauer, Johannes Reisinger, Christoph Goller, Andreas Fischer
  • for: This paper aims to address the problem of ambiguity in natural language processing, specifically in the context of word embeddings and information retrieval tasks.
  • methods: The authors propose using DBSCAN clustering to identify ambiguous words and evaluate their level of ambiguity in the latent space. They also propose an automatic parameter selection method for DBSCAN to ensure high-quality clusters.
  • results: The authors show that their approach can identify ambiguous words and evaluate their level of ambiguity, and that the resulting clusters are semantically coherent and correspond well to the perceived meanings of the words.
    Abstract Ambiguity is ubiquitous in natural language. Resolving ambiguous meanings is especially important in information retrieval tasks. While word embeddings carry semantic information, they fail to handle ambiguity well. Transformer models have been shown to handle word ambiguity for complex queries, but they cannot be used to identify ambiguous words, e.g. for a 1-word query. Furthermore, training these models is costly in terms of time, hardware resources, and training data, prohibiting their use in specialized environments with sensitive data. Word embeddings can be trained using moderate hardware resources. This paper shows that applying DBSCAN clustering to the latent space can identify ambiguous words and evaluate their level of ambiguity. An automatic DBSCAN parameter selection leads to high-quality clusters, which are semantically coherent and correspond well to the perceived meanings of a given word.
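A minimal sketch of the clustering step with scikit-learn, assuming one embedding vector per occurrence of the target word; the `eps` and `min_samples` values below are placeholders, whereas the paper selects the DBSCAN parameters automatically:

```python
from sklearn.cluster import DBSCAN

def ambiguity_by_clustering(context_vectors, eps=0.4, min_samples=5):
    """Cluster embeddings of one word's occurrences; the number of clusters
    found serves as a proxy for the word's level of ambiguity."""
    labels = DBSCAN(eps=eps, min_samples=min_samples,
                    metric="cosine").fit_predict(context_vectors)
    n_senses = len(set(labels) - {-1})   # -1 marks noise points
    return n_senses, labels
```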

Embedding Models for Supervised Automatic Extraction and Classification of Named Entities in Scientific Acknowledgements

  • paper_url: http://arxiv.org/abs/2307.13377
  • repo_url: https://github.com/kalawinka/season
  • paper_authors: Nina Smirnova, Philipp Mayr
  • for: evaluating the performance of different embedding models for the automatic extraction and classification of acknowledged entities from the acknowledgment texts of scientific papers.
  • methods: a named entity recognition (NER) task implemented with the Flair NLP framework, training three default Flair NER models on four differently-sized corpora with different versions of the framework.
  • results: the Flair Embeddings model trained on the medium corpus with the latest FLAIR version achieved the best accuracy of 0.79; growing the training corpus from very small to medium size massively increased the accuracy of all training algorithms, but further expansion brought no improvement. The model recognizes six entity types (funding agency, grant number, individuals, university, corporation, and miscellaneous), with individuals and grant numbers reaching F1-scores above 0.9.
    Abstract Acknowledgments in scientific papers may give an insight into aspects of the scientific community, such as reward systems, collaboration patterns, and hidden research trends. The aim of the paper is to evaluate the performance of different embedding models for the task of automatic extraction and classification of acknowledged entities from the acknowledgment text in scientific papers. We trained and implemented a named entity recognition (NER) task using the Flair NLP framework. The training was conducted using three default Flair NER models with four differently-sized corpora and different versions of the Flair NLP framework. The Flair Embeddings model trained on the medium corpus with the latest FLAIR version showed the best accuracy of 0.79. Expanding the size of a training corpus from very small to medium size massively increased the accuracy of all training algorithms, but further expansion of the training corpus did not bring further improvement. Moreover, the performance of the model slightly deteriorated. Our model is able to recognize six entity types: funding agency, grant number, individuals, university, corporation, and miscellaneous. The model works more precisely for some entity types than for others; thus, individuals and grant numbers showed a very good F1-Score over 0.9. Most of the previous works on acknowledgment analysis were limited by the manual evaluation of data and therefore by the amount of processed data. This model can be applied for the comprehensive analysis of acknowledgment texts and may potentially make a great contribution to the field of automated acknowledgment analysis.
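For orientation, a typical Flair NER training setup looks roughly like the sketch below. The corpus path, column format, embedding stack, and hyperparameters are placeholders rather than the paper's exact configuration, and details of the Flair API may differ across the framework versions the paper compares:

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import FlairEmbeddings, StackedEmbeddings, WordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Load a CoNLL-style corpus of acknowledgement sentences (paths are invented).
corpus = ColumnCorpus("data/acknowledgements", {0: "text", 1: "ner"})
tag_dictionary = corpus.make_label_dictionary(label_type="ner")

embeddings = StackedEmbeddings([
    WordEmbeddings("glove"),
    FlairEmbeddings("news-forward"),
    FlairEmbeddings("news-backward"),
])
tagger = SequenceTagger(hidden_size=256, embeddings=embeddings,
                        tag_dictionary=tag_dictionary, tag_type="ner")
ModelTrainer(tagger, corpus).train("models/ack-ner", max_epochs=20)
```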

Prot2Text: Multimodal Protein’s Function Generation with GNNs and Transformers

  • paper_url: http://arxiv.org/abs/2307.14367
  • repo_url: None
  • paper_authors: Hadi Abdine, Michail Chatzianastasis, Costas Bouyioukos, Michalis Vazirgiannis
  • for: proposing Prot2Text, a novel approach that predicts a protein's function in a free-text style, moving beyond conventional binary or categorical classification.
  • methods: combines Graph Neural Networks (GNNs) and Large Language Models (LLMs) in an encoder-decoder framework, integrating diverse data types including protein sequences, structures, and textual annotations.
  • results: empirical evaluation on a multimodal protein dataset extracted from SwissProt demonstrates that Prot2Text generates detailed and accurate free-text descriptions of protein function, highlighting the benefit of fusing GNNs and LLMs.
    Abstract The complex nature of big biological systems pushed some scientists to classify its understanding under the inconceivable missions. Different leveled challenges complicated this task, one of is the prediction of a protein's function. In recent years, significant progress has been made in this field through the development of various machine learning approaches. However, most existing methods formulate the task as a multi-classification problem, i.e assigning predefined labels to proteins. In this work, we propose a novel approach, \textbf{Prot2Text}, which predicts a protein function's in a free text style, moving beyond the conventional binary or categorical classifications. By combining Graph Neural Networks(GNNs) and Large Language Models(LLMs), in an encoder-decoder framework, our model effectively integrates diverse data types including proteins' sequences, structures, and textual annotations. This multimodal approach allows for a holistic representation of proteins' functions, enabling the generation of detailed and accurate descriptions. To evaluate our model, we extracted a multimodal protein dataset from SwissProt, and demonstrate empirically the effectiveness of Prot2Text. These results highlight the transformative impact of multimodal models, specifically the fusion of GNNs and LLMs, empowering researchers with powerful tools for more accurate prediction of proteins' functions. The code, the models and a demo will be publicly released.
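Schematically, the architecture is a graph encoder feeding an autoregressive text decoder through cross-attention. The sketch below wires up that shape with invented dimensions; the real model uses a proper message-passing GNN (here collapsed into a per-node MLP) and a pretrained LLM decoder:

```python
import torch
import torch.nn as nn

class GraphToTextSketch(nn.Module):
    """Minimal graph-encoder / text-decoder wiring in the spirit of Prot2Text;
    dimensions and the fusion scheme are assumptions, not the paper's."""
    def __init__(self, node_dim=64, hid=256, vocab=30522):
        super().__init__()
        self.gnn = nn.Sequential(nn.Linear(node_dim, hid), nn.ReLU(),
                                 nn.Linear(hid, hid))   # stand-in for message passing
        dec_layer = nn.TransformerDecoderLayer(d_model=hid, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.embed = nn.Embedding(vocab, hid)
        self.lm_head = nn.Linear(hid, vocab)

    def forward(self, node_feats, token_ids):
        memory = self.gnn(node_feats)        # (B, n_nodes, hid) protein-graph features
        tgt = self.embed(token_ids)          # (B, seq, hid) description tokens
        out = self.decoder(tgt, memory)      # cross-attend text onto graph nodes
        return self.lm_head(out)             # next-token logits
```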

Improving the Generalization Ability in Essay Coherence Evaluation through Monotonic Constraints

  • paper_url: http://arxiv.org/abs/2308.02506
  • repo_url: None
  • paper_authors: Chen Zheng, Huan Zhang, Yan Zhao, Yuxuan Lai
  • for: evaluating essay coherence, which rests on two primary factors: the appropriate use of discourse connectives and logical relationships between sentences, and the appropriateness of punctuation.
  • methods: a coherence scoring model consisting of a regression model and two feature extractors, a local coherence discriminative model and a punctuation correction model; gradient-boosting regression trees serve as the regression model, with monotonicity constraints imposed on the input features.
  • results: the proposed model generalizes better to unseen data and placed third in track 1 of the NLPCC 2023 shared task 7; the authors' solutions for the remaining tracks placed second in track 2 and first in both track 3 and track 4.
    Abstract Coherence is a crucial aspect of evaluating text readability and can be assessed through two primary factors when evaluating an essay in a scoring scenario. The first factor is logical coherence, characterized by the appropriate use of discourse connectives and the establishment of logical relationships between sentences. The second factor is the appropriateness of punctuation, as inappropriate punctuation can lead to confused sentence structure. To address these concerns, we propose a coherence scoring model consisting of a regression model with two feature extractors: a local coherence discriminative model and a punctuation correction model. We employ gradient-boosting regression trees as the regression model and impose monotonicity constraints on the input features. The results show that our proposed model better generalizes unseen data. The model achieved third place in track 1 of NLPCC 2023 shared task 7. Additionally, we briefly introduce our solution for the remaining tracks, which achieves second place for track 2 and first place for both track 3 and track 4.
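The monotonicity constraint is the interesting implementation detail: the coherence score should never decrease as a feature such as the local-coherence score increases. Below is a hedged sketch using XGBoost's `monotone_constraints` parameter; the two features and the synthetic data are invented, and the paper's exact gradient-boosting implementation may differ:

```python
import numpy as np
import xgboost as xgb

# Toy features: [local_coherence_score, punctuation_score] -> coherence score.
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = 0.6 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(0, 0.05, 200)

model = xgb.XGBRegressor(
    n_estimators=100,
    monotone_constraints="(1,1)",  # prediction may only rise with each feature
)
model.fit(X, y)
print(model.predict(X[:3]))
```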

QuIP: 2-Bit Quantization of Large Language Models With Guarantees

  • paper_url: http://arxiv.org/abs/2307.13304
  • repo_url: https://github.com/jerry-chee/quip
  • paper_authors: Jerry Chee, Yaohui Cai, Volodymyr Kuleshov, Christopher De Sa
  • for: studying post-training parameter quantization in large language models (LLMs).
  • methods: quantization with incoherence processing (QuIP), built on the insight that quantization benefits from incoherent weight and Hessian matrices; it consists of (1) an adaptive rounding procedure that minimizes a quadratic proxy objective and (2) efficient pre- and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices.
  • results: the incoherence preprocessing improves several existing quantization algorithms, and the accompanying theoretical analysis, the first for an LLM-scale quantization algorithm, also applies to OPTQ; the method yields the first LLM quantization results that remain viable with only two bits per weight. Code: https://github.com/jerry-chee/QuIP
    Abstract This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre- and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices. We complement QuIP with the first theoretical analysis for an LLM-scale quantization algorithm, and show that our theory also applies to an existing method, OPTQ. Empirically, we find that our incoherence preprocessing improves several existing quantization algorithms and yields the first LLM quantization methods that produce viable results using only two bits per weight. Our code can be found at https://github.com/jerry-chee/QuIP .
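A toy numerical sketch of the incoherence idea: rotate the weights with random orthogonal matrices so that no coordinate direction carries outliers, quantize in the rotated basis, and rotate back. The round-to-nearest quantizer below is a deliberate simplification of QuIP's adaptive rounding procedure:

```python
import numpy as np

def random_orthogonal(n, rng):
    """Random orthogonal matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

def quantize_2bit(x):
    """Crude uniform 4-level (2-bit) quantizer; QuIP instead uses an adaptive
    rounding procedure that minimizes a quadratic proxy objective."""
    scale = np.abs(x).max()
    q = np.clip(np.round(x / scale * 1.5), -2, 1)   # integer levels {-2,-1,0,1}
    return q / 1.5 * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
U, V = random_orthogonal(8, rng), random_orthogonal(8, rng)
W_hat = U.T @ quantize_2bit(U @ W @ V.T) @ V   # quantize in an incoherent basis
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```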

An Intent Taxonomy of Legal Case Retrieval

  • paper_url: http://arxiv.org/abs/2307.13298
  • repo_url: None
  • paper_authors: Yunqiu Shao, Haitao Li, Yueyue Wu, Yiqun Liu, Qingyao Ai, Jiaxin Mao, Yixiao Ma, Shaoping Ma
  • for: understanding the search intents of legal case retrieval users and developing a novel hierarchical intent taxonomy for legal case retrieval.
  • methods: the taxonomy was constructed transparently and evaluated extensively through interviews, editorial user studies, and query log analysis.
  • results: a laboratory user study reveals significant differences in user behavior and satisfaction under different search intents; applying the taxonomy to downstream legal retrieval tasks such as result ranking and satisfaction prediction demonstrates its effectiveness.
    Abstract Legal case retrieval is a special Information Retrieval~(IR) task focusing on legal case documents. Depending on the downstream tasks of the retrieved case documents, users' information needs in legal case retrieval could be significantly different from those in Web search and traditional ad-hoc retrieval tasks. While there are several studies that retrieve legal cases based on text similarity, the underlying search intents of legal retrieval users, as shown in this paper, are more complicated than that yet mostly unexplored. To this end, we present a novel hierarchical intent taxonomy of legal case retrieval. It consists of five intent types categorized by three criteria, i.e., search for Particular Case(s), Characterization, Penalty, Procedure, and Interest. The taxonomy was constructed transparently and evaluated extensively through interviews, editorial user studies, and query log analysis. Through a laboratory user study, we reveal significant differences in user behavior and satisfaction under different search intents in legal case retrieval. Furthermore, we apply the proposed taxonomy to various downstream legal retrieval tasks, e.g., result ranking and satisfaction prediction, and demonstrate its effectiveness. Our work provides important insights into the understanding of user intents in legal case retrieval and potentially leads to better retrieval techniques in the legal domain, such as intent-aware ranking strategies and evaluation methodologies.

Schema-Driven Actionable Insight Generation and Smart Recommendation

  • paper_url: http://arxiv.org/abs/2307.13176
  • repo_url: None
  • paper_authors: Allmin Susaiyah, Aki Härmä, Milan Petković
  • for: generating actionable insights from data to drive growth and change.
  • methods: a schema-driven approach that mines data for interesting patterns and verbalizes them as human-readable insight statements.
  • results: a ranking technique aligns the generated insights with user interests based on feedback; preliminary qualitative results demonstrate the approach and its ability to adapt to feedback.
    Abstract In natural language generation (NLG), insight mining is seen as a data-to-text task, where data is mined for interesting patterns and verbalised into 'insight' statements. An 'over-generate and rank' paradigm is intuitively used to generate such insights. The multidimensionality and subjectivity of this process make it challenging. This paper introduces a schema-driven method to generate actionable insights from data to drive growth and change. It also introduces a technique to rank the insights to align with user interests based on their feedback. We show preliminary qualitative results of the insights generated using our technique and demonstrate its ability to adapt to feedback.

Explaining Math Word Problem Solvers

  • paper_url: http://arxiv.org/abs/2307.13128
  • repo_url: None
  • paper_authors: Abby Newcomb, Jugal Kalita
  • for: investigating whether automated math word problem solvers follow the semantic logic of the problem text.
  • methods: removes parts of the input and measures model performance on the perturbed dataset, testing whether solvers merely match superficial patterns.
  • results: the model is insensitive to the removal of many input words and can still find a correct answer when given a nonsense question, indicating that automated solvers do not follow the semantic logic of math word problems and may be overfitting to the presence of specific words.
    Abstract Automated math word problem solvers based on neural networks have successfully managed to obtain 70-80\% accuracy in solving arithmetic word problems. However, it has been shown that these solvers may rely on superficial patterns to obtain their equations. In order to determine what information math word problem solvers use to generate solutions, we remove parts of the input and measure the model's performance on the perturbed dataset. Our results show that the model is not sensitive to the removal of many words from the input and can still manage to find a correct answer when given a nonsense question. This indicates that automatic solvers do not follow the semantic logic of math word problems, and may be overfitting to the presence of specific words.
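The probing methodology amounts to simple input ablation. A minimal sketch, with an invented drop rate and example question:

```python
import random

def perturb_question(question, drop_rate=0.3, seed=0):
    """Randomly delete words from a math word problem to test whether a solver
    still answers correctly, i.e. whether it relies on shallow cues."""
    rng = random.Random(seed)
    words = question.split()
    kept = [w for w in words if rng.random() > drop_rate]
    return " ".join(kept) if kept else words[0]

q = "Tom has 3 apples and buys 5 more. How many apples does he have?"
print(perturb_question(q))  # a degraded question fed back to the solver
```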

Evaluating the Ripple Effects of Knowledge Editing in Language Models

  • paper_url: http://arxiv.org/abs/2307.12976
  • repo_url: https://github.com/edenbiran/rippleedits
  • paper_authors: Roi Cohen, Eden Biran, Ori Yoran, Amir Globerson, Mor Geva
  • for: addressing the evaluation of knowledge-editing methods for modern language models, whose stored facts can be incorrectly induced or become obsolete over time.
  • methods: a novel set of evaluation criteria that consider the implications of an edit on related facts, instantiated as RippleEdits, a diagnostic benchmark of 5K factual edits capturing a variety of ripple effects.
  • results: prominent editing methods fail to introduce consistent changes in the model's knowledge, while a simple in-context editing baseline obtains the best scores on the benchmark, suggesting a promising research direction for model editing.
    Abstract Modern language models capture a large body of factual knowledge. However, some facts can be incorrectly induced or become obsolete over time, resulting in factually incorrect generations. This has led to the development of various editing methods that allow updating facts encoded by the model. Evaluation of these methods has primarily focused on testing whether an individual fact has been successfully injected, and if similar predictions for other subjects have not changed. Here we argue that such evaluation is limited, since injecting one fact (e.g. "Jack Depp is the son of Johnny Depp") introduces a "ripple effect" in the form of additional facts that the model needs to update (e.g. "Jack Depp is the sibling of Lily-Rose Depp"). To address this issue, we propose a novel set of evaluation criteria that consider the implications of an edit on related facts. Using these criteria, we then construct RippleEdits, a diagnostic benchmark of 5K factual edits, capturing a variety of types of ripple effects. We evaluate prominent editing methods on RippleEdits, showing that current methods fail to introduce consistent changes in the model's knowledge. In addition, we find that a simple in-context editing baseline obtains the best scores on our benchmark, suggesting a promising research direction for model editing.
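The evaluation logic can be summarized as: after injecting an edit, probe not only the edited fact but the facts it implies. A hedged sketch, where `model_answers` is a hypothetical query function rather than the benchmark's actual API:

```python
def ripple_check(model_answers, edit, implied_facts):
    """Return whether the edit itself took hold, plus the fraction of
    logically implied facts the edited model also gets right."""
    ok_edit = model_answers(edit["query"]) == edit["target"]
    ok_ripple = [model_answers(f["query"]) == f["target"] for f in implied_facts]
    return ok_edit, sum(ok_ripple) / len(ok_ripple)

edit = {"query": "Who is Jack Depp's father?", "target": "Johnny Depp"}
implied = [{"query": "Who is Lily-Rose Depp's sibling?", "target": "Jack Depp"}]
```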

Leveraging Label Variation in Large Language Models for Zero-Shot Text Classification

  • paper_url: http://arxiv.org/abs/2307.12973
  • repo_url: https://github.com/zeniSoida/pl1
  • paper_authors: Flor Miriam Plaza-del-Arco, Debora Nozza, Dirk Hovy
  • for: examining the zero-shot text classification ability of large language models (LLMs) across tasks, data, and languages.
  • methods: 5 state-of-the-art LLMs are used as "annotators" on 5 tasks (age, gender, topic, and sentiment prediction, plus hate speech detection) across 4 languages: English, French, German, and Spanish.
  • results: no single model excels at all tasks, languages, or labels, but aggregation techniques designed for human annotators perform substantially better than any individual model; LLMs still do not rival even simple supervised models, so they do not (yet) replace the need for human annotation.
    Abstract The zero-shot learning capabilities of large language models (LLMs) make them ideal for text classification without annotation or supervised training. Many studies have shown impressive results across multiple tasks. While tasks, data, and results differ widely, their similarities to human annotation can aid us in tackling new tasks with minimal expenses. We evaluate using 5 state-of-the-art LLMs as "annotators" on 5 different tasks (age, gender, topic, sentiment prediction, and hate speech detection), across 4 languages: English, French, German, and Spanish. No single model excels at all tasks, across languages, or across all labels within a task. However, aggregation techniques designed for human annotators perform substantially better than any one individual model. Overall, though, LLMs do not rival even simple supervised models, so they do not (yet) replace the need for human annotation. We also discuss the tradeoffs between speed, accuracy, cost, and bias when it comes to aggregated model labeling versus human annotation.
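The simplest of the aggregation techniques borrowed from human annotation is a per-item majority vote across models, sketched below (tie-breaking and the label set are illustrative):

```python
from collections import Counter

def aggregate_labels(annotations):
    """Majority vote over per-model label sequences; ties broken arbitrarily."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*annotations)]

model_a = ["pos", "neg", "pos"]
model_b = ["pos", "pos", "neg"]
model_c = ["neg", "pos", "pos"]
print(aggregate_labels([model_a, model_b, model_c]))  # ['pos', 'pos', 'pos']
```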

Aligning Large Language Models with Human: A Survey

  • paper_url: http://arxiv.org/abs/2307.12966
  • repo_url: https://github.com/garyyufei/alignllmhumansurvey
  • paper_authors: Yufei Wang, Wanjun Zhong, Liangyou Li, Fei Mi, Xingshan Zeng, Wenyong Huang, Lifeng Shang, Xin Jiang, Qun Liu
  • for: providing a comprehensive overview of alignment technologies for large language models (LLMs) to better suit human-oriented tasks and expectations.
  • methods: reviews the training methodologies used for LLM alignment, including supervised fine-tuning, online and offline human preference training, and parameter-efficient training mechanisms.
  • results: evaluates the effectiveness of human-aligned LLMs using a multifaceted approach and highlights several promising future research avenues in the field.
    Abstract Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite their notable performance, these models are prone to certain limitations such as misunderstanding human instructions, generating potentially biased content, or factually incorrect (hallucinated) information. Hence, aligning LLMs with human expectations has become an active area of interest within the research community. This survey presents a comprehensive overview of these alignment technologies, including the following aspects. (1) Data collection: the methods for effectively collecting high-quality instructions for LLM alignment, including the use of NLP benchmarks, human annotations, and leveraging strong LLMs. (2) Training methodologies: a detailed review of the prevailing training methods employed for LLM alignment. Our exploration encompasses Supervised Fine-tuning, both Online and Offline human preference training, along with parameter-efficient training mechanisms. (3) Model Evaluation: the methods for evaluating the effectiveness of these human-aligned LLMs, presenting a multifaceted approach towards their assessment. In conclusion, we collate and distill our findings, shedding light on several promising future research avenues in the field. This survey, therefore, serves as a valuable resource for anyone invested in understanding and advancing the alignment of LLMs to better suit human-oriented tasks and expectations. An associated GitHub link collecting the latest papers is available at https://github.com/GaryYufei/AlignLLMHumanSurvey.

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.12949
  • repo_url: None
  • paper_authors: Viet Dac Lai, Abel Salinas, Hao Tan, Trung Bui, Quan Tran, Seunghyun Yoon, Hanieh Deilamsalehy, Franck Dernoncourt, Thien Huu Nguyen
  • for: improving the readability of automatic speech recognition (ASR) output by restoring the syntactic structure of generated ASR texts.
  • methods: a reinforcement learning method that exploits in-topic written texts and recent advances in large pre-trained generative language models to bridge the gap between written punctuated texts and ASR texts.
  • results: experiments show that the method achieves state-of-the-art performance on the ASR test sets of two benchmark datasets for punctuation restoration.
    Abstract Punctuation restoration is an important task in automatic speech recognition (ASR) which aim to restore the syntactic structure of generated ASR texts to improve readability. While punctuated texts are abundant from written documents, the discrepancy between written punctuated texts and ASR texts limits the usability of written texts in training punctuation restoration systems for ASR texts. This paper proposes a reinforcement learning method to exploit in-topic written texts and recent advances in large pre-trained generative language models to bridge this gap. The experiments show that our method achieves state-of-the-art performance on the ASR test set on two benchmark datasets for punctuation restoration.

The potential of LLMs for coding with low-resource and domain-specific programming languages

  • paper_url: http://arxiv.org/abs/2307.13018
  • repo_url: None
  • paper_authors: Artur Tarassow
  • for: This study explores using large language models (LLMs) for coding with low-resource and domain-specific programming languages, to extend the reach of LLM-based tooling.
  • methods: The study uses a proprietary GPT-3.5-based LLM applied to hansl, the econometric scripting language of the open-source software gretl.
  • results: The LLM is useful for writing, understanding, improving, and documenting gretl code, including generating descriptive docstrings for functions and giving precise explanations of abstract and poorly documented econometric code; limitations remain, such as its inability to improve certain sections of code and to write accurate unit tests.
    Abstract This paper presents a study on the feasibility of using large language models (LLM) for coding with low-resource and domain-specific programming languages that typically lack the amount of data required for effective LLM processing techniques. This study focuses on the econometric scripting language named hansl of the open-source software gretl and employs a proprietary LLM based on GPT-3.5. Our findings suggest that LLMs can be a useful tool for writing, understanding, improving, and documenting gretl code, which includes generating descriptive docstrings for functions and providing precise explanations for abstract and poorly documented econometric code. While the LLM showcased promoting docstring-to-code translation capability, we also identify some limitations, such as its inability to improve certain sections of code and to write accurate unit tests. This study is a step towards leveraging the power of LLMs to facilitate software development in low-resource programming languages and ultimately to lower barriers to entry for their adoption.

cs.LG - 2023-07-25

Multi-GPU Approach for Training of Graph ML Models on large CFD Meshes

  • paper_url: http://arxiv.org/abs/2307.13592
  • repo_url: None
  • paper_authors: Sebastian Strönisch, Maximilian Sander, Andreas Knüpfer, Marcus Meyer
  • for: This paper develops a machine-learning-based surrogate model to speed up computational fluid dynamics simulations.
  • methods: A graph neural network (GNN) surrogate operates directly on the numerical mesh; the flow domain is partitioned and distributed across multiple GPUs, with halo exchange between the partitions during training.
  • results: Compared against a traditionally trained distributed model on a three-dimensional turbomachinery setup, the traditional approach produces superior predictions and outperforms the proposed surrogate model.
    Abstract Mesh-based numerical solvers are an important part in many design tool chains. However, accurate simulations like computational fluid dynamics are time and resource consuming which is why surrogate models are employed to speed-up the solution process. Machine Learning based surrogate models on the other hand are fast in predicting approximate solutions but often lack accuracy. Thus, the development of the predictor in a predictor-corrector approach is the focus here, where the surrogate model predicts a flow field and the numerical solver corrects it. This paper scales a state-of-the-art surrogate model from the domain of graph-based machine learning to industry-relevant mesh sizes of a numerical flow simulation. The approach partitions and distributes the flow domain to multiple GPUs and provides halo exchange between these partitions during training. The utilized graph neural network operates directly on the numerical mesh and is able to preserve complex geometries as well as all other properties of the mesh. The proposed surrogate model is evaluated with an application on a three dimensional turbomachinery setup and compared to a traditionally trained distributed model. The results show that the traditional approach produces superior predictions and outperforms the proposed surrogate model. Possible explanations, improvements and future directions are outlined.

Settling the Sample Complexity of Online Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.13586
  • repo_url: None
  • paper_authors: Zihan Zhang, Yuxin Chen, Jason D. Lee, Simon S. Du
  • for: The paper is written to address the issue of data efficiency in online reinforcement learning, specifically the problem of achieving minimax-optimal regret without incurring any burn-in cost.
  • methods: The paper proposes a modified version of Monotonic Value Propagation (MVP), a model-based algorithm, and develops a new regret decomposition strategy and analysis paradigm to decouple complicated statistical dependency.
  • results: The paper achieves a regret on the order of $\min\{\sqrt{SAH^3K},\,HK\}$ (modulo log factors), which matches the minimax lower bound for the entire range of sample size $K\geq 1$, and translates to a PAC sample complexity of $\frac{SAH^3}{\varepsilon^2}$ up to log factors, which is minimax-optimal for the full $\varepsilon$-range.
    Abstract A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a ``large-sample'' regime, imposing enormous burn-in cost in order for their algorithms to operate optimally. How to achieve minimax-optimal regret without incurring any burn-in cost has been an open problem in RL theory. We settle this problem for the context of finite-horizon inhomogeneous Markov decision processes. Specifically, we prove that a modified version of Monotonic Value Propagation (MVP), a model-based algorithm proposed by \cite{zhang2020reinforcement}, achieves a regret on the order of (modulo log factors) \begin{equation*} \min\big\{ \sqrt{SAH^3K}, \,HK \big\}, \end{equation*} where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, and $K$ is the total number of episodes. This regret matches the minimax lower bound for the entire range of sample size $K\geq 1$, essentially eliminating any burn-in requirement. It also translates to a PAC sample complexity (i.e., the number of episodes needed to yield $\varepsilon$-accuracy) of $\frac{SAH^3}{\varepsilon^2}$ up to log factor, which is minimax-optimal for the full $\varepsilon$-range. Further, we extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances. The key technical innovation lies in the development of a new regret decomposition strategy and a novel analysis paradigm to decouple complicated statistical dependency -- a long-standing challenge facing the analysis of online RL in the sample-hungry regime.
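
As a quick sanity check on how the regret bound above yields the quoted PAC sample complexity, divide the cumulative regret by $K$ and require the average per-episode regret to fall below $\varepsilon$:

\begin{equation*}
\frac{1}{K}\sqrt{SAH^3K} \le \varepsilon \quad\Longleftrightarrow\quad K \ge \frac{SAH^3}{\varepsilon^2},
\end{equation*}

which recovers the stated minimax-optimal PAC rate up to log factors.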

Piecewise Linear Functions Representable with Infinite Width Shallow ReLU Neural Networks

  • paper_url: http://arxiv.org/abs/2307.14373
  • repo_url: None
  • paper_authors: Sarah McCarty
  • for: This paper studies continuous piecewise linear functions representable by infinite-width, finite-cost shallow neural networks with the rectified linear unit (ReLU) activation.
  • methods: Through its integral representation, such a network is identified with a signed, finite measure on an appropriate parameter space; these measures are mapped to the projective $n$-sphere cross $\mathbb{R}$, so points in parameter space correspond bijectively to hyperplanes in the function's domain.
  • results: The paper proves a conjecture of Ongie et al. that every continuous piecewise linear function expressible with this kind of infinite-width network is expressible as a finite-width shallow ReLU neural network.
    Abstract This paper analyzes representations of continuous piecewise linear functions with infinite width, finite cost shallow neural networks using the rectified linear unit (ReLU) as an activation function. Through its integral representation, a shallow neural network can be identified by the corresponding signed, finite measure on an appropriate parameter space. We map these measures on the parameter space to measures on the projective $n$-sphere cross $\mathbb{R}$, allowing points in the parameter space to be bijectively mapped to hyperplanes in the domain of the function. We prove a conjecture of Ongie et al. that every continuous piecewise linear function expressible with this kind of infinite width neural network is expressible as a finite width shallow ReLU neural network.
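
Schematically, the integral representation referred to above writes the network as an integral of ReLU ridge functions against a signed, finite measure $\mu$ on the parameter space (the exact normalization, and any additional affine term, follow the paper's conventions):

\begin{equation*}
f(x) \;=\; \int \operatorname{ReLU}\!\big(\langle w, x\rangle - b\big)\, d\mu(w,b),
\end{equation*}

so each parameter point $(w,b)$ corresponds to the hyperplane $\{x : \langle w, x\rangle = b\}$ where the integrand kinks; the theorem says a continuous piecewise linear $f$ never needs more than finitely many such atoms.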

Comparing Forward and Inverse Design Paradigms: A Case Study on Refractory High-Entropy Alloys

  • paper_url: http://arxiv.org/abs/2307.13581
  • repo_url: None
  • paper_authors: Arindam Debnath, Lavanya Raman, Wenjie Li, Adam M. Krajewski, Marcia Ahn, Shuang Lin, Shunli Shang, Allison M. Beese, Zi-Kui Liu, Wesley F. Reinhart
  • for: The objective of this study is to directly and quantitatively compare the forward and inverse design modeling paradigms in practical applications.
  • methods: The inverse design method is compared against forward schemes such as localized forward search, high-throughput screening, and multi-objective optimization, in two case studies with different objectives and constraints.
  • results: The study finds that the inverse design approach performs well for refractory high-entropy alloy design, better satisfying the differing objectives and constraints.
    Abstract The rapid design of advanced materials is a topic of great scientific interest. The conventional, ``forward'' paradigm of materials design involves evaluating multiple candidates to determine the best candidate that matches the target properties. However, recent advances in the field of deep learning have given rise to the possibility of an ``inverse'' design paradigm for advanced materials, wherein a model provided with the target properties is able to find the best candidate. Being a relatively new concept, there remains a need to systematically evaluate how these two paradigms perform in practical applications. Therefore, the objective of this study is to directly, quantitatively compare the forward and inverse design modeling paradigms. We do so by considering two case studies of refractory high-entropy alloy design with different objectives and constraints and comparing the inverse design method to other forward schemes like localized forward search, high throughput screening, and multi objective optimization.

Reinterpreting survival analysis in the universal approximator age

  • paper_url: http://arxiv.org/abs/2307.13579
  • repo_url: https://github.com/sdittmer/survival_analysis_sumo_plus_plus
  • paper_authors: Sören Dittmer, Michael Roberts, Jacobus Preller, AIX COVNET, James H. F. Rudd, John A. D. Aston, Carola-Bibiane Schönlieb
  • for: This paper provides the tools needed to fully harness the potential of survival analysis in deep learning.
  • methods: The paper proposes a new loss function, evaluation metrics, and the first universal-approximating network that provably produces survival curves without numeric integration.
  • results: The loss function and model outperform other approaches in a large numerical study.
    Abstract Survival analysis is an integral part of the statistical toolbox. However, while most domains of classical statistics have embraced deep learning, survival analysis only recently gained some minor attention from the deep learning community. This recent development is likely in part motivated by the COVID-19 pandemic. We aim to provide the tools needed to fully harness the potential of survival analysis in deep learning. On the one hand, we discuss how survival analysis connects to classification and regression. On the other hand, we provide technical tools. We provide a new loss function, evaluation metrics, and the first universal approximating network that provably produces survival curves without numeric integration. We show that the loss function and model outperform other approaches using a large numerical study.

PT$\mathrm{L}^{p}$: Partial Transport $\mathrm{L}^{p}$ Distances

  • paper_url: http://arxiv.org/abs/2307.13571
  • repo_url: None
  • paper_authors: Xinran Liu, Yikun Bai, Huy Tran, Zhanqi Zhu, Matthew Thorpe, Soheil Kolouri
  • for: This paper proposes a new strategy for comparing generic signals: partial transport $\mathrm{L}^{p}$ distances based on optimal transport.
  • methods: Building on the optimal transport framework, the paper introduces partial transport $\mathrm{L}^{p}$ distances as a new family of metrics that inherit the robustness of partial transport distances.
  • results: The paper provides theoretical background, including the existence of optimal plans and the behavior of the distance in various limits, introduces sliced variants of these distances for rapid comparison of generic signals, and demonstrates applications to signal class separability and nearest neighbor classification.
    Abstract Optimal transport and its related problems, including optimal partial transport, have proven to be valuable tools in machine learning for computing meaningful distances between probability or positive measures. This success has led to a growing interest in defining transport-based distances that allow for comparing signed measures and, more generally, multi-channeled signals. Transport $\mathrm{L}^{p}$ distances are notable extensions of the optimal transport framework to signed and possibly multi-channeled signals. In this paper, we introduce partial transport $\mathrm{L}^{p}$ distances as a new family of metrics for comparing generic signals, benefiting from the robustness of partial transport distances. We provide theoretical background such as the existence of optimal plans and the behavior of the distance in various limits. Furthermore, we introduce the sliced variation of these distances, which allows for rapid comparison of generic signals. Finally, we demonstrate the application of the proposed distances in signal class separability and nearest neighbor classification.
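
For orientation, the balanced transport $\mathrm{L}^p$ distance from the earlier literature that this work extends compares signals $f$ and $g$ carried by measures $\mu$ and $\nu$ roughly as follows (a sketch of the standard definition; the partial variant additionally relaxes the marginal constraints so that mass may be created or destroyed at a cost):

\begin{equation*}
\mathrm{TL}^p\big((f,\mu),(g,\nu)\big) \;=\; \inf_{\pi \in \Pi(\mu,\nu)} \Big( \int |x-y|^p + |f(x)-g(y)|^p \, d\pi(x,y) \Big)^{1/p}.
\end{equation*}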

Introducing Hybrid Modeling with Time-series-Transformers: A Comparative Study of Series and Parallel Approach in Batch Crystallization

  • paper_url: http://arxiv.org/abs/2308.05749
  • repo_url: None
  • paper_authors: Niranjan Sitapure, Joseph S Kwon
  • for: The paper develops a first-of-its-kind, attention-based time-series transformer (TST) hybrid framework for batch crystallization, aiming to improve the accuracy and interpretability of digital twins in chemical manufacturing.
  • methods: The hybrid approach combines first-principles physics-based dynamics with machine learning (ML) models, specifically attention-based TSTs, to capture long-term and short-term changes in process states; two configurations (series and parallel) of TST-based hybrid models are constructed and compared using normalized-mean-square-error (NMSE) and $R^2$ values.
  • results: The TST-based hybrid models improve accuracy and interpretability over traditional black-box models, with NMSE values in the range $[10, 50]\times10^{-4}$ and $R^2$ values over 0.99, and effectively predict batch crystallization processes.
    Abstract Most existing digital twins rely on data-driven black-box models, predominantly using deep neural recurrent, and convolutional neural networks (DNNs, RNNs, and CNNs) to capture the dynamics of chemical systems. However, these models have not seen the light of day, given the hesitance of directly deploying a black-box tool in practice due to safety and operational issues. To tackle this conundrum, hybrid models combining first-principles physics-based dynamics with machine learning (ML) models have increased in popularity as they are considered a 'best of both worlds' approach. That said, existing simple DNN models are not adept at long-term time-series predictions and utilizing contextual information on the trajectory of the process dynamics. Recently, attention-based time-series transformers (TSTs) that leverage multi-headed attention mechanism and positional encoding to capture long-term and short-term changes in process states have shown high predictive performance. Thus, a first-of-a-kind, TST-based hybrid framework has been developed for batch crystallization, demonstrating improved accuracy and interpretability compared to traditional black-box models. Specifically, two different configurations (i.e., series and parallel) of TST-based hybrid models are constructed and compared, which show a normalized-mean-square-error (NMSE) in the range of $[10, 50]\times10^{-4}$ and an $R^2$ value over 0.99. Given the growing adoption of digital twins, next-generation attention-based hybrid models are expected to play a crucial role in shaping the future of chemical manufacturing.

Decision-Focused Learning: Foundations, State of the Art, Benchmark and Future Opportunities

  • paper_url: http://arxiv.org/abs/2307.13565
  • repo_url: https://github.com/predopt/predopt-benchmarks
  • paper_authors: Jayanta Mandi, James Kotary, Senne Berden, Maxime Mulamba, Victor Bucarey, Tias Guns, Ferdinando Fioretto
  • for: This paper provides a comprehensive review of decision-focused learning (DFL) in machine learning.
  • methods: The paper analyzes the various techniques devised to integrate machine learning and optimization models into an end-to-end predict-and-optimize system.
  • results: The paper introduces a taxonomy of DFL methods distinguished by their unique characteristics and conducts an extensive empirical evaluation, proposing suitable benchmark datasets and tasks; the results show how DFL methods can improve performance on decision-making tasks under uncertainty.
    Abstract Decision-focused learning (DFL) is an emerging paradigm in machine learning which trains a model to optimize decisions, integrating prediction and optimization in an end-to-end system. This paradigm holds the promise to revolutionize decision-making in many real-world applications which operate under uncertainty, where the estimation of unknown parameters within these decision models often becomes a substantial roadblock. This paper presents a comprehensive review of DFL. It provides an in-depth analysis of the various techniques devised to integrate machine learning and optimization models, introduces a taxonomy of DFL methods distinguished by their unique characteristics, and conducts an extensive empirical evaluation of these methods proposing suitable benchmark dataset and tasks for DFL. Finally, the study provides valuable insights into current and potential future avenues in DFL research.

  • paper_url: http://arxiv.org/abs/2307.13548
  • repo_url: None
  • paper_authors: Oualid Zari, Javier Parra-Arnau, Ayşe Ünsal, Melek Önen
  • for: This paper presents a stealthy and effective attack that exposes privacy vulnerabilities in Graph Neural Networks (GNNs) by inferring private links within graph-structured data.
  • methods: The paper studies the potential leakage of private edge information in the inductive setting, where new nodes join the graph and an API is used to query predictions, and proposes methods to preserve privacy while maintaining model utility.
  • results: The attack outperforms the state of the art at inferring links; the paper also analyzes differential privacy (DP) mechanisms for mitigating the attack and the resulting trade-off between privacy preservation and model utility.
    Abstract In this paper, we present a stealthy and effective attack that exposes privacy vulnerabilities in Graph Neural Networks (GNNs) by inferring private links within graph-structured data. Focusing on the inductive setting where new nodes join the graph and an API is used to query predictions, we investigate the potential leakage of private edge information. We also propose methods to preserve privacy while maintaining model utility. Our attack demonstrates superior performance in inferring the links compared to the state of the art. Furthermore, we examine the application of differential privacy (DP) mechanisms to mitigate the impact of our proposed attack, we analyze the trade-off between privacy preservation and model utility. Our work highlights the privacy vulnerabilities inherent in GNNs, underscoring the importance of developing robust privacy-preserving mechanisms for their application.

Transfer Learning for Portfolio Optimization

  • paper_url: http://arxiv.org/abs/2307.13546
  • repo_url: None
  • paper_authors: Haoyang Cao, Haotian Gu, Xin Guo, Mathieu Rosenbaum
  • for: This work explores the possibility of using transfer learning techniques to address the financial portfolio optimization problem.
  • methods: The paper introduces a novel concept called "transfer risk" within the optimization framework of transfer learning, evaluated through cross-continent, cross-sector, and cross-frequency experiments.
  • results: Numerical experiments establish a strong correlation between transfer risk and the overall performance of transfer learning methods, indicating that transfer risk is a viable indicator of "transferability".
    Abstract In this work, we explore the possibility of utilizing transfer learning techniques to address the financial portfolio optimization problem. We introduce a novel concept called "transfer risk", within the optimization framework of transfer learning. A series of numerical experiments are conducted from three categories: cross-continent transfer, cross-sector transfer, and cross-frequency transfer. In particular, 1. a strong correlation between the transfer risk and the overall performance of transfer learning methods is established, underscoring the significance of transfer risk as a viable indicator of "transferability"; 2. transfer risk is shown to provide a computationally efficient way to identify appropriate source tasks in transfer learning, enhancing the efficiency and effectiveness of the transfer learning approach; 3. additionally, the numerical experiments offer valuable new insights for portfolio management across these different settings.

A model for efficient dynamical ranking in networks

  • paper_url: http://arxiv.org/abs/2307.13544
  • repo_url: None
  • paper_authors: Andrea Della Vecchia, Kibidi Neocosmos, Daniel B. Larremore, Cristopher Moore, Caterina De Bacco
  • for: This paper proposes a physics-inspired method for inferring dynamic rankings in directed temporal networks.
  • methods: The method works by solving a linear system of equations and requires only one parameter to be tuned.
  • results: In tests, the method predicts interactions (edges' existence) and their outcomes (edges' directions) better than existing methods in many cases.
    Abstract We present a physics-inspired method for inferring dynamic rankings in directed temporal networks - networks in which each directed and timestamped edge reflects the outcome and timing of a pairwise interaction. The inferred ranking of each node is real-valued and varies in time as each new edge, encoding an outcome like a win or loss, raises or lowers the node's estimated strength or prestige, as is often observed in real scenarios including sequences of games, tournaments, or interactions in animal hierarchies. Our method works by solving a linear system of equations and requires only one parameter to be tuned. As a result, the corresponding algorithm is scalable and efficient. We test our method by evaluating its ability to predict interactions (edges' existence) and their outcomes (edges' directions) in a variety of applications, including both synthetic and real data. Our analysis shows that in many cases our method's performance is better than existing methods for predicting dynamic rankings and interaction outcomes.
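
The abstract only states that the ranking comes from one linear system with a single tunable parameter. As an illustration of what such a system can look like, here is the static SpringRank system (earlier work by several of the same authors), which fits that description; the dynamical model in the paper presumably updates scores of this kind as new timestamped edges arrive. The function name and regularization parameter `alpha` are illustrative.

```python
import numpy as np

def springrank(A, alpha=0.01):
    # A[i, j] = (weighted) count of directed i -> j interactions, e.g. wins.
    # alpha > 0 pins scores softly to zero so the system is non-singular.
    d_out = A.sum(axis=1)
    d_in = A.sum(axis=0)
    M = np.diag(d_out + d_in) - (A + A.T) + alpha * np.eye(A.shape[0])
    return np.linalg.solve(M, d_out - d_in)

# Toy example: 0 beats 1 twice, 1 beats 2 once -> scores s0 > s1 > s2.
A = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
print(springrank(A))
```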

Model Calibration in Dense Classification with Adaptive Label Perturbation

  • paper_url: http://arxiv.org/abs/2307.13539
  • repo_url: https://github.com/carlisle-liu/aslp
  • paper_authors: Jiawei Liu, Changkun Ye, Shan Wang, Ruikai Cui, Jing Zhang, Kaihao Zhang, Nick Barnes
  • for: This work aims to produce trustworthy, well-calibrated deep neural networks for dense binary classification in safety-related applications.
  • methods: The paper proposes Adaptive Stochastic Label Perturbation (ASLP), which learns a unique label perturbation level for each training image using the proposed Self-Calibrating Binary Cross Entropy (SC-BCE) loss, unifying stochastic label perturbation (as in DisturbLabel) and label smoothing to correct calibration while maintaining classification rates.
  • results: ASLP significantly improves the calibration of dense binary classification models on both in-distribution and out-of-distribution data, either preserving classification accuracy on known data as a conservative solution or specifically minimizing the gap between prediction accuracy and expected confidence of the target training label.
    Abstract For safety-related applications, it is crucial to produce trustworthy deep neural networks whose prediction is associated with confidence that can represent the likelihood of correctness for subsequent decision-making. Existing dense binary classification models are prone to being over-confident. To improve model calibration, we propose Adaptive Stochastic Label Perturbation (ASLP) which learns a unique label perturbation level for each training image. ASLP employs our proposed Self-Calibrating Binary Cross Entropy (SC-BCE) loss, which unifies label perturbation processes including stochastic approaches (like DisturbLabel), and label smoothing, to correct calibration while maintaining classification rates. ASLP follows Maximum Entropy Inference of classic statistical mechanics to maximise prediction entropy with respect to missing information. It performs this while: (1) preserving classification accuracy on known data as a conservative solution, or (2) specifically improves model calibration degree by minimising the gap between the prediction accuracy and expected confidence of the target training label. Extensive results demonstrate that ASLP can significantly improve calibration degrees of dense binary classification models on both in-distribution and out-of-distribution data. The code is available on https://github.com/Carlisle-Liu/ASLP.
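
The exact SC-BCE loss is defined in the paper; the toy sketch below only illustrates the shared mechanism that both stochastic label perturbation and label smoothing rely on: computing binary cross entropy against targets pushed toward the opposite class by a per-sample perturbation level `k`, which is the quantity ASLP learns adaptively.

```python
import torch

def perturbed_bce(pred, target, k):
    # k in [0, 0.5): k = 0 recovers plain BCE; larger k tempers
    # over-confidence by moving the target toward the opposite class.
    t = target * (1.0 - k) + (1.0 - target) * k
    pred = pred.clamp(1e-7, 1.0 - 1e-7)
    return -(t * pred.log() + (1.0 - t) * (1.0 - pred).log()).mean()
```

One simple way to make `k` learnable per training image in this sketch is a trainable scalar per image passed through a sigmoid scaled to $[0, 0.5)$, optimized alongside the network weights.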

INFINITY: Neural Field Modeling for Reynolds-Averaged Navier-Stokes Equations

  • paper_url: http://arxiv.org/abs/2307.13538
  • repo_url: None
  • paper_authors: Louis Serrano, Leon Migus, Yuan Yin, Jocelyn Ahmed Mazari, Patrick Gallinari
  • for: This paper proposes a deep-learning surrogate model to approximate complex physical phenomena and reduce the computational burden of direct numerical simulations.
  • methods: The INFINITY model uses implicit neural representations (INRs) to encode geometric information and physical fields into compact representations and learns a mapping between them to infer the physical fields.
  • results: On an airfoil design optimization task with the challenging AirfRANS dataset, the method achieves state-of-the-art performance, accurately inferring physical fields throughout the volume and on the surface; it also supports design exploration and shape optimization, correctly predicting drag and lift coefficients while adhering to the equations.
    Abstract For numerical design, the development of efficient and accurate surrogate models is paramount. They allow us to approximate complex physical phenomena, thereby reducing the computational burden of direct numerical simulations. We propose INFINITY, a deep learning model that utilizes implicit neural representations (INRs) to address this challenge. Our framework encodes geometric information and physical fields into compact representations and learns a mapping between them to infer the physical fields. We use an airfoil design optimization problem as an example task and we evaluate our approach on the challenging AirfRANS dataset, which closely resembles real-world industrial use-cases. The experimental results demonstrate that our framework achieves state-of-the-art performance by accurately inferring physical fields throughout the volume and surface. Additionally we demonstrate its applicability in contexts such as design exploration and shape optimization: our model can correctly predict drag and lift coefficients while adhering to the equations.
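
A minimal sketch of the kind of implicit neural representation the abstract describes: a coordinate network that maps a query point plus a latent code (standing in for the encoded geometry and conditions) to the field values at that point. All layer sizes and names here are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FieldINR(nn.Module):
    def __init__(self, coord_dim=2, latent_dim=64, width=128, out_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(coord_dim + latent_dim, width), nn.GELU(),
            nn.Linear(width, width), nn.GELU(),
            nn.Linear(width, out_dim),  # e.g. velocity and pressure fields
        )

    def forward(self, coords, z):
        # coords: (N, coord_dim) query points; z: (latent_dim,) shape code
        z = z.expand(coords.shape[0], -1)
        return self.net(torch.cat([coords, z], dim=-1))
```

Because the network is queried pointwise, fields can be evaluated anywhere in the volume or on the surface without committing to a fixed mesh.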

Do algorithms and barriers for sparse principal component analysis extend to other structured settings?

  • paper_url: http://arxiv.org/abs/2307.13535
  • repo_url: None
  • paper_authors: Guanyi Wang, Mengqi Lou, Ashwin Pananjady
  • for: This paper studies a principal component analysis problem under the spiked Wishart model, in which the signal structure is captured by a class of union-of-subspace models (including vanilla sparse PCA and its graph-sparse variants).
  • methods: The paper establishes fundamental statistical and computational limits that depend on the geometry of the problem instance and shows that a natural projected power method exhibits local convergence to the statistically near-optimal neighborhood of the solution.
  • results: End-to-end analyses of path and tree sparsity indicate that several phenomena observed for vanilla sparse PCA extend in a natural fashion to its structured counterparts.
    Abstract We study a principal component analysis problem under the spiked Wishart model in which the structure in the signal is captured by a class of union-of-subspace models. This general class includes vanilla sparse PCA as well as its variants with graph sparsity. With the goal of studying these problems under a unified statistical and computational lens, we establish fundamental limits that depend on the geometry of the problem instance, and show that a natural projected power method exhibits local convergence to the statistically near-optimal neighborhood of the solution. We complement these results with end-to-end analyses of two important special cases given by path and tree sparsity in a general basis, showing initialization methods and matching evidence of computational hardness. Overall, our results indicate that several of the phenomena observed for vanilla sparse PCA extend in a natural fashion to its structured counterparts.

Differentiable Turbulence II

  • paper_url: http://arxiv.org/abs/2307.13533
  • repo_url: None
  • paper_authors: Varun Shankar, Romit Maulik, Venkatasubramanian Viswanathan
  • for: The paper develops data-driven models in computational fluid dynamics (CFD) using differentiable fluid simulators and machine learning (ML) methods.
  • methods: The paper proposes a framework for integrating deep learning models into a generic finite element numerical scheme for solving the Navier-Stokes equations, and applies the technique to learn a sub-grid scale closure using a multi-scale graph neural network.
  • results: The learned closure can achieve accuracy comparable to traditional large eddy simulation on a finer grid, resulting in an equivalent speedup of 10x. The method has been demonstrated on several realizations of flow over a backwards-facing step, testing on both unseen Reynolds numbers and new geometry.
    Abstract Differentiable fluid simulators are increasingly demonstrating value as useful tools for developing data-driven models in computational fluid dynamics (CFD). Differentiable turbulence, or the end-to-end training of machine learning (ML) models embedded in CFD solution algorithms, captures both the generalization power and limited upfront cost of physics-based simulations, and the flexibility and automated training of deep learning methods. We develop a framework for integrating deep learning models into a generic finite element numerical scheme for solving the Navier-Stokes equations, applying the technique to learn a sub-grid scale closure using a multi-scale graph neural network. We demonstrate the method on several realizations of flow over a backwards-facing step, testing on both unseen Reynolds numbers and new geometry. We show that the learned closure can achieve accuracy comparable to traditional large eddy simulation on a finer grid that amounts to an equivalent speedup of 10x. As the desire and need for cheaper CFD simulations grows, we see hybrid physics-ML methods as a path forward to be exploited in the near future.

Towards Long-Term predictions of Turbulence using Neural Operators

  • paper_url: http://arxiv.org/abs/2307.13517
  • repo_url: None
  • paper_authors: Fernando Gonzalez, François-Xavier Demoulin, Simon Bernard
  • for: This paper explores neural operators for predicting turbulent flows, focusing on the Fourier Neural Operator (FNO), with the aim of developing reduced-order/surrogate models for turbulent flow simulations via machine learning.
  • methods: Different model configurations are analyzed; U-NET structures (UNO and U-FNET) perform better than the standard FNO in accuracy and stability, and regularization terms such as gradient and stability losses prove essential for stable, accurate predictions.
  • results: U-FNET excels at predicting turbulence at higher Reynolds numbers; the study also emphasizes the need for improved evaluation metrics for deep learning models in fluid flow prediction.
    Abstract This paper explores Neural Operators to predict turbulent flows, focusing on the Fourier Neural Operator (FNO) model. It aims to develop reduced-order/surrogate models for turbulent flow simulations using Machine Learning. Different model configurations are analyzed, with U-NET structures (UNO and U-FNET) performing better than the standard FNO in accuracy and stability. U-FNET excels in predicting turbulence at higher Reynolds numbers. Regularization terms, like gradient and stability losses, are essential for stable and accurate predictions. The study emphasizes the need for improved metrics for deep learning models in fluid flow prediction. Further research should focus on models handling complex flows and practical benchmarking metrics.

An Empirical Study on Fairness Improvement with Multiple Protected Attributes

  • paper_url: http://arxiv.org/abs/2308.01923
  • repo_url: None
  • paper_authors: Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Mark Harman
  • for: This study investigates fairness improvement with respect to multiple protected attributes, to better understand how such strategies perform when users have several protected attributes.
  • methods: The study evaluates 11 state-of-the-art fairness improvement methods across different datasets, metrics, and machine learning models when considering multiple protected attributes.
  • results: Improving fairness for one protected attribute can largely decrease fairness regarding unconsidered attributes, in up to 88.3% of scenarios (57.5% on average); accuracy loss is similar for single and multiple attributes, but the effects on precision and recall are about 5 times and 8 times those of the single-attribute case, implying that reporting accuracy alone, as is currently common, is inadequate.
    Abstract Existing research mostly improves the fairness of Machine Learning (ML) software regarding a single protected attribute at a time, but this is unrealistic given that many users have multiple protected attributes. This paper conducts an extensive study of fairness improvement regarding multiple protected attributes, covering 11 state-of-the-art fairness improvement methods. We analyze the effectiveness of these methods with different datasets, metrics, and ML models when considering multiple protected attributes. The results reveal that improving fairness for a single protected attribute can largely decrease fairness regarding unconsidered protected attributes. This decrease is observed in up to 88.3% of scenarios (57.5% on average). More surprisingly, we find little difference in accuracy loss when considering single and multiple protected attributes, indicating that accuracy can be maintained in the multiple-attribute paradigm. However, the effect on precision and recall when handling multiple protected attributes is about 5 times and 8 times that of a single attribute. This has important implications for future fairness research: reporting only accuracy as the ML performance metric, which is currently common in the literature, is inadequate.
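
As a concrete illustration of what "fairness regarding multiple protected attributes" means operationally, the sketch below computes one simple group-fairness measure, the demographic-parity gap, separately for each attribute; the study itself evaluates a much broader set of metrics and methods, and the names and data here are made up.

```python
import numpy as np

def parity_gaps(y_pred, protected):
    # protected: {attribute name -> binary group-membership array}
    return {name: abs(y_pred[g == 1].mean() - y_pred[g == 0].mean())
            for name, g in protected.items()}

y_pred = np.array([1, 0, 1, 1, 0, 1])
groups = {"sex":  np.array([1, 1, 0, 0, 0, 1]),
          "race": np.array([0, 1, 1, 0, 1, 0])}
print(parity_gaps(y_pred, groups))  # a gap can shrink for one attribute
                                    # while growing for another
```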

Continuous Time Evidential Distributions for Irregular Time Series

  • paper_url: http://arxiv.org/abs/2307.13503
  • repo_url: https://github.com/twkillian/edict
  • paper_authors: Taylor W. Killian, Haoran Zhang, Thomas Hartvigsen, Ava P. Amini
  • for: This paper addresses forecasting from irregular time series, which are prevalent in real-world settings such as healthcare.
  • methods: The proposed strategy, EDICT, learns an evidential distribution over irregular time series in continuous time, enabling well-calibrated and flexible inference of partially observed features at any time of interest while expanding uncertainty temporally for sparse, irregular observations.
  • results: EDICT attains competitive performance on challenging time series classification tasks and enables uncertainty-guided inference when encountering noisy data.
    Abstract Prevalent in many real-world settings such as healthcare, irregular time series are challenging to formulate predictions from. It is difficult to infer the value of a feature at any given time when observations are sporadic, as it could take on a range of values depending on when it was last observed. To characterize this uncertainty we present EDICT, a strategy that learns an evidential distribution over irregular time series in continuous time. This distribution enables well-calibrated and flexible inference of partially observed features at any time of interest, while expanding uncertainty temporally for sparse, irregular observations. We demonstrate that EDICT attains competitive performance on challenging time series classification tasks and enabling uncertainty-guided inference when encountering noisy data.

Deep Reinforcement Learning for Robust Goal-Based Wealth Management

  • paper_url: http://arxiv.org/abs/2307.13501
  • repo_url: None
  • paper_authors: Tessa Bauman, Bruno Gašperov, Stjepan Begušić, Zvonko Kostanjčar
  • for: This study proposes a deep reinforcement learning approach to robust goal-based wealth management, i.e., achieving specific financial goals.
  • methods: Goal-based investing is formulated as a sequential decision-making problem, and a neural-network investment policy is trained with deep reinforcement learning.
  • results: Experimental results indicate superiority over several goal-based wealth management benchmarks on both simulated and historical market data.
    Abstract Goal-based investing is an approach to wealth management that prioritizes achieving specific financial goals. It is naturally formulated as a sequential decision-making problem as it requires choosing the appropriate investment until a goal is achieved. Consequently, reinforcement learning, a machine learning technique appropriate for sequential decision-making, offers a promising path for optimizing these investment strategies. In this paper, a novel approach for robust goal-based wealth management based on deep reinforcement learning is proposed. The experimental results indicate its superiority over several goal-based wealth management benchmarks on both simulated and historical market data.

Finding Money Launderers Using Heterogeneous Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.13499
  • repo_url: https://github.com/fredjo89/heterogeneous-mpnn
  • paper_authors: Fredrik Johannessen, Martin Jullum
  • for: This study aims to improve banks' electronic surveillance systems for detecting money laundering, using machine learning on a large heterogeneous graph.
  • methods: A graph neural network (GNN) is applied to a large heterogeneous graph constructed from real-world bank transactions and business role data belonging to DNB, Norway's largest bank; the homogeneous Message Passing Neural Network (MPNN) is extended to operate effectively on heterogeneous graphs, including a novel method for aggregating messages across the different edge types.
  • results: The model shows great potential for enhancing the quality of banks' electronic surveillance systems for detecting money laundering; to the authors' knowledge, this is the first published work applying GNNs to a large real-world heterogeneous network for anti-money laundering purposes.
    Abstract Current anti-money laundering (AML) systems, predominantly rule-based, exhibit notable shortcomings in efficiently and precisely detecting instances of money laundering. As a result, there has been a recent surge toward exploring alternative approaches, particularly those utilizing machine learning. Since criminals often collaborate in their money laundering endeavors, accounting for diverse types of customer relations and links becomes crucial. In line with this, the present paper introduces a graph neural network (GNN) approach to identify money laundering activities within a large heterogeneous network constructed from real-world bank transactions and business role data belonging to DNB, Norway's largest bank. Specifically, we extend the homogeneous GNN method known as the Message Passing Neural Network (MPNN) to operate effectively on a heterogeneous graph. As part of this procedure, we propose a novel method for aggregating messages across different edges of the graph. Our findings highlight the importance of using an appropriate GNN architecture when combining information in heterogeneous graphs. The performance results of our model demonstrate great potential in enhancing the quality of electronic surveillance systems employed by banks to detect instances of money laundering. To the best of our knowledge, this is the first published work applying GNN on a large real-world heterogeneous network for anti-money laundering purposes.
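
A minimal sketch of heterogeneous message passing of the kind the abstract describes: one message function per edge type, with messages summed into each destination node before a shared node update. The actual cross-edge aggregation method is the paper's contribution; this sketch simply sums, and all names are illustrative.

```python
import torch
import torch.nn as nn

class HeteroMPNNLayer(nn.Module):
    def __init__(self, dim, edge_types):
        super().__init__()
        # one learned message transform per edge type
        self.msg = nn.ModuleDict({t: nn.Linear(dim, dim) for t in edge_types})
        self.update = nn.GRUCell(dim, dim)  # node-state update

    def forward(self, h, edges_by_type):
        # h: (num_nodes, dim); edges_by_type: {type: (src_idx, dst_idx)}
        agg = torch.zeros_like(h)
        for t, (src, dst) in edges_by_type.items():
            agg = agg.index_add(0, dst, self.msg[t](h[src]))
        return self.update(agg, h)
```

Edge types here would distinguish, for example, transaction edges between accounts from business-role edges linking people and companies.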

Zshot: An Open-source Framework for Zero-Shot Named Entity Recognition and Relation Extraction

  • paper_url: http://arxiv.org/abs/2307.13497
  • repo_url: None
  • paper_authors: Gabriele Picco, Marcos Martínez Galindo, Alberto Purpura, Leopold Fuchs, Vanessa López, Hoang Thanh Lam
  • for: The paper targets researchers and industry professionals interested in zero-shot learning (ZSL) and its applications in natural language processing (NLP).
  • methods: The paper proposes Zshot, a novel ZSL framework that provides a platform for comparing different state-of-the-art ZSL methods on standard benchmark datasets; the framework also includes readily available APIs for production under the standard SpaCy NLP pipeline and is designed to be extendible and evaluable.
  • results: Rather than reporting specific benchmark numbers, the paper delivers a platform for comparing ZSL methods and evaluating their performance on standard benchmark datasets, with numerous enhancements such as pipeline ensembling and visualization utilities available as a SpaCy extension.
    Abstract The Zero-Shot Learning (ZSL) task pertains to the identification of entities or relations in texts that were not seen during training. ZSL has emerged as a critical research area due to the scarcity of labeled data in specific domains, and its applications have grown significantly in recent years. With the advent of large pretrained language models, several novel methods have been proposed, resulting in substantial improvements in ZSL performance. There is a growing demand, both in the research community and industry, for a comprehensive ZSL framework that facilitates the development and accessibility of the latest methods and pretrained models. In this study, we propose a novel ZSL framework called Zshot that aims to address the aforementioned challenges. Our primary objective is to provide a platform that allows researchers to compare different state-of-the-art ZSL methods with standard benchmark datasets. Additionally, we have designed our framework to support the industry with readily available APIs for production under the standard SpaCy NLP pipeline. Our API is extendible and evaluable; moreover, we include numerous enhancements such as boosting the accuracy with pipeline ensembling and visualization utilities available as a SpaCy extension.

Duet: efficient and scalable hybriD neUral rElation undersTanding

  • paper_url: http://arxiv.org/abs/2307.13494
  • repo_url: https://github.com/GIS-PuppetMaster/Duet
  • paper_authors: Kaixin Zhang, Hongzhi Wang, Yabin Lu, Ziqi Li, Chang Shu, Yu Yan, Donghua Yang
  • for: This work addresses the data and workload drift problem of learned cardinality estimation methods, as well as their high training/estimation costs and instability on high-cardinality, high-dimensional tables.
  • methods: The paper proposes Duet, a stable, efficient, and scalable hybrid method that introduces predicate information into the autoregressive model and estimates cardinality directly, without sampling or any non-differentiable process.
  • results: Experiments show that Duet achieves all of its design goals, reducing inference complexity from O(n) to O(1) compared with Naru and UAE while achieving higher accuracy on high-cardinality and high-dimensional tables, and it even has a lower inference cost on CPU than most learned methods on GPU.
    Abstract Learned cardinality estimation methods have achieved high precision compared to traditional methods. Among learned methods, query-driven approaches have long faced the data and workload drift problem. Although both query-driven and hybrid methods have been proposed to avoid this problem, even the state of the art among them suffers from high training and estimation costs, limited scalability, instability, and a long-tailed distribution problem on high-cardinality and high-dimensional tables, which seriously affects the practical application of learned cardinality estimators. In this paper, we prove that most of these problems are directly caused by the widely used progressive sampling. We solve this problem by introducing predicate information into the autoregressive model and propose Duet, a stable, efficient, and scalable hybrid method to estimate cardinality directly without sampling or any non-differentiable process, which not only reduces the inference complexity from O(n) to O(1) compared to Naru and UAE but also achieves higher accuracy on high-cardinality and high-dimensional tables. Experimental results show that Duet achieves all the design goals above, is much more practical, and even has a lower inference cost on CPU than that of most learned methods on GPU.

ECG classification using Deep CNN and Gramian Angular Field

  • paper_url: http://arxiv.org/abs/2308.02395
  • repo_url: None
  • paper_authors: Youssef Elmir, Yassine Himeur, Abbes Amira
  • for: This study provides a novel ECG signal analysis method for the diagnosis of cardiovascular diseases and the detection of anomalies.
  • methods: The method transforms time-domain 1D vectors into 2D images using the Gramian Angular Field transform and classifies the transformed ECG signals with Convolutional Neural Networks (CNN).
  • results: Experiments show classification accuracies of 97.47% and 98.65% for anomaly detection; the representation also helps identify and visualize temporal patterns in the ECG signal, such as changes in heart rate, rhythm, and morphology, which may not be apparent in the original signal.
    Abstract This paper provides a novel contribution to the field of signal processing and DL for ECG signal analysis by introducing a new feature representation method for ECG signals. The proposed method is based on transforming time-domain 1D vectors into 2D images using the Gramian Angular Field transform. The classification of the transformed ECG signals is then performed using Convolutional Neural Networks (CNN). The obtained results show a classification accuracy of 97.47% and 98.65% for anomaly detection. Accordingly, in addition to improving the classification performance compared to the state of the art, the feature representation helps identify and visualize temporal patterns in the ECG signal, such as changes in heart rate, rhythm, and morphology, which may not be apparent in the original signal. This has significant implications for the diagnosis and treatment of cardiovascular diseases and the detection of anomalies.
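
A short sketch of the feature transform the paper builds on: the Gramian Angular Summation Field (GASF) of a 1D signal. (The paper may use the summation and/or the difference variant; this shows the standard GASF.)

```python
import numpy as np

def gasf(x):
    # Rescale to [-1, 1], map samples to angles, and form the Gramian
    # image G[i, j] = cos(phi_i + phi_j).
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])

beat = np.sin(np.linspace(0, 2 * np.pi, 128))  # stand-in for one ECG beat
image = gasf(beat)                              # (128, 128) CNN input
```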

Rational kernel-based interpolation for complex-valued frequency response functions

  • paper_url: http://arxiv.org/abs/2307.13484
  • repo_url: https://github.com/stk-kriging/complex-rational-interpolation
  • paper_authors: Julien Bect, Niklas Georg, Ulrich Römer, Sebastian Schöps
  • for: This work concerns the kernel-based approximation of complex-valued functions from data, with particular interest in the frequency response function of a partial differential equation in the frequency domain.
  • methods: Since standard kernels do not perform well in this setting, the paper introduces new reproducing kernel Hilbert spaces of complex-valued functions and formulates complex-valued interpolation with a kernel pair as minimum norm interpolation in these spaces; the interpolant is combined with a low-order rational function whose order is adaptively selected by a new model selection criterion.
  • results: Numerical results on examples from different fields, including electromagnetics and acoustics, illustrate the performance of the method, also in comparison to available rational approximation methods.
    Abstract This work is concerned with the kernel-based approximation of a complex-valued function from data, where the frequency response function of a partial differential equation in the frequency domain is of particular interest. In this setting, kernel methods are employed more and more frequently, however, standard kernels do not perform well. Moreover, the role and mathematical implications of the underlying pair of kernels, which arises naturally in the complex-valued case, remain to be addressed. We introduce new reproducing kernel Hilbert spaces of complex-valued functions, and formulate the problem of complex-valued interpolation with a kernel pair as minimum norm interpolation in these spaces. Moreover, we combine the interpolant with a low-order rational function, where the order is adaptively selected based on a new model selection criterion. Numerical results on examples from different fields, including electromagnetics and acoustic examples, illustrate the performance of the method, also in comparison to available rational approximation methods.
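
For context, minimum norm interpolation in a reproducing kernel Hilbert space with kernel $K$ has the standard closed form below; the paper's contribution lies in the complex-valued kernel pair and the rational correction term, not in this formula:

\begin{equation*}
\hat f(\cdot) = \sum_{j=1}^{n} c_j\, K(\cdot, x_j), \qquad \mathbf{K}\mathbf{c} = \mathbf{y}, \quad \mathbf{K}_{ij} = K(x_i, x_j),
\end{equation*}

and the overall surrogate combines this interpolant with a low-order rational function whose order is chosen by the new model selection criterion.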

Combinatorial Auctions and Graph Neural Networks for Local Energy Flexibility Markets

  • paper_url: http://arxiv.org/abs/2307.13470
  • repo_url: None
  • paper_authors: Awadelrahman M. A. Ahmed, Frank Eliassen, Yan Zhang
  • for: This paper proposes a new combinatorial auction framework for local energy flexibility markets, addressing prosumers' inability to bundle multiple flexibility time intervals.
  • methods: To solve the underlying NP-complete winner determination problems, the paper presents a simple yet powerful heterogeneous tri-partite graph representation and designs graph neural network-based models.
  • results: The models achieve an average optimal value deviation of less than 5% from an off-the-shelf optimization tool and show linear inference time complexity, compared with the exponential complexity of the commercial solver.
    Abstract This paper proposes a new combinatorial auction framework for local energy flexibility markets, which addresses the issue of prosumers' inability to bundle multiple flexibility time intervals. To solve the underlying NP-complete winner determination problems, we present a simple yet powerful heterogeneous tri-partite graph representation and design graph neural network-based models. Our models achieve an average optimal value deviation of less than 5\% from an off-the-shelf optimization tool and show linear inference time complexity compared to the exponential complexity of the commercial solver. Contributions and results demonstrate the potential of using machine learning to efficiently allocate energy flexibility resources in local markets and solving optimization problems in general.

Gaussian Graph with Prototypical Contrastive Learning in E-Commerce Bundle Recommendation

  • paper_url: http://arxiv.org/abs/2307.13468
  • repo_url: None
  • paper_authors: Zhao-Yang Liu, Liucheng Sun, Chenwei Weng, Qijin Chen, Chengfu Huo
  • for: This work aims to improve bundle recommendation on e-commerce platforms, addressing the uncertainty issue that arises in real recommendation scenarios from highly sparse or diverse data, as well as the sampling bias of instance-wise contrastive learning.
  • methods: The proposed Gaussian Graph with Prototypical Contrastive Learning (GPCL) framework embeds each user/bundle/item as a Gaussian distribution rather than a fixed vector and adds a prototypical contrastive learning module to capture contextual information and mitigate the sampling bias issue.
  • results: Extensive experiments show new state-of-the-art performance compared with previous methods on several public datasets, and GPCL has been deployed on a real-world e-commerce platform and achieved substantial improvements.
    Abstract Bundle recommendation aims to provide a bundle of items to satisfy the user preference on e-commerce platform. Existing successful solutions are based on the contrastive graph learning paradigm where graph neural networks (GNNs) are employed to learn representations from user-level and bundle-level graph views with a contrastive learning module to enhance the cooperative association between different views. Nevertheless, they ignore the uncertainty issue which has a significant impact in real bundle recommendation scenarios due to the lack of discriminative information caused by highly sparsity or diversity. We further suggest that their instancewise contrastive learning fails to distinguish the semantically similar negatives (i.e., sampling bias issue), resulting in performance degradation. In this paper, we propose a novel Gaussian Graph with Prototypical Contrastive Learning (GPCL) framework to overcome these challenges. In particular, GPCL embeds each user/bundle/item as a Gaussian distribution rather than a fixed vector. We further design a prototypical contrastive learning module to capture the contextual information and mitigate the sampling bias issue. Extensive experiments demonstrate that benefiting from the proposed components, we achieve new state-of-the-art performance compared to previous methods on several public datasets. Moreover, GPCL has been deployed on real-world e-commerce platform and achieved substantial improvements.

Integrating processed-based models and machine learning for crop yield prediction

  • paper_url: http://arxiv.org/abs/2307.13466
  • repo_url: None
  • paper_authors: Michiel G. J. Kallenberg, Bernardo Maestrini, Ron van Bree, Paul Ravensbergen, Christos Pylianidis, Frits van Evert, Ioannis N. Athanasiadis
  • for: 预测马铃薯产量
  • methods: 使用混合元模型（meta-modeling）方法，结合理论驱动的作物生长模型与数据驱动的神经网络
  • results: 在模拟（in silico）场景中，元模型方法优于纯数据驱动的基线方法；在真实数据上与作物生长模型表现相当，但仍需进一步的验证和优化以确认其实际效果。
    Abstract Crop yield prediction typically involves the utilization of either theory-driven process-based crop growth models, which have proven to be difficult to calibrate for local conditions, or data-driven machine learning methods, which are known to require large datasets. In this work we investigate potato yield prediction using a hybrid meta-modeling approach. A crop growth model is employed to generate synthetic data for (pre)training a convolutional neural net, which is then fine-tuned with observational data. When applied in silico, our meta-modeling approach yields better predictions than a baseline comprising a purely data-driven approach. When tested on real-world data from field trials (n=303) and commercial fields (n=77), the meta-modeling approach yields competitive results with respect to the crop growth model. In the latter set, however, both models perform worse than a simple linear regression with a hand-picked feature set and dedicated preprocessing designed by domain experts. Our findings indicate the potential of meta-modeling for accurate crop yield prediction; however, further advancements and validation using extensive real-world datasets is recommended to solidify its practical effectiveness.
    摘要 作物产量预测通常依赖理论驱动的、基于过程的作物生长模型（难以针对当地条件校准），或数据驱动的机器学习方法（需要大量数据）。在本工作中，我们研究了一种混合元模型方法来预测马铃薯产量：先用作物生长模型生成合成数据对卷积神经网络进行（预）训练，再用观测数据进行微调。在模拟场景中，该元模型方法的预测优于纯数据驱动的基线方法。在来自田间试验（n=303）和商业农田（n=77）的真实数据上，元模型方法与作物生长模型表现相当；但在后者中，两种模型都不如由领域专家手工挑选特征并专门预处理的简单线性回归。我们的发现表明元模型方法在准确预测作物产量方面具有潜力，但仍需在大规模真实数据上进一步发展和验证，以巩固其实际有效性。
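
The pretrain-then-finetune recipe can be sketched in a few lines. Below, a toy simulator stands in for the process-based crop growth model and a small MLP stands in for the paper's convolutional network; everything here is a simplified assumption.

```python
import torch
import torch.nn as nn

def crop_model(x):
    """Stand-in for a process-based crop growth simulator (hypothetical)."""
    return (10 * x[:, :1].sin() + x[:, 1:2] ** 2).squeeze(-1)

net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# 1) Pre-train on cheap synthetic data generated by the simulator.
x_syn = torch.rand(5000, 2)
for _ in range(500):
    loss = ((net(x_syn).squeeze(-1) - crop_model(x_syn)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# 2) Fine-tune on scarce observations (simulator bias + noise stands in for reality).
x_obs = torch.rand(80, 2)
y_obs = crop_model(x_obs) + 2.0 + 0.5 * torch.randn(80)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)   # smaller LR preserves the prior
for _ in range(200):
    loss = ((net(x_obs).squeeze(-1) - y_obs) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```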

Fundamental causal bounds of quantum random access memories

  • paper_url: http://arxiv.org/abs/2307.13460
  • repo_url: None
  • paper_authors: Yunfei Wang, Yuri Alexeev, Liang Jiang, Frederic T. Chong, Junyu Liu
  • for: This paper explores the fundamental limits of rapid quantum memories in quantum computing applications, particularly in the context of hybrid quantum acoustic systems.
  • methods: The paper employs relativistic quantum field theory and Lieb-Robinson bounds to critically examine the causality constraints of quantum memories and their impact on quantum computing performance.
  • results: The paper shows that the number of logical qubits that can be accommodated in a QRAM design can be scaled up to $\mathcal{O}(10^7)$ in 1 dimension, $\mathcal{O}(10^{15})$ to $\mathcal{O}(10^{20})$ in various 2D architectures, and $\mathcal{O}(10^{24})$ in 3 dimensions, subject to the causality bound. These findings have important implications for the long-term performance of quantum computing applications in data science.
    Abstract Quantum devices should operate in adherence to quantum physics principles. Quantum random access memory (QRAM), a fundamental component of many essential quantum algorithms for tasks such as linear algebra, data search, and machine learning, is often proposed to offer $\mathcal{O}(\log N)$ circuit depth for $\mathcal{O}(N)$ data size, given $N$ qubits. However, this claim appears to breach the principle of relativity when dealing with a large number of qubits in quantum materials interacting locally. In our study we critically explore the intrinsic bounds of rapid quantum memories based on causality, employing the relativistic quantum field theory and Lieb-Robinson bounds in quantum many-body systems. In this paper, we consider a hardware-efficient QRAM design in hybrid quantum acoustic systems. Assuming clock cycle times of approximately $10^{-3}$ seconds and a lattice spacing of about 1 micrometer, we show that QRAM can accommodate up to $\mathcal{O}(10^7)$ logical qubits in 1 dimension, $\mathcal{O}(10^{15})$ to $\mathcal{O}(10^{20})$ in various 2D architectures, and $\mathcal{O}(10^{24})$ in 3 dimensions. We contend that this causality bound broadly applies to other quantum hardware systems. Our findings highlight the impact of fundamental quantum physics constraints on the long-term performance of quantum computing applications in data science and suggest potential quantum memory designs for performance enhancement.
    摘要 量子设备的运行应遵循量子物理原理。量子随机存取存储器（QRAM）是线性代数、数据搜索、机器学习等诸多关键量子算法的基本组件，通常被认为在 $N$ 个量子比特、$\mathcal{O}(N)$ 数据规模下可提供 $\mathcal{O}(\log N)$ 的电路深度。然而，当大量量子比特在量子材料中进行局域相互作用时，这一断言似乎违反相对论原理。在本研究中，我们基于因果性，利用相对论量子场论和量子多体系统中的 Lieb-Robinson 界，批判性地探讨了快速量子存储器的内在界限。本文考虑了混合量子声学系统中一种硬件高效的 QRAM 设计。假设时钟周期约为 $10^{-3}$ 秒、晶格间距约为 1 微米，我们证明 QRAM 在一维中可容纳至多 $\mathcal{O}(10^7)$ 个逻辑量子比特，在各类二维架构中为 $\mathcal{O}(10^{15})$ 至 $\mathcal{O}(10^{20})$，在三维中为 $\mathcal{O}(10^{24})$。我们认为这一因果性界限广泛适用于其他量子硬件系统。我们的发现突显了基本量子物理约束对数据科学中量子计算应用长期性能的影响，并为提升性能提出了可能的量子存储器设计。
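
The headline numbers can be sanity-checked with back-of-the-envelope arithmetic. In the sketch below the effective Lieb-Robinson velocity is assumed to be of the order of the speed of sound (v = 5e3 m/s); only the clock cycle and lattice spacing come from the abstract.

```python
# Back-of-the-envelope version of the causality bound: information cannot
# propagate faster than the Lieb-Robinson velocity v (for a quantum acoustic
# system, of the order of the speed of sound; v = 5e3 m/s is an assumption).
v = 5e3            # effective propagation velocity, m/s (assumed)
t = 1e-3           # clock cycle, s (from the abstract)
a = 1e-6           # lattice spacing, m (from the abstract)

reach = v * t      # distance a signal can cover within one cycle: 5 m
sites_1d = reach / a
print(f"1D: ~{sites_1d:.0e} sites")        # ~5e6, consistent with O(10^7)
print(f"2D: ~{sites_1d**2:.0e} sites")     # ~2e13, within the quoted 2D range
print(f"3D: ~{sites_1d**3:.0e} sites")     # ~1e20; architecture-specific factors
                                           # push the paper's 3D figure to O(10^24)
```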

A behavioural transformer for effective collaboration between a robot and a non-stationary human

  • paper_url: http://arxiv.org/abs/2307.13447
  • repo_url: None
  • paper_authors: Ruaridh Mon-Williams, Theodoros Stouraitis, Sethu Vijayakumar
  • for: 本研究旨在解决人机协作中由人类行为变化引起的非平稳性问题，提高机器人对人类行为的预测能力，以便适应新的人类代理。
  • methods: 提出了一种有原则的元学习框架，并在此基础上开发了 Behaviour-Transform（BeTrans）。BeTrans 是一种条件 Transformer，凭借其在序列数据上的出色表现，能够快速适应具有非平稳行为的新人类代理。
  • results: 在具有不同系统性偏差的模拟人类代理上训练后，BeTrans 在协作设置中表现出色，并且比 SOTA 技术更快地适应非平稳的模拟人类代理。
    Abstract A key challenge in human-robot collaboration is the non-stationarity created by humans due to changes in their behaviour. This alters environmental transitions and hinders human-robot collaboration. We propose a principled meta-learning framework to explore how robots could better predict human behaviour, and thereby deal with issues of non-stationarity. On the basis of this framework, we developed Behaviour-Transform (BeTrans). BeTrans is a conditional transformer that enables a robot agent to adapt quickly to new human agents with non-stationary behaviours, due to its notable performance with sequential data. We trained BeTrans on simulated human agents with different systematic biases in collaborative settings. We used an original customisable environment to show that BeTrans effectively collaborates with simulated human agents and adapts faster to non-stationary simulated human agents than SOTA techniques.
    摘要 人机合作中的一大挑战是由人类行为引起的非站点性,这会导致环境变化和人机合作困难。我们提出了一种原则正的meta学框架,以便机器人更好地预测人类行为,从而更好地处理非站点性问题。基于这个框架,我们开发了Behaviour-Transform(BeTrans)。BeTrans是一种 Conditional Transformer,它允许机器人代理人类快速适应新的人类行为,并且在序列数据上表现出色。我们在 simulate human agents with different systematic biases in collaborative settings 中训练了 BeTrans,并用自定义环境示出了它在与 simulate human agents 合作中的有效性,并且更快地适应非站点性 simulate human agents than SOTA技术。

Network Traffic Classification based on Single Flow Time Series Analysis

  • paper_url: http://arxiv.org/abs/2307.13434
  • repo_url: https://github.com/koumajos/classificationbasedonsfts
  • paper_authors: Josef Koumar, Karel Hynek, Tomáš Čejka
  • for: 应对分析加密网络通信这一当前挑战
  • methods: 基于单流时间序列（由每个数据包的字节数及其时间戳构成）的时间序列分析，提出69种通用特征
  • results: 在15个公开数据集上对多种网络流量分类任务进行了评估，结果表明所提特征向量可达到与相关工作相当或更好的分类性能；在超过一半的评估任务中，分类性能提升最高可达5%。
    Abstract Network traffic monitoring using IP flows is used to handle the current challenge of analyzing encrypted network communication. Nevertheless, the packet aggregation into flow records naturally causes information loss; therefore, this paper proposes a novel flow extension for traffic features based on the time series analysis of the Single Flow Time series, i.e., a time series created by the number of bytes in each packet and its timestamp. We propose 69 universal features based on the statistical analysis of data points, time domain analysis, packet distribution within the flow timespan, time series behavior, and frequency domain analysis. We have demonstrated the usability and universality of the proposed feature vector for various network traffic classification tasks using 15 well-known publicly available datasets. Our evaluation shows that the novel feature vector achieves classification performance similar or better than related works on both binary and multiclass classification tasks. In more than half of the evaluated tasks, the classification performance increased by up to 5\%.
    摘要 基于 IP 流的网络流量监测被用于应对分析加密网络通信这一当前挑战。然而，将数据包聚合为流记录必然造成信息损失；因此，本文提出了一种新的流特征扩展，基于对“单流时间序列”（即由每个数据包的字节数及其时间戳构成的时间序列）的分析。我们提出了69种通用特征，涵盖数据点统计分析、时域分析、流时间范围内的数据包分布、时间序列行为以及频域分析。我们使用15个知名公开数据集，在多种网络流量分类任务上验证了所提特征向量的可用性和通用性。评估表明，该新特征向量在二分类和多分类任务上均可达到与相关工作相当或更好的分类性能；在超过一半的评估任务中，分类性能提升最高可达5%。
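
A few of the proposed feature families are easy to reproduce from the per-packet byte counts and timestamps alone. The sketch below computes a handful of illustrative statistics; the paper defines 69 such features, and the exact definitions live in the linked repository.

```python
import numpy as np

def sfts_features(sizes, times):
    """A handful of illustrative features from a Single Flow Time Series:
    per-packet byte counts `sizes` and packet timestamps `times` (seconds)."""
    sizes, times = np.asarray(sizes, float), np.asarray(times, float)
    iat = np.diff(times) if len(times) > 1 else np.array([0.0])  # inter-arrival times
    spectrum = np.abs(np.fft.rfft(sizes - sizes.mean()))
    return {
        # statistical / time-domain
        "mean_bytes": sizes.mean(), "std_bytes": sizes.std(),
        "mean_iat": iat.mean(), "std_iat": iat.std(),
        "duration": times[-1] - times[0],
        # distribution of packets within the flow timespan
        "frac_first_half": (times <= times[0] + (times[-1] - times[0]) / 2).mean(),
        # behaviour: lag-1 autocorrelation of packet sizes
        "autocorr_1": np.corrcoef(sizes[:-1], sizes[1:])[0, 1] if len(sizes) > 2 else 0.0,
        # frequency domain: dominant non-DC frequency bin
        "dominant_freq_bin": int(spectrum[1:].argmax() + 1) if len(spectrum) > 1 else 0,
    }

print(sfts_features([60, 1500, 1500, 60, 1500], [0.00, 0.02, 0.05, 0.30, 0.33]))
```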

Achieving Linear Speedup in Decentralized Stochastic Compositional Minimax Optimization

  • paper_url: http://arxiv.org/abs/2307.13430
  • repo_url: None
  • paper_authors: Hongchang Gao
  • for: 本研究旨在解决去中心化设置下、基于分布式数据的随机组合极小极大（compositional minimax）问题的优化。
  • methods: 提出了一种带动量的去中心化随机组合梯度下降上升算法，用于降低内层函数的共识误差。
  • results: 理论结果表明，该算法可实现随工作节点数量线性增长的加速（linear speedup）。此外，我们将该方法应用于类别不平衡分类问题，大量实验结果验证了算法的有效性。
    Abstract The stochastic compositional minimax problem has attracted a surge of attention in recent years since it covers many emerging machine learning models. Meanwhile, due to the emergence of distributed data, optimizing this kind of problem under the decentralized setting becomes badly needed. However, the compositional structure in the loss function brings unique challenges to designing efficient decentralized optimization algorithms. In particular, our study shows that the standard gossip communication strategy cannot achieve linear speedup for decentralized compositional minimax problems due to the large consensus error about the inner-level function. To address this issue, we developed a novel decentralized stochastic compositional gradient descent ascent with momentum algorithm to reduce the consensus error in the inner-level function. As such, our theoretical results demonstrate that it is able to achieve linear speedup with respect to the number of workers. We believe this novel algorithmic design could benefit the development of decentralized compositional optimization. Finally, we applied our methods to the imbalanced classification problem. The extensive experimental results provide evidence for the effectiveness of our algorithm.
    摘要 随机组合极小极大问题近年来受到广泛关注，因为它涵盖了许多新兴的机器学习模型。同时，随着分布式数据的出现，在去中心化设置下优化这类问题变得十分迫切。然而，损失函数中的组合结构给设计高效的去中心化优化算法带来了独特挑战。我们的研究表明，由于内层函数的共识误差过大，标准的 gossip 通信策略无法在去中心化组合极小极大问题上实现线性加速。为解决这一问题，我们开发了一种新的带动量的去中心化随机组合梯度下降上升算法，以降低内层函数的共识误差。理论结果表明，该算法能够实现随工作节点数量线性增长的加速。我们相信这一新的算法设计将有益于去中心化组合优化的发展。最后，我们将该方法应用于类别不平衡分类问题，大量实验结果验证了算法的有效性。

A signal processing interpretation of noise-reduction convolutional neural networks

  • paper_url: http://arxiv.org/abs/2307.13425
  • repo_url: None
  • paper_authors: Luis A. Zavala-Mondragón, Peter H. N. de With, Fons van der Sommen
  • for: 本文旨在为数据驱动降噪和深度学习算法中的Encoding-decoding CNNs提供理论基础,以便更好地理解这些架构的内部工作机制。
  • methods: 本文使用了深度卷积架构,并提出了一种基于深度学习和信号处理的理论框架,用于解释Encoding-decoding CNNs的内部工作机制。
  • results: 本文通过 connecting basic principles from signal processing to the field of deep learning, 提供了一种可以用于设计robust和高效的新型Encoding-decoding CNNs架构的有效指导。
    Abstract Encoding-decoding CNNs play a central role in data-driven noise reduction and can be found within numerous deep-learning algorithms. However, the development of these CNN architectures is often done in ad-hoc fashion and theoretical underpinnings for important design choices is generally lacking. Up to this moment there are different existing relevant works that strive to explain the internal operation of these CNNs. Still, these ideas are either scattered and/or may require significant expertise to be accessible for a bigger audience. In order to open up this exciting field, this article builds intuition on the theory of deep convolutional framelets and explains diverse ED CNN architectures in a unified theoretical framework. By connecting basic principles from signal processing to the field of deep learning, this self-contained material offers significant guidance for designing robust and efficient novel CNN architectures.
    摘要 Encoding-decoding CNN 在数据驱动的降噪中扮演核心角色，并出现在众多深度学习算法中。然而，这类 CNN 架构的开发往往是临时拼凑式的，重要设计选择通常缺乏理论依据。迄今已有若干相关工作试图解释这些 CNN 的内部运作，但这些思想要么较为分散，要么需要相当的专业知识才能为更广泛的读者所理解。为了让更多人进入这一令人兴奋的领域，本文建立了关于深度卷积框架（deep convolutional framelets）理论的直觉，并在统一的理论框架中解释了多种 encoding-decoding CNN 架构。通过将信号处理的基本原理与深度学习领域相连接，这份自成体系的材料为设计鲁棒且高效的新型 CNN 架构提供了重要指导。

Non Intrusive Intelligibility Predictor for Hearing Impaired Individuals using Self Supervised Speech Representations

  • paper_url: http://arxiv.org/abs/2307.13423
  • repo_url: None
  • paper_authors: George Close, Thomas Hain, Stefan Goetze
  • for: 本文旨在将自监督语音表示（SSSR）扩展到以非侵入方式预测听障用户的语音可懂度。
  • methods: 以 SSSR 作为输入特征，构建非侵入式预测模型来预测听障用户的语音可懂度。
  • results: 研究发现，SSSR 作为输入特征可以达到与更复杂系统相当的性能。分析表明，要使预测模型泛化到未知系统和未知（听障）个体，可能还需要更多的数据。
    Abstract Self-supervised speech representations (SSSRs) have been successfully applied to a number of speech-processing tasks, e.g. as feature extractor for speech quality (SQ) prediction, which is, in turn, relevant for assessment and training speech enhancement systems for users with normal or impaired hearing. However, exact knowledge of why and how quality-related information is encoded well in such representations remains poorly understood. In this work, techniques for non-intrusive prediction of SQ ratings are extended to the prediction of intelligibility for hearing-impaired users. It is found that self-supervised representations are useful as input features to non-intrusive prediction models, achieving competitive performance to more complex systems. A detailed analysis of the performance depending on Clarity Prediction Challenge 1 listeners and enhancement systems indicates that more data might be needed to allow generalisation to unknown systems and (hearing-impaired) individuals
    摘要 自监督语音表示（SSSR）已成功应用于多种语音处理任务，例如作为语音质量（SQ）预测的特征提取器；而语音质量预测又与面向正常听力或听障用户的语音增强系统的评估与训练密切相关。然而，此类表示为何以及如何较好地编码了与质量相关的信息，目前仍知之甚少。在本工作中，我们将非侵入式预测语音质量评分的技术扩展到预测听障用户的语音可懂度。结果发现，自监督表示作为非侵入式预测模型的输入特征十分有用，可达到与更复杂系统相当的性能。针对 Clarity Prediction Challenge 1 中不同听者和增强系统的详细性能分析表明，要泛化到未知系统和未知（听障）个体，可能还需要更多的数据。

On the Learning Dynamics of Attention Networks

  • paper_url: http://arxiv.org/abs/2307.13421
  • repo_url: https://github.com/vashisht-rahul/on-the-learning-dynamics-of-attention-networks
  • paper_authors: Rahul Vashisht, Harish G. Ramaswamy
  • for: 本文探讨了三种常见的注意力模型优化方法，即软注意力、硬注意力和隐变量边缘似然（LVML）注意力。这三种范式的目标相同：找到一个「焦点」模型来选择输入中正确的片段，以及一个「分类」模型来处理所选片段并生成目标标签；但它们聚合所选片段的方式不同，从而导致不同的训练动态和最终结果。
  • methods: 本文分析了不同的注意力优化方法，包括软注意力损失、硬注意力损失和隐变量边缘似然（LVML）注意力损失，并在简单设置下推导了梯度流下参数轨迹的闭式表达式。
  • results: 在一系列半合成和真实数据集上的实验表明，不同的注意力优化方法在模型性能上表现各异：软注意力损失下，焦点模型在初始化阶段改进迅速，但随后陷入停滞；硬注意力损失则表现出相反的行为。基于这些观察，本文提出了一种简单的混合方法，结合不同损失函数的优点，并在实验中取得了良好的性能。
    Abstract Attention models are typically learned by optimizing one of three standard loss functions that are variously called -- soft attention, hard attention, and latent variable marginal likelihood (LVML) attention. All three paradigms are motivated by the same goal of finding two models -- a `focus' model that `selects' the right \textit{segment} of the input and a `classification' model that processes the selected segment into the target label. However, they differ significantly in the way the selected segments are aggregated, resulting in distinct dynamics and final results. We observe a unique signature of models learned using these paradigms and explain this as a consequence of the evolution of the classification model under gradient descent when the focus model is fixed. We also analyze these paradigms in a simple setting and derive closed-form expressions for the parameter trajectory under gradient flow. With the soft attention loss, the focus model improves quickly at initialization and splutters later on. On the other hand, hard attention loss behaves in the opposite fashion. Based on our observations, we propose a simple hybrid approach that combines the advantages of the different loss functions and demonstrates it on a collection of semi-synthetic and real-world datasets
    摘要 注意力模型通常通过优化三种标准损失函数之一来学习，它们分别被称为软注意力、硬注意力和隐变量边缘似然（LVML）注意力。这三种范式的动机相同：找到两个模型——一个「焦点」模型来「选择」输入中正确的片段，以及一个「分类」模型来处理所选片段并生成目标标签。然而，它们聚合所选片段的方式差异显著，从而导致不同的训练动态和最终结果。我们观察到使用这些范式学习的模型具有独特的特征，并将其解释为在焦点模型固定时分类模型在梯度下降下演化的结果。我们还在简单设置下分析了这些范式，推导出梯度流下参数轨迹的闭式表达式。在软注意力损失下，焦点模型在初始化阶段改进迅速，但随后陷入停滞；硬注意力损失则表现出相反的行为。基于这些观察，我们提出了一种简单的混合方法，结合不同损失函数的优点，并在一系列半合成和真实数据集上进行了验证。
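
The three training paradigms differ only in how the focus model's weights enter the loss. The sketch below writes down simplified versions of all three for a toy segment-selection task; the REINFORCE-style estimator for hard attention is one common choice, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

# Toy setup: focus model scores T segments; classifier maps one segment to logits.
T, d, C, B = 4, 8, 3, 16
x = torch.randn(B, T, d)                 # B examples, T candidate segments
y = torch.randint(0, C, (B,))
focus = torch.nn.Linear(d, 1)            # focus model: per-segment score
clf = torch.nn.Linear(d, C)              # classification model

alpha = torch.softmax(focus(x).squeeze(-1), dim=-1)      # (B, T) attention weights

# Soft attention: classify the attention-weighted average of segments.
loss_soft = F.cross_entropy(clf((alpha.unsqueeze(-1) * x).sum(1)), y)

# LVML: marginalise the class likelihood over which segment is selected.
log_py = F.log_softmax(clf(x), dim=-1).gather(
    2, y.view(B, 1, 1).expand(B, T, 1)).squeeze(-1)                 # (B, T)
loss_lvml = -torch.logsumexp(alpha.log() + log_py, dim=-1).mean()

# Hard attention: sample one segment; score-function (REINFORCE) estimator.
idx = torch.distributions.Categorical(alpha).sample()               # (B,)
sel = x[torch.arange(B), idx]                                       # (B, d)
nll = F.cross_entropy(clf(sel), y, reduction="none")
loss_hard = (nll.detach() * alpha[torch.arange(B), idx].log() + nll).mean()

print(loss_soft.item(), loss_lvml.item(), loss_hard.item())
```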

Co-Design of Out-of-Distribution Detectors for Autonomous Emergency Braking Systems

  • paper_url: http://arxiv.org/abs/2307.13419
  • repo_url: None
  • paper_authors: Michael Yuhas, Arvind Easwaran
  • for: The paper aims to improve the safety of autonomous vehicles (AVs) by co-designing an out-of-distribution (OOD) detector and a learning-enabled component (LEC) to detect and mitigate potential failures in the LEC.
  • methods: The paper uses a risk model to analyze the impact of design parameters on both the OOD detector and the LEC, and co-designs the two components to minimize the risk of failure.
  • results: The paper demonstrates a 42.3% risk reduction in the system while maintaining equivalent resource utilization, indicating the effectiveness of the co-design methodology in improving the safety of AVs.
    Abstract Learning enabled components (LECs), while critical for decision making in autonomous vehicles (AVs), are likely to make incorrect decisions when presented with samples outside of their training distributions. Out-of-distribution (OOD) detectors have been proposed to detect such samples, thereby acting as a safety monitor, however, both OOD detectors and LECs require heavy utilization of embedded hardware typically found in AVs. For both components, there is a tradeoff between non-functional and functional performance, and both impact a vehicle's safety. For instance, giving an OOD detector a longer response time can increase its accuracy at the expense of the LEC. We consider an LEC with binary output like an autonomous emergency braking system (AEBS) and use risk, the combination of severity and occurrence of a failure, to model the effect of both components' design parameters on each other's functional and non-functional performance, as well as their impact on system safety. We formulate a co-design methodology that uses this risk model to find the design parameters for an OOD detector and LEC that decrease risk below that of the baseline system and demonstrate it on a vision based AEBS. Using our methodology, we achieve a 42.3% risk reduction while maintaining equivalent resource utilization.
    摘要 学习使能组件（LEC）对自动驾驶汽车（AV）的决策至关重要，但在遇到训练分布之外的样本时很可能做出错误决策。分布外（OOD）检测器被提出来检测此类样本，从而充当安全监视器；然而，OOD 检测器和 LEC 都需要大量占用 AV 中常见的嵌入式硬件资源。对这两个组件而言，非功能性能与功能性能之间存在权衡，且二者都会影响车辆安全。例如，给 OOD 检测器更长的响应时间可以提高其准确率，但会挤占 LEC 的资源。我们考虑一个具有二值输出的 LEC（如自动紧急制动系统，AEBS），并用风险（故障严重性与发生概率的组合）来建模两个组件的设计参数对彼此功能与非功能性能以及系统安全的影响。我们提出了一种基于该风险模型的协同设计方法，用于寻找使风险低于基线系统的 OOD 检测器和 LEC 设计参数，并在基于视觉的 AEBS 上进行了验证。使用该方法，我们在保持同等资源利用率的情况下实现了42.3%的风险降低。
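
A toy version of the co-design search is shown below: risk is modelled as occurrence times severity summed over failure modes, and a small grid of detector/LEC design points is swept. All numbers and the specific risk decomposition are invented for illustration; the paper's risk model is richer.

```python
import itertools

def risk(p_miss, sev_miss, p_fa, sev_fa):
    """Risk = occurrence x severity, summed over failure modes (a common
    formulation; the paper's exact risk model may differ)."""
    return p_miss * sev_miss + p_fa * sev_fa

# Hypothetical design space: a slower OOD detector misses fewer OOD samples
# but leaves the AEBS less time to brake (higher severity when it does miss).
detector_opts = {10: 0.20, 30: 0.10, 50: 0.05}      # response time (ms) -> P(miss)
lec_opts = {"low-power": 0.08, "high-power": 0.03}  # LEC mode -> P(LEC error)

best = min(
    (risk(p_miss=p_m * p_e,              # unsafe only if detector AND LEC fail
          sev_miss=1.0 + 0.01 * rt,      # later detection -> worse outcome
          p_fa=0.02, sev_fa=0.1),        # false alarms: frequent but benign
     rt, mode)
    for (rt, p_m), (mode, p_e) in itertools.product(detector_opts.items(),
                                                    lec_opts.items()))
print(f"min risk {best[0]:.4f} at response time {best[1]} ms, LEC '{best[2]}'")
```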

Communication-Efficient Orchestrations for URLLC Service via Hierarchical Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.13415
  • repo_url: None
  • paper_authors: Wei Shi, Milad Ganjalizadeh, Hossein Shokri Ghadikolaei, Marina Petrova
  • for: 本研究旨在提升5G中超可靠低延迟通信（URLLC）服务的可靠性和响应速度。
  • methods: 采用多智能体分层强化学习（HRL）框架，实现具有不同控制环时间尺度的多级策略：控制环较快的智能体部署在靠近基站处，控制环较慢的智能体则位于边缘或更靠近核心网处，为低层动作提供高层指导。
  • results: 在一个来自已有研究的用例中，使用 HRL 框架优化工业设备的最大重传次数和发射功率；相比单智能体 RL 基线方法取得了更好的性能，同时显著降低了信令开销和时延。
    Abstract Ultra-reliable low latency communications (URLLC) service is envisioned to enable use cases with strict reliability and latency requirements in 5G. One approach for enabling URLLC services is to leverage Reinforcement Learning (RL) to efficiently allocate wireless resources. However, with conventional RL methods, the decision variables (though being deployed at various network layers) are typically optimized in the same control loop, leading to significant practical limitations on the control loop's delay as well as excessive signaling and energy consumption. In this paper, we propose a multi-agent Hierarchical RL (HRL) framework that enables the implementation of multi-level policies with different control loop timescales. Agents with faster control loops are deployed closer to the base station, while the ones with slower control loops are at the edge or closer to the core network providing high-level guidelines for low-level actions. On a use case from the prior art, with our HRL framework, we optimized the maximum number of retransmissions and transmission power of industrial devices. Our extensive simulation results on the factory automation scenario show that the HRL framework achieves better performance as the baseline single-agent RL method, with significantly less overhead of signal transmissions and delay compared to the one-agent RL methods.
    摘要 超可靠低延迟通信（URLLC）服务被认为是5G中支撑严格可靠性与时延要求用例的关键。实现 URLLC 服务的一种途径是利用强化学习（RL）高效分配无线资源。然而，在传统 RL 方法中，决策变量（尽管部署于不同网络层）通常在同一控制环中优化，这对控制环时延带来显著的实际限制，并造成过多的信令和能耗开销。本文提出一种多智能体分层强化学习（HRL）框架，可实现具有不同控制环时间尺度的多级策略：控制环较快的智能体部署在靠近基站处，控制环较慢的智能体则位于边缘或更靠近核心网处，为低层动作提供高层指导。在一个来自已有研究的用例中，我们用所提 HRL 框架优化了工业设备的最大重传次数和发射功率。在工厂自动化场景上的大量仿真结果表明，与单智能体 RL 基线方法相比，HRL 框架取得了更好的性能，同时信令传输开销和时延都显著降低。

Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation

  • paper_url: http://arxiv.org/abs/2307.13412
  • repo_url: None
  • paper_authors: Stylianos I. Venieris, Javier Fernandez-Marques, Nicholas D. Lane
  • for: 本研究旨在提高基于 FPGA 的卷积神经网络（CNN）加速器的性能和能效。
  • methods: 本文提出了一种新的 CNN 推理系统 unzipFPGA，其采用新的硬件架构，包含一个权重生成模块，可在运行时即时生成权重，以缓解带宽受限的问题。此外，文章还提出了一种自动化的硬件感知方法，可针对目标 CNN-设备组合定制权重生成机制，从而取得更好的精度-性能平衡。
  • results: 结果表明，在相同功耗约束下，unzipFPGA 相比高度优化的 GPU 设计可实现2.57倍的性能效率提升，并比一系列最先进的基于 FPGA 的 CNN 加速器达到至多3.94倍的性能密度。
    Abstract The unprecedented accuracy of convolutional neural networks (CNNs) across a broad range of AI tasks has led to their widespread deployment in mobile and embedded settings. In a pursuit for high-performance and energy-efficient inference, significant research effort has been invested in the design of FPGA-based CNN accelerators. In this context, single computation engines constitute a popular approach to support diverse CNN modes without the overhead of fabric reconfiguration. Nevertheless, this flexibility often comes with significantly degraded performance on memory-bound layers and resource underutilisation due to the suboptimal mapping of certain layers on the engine's fixed configuration. In this work, we investigate the implications in terms of CNN engine design for a class of models that introduce a pre-convolution stage to decompress the weights at run time. We refer to these approaches as on-the-fly. This paper presents unzipFPGA, a novel CNN inference system that counteracts the limitations of existing CNN engines. The proposed framework comprises a novel CNN hardware architecture that introduces a weights generator module that enables the on-chip on-the-fly generation of weights, alleviating the negative impact of limited bandwidth on memory-bound layers. We further enhance unzipFPGA with an automated hardware-aware methodology that tailors the weights generation mechanism to the target CNN-device pair, leading to an improved accuracy-performance balance. Finally, we introduce an input selective processing element (PE) design that balances the load between PEs in suboptimally mapped layers. The proposed framework yields hardware designs that achieve an average of 2.57x performance efficiency gain over highly optimised GPU designs for the same power constraints and up to 3.94x higher performance density over a diverse range of state-of-the-art FPGA-based CNN accelerators.
    摘要 卷积神经网络（CNN）在广泛的人工智能任务中取得了前所未有的准确率，使其在移动和嵌入式场景中得到大规模部署。为了实现高性能且高能效的推理，大量研究投入了基于 FPGA 的 CNN 加速器设计。在此背景下，单一计算引擎是一种流行的方案，可在无需重构 FPGA 的情况下支持多种 CNN 模型。然而，这种灵活性往往伴随着代价：受内存带宽限制的层性能显著下降，并且某些层在引擎固定配置上的次优映射会造成资源利用不足。在本工作中，我们研究了一类在运行时通过预卷积阶段解压权重的模型对 CNN 引擎设计的影响，并将这类方法称为“即时生成”（on-the-fly）。本文提出了一种新的 CNN 推理系统 unzipFPGA，以克服现有 CNN 引擎的局限。该框架包含一种新的 CNN 硬件架构，引入权重生成模块，可在芯片上即时生成权重，从而缓解有限带宽对受内存限制层的负面影响。我们进一步为 unzipFPGA 配备了一种自动化的硬件感知方法，针对目标 CNN-设备组合定制权重生成机制，取得更好的精度-性能平衡。最后，我们引入了一种输入选择性处理单元（PE）设计，以平衡次优映射层中各 PE 之间的负载。所提框架产生的硬件设计在相同功耗约束下比高度优化的 GPU 设计平均提升2.57倍的性能效率，并比一系列最先进的基于 FPGA 的 CNN 加速器高出至多3.94倍的性能密度。

The Double-Edged Sword of Big Data and Information Technology for the Disadvantaged: A Cautionary Tale from Open Banking

  • paper_url: http://arxiv.org/abs/2307.13408
  • repo_url: None
  • paper_authors: Savina Dine Kim, Galina Andreeva, Michael Rovatsos
  • for: 本研究以开放银行（Open Banking）为例，利用机器学习（ML）技术和一家英国金融科技贷款机构的数据集，探讨看似中立的数据与强大技术相结合时对公平性的隐含影响。
  • methods: 比较三种 ML 分类器预测金融脆弱性（FV）可能性的表现，并通过聚类识别出表现出不同程度和形式金融脆弱性的群体，以突显特征组合的影响。
  • results: 研究发现，工程化的金融行为特征能够预测被省略的个人信息，尤其是敏感或受保护特征，从而揭示了开放银行数据的隐藏风险。
    Abstract This research article analyses and demonstrates the hidden implications for fairness of seemingly neutral data coupled with powerful technology, such as machine learning (ML), using Open Banking as an example. Open Banking has ignited a revolution in financial services, opening new opportunities for customer acquisition, management, retention, and risk assessment. However, the granularity of transaction data holds potential for harm where unnoticed proxies for sensitive and prohibited characteristics may lead to indirect discrimination. Against this backdrop, we investigate the dimensions of financial vulnerability (FV), a global concern resulting from COVID-19 and rising inflation. Specifically, we look to understand the behavioral elements leading up to FV and its impact on at-risk, disadvantaged groups through the lens of fair interpretation. Using a unique dataset from a UK FinTech lender, we demonstrate the power of fine-grained transaction data while simultaneously cautioning its safe usage. Three ML classifiers are compared in predicting the likelihood of FV, and groups exhibiting different magnitudes and forms of FV are identified via clustering to highlight the effects of feature combination. Our results indicate that engineered features of financial behavior can be predictive of omitted personal information, particularly sensitive or protected characteristics, shedding light on the hidden dangers of Open Banking data. We discuss the implications and conclude fairness via unawareness is ineffective in this new technological environment.
    摘要 本文以开放银行为例，分析并论证了看似中立的数据与机器学习（ML）等强大技术相结合时对公平性的隐含影响。开放银行掀起了金融服务的变革，为客户获取、管理、留存和风险评估开辟了新机遇；然而，交易数据的细粒度也潜藏危害：敏感和受禁特征的未被察觉的代理变量可能导致间接歧视。在此背景下，我们研究了金融脆弱性（FV）的各个维度——这是新冠疫情和通胀上升引发的全球关切。具体而言，我们试图从公平解释的视角理解导致金融脆弱性的行为因素及其对高风险弱势群体的影响。基于一家英国金融科技贷款机构的独特数据集，我们展示了细粒度交易数据的强大能力，同时也对其安全使用提出警示。我们比较了三种 ML 分类器预测金融脆弱性可能性的表现，并通过聚类识别出表现出不同程度和形式金融脆弱性的群体，以突显特征组合的影响。结果表明，工程化的金融行为特征能够预测被省略的个人信息，尤其是敏感或受保护特征，揭示了开放银行数据的隐藏危险。我们讨论了其影响，并得出结论：在这一新技术环境下，“通过无视实现公平”（fairness via unawareness）是无效的。

Counterfactual Explanation via Search in Gaussian Mixture Distributed Latent Space

  • paper_url: http://arxiv.org/abs/2307.13390
  • repo_url: None
  • paper_authors: Xuan Zhao, Klaus Broelemann, Gjergji Kasneci
  • for: 本研究旨在提出一种生成反事实解释（Counterfactual Explanations，CE）的新方法，帮助用户更好地理解 AI 系统的决策过程并改善其结果。
  • methods: 首先将自编码器的潜在空间塑造成高斯混合分布，然后通过在查询样本与目标类中心之间进行线性插值来生成 CE。
  • results: 在多个图像和表格数据集上，与三种最先进的方法相比，我们的方法能够高效地返回更贴近原始数据流形的结果，这对真实的高维机器学习应用至关重要。
    Abstract Counterfactual Explanations (CEs) are an important tool in Algorithmic Recourse for addressing two questions: 1. What are the crucial factors that led to an automated prediction/decision? 2. How can these factors be changed to achieve a more favorable outcome from a user's perspective? Thus, guiding the user's interaction with AI systems by proposing easy-to-understand explanations and easy-to-attain feasible changes is essential for the trustworthy adoption and long-term acceptance of AI systems. In the literature, various methods have been proposed to generate CEs, and different quality measures have been suggested to evaluate these methods. However, the generation of CEs is usually computationally expensive, and the resulting suggestions are unrealistic and thus non-actionable. In this paper, we introduce a new method to generate CEs for a pre-trained binary classifier by first shaping the latent space of an autoencoder to be a mixture of Gaussian distributions. CEs are then generated in latent space by linear interpolation between the query sample and the centroid of the target class. We show that our method maintains the characteristics of the input sample during the counterfactual search. In various experiments, we show that the proposed method is competitive based on different quality measures on image and tabular datasets -- efficiently returns results that are closer to the original data manifold compared to three state-of-the-art methods, which are essential for realistic high-dimensional machine learning applications.
    摘要 反事实解释（CE）是算法补救（Algorithmic Recourse）中的重要工具，用于回答两个问题：1. 哪些关键因素导致了自动预测/决策？2. 从用户角度出发，如何改变这些因素以获得更有利的结果？因此，通过提供易于理解的解释和易于实现的可行改变来引导用户与 AI 系统的交互，对 AI 系统获得信任并被长期接受至关重要。文献中已提出多种生成 CE 的方法，以及多种评估这些方法的质量度量。然而，CE 的生成通常计算代价高昂，且所得建议往往不切实际、难以付诸行动。本文提出一种为预训练二分类器生成 CE 的新方法：首先将自编码器的潜在空间塑造成高斯混合分布，然后在潜在空间中通过查询样本与目标类中心之间的线性插值生成 CE。我们证明该方法在反事实搜索过程中能够保持输入样本的特征。在多个图像和表格数据集上的实验表明，按照不同的质量度量，所提方法具有竞争力——与三种最先进方法相比，能够高效地返回更贴近原始数据流形的结果，这对真实的高维机器学习应用至关重要。
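
The generation step itself is very simple once the latent space is a class-conditional Gaussian mixture. A hedged sketch, with placeholder decoder and classifier standing in for the trained autoencoder and the pre-trained model being explained:

```python
import numpy as np

# Assume a trained autoencoder whose latent space was shaped into a Gaussian
# mixture with one component per class (encode/decode are stand-ins here).
rng = np.random.default_rng(0)
d = 8
class_means = {0: np.zeros(d), 1: np.full(d, 3.0)}   # mixture centroids per class

def counterfactual_path(z_query, target_class, steps=10):
    """Linear interpolation in latent space from the query towards the
    centroid of the target class; decode each point and return the first
    one the classifier assigns to the target class."""
    centroid = class_means[target_class]
    for t in np.linspace(0.0, 1.0, steps + 1):
        z = (1 - t) * z_query + t * centroid
        x = decode(z)                      # stays near the data manifold by design
        if classify(x) == target_class:
            return x, t                    # smallest change that flips the decision
    return decode(centroid), 1.0

decode = lambda z: z                                   # placeholder decoder
classify = lambda x: int(x.mean() > 1.5)               # placeholder classifier
x_cf, t = counterfactual_path(z_query=rng.normal(size=d), target_class=1)
print(f"decision flipped at interpolation fraction t={t:.1f}")
```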

BotHawk: An Approach for Bots Detection in Open Source Software Projects

  • paper_url: http://arxiv.org/abs/2307.13386
  • repo_url: https://github.com/bifenglin/bothawk
  • paper_authors: Fenglin Bi, Zhiwei Zhu, Wei Wang, Xiaoya Xia, Hassan Ali Khan, Peng Pu
  • for: 这个研究旨在调查开源软件项目中的机器人账户,并尝试准确地识别机器人账户。
  • methods: 该研究采用严格的数据采集工作流程，确保所收集的数据准确、可泛化、可扩展且具有时效性。研究人员还提出了名为 BotHawk 的检测模型，可高效检测开源软件项目中的机器人账户。
  • results: 研究人员通过在5个维度上分析17个特征，识别出开源软件项目中的四类机器人账户；并发现粉丝数、仓库数和标签包含了识别账户类型最相关的特征。BotHawk 在检测开源软件项目中的机器人账户方面表现出色，AUC 为0.947，F1 分数为0.89。
    Abstract Social coding platforms have revolutionized collaboration in software development, leading to using software bots for streamlining operations. However, The presence of open-source software (OSS) bots gives rise to problems including impersonation, spamming, bias, and security risks. Identifying bot accounts and behavior is a challenging task in the OSS project. This research aims to investigate bots' behavior in open-source software projects and identify bot accounts with maximum possible accuracy. Our team gathered a dataset of 19,779 accounts that meet standardized criteria to enable future research on bots in open-source projects. We follow a rigorous workflow to ensure that the data we collect is accurate, generalizable, scalable, and up-to-date. We've identified four types of bot accounts in open-source software projects by analyzing their behavior across 17 features in 5 dimensions. Our team created BotHawk, a highly effective model for detecting bots in open-source software projects. It outperforms other models, achieving an AUC of 0.947 and an F1-score of 0.89. BotHawk can detect a wider variety of bots, including CI/CD and scanning bots. Furthermore, we find that the number of followers, number of repositories, and tags contain the most relevant features to identify the account type.
    摘要 社交编程平台彻底改变了软件开发中的协作方式，软件机器人被用来简化运维操作。然而，开源软件（OSS）机器人的存在也带来了冒充、垃圾信息、偏见和安全风险等问题。识别机器人账户及其行为是开源项目中一项具有挑战性的任务。本研究旨在调查开源软件项目中机器人的行为，并尽可能准确地识别机器人账户。我们团队收集了19,779个符合标准化准则的账户数据集，以支持未来对开源项目中机器人的研究，并遵循严格的工作流程以确保所收集数据的准确性、可泛化性、可扩展性和时效性。通过在5个维度上分析17个特征的行为，我们识别出开源软件项目中的四类机器人账户。我们团队构建了 BotHawk，一种在开源软件项目中检测机器人的高效模型：其 AUC 达0.947、F1 分数达0.89，优于其他模型，并能检测包括 CI/CD 和扫描机器人在内的更多种类的机器人。此外，我们发现粉丝数、仓库数和标签包含了识别账户类型最相关的特征。

Scaff-PD: Communication Efficient Fair and Robust Federated Learning

  • paper_url: http://arxiv.org/abs/2307.13381
  • repo_url: None
  • paper_authors: Yaodong Yu, Sai Praneeth Karimireddy, Yi Ma, Michael I. Jordan
  • for: 提升联邦学习中的公平性和鲁棒性，适用于资源受限且异构的环境。
  • methods: 使用加速原始-对偶（APD）算法，利用偏差校正的本地步骤（如 Scaffold 中所用），实现更高的通信效率和更快的收敛速度。
  • results: 在多个基准数据集上的评估表明，Scaff-PD 能在保持有竞争力的准确率的同时提升公平性和鲁棒性。
    Abstract We present Scaff-PD, a fast and communication-efficient algorithm for distributionally robust federated learning. Our approach improves fairness by optimizing a family of distributionally robust objectives tailored to heterogeneous clients. We leverage the special structure of these objectives, and design an accelerated primal dual (APD) algorithm which uses bias corrected local steps (as in Scaffold) to achieve significant gains in communication efficiency and convergence speed. We evaluate Scaff-PD on several benchmark datasets and demonstrate its effectiveness in improving fairness and robustness while maintaining competitive accuracy. Our results suggest that Scaff-PD is a promising approach for federated learning in resource-constrained and heterogeneous settings.
    摘要 我们提出 Scaff-PD，一种快速且通信高效的分布鲁棒联邦学习算法。该方法通过优化一族针对异构客户端定制的分布鲁棒目标来提升公平性。我们利用这些目标的特殊结构，设计了一种加速原始-对偶（APD）算法，采用偏差校正的本地步骤（如 Scaffold 中所用），在通信效率和收敛速度上取得显著提升。我们在多个基准数据集上评估了 Scaff-PD，结果表明其在保持有竞争力的准确率的同时，有效提升了公平性和鲁棒性。这些结果表明，对于资源受限且异构环境中的联邦学习，Scaff-PD 是一种有前景的方法。

Submodular Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.13372
  • repo_url: https://github.com/manish-pra/non-additive-rl
  • paper_authors: Manish Prajapat, Mojmír Mutný, Melanie N. Zeilinger, Andreas Krause
  • for: 本文旨在解决强化学习（RL）中奖励的建模问题：奖励通常被假定为可加的，但在覆盖控制、实验设计和信息路径规划等许多重要应用中，奖励具有收益递减的特点。
  • methods: 提出了一种新的范式——次模强化学习（SubRL），通过次模集合函数来建模更一般的、非可加（依赖历史）的奖励以刻画收益递减。然而总体而言，即使在表格设置下，由此产生的优化问题也难以近似。
  • results: 受经典次模优化中贪心算法成功的启发，提出了一种简单的基于策略梯度的算法 SubPO，通过贪心地最大化边际增益来处理非可加奖励。在对底层马尔可夫决策过程（MDP）的某些假设下，SubPO 可恢复次模老虎机问题的最优常数因子近似；并且即使在大规模状态-动作空间中，也可推导出用于局部优化 SubRL 实例的自然策略梯度方法。本文将 SubPO 应用于生物多样性监测、贝叶斯实验设计、信息路径规划和覆盖最大化等多种任务，结果表明其具有良好的样本效率和可扩展性。
    Abstract In reinforcement learning (RL), rewards of states are typically considered additive, and following the Markov assumption, they are $\textit{independent}$ of states visited previously. In many important applications, such as coverage control, experiment design and informative path planning, rewards naturally have diminishing returns, i.e., their value decreases in light of similar states visited previously. To tackle this, we propose $\textit{submodular RL}$ (SubRL), a paradigm which seeks to optimize more general, non-additive (and history-dependent) rewards modelled via submodular set functions which capture diminishing returns. Unfortunately, in general, even in tabular settings, we show that the resulting optimization problem is hard to approximate. On the other hand, motivated by the success of greedy algorithms in classical submodular optimization, we propose SubPO, a simple policy gradient-based algorithm for SubRL that handles non-additive rewards by greedily maximizing marginal gains. Indeed, under some assumptions on the underlying Markov Decision Process (MDP), SubPO recovers optimal constant factor approximations of submodular bandits. Moreover, we derive a natural policy gradient approach for locally optimizing SubRL instances even in large state- and action- spaces. We showcase the versatility of our approach by applying SubPO to several applications, such as biodiversity monitoring, Bayesian experiment design, informative path planning, and coverage maximization. Our results demonstrate sample efficiency, as well as scalability to high-dimensional state-action spaces.
    摘要 在强化学习（RL）中，状态的奖励通常被视为可加的，且根据马尔可夫假设，与先前访问过的状态无关。然而，在覆盖控制、实验设计和信息路径规划等许多重要应用中，奖励天然具有收益递减的特点：当先前访问过相似状态时，其价值会降低。为此，我们提出了“次模强化学习”（SubRL）这一范式，旨在优化通过次模集合函数建模的更一般的、非可加（依赖历史）的奖励，以刻画收益递减。遗憾的是，总体而言，即使在表格设置下，我们证明了由此产生的优化问题也难以近似。另一方面，受经典次模优化中贪心算法成功的启发，我们提出了 SubPO，一种简单的基于策略梯度的 SubRL 算法，通过贪心地最大化边际增益来处理非可加奖励。事实上，在对底层马尔可夫决策过程（MDP）的某些假设下，SubPO 可恢复次模老虎机问题的最优常数因子近似。此外，我们推导出一种自然策略梯度方法，即使在大规模状态-动作空间中也能局部优化 SubRL 实例。我们将 SubPO 应用于生物多样性监测、贝叶斯实验设计、信息路径规划和覆盖最大化等多种任务，展示了该方法的通用性。结果表明其具有良好的样本效率以及对高维状态-动作空间的可扩展性。
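
The central trick, rewarding marginal gains so that per-step rewards telescope to the submodular objective, fits in a few lines. Below is a toy coverage instance; SubPO would feed these shaped rewards into a standard policy-gradient update.

```python
import numpy as np

def coverage(visited_cells):
    """Submodular set objective: number of distinct grid cells covered."""
    return len(set(visited_cells))

def marginal_gain_rewards(trajectory):
    """Greedy-style reward shaping: at each step the agent is rewarded with
    the *marginal gain* of its new state rather than the raw (non-additive)
    objective, so the per-step rewards sum exactly to F(trajectory)."""
    covered, rewards = set(), []
    for cell in trajectory:
        gain = coverage(covered | {cell}) - coverage(covered)
        rewards.append(gain)            # 1 for a new cell, 0 for a revisit
        covered.add(cell)
    return rewards

traj = [(0, 0), (0, 1), (0, 0), (1, 1), (0, 1)]
r = marginal_gain_rewards(traj)
print(r, "sum:", sum(r), "== F(traj):", coverage(traj))   # [1,1,0,1,0] sum 3 == 3
```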

Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation

  • paper_url: http://arxiv.org/abs/2307.13371
  • repo_url: None
  • paper_authors: Fengxue Zhang, Jialin Song, James Bowden, Alexander Ladd, Yisong Yue, Thomas A. Desautels, Yuxin Chen
  • for: 这篇论文研究高维且非平稳场景下的贝叶斯优化（BO）。
  • methods: 论文提出了名为 BALLET 的框架，可在高维和非平稳场景下进行贝叶斯优化。BALLET 使用两个概率模型：一个粗粒度的高斯过程（GP）用于识别高置信度的兴趣区域（ROI），以及一个局部化的 GP 用于在 ROI 内进行优化。
  • results: 论文从理论上证明 BALLET 能够有效缩小搜索空间，并且比不进行 ROI 过滤的标准 BO 具有更紧的遗憾界；同时通过实验验证了 BALLET 在合成和真实优化任务上的有效性。
    Abstract We study Bayesian optimization (BO) in high-dimensional and non-stationary scenarios. Existing algorithms for such scenarios typically require extensive hyperparameter tuning, which limits their practical effectiveness. We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest (ROI) as a superlevel-set of a nonparametric probabilistic model such as a Gaussian process (GP). Our approach is easy to tune, and is able to focus on local region of the optimization space that can be tackled by existing BO methods. The key idea is to use two probabilistic models: a coarse GP to identify the ROI, and a localized GP for optimization within the ROI. We show theoretically that BALLET can efficiently shrink the search space, and can exhibit a tighter regret bound than standard BO without ROI filtering. We demonstrate empirically the effectiveness of BALLET on both synthetic and real-world optimization tasks.
    摘要 我们研究高维且非平稳场景下的贝叶斯优化（BO）。针对此类场景的现有算法通常需要大量的超参数调优，限制了其实际效果。我们提出了名为 BALLET 的框架，它以非参数概率模型（如高斯过程，GP）的超水平集的形式，自适应地过滤出高置信度的兴趣区域（ROI）。我们的方法易于调参，并能聚焦于优化空间中可由现有 BO 方法处理的局部区域。其核心思想是使用两个概率模型：一个粗粒度的 GP 用于识别 ROI，以及一个局部化的 GP 用于在 ROI 内进行优化。我们从理论上证明，BALLET 能够有效缩小搜索空间，并且比不进行 ROI 过滤的标准 BO 具有更紧的遗憾界。我们通过合成和真实优化任务的实验验证了 BALLET 的有效性。
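
A one-dimensional sketch of the two-model loop is given below using scikit-learn GPs: the coarse GP's UCB/LCB bounds define the superlevel-set ROI, and a shorter-lengthscale local GP proposes the next query inside it. Treating the ROI as a single interval and the specific kernels and beta are simplifications, not the paper's exact construction.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x            # toy objective
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(12, 1)); y = f(X).ravel()

# 1) Coarse global GP identifies a region of interest as a superlevel set:
#    keep points whose upper confidence bound clears the best lower bound.
coarse = GaussianProcessRegressor(kernel=RBF(1.0)).fit(X, y)
grid = np.linspace(-2, 2, 400).reshape(-1, 1)
mu, sd = coarse.predict(grid, return_std=True)
beta = 2.0
roi = grid[(mu + beta * sd) >= (mu - beta * sd).max()]   # high-confidence ROI

# 2) A localized GP (shorter lengthscale) picks the next query inside the ROI
#    (for simplicity the ROI is treated as a single interval here).
in_roi = (X >= roi.min()) & (X <= roi.max())
local = GaussianProcessRegressor(kernel=RBF(0.3)).fit(
    X[in_roi].reshape(-1, 1), y[in_roi.ravel()])
mu_l, sd_l = local.predict(roi, return_std=True)
x_next = roi[np.argmax(mu_l + beta * sd_l)]
print(f"ROI = [{roi.min():.2f}, {roi.max():.2f}], next query x = {x_next[0]:.2f}")
```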

Computational Guarantees for Doubly Entropic Wasserstein Barycenters via Damped Sinkhorn Iterations

  • paper_url: http://arxiv.org/abs/2307.13370
  • repo_url: None
  • paper_authors: Lénaïc Chizat, Tomas Vaškevičius
  • for: Computation of doubly regularized Wasserstein barycenters
  • methods: Damped Sinkhorn iterations followed by exact maximization/minimization steps
  • results: Convergence guarantees for any choice of regularization parameters, and non-asymptotic convergence guarantees for approximating Wasserstein barycenters between discrete point clouds in the free-support/grid-free setting.
    Abstract We study the computation of doubly regularized Wasserstein barycenters, a recently introduced family of entropic barycenters governed by inner and outer regularization strengths. Previous research has demonstrated that various regularization parameter choices unify several notions of entropy-penalized barycenters while also revealing new ones, including a special case of debiased barycenters. In this paper, we propose and analyze an algorithm for computing doubly regularized Wasserstein barycenters. Our procedure builds on damped Sinkhorn iterations followed by exact maximization/minimization steps and guarantees convergence for any choice of regularization parameters. An inexact variant of our algorithm, implementable using approximate Monte Carlo sampling, offers the first non-asymptotic convergence guarantees for approximating Wasserstein barycenters between discrete point clouds in the free-support/grid-free setting.
    摘要 我们研究双重正则化 Wasserstein 重心的计算。这是最近提出的一族由内、外两个正则化强度控制的熵正则化重心。已有研究表明，不同的正则化参数选择不仅统一了多种熵惩罚重心的概念，还揭示了新的变体，包括去偏重心这一特例。本文提出并分析了一种计算双重正则化 Wasserstein 重心的算法。该算法基于阻尼 Sinkhorn 迭代，并辅以精确的最大化/最小化步骤，可对任意正则化参数选择保证收敛。其不精确变体可通过近似蒙特卡洛采样实现，首次为自由支撑/无网格设置下离散点云之间 Wasserstein 重心的近似提供了非渐近收敛保证。
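
A minimal sketch of the damped Sinkhorn inner loop for a single entropic OT problem is shown below; the paper interleaves such damped iterations with exact maximization/minimization steps to compute barycenters, which this toy omits.

```python
import numpy as np

def damped_sinkhorn(a, b, C, eps=0.1, damping=0.5, iters=500):
    """Entropic OT between histograms a, b with cost C, using *damped*
    multiplicative updates: each scaling step is only partially applied
    (damping=1 recovers the standard Sinkhorn iteration)."""
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):
        u = u ** (1 - damping) * (a / (K @ v)) ** damping
        v = v ** (1 - damping) * (b / (K.T @ u)) ** damping
    return u[:, None] * K * v[None, :]     # transport plan

n = 5
a = np.full(n, 1 / n)
b = np.full(n, 1 / n)
x = np.linspace(0, 1, n)
C = (x[:, None] - x[None, :]) ** 2         # squared-distance cost
P = damped_sinkhorn(a, b, C)
print(P.round(3))                          # near-diagonal coupling for matching points
```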

Prot2Text: Multimodal Protein’s Function Generation with GNNs and Transformers

  • paper_url: http://arxiv.org/abs/2307.14367
  • repo_url: None
  • paper_authors: Hadi Abdine, Michail Chatzianastasis, Costas Bouyioukos, Michalis Vazirgiannis
  • for: 这篇论文的目的是提出一个新的方法来预测蛋白质的功能,这个方法使用 Graph Neural Networks(GNNs) 和 Large Language Models(LLMs) 在encoder-decoder架构中结合,以生成蛋白质功能的详细描述。
  • methods: 这篇论文使用的方法是一种 multimodal 方法,结合蛋白质的序列、结构和文本描述,使用 GNNs 和 LLMs 进行融合,实现蛋白质功能的全面表示。
  • results: 这篇论文的实验结果显示,这个新的方法可以实现更高的预测精度,并且可以生成蛋白质功能的详细描述。
    Abstract The complex nature of big biological systems pushed some scientists to classify its understanding under the inconceivable missions. Different leveled challenges complicated this task, one of is the prediction of a protein's function. In recent years, significant progress has been made in this field through the development of various machine learning approaches. However, most existing methods formulate the task as a multi-classification problem, i.e assigning predefined labels to proteins. In this work, we propose a novel approach, \textbf{Prot2Text}, which predicts a protein function's in a free text style, moving beyond the conventional binary or categorical classifications. By combining Graph Neural Networks(GNNs) and Large Language Models(LLMs), in an encoder-decoder framework, our model effectively integrates diverse data types including proteins' sequences, structures, and textual annotations. This multimodal approach allows for a holistic representation of proteins' functions, enabling the generation of detailed and accurate descriptions. To evaluate our model, we extracted a multimodal protein dataset from SwissProt, and demonstrate empirically the effectiveness of Prot2Text. These results highlight the transformative impact of multimodal models, specifically the fusion of GNNs and LLMs, empowering researchers with powerful tools for more accurate prediction of proteins' functions. The code, the models and a demo will be publicly released.
    摘要 大生物系统的复杂性让一些科学家将其理解列入不可思议任务之列。不同的等级挑战困扰了这个任务,其中之一是蛋白质功能预测。在过去几年,我们在这一领域进行了重要的进步,通过开发多种机器学习方法。然而,大多数现有方法将任务定义为多类别问题,即将蛋白质分配预先定义的标签。在这种情况下,我们提出了一种新的方法——Prot2Text,它预测蛋白质功能在自由文本格式下,超越传统的二分或分类化预测。我们通过将图 neural network(GNN)和大型自然语言模型(LLM)组合在encoder-decoder框架中,能够集成多种蛋白质数据类型,包括序列、结构和文本注释。这种多模式方法允许我们对蛋白质功能进行整体表示,使得生成详细和准确的描述。为评估我们的模型,我们从SwissProt中提取了多模式蛋白质数据集,并通过实验证明Prot2Text的效果。这些结果显示了多模式模型的融合,特别是GNN和LLM的融合,为研究人员提供了更加准确的蛋白质功能预测工具。代码、模型和demo将公共发布。

High Dimensional Distributed Gradient Descent with Arbitrary Number of Byzantine Attackers

  • paper_url: http://arxiv.org/abs/2307.13352
  • repo_url: None
  • paper_authors: Puning Zhao, Zhiguo Wan
  • for: 这篇论文研究存在拜占庭故障情形下的鲁棒分布式学习。
  • methods: 该方法的核心是一种直接的高维半验证均值估计方法，适用于高维问题，并可抵御任意数量的拜占庭攻击者。
  • results: 理论分析表明，该方法具有极小极大最优的统计速率；与已有工作相比，其对维度的依赖性得到显著改善。
    Abstract Robust distributed learning with Byzantine failures has attracted extensive research interests in recent years. However, most of existing methods suffer from curse of dimensionality, which is increasingly serious with the growing complexity of modern machine learning models. In this paper, we design a new method that is suitable for high dimensional problems, under arbitrary number of Byzantine attackers. The core of our design is a direct high dimensional semi-verified mean estimation method. Our idea is to identify a subspace first. The components of mean value perpendicular to this subspace can be estimated via gradient vectors uploaded from worker machines, while the components within this subspace are estimated using auxiliary dataset. We then use our new method as the aggregator of distributed learning problems. Our theoretical analysis shows that the new method has minimax optimal statistical rates. In particular, the dependence on dimensionality is significantly improved compared with previous works.
    摘要 近年来，存在拜占庭故障的鲁棒分布式学习引起了广泛的研究兴趣。然而，大多数现有方法都受维度灾难困扰；随着现代机器学习模型复杂度的不断提高，这一问题日益严重。本文设计了一种适用于高维问题、可抵御任意数量拜占庭攻击者的新方法。我们设计的核心是一种直接的高维半验证均值估计方法：其思想是先识别一个子空间，均值在垂直于该子空间方向上的分量通过工作节点上传的梯度向量估计，而子空间内的分量则利用辅助数据集估计。随后，我们将该新方法用作分布式学习问题的聚合器。理论分析表明，新方法具有极小极大最优的统计速率；特别地，与已有工作相比，其对维度的依赖性得到显著改善。

Explainable Disparity Compensation for Efficient Fair Ranking

  • paper_url: http://arxiv.org/abs/2307.14366
  • repo_url: None
  • paper_authors: Abraham Gale, Amélie Marian
  • for: This paper aims to address the issue of disparate outcomes in decision systems, specifically in ranking functions, and proposes data-driven compensatory measures to improve fairness.
  • methods: The proposed measures rely on generating bonus points for members of underrepresented groups to address disparity in the ranking function. Efficient sampling-based algorithms are used to calculate the number of bonus points to minimize disparity.
  • results: The authors validate their algorithms using real-world school admissions and recidivism datasets, and compare their results with those of existing fair ranking algorithms. The results show that their proposed measures can effectively improve fairness in the ranking function.Here’s the full text in Simplified Chinese:
  • for: 这篇论文目标是解决决策系统中的不平等结果问题,具体来说是对排名函数中的不平等进行补偿。
  • methods: 提议的补偿措施基于为受排除群体成员分配加分点,以解决排名函数中的不平等。 authors使用高效的采样算法来计算加分点的数量,以最小化不平等。
  • results: authors使用实际的学校招生和重犯罪数据集来验证他们的算法,并与现有的公平排名算法进行比较。结果表明,提议的补偿措施可以有效地提高排名函数的公平性。
    Abstract Ranking functions that are used in decision systems often produce disparate results for different populations because of bias in the underlying data. Addressing, and compensating for, these disparate outcomes is a critical problem for fair decision-making. Recent compensatory measures have mostly focused on opaque transformations of the ranking functions to satisfy fairness guarantees or on the use of quotas or set-asides to guarantee a minimum number of positive outcomes to members of underrepresented groups. In this paper we propose easily explainable data-driven compensatory measures for ranking functions. Our measures rely on the generation of bonus points given to members of underrepresented groups to address disparity in the ranking function. The bonus points can be set in advance, and can be combined, allowing for considering the intersections of representations and giving better transparency to stakeholders. We propose efficient sampling-based algorithms to calculate the number of bonus points to minimize disparity. We validate our algorithms using real-world school admissions and recidivism datasets, and compare our results with that of existing fair ranking algorithms.
    摘要 决策系统中使用的排名函数常因底层数据中的偏差而对不同人群产生差异化的结果；应对并补偿这些差异化结果是公平决策的关键问题。近期的补偿措施大多侧重于对排名函数进行不透明的变换以满足公平性保证，或使用配额/预留名额来保证代表性不足群体获得最低数量的积极结果。本文为排名函数提出了易于解释的数据驱动补偿措施：通过为代表性不足群体的成员生成加分点来应对排名函数中的不平等。加分点可以预先设定并相互组合，从而能够考虑不同群体表征的交叉情况，并为利益相关者提供更好的透明度。我们提出了高效的基于采样的算法来计算使不平等最小化所需的加分点数量，并使用真实的学校招生和累犯数据集验证了算法，将结果与现有公平排名算法进行了比较。
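
A toy version of the bonus-point mechanism: given scores, group labels, and a top-k selection rule, search for the bonus that equalizes group selection rates. The grid search below is a naive stand-in for the paper's efficient sampling-based algorithms; all data is synthetic.

```python
import numpy as np

def selection_rate(scores, group, bonus, k):
    """Fraction of each group admitted when the top-k are selected after
    adding `bonus` points to the underrepresented group (group == 1)."""
    adj = scores + bonus * (group == 1)
    chosen = np.argsort(-adj)[:k]
    in_top = np.zeros_like(scores, dtype=bool); in_top[chosen] = True
    return in_top[group == 0].mean(), in_top[group == 1].mean()

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(60, 10, 800), rng.normal(52, 10, 200)])
group = np.concatenate([np.zeros(800, int), np.ones(200, int)])
k = 250

# Evaluate candidate bonuses and keep the one that minimises the gap in
# selection rates between the two groups.
candidates = np.linspace(0, 15, 61)
gaps = [abs(np.subtract(*selection_rate(scores, group, b, k))) for b in candidates]
best = candidates[int(np.argmin(gaps))]
print(f"bonus={best:.2f}, rates (majority, minority) = "
      f"{selection_rate(scores, group, best, k)}")
```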

Feature Importance Measurement based on Decision Tree Sampling

  • paper_url: http://arxiv.org/abs/2307.13333
  • repo_url: https://github.com/tsudalab/dt-sampler
  • paper_authors: Chao Huang, Diptesh Das, Koji Tsuda
  • for: 用于提高基于树的模型中特征重要性分析的可解释性和稳定性。
  • methods: 使用基于 SAT 的方法来度量特征重要性；其参数比随机森林更少，具有更高的可解释性，适用于真实问题。
  • results: 在真实问题中，DT-Sampler 能够提供更高的可解释性和稳定性，且参数比随机森林更少。Translation:
  • for: Used to improve the interpretability and stability of feature importance in tree-based models.
  • methods: Uses SAT theory to test feature importance, with fewer parameters and higher interpretability, applicable to real-world problems.
  • results: In practical problems, DT-Sampler can provide higher interpretability and stability, and has fewer parameters than Random Forest.
    Abstract Random forest is effective for prediction tasks but the randomness of tree generation hinders interpretability in feature importance analysis. To address this, we proposed DT-Sampler, a SAT-based method for measuring feature importance in tree-based model. Our method has fewer parameters than random forest and provides higher interpretability and stability for the analysis in real-world problems. An implementation of DT-Sampler is available at https://github.com/tsudalab/DT-sampler.
    摘要 随机森林在预测任务中十分有效，但树生成过程的随机性妨碍了特征重要性分析的可解释性。为解决这一问题，我们提出了 DT-Sampler，一种基于 SAT 的方法，用于度量基于树的模型中的特征重要性。我们的方法参数比随机森林更少，在真实问题的分析中提供了更高的可解释性和稳定性。DT-Sampler 的实现见 https://github.com/tsudalab/DT-sampler。

The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation

  • paper_url: http://arxiv.org/abs/2307.13332
  • repo_url: None
  • paper_authors: Philip Amortila, Nan Jiang, Csaba Szepesvári
  • for: 这篇论文主要研究线性离策略价值函数估计问题，具体而言是研究不同设置下函数近似因子的最优形式。
  • methods: 论文在一系列设置下研究近似因子，包括加权 $L_2$ 范数（权重为离线状态分布）、$L_\infty$ 范数、状态混叠的有无，以及状态空间的完整/部分覆盖。
  • results: 论文给出了上述所有设置下（至多相差常数的）最优渐近近似因子。特别地，对 $L_2(\mu)$ 范数确定了两个依赖于具体实例的因子，对 $L_\infty$ 范数只确定了一个，并证明它们决定了模型误设下离策略评估的困难程度。
    Abstract Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation. Yet, the nature of such \emph{approximation factors} -- especially their optimal form in a given learning problem -- is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain. We study the approximation factor in a broad spectrum of settings, such as with the weighted $L_2$-norm (where the weighting is the offline state distribution), the $L_\infty$ norm, the presence vs. absence of state aliasing, and full vs. partial coverage of the state space. We establish the optimal asymptotic approximation factors (up to constants) for all of these settings. In particular, our bounds identify two instance-dependent factors for the $L_2(\mu)$ norm and only one for the $L_\infty$ norm, which are shown to dictate the hardness of off-policy evaluation under misspecification.
    摘要 众所周知，强化学习（RL）中的理论保证会因函数近似的模型误设误差而产生乘性放大因子。然而，这类“近似因子”的本质——尤其是它们在给定学习问题中的最优形式——仍然知之甚少。本文在线性离策略价值函数估计这一仍有许多开放问题的设置中研究这一问题。我们在一系列设置下研究近似因子，例如加权 $L_2$ 范数（权重为离线状态分布）、$L_\infty$ 范数、状态混叠的有无，以及状态空间的完整/部分覆盖。我们为所有这些设置给出了（至多相差常数的）最优渐近近似因子。特别地，我们的界为 $L_2(\mu)$ 范数确定了两个依赖于具体实例的因子，而 $L_\infty$ 范数只有一个；并证明它们决定了模型误设下离策略评估的困难程度。

Unleash the Power of Context: Enhancing Large-Scale Recommender Systems with Context-Based Prediction Models

  • paper_url: http://arxiv.org/abs/2308.01231
  • repo_url: None
  • paper_authors: Jan Hartman, Assaf Klein, Davorin Kopič, Natalia Silberstein
  • for: 提高大规模商业推荐系统的性能,具有广泛的个性化推荐应用场景。
  • methods: 基于用户和上下文特征的预测模型,不考虑物品特征,可以减少服务成本。
  • results: 实验表明,这种方法可以在线上和离线上的商业指标中带来显著改善,而且对服务成本的影响很小。
    Abstract In this work, we introduce the notion of Context-Based Prediction Models. A Context-Based Prediction Model determines the probability of a user's action (such as a click or a conversion) solely by relying on user and contextual features, without considering any specific features of the item itself. We have identified numerous valuable applications for this modeling approach, including training an auxiliary context-based model to estimate click probability and incorporating its prediction as a feature in CTR prediction models. Our experiments indicate that this enhancement brings significant improvements in offline and online business metrics while having minimal impact on the cost of serving. Overall, our work offers a simple and scalable, yet powerful approach for enhancing the performance of large-scale commercial recommender systems, with broad implications for the field of personalized recommendations.
    摘要 在这项工作中，我们引入了基于上下文的预测模型这一概念。基于上下文的预测模型仅依赖用户特征和上下文特征来确定用户行为（如点击或转化）的概率，而不考虑物品本身的任何特征。我们发现这种建模方式有许多有价值的应用，包括训练一个辅助的基于上下文的模型来估计点击概率，并将其预测作为 CTR 预测模型的一个特征。我们的实验表明，这种增强能够在线上和线下的业务指标上带来显著改善，而对服务成本的影响很小。总体而言，我们的工作为提升大规模商业推荐系统的性能提供了一种简单、可扩展且强大的方法，对个性化推荐领域具有广泛意义。
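
The two-stage recipe is straightforward to sketch. Below, an auxiliary logistic model sees only user/context features, and its predicted probability is appended to the item features of the main CTR model; the data and model choices are synthetic stand-ins. In production one would compute the auxiliary prediction out-of-fold to avoid label leakage.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
ctx = rng.normal(size=(n, 4))           # user + context features only
item = rng.normal(size=(n, 3))          # item features
logit = ctx @ [0.8, -0.5, 0.3, 0.0] + item @ [0.6, -0.4, 0.2] - 1.0
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

# 1) Auxiliary context-based model: predicts click probability from
#    user/context features alone, ignoring the item entirely.
ctx_model = LogisticRegression().fit(ctx, y)
p_ctx = ctx_model.predict_proba(ctx)[:, 1]

# 2) Main CTR model consumes the auxiliary prediction as an extra feature.
X_main = np.column_stack([item, p_ctx])
ctr_model = LogisticRegression().fit(X_main, y)
print("train accuracy:", ctr_model.score(X_main, y))
```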

QuIP: 2-Bit Quantization of Large Language Models With Guarantees

  • paper_url: http://arxiv.org/abs/2307.13304
  • repo_url: https://github.com/jerry-chee/quip
  • paper_authors: Jerry Chee, Yaohui Cai, Volodymyr Kuleshov, Christopher De Sa
  • for: 本研究探讨大语言模型（LLM）的训练后参数量化。
  • methods: 我们提出了带非相干处理的量化方法（QuIP），其基于如下洞见：当权重矩阵和海森矩阵非相干（即权重及需要精确舍入的方向与坐标轴不对齐）时，量化会更有效。QuIP 包括两个步骤：(1) 最小化二次代理目标的自适应舍入过程；(2) 通过乘以随机正交矩阵来保证权重和海森矩阵非相干性的高效前处理与后处理。
  • results: 实验表明，非相干预处理改进了多种现有的量化算法，并首次使仅用每个权重2比特的 LLM 量化方法取得可用的结果。代码见 https://github.com/jerry-chee/QuIP。
    Abstract This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre- and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices. We complement QuIP with the first theoretical analysis for an LLM-scale quantization algorithm, and show that our theory also applies to an existing method, OPTQ. Empirically, we find that our incoherence preprocessing improves several existing quantization algorithms and yields the first LLM quantization methods that produce viable results using only two bits per weight. Our code can be found at https://github.com/jerry-chee/QuIP .
    摘要 本工作研究大语言模型（LLM）的训练后参数量化。我们提出了带非相干处理的量化方法（QuIP），这一新方法基于如下洞见：当权重矩阵和海森矩阵非相干——即权重以及需要精确舍入的方向与坐标轴不对齐——时，量化会从中受益。QuIP 包括两个步骤：(1) 最小化二次代理目标的自适应舍入过程；(2) 通过乘以随机正交矩阵来保证权重和海森矩阵非相干性的高效前处理与后处理。我们还为 LLM 规模的量化算法给出了首个理论分析，并证明该理论同样适用于现有方法 OPTQ。实验表明，我们的非相干预处理改进了多种现有的量化算法，并首次使仅用每个权重2比特的 LLM 量化取得可用的结果。代码见 https://github.com/jerry-chee/QuIP。
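
The incoherence idea can be demonstrated in isolation: rotating a weight matrix by random orthogonal matrices spreads outliers before round-to-nearest quantization, and the rotation is undone afterwards. The sketch below shows only this preprocessing effect; it omits QuIP's adaptive rounding procedure and its Hessian-aware objective, so it is not the paper's full algorithm.

```python
import numpy as np

def random_orthogonal(n, rng):
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))          # Haar-like orthogonal matrix

def quantize_2bit(W):
    """Uniform 2-bit round-to-nearest over the matrix's value range."""
    lo, hi = W.min(), W.max()
    levels = np.round((W - lo) / (hi - lo) * 3)        # 4 levels -> 2 bits
    return lo + levels / 3 * (hi - lo)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)); W[0, 0] = 25.0          # one outlier weight

# Direct quantization suffers: the outlier stretches the quantization grid.
err_direct = np.linalg.norm(W - quantize_2bit(W))

# Incoherence processing: rotate by random orthogonal matrices, quantize,
# rotate back. The rotation spreads the outlier's mass across many entries.
U, V = random_orthogonal(64, rng), random_orthogonal(64, rng)
W_hat = U.T @ quantize_2bit(U @ W @ V.T) @ V
err_incoh = np.linalg.norm(W - W_hat)
print(f"direct: {err_direct:.1f}, with incoherence processing: {err_incoh:.1f}")
```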

Word Sense Disambiguation as a Game of Neurosymbolic Darts

  • paper_url: http://arxiv.org/abs/2307.16663
  • repo_url: None
  • paper_authors: Tiansi Dong, Rafet Sifa
  • for: Improving performance on the Word Sense Disambiguation (WSD) task and breaking through the "glass ceiling" of deep-learning approaches.
  • methods: A novel neurosymbolic approach that embeds senses as configurations of nested balls, enabling simple logical deduction among sense embeddings, and trains a Transformer to map contextualized word embeddings to sense-ball embeddings, like playing a game of darts.
  • results: F1 scores above 90% across the test datasets, with pre-trained n-ball embeddings covering roughly 70% of training data and 75% of test data, suggesting the method can surpass the performance bound of deep-learning approaches.
    Abstract Word Sense Disambiguation (WSD) is one of the hardest tasks in natural language understanding and knowledge engineering. The glass ceiling of 80% F1 score is recently achieved through supervised deep-learning, enriched by a variety of knowledge graphs. Here, we propose a novel neurosymbolic methodology that is able to push the F1 score above 90%. The core of our methodology is a neurosymbolic sense embedding, in terms of a configuration of nested balls in n-dimensional space. The centre point of a ball well-preserves word embedding, which partially fix the locations of balls. Inclusion relations among balls precisely encode symbolic hypernym relations among senses, and enable simple logic deduction among sense embeddings, which cannot be realised before. We trained a Transformer to learn the mapping from a contextualized word embedding to its sense ball embedding, just like playing the game of darts (a game of shooting darts into a dartboard). A series of experiments are conducted by utilizing pre-training n-ball embeddings, which have the coverage of around 70% training data and 75% testing data in the benchmark WSD corpus. The F1 scores in experiments range from 90.1% to 100.0% in all six groups of test data-sets (each group has 4 testing data with different sizes of n-ball embeddings). Our novel neurosymbolic methodology has the potential to break the ceiling of deep-learning approaches for WSD. Limitations and extensions of our current works are listed.
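The nested-ball encoding can be illustrated directly: a sense is a (centre, radius) pair, and hypernymy is ball containment, which supports the simple logical deduction the abstract mentions. The senses and numbers below are made up.

```python
# Toy sketch of nested-ball sense embeddings: containment encodes hypernymy.
import numpy as np

def contains(outer, inner):
    """outer ball contains inner ball  <=>  dist(centres) + r_inner <= r_outer."""
    (c_o, r_o), (c_i, r_i) = outer, inner
    return np.linalg.norm(c_o - c_i) + r_i <= r_o

animal = (np.array([0.0, 0.0]), 5.0)
dog    = (np.array([1.0, 1.0]), 1.0)
poodle = (np.array([1.2, 1.1]), 0.2)

assert contains(animal, dog) and contains(dog, poodle)
# simple logical deduction: containment is transitive, so poodle IS-A animal
assert contains(animal, poodle)
```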

Modify Training Directions in Function Space to Reduce Generalization Error

  • paper_url: http://arxiv.org/abs/2307.13290
  • repo_url: None
  • paper_authors: Yi Yu, Wenlian Lu, Boyu Chen
  • for: Improving the generalization performance of neural network models.
  • methods: A theoretical analysis of a modified natural gradient method in neural network function space, deriving the generalization error of the learned function via eigendecompositions of the neural tangent kernel and Fisher information matrix together with tools from statistics.
  • results: A criterion, based on decomposing the total generalization error over the kernel's eigenspaces, showing that modifying the training direction in function space reduces the total generalization error; numerical examples on synthetic data confirm the improvement, and the framework also explains many existing generalization-enhancing methods.
    Abstract We propose theoretical analyses of a modified natural gradient descent method in the neural network function space based on the eigendecompositions of neural tangent kernel and Fisher information matrix. We firstly present analytical expression for the function learned by this modified natural gradient under the assumptions of Gaussian distribution and infinite width limit. Thus, we explicitly derive the generalization error of the learned neural network function using theoretical methods from eigendecomposition and statistics theory. By decomposing of the total generalization error attributed to different eigenspace of the kernel in function space, we propose a criterion for balancing the errors stemming from training set and the distribution discrepancy between the training set and the true data. Through this approach, we establish that modifying the training direction of the neural network in function space leads to a reduction in the total generalization error. Furthermore, We demonstrate that this theoretical framework is capable to explain many existing results of generalization enhancing methods. These theoretical results are also illustrated by numerical examples on synthetic data.
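The eigenspace decomposition at the heart of the analysis can be illustrated on a toy problem: project the target onto the eigenvectors of a kernel Gram matrix and track how much of its energy each eigenspace carries. An RBF kernel stands in for the neural tangent kernel here, so this mirrors the structure of the argument rather than the paper's derivation.

```python
# Illustrative decomposition of a regression target over kernel eigenspaces.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)

# Gram matrix of a stand-in kernel on the training set
sq = (X - X.T) ** 2
K = np.exp(-sq / (2 * 0.3 ** 2))

evals, evecs = np.linalg.eigh(K)              # eigenvalues in ascending order
coeffs = evecs.T @ y                          # target expressed in the eigenbasis

# Directions with large eigenvalues are learned fast; energy left in the
# small-eigenvalue directions is the hard-to-learn part that a modified
# training direction would target.
order = np.argsort(evals)[::-1]
energy = np.cumsum(coeffs[order] ** 2) / np.sum(coeffs ** 2)
for k in (1, 5, 20, 100):
    print(f"top-{k:3d} eigenspaces capture {energy[k-1]:.3f} of the target energy")
```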

Curvature-based Transformer for Molecular Property Prediction

  • paper_url: http://arxiv.org/abs/2307.13275
  • repo_url: None
  • paper_authors: Yili Chen, Zhengyu Li, Zheng Wan, Hui Yu, Xian Wei
  • for: Improving molecular property prediction in AI-based drug design.
  • methods: Introducing a discretization of Ricci curvature so that graph Transformer models can better extract structural information from molecular graph data; the curvature is added to node features as a positional encoding during attention-score computation, without changing the original architecture.
  • results: Experiments on chemical molecule datasets such as PCQM4M-LST and MoleculeNet, compared against models like Uni-Mol and Graphormer, show state-of-the-art results; the discretized Ricci curvature also captures structure-function relationships while describing the local geometry of molecular graph data.
    Abstract The prediction of molecular properties is one of the most important and challenging tasks in the field of artificial intelligence-based drug design. Among the current mainstream methods, the most commonly used feature representation for training DNN models is based on SMILES and molecular graphs, although these methods are concise and effective, they also limit the ability to capture spatial information. In this work, we propose Curvature-based Transformer to improve the ability of Graph Transformer neural network models to extract structural information on molecular graph data by introducing Discretization of Ricci Curvature. To embed the curvature in the model, we add the curvature information of the graph as positional Encoding to the node features during the attention-score calculation. This method can introduce curvature information from graph data without changing the original network architecture, and it has the potential to be extended to other models. We performed experiments on chemical molecular datasets including PCQM4M-LST, MoleculeNet and compared with models such as Uni-Mol, Graphormer, and the results show that this method can achieve the state-of-the-art results. It is proved that the discretized Ricci curvature also reflects the structural and functional relationship while describing the local geometry of the graph molecular data.
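As a concrete illustration of curvature-as-positional-encoding, the sketch below computes the augmented Forman curvature, one common discretization of Ricci curvature (an assumption here; the abstract does not name the discretization), and appends a per-node curvature channel to the node features before the attention layers.

```python
# Sketch of a discrete Ricci curvature used as a positional encoding channel.
import networkx as nx
import numpy as np

def forman_curvature(G, u, v):
    """Augmented Forman-Ricci curvature of edge (u, v) on an unweighted graph."""
    triangles = len(set(G[u]) & set(G[v]))
    return 4 - G.degree(u) - G.degree(v) + 3 * triangles

G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)])   # a toy "molecule"
rng = np.random.default_rng(0)
node_feats = rng.standard_normal((G.number_of_nodes(), 8))  # placeholder atom features

curv = {n: 0.0 for n in G}
for u, v in G.edges:
    c = forman_curvature(G, u, v)
    curv[u] += c / G.degree(u)               # mean curvature of incident edges
    curv[v] += c / G.degree(v)

# append curvature as an extra feature channel before the attention layers
pe = np.array([curv[n] for n in sorted(G)])[:, None]
node_feats = np.concatenate([node_feats, pe], axis=1)
print(node_feats.shape)                       # (5, 9)
```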

Unbiased Weight Maximization

  • paper_url: http://arxiv.org/abs/2307.13270
  • repo_url: None
  • paper_authors: Stephen Chung
  • for: A biologically plausible method for training artificial neural networks (ANNs): each unit is treated as a stochastic reinforcement learning (RL) agent, so the network is viewed as a team of agents.
  • methods: Units learn via REINFORCE, but because a single global reward is broadcast to all units without regard to individual contributions, structural credit assignment is inefficient, learning is slow, and it scales poorly with network size; Weight Maximization addresses this by letting each hidden unit maximize the norm of its outgoing weights instead of the global reward.
  • results: A theoretical analysis of Weight Maximization and a new variant, Unbiased Weight Maximization, which provides an unbiased learning rule with faster learning and better asymptotic performance; to the authors' knowledge, this is the first unbiased learning rule for a network of Bernoulli-logistic units whose learning speed scales well with the number of units.
    Abstract A biologically plausible method for training an Artificial Neural Network (ANN) involves treating each unit as a stochastic Reinforcement Learning (RL) agent, thereby considering the network as a team of agents. Consequently, all units can learn via REINFORCE, a local learning rule modulated by a global reward signal, which aligns more closely with biologically observed forms of synaptic plasticity. Nevertheless, this learning method is often slow and scales poorly with network size due to inefficient structural credit assignment, since a single reward signal is broadcast to all units without considering individual contributions. Weight Maximization, a proposed solution, replaces a unit's reward signal with the norm of its outgoing weight, thereby allowing each hidden unit to maximize the norm of the outgoing weight instead of the global reward signal. In this research report, we analyze the theoretical properties of Weight Maximization and propose a variant, Unbiased Weight Maximization. This new approach provides an unbiased learning rule that increases learning speed and improves asymptotic performance. Notably, to our knowledge, this is the first learning rule for a network of Bernoulli-logistic units that is unbiased and scales well with the number of network's units in terms of learning speed.
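A rough sketch of the basic Weight Maximization rule on a toy network of Bernoulli-logistic units follows: the output unit does plain REINFORCE on the global reward, and each hidden unit is instead rewarded by the resulting increase in the squared norm of its outgoing weights. The unbiased variant analyzed in the paper adds corrections not shown here; the task and all constants are made up.

```python
# Toy Weight Maximization: hidden units rewarded by outgoing-weight norm growth.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1 = 0.1 * rng.standard_normal((4, 2))   # input -> 4 hidden Bernoulli units
W2 = 0.1 * rng.standard_normal(4)        # hidden -> 1 output Bernoulli unit
lr = 0.1

for step in range(1000):
    x = rng.integers(0, 2, size=2).astype(float)
    p_h = sigmoid(W1 @ x); h = (rng.random(4) < p_h).astype(float)
    p_y = sigmoid(W2 @ h); y = float(rng.random() < p_y)
    r = 1.0 if y == float(x[0] != x[1]) else -1.0   # global reward: XOR task

    old_norms = W2 ** 2
    W2 += lr * r * (y - p_y) * h                    # REINFORCE at the output unit
    r_hidden = W2 ** 2 - old_norms                  # per-unit reward: norm increase
    W1 += lr * (r_hidden * (h - p_h))[:, None] * x[None, :]
```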

Federated K-Means Clustering via Dual Decomposition-based Distributed Optimization

  • paper_url: http://arxiv.org/abs/2307.13267
  • repo_url: None
  • paper_authors: Vassilios Yfantis, Achim Wagner, Martin Ruskowski
  • for: This paper is written for researchers and practitioners interested in distributed optimization for machine learning, particularly in the context of $ K $-means clustering.
  • methods: The paper uses dual decomposition to solve the distributed training of $ K $-means clustering problems. The authors propose a mixed-integer quadratically constrained programming-based formulation of the clustering training problem and evaluate the performance of three optimization algorithms (subgradient method, bundle trust method, and quasi-Newton dual ascent algorithm) on a set of benchmark problems.
  • results: The paper demonstrates the potential of using dual decomposition for distributed training of $ K $-means clustering problems, but notes that the mixed-integer programming-based formulation of the clustering problems suffers from weak integer relaxations. The authors evaluate the performance of three optimization algorithms and show that the proposed approach can potentially enable an efficient solution in the future, both in a central and distributed setting.
    Abstract The use of distributed optimization in machine learning can be motivated either by the resulting preservation of privacy or the increase in computational efficiency. On the one hand, training data might be stored across multiple devices. Training a global model within a network where each node only has access to its confidential data requires the use of distributed algorithms. Even if the data is not confidential, sharing it might be prohibitive due to bandwidth limitations. On the other hand, the ever-increasing amount of available data leads to large-scale machine learning problems. By splitting the training process across multiple nodes its efficiency can be significantly increased. This paper aims to demonstrate how dual decomposition can be applied for distributed training of $ K $-means clustering problems. After an overview of distributed and federated machine learning, the mixed-integer quadratically constrained programming-based formulation of the $ K $-means clustering training problem is presented. The training can be performed in a distributed manner by splitting the data across different nodes and linking these nodes through consensus constraints. Finally, the performance of the subgradient method, the bundle trust method, and the quasi-Newton dual ascent algorithm are evaluated on a set of benchmark problems. While the mixed-integer programming-based formulation of the clustering problems suffers from weak integer relaxations, the presented approach can potentially be used to enable an efficient solution in the future, both in a central and distributed setting.
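The consensus mechanic behind dual decomposition can be sketched with a continuous relaxation: each node runs a dual-adjusted local K-means step on its private data, and Lagrange multipliers enforcing agreement among the local centroids are updated with a subgradient step. The paper's actual formulation is a mixed-integer quadratically constrained program and also evaluates the bundle trust and quasi-Newton dual ascent methods; the toy below shows only the subgradient variant.

```python
# Toy distributed K-means via dual decomposition with subgradient dual updates.
import numpy as np

rng = np.random.default_rng(0)
nodes = [rng.standard_normal((50, 2)) + off for off in ([0, 0], [4, 4], [0, 4])]
K, alpha = 3, 0.5
C = [rng.standard_normal((K, 2)) for _ in nodes]        # local centroids
lam = [np.zeros((K, 2)) for _ in nodes]                 # dual variables

for it in range(50):
    for k, X in enumerate(nodes):
        d = ((X[:, None, :] - C[k][None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)                            # local cluster assignments
        for j in range(K):                              # dual-adjusted centroid update
            pts = X[assign == j]
            if len(pts):
                C[k][j] = pts.mean(0) - lam[k][j] / (2 * len(pts))
    C_bar = sum(C) / len(C)                             # consensus target
    for k in range(len(nodes)):                         # subgradient dual ascent
        lam[k] += alpha * (C[k] - C_bar)

print(np.round(C_bar, 2))
```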

Federated Split Learning with Only Positive Labels for resource-constrained IoT environment

  • paper_url: http://arxiv.org/abs/2307.13266
  • repo_url: None
  • paper_authors: Praveen Joshi, Chandra Thapa, Mohammed Hasanuzzaman, Ted Scully, Haithem Afli
  • for: Improving data privacy for IoT devices and the efficiency of model training in resource-constrained environments.
  • methods: Distributed collaborative machine learning (DCML) via splitfed learning (SFL), extended to clients that hold only positively labeled data: the proposed splitfed learning with positive labels (SFPL) applies a random shuffling function to the smashed data before it reaches the server, and uses local batch normalization for the client-side model portion during inference.
  • results: SFPL outperforms SFL by factors of 51.54 and 32.57 for ResNet-56 and ResNet-32, respectively, on CIFAR-100, and by factors of 9.23 and 8.52 for ResNet-32 and ResNet-8, respectively, on CIFAR-10.
    Abstract Distributed collaborative machine learning (DCML) is a promising method in the Internet of Things (IoT) domain for training deep learning models, as data is distributed across multiple devices. A key advantage of this approach is that it improves data privacy by removing the necessity for the centralized aggregation of raw data but also empowers IoT devices with low computational power. Among various techniques in a DCML framework, federated split learning, known as splitfed learning (SFL), is the most suitable for efficient training and testing when devices have limited computational capabilities. Nevertheless, when resource-constrained IoT devices have only positive labeled data, multiclass classification deep learning models in SFL fail to converge or provide suboptimal results. To overcome these challenges, we propose splitfed learning with positive labels (SFPL). SFPL applies a random shuffling function to the smashed data received from clients before supplying it to the server for model training. Additionally, SFPL incorporates the local batch normalization for the client-side model portion during the inference phase. Our results demonstrate that SFPL outperforms SFL: (i) by factors of 51.54 and 32.57 for ResNet-56 and ResNet-32, respectively, with the CIFAR-100 dataset, and (ii) by factors of 9.23 and 8.52 for ResNet-32 and ResNet-8, respectively, with CIFAR-10 dataset. Overall, this investigation underscores the efficacy of the proposed SFPL framework in DCML.
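The shuffling ingredient is easy to picture at the split point, as in the sketch below: the client's smashed activations and the labels are permuted together by a random shuffling function before the server-side model sees them. Models and shapes are placeholders, and the local batch normalization used at inference is not shown.

```python
# Minimal sketch of the SFPL split point with random shuffling of smashed data.
import torch
import torch.nn as nn

client_model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
server_model = nn.Sequential(nn.Flatten(), nn.Linear(16 * 32 * 32, 10))

x = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

smashed = client_model(x)                       # client-side forward pass
perm = torch.randperm(smashed.size(0))          # random shuffling function
smashed, labels = smashed[perm], labels[perm]   # shuffled jointly, order hidden

logits = server_model(smashed)                  # server continues the forward pass
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                                 # gradients flow back to the client
```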

Structural Credit Assignment with Coordinated Exploration

  • paper_url: http://arxiv.org/abs/2307.13256
  • repo_url: None
  • paper_authors: Stephen Chung
  • for: A biologically plausible method for training artificial neural networks (ANNs): each unit is treated as a stochastic reinforcement learning (RL) agent, so the network is viewed as a team of agents.
  • methods: Units learn via the REINFORCE local learning rule modulated by a global reward signal, which matches biologically observed synaptic plasticity but is slow and scales poorly because (i) all units explore the network independently and (ii) a single reward evaluates every unit's actions; this work pursues coordinated exploration, using Boltzmann machines or a recurrent network.
  • results: The negative phase usually required to train Boltzmann machines can be removed, yielding learning rules similar to reward-modulated Hebbian learning; experiments show coordinated exploration trains networks of stochastic, discrete REINFORCE units markedly faster than independent exploration, even surpassing straight-through estimator (STE) backpropagation.
    Abstract A biologically plausible method for training an Artificial Neural Network (ANN) involves treating each unit as a stochastic Reinforcement Learning (RL) agent, thereby considering the network as a team of agents. Consequently, all units can learn via REINFORCE, a local learning rule modulated by a global reward signal, which aligns more closely with biologically observed forms of synaptic plasticity. However, this learning method tends to be slow and does not scale well with the size of the network. This inefficiency arises from two factors impeding effective structural credit assignment: (i) all units independently explore the network, and (ii) a single reward is used to evaluate the actions of all units. Accordingly, methods aimed at improving structural credit assignment can generally be classified into two categories. The first category includes algorithms that enable coordinated exploration among units, such as MAP propagation. The second category encompasses algorithms that compute a more specific reward signal for each unit within the network, like Weight Maximization and its variants. In this research report, our focus is on the first category. We propose the use of Boltzmann machines or a recurrent network for coordinated exploration. We show that the negative phase, which is typically necessary to train Boltzmann machines, can be removed. The resulting learning rules are similar to the reward-modulated Hebbian learning rule. Experimental results demonstrate that coordinated exploration significantly exceeds independent exploration in training speed for multiple stochastic and discrete units based on REINFORCE, even surpassing straight-through estimator (STE) backpropagation.

RoSAS: Deep Semi-Supervised Anomaly Detection with Contamination-Resilient Continuous Supervision

  • paper_url: http://arxiv.org/abs/2307.13239
  • repo_url: https://github.com/xuhongzuo/rosas
  • paper_authors: Hongzuo Xu, Yijie Wang, Guansong Pang, Songlei Jian, Ning Liu, Yongjun Wang
  • for: A new semi-supervised anomaly detection method that improves detection performance.
  • methods: A mass interpolation method diffuses the abnormality of labeled anomalies to create new samples labeled with continuous abnormal degrees, while combinations of correctly labeled data cover contaminated regions; a feature-learning-based objective further regularizes the network for robustness to anomaly contamination.
  • results: On 11 real-world datasets the approach outperforms state-of-the-art competitors by 20%-30% in AUC-PR, and it remains more robust across different anomaly contamination levels and numbers of labeled anomalies.
    Abstract Semi-supervised anomaly detection methods leverage a few anomaly examples to yield drastically improved performance compared to unsupervised models. However, they still suffer from two limitations: 1) unlabeled anomalies (i.e., anomaly contamination) may mislead the learning process when all the unlabeled data are employed as inliers for model training; 2) only discrete supervision information (such as binary or ordinal data labels) is exploited, which leads to suboptimal learning of anomaly scores that essentially take on a continuous distribution. Therefore, this paper proposes a novel semi-supervised anomaly detection method, which devises \textit{contamination-resilient continuous supervisory signals}. Specifically, we propose a mass interpolation method to diffuse the abnormality of labeled anomalies, thereby creating new data samples labeled with continuous abnormal degrees. Meanwhile, the contaminated area can be covered by new data samples generated via combinations of data with correct labels. A feature learning-based objective is added to serve as an optimization constraint to regularize the network and further enhance the robustness w.r.t. anomaly contamination. Extensive experiments on 11 real-world datasets show that our approach significantly outperforms state-of-the-art competitors by 20%-30% in AUC-PR and obtains more robust and superior performance in settings with different anomaly contamination levels and varying numbers of labeled anomalies. The source code is available at https://github.com/xuhongzuo/rosas/.
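One plausible reading of the mass-interpolation step is sketched below: mixing a labeled anomaly with another sample produces new points whose interpolation weights serve as continuous abnormal degrees, i.e. the contamination-resilient continuous supervision signal. Features and the pairing strategy are made up for illustration.

```python
# Toy mass interpolation: continuous abnormal degrees from mixing coefficients.
import numpy as np

rng = np.random.default_rng(0)

def mass_interpolate(x_anom, x_norm, n=16):
    """Create n samples with continuous abnormal degrees in (0, 1)."""
    alphas = rng.uniform(0.0, 1.0, size=n)
    X_new = alphas[:, None] * x_anom + (1 - alphas[:, None]) * x_norm
    return X_new, alphas        # alphas serve as continuous supervision targets

x_anom = rng.standard_normal(8) + 5.0    # a labeled anomaly (toy feature vector)
x_norm = rng.standard_normal(8)          # an unlabeled (presumed normal) sample
X_new, y_new = mass_interpolate(x_anom, x_norm)
print(X_new.shape, y_new.shape)          # (16, 8) plus their abnormal degrees
```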

Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation

  • paper_url: http://arxiv.org/abs/2307.13236
  • repo_url: None
  • paper_authors: Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya Zhang
  • for: The audio-visual segmentation (AVS) task: segmenting the sounding objects in video frames using audio cues.
  • methods: A novel AUdio-aware query-enhanced TRansformer (AuTR) that deeply fuses and aggregates audio-visual features through a multimodal transformer architecture, with an audio-aware query-enhanced decoder that focuses segmentation on the pinpointed sounding objects while disregarding silent yet salient ones.
  • results: Higher performance than previous methods and better generalization in multi-sound and open-set scenarios.
    Abstract The goal of the audio-visual segmentation (AVS) task is to segment the sounding objects in the video frames using audio cues. However, current fusion-based methods have the performance limitations due to the small receptive field of convolution and inadequate fusion of audio-visual features. To overcome these issues, we propose a novel \textbf{Au}dio-aware query-enhanced \textbf{TR}ansformer (AuTR) to tackle the task. Unlike existing methods, our approach introduces a multimodal transformer architecture that enables deep fusion and aggregation of audio-visual features. Furthermore, we devise an audio-aware query-enhanced transformer decoder that explicitly helps the model focus on the segmentation of the pinpointed sounding objects based on audio signals, while disregarding silent yet salient objects. Experimental results show that our method outperforms previous methods and demonstrates better generalization ability in multi-sound and open-set scenarios.
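Schematically, the audio-aware query enhancement can be sketched as conditioning learnable object queries on an audio embedding before they cross-attend to visual tokens. The dimensions and the additive fusion below are illustrative assumptions, not the paper's exact architecture.

```python
# Schematic sketch of audio-conditioned queries in a transformer decoder.
import torch
import torch.nn as nn

d, n_queries = 256, 8
queries = nn.Parameter(torch.randn(n_queries, d))         # learnable object queries
audio_proj = nn.Linear(128, d)                            # audio feature -> model dim
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=d, nhead=8, batch_first=True), num_layers=2)

audio_feat = torch.randn(2, 128)                          # batch of audio embeddings
visual_feats = torch.randn(2, 196, d)                     # 14x14 visual tokens

# audio-aware query enhancement: bias every query toward the sounding object
q = queries.unsqueeze(0) + audio_proj(audio_feat).unsqueeze(1)   # (B, n_queries, d)
out = decoder(tgt=q, memory=visual_feats)                 # cross-attend to vision
masks = torch.einsum("bqd,bnd->bqn", out, visual_feats)   # per-query mask logits
print(masks.shape)                                        # (2, 8, 196)
```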

Spectral-DP: Differentially Private Deep Learning through Spectral Perturbation and Filtering

  • paper_url: http://arxiv.org/abs/2307.13231
  • repo_url: None
  • paper_authors: Ce Feng, Nuo Xu, Wujie Wen, Parv Venkitasubramaniam, Caiwen Ding
  • For: This paper proposes a new approach to differentially private deep learning called Spectral-DP, which improves upon existing methods by achieving a desired privacy guarantee with a lower noise scale and thus better utility.
  • Methods: The paper combines gradient perturbation in the spectral domain with spectral filtering to achieve differential privacy, and develops methods for both convolutional and fully connected layers; for fully connected layers, a block-circulant based spatial restructuring is combined with Spectral-DP.
  • Results: Comprehensive experiments show that Spectral-DP has uniformly better utility than state-of-the-art DP-SGD based approaches, both when training from scratch and in transfer learning settings.
    Abstract Differential privacy is a widely accepted measure of privacy in the context of deep learning algorithms, and achieving it relies on a noisy training approach known as differentially private stochastic gradient descent (DP-SGD). DP-SGD requires direct noise addition to every gradient in a dense neural network, the privacy is achieved at a significant utility cost. In this work, we present Spectral-DP, a new differentially private learning approach which combines gradient perturbation in the spectral domain with spectral filtering to achieve a desired privacy guarantee with a lower noise scale and thus better utility. We develop differentially private deep learning methods based on Spectral-DP for architectures that contain both convolution and fully connected layers. In particular, for fully connected layers, we combine a block-circulant based spatial restructuring with Spectral-DP to achieve better utility. Through comprehensive experiments, we study and provide guidelines to implement Spectral-DP deep learning on benchmark datasets. In comparison with state-of-the-art DP-SGD based approaches, Spectral-DP is shown to have uniformly better utility performance in both training from scratch and transfer learning settings.
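A simplified sketch of the spectral perturb-and-filter mechanic on a single gradient tensor: transform to the Fourier domain, add Gaussian noise there, and keep only a fixed fraction of the coefficients so that less noise survives the inverse transform. Gradient clipping and the noise calibration required for a formal (epsilon, delta) guarantee are omitted, so this shows the mechanic, not a private algorithm.

```python
# Simplified spectral perturbation + filtering of one gradient tensor.
import torch

def spectral_dp_grad(grad, sigma=1.0, keep_ratio=0.5):
    spec = torch.fft.fft(grad.flatten())                      # to the spectral domain
    noise = torch.complex(sigma * torch.randn_like(spec.real),
                          sigma * torch.randn_like(spec.real))
    spec = spec + noise                                       # spectral perturbation
    mask = torch.zeros_like(spec.real)
    mask[: int(keep_ratio * spec.numel())] = 1.0              # fixed spectral filter
    return torch.fft.ifft(spec * mask).real.reshape(grad.shape)

g = torch.randn(64, 32)                    # a layer's gradient
g_priv = spectral_dp_grad(g)
print(g_priv.shape)                        # torch.Size([64, 32])
```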

A Primer on the Data Cleaning Pipeline

  • paper_url: http://arxiv.org/abs/2307.13219
  • repo_url: None
  • paper_authors: Rebecca C. Steorts
  • for: Introducing the science of the "data cleaning pipeline", which produces "cleaned data" for downstream tasks, predictive analyses, or statistical analyses.
  • methods: A review of the four stages of the data cleaning pipeline, introducing technical terminology and commonly used methods.
  • results: A survey of commonly used data cleaning methods and techniques and their effectiveness in practical applications.
    Abstract The availability of both structured and unstructured databases, such as electronic health data, social media data, patent data, and surveys that are often updated in real time, among others, has grown rapidly over the past decade. With this expansion, the statistical and methodological questions around data integration, or rather merging multiple data sources, has also grown. Specifically, the science of the ``data cleaning pipeline'' contains four stages that allow an analyst to perform downstream tasks, predictive analyses, or statistical analyses on ``cleaned data.'' This article provides a review of this emerging field, introducing technical terminology and commonly used methods.

FedMEKT: Distillation-based Embedding Knowledge Transfer for Multimodal Federated Learning

  • paper_url: http://arxiv.org/abs/2307.13214
  • repo_url: None
  • paper_authors: Huy Q. Le, Minh N. H. Nguyen, Chu Myaet Thwal, Yu Qiao, Chaoning Zhang, Choong Seon Hong
  • for: A multimodal federated learning framework in which multiple clients collaboratively train a generalized global model without sharing private data.
  • methods: A semi-supervised learning approach that leverages representations from different modalities, with a distillation-based multimodal embedding knowledge transfer mechanism (FedMEKT) that lets the server and clients exchange the joint knowledge of their learning models extracted from a small multimodal proxy dataset; it comprises local multimodal autoencoder learning, generalized multimodal autoencoder construction, and generalized classifier learning.
  • results: Extensive experiments on three multimodal human activity recognition datasets show that FedMEKT achieves superior global encoder performance on linear evaluation while preserving user privacy for personal data and model parameters and demanding less communication cost than other baselines.
    Abstract Federated learning (FL) enables a decentralized machine learning paradigm for multiple clients to collaboratively train a generalized global model without sharing their private data. Most existing works simply propose typical FL systems for single-modal data, thus limiting its potential on exploiting valuable multimodal data for future personalized applications. Furthermore, the majority of FL approaches still rely on the labeled data at the client side, which is limited in real-world applications due to the inability of self-annotation from users. In light of these limitations, we propose a novel multimodal FL framework that employs a semi-supervised learning approach to leverage the representations from different modalities. Bringing this concept into a system, we develop a distillation-based multimodal embedding knowledge transfer mechanism, namely FedMEKT, which allows the server and clients to exchange the joint knowledge of their learning models extracted from a small multimodal proxy dataset. Our FedMEKT iteratively updates the generalized global encoders with the joint embedding knowledge from the participating clients. Thereby, to address the modality discrepancy and labeled data constraint in existing FL systems, our proposed FedMEKT comprises local multimodal autoencoder learning, generalized multimodal autoencoder construction, and generalized classifier learning. Through extensive experiments on three multimodal human activity recognition datasets, we demonstrate that FedMEKT achieves superior global encoder performance on linear evaluation and guarantees user privacy for personal data and model parameters while demanding less communication cost than other baselines.

Transferability of Graph Neural Networks using Graphon and Sampling Theories

  • paper_url: http://arxiv.org/abs/2307.13206
  • repo_url: None
  • paper_authors: A. Martina Neuman, Jason J. Bramburger
  • for: Applying graphons to improve the transferability of graph neural networks (GNNs).
  • methods: An explicit two-layer graphon neural network (WNN) architecture, proven to approximate bandlimited signals within a specified error tolerance using a minimal number of network weights.
  • results: The WNN result establishes the transferability of an explicit two-layer GNN over all sufficiently large graphs in a sequence converging to a graphon, so models retain performance across graphs of different sizes without extensive retraining.
    Abstract Graph neural networks (GNNs) have become powerful tools for processing graph-based information in various domains. A desirable property of GNNs is transferability, where a trained network can swap in information from a different graph without retraining and retain its accuracy. A recent method of capturing transferability of GNNs is through the use of graphons, which are symmetric, measurable functions representing the limit of large dense graphs. In this work, we contribute to the application of graphons to GNNs by presenting an explicit two-layer graphon neural network (WNN) architecture. We prove its ability to approximate bandlimited signals within a specified error tolerance using a minimal number of network weights. We then leverage this result, to establish the transferability of an explicit two-layer GNN over all sufficiently large graphs in a sequence converging to a graphon. Our work addresses transferability between both deterministic weighted graphs and simple random graphs and overcomes issues related to the curse of dimensionality that arise in other GNN results. The proposed WNN and GNN architectures offer practical solutions for handling graph data of varying sizes while maintaining performance guarantees without extensive retraining.

Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis

  • paper_url: http://arxiv.org/abs/2307.14364
  • repo_url: None
  • paper_authors: Yang Jiao, Kai Yang, Dongjin Song
  • For: Solving the federated distributionally robust optimization (FDRO) problem: finding an optimal decision that minimizes the worst-case cost over the ambiguity set of probability distributions in a distributed environment.
  • Methods: The proposed Asynchronous Single-looP alternatIve gRadient projEction (ASPIRE) algorithm with the itErative Active SEt method (EASE), which leverages the prior distribution through a new uncertainty set, the constrained D-norm uncertainty set.
  • Results: The proposed algorithm is guaranteed to converge, with its iteration complexity analyzed; extensive empirical studies on real-world datasets show fast convergence, robustness to data heterogeneity and malicious attacks, and a tunable tradeoff between robustness and performance.
    Abstract Distributionally Robust Optimization (DRO), which aims to find an optimal decision that minimizes the worst case cost over the ambiguity set of probability distribution, has been widely applied in diverse applications, e.g., network behavior analysis, risk management, etc. However, existing DRO techniques face three key challenges: 1) how to deal with the asynchronous updating in a distributed environment; 2) how to leverage the prior distribution effectively; 3) how to properly adjust the degree of robustness according to different scenarios. To this end, we propose an asynchronous distributed algorithm, named Asynchronous Single-looP alternatIve gRadient projEction (ASPIRE) algorithm with the itErative Active SEt method (EASE) to tackle the federated distributionally robust optimization (FDRO) problem. Furthermore, a new uncertainty set, i.e., constrained D-norm uncertainty set, is developed to effectively leverage the prior distribution and flexibly control the degree of robustness. Finally, our theoretical analysis elucidates that the proposed algorithm is guaranteed to converge and the iteration complexity is also analyzed. Extensive empirical studies on real-world datasets demonstrate that the proposed method can not only achieve fast convergence, and remain robust against data heterogeneity as well as malicious attacks, but also tradeoff robustness with performance.

An Investigation into Glomeruli Detection in Kidney H&E and PAS Images using YOLO

  • paper_url: http://arxiv.org/abs/2307.13199
  • repo_url: https://github.com/AlexeyAB/darknet
  • paper_authors: Kimia Hemmatirad, Morteza Babaie, Jeffrey Hodgin, Liron Pantanowitz, H. R. Tizhoosh
  • for: Assisting pathologists in detecting glomeruli in human kidney images using computerized solutions, specifically an automated tissue structure detection and segmentation method based on the YOLO-v4 object detector.
  • methods: The YOLO-v4 model was trained on whole slide images and fine-tuned on a private dataset from the University of Michigan for glomeruli detection; multiple experiments were conducted using different training data and stains.
  • results: The model achieved good average specificity and sensitivity across all experiments and outperformed existing segmentation methods on the same datasets; however, the design and validation for different stains still depend on the variability of public multi-stain datasets.
    Abstract Context: Analyzing digital pathology images is necessary to draw diagnostic conclusions by investigating tissue patterns and cellular morphology. However, manual evaluation can be time-consuming, expensive, and prone to inter- and intra-observer variability. Objective: To assist pathologists using computerized solutions, automated tissue structure detection and segmentation must be proposed. Furthermore, generating pixel-level object annotations for histopathology images is expensive and time-consuming. As a result, detection models with bounding box labels may be a feasible solution. Design: This paper studies YOLO-v4 (You-Only-Look-Once), a real-time object detector for microscopic images. YOLO uses a single neural network to predict several bounding boxes and class probabilities for objects of interest. YOLO can enhance detection performance by training on whole slide images. YOLO-v4 has been used in this paper for glomeruli detection in human kidney images. Multiple experiments have been designed and conducted based on different training data of two public datasets and a private dataset from the University of Michigan for fine-tuning the model. The model was tested on the private dataset from the University of Michigan, serving as an external validation of two different stains, namely hematoxylin and eosin (H&E) and periodic acid-Schiff (PAS). Results: Average specificity and sensitivity for all experiments, and comparison with existing segmentation methods on the same datasets, are discussed. Conclusions: Automated glomeruli detection in human kidney images is possible using modern AI models. The design and validation for different stains still depends on variability of public multi-stain datasets.

Knowledge-enhanced Neuro-Symbolic AI for Cybersecurity and Privacy

  • paper_url: http://arxiv.org/abs/2308.02031
  • repo_url: None
  • paper_authors: Aritran Piplai, Anantaa Kotal, Seyedreza Mohseni, Manas Gaur, Sudip Mittal, Anupam Joshi
  • for: This paper is written to explore the potential of Neuro-Symbolic Artificial Intelligence (AI) in addressing the challenges of explainability and safety in AI systems, particularly in the domains of cybersecurity and privacy.
  • methods: The paper uses a combination of neural networks and symbolic knowledge graphs to integrate the strengths of both approaches, enabling AI systems to reason, learn, and generalize in a manner understandable to experts.
  • results: The paper demonstrates the potential of Neuro-Symbolic AI to improve the explainability and safety of AI systems in complex environments, specifically in the domains of cybersecurity and privacy.
    Abstract Neuro-Symbolic Artificial Intelligence (AI) is an emerging and quickly advancing field that combines the subsymbolic strengths of (deep) neural networks and explicit, symbolic knowledge contained in knowledge graphs to enhance explainability and safety in AI systems. This approach addresses a key criticism of current generation systems, namely their inability to generate human-understandable explanations for their outcomes and ensure safe behaviors, especially in scenarios with \textit{unknown unknowns} (e.g. cybersecurity, privacy). The integration of neural networks, which excel at exploring complex data spaces, and symbolic knowledge graphs, which represent domain knowledge, allows AI systems to reason, learn, and generalize in a manner understandable to experts. This article describes how applications in cybersecurity and privacy, two most demanding domains in terms of the need for AI to be explainable while being highly accurate in complex environments, can benefit from Neuro-Symbolic AI.

Counterfactual Explanation Policies in RL

  • paper_url: http://arxiv.org/abs/2307.13192
  • repo_url: None
  • paper_authors: Shripad V. Deshmukh, Srivatsan R, Supriti Vijay, Jayakumar Subramanian, Chirag Agarwal
  • for: Analyzing RL policies through counterfactual explanations, i.e., the minimal changes to a policy that would achieve a desired outcome, to better understand what a policy encodes.
  • methods: A new method, Counterpol, that incorporates counterfactuals into supervised learning in RL, regulating the target outcome with a desired return; a theoretical connection to widely used trust-region-based policy optimization is established.
  • results: Experiments on five RL environments with diverse state and action spaces show that Counterpol generates explanations for (un)learning skills while staying close to the original policy.
    Abstract As Reinforcement Learning (RL) agents are increasingly employed in diverse decision-making problems using reward preferences, it becomes important to ensure that policies learned by these frameworks in mapping observations to a probability distribution of the possible actions are explainable. However, there is little to no work in the systematic understanding of these complex policies in a contrastive manner, i.e., what minimal changes to the policy would improve/worsen its performance to a desired level. In this work, we present COUNTERPOL, the first framework to analyze RL policies using counterfactual explanations in the form of minimal changes to the policy that lead to the desired outcome. We do so by incorporating counterfactuals in supervised learning in RL with the target outcome regulated using desired return. We establish a theoretical connection between Counterpol and widely used trust region-based policy optimization methods in RL. Extensive empirical analysis shows the efficacy of COUNTERPOL in generating explanations for (un)learning skills while keeping close to the original policy. Our results on five different RL environments with diverse state and action spaces demonstrate the utility of counterfactual explanations, paving the way for new frontiers in designing and developing counterfactual policies.

Neural Memory Decoding with EEG Data and Representation Learning

  • paper_url: http://arxiv.org/abs/2307.13181
  • repo_url: None
  • paper_authors: Glenn Bruns, Michael Haidar, Federico Rubino
  • for: A method for the neural decoding of memory from EEG data, identifying a recalled concept from an EEG trace.
  • methods: Deep representation learning with a supervised contrastive loss that maps EEG recordings of brain activity into a low-dimensional space.
  • results: Concepts are identified with an average top-1 accuracy of about 78.4% (chance 4%), even for concepts absent from the training set, provided reference EEG data exists for them; the method is also applied to information retrieval, producing a list of links to predicted documents from EEG captured while a user recalls a document's contents.
    Abstract We describe a method for the neural decoding of memory from EEG data. Using this method, a concept being recalled can be identified from an EEG trace with an average top-1 accuracy of about 78.4% (chance 4%). The method employs deep representation learning with supervised contrastive loss to map an EEG recording of brain activity to a low-dimensional space. Because representation learning is used, concepts can be identified even if they do not appear in the training data set. However, reference EEG data must exist for each such concept. We also show an application of the method to the problem of information retrieval. In neural information retrieval, EEG data is captured while a user recalls the contents of a document, and a list of links to predicted documents is produced.
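A minimal sketch of a supervised contrastive loss of the kind the abstract describes: EEG windows are encoded, embeddings are L2-normalized, and samples sharing a concept label are pulled together. The encoder and the shapes are placeholder assumptions.

```python
# Minimal supervised contrastive loss over EEG-window embeddings.
import torch
import torch.nn.functional as F

def sup_con_loss(z, labels, tau=0.1):
    z = F.normalize(z, dim=1)
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool)
    sim = (z @ z.T / tau).masked_fill(eye, -1e9)      # exclude self-comparisons
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = ((labels[:, None] == labels[None, :]) & ~eye).float()
    return -(pos * log_prob).sum(1).div(pos.sum(1).clamp(min=1)).mean()

encoder = torch.nn.Sequential(                        # toy stand-in for the EEG encoder
    torch.nn.Flatten(), torch.nn.Linear(64 * 128, 32))
eeg = torch.randn(16, 64, 128)                        # 16 windows, 64 channels, 128 samples
concepts = torch.randint(0, 4, (16,))                 # recalled-concept labels
loss = sup_con_loss(encoder(eeg), concepts)
loss.backward()
```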

Evaluating the reliability of automatically generated pedestrian and bicycle crash surrogates

  • paper_url: http://arxiv.org/abs/2307.13178
  • repo_url: None
  • paper_authors: Agnimitra Sengupta, S. Ilgin Guler, Vikash V. Gayah, Shannon Warchol
  • For: Assessing the reliability of automatically generated surrogates in predicting confirmed conflicts involving vulnerable road users (VRUs) at signalized intersections.
  • Methods: A video-based event monitoring system collected data on VRU and motor vehicle interactions at 15 signalized intersections in Pennsylvania; advanced data-driven models analyze the surrogate data, including automatically collectable variables such as speeds, movements, and post-encroachment time, as well as manually collected variables like signal states, lighting, and weather conditions.
  • Results: The findings highlight the varying importance of specific surrogates in predicting true conflicts, with some being more informative than others, and can assist transportation agencies in prioritizing infrastructure investments, such as bike lanes and crosswalks, and evaluating their effectiveness.
    Abstract Vulnerable road users (VRUs), such as pedestrians and bicyclists, are at a higher risk of being involved in crashes with motor vehicles, and crashes involving VRUs also are more likely to result in severe injuries or fatalities. Signalized intersections are a major safety concern for VRUs due to their complex and dynamic nature, highlighting the need to understand how these road users interact with motor vehicles and deploy evidence-based countermeasures to improve safety performance. Crashes involving VRUs are relatively infrequent, making it difficult to understand the underlying contributing factors. An alternative is to identify and use conflicts between VRUs and motorized vehicles as a surrogate for safety performance. Automatically detecting these conflicts using a video-based systems is a crucial step in developing smart infrastructure to enhance VRU safety. The Pennsylvania Department of Transportation conducted a study using video-based event monitoring system to assess VRU and motor vehicle interactions at fifteen signalized intersections across Pennsylvania to improve VRU safety performance. This research builds on that study to assess the reliability of automatically generated surrogates in predicting confirmed conflicts using advanced data-driven models. The surrogate data used for analysis include automatically collectable variables such as vehicular and VRU speeds, movements, post-encroachment time, in addition to manually collected variables like signal states, lighting, and weather conditions. The findings highlight the varying importance of specific surrogates in predicting true conflicts, some being more informative than others. The findings can assist transportation agencies to collect the right types of data to help prioritize infrastructure investments, such as bike lanes and crosswalks, and evaluate their effectiveness.

Unsupervised reconstruction of accelerated cardiac cine MRI using Neural Fields

  • paper_url: http://arxiv.org/abs/2307.14363
  • repo_url: None
  • paper_authors: Tabita Catalán, Matías Courdurier, Axel Osses, René Botnar, Francisco Sahli Costabal, Claudia Prieto
  • for: An unsupervised deep learning approach for reconstructing accelerated, undersampled cardiac cine MRI.
  • methods: Implicit neural field representations for cardiac cine MRI (NF-cMRI), evaluated on in-vivo undersampled golden-angle radial multi-coil acquisitions.
  • results: For undersampling factors of 26x and 52x, the method achieves good image quality, with spatial depiction comparable to and temporal depiction improved over a state-of-the-art reconstruction technique.
    Abstract Cardiac cine MRI is the gold standard for cardiac functional assessment, but the inherently slow acquisition process creates the necessity of reconstruction approaches for accelerated undersampled acquisitions. Several regularization approaches that exploit spatial-temporal redundancy have been proposed to reconstruct undersampled cardiac cine MRI. More recently, methods based on supervised deep learning have been also proposed to further accelerate acquisition and reconstruction. However, these techniques rely on usually large dataset for training, which are not always available. In this work, we propose an unsupervised approach based on implicit neural field representations for cardiac cine MRI (so called NF-cMRI). The proposed method was evaluated in in-vivo undersampled golden-angle radial multi-coil acquisitions for undersampling factors of 26x and 52x, achieving good image quality, and comparable spatial and improved temporal depiction than a state-of-the-art reconstruction technique.

Multi-UAV Speed Control with Collision Avoidance and Handover-aware Cell Association: DRL with Action Branching

  • paper_url: http://arxiv.org/abs/2307.13158
  • repo_url: None
  • paper_authors: Zijiang Yan, Wael Jaafar, Bassant Selim, Hina Tabassum
  • for: Improving transportation and communication performance for UAVs, including collision avoidance, connectivity, and handovers.
  • methods: Deep reinforcement learning for joint multi-UAV cell-association decisions and moving-velocity optimization on a 3D aerial highway, formulated as a Markov decision process (MDP) whose UAV states are defined by velocities and communication data rates; a neural architecture with a shared decision module and multiple network branches, one per action dimension in the 2D transportation-communication space.
  • results: Simulation results show an 18.32% improvement over existing benchmarks.
    Abstract This paper presents a deep reinforcement learning solution for optimizing multi-UAV cell-association decisions and their moving velocity on a 3D aerial highway. The objective is to enhance transportation and communication performance, including collision avoidance, connectivity, and handovers. The problem is formulated as a Markov decision process (MDP) with UAVs' states defined by velocities and communication data rates. We propose a neural architecture with a shared decision module and multiple network branches, each dedicated to a specific action dimension in a 2D transportation-communication space. This design efficiently handles the multi-dimensional action space, allowing independence for individual action dimensions. We introduce two models, Branching Dueling Q-Network (BDQ) and Branching Dueling Double Deep Q-Network (Dueling DDQN), to demonstrate the approach. Simulation results show a significant improvement of 18.32% compared to existing benchmarks.
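The branching architecture is concrete enough to sketch: a shared torso feeds one state-value head plus one advantage head per action dimension, combined dueling-style within each branch so that each dimension picks its own discrete action. The sizes and the two branches below (speed, cell association) are illustrative assumptions.

```python
# Sketch of a branching dueling Q-network with one branch per action dimension.
import torch
import torch.nn as nn

class BranchingDuelingQ(nn.Module):
    def __init__(self, state_dim, branch_sizes, hidden=128):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                      # shared V(s)
        self.branches = nn.ModuleList(nn.Linear(hidden, n) for n in branch_sizes)

    def forward(self, s):
        h = self.torso(s)
        v = self.value(h)
        out = []
        for branch in self.branches:
            adv = branch(h)
            # Q_d(s, a) = V(s) + A_d(s, a) - mean_a A_d(s, a), per branch d
            out.append(v + adv - adv.mean(dim=1, keepdim=True))
        return out

# e.g. UAV state = [velocity, data rate, ...]; branches: 5 speed actions, 3 cells
net = BranchingDuelingQ(state_dim=8, branch_sizes=[5, 3])
qs = net(torch.randn(4, 8))
actions = [q.argmax(dim=1) for q in qs]     # one discrete action per dimension
print([q.shape for q in qs], actions[0].shape)
```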

Discovering interpretable elastoplasticity models via the neural polynomial method enabled symbolic regressions

  • paper_url: http://arxiv.org/abs/2307.13149
  • repo_url: None
  • paper_authors: Bahador Bahmani, Hyoung Suk Suh, WaiChing Sun
  • for: A two-step machine-learning approach that returns elastoplasticity models interpretable by human experts, addressing the perceived lack of interpretability of conventional neural network models.
  • methods: Supervised learning first yields a surrogate model in which yield surfaces are expressed through a set of single-variable feature mappings; a symbolic-regression postprocessing step then re-interprets these single-variable neural mappings in mathematical form.
  • results: The divide-and-conquer approach overcomes the scaling issue of symbolic regression algorithms, improves the portability of learned models to PDE solvers written in different programming languages, and enables automated derivation and reasoning about material attributes such as convexity and symmetries; numerical examples and open-source code are provided.
    Abstract Conventional neural network elastoplasticity models are often perceived as lacking interpretability. This paper introduces a two-step machine-learning approach that returns mathematical models interpretable by human experts. In particular, we introduce a surrogate model where yield surfaces are expressed in terms of a set of single-variable feature mappings obtained from supervised learning. A postprocessing step is then used to re-interpret the set of single-variable neural network mapping functions into mathematical form through symbolic regression. This divide-and-conquer approach provides several important advantages. First, it enables us to overcome the scaling issue of symbolic regression algorithms. From a practical perspective, it enhances the portability of learned models for partial differential equation solvers written in different programming languages. Finally, it enables us to have a concrete understanding of the attributes of the materials, such as convexity and symmetries of models, through automated derivations and reasoning. Numerical examples have been provided, along with an open-source code to enable third-party validation.
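The two-step recipe can be miniaturized as below: a learned single-variable mapping is re-interpreted in closed form. A polynomial least-squares fit stands in for a full symbolic-regression search, and the "learned" curve here is synthetic; in the paper it would come from the trained yield-surface surrogate.

```python
# Toy two-step recovery: learned 1-D mapping -> interpretable closed form.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 200)

# Step 1 (stand-in): a single-variable mapping as a trained network would produce it
f_learned = 0.5 * x ** 3 - x + 0.05 * rng.standard_normal(x.size)

# Step 2: recover an interpretable expression for the 1-D mapping
coeffs = np.polyfit(x, f_learned, deg=3)
terms = [f"{c:+.2f}*x^{p}" for p, c in zip(range(3, -1, -1), coeffs) if abs(c) > 0.05]
print("recovered model: f(x) =", " ".join(terms))   # approx +0.50*x^3 -1.00*x^1
```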

Learnable wavelet neural networks for cosmological inference

  • paper_url: http://arxiv.org/abs/2307.14362
  • repo_url: https://github.com/chris-pedersen/learnablewavelets
  • paper_authors: Christian Pedersen, Michael Eickenberg, Shirley Ho
  • for: cosmological inference and marginalisation over astrophysical effects
  • methods: The learnable scattering transform, a convolutional-network-like architecture that uses trainable wavelets as filters.
  • results: Scattering architectures outperform a CNN, significantly so for small training samples, and a lightweight, highly interpretable scattering network is presented.
    Abstract Convolutional neural networks (CNNs) have been shown to both extract more information than the traditional two-point statistics from cosmological fields, and marginalise over astrophysical effects extremely well. However, CNNs require large amounts of training data, which is potentially problematic in the domain of expensive cosmological simulations, and it is difficult to interpret the network. In this work we apply the learnable scattering transform, a kind of convolutional neural network that uses trainable wavelets as filters, to the problem of cosmological inference and marginalisation over astrophysical effects. We present two models based on the scattering transform, one constructed for performance, and one constructed for interpretability, and perform a comparison with a CNN. We find that scattering architectures are able to outperform a CNN, significantly in the case of small training data samples. Additionally we present a lightweight scattering network that is highly interpretable.
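
To make "trainable wavelets as filters" concrete, here is a minimal 1D sketch of our own (the paper works on 2D cosmological fields, and its actual architecture lives in the linked repo): each filter is a Morlet-like wavelet whose width and carrier frequency are learned, followed by a scattering-style modulus.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableMorlet1d(nn.Module):
    def __init__(self, n_filters=8, kernel_size=65):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(n_filters))           # widths
        self.freq = nn.Parameter(torch.linspace(0.2, 2.0, n_filters))   # carriers
        t = torch.arange(kernel_size).float() - kernel_size // 2
        self.register_buffer("t", t)

    def filters(self):
        scale = self.log_scale.exp().unsqueeze(1)               # (F, 1)
        t = self.t.unsqueeze(0)                                 # (1, K)
        envelope = torch.exp(-0.5 * (t / scale) ** 2)           # Gaussian window
        carrier = torch.cos(self.freq.unsqueeze(1) * t)
        return (envelope * carrier).unsqueeze(1)                # (F, 1, K)

    def forward(self, x):                                       # x: (B, 1, T)
        y = F.conv1d(x, self.filters(), padding=self.t.numel() // 2)
        return y.abs()                                          # scattering modulus

layer = LearnableMorlet1d()
print(layer(torch.randn(4, 1, 512)).shape)   # torch.Size([4, 8, 512])
```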

Extending Path-Dependent NJ-ODEs to Noisy Observations and a Dependent Observation Framework

  • paper_url: http://arxiv.org/abs/2307.13147
  • repo_url: https://github.com/floriankrach/pd-njode
  • paper_authors: William Andersson, Jakob Heiss, Florian Krach, Josef Teichmann
  • for: forecasting continuous-time stochastic processes from irregular and incomplete observations
  • methods: the Path-Dependent Neural Jump ODE (PD-NJ-ODE) model, which learns optimal forecasts from irregularly sampled time series of incomplete past observations
  • results: two extensions that lift the independence and noiselessness assumptions, with theoretical guarantees and empirical examples
    Abstract The Path-Dependent Neural Jump ODE (PD-NJ-ODE) is a model for predicting continuous-time stochastic processes with irregular and incomplete observations. In particular, the method learns optimal forecasts given irregularly sampled time series of incomplete past observations. So far the process itself and the coordinate-wise observation times were assumed to be independent and observations were assumed to be noiseless. In this work we discuss two extensions to lift these restrictions and provide theoretical guarantees as well as empirical examples for them.

Does Progress On Object Recognition Benchmarks Improve Real-World Generalization?

  • paper_url: http://arxiv.org/abs/2307.13136
  • repo_url: None
  • paper_authors: Megan Richards, Polina Kirichenko, Diane Bouchacourt, Mark Ibrahim
  • for: testing how well object-recognition models generalize across geographic regions
  • methods: two datasets of household objects from across the globe, with an extensive empirical evaluation of nearly 100 vision models
  • results: standard benchmarks do not accurately measure real-world generalization, and real geographic shifts cause large performance disparities (7-20% accuracy gaps between regions); scaling alone does not guarantee real-world consistency, while early experiments show that simply retraining the last layer on more representative data reduces geographic disparity
    Abstract For more than a decade, researchers have measured progress in object recognition on ImageNet-based generalization benchmarks such as ImageNet-A, -C, and -R. Recent advances in foundation models, trained on orders of magnitude more data, have begun to saturate these standard benchmarks, but remain brittle in practice. This suggests standard benchmarks, which tend to focus on predefined or synthetic changes, may not be sufficient for measuring real world generalization. Consequently, we propose studying generalization across geography as a more realistic measure of progress using two datasets of objects from households across the globe. We conduct an extensive empirical evaluation of progress across nearly 100 vision models up to most recent foundation models. We first identify a progress gap between standard benchmarks and real-world, geographical shifts: progress on ImageNet results in up to 2.5x more progress on standard generalization benchmarks than real-world distribution shifts. Second, we study model generalization across geographies by measuring the disparities in performance across regions, a more fine-grained measure of real world generalization. We observe all models have large geographic disparities, even foundation CLIP models, with differences of 7-20% in accuracy between regions. Counter to modern intuition, we discover progress on standard benchmarks fails to improve geographic disparities and often exacerbates them: geographic disparities between the least performant models and today's best models have more than tripled. Our results suggest scaling alone is insufficient for consistent robustness to real-world distribution shifts. Finally, we highlight in early experiments how simple last layer retraining on more representative, curated data can complement scaling as a promising direction of future work, reducing geographic disparity on both benchmarks by over two-thirds.
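
The "simple last layer retraining" direction the authors highlight is easy to sketch. This is our illustration under assumed placeholders -- the backbone, class count, and curated loader are not from the paper:

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights="IMAGENET1K_V2")
for p in backbone.parameters():
    p.requires_grad = False                            # freeze the pretrained trunk
backbone.fc = nn.Linear(backbone.fc.in_features, 100)  # 100 = placeholder class count
opt = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)

def retrain_last_layer(loader, epochs=5):
    """Retrain only the head on a curated, geographically representative set."""
    backbone.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            nn.functional.cross_entropy(backbone(x), y).backward()
            opt.step()
```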

simPLE: a visuotactile method learned in simulation to precisely pick, localize, regrasp, and place objects

  • paper_url: http://arxiv.org/abs/2307.13133
  • repo_url: None
  • paper_authors: Maria Bauza, Antonia Bronars, Yifan Hou, Ian Taylor, Nikhil Chavan-Dafle, Alberto Rodriguez
  • for: resolving the tension between generality and precision in robotic manipulation
  • methods: simPLE (simulation to Pick Localize and PLacE), built from three components: task-aware grasping, visuotactile perception, and regrasp planning
  • results: on a dual-arm robot, pick-and-place of 15 diverse objects, with placements succeeding over 90% of the time for 6 objects and over 80% of the time for 11 objects; videos at http://mcube.mit.edu/research/simPLE.html
    Abstract Existing robotic systems have a clear tension between generality and precision. Deployed solutions for robotic manipulation tend to fall into the paradigm of one robot solving a single task, lacking precise generalization, i.e., the ability to solve many tasks without compromising on precision. This paper explores solutions for precise and general pick-and-place. In precise pick-and-place, i.e. kitting, the robot transforms an unstructured arrangement of objects into an organized arrangement, which can facilitate further manipulation. We propose simPLE (simulation to Pick Localize and PLacE) as a solution to precise pick-and-place. simPLE learns to pick, regrasp and place objects precisely, given only the object CAD model and no prior experience. We develop three main components: task-aware grasping, visuotactile perception, and regrasp planning. Task-aware grasping computes affordances of grasps that are stable, observable, and favorable to placing. The visuotactile perception model relies on matching real observations against a set of simulated ones through supervised learning. Finally, we compute the desired robot motion by solving a shortest path problem on a graph of hand-to-hand regrasps. On a dual-arm robot equipped with visuotactile sensing, we demonstrate pick-and-place of 15 diverse objects with simPLE. The objects span a wide range of shapes and simPLE achieves successful placements into structured arrangements with 1mm clearance over 90% of the time for 6 objects, and over 80% of the time for 11 objects. Videos are available at http://mcube.mit.edu/research/simPLE.html .
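
The regrasp-planning step -- "solving a shortest path problem on a graph of hand-to-hand regrasps" -- reduces to standard graph search. A toy sketch of ours (the node names and edge costs are invented for illustration):

```python
import networkx as nx

G = nx.DiGraph()
# hypothetical nodes: (arm, grasp_id); edge weights = estimated motion cost
G.add_weighted_edges_from([
    ("start",      ("left", 0),  1.0),
    (("left", 0),  ("right", 2), 2.5),   # hand-to-hand regrasp
    (("left", 0),  ("left", 1),  1.2),   # in-hand re-orientation
    (("right", 2), "place",      1.0),
    (("left", 1),  "place",      3.0),
])
path = nx.shortest_path(G, "start", "place", weight="weight")
print(path)  # ['start', ('left', 0), ('right', 2), 'place']
```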

A Differentially Private Weighted Empirical Risk Minimization Procedure and its Application to Outcome Weighted Learning

  • paper_url: http://arxiv.org/abs/2307.13127
  • repo_url: None
  • paper_authors: Spencer Giddens, Yiwang Zhou, Kevin R. Krull, Tara M. Brinkman, Peter X. K. Song, Fang Liu
  • For: proposing a privacy-preserving learning method for training on sensitive data, addressing data-privacy issues
  • Methods: the differential privacy (DP) framework; the first DP algorithm for weighted ERM (wERM), with a rigorous proof of its guarantees, generalizing existing DP-ERM methods
  • Results: simulations and a real clinical trial show that OWL models can be trained via wERM with DP guarantees while maintaining sufficiently useful performance
    Abstract It is commonplace to use data containing personal information to build predictive models in the framework of empirical risk minimization (ERM). While these models can be highly accurate in prediction, results obtained from these models with the use of sensitive data may be susceptible to privacy attacks. Differential privacy (DP) is an appealing framework for addressing such data privacy issues by providing mathematically provable bounds on the privacy loss incurred when releasing information from sensitive data. Previous work has primarily concentrated on applying DP to unweighted ERM. We consider an important generalization to weighted ERM (wERM). In wERM, each individual's contribution to the objective function can be assigned varying weights. In this context, we propose the first differentially private wERM algorithm, backed by a rigorous theoretical proof of its DP guarantees under mild regularity conditions. Extending the existing DP-ERM procedures to wERM paves a path to deriving privacy-preserving learning methods for individualized treatment rules, including the popular outcome weighted learning (OWL). We evaluate the performance of the DP-wERM application to OWL in a simulation study and in a real clinical trial of melatonin for sleep health. All empirical results demonstrate the viability of training OWL models via wERM with DP guarantees while maintaining sufficiently useful model performance. Therefore, we recommend practitioners consider implementing the proposed privacy-preserving OWL procedure in real-world scenarios involving sensitive data.
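
For intuition only -- this is not the paper's algorithm, and the noise below is a placeholder rather than a calibrated DP mechanism -- the wERM setting amounts to per-individual sample weights in the empirical risk, after which a privatization step (output perturbation, as one option) is applied:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X @ rng.normal(size=5) > 0).astype(int)
w = rng.uniform(0.1, 1.0, size=500)   # per-individual weights (the "w" in wERM)

# weighted ERM: each individual's contribution to the loss is scaled by w_i
clf = LogisticRegression(C=1.0).fit(X, y, sample_weight=w)

# illustrative output perturbation; a real DP guarantee requires a noise
# scale calibrated to a sensitivity analysis like the one in the paper
theta_private = clf.coef_.ravel() + rng.normal(scale=0.5, size=clf.coef_.size)
```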

A Hybrid Machine Learning Model for Classifying Gene Mutations in Cancer using LSTM, BiLSTM, CNN, GRU, and GloVe

  • paper_url: http://arxiv.org/abs/2307.14361
  • repo_url: None
  • paper_authors: Sanad Aburass, Osama Dorgham, Jamil Al Shaqsi
  • for: classifying gene mutations using Kaggle's Personalized Medicine: Redefining Cancer Treatment dataset
  • methods: an ensemble model combining LSTM, BiLSTM, CNN, GRU, and GloVe
  • results: the best accuracy, precision, recall, F1 score, and mean squared error among the compared models, while also needing less training time
    Abstract This study presents an ensemble model combining LSTM, BiLSTM, CNN, GRU, and GloVe to classify gene mutations using Kaggle's Personalized Medicine: Redefining Cancer Treatment dataset. The results were compared against well-known transformers such as BERT, Electra, Roberta, XLNet, Distilbert, and their LSTM ensembles. Our model outperformed all other models in terms of accuracy, precision, recall, F1 score, and Mean Squared Error. Surprisingly, it also needed less training time, resulting in a perfect combination of performance and efficiency. This study demonstrates the utility of ensemble models for difficult tasks such as gene mutation classification.

Deep Bradley-Terry Rating: Quantifying Properties from Comparisons

  • paper_url: http://arxiv.org/abs/2307.13709
  • repo_url: None
  • paper_authors: Satoru Fujii
  • for: quantifying and evaluating properties of unknown items from pairwise comparisons
  • methods: a deep learning framework that integrates the Bradley-Terry model into the neural-network architecture, generalized to asymmetric environments with unfairness, as encountered in real-world settings
  • results: experiments show the method successfully quantifies and estimates the desired properties
    Abstract Many properties in the real world can't be directly observed, making them difficult to learn. To deal with this challenging problem, prior works have primarily focused on estimating those properties by using graded human scores as the target label in the training. Meanwhile, rating algorithms based on the Bradley-Terry model are extensively studied to evaluate the competitiveness of players based on their match history. In this paper, we introduce the Deep Bradley-Terry Rating (DBTR), a novel machine learning framework designed to quantify and evaluate properties of unknown items. Our method seamlessly integrates the Bradley-Terry model into the neural network structure. Moreover, we generalize this architecture further to asymmetric environments with unfairness, a condition more commonly encountered in real-world settings. Through experimental analysis, we demonstrate that DBTR successfully learns to quantify and estimate desired properties.
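
The core coupling is compact: a rating network s(x) scores each item, and the Bradley-Terry model turns score differences into comparison probabilities. A minimal sketch of ours (not the paper's architecture):

```python
import torch
import torch.nn as nn

rater = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

def bt_loss(x_winner, x_loser):
    """Negative log-likelihood of observed comparisons under Bradley-Terry:
    P(i beats j) = sigmoid(s_i - s_j)."""
    logits = rater(x_winner) - rater(x_loser)
    return nn.functional.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))

opt = torch.optim.Adam(rater.parameters(), lr=1e-3)
x_w, x_l = torch.randn(32, 16), torch.randn(32, 16)   # toy comparison batch
opt.zero_grad()
bt_loss(x_w, x_l).backward()
opt.step()
```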

Conformal prediction for frequency-severity modeling

  • paper_url: http://arxiv.org/abs/2307.13124
  • repo_url: https://github.com/heltongraziadei/conformal-fs
  • paper_authors: Helton Graziadei, Paulo C. Marques F., Eduardo F. L. de Melo, Rodrigo S. Targino
  • for: building prediction intervals for insurance claim forecasts
  • methods: a nonparametric, model-agnostic framework extending split conformal prediction to two-stage frequency-severity modeling, with an out-of-bag mechanism enabling adaptive interval widths
  • results: demonstrated on simulated and real datasets, producing prediction intervals with finite-sample statistical guarantees
    Abstract We present a nonparametric model-agnostic framework for building prediction intervals of insurance claims, with finite sample statistical guarantees, extending the technique of split conformal prediction to the domain of two-stage frequency-severity modeling. The effectiveness of the framework is showcased with simulated and real datasets. When the underlying severity model is a random forest, we extend the two-stage split conformal prediction procedure, showing how the out-of-bag mechanism can be leveraged to eliminate the need for a calibration set and to enable the production of prediction intervals with adaptive width.
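
As background, split conformal prediction itself is a few lines. This generic regression sketch of ours (not the repo's two-stage frequency-severity pipeline) shows the calibration-quantile mechanism the paper builds on:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=1000)

X_tr, y_tr = X[:500], y[:500]            # proper training set
X_cal, y_cal = X[500:800], y[500:800]    # calibration set
X_new = X[800:]

model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
scores = np.abs(y_cal - model.predict(X_cal))          # nonconformity scores
n = len(scores)
q = np.quantile(scores, np.ceil(0.9 * (n + 1)) / n)    # 90% target coverage

pred = model.predict(X_new)
lower, upper = pred - q, pred + q    # finite-sample-valid interval
```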

An Explainable Geometric-Weighted Graph Attention Network for Identifying Functional Networks Associated with Gait Impairment

  • paper_url: http://arxiv.org/abs/2307.13108
  • repo_url: https://github.com/favour-nerrise/xgw-gat
  • paper_authors: Favour Nerrise, Qingyu Zhao, Kathleen L. Poston, Kilian M. Pohl, Ehsan Adeli
  • for: better understanding the motor progression of Parkinson's disease (PD), in particular gait impairment and balance problems, to support more effective and personalized therapeutics
  • methods: an explainable, geometric, weighted-graph attention neural network (xGW-GAT) that represents whole functional connectomes and learns an individualized attention mask for individual- and group-level explanations, predicting gait-impairment classes
  • results: on a resting-state fMRI (rs-fMRI) dataset of individuals with PD, xGW-GAT identifies functional connectivity patterns associated with gait impairment, yields interpretable functional subnetworks, and outperforms existing methods while revealing clinically relevant connectivity patterns
    Abstract One of the hallmark symptoms of Parkinson's Disease (PD) is the progressive loss of postural reflexes, which eventually leads to gait difficulties and balance problems. Identifying disruptions in brain function associated with gait impairment could be crucial in better understanding PD motor progression, thus advancing the development of more effective and personalized therapeutics. In this work, we present an explainable, geometric, weighted-graph attention neural network (xGW-GAT) to identify functional networks predictive of the progression of gait difficulties in individuals with PD. xGW-GAT predicts the multi-class gait impairment on the MDS Unified PD Rating Scale (MDS-UPDRS). Our computational- and data-efficient model represents functional connectomes as symmetric positive definite (SPD) matrices on a Riemannian manifold to explicitly encode pairwise interactions of entire connectomes, based on which we learn an attention mask yielding individual- and group-level explainability. Applied to our resting-state functional MRI (rs-fMRI) dataset of individuals with PD, xGW-GAT identifies functional connectivity patterns associated with gait impairment in PD and offers interpretable explanations of functional subnetworks associated with motor impairment. Our model successfully outperforms several existing methods while simultaneously revealing clinically-relevant connectivity patterns. The source code is available at https://github.com/favour-nerrise/xGW-GAT .

Contrastive Example-Based Control

  • paper_url: http://arxiv.org/abs/2307.13101
  • repo_url: https://github.com/khatch31/laeo
  • paper_authors: Kyle Hatch, Benjamin Eysenbach, Rafael Rafailov, Tianhe Yu, Ruslan Salakhutdinov, Sergey Levine, Chelsea Finn
  • for: proposing an offline, example-based control method that learns an implicit model of multi-step transitions instead of a reward function
  • methods: a data-driven approach that learns from samples of the transition dynamics and examples of high-return states; the implicit model represents the Q-values for the example-based control problem
  • results: outperforms baselines that use learned reward functions across several state-based and image-based offline control tasks, with improved robustness and better scaling with dataset size
    Abstract While many real-world problems might benefit from reinforcement learning, these problems rarely fit into the MDP mold: interacting with the environment is often expensive and specifying reward functions is challenging. Motivated by these challenges, prior work has developed data-driven approaches that learn entirely from samples from the transition dynamics and examples of high-return states. These methods typically learn a reward function from high-return states, use that reward function to label the transitions, and then apply an offline RL algorithm to these transitions. While these methods can achieve good results on many tasks, they can be complex, often requiring regularization and temporal difference updates. In this paper, we propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function. We show that this implicit model can represent the Q-values for the example-based control problem. Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions; additional experiments demonstrate improved robustness and scaling with dataset size.

Label Noise: Correcting a Correction

  • paper_url: http://arxiv.org/abs/2307.13100
  • repo_url: None
  • paper_authors: William Toner, Amos Storkey
  • for: addressing the issue of overfitting in training neural network classifiers on datasets with label noise
  • methods: proposing a more direct approach to mitigate overfitting by imposing a lower bound on the empirical risk during training
  • results: providing theoretical results with explicit, easily computable bounds on the minimum achievable noisy risk for different loss functions, and demonstrating significantly enhanced robustness at virtually no additional computational cost
    Abstract Training neural network classifiers on datasets with label noise poses a risk of overfitting them to the noisy labels. To address this issue, researchers have explored alternative loss functions that aim to be more robust. However, many of these alternatives are heuristic in nature and still vulnerable to overfitting or underfitting. In this work, we propose a more direct approach to tackling overfitting caused by label noise. We observe that the presence of label noise implies a lower bound on the noisy generalised risk. Building upon this observation, we propose imposing a lower bound on the empirical risk during training to mitigate overfitting. Our main contribution is providing theoretical results that yield explicit, easily computable bounds on the minimum achievable noisy risk for different loss functions. We empirically demonstrate that using these bounds significantly enhances robustness in various settings, with virtually no additional computational cost.
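
The proposed fix is a one-liner once a bound is known. A sketch of ours in the spirit of the paper's lower-bounded empirical risk (the constant b below stands in for the paper's explicitly derived, loss-specific bounds):

```python
import torch
import torch.nn as nn

def bounded_risk(logits, targets, b):
    """Keep the empirical risk from dropping below the bound b; this is
    minimized when the (noisy) risk equals b rather than zero."""
    risk = nn.functional.cross_entropy(logits, targets)
    return (risk - b).abs() + b

model = nn.Linear(10, 3)
x, y = torch.randn(64, 10), torch.randint(0, 3, (64,))
loss = bounded_risk(model(x), y, b=0.35)   # b: placeholder for a derived bound
loss.backward()
```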

Interpretable Ensemble Learning for Materials Property Prediction with Classical Interatomic Potentials: Carbon as an Example

  • paper_url: http://arxiv.org/abs/2308.10818
  • repo_url: None
  • paper_authors: Xinyu Jiang, Haofan Sun, Kamal Choudhary, Houlong Zhuang, Qiong Nian
  • for: predicting properties of crystalline materials with machine learning (ML)
  • methods: regression-tree ensemble learning that requires no descriptor, taking as input properties computed by molecular dynamics with 9 different classical interatomic potentials
  • results: on small datasets of carbon allotropes, the ensemble predictions are more accurate than those from the classical interatomic potentials, and the ensemble captures the relative accuracy of the 9 potentials
    Abstract Machine learning (ML) is widely used to explore crystal materials and predict their properties. However, the training is time-consuming for deep-learning models, and the regression process is a black box that is hard to interpret. Also, the preprocess to transfer a crystal structure into the input of ML, called descriptor, needs to be designed carefully. To efficiently predict important properties of materials, we propose an approach based on ensemble learning consisting of regression trees to predict formation energy and elastic constants based on small-size datasets of carbon allotropes as an example. Without using any descriptor, the inputs are the properties calculated by molecular dynamics with 9 different classical interatomic potentials. Overall, the results from ensemble learning are more accurate than those from classical interatomic potentials, and ensemble learning can capture the relatively accurate properties from the 9 classical potentials as criteria for predicting the final properties.
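
A sketch of ours of the descriptor-free setup (the arrays are placeholders for the MD-computed properties and the reference labels):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# rows = carbon allotropes; columns = a property (e.g. formation energy)
# computed with each of 9 classical interatomic potentials
X = rng.normal(size=(120, 9))
# target: reference formation energy (placeholder values)
y = X @ rng.uniform(0.5, 1.5, size=9) / 9 + rng.normal(scale=0.05, size=120)

model = GradientBoostingRegressor(n_estimators=300, max_depth=3)
print(cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error"))
```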

Fairness Under Demographic Scarce Regime

  • paper_url: http://arxiv.org/abs/2307.13081
  • repo_url: None
  • paper_authors: Patrik Joslin Kenfack, Samira Ebrahimi Kahou, Ulrich Aïvodji
  • for: addressing the limitation of prior fairness work that assumes full access to demographic information, which in practice may be only partially available or withheld for privacy reasons (the demographic scarce regime)
  • methods: a framework for building attribute classifiers that incorporates uncertainty awareness and enforces fairness constraints only on samples whose demographic information is inferred with the lowest uncertainty
  • results: experiments on two datasets show significantly better fairness-accuracy tradeoffs than classic attribute classifiers, even outperforming models trained with constraints on the true sensitive attributes
    Abstract Most existing works on fairness assume the model has full access to demographic information. However, there exist scenarios where demographic information is partially available because a record was not maintained throughout data collection or due to privacy reasons. This setting is known as the demographic scarce regime. Prior research has shown that training an attribute classifier to replace the missing sensitive attributes (proxy) can still improve fairness. However, the use of proxy-sensitive attributes worsens fairness-accuracy trade-offs compared to true sensitive attributes. To address this limitation, we propose a framework to build attribute classifiers that achieve better fairness-accuracy trade-offs. Our method introduces uncertainty awareness in the attribute classifier and enforces fairness on samples with demographic information inferred with the lowest uncertainty. We show empirically that enforcing fairness constraints on samples with uncertain sensitive attributes is detrimental to fairness and accuracy. Our experiments on two datasets showed that the proposed framework yields models with significantly better fairness-accuracy trade-offs compared to classic attribute classifiers. Surprisingly, our framework outperforms models trained with constraints on the true sensitive attributes.
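
A sketch of ours of the uncertainty-gating idea: infer sensitive attributes with a proxy classifier, keep only confident predictions, and apply the fairness penalty on that subset. The networks, the threshold tau, and the parity penalty are all illustrative placeholders, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

attr_clf = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))

def low_uncertainty_groups(x, tau=0.6):
    """Keep only samples whose inferred sensitive attribute is confident."""
    probs = attr_clf(x).softmax(dim=1)
    conf, group = probs.max(dim=1)
    keep = conf >= tau                 # tau is a placeholder threshold
    return group, keep

def parity_gap(scores, groups):
    """|E[score | g=0] - E[score | g=1]|, a demographic-parity-style penalty."""
    return (scores[groups == 0].mean() - scores[groups == 1].mean()).abs()

x = torch.randn(256, 20)                     # features
scores = torch.sigmoid(torch.randn(256))     # downstream model outputs (toy)
groups, keep = low_uncertainty_groups(x)
penalty = parity_gap(scores[keep], groups[keep])
```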

Adaptive Certified Training: Towards Better Accuracy-Robustness Tradeoffs

  • paper_url: http://arxiv.org/abs/2307.13078
  • repo_url: None
  • paper_authors: Zhakshylyk Nurlanov, Frank R. Schmidt, Florian Bernard
  • for: achieving better tradeoffs between robustness and standard accuracy
  • methods: a certified training method based on adaptive certified radii, improving both the accuracy and the robustness of the model
  • results: on MNIST, CIFAR-10, and TinyImageNet, models with up to two times higher robustness (measured as average certified radius) at the same standard accuracy as baseline approaches
    Abstract As deep learning models continue to advance and are increasingly utilized in real-world systems, the issue of robustness remains a major challenge. Existing certified training methods produce models that achieve high provable robustness guarantees at certain perturbation levels. However, the main problem of such models is a dramatically low standard accuracy, i.e. accuracy on clean unperturbed data, that makes them impractical. In this work, we consider a more realistic perspective of maximizing the robustness of a model at certain levels of (high) standard accuracy. To this end, we propose a novel certified training method based on a key insight that training with adaptive certified radii helps to improve both the accuracy and robustness of the model, advancing state-of-the-art accuracy-robustness tradeoffs. We demonstrate the effectiveness of the proposed method on MNIST, CIFAR-10, and TinyImageNet datasets. Particularly, on CIFAR-10 and TinyImageNet, our method yields models with up to two times higher robustness, measured as an average certified radius of a test set, at the same levels of standard accuracy compared to baseline approaches.

General-Purpose Multi-Modal OOD Detection Framework

  • paper_url: http://arxiv.org/abs/2307.13069
  • repo_url: None
  • paper_authors: Viet Duong, Qiong Wu, Zhengyi Zhou, Eric Zavesky, Jiahe Chen, Xiangzhou Liu, Wen-Ling Hsu, Huajie Shao
  • for: detecting out-of-distribution (OOD) samples across multiple anomaly scenarios to ensure the safety and reliability of machine learning systems
  • methods: WOOD, a general-purpose weakly-supervised OOD detection framework combining a binary classifier with a contrastive learning component, a Hinge-loss constraint on representation similarity, and a new scoring metric integrating both components' predictions
  • results: outperforms state-of-the-art methods for multi-modal OOD detection on multiple real-world datasets, achieving high accuracy in three different OOD scenarios simultaneously
    Abstract Out-of-distribution (OOD) detection identifies test samples that differ from the training data, which is critical to ensuring the safety and reliability of machine learning (ML) systems. While a plethora of methods have been developed to detect uni-modal OOD samples, only a few have focused on multi-modal OOD detection. Current contrastive learning-based methods primarily study multi-modal OOD detection in a scenario where both a given image and its corresponding textual description come from a new domain. However, real-world deployments of ML systems may face more anomaly scenarios caused by multiple factors like sensor faults, bad weather, and environmental changes. Hence, the goal of this work is to simultaneously detect from multiple different OOD scenarios in a fine-grained manner. To reach this goal, we propose a general-purpose weakly-supervised OOD detection framework, called WOOD, that combines a binary classifier and a contrastive learning component to reap the benefits of both. In order to better distinguish the latent representations of in-distribution (ID) and OOD samples, we adopt the Hinge loss to constrain their similarity. Furthermore, we develop a new scoring metric to integrate the prediction results from both the binary classifier and contrastive learning for identifying OOD samples. We evaluate the proposed WOOD model on multiple real-world datasets, and the experimental results demonstrate that the WOOD model outperforms the state-of-the-art methods for multi-modal OOD detection. Importantly, our approach is able to achieve high accuracy in OOD detection in three different OOD scenarios simultaneously. The source code will be made publicly available upon publication.

Personalized Category Frequency prediction for Buy It Again recommendations

  • paper_url: http://arxiv.org/abs/2308.01195
  • repo_url: None
  • paper_authors: Amit Pande, Kunal Ghosh, Rankyung Park
  • for: improving user experience and site engagement by recommending items that customers are likely to buy again
  • methods: a hierarchical PCIC model consisting of a personalized category model (PC model) and a personalized item-within-category model (IC model); survival models capture general consumption rates, time-series models capture consumption trends, and features from both train a category-grained neural network
  • results: against twelve baselines on four standard open datasets, PCIC improves NDCG by up to 16% while improving recall by around 2%; it scales to 100M guests and 3M items (training in over 8 hours), and an A/B test on a major retailer's site showed significant gains in guest engagement
    Abstract Buy It Again (BIA) recommendations are crucial to retailers to help improve user experience and site engagement by suggesting items that customers are likely to buy again based on their own repeat purchasing patterns. Most existing BIA studies analyze guests' personalized behavior at item granularity. A category-based model may be more appropriate in such scenarios. We propose a recommendation system called a hierarchical PCIC model that consists of a personalized category model (PC model) and a personalized item model within categories (IC model). PC model generates a personalized list of categories that customers are likely to purchase again. IC model ranks items within categories that guests are likely to consume within a category. The hierarchical PCIC model captures the general consumption rate of products using survival models. Trends in consumption are captured using time series models. Features derived from these models are used in training a category-grained neural network. We compare PCIC to twelve existing baselines on four standard open datasets. PCIC improves NDCG up to 16 percent while improving recall by around 2 percent. We were able to scale and train (over 8 hours) PCIC on a large dataset of 100M guests and 3M items where repeat categories of a guest outnumber repeat items. PCIC was deployed and AB tested on the site of a major retailer, leading to significant gains in guest engagement.

Feature Gradient Flow for Interpreting Deep Neural Networks in Head and Neck Cancer Prediction

  • paper_url: http://arxiv.org/abs/2307.13061
  • repo_url: None
  • paper_authors: Yinzhu Jin, Jonathan C. Garneau, P. Thomas Fletcher
  • for: introducing feature gradient flow, a new technique for interpreting deep learning models in terms of features understandable to humans
  • methods: the gradient flow of a model locally defines nonlinear coordinates in the input space; a feature's importance is assessed by comparing its agreement with the model's gradient flow against that of a baseline noise feature
  • results: adding a regularization term that encourages model gradients to align with chosen interpretable features makes networks more interpretable, demonstrated on a convolutional network predicting distant metastasis of head and neck cancer from CT data
    Abstract This paper introduces feature gradient flow, a new technique for interpreting deep learning models in terms of features that are understandable to humans. The gradient flow of a model locally defines nonlinear coordinates in the input data space representing the information the model is using to make its decisions. Our idea is to measure the agreement of interpretable features with the gradient flow of a model. To then evaluate the importance of a particular feature to the model, we compare that feature's gradient flow measure versus that of a baseline noise feature. We then develop a technique for training neural networks to be more interpretable by adding a regularization term to the loss function that encourages the model gradients to align with those of chosen interpretable features. We test our method in a convolutional neural network prediction of distant metastasis of head and neck cancer from a computed tomography dataset from the Cancer Imaging Archive.
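
A sketch of ours of the gradient-alignment regularizer described above (the interpretable feature map, the cosine form, and the weight 0.1 are our placeholders, not the paper's exact choices):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
feature = lambda x: x.norm(dim=1, keepdim=True)   # placeholder interpretable feature

def alignment_penalty(x):
    """Penalize the angle between the model's input gradient and the
    gradient of the chosen interpretable feature."""
    x = x.requires_grad_(True)
    g_model = torch.autograd.grad(model(x).sum(), x, create_graph=True)[0]
    g_feat = torch.autograd.grad(feature(x).sum(), x, create_graph=True)[0]
    cos = nn.functional.cosine_similarity(g_model, g_feat, dim=1)
    return (1 - cos.abs()).mean()    # 0 when gradients are (anti-)parallel

x, y = torch.randn(32, 8), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y) + 0.1 * alignment_penalty(x)
loss.backward()
```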

MARIO: Model Agnostic Recipe for Improving OOD Generalization of Graph Contrastive Learning

  • paper_url: http://arxiv.org/abs/2307.13055
  • repo_url: https://github.com/zhuyun97/mario
  • paper_authors: Yun Zhu, Haizhou Shi, Zhenshuo Zhang, Siliang Tang
  • for: the out-of-distribution (OOD) generalization problem for unsupervised learning on graph data, where graph neural networks (GNNs) are sensitive to distribution shifts
  • methods: MARIO, a model-agnostic recipe built on two principles: the Information Bottleneck (IB) principle for generalizable representations, and an invariant principle that uses adversarial data augmentation to obtain invariant representations
  • results: extensive experiments show state-of-the-art performance on OOD test sets while remaining comparable in-distribution; code at https://github.com/ZhuYun97/MARIO
    Abstract In this work, we investigate the problem of out-of-distribution (OOD) generalization for unsupervised learning methods on graph data. This scenario is particularly challenging because graph neural networks (GNNs) have been shown to be sensitive to distributional shifts, even when labels are available. To address this challenge, we propose a \underline{M}odel-\underline{A}gnostic \underline{R}ecipe for \underline{I}mproving \underline{O}OD generalizability of unsupervised graph contrastive learning methods, which we refer to as MARIO. MARIO introduces two principles aimed at developing distributional-shift-robust graph contrastive methods to overcome the limitations of existing frameworks: (i) Information Bottleneck (IB) principle for achieving generalizable representations and (ii) Invariant principle that incorporates adversarial data augmentation to obtain invariant representations. To the best of our knowledge, this is the first work that investigates the OOD generalization problem of graph contrastive learning, with a specific focus on node-level tasks. Through extensive experiments, we demonstrate that our method achieves state-of-the-art performance on the OOD test set, while maintaining comparable performance on the in-distribution test set when compared to existing approaches. The source code for our method can be found at: https://github.com/ZhuYun97/MARIO

Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation

  • paper_url: http://arxiv.org/abs/2307.12983
  • repo_url: https://github.com/Improbable-AI/pql
  • paper_authors: Zechu Li, Tao Chen, Zhang-Wei Hong, Anurag Ajay, Pulkit Agrawal
  • For: accelerating reinforcement learning for complex tasks, which requires large amounts of training data
  • Methods: GPU-based simulation (Isaac Gym) and a Parallel Q-Learning (PQL) scheme that parallelizes data collection, policy learning, and value learning, designed for massively parallel simulation on a single workstation
  • Results: PQL outperforms PPO in wall-clock time while retaining the sample efficiency of off-policy learning; Q-learning scales to tens of thousands of parallel environments, and the paper investigates factors affecting learning speed
    Abstract Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data. Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection thousands of times on a commodity GPU. Most prior works used on-policy methods like PPO due to their simplicity and ease of scaling. Off-policy methods are more data efficient but challenging to scale, resulting in a longer wall-clock training time. This paper presents a Parallel $Q$-Learning (PQL) scheme that outperforms PPO in wall-clock time while maintaining superior sample efficiency of off-policy learning. PQL achieves this by parallelizing data collection, policy learning, and value learning. Different from prior works on distributed off-policy learning, such as Apex, our scheme is designed specifically for massively parallel GPU-based simulation and optimized to work on a single workstation. In experiments, we demonstrate that $Q$-learning can be scaled to \textit{tens of thousands of parallel environments} and investigate important factors affecting learning speed. The code is available at https://github.com/Improbable-AI/pql.

3D-LLM: Injecting the 3D World into Large Language Models

  • paper_url: http://arxiv.org/abs/2307.12981
  • repo_url: https://github.com/UMass-Foundation-Model/3D-LLM
  • paper_authors: Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan
  • for: injecting the 3D world into large language models (LLMs) and vision-language models (VLMs) to improve performance on 3D-related tasks, including commonsense reasoning
  • methods: three prompting mechanisms to collect over 300k 3D-language data; a 3D feature extractor that obtains 3D features from rendered multi-view images; 2D VLMs as backbones; and a 3D localization mechanism to capture spatial information
  • results: on ScanQA, the model outperforms state-of-the-art baselines by a large margin (e.g., +9% BLEU-1); it also beats 2D VLMs on held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue, and qualitative examples show it can perform tasks beyond the scope of existing models
    Abstract Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi- view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs. Project Page: : https://vis-www.cs.umass.edu/3dllm/.

An Isometric Stochastic Optimizer

  • paper_url: http://arxiv.org/abs/2307.12979
  • repo_url: None
  • paper_authors: Jacob Jackson
  • for: explaining the success of the Adam optimizer and deriving a new optimizer from that principle
  • methods: Iso, a new optimizer that makes each parameter's step size independent of the norms of the other parameters and makes a parameter's update invariant to linear transformations of its inputs and outputs; additionally, IsoAdam, a variant that allows optimal hyperparameters to be transferred from Adam
  • results: experiments show that IsoAdam obtains a speedup over Adam when training a small Transformer
    Abstract The Adam optimizer is the standard choice in deep learning applications. I propose a simple explanation of Adam's success: it makes each parameter's step size independent of the norms of the other parameters. Based on this principle I derive Iso, a new optimizer which makes the norm of a parameter's update invariant to the application of any linear transformation to its inputs and outputs. I develop a variant of Iso called IsoAdam that allows optimal hyperparameters to be transferred from Adam, and demonstrate that IsoAdam obtains a speedup over Adam when training a small Transformer.
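
Reading off the stated principle (this is our inference, not the paper's exact Iso update rule), here is a sketch of an optimizer whose step for each parameter tensor is normalized, so its size is independent of the other parameters' norms:

```python
import torch

class NormalizedSGD(torch.optim.Optimizer):
    """Each parameter tensor takes a step of fixed size lr along its own
    normalized gradient direction."""
    def __init__(self, params, lr=1e-2, eps=1e-12):
        super().__init__(params, dict(lr=lr, eps=eps))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                p.add_(g / (g.norm() + group["eps"]), alpha=-group["lr"])

model = torch.nn.Linear(4, 2)
opt = NormalizedSGD(model.parameters())
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()
```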

Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems

  • paper_url: http://arxiv.org/abs/2307.12975
  • repo_url: None
  • paper_authors: Xiang Ji, Huazheng Wang, Minshuo Chen, Tuo Zhao, Mengdi Wang
  • for: reward engineering in decision-making problems, where no obvious reward function exists and human feedback is introduced during training to learn one
  • methods: a theoretical analysis of preference-based reward learning, successful in recent applications such as InstructGPT, in offline contextual bandits
  • results: improved modeling and suboptimality analysis for policy learning on human-scored samples, together with a proof that preference-based methods enjoy lower suboptimality
    Abstract A crucial task in decision-making problems is reward engineering. It is common in practice that no obvious choice of reward function exists. Thus, a popular approach is to introduce human feedback during training and leverage such feedback to learn a reward function. Among all policy learning methods that use human feedback, preference-based methods have demonstrated substantial success in recent empirical applications such as InstructGPT. In this work, we develop a theory that provably shows the benefits of preference-based methods in offline contextual bandits. In particular, we improve the modeling and suboptimality analysis for running policy learning methods on human-scored samples directly. Then, we compare it with the suboptimality guarantees of preference-based methods and show that preference-based methods enjoy lower suboptimality.

Big Data - Supply Chain Management Framework for Forecasting: Data Preprocessing and Machine Learning Techniques

  • paper_url: http://arxiv.org/abs/2307.12971
  • repo_url: https://github.com/zeniSoida/pl1
  • paper_authors: Md Abrar Jahin, Md Sakib Hossain Shovon, Jungpil Shin, Istiyaque Ahmed Ridoy, Yoichi Tomioka, M. F. Mridha
  • For: This paper aims to systematically identify and comparatively analyze state-of-the-art supply chain (SC) forecasting strategies and technologies, and to propose a novel framework incorporating Big Data Analytics in SC Management.
  • Methods: The proposed framework covers problem identification, data sources, exploratory data analysis, machine-learning model training, hyperparameter tuning, performance evaluation, and optimization. The paper discusses the types of forecasting needed for different periods or SC objectives, and recommends SC KPIs and error-measurement systems to optimize the top-performing model.
  • Results: The paper illustrates the adverse effects of phantom inventory on forecasting and the dependence of managerial decisions on SC KPIs for determining model performance parameters and improving operations management, transparency, and planning efficiency. The cyclic connection within the framework introduces preprocessing optimization based on post-process KPIs, optimizing the overall control process (inventory management, workforce determination, cost, production and capacity planning).
    Abstract This article intends to systematically identify and comparatively analyze state-of-the-art supply chain (SC) forecasting strategies and technologies. A novel framework has been proposed incorporating Big Data Analytics in SC Management (problem identification, data sources, exploratory data analysis, machine-learning model training, hyperparameter tuning, performance evaluation, and optimization), forecasting effects on human-workforce, inventory, and overall SC. Initially, the need to collect data according to SC strategy and how to collect them has been discussed. The article discusses the need for different types of forecasting according to the period or SC objective. The SC KPIs and the error-measurement systems have been recommended to optimize the top-performing model. The adverse effects of phantom inventory on forecasting and the dependence of managerial decisions on the SC KPIs for determining model performance parameters and improving operations management, transparency, and planning efficiency have been illustrated. The cyclic connection within the framework introduces preprocessing optimization based on the post-process KPIs, optimizing the overall control process (inventory management, workforce determination, cost, production and capacity planning). The contribution of this research lies in the standard SC process framework proposal, recommended forecasting data analysis, forecasting effects on SC performance, machine learning algorithms optimization followed, and in shedding light on future research.

A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.12968
  • repo_url: https://github.com/ben-eysenbach/ac-connection
  • paper_authors: Benjamin Eysenbach, Matthieu Geist, Sergey Levine, Ruslan Salakhutdinov
  • for: understanding how one-step and multi-step policy improvement avoid overfitting in offline reinforcement learning
  • methods: comparing one-step methods such as advantage-weighted regression and conditional behavioral cloning, which truncate policy iteration after a single improvement step, with critic regularization, which takes many regularized improvement steps
  • results: a proof that multi-step critic regularization with a regularization coefficient of 1 yields the same policy as one-step RL, plus experiments showing the analysis makes accurate, testable predictions about practical methods (CQL and one-step RL); one-step RL can be competitive on problems that demand strong regularization
    Abstract As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting. One-step methods perform regularization by doing just a single step of policy improvement, while critic regularization methods do many steps of policy improvement with a regularized objective. These methods appear distinct. One-step methods, such as advantage-weighted regression and conditional behavioral cloning, truncate policy iteration after just one step. This ``early stopping'' makes one-step RL simple and stable, but can limit its asymptotic performance. Critic regularization typically requires more compute but has appealing lower-bound guarantees. In this paper, we draw a close connection between these methods: applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL. While practical implementations violate our assumptions and critic regularization is typically applied with smaller regularization coefficients, our experiments nevertheless show that our analysis makes accurate, testable predictions about practical offline RL methods (CQL and one-step RL) with commonly-used hyperparameters. Our results do not imply that every problem can be solved with a single step of policy improvement, but rather that one-step RL might be competitive with critic regularization on RL problems that demand strong regularization.
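
For concreteness, a minimal sketch of ours of the one-step recipe via advantage-weighted regression (toy networks and a crude advantage estimate; not the authors' code):

```python
import torch
import torch.nn as nn

obs_dim, act_dim, temp = 8, 2, 1.0
# in the full recipe, q_net is first fit to the behavior policy (e.g. by SARSA)
q_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))

def awr_policy_loss(s, a, advantage):
    """One-step improvement: behavioral cloning weighted by exp(A / temp)."""
    w = (advantage / temp).clamp(max=5.0).exp().detach()   # clipped for stability
    bc = ((policy(s) - a) ** 2).sum(dim=1)                 # Gaussian-mean BC term
    return (w * bc).mean()

s, a = torch.randn(128, obs_dim), torch.randn(128, act_dim)
adv = q_net(torch.cat([s, a], dim=1)).squeeze(1)   # toy advantage (baseline omitted)
loss = awr_policy_loss(s, a, adv)
loss.backward()
```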

Learning Dense Correspondences between Photos and Sketches

  • paper_url: http://arxiv.org/abs/2307.12967
  • repo_url: https://github.com/cogtoolslab/photo-sketch-correspondence
  • paper_authors: Xuanchen Lu, Xiaolong Wang, Judith E Fan
  • for: This paper asks what computational ingredients are needed for artificial systems to emulate human sketch understanding.
  • methods: The authors propose a self-supervised method that uses a spatial transformer network to estimate dense correspondences between sketches and photos.
  • results: The method outperforms several strong baselines and is quantitatively consistent with other warp-based approaches; however, systematic differences between model and human predictions remain, pointing to room for further research.
    Abstract Humans effortlessly grasp the connection between sketches and real-world objects, even when these sketches are far from realistic. Moreover, human sketch understanding goes beyond categorization -- critically, it also entails understanding how individual elements within a sketch correspond to parts of the physical world it represents. What are the computational ingredients needed to support this ability? Towards answering this question, we make two contributions: first, we introduce a new sketch-photo correspondence benchmark, $\textit{PSC6k}$, containing 150K annotations of 6250 sketch-photo pairs across 125 object categories, augmenting the existing Sketchy dataset with fine-grained correspondence metadata. Second, we propose a self-supervised method for learning dense correspondences between sketch-photo pairs, building upon recent advances in correspondence learning for pairs of photos. Our model uses a spatial transformer network to estimate the warp flow between latent representations of a sketch and photo extracted by a contrastive learning-based ConvNet backbone. We found that this approach outperformed several strong baselines and produced predictions that were quantitatively consistent with other warp-based methods. However, our benchmark also revealed systematic differences between predictions of the suite of models we tested and those of humans. Taken together, our work suggests a promising path towards developing artificial systems that achieve more human-like understanding of visual images at different levels of abstraction. Project page: https://photo-sketch-correspondence.github.io

Synthetic pre-training for neural-network interatomic potentials

  • paper_url: http://arxiv.org/abs/2307.15714
  • repo_url: https://github.com/jla-gardner/nnp-pre-training
  • paper_authors: John L. A. Gardner, Kathryn T. Baker, Volker L. Deringer
  • for: This study aims to improve the accuracy and stability of ML-based interatomic potentials for atomistic materials modelling by pre-training on "synthetic" data.
  • methods: An equivariant graph-neural-network potential is pre-trained on a large volume of synthetic data generated at scale with an existing ML potential, then fine-tuned on a much smaller quantum-mechanical reference dataset.
  • results: Pre-training on synthetic data improves numerical accuracy and stability in computational practice and reduces reliance on large quantum-mechanical reference datasets.
    Abstract Machine learning (ML) based interatomic potentials have transformed the field of atomistic materials modelling. However, ML potentials depend critically on the quality and quantity of quantum-mechanical reference data with which they are trained, and therefore developing datasets and training pipelines is becoming an increasingly central challenge. Leveraging the idea of "synthetic" (artificial) data that is common in other areas of ML research, we here show that synthetic atomistic data, themselves obtained at scale with an existing ML potential, constitute a useful pre-training task for neural-network interatomic potential models. Once pre-trained with a large synthetic dataset, these models can be fine-tuned on a much smaller, quantum-mechanical one, improving numerical accuracy and stability in computational practice. We demonstrate feasibility for a series of equivariant graph-neural-network potentials for carbon, and we carry out initial experiments to test the limits of the approach.
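
A minimal sketch of the pre-train-then-fine-tune recipe described above. The paper works with equivariant graph-neural-network potentials for carbon; here the model is abstracted to a generic regressor from atomic-environment descriptors to energies, and the dataset sizes, learning rates, and epochs are placeholders.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def make_loader(n, feat_dim=32, batch=64):
    # Stand-in for atomic-environment descriptors and (synthetic or QM) energies.
    xs, ys = torch.randn(n, feat_dim), torch.randn(n, 1)
    return DataLoader(TensorDataset(xs, ys), batch_size=batch, shuffle=True)

model = nn.Sequential(nn.Linear(32, 128), nn.SiLU(), nn.Linear(128, 1))

def train(loader, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            loss = nn.functional.mse_loss(model(x), y)
            opt.zero_grad(); loss.backward(); opt.step()

# Stage 1: pre-train on a large synthetic dataset labelled by an existing ML potential.
train(make_loader(50_000), epochs=2, lr=1e-3)

# Stage 2: fine-tune on a much smaller quantum-mechanical reference set,
# typically with a lower learning rate so the pre-trained weights are preserved.
train(make_loader(500), epochs=20, lr=1e-4)
```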

Efficiently Sampling the PSD Cone with the Metric Dikin Walk

  • paper_url: http://arxiv.org/abs/2307.12943
  • repo_url: None
  • paper_authors: Yunbum Kook, Santosh S. Vempala
  • for: This paper addresses the efficient-computation frontier of semi-definite programs, specifically sampling from the PSD cone.
  • methods: The paper analyzes the Dikin walk, first adapting it to general metrics and then devising suitable metrics for the PSD cone with affine constraints, to improve mixing time and per-step complexity.
  • results: The resulting mixing time and per-step complexity are considerably smaller, and with an appropriate choice of metric the dependence on the number of constraints can be made polylogarithmic.
    Abstract Semi-definite programs represent a frontier of efficient computation. While there has been much progress on semi-definite optimization, with moderate-sized instances currently solvable in practice by the interior-point method, the basic problem of sampling semi-definite solutions remains a formidable challenge. The direct application of known polynomial-time algorithms for sampling general convex bodies to semi-definite sampling leads to a prohibitively high running time. In addition, known general methods require an expensive rounding phase as pre-processing. Here we analyze the Dikin walk, by first adapting it to general metrics, then devising suitable metrics for the PSD cone with affine constraints. The resulting mixing time and per-step complexity are considerably smaller, and by an appropriate choice of the metric, the dependence on the number of constraints can be made polylogarithmic. We introduce a refined notion of self-concordant matrix functions and give rules for combining different metrics. Along the way, we further develop the theory of interior-point methods for sampling.
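
For intuition, here is the classic Dikin walk on a polytope $\{x : Ax \le b\}$ with the log-barrier Hessian as the local metric. The paper's contribution lies in devising analogous metrics for the PSD cone with affine constraints, which this simplified polytope sketch does not implement.

```python
import numpy as np

def barrier_hessian(A, b, x):
    # Hessian of the log-barrier -sum_i log(b_i - a_i^T x).
    s = b - A @ x                       # slacks, must stay positive
    return (A / s[:, None] ** 2).T @ A  # sum_i a_i a_i^T / s_i^2

def dikin_step(A, b, x, r, rng):
    n = x.size
    H = barrier_hessian(A, b, x)
    # Propose y uniformly from the Dikin ellipsoid {y : (y-x)^T H (y-x) <= r^2}.
    u = rng.standard_normal(n)
    u *= rng.uniform() ** (1.0 / n) / np.linalg.norm(u)   # uniform in unit ball
    y = x + r * np.linalg.solve(np.linalg.cholesky(H).T, u)
    if np.any(b - A @ y <= 0):          # outside the polytope: reject
        return x
    Hy = barrier_hessian(A, b, y)
    d = y - x
    if d @ Hy @ d > r ** 2:             # x not in y's ellipsoid: reject
        return x
    # Metropolis filter equalizes the (ellipsoid-volume) proposal densities.
    accept = min(1.0, np.sqrt(np.linalg.det(Hy) / np.linalg.det(H)))
    return y if rng.uniform() < accept else x

rng = np.random.default_rng(0)
A, b = np.vstack([np.eye(2), -np.eye(2)]), np.ones(4)   # the box [-1, 1]^2
x = np.zeros(2)
for _ in range(1000):
    x = dikin_step(A, b, x, r=0.5, rng=rng)
```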

On Privileged and Convergent Bases in Neural Network Representations

  • paper_url: http://arxiv.org/abs/2307.12941
  • repo_url: None
  • paper_authors: Davis Brown, Nikhil Vyas, Yamini Bansal
  • for: This study investigates whether the representations learned by neural networks possess a privileged and convergent basis. Specifically, it examines the significance of feature directions represented by individual neurons.
  • methods: The authors apply arbitrary rotations to neural representations to test for rotational invariance, and compare the bases of networks trained with the same parameters but different random initializations.
  • results: Neural networks do not converge to a unique basis, and basis correlation increases significantly when a few early layers are frozen identically. Linear Mode Connectivity also improves with network width, but this improvement is not driven by increased basis correlation.
    Abstract In this study, we investigate whether the representations learned by neural networks possess a privileged and convergent basis. Specifically, we examine the significance of feature directions represented by individual neurons. First, we establish that arbitrary rotations of neural representations cannot be inverted (unlike linear networks), indicating that they do not exhibit complete rotational invariance. Subsequently, we explore the possibility of multiple bases achieving identical performance. To do this, we compare the bases of networks trained with the same parameters but with varying random initializations. Our study reveals two findings: (1) Even in wide networks such as WideResNets, neural networks do not converge to a unique basis; (2) Basis correlation increases significantly when a few early layers of the network are frozen identically. Furthermore, we analyze Linear Mode Connectivity, which has been studied as a measure of basis correlation. Our findings give evidence that while Linear Mode Connectivity improves with increased network width, this improvement is not due to an increase in basis correlation.
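
One way to operationalize basis correlation is to match neurons across two independently trained networks by activation correlation on shared inputs. A minimal sketch follows; the pairing metric and the toy activations are assumptions, and the paper's exact measurement may differ.

```python
import numpy as np

def basis_correlation(acts_a, acts_b):
    """Mean over neurons in A of the best absolute Pearson correlation
    with any neuron in B, computed on the same inputs (rows)."""
    za = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    zb = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    corr = np.abs(za.T @ zb) / acts_a.shape[0]   # neurons_a x neurons_b
    return corr.max(axis=1).mean()

rng = np.random.default_rng(0)
# Stand-ins for layer activations of two networks (same inputs, different seeds).
n_inputs, width = 2048, 256
acts_seed1 = rng.standard_normal((n_inputs, width))
acts_seed2 = rng.standard_normal((n_inputs, width))

print("independent nets:", basis_correlation(acts_seed1, acts_seed2))
# A random rotation of the same representation destroys axis alignment while
# preserving the subspace -- the control case for testing a privileged basis.
q, _ = np.linalg.qr(rng.standard_normal((width, width)))
print("rotated copy:   ", basis_correlation(acts_seed1, acts_seed1 @ q))
print("identical copy: ", basis_correlation(acts_seed1, acts_seed1))
```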

HOOD: Real-Time Robust Human Presence and Out-of-Distribution Detection with Low-Cost FMCW Radar

  • paper_url: http://arxiv.org/abs/2308.02396
  • repo_url: None
  • paper_authors: Sabri Mustafa Kahya, Muhammet Sami Yavuz, Eckehard Steinbach
  • For: This work targets human presence detection in indoor environments using a 60 GHz short-range FMCW radar, proposing HOOD, a real-time robust human presence and out-of-distribution (OOD) detection method.
  • Methods: The method is built on a reconstruction-based architecture that operates on macro and micro range-Doppler images (RDIs) produced by the 60 GHz short-range FMCW radar, using RDI reconstruction to detect both presence and OOD inputs.
  • Results: On a dataset collected with the 60 GHz short-range FMCW radar, HOOD achieves an average AUROC of 94.36%, performs well across different human scenarios, and outperforms state-of-the-art methods on common OOD detection metrics. Real-time experiments are available at: https://muskahya.github.io/HOOD
    Abstract Human presence detection in indoor environments using millimeter-wave frequency-modulated continuous-wave (FMCW) radar is challenging due to the presence of moving and stationary clutters in indoor places. This work proposes "HOOD" as a real-time robust human presence and out-of-distribution (OOD) detection method by exploiting 60 GHz short-range FMCW radar. We approach the presence detection application as an OOD detection problem and solve the two problems simultaneously using a single pipeline. Our solution relies on a reconstruction-based architecture and works with radar macro and micro range-Doppler images (RDIs). HOOD aims to accurately detect the "presence" of humans in the presence or absence of moving and stationary disturbers. Since it is also an OOD detector, it aims to detect moving or stationary clutters as OOD in humans' absence and predicts the current scene's output as "no presence." HOOD is an activity-free approach that performs well in different human scenarios. On our dataset collected with a 60 GHz short-range FMCW Radar, we achieve an average AUROC of 94.36%. Additionally, our extensive evaluations and experiments demonstrate that HOOD outperforms state-of-the-art (SOTA) OOD detection methods in terms of common OOD detection metrics. Our real-time experiments are available at: https://muskahya.github.io/HOOD
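
The reconstruction-based recipe can be summarized as: train an autoencoder on radar frames from the "human present" distribution only, then flag inputs with high reconstruction error as OOD, i.e. "no presence". A minimal sketch, where the architecture, input size, and percentile threshold are illustrative assumptions rather than HOOD's actual design:

```python
import torch
import torch.nn as nn

# Toy range-Doppler images (1 x 32 x 32); real RDIs come from FMCW radar processing.
train_rdis = torch.rand(512, 1, 32, 32)   # assumed: human-present samples only

ae = nn.Sequential(                        # small convolutional autoencoder
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(20):
    loss = nn.functional.mse_loss(ae(train_rdis), train_rdis)
    opt.zero_grad(); loss.backward(); opt.step()

# Calibrate a threshold on in-distribution errors (here a high percentile).
with torch.no_grad():
    errs = ((ae(train_rdis) - train_rdis) ** 2).mean(dim=(1, 2, 3))
    thresh = torch.quantile(errs, 0.95)

def presence(rdi):
    """Low reconstruction error -> 'presence'; high error (empty room,
    moving or stationary clutter, unseen disturbers) -> OOD -> 'no presence'."""
    with torch.no_grad():
        err = ((ae(rdi) - rdi) ** 2).mean(dim=(1, 2, 3))
    return err < thresh
```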

Contextual Bandits and Imitation Learning via Preference-Based Active Queries

  • paper_url: http://arxiv.org/abs/2307.12926
  • repo_url: None
  • paper_authors: Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
  • for: This paper studies contextual bandits and imitation learning where the learner lacks direct reward information for executed actions; instead, the learner can query an expert each round to compare two actions and receive noisy preference feedback. The goal is to minimize the regret of the executed actions while also minimizing the number of queries to the expert.
  • methods: The paper proposes an algorithm that leverages an online regression oracle over a function class that can represent the expert's preference model under appropriate link functions, using the oracle both to choose actions and to decide when to query.
  • results: In the contextual bandit setting, the algorithm achieves a regret bound of $O(\min\{\sqrt{T}, d/\Delta\})$, where $T$ is the number of interactions, $d$ is the eluder dimension of the function class, and $\Delta$ is the minimum preference of the optimal action over any suboptimal action across all contexts, while making only $O(\min\{T, d^2/\Delta^2\})$ queries. Similar regret and query guarantees hold in the imitation learning setting; interestingly, the algorithm can even learn to outperform a suboptimal expert, highlighting a practical advantage of preference-based feedback in imitation learning.
    Abstract We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward. Instead, the learner can actively query an expert at each round to compare two actions and receive noisy preference feedback. The learner's objective is two-fold: to minimize the regret associated with the executed actions, while simultaneously, minimizing the number of comparison queries made to the expert. In this paper, we assume that the learner has access to a function class that can represent the expert's preference model under appropriate link functions, and provide an algorithm that leverages an online regression oracle with respect to this function class for choosing its actions and deciding when to query. For the contextual bandit setting, our algorithm achieves a regret bound that combines the best of both worlds, scaling as $O(\min\{\sqrt{T}, d/\Delta\})$, where $T$ represents the number of interactions, $d$ represents the eluder dimension of the function class, and $\Delta$ represents the minimum preference of the optimal action over any suboptimal action under all contexts. Our algorithm does not require the knowledge of $\Delta$, and the obtained regret bound is comparable to what can be achieved in the standard contextual bandits setting where the learner observes reward signals at each round. Additionally, our algorithm makes only $O(\min\{T, d^2/\Delta^2\})$ queries to the expert. We then extend our algorithm to the imitation learning setting, where the learning agent engages with an unknown environment in episodes of length $H$ each, and provide similar guarantees for regret and query complexity. Interestingly, our algorithm for imitation learning can even learn to outperform the underlying expert, when it is suboptimal, highlighting a practical benefit of preference-based feedback in imitation learning.

  • paper_url: http://arxiv.org/abs/2307.14359
  • repo_url: None
  • paper_authors: Benny Wong
  • for: This research paper explores a new optimization method, Gaussian Crunching Search (GCS), and its applications across different domains.
  • methods: Inspired by the behaviour of particles in a Gaussian distribution, GCS aims to efficiently explore the solution space and converge towards the global optimum.
  • results: Through experimental evaluations and comparisons with existing optimization methods, the study highlights the advantages and strengths of GCS, making it a useful resource for researchers, practitioners, and students interested in optimization.
    Abstract Optimization methods are essential in solving complex problems across various domains. In this research paper, we introduce a novel optimization method called Gaussian Crunching Search (GCS). Inspired by the behaviour of particles in a Gaussian distribution, GCS aims to efficiently explore the solution space and converge towards the global optimum. We present a comprehensive analysis of GCS, including its working mechanism, and potential applications. Through experimental evaluations and comparisons with existing optimization methods, we highlight the advantages and strengths of GCS. This research paper serves as a valuable resource for researchers, practitioners, and students interested in optimization, providing insights into the development and potential of Gaussian Crunching Search as a new and promising approach.
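
The abstract describes GCS only at a high level, so the following is a speculative sketch of one plausible reading: a Gaussian-perturbation search whose standard deviation is "crunched" (shrunk) when progress stalls. The update rules, parameters, and test function below are assumptions, not the paper's algorithm.

```python
import numpy as np

def gaussian_crunching_search(f, x0, sigma0=1.0, shrink=0.9, grow=1.5,
                              samples=32, iters=200, seed=0):
    """Speculative reading of GCS: sample candidates from a Gaussian around
    the incumbent; 'crunch' (shrink) sigma when progress stalls to refine,
    and re-expand on improvement to keep exploring for the global optimum."""
    rng = np.random.default_rng(seed)
    x, fx, sigma = np.asarray(x0, float), f(np.asarray(x0, float)), sigma0
    for _ in range(iters):
        cand = x + sigma * rng.standard_normal((samples, x.size))
        vals = np.apply_along_axis(f, 1, cand)
        i = vals.argmin()
        if vals[i] < fx:
            x, fx = cand[i], vals[i]
            sigma = min(sigma * grow, sigma0)   # improvement: widen a little
        else:
            sigma *= shrink                     # stall: crunch the Gaussian
    return x, fx

# Example: minimize a 2-D Rastrigin function, a standard multimodal benchmark.
f = lambda v: 10 * v.size + np.sum(v ** 2 - 10 * np.cos(2 * np.pi * v))
best_x, best_f = gaussian_crunching_search(f, x0=np.array([3.0, -2.0]))
print(best_x, best_f)
```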

Graph Neural Networks For Mapping Variables Between Programs – Extended Version

  • paper_url: http://arxiv.org/abs/2307.13014
  • repo_url: https://github.com/pmorvalho/ecai23-gnns-for-mapping-variables-between-programs
  • paper_authors: Pedro Orvalho, Jelle Piepenbrock, Mikoláš Janota, Vasco Manquinho
  • for: This work proposes a graph-neural-network (GNN) based method for mapping the sets of variables between two programs.
  • methods: The method uses GNNs to map the variable sets of two programs based on their abstract syntax trees (ASTs).
  • results: Experiments show the approach correctly maps variables in 83% of the evaluation dataset. Moreover, whereas the current state of the art in program repair, which depends heavily on program structure, can fix only about 72% of incorrect programs, the proposed variable-mapping-based approach repairs around 88.5%.
    Abstract Automated program analysis is a pivotal research domain in many areas of Computer Science -- Formal Methods and Artificial Intelligence, in particular. Due to the undecidability of the problem of program equivalence, comparing two programs is highly challenging. Typically, in order to compare two programs, a relation between both programs' sets of variables is required. Thus, mapping variables between two programs is useful for a panoply of tasks such as program equivalence, program analysis, program repair, and clone detection. In this work, we propose using graph neural networks (GNNs) to map the set of variables between two programs based on both programs' abstract syntax trees (ASTs). To demonstrate the strength of variable mappings, we present three use-cases of these mappings on the task of program repair to fix well-studied and recurrent bugs among novice programmers in introductory programming assignments (IPAs). Experimental results on a dataset of 4166 pairs of incorrect/correct programs show that our approach correctly maps 83% of the evaluation dataset. Moreover, our experiments show that the current state-of-the-art on program repair, greatly dependent on the programs' structure, can only repair about 72% of the incorrect programs. In contrast, our approach, which is solely based on variable mappings, can repair around 88.5%.

eess.IV - 2023-07-25

Towards Unifying Anatomy Segmentation: Automated Generation of a Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines

  • paper_url: http://arxiv.org/abs/2307.13375
  • repo_url: https://github.com/alexanderjaus/atlasdataset
  • paper_authors: Alexander Jaus, Constantin Seibold, Kelsey Hermann, Alexandra Walter, Kristina Giske, Johannes Haubold, Jens Kleesiek, Rainer Stiefelhagen
  • for: The work presents a method for generating automated anatomy segmentation datasets using a sequential process of nnU-Net-based pseudo-labeling and anatomy-guided pseudo-label refinement.
  • methods: By combining various fragmented knowledge bases, the method generates a whole-body CT dataset with 142 voxel-level labels for 533 volumes, providing comprehensive, expert-approved anatomical coverage.
  • results: The procedure requires no manual annotation during the label-aggregation stage and achieves an 85% Dice score on the BTCV dataset without using its training data; plausibility is further supported by medical validity checks and scalable automated checks.
    Abstract In this study, we present a method for generating automated anatomy segmentation datasets using a sequential process that involves nnU-Net-based pseudo-labeling and anatomy-guided pseudo-label refinement. By combining various fragmented knowledge bases, we generate a dataset of whole-body CT scans with $142$ voxel-level labels for 533 volumes providing comprehensive anatomical coverage which experts have approved. Our proposed procedure does not rely on manual annotation during the label aggregation stage. We examine its plausibility and usefulness using three complementary checks: Human expert evaluation which approved the dataset, a Deep Learning usefulness benchmark on the BTCV dataset in which we achieve 85% dice score without using its training dataset, and medical validity checks. This evaluation procedure combines scalable automated checks with labor-intensive high-quality expert checks. Besides the dataset, we release our trained unified anatomical segmentation model capable of predicting $142$ anatomical structures on CT data.

Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks

  • paper_url: http://arxiv.org/abs/2307.13337
  • repo_url: None
  • paper_authors: Cheeun Hong, Kyoung Mu Lee
  • for: This paper proposes a new quantization framework that resolves the distribution mismatch problem in image super-resolution (SR) networks, improving accuracy after quantization.
  • methods: The framework, named ODM, directly regularizes feature distributions during training to reduce the mismatch, but only when the gradients of the variance regularization cooperate with those of the reconstruction loss; it additionally introduces distribution offsets that scale or shift channel-wise features in layers with significant mismatch.
  • results: Experiments show that ODM quantizes SR networks effectively, outperforming existing SR quantization methods with similar or fewer computations and better preserving accuracy by reducing the distribution mismatch.
    Abstract Quantization is a promising approach to reduce the high computational complexity of image super-resolution (SR) networks. However, compared to high-level tasks like image classification, low-bit quantization leads to severe accuracy loss in SR networks. This is because feature distributions of SR networks are significantly divergent for each channel or input image, and is thus difficult to determine a quantization range. Existing SR quantization works approach this distribution mismatch problem by dynamically adapting quantization ranges to the variant distributions during test time. However, such dynamic adaptation incurs additional computational costs that limit the benefits of quantization. Instead, we propose a new quantization-aware training framework that effectively Overcomes the Distribution Mismatch problem in SR networks without the need for dynamic adaptation. Intuitively, the mismatch can be reduced by directly regularizing the variance in features during training. However, we observe that variance regularization can collide with the reconstruction loss during training and adversely impact SR accuracy. Thus, we avoid the conflict between two losses by regularizing the variance only when the gradients of variance regularization are cooperative with that of reconstruction. Additionally, to further reduce the distribution mismatch, we introduce distribution offsets to layers with a significant mismatch, which either scales or shifts channel-wise features. Our proposed algorithm, called ODM, effectively reduces the mismatch in distributions with minimal computational overhead. Experimental results show that ODM effectively outperforms existing SR quantization approaches with similar or fewer computations, demonstrating the importance of reducing the distribution mismatch problem. Our code is available at https://github.com/Cheeun/ODM.
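
The distinctive ingredient, applying variance regularization only when its gradient cooperates with the reconstruction gradient, can be sketched with two backward passes and a dot-product test. The toy network and the cooperation test on flattened gradients are illustrative assumptions:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 3, 3, padding=1))
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
lr_img, hr_img = torch.rand(4, 3, 24, 24), torch.rand(4, 3, 24, 24)

feats = {}
net[0].register_forward_hook(lambda m, i, o: feats.update(h=o))  # keep features

def flat_grads(loss):
    gs = torch.autograd.grad(loss, list(net.parameters()),
                             retain_graph=True, allow_unused=True)
    return torch.cat([(g if g is not None else torch.zeros_like(p)).reshape(-1)
                      for g, p in zip(gs, net.parameters())])

sr = net(lr_img)
recon_loss = nn.functional.l1_loss(sr, hr_img)
var_loss = feats["h"].var(dim=(0, 2, 3)).mean()   # per-channel feature variance

g_rec, g_var = flat_grads(recon_loss), flat_grads(var_loss)
# Regularize variance only when its gradient direction does not oppose the
# reconstruction gradient, avoiding the conflict described in the abstract.
coeff = 0.1 if torch.dot(g_rec, g_var) >= 0 else 0.0
total = recon_loss + coeff * var_loss

opt.zero_grad(); total.backward(); opt.step()
```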

A Visual Quality Assessment Method for Raster Images in Scanned Document

  • paper_url: http://arxiv.org/abs/2307.13241
  • repo_url: None
  • paper_authors: Justin Yang, Peter Bauer, Todd Harris, Changhyung Lee, Hyeon Seok Seo, Jan P Allebach, Fengqing Zhu
  • for: This study examines the visual quality of scanned documents, focusing on raster image regions.
  • methods: The authors propose a machine-learning classification method to determine whether the visual quality of a scanned raster image at a given resolution setting is acceptable.
  • results: A psychophysical study establishes acceptability at different resolutions from human subject ratings, which serve as ground truth for training. Because most images were rated visually acceptable, the dataset is unbalanced; several noise models are therefore introduced to simulate degradation during scanning, and including this augmented data in training significantly improves the classifier's performance.
    Abstract Image quality assessment (IQA) is an active research area in the field of image processing. Most prior works focus on visual quality of natural images captured by cameras. In this paper, we explore visual quality of scanned documents, focusing on raster image areas. Different from many existing works which aim to estimate a visual quality score, we propose a machine learning based classification method to determine whether the visual quality of a scanned raster image at a given resolution setting is acceptable. We conduct a psychophysical study to determine the acceptability at different image resolutions based on human subject ratings and use them as the ground truth to train our machine learning model. However, this dataset is unbalanced as most images were rated as visually acceptable. To address the data imbalance problem, we introduce several noise models to simulate the degradation of image quality during the scanning process. Our results show that by including augmented data in training, we can significantly improve the performance of the classifier to determine whether the visual quality of raster images in a scanned document is acceptable or not for a given resolution setting.
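
The rebalancing idea can be sketched by degrading acceptable crops to synthesize the minority class. Generic blur and Gaussian noise stand in for the paper's scanner-specific noise models:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def degrade(img, noise_sigma=0.08, blur_sigma=1.5):
    """Simulate scan-quality degradation: blur followed by additive noise.
    These are generic stand-ins for the paper's noise models."""
    out = gaussian_filter(img.astype(float), sigma=blur_sigma)
    out += rng.normal(0.0, noise_sigma, size=out.shape)
    return np.clip(out, 0.0, 1.0)

# Toy stand-ins: most human-rated crops are "acceptable" (label 1), few are not.
acceptable = [rng.random((64, 64)) for _ in range(100)]
unacceptable = [rng.random((64, 64)) * 0.4 for _ in range(10)]

# Rebalance: synthesize extra "unacceptable" samples by degrading clean ones.
while len(unacceptable) < len(acceptable):
    unacceptable.append(degrade(acceptable[rng.integers(len(acceptable))]))

X = np.stack(acceptable + unacceptable).reshape(-1, 64 * 64)
y = np.array([1] * len(acceptable) + [0] * len(unacceptable))
clf = LogisticRegression(max_iter=1000).fit(X, y)
```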

One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction

  • paper_url: http://arxiv.org/abs/2307.13220
  • repo_url: https://github.com/wangziblake/pisf
  • paper_authors: Zi Wang, Xiaotong Yu, Chengyan Wang, Weibo Chen, Jiazheng Wang, Ying-Hua Chu, Hongwei Sun, Rushuai Li, Peiyong Li, Fan Yang, Haiwei Han, Taishan Kang, Jianzhong Lin, Chen Yang, Shufu Chang, Zhang Shi, Sha Hua, Yan Li, Juan Hu, Liuhong Zhu, Jianjun Zhou, Meijing Lin, Jiefeng Guo, Congbo Cai, Zhong Chen, Di Guo, Xiaobo Qu
  • for: This paper aims to shorten MRI scan times by using deep learning (DL) for image reconstruction, addressing the largely untapped potential of DL across multiple imaging scenarios.
  • methods: The study presents a Physics-Informed Synthetic data learning framework (PISF) that enables generalizable multi-scenario MRI reconstruction with a single trained model; 2D reconstruction is decomposed into many 1D basic problems, starting from 1D data synthesis, to facilitate generalization.
  • results: Training DL models on synthetic data, combined with enhanced learning techniques, achieves comparable or even better in vivo reconstruction than models trained on matched realistic datasets, reducing the demand for real-world MRI data by up to 96%. PISF also shows impressive generalizability in multi-vendor, multi-center imaging, and its adaptability to patients was verified through evaluations by 10 experienced doctors.
    Abstract Magnetic resonance imaging (MRI) is a principal radiological modality that provides radiation-free, abundant, and diverse information about the whole human body for medical diagnosis, but suffers from prolonged scan time. The scan time can be significantly reduced through k-space undersampling but the introduced artifacts need to be removed in image reconstruction. Although deep learning (DL) has emerged as a powerful tool for image reconstruction in fast MRI, its potential in multiple imaging scenarios remains largely untapped. This is because not only collecting large-scale and diverse realistic training data is generally costly and privacy-restricted, but also existing DL methods are hard to handle the practically inevitable mismatch between training and target data. Here, we present a Physics-Informed Synthetic data learning framework for Fast MRI, called PISF, which is the first to enable generalizable DL for multi-scenario MRI reconstruction using solely one trained model. For a 2D image, the reconstruction is separated into many 1D basic problems and starts with the 1D data synthesis, to facilitate generalization. We demonstrate that training DL models on synthetic data, integrated with enhanced learning techniques, can achieve comparable or even better in vivo MRI reconstruction compared to models trained on a matched realistic dataset, reducing the demand for real-world MRI data by up to 96%. Moreover, our PISF shows impressive generalizability in multi-vendor multi-center imaging. Its excellent adaptability to patients has been verified through 10 experienced doctors' evaluations. PISF provides a feasible and cost-effective way to markedly boost the widespread usage of DL in various fast MRI applications, while freeing from the intractable ethical and practical considerations of in vivo human data acquisitions.

Magnetic Resonance Parameter Mapping using Self-supervised Deep Learning with Model Reinforcement

  • paper_url: http://arxiv.org/abs/2307.13211
  • repo_url: None
  • paper_authors: Wanyu Bian, Albert Jang, Fang Liu
  • for: This paper proposes a novel self-supervised learning method, RELAX-MORE, for quantitative MRI (qMRI) reconstruction.
  • methods: The method uses an optimization algorithm to unroll a model-based qMRI reconstruction into a deep learning framework, producing highly accurate and robust MR parameter maps at imaging acceleration.
  • results: Across brain, knee, and phantom experiments, the method efficiently reconstructs MR parameter maps, corrects imaging artifacts, removes noise, and recovers image features under imperfect imaging conditions. Compared with other state-of-the-art conventional and deep learning methods, RELAX-MORE significantly improves efficiency, accuracy, robustness, and generalizability, showing strong potential for the clinical translation of qMRI.
    Abstract This paper proposes a novel self-supervised learning method, RELAX-MORE, for quantitative MRI (qMRI) reconstruction. The proposed method uses an optimization algorithm to unroll a model-based qMRI reconstruction into a deep learning framework, enabling the generation of highly accurate and robust MR parameter maps at imaging acceleration. Unlike conventional deep learning methods requiring a large amount of training data, RELAX-MORE is a subject-specific method that can be trained on single-subject data through self-supervised learning, making it accessible and practically applicable to many qMRI studies. Using the quantitative $T_1$ mapping as an example at different brain, knee and phantom experiments, the proposed method demonstrates excellent performance in reconstructing MR parameters, correcting imaging artifacts, removing noises, and recovering image features at imperfect imaging conditions. Compared with other state-of-the-art conventional and deep learning methods, RELAX-MORE significantly improves efficiency, accuracy, robustness, and generalizability for rapid MR parameter mapping. This work demonstrates the feasibility of a new self-supervised learning method for rapid MR parameter mapping, with great potential to enhance the clinical translation of qMRI.

Deep Learning Approaches for Data Augmentation in Medical Imaging: A Review

  • paper_url: http://arxiv.org/abs/2307.13125
  • repo_url: https://github.com/Arminsbss/tumor-classification
  • paper_authors: Aghiles Kebaili, Jérôme Lapuyade-Lahorgue, Su Ruan
  • for: This review addresses the limited availability of training data for deep learning in medical image analysis and the use of deep generative models to synthesize more realistic and diverse data that conform to the true data distribution.
  • methods: It focuses on three types of deep generative models for medical image augmentation: variational autoencoders, generative adversarial networks, and diffusion models.
  • results: The review surveys the current state of the art for each model family and their potential in downstream tasks such as classification, segmentation, and cross-modal translation; it also evaluates the strengths and limitations of each model and suggests directions for future research.
    Abstract Deep learning has become a popular tool for medical image analysis, but the limited availability of training data remains a major challenge, particularly in the medical field where data acquisition can be costly and subject to privacy regulations. Data augmentation techniques offer a solution by artificially increasing the number of training samples, but these techniques often produce limited and unconvincing results. To address this issue, a growing number of studies have proposed the use of deep generative models to generate more realistic and diverse data that conform to the true distribution of the data. In this review, we focus on three types of deep generative models for medical image augmentation: variational autoencoders, generative adversarial networks, and diffusion models. We provide an overview of the current state of the art in each of these models and discuss their potential for use in different downstream tasks in medical imaging, including classification, segmentation, and cross-modal translation. We also evaluate the strengths and limitations of each model and suggest directions for future research in this field. Our goal is to provide a comprehensive review about the use of deep generative models for medical image augmentation and to highlight the potential of these models for improving the performance of deep learning algorithms in medical image analysis.

In-Situ Thickness Measurement of Die Silicon Using Voltage Imaging for Hardware Assurance

  • paper_url: http://arxiv.org/abs/2307.13118
  • repo_url: None
  • paper_authors: Olivia P. Dizon-Paradis, Nitin Varshney, M Tanjidur Rahman, Michael Strizich, Haoting Shen, Navid Asadizanjani
  • for: This paper proposes a rapid thickness-measurement method based on electron beam voltage imaging, image processing, and Monte Carlo simulation to keep layer thickness uniform during delayering.
  • methods: The method measures the thickness of the remaining silicon using electron beam voltage imaging, image processing, and Monte Carlo simulation, enabling real-time monitoring and adjustment during delayering.
  • results: The method can measure silicon thickness quickly and accurately in situ, guiding a uniform delayering process and improving its accuracy and efficiency.
    Abstract Hardware assurance of electronics is a challenging task and is of great interest to the government and the electronics industry. Physical inspection-based methods such as reverse engineering (RE) and Trojan scanning (TS) play an important role in hardware assurance. Therefore, there is a growing demand for automation in RE and TS. Many state-of-the-art physical inspection methods incorporate an iterative imaging and delayering workflow. In practice, uniform delayering can be challenging if the thickness of the initial layer of material is non-uniform. Moreover, this non-uniformity can reoccur at any stage during delayering and must be corrected. Therefore, it is critical to evaluate the thickness of the layers to be removed in a real-time fashion. Our proposed method uses electron beam voltage imaging, image processing, and Monte Carlo simulation to measure the thickness of remaining silicon to guide a uniform delayering process.

Automatic Infant Respiration Estimation from Video: A Deep Flow-based Algorithm and a Novel Public Benchmark

  • paper_url: http://arxiv.org/abs/2307.13110
  • repo_url: https://github.com/ostadabbas/infant-respiration-estimation
  • paper_authors: Sai Kumar Reddy Manne, Shaotong Zhu, Sarah Ostadabbas, Michael Wan
  • for: This paper targets automatic, contactless respiratory monitoring for newborns.
  • methods: A deep-learning approach estimates an infant's respiratory rate and waveform from plain video footage captured in natural settings.
  • results: Trained and tested on the AIR-125 infant dataset, the proposed AIRFlowNet model significantly outperforms other state-of-the-art methods in respiratory rate estimation, with a mean absolute error of $\sim$2.9 breaths per minute.
    Abstract Respiration is a critical vital sign for infants, and continuous respiratory monitoring is particularly important for newborns. However, neonates are sensitive and contact-based sensors present challenges in comfort, hygiene, and skin health, especially for preterm babies. As a step toward fully automatic, continuous, and contactless respiratory monitoring, we develop a deep-learning method for estimating respiratory rate and waveform from plain video footage in natural settings. Our automated infant respiration flow-based network (AIRFlowNet) combines video-extracted optical flow input and spatiotemporal convolutional processing tuned to the infant domain. We support our model with the first public annotated infant respiration dataset with 125 videos (AIR-125), drawn from eight infant subjects, set varied pose, lighting, and camera conditions. We include manual respiration annotations and optimize AIRFlowNet training on them using a novel spectral bandpass loss function. When trained and tested on the AIR-125 infant data, our method significantly outperforms other state-of-the-art methods in respiratory rate estimation, achieving a mean absolute error of $\sim$2.9 breaths per minute, compared to $\sim$4.7--6.2 for other public models designed for adult subjects and more uniform environments.
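
The spectral bandpass loss can be sketched as penalizing predicted-waveform energy outside a plausible respiration band. The band limits below (0.33 to 1.67 Hz, roughly 20 to 100 breaths per minute) and the exact loss form are assumptions, since the paper defines its own variant:

```python
import torch

def spectral_bandpass_loss(pred_wave, fs, lo_hz=0.33, hi_hz=1.67):
    """Penalize the fraction of spectral energy in the predicted respiration
    waveform that falls outside [lo_hz, hi_hz].
    pred_wave: (batch, time), fs: sampling rate in Hz."""
    spec = torch.fft.rfft(pred_wave, dim=-1).abs() ** 2      # power spectrum
    freqs = torch.fft.rfftfreq(pred_wave.shape[-1], d=1.0 / fs)
    out_of_band = (freqs < lo_hz) | (freqs > hi_hz)
    return spec[:, out_of_band].sum(-1) / (spec.sum(-1) + 1e-8)

# Example: a clean 0.8 Hz waveform scores near 0, broadband noise near 1.
fs = 30.0                                    # typical video frame rate
t = torch.arange(0, 30.0, 1 / fs)
clean = torch.sin(2 * torch.pi * 0.8 * t)[None]
noise = torch.randn_like(clean)
print(spectral_bandpass_loss(clean, fs).item(),
      spectral_bandpass_loss(noise, fs).item())
```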

Framework for Automatic PCB Marking Detection and Recognition for Hardware Assurance

  • paper_url: http://arxiv.org/abs/2307.13105
  • repo_url: None
  • paper_authors: Olivia P. Dizon-Paradis, Daniel E. Capecci, Nathan T. Jessurun, Damon L. Woodard, Mark M. Tehranipoor, Navid Asadizanjani
  • for: This study aims to enable automatic extraction of printed circuit board (PCB) markings, supporting high-accuracy automated hardware assurance for government and the electronics industry.
  • methods: The study proposes a plan for collecting salient PCB marking data, together with a framework for incorporating this data into automatic PCB assurance.
  • results: The proposed dataset plan and framework can improve the accuracy of automated hardware assurance; subsequent future work, implications, and open research possibilities are detailed.
    Abstract A Bill of Materials (BoM) is a list of all components on a printed circuit board (PCB). Since BoMs are useful for hardware assurance, automatic BoM extraction (AutoBoM) is of great interest to the government and electronics industry. To achieve a high-accuracy AutoBoM process, domain knowledge of PCB text and logos must be utilized. In this study, we discuss the challenges associated with automatic PCB marking extraction and propose 1) a plan for collecting salient PCB marking data, and 2) a framework for incorporating this data for automatic PCB assurance. Given the proposed dataset plan and framework, subsequent future work, implications, and open research possibilities are detailed.

Enhancing image captioning with depth information using a Transformer-based framework

  • paper_url: http://arxiv.org/abs/2308.03767
  • repo_url: None
  • paper_authors: Aya Mahmoud Ahmed, Mohamed Yousef, Khaled F. Hussain, Yousef Bassyouni Mahdy
  • for: This paper aims to improve scene understanding in image captioning by integrating RGB images with their corresponding depth maps in a Transformer-based encoder-decoder framework that generates multi-sentence descriptions of 3D scenes.
  • methods: The proposed method fuses RGB images and depth maps, exploring different fusion approaches to achieve the best results, and uses a Transformer architecture to generate multi-sentence descriptions.
  • results: Experiments show that fusing RGB and depth improves captioning whether the depth maps are ground truth or estimated. The authors also identify inconsistent labeling in the NYU-v2 dataset and propose a cleaned, more consistent version to address it.
    Abstract Captioning images is a challenging scene-understanding task that connects computer vision and natural language processing. While image captioning models have been successful in producing excellent descriptions, the field has primarily focused on generating a single sentence for 2D images. This paper investigates whether integrating depth information with RGB images can enhance the captioning task and generate better descriptions. For this purpose, we propose a Transformer-based encoder-decoder framework for generating a multi-sentence description of a 3D scene. The RGB image and its corresponding depth map are provided as inputs to our framework, which combines them to produce a better understanding of the input scene. Depth maps could be ground truth or estimated, which makes our framework widely applicable to any RGB captioning dataset. We explored different fusion approaches to fuse RGB and depth images. The experiments are performed on the NYU-v2 dataset and the Stanford image paragraph captioning dataset. During our work with the NYU-v2 dataset, we found inconsistent labeling that prevents the benefit of using depth information to enhance the captioning task. The results were even worse than using RGB images only. As a result, we propose a cleaned version of the NYU-v2 dataset that is more consistent and informative. Our results on both datasets demonstrate that the proposed framework effectively benefits from depth information, whether it is ground truth or estimated, and generates better captions. Code, pre-trained models, and the cleaned version of the NYU-v2 dataset will be made publically available.
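
Of the fusion approaches one could explore, the simplest to sketch is early fusion: concatenating the depth map as a fourth input channel before the encoder. Layer sizes are placeholders, and the paper compares several alternatives rather than prescribing this one:

```python
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    """Concatenate RGB (3 ch) and depth (1 ch) into a 4-channel input.
    A placeholder stand-in for the paper's Transformer encoder stem."""
    def __init__(self, d_model=256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(4, 64, 7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1),
        )

    def forward(self, rgb, depth):
        x = torch.cat([rgb, depth], dim=1)        # (B, 4, H, W)
        feats = self.stem(x)                      # (B, d_model, H', W')
        return feats.flatten(2).transpose(1, 2)   # (B, tokens, d_model)

enc = EarlyFusionEncoder()
tokens = enc(torch.rand(2, 3, 224, 224), torch.rand(2, 1, 224, 224))
print(tokens.shape)   # torch.Size([2, 784, 256]), ready for a Transformer decoder
```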

cs.SD - 2023-07-24

An objective evaluation of Hearing Aids and DNN-based speech enhancement in complex acoustic scenes

  • paper_url: http://arxiv.org/abs/2307.12888
  • repo_url: https://github.com/enricguso/guso_waspaa23
  • paper_authors: Enric Gusó, Joanna Luberadzka, Martí Baig, Umut Sayin Saraç, Xavier Serra
  • for: To objectively evaluate five high-end commercially available hearing aid (HA) devices and compare them with DNN-based speech enhancement algorithms in complex acoustic environments.
  • methods: The HRTFs of a single HA device are measured to synthesize a binaural dataset for training two state-of-the-art causal and non-causal DNN enhancement models. An evaluation set of realistic speech-in-noise situations is then generated with an Ambisonics loudspeaker setup and recorded with a KU100 dummy head wearing each HA device, with and without the conventional HA algorithms, applying the DNN enhancers to the latter.
  • results: The DNN-based enhancement outperforms the conventional HA algorithms in terms of noise suppression and objective intelligibility metrics.
    Abstract We investigate the objective performance of five high-end commercially available Hearing Aid (HA) devices compared to DNN-based speech enhancement algorithms in complex acoustic environments. To this end, we measure the HRTFs of a single HA device to synthesize a binaural dataset for training two state-of-the-art causal and non-causal DNN enhancement models. We then generate an evaluation set of realistic speech-in-noise situations using an Ambisonics loudspeaker setup and record with a KU100 dummy head wearing each of the HA devices, both with and without the conventional HA algorithms, applying the DNN enhancers to the latter. We find that the DNN-based enhancement outperforms the HA algorithms in terms of noise suppression and objective intelligibility metrics.

Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains

  • paper_url: http://arxiv.org/abs/2307.13012
  • repo_url: None
  • paper_authors: Martin Lebourdais, Théo Mariotte, Marie Tahon, Anthony Larcher, Antoine Laurent, Silvio Montresor, Sylvain Meignier, Jean-Hugh Thomas
  • for: This work provides a complete new benchmark of voice activity detection (VAD) and overlapped speech detection (OSD) models across multiple audio setups (single/multi-channel) and speech domains (e.g. media, meetings).
  • methods: The study uses a multi-class classification model, combining a Temporal Convolutional Network with speech representations adapted to the setup, to train VAD and OSD jointly.
  • results: The 2/3-class systems outperform state-of-the-art results across domains and channel setups; joint training matches the F1-score of two dedicated VAD and OSD systems while reducing the training cost, and the architecture supports both single- and multi-channel speech processing.
    Abstract Voice activity and overlapped speech detection (respectively VAD and OSD) are key pre-processing tasks for speaker diarization. The final segmentation performance highly relies on the robustness of these sub-tasks. Recent studies have shown VAD and OSD can be trained jointly using a multi-class classification model. However, these works are often restricted to a specific speech domain, lacking information about the generalization capacities of the systems. This paper proposes a complete and new benchmark of different VAD and OSD models, on multiple audio setups (single/multi-channel) and speech domains (e.g. media, meeting...). Our 2/3-class systems, which combine a Temporal Convolutional Network with speech representations adapted to the setup, outperform state-of-the-art results. We show that the joint training of these two tasks offers similar performances in terms of F1-score to two dedicated VAD and OSD systems while reducing the training cost. This unique architecture can also be used for single and multichannel speech processing.
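
The joint formulation can be sketched as a 3-class frame classifier (non-speech / single speaker / overlapped speech) on top of a small temporal convolutional network; channel counts and the feature front-end below are placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    def __init__(self, ch, dilation):
        super().__init__()
        self.conv = nn.Conv1d(ch, ch, 3, padding=dilation, dilation=dilation)
        self.norm = nn.BatchNorm1d(ch)

    def forward(self, x):
        return x + torch.relu(self.norm(self.conv(x)))   # residual, same length

class JointVadOsd(nn.Module):
    """3 classes per frame: 0 = non-speech, 1 = one speaker, 2 = overlap.
    VAD score = P(1) + P(2); OSD score = P(2)."""
    def __init__(self, n_feats=40, ch=64):
        super().__init__()
        self.inp = nn.Conv1d(n_feats, ch, 1)
        self.tcn = nn.Sequential(*[TCNBlock(ch, 2 ** i) for i in range(4)])
        self.head = nn.Conv1d(ch, 3, 1)

    def forward(self, feats):                            # feats: (B, n_feats, T)
        return self.head(self.tcn(self.inp(feats)))      # (B, 3, T) logits

model = JointVadOsd()
logits = model(torch.randn(2, 40, 500))
probs = logits.softmax(dim=1)
speech = probs[:, 1] + probs[:, 2]            # frame-level VAD score
overlap = probs[:, 2]                         # frame-level OSD score
```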

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition

  • paper_url: http://arxiv.org/abs/2307.12767
  • repo_url: None
  • paper_authors: Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe
  • for: This paper aims to improve the accuracy and robustness of streaming automatic speech recognition, especially in out-of-domain situations.
  • methods: The paper integrates frame-synchronous (F-Sync) and label-synchronous (L-Sync) decoding, performed alternately within a single beam-search scheme: F-Sync decoding drives block-wise processing, while L-Sync decoding provides prioritized hypotheses using look-ahead future frames within a block.
  • results: Experiments show that the proposed search algorithm achieves lower error rates than other search methods while remaining robust in out-of-domain situations.
    Abstract Although frame-based models, such as CTC and transducers, have an affinity for streaming automatic speech recognition, their decoding uses no future knowledge, which could lead to incorrect pruning. Conversely, label-based attention encoder-decoder mitigates this issue using soft attention to the input, while it tends to overestimate labels biased towards its training domain, unlike CTC. We exploit these complementary attributes and propose to integrate the frame- and label-synchronous (F-/L-Sync) decoding alternately performed within a single beam-search scheme. F-Sync decoding leads the decoding for block-wise processing, while L-Sync decoding provides the prioritized hypotheses using look-ahead future frames within a block. We maintain the hypotheses from both decoding methods to perform effective pruning. Experiments demonstrate that the proposed search algorithm achieves lower error rates compared to the other search methods, while being robust against out-of-domain situations.

Code-Switched Urdu ASR for Noisy Telephonic Environment using Data Centric Approach with Hybrid HMM and CNN-TDNN

  • paper_url: http://arxiv.org/abs/2307.12759
  • repo_url: https://github.com/sage-khan/code-switched-noisy-urdu-asr
  • paper_authors: Muhammad Danyal Khan, Raheem Ali, Arshad Aziz
  • For: The paper aims to develop a resource-efficient Automatic Speech Recognition (ASR) system for code-switched Urdu language in a noisy call-center environment.
  • Methods: The proposed system uses a Chain Hybrid HMM and CNN-TDNN approach, which combines the advantages of HMM and DNN models with less labelled data. The system also utilizes a noisy environment-aware CNN to improve accuracy.
  • Results: The proposed system achieves a Word Error Rate (WER) of 5.2% in both noisy and clean environments, outperforming other ASR systems for code-switched Urdu language. The system also shows improved performance in recognizing isolated words, numbers, and continuous spontaneous speech.
    Abstract Call Centers have huge amount of audio data which can be used for achieving valuable business insights and transcription of phone calls is manually tedious task. An effective Automated Speech Recognition system can accurately transcribe these calls for easy search through call history for specific context and content allowing automatic call monitoring, improving QoS through keyword search and sentiment analysis. ASR for Call Center requires more robustness as telephonic environment are generally noisy. Moreover, there are many low-resourced languages that are on verge of extinction which can be preserved with help of Automatic Speech Recognition Technology. Urdu is the $10^{th}$ most widely spoken language in the world, with 231,295,440 worldwide still remains a resource constrained language in ASR. Regional call-center conversations operate in local language, with a mix of English numbers and technical terms generally causing a "code-switching" problem. Hence, this paper describes an implementation framework of a resource efficient Automatic Speech Recognition/ Speech to Text System in a noisy call-center environment using Chain Hybrid HMM and CNN-TDNN for Code-Switched Urdu Language. Using Hybrid HMM-DNN approach allowed us to utilize the advantages of Neural Network with less labelled data. Adding CNN with TDNN has shown to work better in noisy environment due to CNN's additional frequency dimension which captures extra information from noisy speech, thus improving accuracy. We collected data from various open sources and labelled some of the unlabelled data after analysing its general context and content from Urdu language as well as from commonly used words from other languages, primarily English and were able to achieve WER of 5.2% with noisy as well as clean environment in isolated words or numbers as well as in continuous spontaneous speech.

IteraTTA: An interface for exploring both text prompts and audio priors in generating music with text-to-audio models

  • paper_url: http://arxiv.org/abs/2307.13005
  • repo_url: None
  • paper_authors: Hiromu Yakura, Masataka Goto
  • for: To help novice users freely generate music audio, even without musical knowledge such as chord progressions and instruments.
  • methods: The work builds on text-to-audio generation and provides a dedicated interface, IteraTTA, that helps users iteratively refine text prompts and select favorable audio priors from the generated audios.
  • results: Through this dual-sided exploration, users can discern the impact of different text prompts and audio priors on the generated results and progressively reach their loosely-specified goals.
    Abstract Recent text-to-audio generation techniques have the potential to allow novice users to freely generate music audio. Even if they do not have musical knowledge, such as about chord progressions and instruments, users can try various text prompts to generate audio. However, compared to the image domain, gaining a clear understanding of the space of possible music audios is difficult because users cannot listen to the variations of the generated audios simultaneously. We therefore facilitate users in exploring not only text prompts but also audio priors that constrain the text-to-audio music generation process. This dual-sided exploration enables users to discern the impact of different text prompts and audio priors on the generation results through iterative comparison of them. Our developed interface, IteraTTA, is specifically designed to aid users in refining text prompts and selecting favorable audio priors from the generated audios. With this, users can progressively reach their loosely-specified goals while understanding and exploring the space of possible results. Our implementation and discussions highlight design considerations that are specifically required for text-to-audio models and how interaction techniques can contribute to their effectiveness.

A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization

  • paper_url: http://arxiv.org/abs/2307.12659
  • repo_url: None
  • paper_authors: Edward Fish, Umberto Michieli, Mete Ozay
  • for: This paper proposes a personalizable quantization method to ease the deployment of automatic speech recognition (ASR) models on mobile devices.
  • methods: The proposed myQASR is a mixed-precision quantization method that generates tailored quantization schemes for diverse users and target domains under any memory requirement, with no fine-tuning. myQASR evaluates the quantization sensitivity of network layers by analysing full-precision activation values, then generates a personalized mixed-precision quantization scheme.
  • results: Results on large-scale ASR models show that myQASR improves performance for specific genders, languages, and speakers.
    Abstract Recent advancement in Automatic Speech Recognition (ASR) has produced large AI models, which become impractical for deployment in mobile devices. Model quantization is effective to produce compressed general-purpose models, however such models may only be deployed to a restricted sub-domain of interest. We show that ASR models can be personalized during quantization while relying on just a small set of unlabelled samples from the target domain. To this end, we propose myQASR, a mixed-precision quantization method that generates tailored quantization schemes for diverse users under any memory requirement with no fine-tuning. myQASR automatically evaluates the quantization sensitivity of network layers by analysing the full-precision activation values. We are then able to generate a personalised mixed-precision quantization scheme for any pre-determined memory budget. Results for large-scale ASR models show how myQASR improves performance for specific genders, languages, and speakers.
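
The label-free sensitivity probe can be sketched as: run a few unlabelled target-domain samples through the full-precision model, collect per-layer activation statistics, and give more bits to the layers whose activations are hardest to quantize, subject to a memory budget. The statistic (dynamic range) and the greedy allocation rule below are illustrative assumptions:

```python
def assign_bitwidths(act_stats, layer_sizes, budget_bits):
    """Greedy mixed-precision allocation: every layer starts at 4 bits, then
    layers with the widest activation range are upgraded while the total
    model size stays under `budget_bits`."""
    bits = {name: 4 for name in act_stats}
    used = sum(4 * layer_sizes[n] for n in act_stats)
    # Most quantization-sensitive first (largest dynamic range, as a proxy).
    for name in sorted(act_stats, key=act_stats.get, reverse=True):
        for b in (6, 8):
            extra = (b - bits[name]) * layer_sizes[name]
            if used + extra <= budget_bits:
                used += extra
                bits[name] = b
    return bits

# Stand-ins: per-layer activation ranges from a few *unlabelled* user samples,
# and per-layer parameter counts of a hypothetical ASR model.
act_stats = {"enc.0": 12.7, "enc.1": 3.1, "enc.2": 48.0, "dec.0": 7.9}
layer_sizes = {"enc.0": 1_000_000, "enc.1": 1_000_000,
               "enc.2": 1_000_000, "dec.0": 500_000}

full_precision = 32 * sum(layer_sizes.values())
print(assign_bitwidths(act_stats, layer_sizes, budget_bits=full_precision // 5))
```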

Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training

  • paper_url: http://arxiv.org/abs/2307.12498
  • repo_url: https://github.com/WAPATASR/WAPAT
  • paper_authors: Gege Qi, Yuefeng Chen, Xiaofeng Mao, Xiaojun Jia, Ranjie Duan, Rong Zhang, Hui Xue
  • for: To improve the robustness of automatic speech recognition (ASR) models under small perturbations and large domain shifts.
  • methods: The proposed wapat uses adversarial examples in phoneme space as augmentation, making the model invariant to minor fluctuations in phoneme representation while preserving performance on clean samples; it further uses the phoneme representation of augmented samples to guide adversary generation, finding more stable and diverse gradient directions and improving generalization.
  • results: On the End-to-end Speech Challenge Benchmark (ESB), SpeechLM-wapat reduces WER by 6.28% compared to the original model, achieving a new state of the art.
    Abstract Developing a practically-robust automatic speech recognition (ASR) is challenging since the model should not only maintain the original performance on clean samples, but also achieve consistent efficacy under small volume perturbations and large domain shifts. To address this problem, we propose a novel WavAugment Guided Phoneme Adversarial Training (wapat). wapat use adversarial examples in phoneme space as augmentation to make the model invariant to minor fluctuations in phoneme representation and preserve the performance on clean samples. In addition, wapat utilizes the phoneme representation of augmented samples to guide the generation of adversaries, which helps to find more stable and diverse gradient-directions, resulting in improved generalization. Extensive experiments demonstrate the effectiveness of wapat on End-to-end Speech Challenge Benchmark (ESB). Notably, SpeechLM-wapat outperforms the original model by 6.28% WER reduction on ESB, achieving the new state-of-the-art.
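
As a concrete illustration of the training signal described above, here is a minimal PGD-style sketch of generating adversaries in a phoneme-representation space, with the representation of a WavAugment-ed copy acting as the guide. The hooks model.phoneme_repr and model.loss, the quadratic guidance penalty, and all hyper-parameters are assumptions for illustration; the authors' repository (repo_url above) is the authoritative implementation.

```python
import torch

def guided_phoneme_adversary(model, audio, aug_audio, labels,
                             eps=0.01, alpha=0.005, steps=3, guide_w=0.1):
    """Sketch: perturb the phoneme representation to raise the task loss,
    while a penalty keeps the perturbed point close to the representation
    of the WavAugment-ed copy (the guide)."""
    with torch.no_grad():
        guide = model.phoneme_repr(aug_audio)   # representation of augmented copy
    base = model.phoneme_repr(audio).detach()
    delta = torch.zeros_like(base, requires_grad=True)
    for _ in range(steps):
        adv = base + delta
        loss = model.loss(adv, labels) - guide_w * (adv - guide).pow(2).mean()
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascend the adversarial objective
            delta.clamp_(-eps, eps)             # stay in a small L-inf ball
            delta.grad.zero_()
    return (base + delta).detach()              # feed back as a training example
```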

SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces

  • paper_url: http://arxiv.org/abs/2307.12445
  • repo_url: None
  • paper_authors: Ivan Vallés-Pérez, Grzegorz Beringer, Piotr Bilinski, Gary Cook, Roberto Barra-Chicote
  • for: This paper aims to learn shared representations of phonetic and acoustic spaces in the speech domain using a CLIP-based model.
  • methods: The proposed model is trained using the CLIP framework, which enables deep learning systems to learn shared latent spaces between images and text descriptions.
  • results: The model shows sensitivity to phonetic changes and robustness against different types of noise, with a 91% score drop when 20% of the phonemes are replaced at random and only a 10% performance drop when the audio is mixed with 75% Gaussian noise. The resulting embeddings are also found to be useful for downstream applications such as intelligibility evaluation and speech generation.
    Abstract Numerous examples in the literature have shown that deep learning models can work well with multimodal data. Recently, CLIP has enabled deep learning systems to learn shared latent spaces between images and text descriptions, with outstanding zero- or few-shot results in downstream tasks. In this paper we explore the same idea proposed by CLIP but applied to the speech domain, where the phonetic and acoustic spaces usually coexist. We train a CLIP-based model with the aim of learning shared representations of phonetic and acoustic spaces. The results show that the proposed model is sensitive to phonetic changes, with a 91% score drop when 20% of the phonemes are replaced at random, while providing substantial robustness against different kinds of noise, with only a 10% performance drop when the audio is mixed with 75% Gaussian noise. We also provide empirical evidence showing that the resulting embeddings are useful for a variety of downstream applications, such as intelligibility evaluation and leveraging rich pre-trained phonetic embeddings in speech generation tasks. Finally, we discuss potential applications with interesting implications for the speech generation and recognition fields.
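
The training objective is the symmetric CLIP/InfoNCE loss, applied to paired acoustic and phonetic embeddings rather than image-text pairs. The sketch below shows that loss in isolation; the encoder architectures, batch layout, and temperature are assumptions rather than the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(acoustic_emb, phonetic_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (acoustic, phonetic) embeddings."""
    a = F.normalize(acoustic_emb, dim=-1)      # (B, D), unit-norm rows
    p = F.normalize(phonetic_emb, dim=-1)      # (B, D), unit-norm rows
    logits = a @ p.t() / temperature           # (B, B) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    # Matching pairs sit on the diagonal; every off-diagonal entry is a negative.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Stand-in encoder outputs; in training these would come from the two encoders.
batch, dim = 16, 256
a = torch.randn(batch, dim, requires_grad=True)
p = torch.randn(batch, dim, requires_grad=True)
clip_style_loss(a, p).backward()  # gradients flow to both embedding sets
```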

eess.AS - 2023-07-24

Adaptation of Whisper models to child speech recognition

  • paper_url: http://arxiv.org/abs/2307.13008
  • repo_url: https://github.com/c3imaging/whisper_child_speech
  • paper_authors: Rishabh Jain, Andrei Barcovschi, Mariam Yiwere, Peter Corcoran, Horia Cucu
  • for: Improving the accuracy of Automatic Speech Recognition (ASR) systems on child speech.
  • methods: Adapting multilingual ASR models built from large annotated adult speech datasets, such as Whisper, to child speech (a fine-tuning sketch follows the abstract below).
  • results: Finetuning Whisper on child speech yields significant improvements over non-finetuned Whisper models, while self-supervised wav2vec2 models finetuned on child speech outperform the finetuned Whisper models.
    Abstract Automatic Speech Recognition (ASR) systems often struggle with transcribing child speech due to the lack of large child speech datasets required to accurately train child-friendly ASR models. However, huge amounts of annotated adult speech data exist and have been used to create multilingual ASR models such as Whisper. Our work explores whether such models can be adapted to child speech to improve ASR for children. In addition, we compare Whisper child-adaptations with finetuned self-supervised models such as wav2vec2. We demonstrate that finetuning Whisper on child speech yields significant improvements in ASR performance on child speech compared to non-finetuned Whisper models. Moreover, self-supervised wav2vec2 models finetuned on child speech outperform the Whisper finetuning.
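
A minimal fine-tuning sketch using the Hugging Face Whisper implementation is given below; the checkpoint size, learning rate, and single-example training step are illustrative assumptions, and the authors' repository (linked above) documents the actual recipe and data handling.

```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def train_step(waveform, sampling_rate, transcript):
    """One supervised step on a (child-speech audio, transcript) pair."""
    # Log-mel features for the encoder; token ids as decoder targets.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    labels = processor.tokenizer(transcript, return_tensors="pt").input_ids
    out = model(input_features=inputs.input_features, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```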

Performance Comparison Between VoLTE and non-VoLTE Voice Calls During Mobility in Commercial Deployment: A Drive Test-Based Analysis

  • paper_url: http://arxiv.org/abs/2307.12397
  • repo_url: None
  • paper_authors: Rashed Hasan Ratul, Muhammad Iqbal, Jen-Yi Pan, Mohammad Mahadi Al Deen, Mohammad Tawhid Kawser, Mohammad Masum Billah
  • for: Examines how Voice over LTE (VoLTE) technology affects mobile network performance in commercial deployments, focusing on call setup delay and User Equipment (UE) battery life.
  • methods: Uses the XCAL drive-test tool to collect real-time network parameters and analyses the real-time characteristics of VoLTE and non-VoLTE voice calls, including call setup delay calculation, battery-saving performance, and the DRX mechanism (a setup-delay sketch follows the abstract below).
  • results: VoLTE delivers faster call setup and better UE battery savings than non-VoLTE calls, offering insights for optimizing quality of service (QoS) in mobile communication networks.
    Abstract The optimization of network performance is vital for the delivery of services using standard cellular technologies for mobile communications. Call setup delay and User Equipment (UE) battery savings significantly influence network performance, and improving these factors is vital for ensuring optimal service delivery. Compared to traditional circuit-switched voice calls, VoLTE (Voice over LTE) technology offers faster call setup and better battery-saving performance. To validate these claims, a drive test was carried out using the XCAL drive test tool to collect real-time network parameters during VoLTE and non-VoLTE voice calls. The findings cover real-time network characteristics such as call setup delay calculation, battery-saving performance, and the DRX mechanism. The study contributes to the understanding of network optimization strategies and provides insights for enhancing the quality of service (QoS) in mobile communication networks. Examining VoLTE and non-VoLTE operations, this research highlights the substantial energy savings obtained by VoLTE: approximately 60.76% before the Service Request and approximately 38.97% after the Service Request. Moreover, VoLTE to VoLTE calls have a 72.6% faster call setup than non-VoLTE-based LTE to LTE calls, because fewer signaling messages are required, and VoLTE to non-VoLTE calls offer an 18.6% faster call setup than non-VoLTE to non-VoLTE calls. These results showcase the performance advantages of VoLTE and reinforce its potential for offering better services in wireless communication networks.
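
Since the study's headline metric is call setup delay derived from signaling timestamps, a small hedged sketch of that computation is shown below; the message names and log format are assumptions, as the XCAL export schema is proprietary.

```python
from datetime import datetime

def call_setup_delay(log):
    """log: (iso_timestamp, message_name) pairs from a drive-test export.
    Returns seconds between the first setup message and the alerting
    indication, i.e. the call setup delay."""
    t = {msg: datetime.fromisoformat(ts) for ts, msg in log}
    return (t["ALERTING"] - t["SETUP_REQUEST"]).total_seconds()

example = [("2023-07-24T10:00:00.120", "SETUP_REQUEST"),
           ("2023-07-24T10:00:01.480", "ALERTING")]
print(call_setup_delay(example))  # 1.36 (seconds)
```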