cs.LG - 2023-07-25

Multi-GPU Approach for Training of Graph ML Models on large CFD Meshes

  • paper_url: http://arxiv.org/abs/2307.13592
  • repo_url: None
  • paper_authors: Sebastian Strönisch, Maximilian Sander, Andreas Knüpfer, Marcus Meyer
  • For: The goal of this paper is to develop a machine-learning-based surrogate model to speed up computational fluid dynamics simulations.
  • Methods: The paper uses a graph neural network (GNN) as the surrogate model and partitions and distributes the flow domain across multiple GPUs.
  • Results: Compared against the proposed surrogate model, a traditionally trained distributed model produces better predictions and outperforms the surrogate.
    Abstract Mesh-based numerical solvers are an important part in many design tool chains. However, accurate simulations like computational fluid dynamics are time and resource consuming which is why surrogate models are employed to speed-up the solution process. Machine Learning based surrogate models on the other hand are fast in predicting approximate solutions but often lack accuracy. Thus, the development of the predictor in a predictor-corrector approach is the focus here, where the surrogate model predicts a flow field and the numerical solver corrects it. This paper scales a state-of-the-art surrogate model from the domain of graph-based machine learning to industry-relevant mesh sizes of a numerical flow simulation. The approach partitions and distributes the flow domain to multiple GPUs and provides halo exchange between these partitions during training. The utilized graph neural network operates directly on the numerical mesh and is able to preserve complex geometries as well as all other properties of the mesh. The proposed surrogate model is evaluated with an application on a three dimensional turbomachinery setup and compared to a traditionally trained distributed model. The results show that the traditional approach produces superior predictions and outperforms the proposed surrogate model. Possible explanations, improvements and future directions are outlined.
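    The halo exchange mentioned in the abstract can be illustrated with a minimal single-process sketch: each "GPU" owns a partition of the mesh graph and, before a message-passing step, pulls copies of the neighbouring partition's boundary ("halo") node features. The toy graph, feature sizes, and mean aggregation below are illustrative assumptions; a real implementation would exchange halos with collective communication between devices rather than array copies.

    import numpy as np

    # Toy mesh graph: 6 nodes, undirected edges, split into two "GPU" partitions.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    partition = {0: [0, 1, 2], 1: [3, 4, 5]}              # node ownership per device
    features = np.random.rand(6, 4)                        # initial node features

    def halo_nodes(own):
        # Nodes owned by another partition but adjacent to a locally owned node.
        own = set(own)
        return sorted({v for u, v in edges if u in own and v not in own} |
                      {u for u, v in edges if v in own and u not in own})

    local_feats = {p: features[nodes].copy() for p, nodes in partition.items()}

    def message_passing_step():
        updated = {}
        for p, own in partition.items():
            # "Halo exchange": pull current boundary features from the owning partition.
            halo = {h: local_feats[q][partition[q].index(h)]
                    for h in halo_nodes(own) for q in partition if h in partition[q]}
            new = np.zeros_like(local_feats[p])
            for i, node in enumerate(own):
                nbrs = [v for u, v in edges if u == node] + [u for u, v in edges if v == node]
                msgs = [local_feats[p][own.index(n)] if n in own else halo[n] for n in nbrs]
                new[i] = np.mean(msgs, axis=0)             # mean aggregation over neighbours
            updated[p] = new
        local_feats.update(updated)

    message_passing_step()
    print(local_feats[0].shape, local_feats[1].shape)      # (3, 4) (3, 4)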

Settling the Sample Complexity of Online Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.13586
  • repo_url: None
  • paper_authors: Zihan Zhang, Yuxin Chen, Jason D. Lee, Simon S. Du
  • for: The paper is written to address the issue of data efficiency in online reinforcement learning, specifically the problem of achieving minimax-optimal regret without incurring any burn-in cost.
  • methods: The paper proposes a modified version of Monotonic Value Propagation (MVP), a model-based algorithm, and develops a new regret decomposition strategy and analysis paradigm to decouple complicated statistical dependency.
  • results: The paper achieves a regret on the order of $\min\big\{\sqrt{SAH^3K},\,HK\big\}$ (modulo log factors), which matches the minimax lower bound for the entire range of sample size $K\geq 1$, and translates to a PAC sample complexity of $\frac{SAH^3}{\varepsilon^2}$ up to log factors, which is minimax-optimal for the full $\varepsilon$-range.
    Abstract A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a ``large-sample'' regime, imposing enormous burn-in cost in order for their algorithms to operate optimally. How to achieve minimax-optimal regret without incurring any burn-in cost has been an open problem in RL theory. We settle this problem for the context of finite-horizon inhomogeneous Markov decision processes. Specifically, we prove that a modified version of Monotonic Value Propagation (MVP), a model-based algorithm proposed by \cite{zhang2020reinforcement}, achieves a regret on the order of (modulo log factors) \begin{equation*} \min\big\{ \sqrt{SAH^3K}, \,HK \big\}, \end{equation*} where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, and $K$ is the total number of episodes. This regret matches the minimax lower bound for the entire range of sample size $K\geq 1$, essentially eliminating any burn-in requirement. It also translates to a PAC sample complexity (i.e., the number of episodes needed to yield $\varepsilon$-accuracy) of $\frac{SAH^3}{\varepsilon^2}$ up to log factor, which is minimax-optimal for the full $\varepsilon$-range. Further, we extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances. The key technical innovation lies in the development of a new regret decomposition strategy and a novel analysis paradigm to decouple complicated statistical dependency -- a long-standing challenge facing the analysis of online RL in the sample-hungry regime.
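    As a rough sanity check on the results bullet above, the usual online-to-batch argument heuristically converts the stated regret into the stated PAC sample complexity: requiring the average per-episode regret to be at most $\varepsilon$ gives \begin{equation*} \frac{\sqrt{SAH^3K}}{K} \leq \varepsilon \quad\Longleftrightarrow\quad K \geq \frac{SAH^3}{\varepsilon^2}, \end{equation*} which matches the quoted sample complexity up to log factors.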

Piecewise Linear Functions Representable with Infinite Width Shallow ReLU Neural Networks

  • paper_url: http://arxiv.org/abs/2307.14373
  • repo_url: None
  • paper_authors: Sarah McCarty
  • for: This paper studies representations of continuous piecewise linear functions by infinite width, finite cost shallow neural networks with the rectified linear unit (ReLU) as the activation function.
  • methods: Through its integral representation, such a shallow neural network can be identified with a signed, finite measure on an appropriate parameter space; these measures are then mapped to measures on the projective $n$-sphere cross $\mathbb{R}$, so that points in the parameter space correspond bijectively to hyperplanes in the domain of the function.
  • results: The paper proves a conjecture of Ongie et al. that every continuous piecewise linear function expressible with this kind of infinite width neural network is also expressible as a finite width shallow ReLU neural network.
    Abstract This paper analyzes representations of continuous piecewise linear functions with infinite width, finite cost shallow neural networks using the rectified linear unit (ReLU) as an activation function. Through its integral representation, a shallow neural network can be identified by the corresponding signed, finite measure on an appropriate parameter space. We map these measures on the parameter space to measures on the projective $n$-sphere cross $\mathbb{R}$, allowing points in the parameter space to be bijectively mapped to hyperplanes in the domain of the function. We prove a conjecture of Ongie et al. that every continuous piecewise linear function expressible with this kind of infinite width neural network is expressible as a finite width shallow ReLU neural network.
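    For orientation, infinite width shallow ReLU networks in this line of work are typically written through an integral representation of roughly the following form (the exact parameterization, including the affine term, is stated here as an assumption rather than taken from the paper): \begin{equation*} f(x) = \int \sigma(\langle w, x\rangle - b)\, d\mu(w,b) + \langle c, x\rangle + c_0, \qquad \sigma(t)=\max(t,0), \end{equation*} where $\mu$ is a signed measure on the parameter space of hyperplanes $(w,b)$; "finite cost" corresponds to $\mu$ having finite total variation.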

Comparing Forward and Inverse Design Paradigms: A Case Study on Refractory High-Entropy Alloys

  • paper_url: http://arxiv.org/abs/2307.13581
  • repo_url: None
  • paper_authors: Arindam Debnath, Lavanya Raman, Wenjie Li, Adam M. Krajewski, Marcia Ahn, Shuang Lin, Shunli Shang, Allison M. Beese, Zi-Kui Liu, Wesley F. Reinhart
  • for: The goal of this study is to compare how the forward and inverse design modeling paradigms perform in practical applications.
  • methods: The study applies an inverse design method and compares it against forward design schemes, evaluating performance under different objectives and constraints.
  • results: The study reports that the inverse design method performs well for refractory high-entropy alloy design and is better able to satisfy the different objectives and constraints.
    Abstract The rapid design of advanced materials is a topic of great scientific interest. The conventional, ``forward'' paradigm of materials design involves evaluating multiple candidates to determine the best candidate that matches the target properties. However, recent advances in the field of deep learning have given rise to the possibility of an ``inverse'' design paradigm for advanced materials, wherein a model provided with the target properties is able to find the best candidate. Being a relatively new concept, there remains a need to systematically evaluate how these two paradigms perform in practical applications. Therefore, the objective of this study is to directly, quantitatively compare the forward and inverse design modeling paradigms. We do so by considering two case studies of refractory high-entropy alloy design with different objectives and constraints and comparing the inverse design method to other forward schemes like localized forward search, high throughput screening, and multi objective optimization.

Reinterpreting survival analysis in the universal approximator age

  • paper_url: http://arxiv.org/abs/2307.13579
  • repo_url: https://github.com/sdittmer/survival_analysis_sumo_plus_plus
  • paper_authors: Sören Dittmer, Michael Roberts, Jacobus Preller, AIX COVNET, James H. F. Rudd, John A. D. Aston, Carola-Bibiane Schönlieb
  • for: This paper aims to provide the tools needed to fully harness the potential of survival analysis in deep learning.
  • methods: The paper proposes a new loss function, evaluation metrics, and the first universal approximating network that provably produces survival curves without numeric integration.
  • results: In a large numerical study, the proposed loss function and model outperform other approaches.
    Abstract Survival analysis is an integral part of the statistical toolbox. However, while most domains of classical statistics have embraced deep learning, survival analysis only recently gained some minor attention from the deep learning community. This recent development is likely in part motivated by the COVID-19 pandemic. We aim to provide the tools needed to fully harness the potential of survival analysis in deep learning. On the one hand, we discuss how survival analysis connects to classification and regression. On the other hand, we provide technical tools. We provide a new loss function, evaluation metrics, and the first universal approximating network that provably produces survival curves without numeric integration. We show that the loss function and model outperform other approaches using a large numerical study.
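    As background for the connection to classification mentioned in the abstract, one standard way to link the two (not necessarily the construction used in this paper) is the discrete-time formulation, in which a per-step hazard classifier induces the survival curve: \begin{equation*} S(t) = \Pr(T > t) = \prod_{k=1}^{t} (1 - h_k), \qquad h_k = \Pr(T = k \mid T \geq k), \end{equation*} so estimating the hazards $h_k$ amounts to a sequence of binary classification problems.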

PT$\mathrm{L}^{p}$: Partial Transport $\mathrm{L}^{p}$ Distances

  • paper_url: http://arxiv.org/abs/2307.13571
  • repo_url: None
  • paper_authors: Xinran Liu, Yikun Bai, Huy Tran, Zhanqi Zhu, Matthew Thorpe, Soheil Kolouri
  • for: This paper proposes a new way to compare generic signals: partial transport $\mathrm{L}^{p}$ distances, built on optimal transport.
  • methods: Working within the optimal transport framework, the paper introduces partial transport $\mathrm{L}^{p}$ distances as a new family of metrics for comparing generic signals, benefiting from the robustness of partial transport.
  • results: The paper provides theoretical background for partial transport $\mathrm{L}^{p}$ distances, including the existence of optimal plans and the behavior of the distance in various limits. It also introduces a sliced variant of these distances and demonstrates their application to signal class separability and nearest neighbor classification.
    Abstract Optimal transport and its related problems, including optimal partial transport, have proven to be valuable tools in machine learning for computing meaningful distances between probability or positive measures. This success has led to a growing interest in defining transport-based distances that allow for comparing signed measures and, more generally, multi-channeled signals. Transport $\mathrm{L}^{p}$ distances are notable extensions of the optimal transport framework to signed and possibly multi-channeled signals. In this paper, we introduce partial transport $\mathrm{L}^{p}$ distances as a new family of metrics for comparing generic signals, benefiting from the robustness of partial transport distances. We provide theoretical background such as the existence of optimal plans and the behavior of the distance in various limits. Furthermore, we introduce the sliced variation of these distances, which allows for rapid comparison of generic signals. Finally, we demonstrate the application of the proposed distances in signal class separability and nearest neighbor classification.
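    For orientation, the (non-partial) transport $\mathrm{L}^{p}$ distance that this work extends is usually defined along the following lines; the formula below is an assumption about the standard $\mathrm{TL}^{p}$ construction rather than a statement from the paper: \begin{equation*} \mathrm{TL}^{p}\big((f,\mu),(g,\nu)\big)^{p} = \inf_{\pi \in \Pi(\mu,\nu)} \int \|x - y\|^{p} + |f(x) - g(y)|^{p} \, d\pi(x,y), \end{equation*} where $\Pi(\mu,\nu)$ is the set of couplings of the reference measures; the partial variant additionally allows mass to be created or destroyed at a cost, which is what makes it robust for generic, possibly unnormalized signals.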

Introducing Hybrid Modeling with Time-series-Transformers: A Comparative Study of Series and Parallel Approach in Batch Crystallization

  • paper_url: http://arxiv.org/abs/2308.05749
  • repo_url: None
  • paper_authors: Niranjan Sitapure, Joseph S Kwon
  • For: The paper develops a first-of-a-kind, attention-based time-series transformer (TST) hybrid framework for batch crystallization, aiming to improve the accuracy and interpretability of digital twins in chemical manufacturing.
  • Methods: The paper uses a hybrid approach that combines first-principles physics-based dynamics with machine learning (ML) models, specifically attention-based TSTs, to capture long-term and short-term changes in process states. The authors construct and compare two different configurations (series and parallel) of TST-based hybrid models and evaluate their performance using normalized-mean-square-error (NMSE) and $R^2$ values.
  • Results: The paper reports improved accuracy and interpretability of the TST-based hybrid models compared to traditional black-box models, with NMSE values in the range of $[10, 50]\times10^{-4}$ and $R^2$ values over 0.99, demonstrating the effectiveness of the hybrid models in predicting batch crystallization processes.
    Abstract Most existing digital twins rely on data-driven black-box models, predominantly using deep neural recurrent, and convolutional neural networks (DNNs, RNNs, and CNNs) to capture the dynamics of chemical systems. However, these models have not seen the light of day, given the hesitance of directly deploying a black-box tool in practice due to safety and operational issues. To tackle this conundrum, hybrid models combining first-principles physics-based dynamics with machine learning (ML) models have increased in popularity as they are considered a 'best of both worlds' approach. That said, existing simple DNN models are not adept at long-term time-series predictions and utilizing contextual information on the trajectory of the process dynamics. Recently, attention-based time-series transformers (TSTs) that leverage multi-headed attention mechanism and positional encoding to capture long-term and short-term changes in process states have shown high predictive performance. Thus, a first-of-a-kind, TST-based hybrid framework has been developed for batch crystallization, demonstrating improved accuracy and interpretability compared to traditional black-box models. Specifically, two different configurations (i.e., series and parallel) of TST-based hybrid models are constructed and compared, which show a normalized-mean-square-error (NMSE) in the range of $[10, 50]\times10^{-4}$ and an $R^2$ value over 0.99. Given the growing adoption of digital twins, next-generation attention-based hybrid models are expected to play a crucial role in shaping the future of chemical manufacturing.
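    A minimal sketch of the two hybrid configurations compared above, under the assumption that "series" feeds the physics-based prediction into the ML corrector while "parallel" combines independently computed physics and ML contributions (the actual coupling in the paper may differ); physics_step and TSTSurrogate are hypothetical placeholders.

    import torch
    import torch.nn as nn

    def physics_step(state: torch.Tensor) -> torch.Tensor:
        # Placeholder for a first-principles update (e.g., a population balance model).
        return state + 0.01 * torch.tanh(state)

    class TSTSurrogate(nn.Module):
        # Stand-in for an attention-based time-series transformer.
        def __init__(self, dim: int):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=2, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=1)
            self.head = nn.Linear(dim, dim)

        def forward(self, seq: torch.Tensor) -> torch.Tensor:
            return self.head(self.encoder(seq)[:, -1])       # next state from the history

    def series_hybrid(model, history):
        # Physics prediction is appended to the history; the ML model maps it to a corrected state.
        phys = physics_step(history[:, -1]).unsqueeze(1)
        return model(torch.cat([history, phys], dim=1))

    def parallel_hybrid(model, history):
        # Physics and ML predictions are computed independently and combined.
        return 0.5 * physics_step(history[:, -1]) + 0.5 * model(history)

    history = torch.randn(8, 16, 4)                          # (batch, time steps, state dim)
    model = TSTSurrogate(dim=4)
    print(series_hybrid(model, history).shape, parallel_hybrid(model, history).shape)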

Decision-Focused Learning: Foundations, State of the Art, Benchmark and Future Opportunities

  • paper_url: http://arxiv.org/abs/2307.13565
  • repo_url: https://github.com/predopt/predopt-benchmarks
  • paper_authors: Jayanta Mandi, James Kotary, Senne Berden, Maxime Mulamba, Victor Bucarey, Tias Guns, Ferdinando Fioretto
  • for: This study provides a comprehensive review of decision-focused learning (DFL), an emerging machine learning paradigm that integrates prediction and optimization.
  • methods: The study analyzes the various techniques devised to integrate machine learning and optimization models and organizes them into a taxonomy of DFL methods distinguished by their unique characteristics.
  • results: The study conducts an extensive empirical evaluation of these methods, proposing suitable benchmark datasets and tasks for DFL; the results indicate that DFL methods can improve performance on decision-making tasks that operate under uncertainty.
    Abstract Decision-focused learning (DFL) is an emerging paradigm in machine learning which trains a model to optimize decisions, integrating prediction and optimization in an end-to-end system. This paradigm holds the promise to revolutionize decision-making in many real-world applications which operate under uncertainty, where the estimation of unknown parameters within these decision models often becomes a substantial roadblock. This paper presents a comprehensive review of DFL. It provides an in-depth analysis of the various techniques devised to integrate machine learning and optimization models, introduces a taxonomy of DFL methods distinguished by their unique characteristics, and conducts an extensive empirical evaluation of these methods proposing suitable benchmark dataset and tasks for DFL. Finally, the study provides valuable insights into current and potential future avenues in DFL research.
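    As background, a common training objective in this literature (one typical formulation, not necessarily the one used by every surveyed method) is to minimize decision regret rather than prediction error: \begin{equation*} \mathrm{regret}(\hat{c};\, c) = c^{\top} w^{*}(\hat{c}) - c^{\top} w^{*}(c), \qquad w^{*}(c) = \arg\min_{w \in \mathcal{S}} c^{\top} w, \end{equation*} where $\hat{c}$ is the predicted cost vector, $c$ the realized one, and $\mathcal{S}$ the feasible set of the downstream optimization problem; much of the difficulty addressed by DFL methods lies in differentiating through $w^{*}(\cdot)$.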

  • paper_url: http://arxiv.org/abs/2307.13548
  • repo_url: None
  • paper_authors: Oualid Zari, Javier Parra-Arnau, Ayşe Ünsal, Melek Önen
  • for: The paper exposes privacy vulnerabilities in Graph Neural Networks (GNNs) by inferring private links within graph-structured data.
  • methods: Focusing on the inductive setting, where new nodes join the graph and an API is used to query predictions, the paper studies the potential leakage of private edge information and proposes methods to preserve privacy while maintaining model utility.
  • results: The proposed attack outperforms the state of the art in inferring links. The paper also analyzes the application of differential privacy (DP) mechanisms to mitigate the attack and examines the trade-off between privacy preservation and model utility.
    Abstract In this paper, we present a stealthy and effective attack that exposes privacy vulnerabilities in Graph Neural Networks (GNNs) by inferring private links within graph-structured data. Focusing on the inductive setting where new nodes join the graph and an API is used to query predictions, we investigate the potential leakage of private edge information. We also propose methods to preserve privacy while maintaining model utility. Our attack demonstrates superior performance in inferring the links compared to the state of the art. Furthermore, we examine the application of differential privacy (DP) mechanisms to mitigate the impact of our proposed attack, we analyze the trade-off between privacy preservation and model utility. Our work highlights the privacy vulnerabilities inherent in GNNs, underscoring the importance of developing robust privacy-preserving mechanisms for their application.

Transfer Learning for Portfolio Optimization

  • paper_url: http://arxiv.org/abs/2307.13546
  • repo_url: None
  • paper_authors: Haoyang Cao, Haotian Gu, Xin Guo, Mathieu Rosenbaum
  • for: This paper explores the possibility of using transfer learning techniques to address the financial portfolio optimization problem.
  • methods: The paper introduces a novel concept called "transfer risk" within the optimization framework of transfer learning.
  • results: Numerical experiments show a strong correlation between transfer risk and the overall performance of transfer learning methods, indicating that transfer risk is a viable indicator of "transferability".
    Abstract In this work, we explore the possibility of utilizing transfer learning techniques to address the financial portfolio optimization problem. We introduce a novel concept called "transfer risk", within the optimization framework of transfer learning. A series of numerical experiments are conducted from three categories: cross-continent transfer, cross-sector transfer, and cross-frequency transfer. In particular, 1. a strong correlation between the transfer risk and the overall performance of transfer learning methods is established, underscoring the significance of transfer risk as a viable indicator of "transferability"; 2. transfer risk is shown to provide a computationally efficient way to identify appropriate source tasks in transfer learning, enhancing the efficiency and effectiveness of the transfer learning approach; 3. additionally, the numerical experiments offer valuable new insights for portfolio management across these different settings.

A model for efficient dynamical ranking in networks

  • paper_url: http://arxiv.org/abs/2307.13544
  • repo_url: None
  • paper_authors: Andrea Della Vecchia, Kibidi Neocosmos, Daniel B. Larremore, Cristopher Moore, Caterina De Bacco
  • for: This paper proposes a physics-inspired method for inferring dynamic rankings in directed temporal networks.
  • methods: The method works by solving a linear system of equations and requires only one parameter to be tuned.
  • results: In tests on synthetic and real data, the method predicts interactions (the existence of edges) and their outcomes (their directions) better than existing methods in many cases.
    Abstract We present a physics-inspired method for inferring dynamic rankings in directed temporal networks - networks in which each directed and timestamped edge reflects the outcome and timing of a pairwise interaction. The inferred ranking of each node is real-valued and varies in time as each new edge, encoding an outcome like a win or loss, raises or lowers the node's estimated strength or prestige, as is often observed in real scenarios including sequences of games, tournaments, or interactions in animal hierarchies. Our method works by solving a linear system of equations and requires only one parameter to be tuned. As a result, the corresponding algorithm is scalable and efficient. We test our method by evaluating its ability to predict interactions (edges' existence) and their outcomes (edges' directions) in a variety of applications, including both synthetic and real data. Our analysis shows that in many cases our method's performance is better than existing methods for predicting dynamic rankings and interaction outcomes.
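    A minimal illustration of ranking by solving a regularized linear system over timestamped match outcomes. This is a generic sketch in the spirit of the abstract, not the authors' model; the exponential time-decay weighting and the single regularization parameter are assumptions made for the example.

    import numpy as np

    def dynamic_scores(matches, n_nodes, t_now, decay=0.1, reg=1.0):
        """matches: list of (winner, loser, timestamp). Returns real-valued scores s
        fitted so that s[winner] - s[loser] is close to 1, with older matches
        down-weighted; reg is a ridge term that keeps the system well-posed."""
        A = reg * np.eye(n_nodes)
        b = np.zeros(n_nodes)
        for winner, loser, t in matches:
            w = np.exp(-decay * (t_now - t))        # recent outcomes count more
            A[winner, winner] += w
            A[loser, loser] += w
            A[winner, loser] -= w
            A[loser, winner] -= w
            b[winner] += w
            b[loser] -= w
        return np.linalg.solve(A, b)                # one linear solve gives all scores

    matches = [(0, 1, 0.0), (1, 2, 1.0), (0, 2, 2.0)]
    print(dynamic_scores(matches, n_nodes=3, t_now=2.0))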

Model Calibration in Dense Classification with Adaptive Label Perturbation

  • paper_url: http://arxiv.org/abs/2307.13539
  • repo_url: https://github.com/carlisle-liu/aslp
  • paper_authors: Jiawei Liu, Changkun Ye, Shan Wang, Ruikai Cui, Jing Zhang, Kaihao Zhang, Nick Barnes
  • for: This work aims to improve the calibration and trustworthiness of deep neural networks so that their predictions can support subsequent decision-making in safety-related applications.
  • methods: The paper proposes Adaptive Stochastic Label Perturbation (ASLP), which learns a unique label perturbation level for each training image. ASLP uses the proposed Self-Calibrating Binary Cross Entropy (SC-BCE) loss, which unifies label perturbation processes (including stochastic approaches such as DisturbLabel) and label smoothing to correct calibration while maintaining classification rates.
  • results: Compared with dense binary classification baselines, ASLP significantly improves model calibration. It can either preserve classification accuracy on known data as a conservative solution or specifically improve the calibration degree by minimizing the gap between prediction accuracy and expected confidence.
    Abstract For safety-related applications, it is crucial to produce trustworthy deep neural networks whose prediction is associated with confidence that can represent the likelihood of correctness for subsequent decision-making. Existing dense binary classification models are prone to being over-confident. To improve model calibration, we propose Adaptive Stochastic Label Perturbation (ASLP) which learns a unique label perturbation level for each training image. ASLP employs our proposed Self-Calibrating Binary Cross Entropy (SC-BCE) loss, which unifies label perturbation processes including stochastic approaches (like DisturbLabel), and label smoothing, to correct calibration while maintaining classification rates. ASLP follows Maximum Entropy Inference of classic statistical mechanics to maximise prediction entropy with respect to missing information. It performs this while: (1) preserving classification accuracy on known data as a conservative solution, or (2) specifically improves model calibration degree by minimising the gap between the prediction accuracy and expected confidence of the target training label. Extensive results demonstrate that ASLP can significantly improve calibration degrees of dense binary classification models on both in-distribution and out-of-distribution data. The code is available on https://github.com/Carlisle-Liu/ASLP.
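    An illustrative per-sample label-perturbation loss in the spirit of the approach above; the precise form of SC-BCE is not reproduced here, so the blend of the hard label with a maximum-entropy target controlled by a per-image level alpha should be read as an assumption.

    import torch
    import torch.nn.functional as F

    def perturbed_bce(logits, targets, alpha):
        """logits, targets: (N,) dense binary predictions/labels; alpha: (N,) per-sample
        perturbation level in [0, 1]. alpha = 0 recovers standard BCE, while larger
        alpha pushes the target toward the maximum-entropy label 0.5."""
        soft_targets = (1.0 - alpha) * targets + alpha * 0.5
        return F.binary_cross_entropy_with_logits(logits, soft_targets)

    logits = torch.randn(4, requires_grad=True)
    targets = torch.tensor([1.0, 0.0, 1.0, 0.0])
    alpha = torch.tensor([0.00, 0.10, 0.30, 0.05])   # e.g., a learned level per training image
    loss = perturbed_bce(logits, targets, alpha)
    loss.backward()
    print(float(loss))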

INFINITY: Neural Field Modeling for Reynolds-Averaged Navier-Stokes Equations

  • paper_url: http://arxiv.org/abs/2307.13538
  • repo_url: None
  • paper_authors: Louis Serrano, Leon Migus, Yuan Yin, Jocelyn Ahmed Mazari, Patrick Gallinari
  • for: The paper proposes a deep-learning-based surrogate model for approximating complex physical phenomena, reducing the computational burden of direct numerical simulations.
  • methods: The method uses implicit neural representations (INRs): geometric information and physical fields are encoded into compact representations, and a mapping between them is learned to infer the physical fields.
  • results: On an airfoil design optimization problem, the method achieves state-of-the-art performance, accurately inferring physical fields throughout the volume and on the surface. It can also be used for design exploration and shape optimization, and correctly predicts drag and lift coefficients.
    Abstract For numerical design, the development of efficient and accurate surrogate models is paramount. They allow us to approximate complex physical phenomena, thereby reducing the computational burden of direct numerical simulations. We propose INFINITY, a deep learning model that utilizes implicit neural representations (INRs) to address this challenge. Our framework encodes geometric information and physical fields into compact representations and learns a mapping between them to infer the physical fields. We use an airfoil design optimization problem as an example task and we evaluate our approach on the challenging AirfRANS dataset, which closely resembles real-world industrial use-cases. The experimental results demonstrate that our framework achieves state-of-the-art performance by accurately inferring physical fields throughout the volume and surface. Additionally we demonstrate its applicability in contexts such as design exploration and shape optimization: our model can correctly predict drag and lift coefficients while adhering to the equations.
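    A toy sketch of the implicit-neural-representation idea behind the approach: a coordinate network maps a query point in the domain, together with a compact latent code describing the case (geometry, conditions), to field values at that point. Layer sizes, the latent-code mechanism, and the output fields are illustrative assumptions, not the architecture of INFINITY.

    import torch
    import torch.nn as nn

    class ImplicitField(nn.Module):
        """Toy implicit neural representation: (coordinate, latent code) -> field values."""
        def __init__(self, coord_dim=3, latent_dim=16, out_dim=4, width=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(coord_dim + latent_dim, width), nn.GELU(),
                nn.Linear(width, width), nn.GELU(),
                nn.Linear(width, out_dim),                # e.g., velocity components + pressure
            )

        def forward(self, coords, latent):
            latent = latent.expand(coords.shape[0], -1)   # same code for every query point
            return self.net(torch.cat([coords, latent], dim=-1))

    coords = torch.rand(1024, 3)       # arbitrary query points in the volume
    latent = torch.randn(1, 16)        # compact representation of one geometry/case
    print(ImplicitField()(coords, latent).shape)   # torch.Size([1024, 4])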

Do algorithms and barriers for sparse principal component analysis extend to other structured settings?

  • paper_url: http://arxiv.org/abs/2307.13535
  • repo_url: None
  • paper_authors: Guanyi Wang, Mengqi Lou, Ashwin Pananjady
  • for: This work studies a principal component analysis problem under the spiked Wishart model, in which the signal structure is captured by a class of union-of-subspace models.
  • methods: The study takes a unified statistical and computational view, establishing fundamental limits that depend on the geometry of the problem instance and showing that a natural projected power method exhibits local convergence to the statistically near-optimal neighborhood of the solution.
  • results: The results indicate that several phenomena observed for vanilla sparse PCA extend naturally to its structured counterparts.
    Abstract We study a principal component analysis problem under the spiked Wishart model in which the structure in the signal is captured by a class of union-of-subspace models. This general class includes vanilla sparse PCA as well as its variants with graph sparsity. With the goal of studying these problems under a unified statistical and computational lens, we establish fundamental limits that depend on the geometry of the problem instance, and show that a natural projected power method exhibits local convergence to the statistically near-optimal neighborhood of the solution. We complement these results with end-to-end analyses of two important special cases given by path and tree sparsity in a general basis, showing initialization methods and matching evidence of computational hardness. Overall, our results indicate that several of the phenomena observed for vanilla sparse PCA extend in a natural fashion to its structured counterparts.

Differentiable Turbulence II

  • paper_url: http://arxiv.org/abs/2307.13533
  • repo_url: None
  • paper_authors: Varun Shankar, Romit Maulik, Venkatasubramanian Viswanathan
  • for: This paper is written for developing data-driven models in computational fluid dynamics (CFD) using differentiable fluid simulators and machine learning (ML) methods.
  • methods: The paper proposes a framework for integrating deep learning models into a generic finite element numerical scheme for solving the Navier-Stokes equations, and applies the technique to learn a sub-grid scale closure using a multi-scale graph neural network.
  • results: The learned closure can achieve accuracy comparable to traditional large eddy simulation on a finer grid, resulting in an equivalent speedup of 10x. The method has been demonstrated on several realizations of flow over a backwards-facing step, testing on both unseen Reynolds numbers and new geometry.
    Abstract Differentiable fluid simulators are increasingly demonstrating value as useful tools for developing data-driven models in computational fluid dynamics (CFD). Differentiable turbulence, or the end-to-end training of machine learning (ML) models embedded in CFD solution algorithms, captures both the generalization power and limited upfront cost of physics-based simulations, and the flexibility and automated training of deep learning methods. We develop a framework for integrating deep learning models into a generic finite element numerical scheme for solving the Navier-Stokes equations, applying the technique to learn a sub-grid scale closure using a multi-scale graph neural network. We demonstrate the method on several realizations of flow over a backwards-facing step, testing on both unseen Reynolds numbers and new geometry. We show that the learned closure can achieve accuracy comparable to traditional large eddy simulation on a finer grid that amounts to an equivalent speedup of 10x. As the desire and need for cheaper CFD simulations grows, we see hybrid physics-ML methods as a path forward to be exploited in the near future.

Towards Long-Term predictions of Turbulence using Neural Operators

  • paper_url: http://arxiv.org/abs/2307.13517
  • repo_url: None
  • paper_authors: Fernando Gonzalez, François-Xavier Demoulin, Simon Bernard
  • for: This paper explores the use of neural operators to predict turbulent flows, focusing on the Fourier Neural Operator (FNO) model. The goal is to develop reduced-order/surrogate models for turbulent flow simulations using machine learning.
  • methods: Different model configurations are analyzed, including U-NET structures (UNO and U-FNET), which outperform the standard FNO in accuracy and stability; gradient and stability losses are used as regularization to keep predictions stable and accurate.
  • results: The alternative model configurations and regularization losses yield better predictions; in particular, U-FNET predicts turbulence better at higher Reynolds numbers. However, improved metrics are still needed to evaluate deep learning models for fluid flow prediction.
    Abstract This paper explores Neural Operators to predict turbulent flows, focusing on the Fourier Neural Operator (FNO) model. It aims to develop reduced-order/surrogate models for turbulent flow simulations using Machine Learning. Different model configurations are analyzed, with U-NET structures (UNO and U-FNET) performing better than the standard FNO in accuracy and stability. U-FNET excels in predicting turbulence at higher Reynolds numbers. Regularization terms, like gradient and stability losses, are essential for stable and accurate predictions. The study emphasizes the need for improved metrics for deep learning models in fluid flow prediction. Further research should focus on models handling complex flows and practical benchmarking metrics.

An Empirical Study on Fairness Improvement with Multiple Protected Attributes

  • paper_url: http://arxiv.org/abs/2308.01923
  • repo_url: None
  • paper_authors: Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Mark Harman
  • for: This study investigates the effectiveness of fairness improvement methods when multiple protected attributes are involved, to better understand their performance in that setting.
  • methods: The study analyzes 11 state-of-the-art fairness improvement methods, including fairness improvement regarding multiple protected attributes, across different datasets, metrics, and machine learning models.
  • results: Improving fairness for a single protected attribute can largely decrease fairness regarding unconsidered protected attributes, in up to 88.3% of scenarios (57.5% on average). Accuracy loss is similar whether single or multiple protected attributes are considered, but the effects on precision and recall are considerably larger (about 5x and 8x that of a single attribute). These findings matter because existing studies commonly report only accuracy as the ML performance metric, which is inadequate.
    Abstract Existing research mostly improves the fairness of Machine Learning (ML) software regarding a single protected attribute at a time, but this is unrealistic given that many users have multiple protected attributes. This paper conducts an extensive study of fairness improvement regarding multiple protected attributes, covering 11 state-of-the-art fairness improvement methods. We analyze the effectiveness of these methods with different datasets, metrics, and ML models when considering multiple protected attributes. The results reveal that improving fairness for a single protected attribute can largely decrease fairness regarding unconsidered protected attributes. This decrease is observed in up to 88.3% of scenarios (57.5% on average). More surprisingly, we find little difference in accuracy loss when considering single and multiple protected attributes, indicating that accuracy can be maintained in the multiple-attribute paradigm. However, the effect on precision and recall when handling multiple protected attributes is about 5 times and 8 times that of a single attribute. This has important implications for future fairness research: reporting only accuracy as the ML performance metric, which is currently common in the literature, is inadequate.

Continuous Time Evidential Distributions for Irregular Time Series

  • paper_url: http://arxiv.org/abs/2307.13503
  • repo_url: https://github.com/twkillian/edict
  • paper_authors: Taylor W. Killian, Haoran Zhang, Thomas Hartvigsen, Ava P. Amini
  • for: This paper describes a method for making predictions from irregular time series, which are prevalent in real-world settings such as healthcare.
  • methods: The paper presents EDICT, a strategy that learns an evidential distribution over irregular time series in continuous time. The distribution enables inference of partially observed features at any time of interest and expands uncertainty for sparse, irregular observations.
  • results: EDICT attains competitive performance on challenging time series classification tasks and enables uncertainty-guided inference when encountering noisy data.
    Abstract Prevalent in many real-world settings such as healthcare, irregular time series are challenging to formulate predictions from. It is difficult to infer the value of a feature at any given time when observations are sporadic, as it could take on a range of values depending on when it was last observed. To characterize this uncertainty we present EDICT, a strategy that learns an evidential distribution over irregular time series in continuous time. This distribution enables well-calibrated and flexible inference of partially observed features at any time of interest, while expanding uncertainty temporally for sparse, irregular observations. We demonstrate that EDICT attains competitive performance on challenging time series classification tasks and enabling uncertainty-guided inference when encountering noisy data.

Deep Reinforcement Learning for Robust Goal-Based Wealth Management

  • paper_url: http://arxiv.org/abs/2307.13501
  • repo_url: None
  • paper_authors: Tessa Bauman, Bruno Gašperov, Stjepan Begušić, Zvonko Kostanjčar
  • for: This work proposes a robust goal-based wealth management approach based on deep reinforcement learning, aimed at achieving specific financial goals.
  • methods: Deep reinforcement learning is used to optimize the sequential investment decisions, training a neural network that represents the investment policy.
  • results: Experimental results show the approach is superior to several goal-based wealth management benchmarks on both simulated and historical market data.
    Abstract Goal-based investing is an approach to wealth management that prioritizes achieving specific financial goals. It is naturally formulated as a sequential decision-making problem as it requires choosing the appropriate investment until a goal is achieved. Consequently, reinforcement learning, a machine learning technique appropriate for sequential decision-making, offers a promising path for optimizing these investment strategies. In this paper, a novel approach for robust goal-based wealth management based on deep reinforcement learning is proposed. The experimental results indicate its superiority over several goal-based wealth management benchmarks on both simulated and historical market data.

Finding Money Launderers Using Heterogeneous Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.13499
  • repo_url: https://github.com/fredjo89/heterogeneous-mpnn
  • paper_authors: Fredrik Johannessen, Martin Jullum
  • for: This work aims to improve banks' electronic surveillance systems for detecting money laundering by applying machine learning to a large heterogeneous graph.
  • methods: The paper uses a graph neural network (GNN) approach on a large heterogeneous graph constructed from real-world bank transactions and business role data belonging to DNB, Norway's largest bank. Specifically, the homogeneous GNN method known as the Message Passing Neural Network (MPNN) is extended to operate effectively on a heterogeneous graph, and a novel method is proposed for aggregating messages across different edge types.
  • results: The model shows great potential for detecting money laundering activities in the large heterogeneous graph, improving the quality of banks' electronic surveillance. To the authors' knowledge, this is the first published work applying GNNs to a large real-world heterogeneous network for anti-money laundering purposes.
    Abstract Current anti-money laundering (AML) systems, predominantly rule-based, exhibit notable shortcomings in efficiently and precisely detecting instances of money laundering. As a result, there has been a recent surge toward exploring alternative approaches, particularly those utilizing machine learning. Since criminals often collaborate in their money laundering endeavors, accounting for diverse types of customer relations and links becomes crucial. In line with this, the present paper introduces a graph neural network (GNN) approach to identify money laundering activities within a large heterogeneous network constructed from real-world bank transactions and business role data belonging to DNB, Norway's largest bank. Specifically, we extend the homogeneous GNN method known as the Message Passing Neural Network (MPNN) to operate effectively on a heterogeneous graph. As part of this procedure, we propose a novel method for aggregating messages across different edges of the graph. Our findings highlight the importance of using an appropriate GNN architecture when combining information in heterogeneous graphs. The performance results of our model demonstrate great potential in enhancing the quality of electronic surveillance systems employed by banks to detect instances of money laundering. To the best of our knowledge, this is the first published work applying GNN on a large real-world heterogeneous network for anti-money laundering purposes.
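    A generic sketch of message passing on a heterogeneous graph, where messages are transformed separately per edge type (e.g., transactions vs. business roles) before being aggregated at each node. The per-type linear transforms, sum aggregation, and update rule are illustrative assumptions, not the aggregation method proposed in the paper.

    import torch
    import torch.nn as nn

    class HeteroMessagePassing(nn.Module):
        """Toy heterogeneous message-passing layer: per-edge-type messages, then combine."""
        def __init__(self, dim, edge_types):
            super().__init__()
            self.msg = nn.ModuleDict({et: nn.Linear(dim, dim) for et in edge_types})
            self.update = nn.Linear(2 * dim, dim)

        def forward(self, x, edges_by_type):
            # edges_by_type: {edge_type: LongTensor of shape (2, num_edges)} with rows (src, dst).
            agg = torch.zeros_like(x)
            for et, edge_index in edges_by_type.items():
                src, dst = edge_index
                msgs = self.msg[et](x[src])          # per-type message transform
                agg.index_add_(0, dst, msgs)         # sum incoming messages at destinations
            return torch.relu(self.update(torch.cat([x, agg], dim=-1)))

    x = torch.randn(6, 8)                            # 6 nodes, 8 features each
    edges = {"transaction": torch.tensor([[0, 1, 2], [1, 2, 3]]),
             "business_role": torch.tensor([[4, 5], [0, 1]])}
    layer = HeteroMessagePassing(dim=8, edge_types=list(edges))
    print(layer(x, edges).shape)                     # torch.Size([6, 8])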

Zshot: An Open-source Framework for Zero-Shot Named Entity Recognition and Relation Extraction

  • paper_url: http://arxiv.org/abs/2307.13497
  • repo_url: None
  • paper_authors: Gabriele Picco, Marcos Martínez Galindo, Alberto Purpura, Leopold Fuchs, Vanessa López, Hoang Thanh Lam
  • For: The paper is written for researchers and industry professionals who are interested in zero-shot learning (ZSL) and its applications in natural language processing (NLP).
  • Methods: The paper proposes a ZSL framework called Zshot, which provides a platform for comparing different state-of-the-art ZSL methods on standard benchmark datasets. The framework also includes readily available APIs for production under the standard SpaCy NLP pipeline and is designed to be extendible and evaluable.
  • Results: The paper does not report specific benchmark numbers; rather, it provides a platform for comparing ZSL methods and evaluating them on standard benchmark datasets, and includes enhancements such as pipeline ensembling and visualization utilities, available as a SpaCy extension.
    Abstract The Zero-Shot Learning (ZSL) task pertains to the identification of entities or relations in texts that were not seen during training. ZSL has emerged as a critical research area due to the scarcity of labeled data in specific domains, and its applications have grown significantly in recent years. With the advent of large pretrained language models, several novel methods have been proposed, resulting in substantial improvements in ZSL performance. There is a growing demand, both in the research community and industry, for a comprehensive ZSL framework that facilitates the development and accessibility of the latest methods and pretrained models.In this study, we propose a novel ZSL framework called Zshot that aims to address the aforementioned challenges. Our primary objective is to provide a platform that allows researchers to compare different state-of-the-art ZSL methods with standard benchmark datasets. Additionally, we have designed our framework to support the industry with readily available APIs for production under the standard SpaCy NLP pipeline. Our API is extendible and evaluable, moreover, we include numerous enhancements such as boosting the accuracy with pipeline ensembling and visualization utilities available as a SpaCy extension.

Duet: efficient and scalable hybriD neUral rElation undersTanding

  • paper_url: http://arxiv.org/abs/2307.13494
  • repo_url: https://github.com/GIS-PuppetMaster/Duet
  • paper_authors: Kaixin Zhang, Hongzhi Wang, Yabin Lu, Ziqi Li, Chang Shu, Yu Yan, Donghua Yang
  • for: This work addresses the data and workload drift problem faced by learned cardinality estimation methods, as well as the difficulty of applying cardinality estimators to high-cardinality, high-dimensional tables.
  • methods: The paper proposes a hybrid method named Duet, which introduces predicate information into the autoregressive model and estimates cardinality directly, without sampling or any non-differentiable process, improving accuracy on high-cardinality and high-dimensional tables.
  • results: Experimental results show that Duet achieves all of its design goals, is more practical, and even has a lower inference cost on CPU than most learned methods have on GPU.
    Abstract Learned cardinality estimation methods have achieved high precision compared to traditional methods. Among learned methods, query-driven approaches face the data and workload drift problem for a long time. Although both query-driven and hybrid methods are proposed to avoid this problem, even the state-of-the-art of them suffer from high training and estimation costs, limited scalability, instability, and long-tailed distribution problem on high cardinality and high-dimensional tables, which seriously affects the practical application of learned cardinality estimators. In this paper, we prove that most of these problems are directly caused by the widely used progressive sampling. We solve this problem by introducing predicates information into the autoregressive model and propose Duet, a stable, efficient, and scalable hybrid method to estimate cardinality directly without sampling or any non-differentiable process, which can not only reduces the inference complexity from O(n) to O(1) compared to Naru and UAE but also achieve higher accuracy on high cardinality and high-dimensional tables. Experimental results show that Duet can achieve all the design goals above and be much more practical and even has a lower inference cost on CPU than that of most learned methods on GPU.

ECG classification using Deep CNN and Gramian Angular Field

  • paper_url: http://arxiv.org/abs/2308.02395
  • repo_url: None
  • paper_authors: Youssef Elmir, Yassine Himeur, Abbes Amira
  • for: This study provides a new approach to ECG signal analysis for diagnosing cardiovascular disease and detecting anomalies.
  • methods: The method transforms time-domain 1D vectors into 2D images using the Gramian Angular Field transform and classifies the transformed ECG signals with convolutional neural networks (CNNs).
  • results: Experimental results show classification accuracies of 97.47% and 98.65% for anomaly detection. The feature representation also helps identify and visualize temporal patterns in the ECG signal, such as changes in heart rate, rhythm, and morphology, which may not be apparent in the original signal.
    Abstract This paper study provides a novel contribution to the field of signal processing and DL for ECG signal analysis by introducing a new feature representation method for ECG signals. The proposed method is based on transforming time frequency 1D vectors into 2D images using Gramian Angular Field transform. Moving on, the classification of the transformed ECG signals is performed using Convolutional Neural Networks (CNN). The obtained results show a classification accuracy of 97.47% and 98.65% for anomaly detection. Accordingly, in addition to improving the classification performance compared to the state-of-the-art, the feature representation helps identify and visualize temporal patterns in the ECG signal, such as changes in heart rate, rhythm, and morphology, which may not be apparent in the original signal. This has significant implications in the diagnosis and treatment of cardiovascular diseases and detection of anomalies.
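    A minimal NumPy sketch of one common Gramian Angular Field variant (the summation field) used to turn a 1D signal into a 2D image; the paper may use a different variant or preprocessing, so treat the details below as illustrative.

    import numpy as np

    def gramian_angular_field(x: np.ndarray) -> np.ndarray:
        """Gramian Angular Summation Field of a 1D signal: rescale to [-1, 1],
        interpret values as angles via arccos, and form the matrix cos(phi_i + phi_j)."""
        x = np.asarray(x, dtype=float)
        x_scaled = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0   # rescale to [-1, 1]
        phi = np.arccos(np.clip(x_scaled, -1.0, 1.0))
        return np.cos(phi[:, None] + phi[None, :])                   # (N, N) image

    beat = np.sin(np.linspace(0, 2 * np.pi, 128))    # stand-in for one ECG segment
    image = gramian_angular_field(beat)
    print(image.shape)                               # (128, 128), ready for a 2D CNN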

Rational kernel-based interpolation for complex-valued frequency response functions

  • paper_url: http://arxiv.org/abs/2307.13484
  • repo_url: https://github.com/stk-kriging/complex-rational-interpolation
  • paper_authors: Julien Bect, Niklas Georg, Ulrich Römer, Sebastian Schöps
  • for: This paper addresses kernel-based approximation of complex-valued functions from data, in particular frequency response functions of partial differential equations in the frequency domain.
  • methods: Standard kernels do not perform well in this setting; the authors introduce new reproducing kernel Hilbert spaces of complex-valued functions and formulate complex-valued interpolation with a kernel pair as minimum-norm interpolation in these spaces. The interpolant is further combined with a low-order rational function, whose order is adaptively selected by a new model selection criterion.
  • results: Numerical experiments on examples from different fields, including electromagnetics and acoustics, show strong performance, also in comparison to available rational approximation methods.
    Abstract This work is concerned with the kernel-based approximation of a complex-valued function from data, where the frequency response function of a partial differential equation in the frequency domain is of particular interest. In this setting, kernel methods are employed more and more frequently, however, standard kernels do not perform well. Moreover, the role and mathematical implications of the underlying pair of kernels, which arises naturally in the complex-valued case, remain to be addressed. We introduce new reproducing kernel Hilbert spaces of complex-valued functions, and formulate the problem of complex-valued interpolation with a kernel pair as minimum norm interpolation in these spaces. Moreover, we combine the interpolant with a low-order rational function, where the order is adaptively selected based on a new model selection criterion. Numerical results on examples from different fields, including electromagnetics and acoustic examples, illustrate the performance of the method, also in comparison to available rational approximation methods.
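    For background, minimum-norm kernel interpolation of data $(x_i, y_i)_{i=1}^{m}$ in a reproducing kernel Hilbert space with kernel $K$ takes the standard form \begin{equation*} \hat{f}(x) = \sum_{i=1}^{m} c_i\, K(x, x_i), \qquad \mathbf{K} c = y, \quad \mathbf{K}_{ij} = K(x_i, x_j); \end{equation*} the contribution of the paper lies in the choice of complex-valued spaces and the associated kernel pair (not reproduced here), together with the adaptively selected low-order rational term added to the interpolant.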

Combinatorial Auctions and Graph Neural Networks for Local Energy Flexibility Markets

  • paper_url: http://arxiv.org/abs/2307.13470
  • repo_url: None
  • paper_authors: Awadelrahman M. A. Ahmed, Frank Eliassen, Yan Zhang
  • for: This work proposes a new combinatorial auction framework for local energy flexibility markets, addressing prosumers' inability to bundle multiple flexibility time intervals.
  • methods: To solve the underlying NP-complete winner determination problems, the study uses a simple yet powerful heterogeneous tri-partite graph representation and graph neural network-based models.
  • results: The models achieve an average optimal value deviation of less than 5% from an off-the-shelf optimization tool and show linear inference time complexity, compared to the exponential complexity of the commercial solver.
    Abstract This paper proposes a new combinatorial auction framework for local energy flexibility markets, which addresses the issue of prosumers' inability to bundle multiple flexibility time intervals. To solve the underlying NP-complete winner determination problems, we present a simple yet powerful heterogeneous tri-partite graph representation and design graph neural network-based models. Our models achieve an average optimal value deviation of less than 5\% from an off-the-shelf optimization tool and show linear inference time complexity compared to the exponential complexity of the commercial solver. Contributions and results demonstrate the potential of using machine learning to efficiently allocate energy flexibility resources in local markets and solving optimization problems in general.

Gaussian Graph with Prototypical Contrastive Learning in E-Commerce Bundle Recommendation

  • paper_url: http://arxiv.org/abs/2307.13468
  • repo_url: None
  • paper_authors: Zhao-Yang Liu, Liucheng Sun, Chenwei Weng, Qijin Chen, Chengfu Huo
  • for: To improve the accuracy and effectiveness of bundle recommendation on e-commerce platforms and to address the uncertainty issue that arises in real recommendation scenarios.
  • methods: The paper proposes a Gaussian Graph with Prototypical Contrastive Learning (GPCL) framework, which embeds each user/bundle/item as a Gaussian distribution rather than a fixed vector and designs a prototypical contrastive learning module to capture contextual information and mitigate the sampling bias issue.
  • results: Experiments show that GPCL achieves new state-of-the-art performance on several public datasets and yields substantial improvements after deployment on a real-world e-commerce platform.
    Abstract Bundle recommendation aims to provide a bundle of items to satisfy the user preference on e-commerce platform. Existing successful solutions are based on the contrastive graph learning paradigm where graph neural networks (GNNs) are employed to learn representations from user-level and bundle-level graph views with a contrastive learning module to enhance the cooperative association between different views. Nevertheless, they ignore the uncertainty issue which has a significant impact in real bundle recommendation scenarios due to the lack of discriminative information caused by highly sparsity or diversity. We further suggest that their instancewise contrastive learning fails to distinguish the semantically similar negatives (i.e., sampling bias issue), resulting in performance degradation. In this paper, we propose a novel Gaussian Graph with Prototypical Contrastive Learning (GPCL) framework to overcome these challenges. In particular, GPCL embeds each user/bundle/item as a Gaussian distribution rather than a fixed vector. We further design a prototypical contrastive learning module to capture the contextual information and mitigate the sampling bias issue. Extensive experiments demonstrate that benefiting from the proposed components, we achieve new state-of-the-art performance compared to previous methods on several public datasets. Moreover, GPCL has been deployed on real-world e-commerce platform and achieved substantial improvements.

Integrating processed-based models and machine learning for crop yield prediction

  • paper_url: http://arxiv.org/abs/2307.13466
  • repo_url: None
  • paper_authors: Michiel G. J. Kallenberg, Bernardo Maestrini, Ron van Bree, Paul Ravensbergen, Christos Pylianidis, Frits van Evert, Ioannis N. Athanasiadis
  • for: Predicting potato yield.
  • methods: A hybrid meta-modeling approach that combines a theory-driven crop growth model with a data-driven neural network: the crop growth model generates synthetic data for (pre)training a convolutional neural network, which is then fine-tuned with observational data.
  • results: In silico, the meta-modeling approach yields better predictions than a purely data-driven baseline and gives competitive results on real-world data, but further validation and refinement with extensive real-world datasets are needed to solidify its practical effectiveness.
    Abstract Crop yield prediction typically involves the utilization of either theory-driven process-based crop growth models, which have proven to be difficult to calibrate for local conditions, or data-driven machine learning methods, which are known to require large datasets. In this work we investigate potato yield prediction using a hybrid meta-modeling approach. A crop growth model is employed to generate synthetic data for (pre)training a convolutional neural net, which is then fine-tuned with observational data. When applied in silico, our meta-modeling approach yields better predictions than a baseline comprising a purely data-driven approach. When tested on real-world data from field trials (n=303) and commercial fields (n=77), the meta-modeling approach yields competitive results with respect to the crop growth model. In the latter set, however, both models perform worse than a simple linear regression with a hand-picked feature set and dedicated preprocessing designed by domain experts. Our findings indicate the potential of meta-modeling for accurate crop yield prediction; however, further advancements and validation using extensive real-world datasets is recommended to solidify its practical effectiveness.

Fundamental causal bounds of quantum random access memories

  • paper_url: http://arxiv.org/abs/2307.13460
  • repo_url: None
  • paper_authors: Yunfei Wang, Yuri Alexeev, Liang Jiang, Frederic T. Chong, Junyu Liu
  • for: This paper explores the fundamental limits of rapid quantum memories in quantum computing applications, particularly in the context of hybrid quantum acoustic systems.
  • methods: The paper employs relativistic quantum field theory and Lieb-Robinson bounds to critically examine the causality constraints of quantum memories and their impact on quantum computing performance.
  • results: The paper shows that the number of logical qubits that can be accommodated in a QRAM design can be scaled up to $\mathcal{O}(10^7)$ in 1 dimension, $\mathcal{O}(10^{15})$ to $\mathcal{O}(10^{20})$ in various 2D architectures, and $\mathcal{O}(10^{24})$ in 3 dimensions, subject to the causality bound. These findings have important implications for the long-term performance of quantum computing applications in data science.
    Abstract Quantum devices should operate in adherence to quantum physics principles. Quantum random access memory (QRAM), a fundamental component of many essential quantum algorithms for tasks such as linear algebra, data search, and machine learning, is often proposed to offer $\mathcal{O}(\log N)$ circuit depth for $\mathcal{O}(N)$ data size, given $N$ qubits. However, this claim appears to breach the principle of relativity when dealing with a large number of qubits in quantum materials interacting locally. In our study we critically explore the intrinsic bounds of rapid quantum memories based on causality, employing the relativistic quantum field theory and Lieb-Robinson bounds in quantum many-body systems. In this paper, we consider a hardware-efficient QRAM design in hybrid quantum acoustic systems. Assuming clock cycle times of approximately $10^{-3}$ seconds and a lattice spacing of about 1 micrometer, we show that QRAM can accommodate up to $\mathcal{O}(10^7)$ logical qubits in 1 dimension, $\mathcal{O}(10^{15})$ to $\mathcal{O}(10^{20})$ in various 2D architectures, and $\mathcal{O}(10^{24})$ in 3 dimensions. We contend that this causality bound broadly applies to other quantum hardware systems. Our findings highlight the impact of fundamental quantum physics constraints on the long-term performance of quantum computing applications in data science and suggest potential quantum memory designs for performance enhancement.

A behavioural transformer for effective collaboration between a robot and a non-stationary human

  • paper_url: http://arxiv.org/abs/2307.13447
  • repo_url: None
  • paper_authors: Ruaridh Mon-Williams, Theodoros Stouraitis, Sethu Vijayakumar
  • for: Addressing the non-stationarity caused by changes in human behaviour during human-robot collaboration, improving a robot's ability to predict and adapt to new human agents.
  • methods: A principled meta-learning framework, on top of which Behaviour-Transform (BeTrans) is developed: a conditional transformer that adapts quickly to new human agents with non-stationary behaviours, owing to its strong performance on sequential data.
  • results: Trained on simulated human agents with different systematic biases, BeTrans collaborates effectively in an original customisable environment and adapts to non-stationary simulated human agents faster than SOTA techniques.
    Abstract A key challenge in human-robot collaboration is the non-stationarity created by humans due to changes in their behaviour. This alters environmental transitions and hinders human-robot collaboration. We propose a principled meta-learning framework to explore how robots could better predict human behaviour, and thereby deal with issues of non-stationarity. On the basis of this framework, we developed Behaviour-Transform (BeTrans). BeTrans is a conditional transformer that enables a robot agent to adapt quickly to new human agents with non-stationary behaviours, due to its notable performance with sequential data. We trained BeTrans on simulated human agents with different systematic biases in collaborative settings. We used an original customisable environment to show that BeTrans effectively collaborates with simulated human agents and adapts faster to non-stationary simulated human agents than SOTA techniques.

Network Traffic Classification based on Single Flow Time Series Analysis

  • paper_url: http://arxiv.org/abs/2307.13434
  • repo_url: https://github.com/koumajos/classificationbasedonsfts
  • paper_authors: Josef Koumar, Karel Hynek, Tomáš Čejka
  • for: Addressing the challenge of analyzing encrypted network communication through IP-flow monitoring.
  • methods: Time-series analysis of the Single Flow Time series (the number of bytes in each packet together with its timestamp), from which 69 universal features are proposed.
  • results: Evaluated on 15 publicly available datasets across various traffic classification tasks, the proposed feature vector achieves classification performance similar to or better than related work, with improvements of up to 5% in more than half of the evaluated tasks.
    Abstract Network traffic monitoring using IP flows is used to handle the current challenge of analyzing encrypted network communication. Nevertheless, the packet aggregation into flow records naturally causes information loss; therefore, this paper proposes a novel flow extension for traffic features based on the time series analysis of the Single Flow Time series, i.e., a time series created by the number of bytes in each packet and its timestamp. We propose 69 universal features based on the statistical analysis of data points, time domain analysis, packet distribution within the flow timespan, time series behavior, and frequency domain analysis. We have demonstrated the usability and universality of the proposed feature vector for various network traffic classification tasks using 15 well-known publicly available datasets. Our evaluation shows that the novel feature vector achieves classification performance similar or better than related works on both binary and multiclass classification tasks. In more than half of the evaluated tasks, the classification performance increased by up to 5\%.
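
The abstract groups its 69 features into statistical, time-domain, packet-distribution, time-series-behaviour, and frequency-domain categories. Below is a small illustrative subset of such features computed from per-packet byte counts and timestamps; the specific features chosen here are assumptions, not the paper's full feature vector.

```python
# Illustrative subset of single-flow time-series features (not the paper's full set of 69).
import numpy as np

def flow_features(byte_counts: np.ndarray, timestamps: np.ndarray) -> dict:
    iat = np.diff(timestamps)                       # inter-arrival times
    duration = timestamps[-1] - timestamps[0]
    spectrum = np.abs(np.fft.rfft(byte_counts - byte_counts.mean()))
    return {
        # statistical features of the data points
        "mean_bytes": byte_counts.mean(),
        "std_bytes": byte_counts.std(),
        # time-domain features
        "duration": duration,
        "mean_iat": iat.mean() if len(iat) else 0.0,
        # packet distribution within the flow timespan
        "pkts_first_half": np.mean(timestamps < timestamps[0] + duration / 2),
        # time-series behaviour
        "direction_changes": int(np.sum(np.diff(np.sign(np.diff(byte_counts))) != 0)),
        # frequency-domain feature
        "dominant_freq_bin": int(np.argmax(spectrum[1:]) + 1) if len(spectrum) > 1 else 0,
    }

# Toy flow: 20 packets with random sizes and timestamps.
ts = np.sort(np.random.uniform(0, 2.0, size=20))
sizes = np.random.randint(60, 1500, size=20).astype(float)
print(flow_features(sizes, ts))
```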

Achieving Linear Speedup in Decentralized Stochastic Compositional Minimax Optimization

  • paper_url: http://arxiv.org/abs/2307.13430
  • repo_url: None
  • paper_authors: Hongchang Gao
  • for: Solving stochastic compositional minimax problems over distributed data in the decentralized setting.
  • methods: A decentralized stochastic compositional gradient descent ascent algorithm with momentum, designed to reduce the consensus error about the inner-level function.
  • results: Theoretical results show that the algorithm achieves linear speedup with respect to the number of workers; extensive experiments on the imbalanced classification problem demonstrate its effectiveness.
    Abstract The stochastic compositional minimax problem has attracted a surge of attention in recent years since it covers many emerging machine learning models. Meanwhile, due to the emergence of distributed data, optimizing this kind of problem under the decentralized setting becomes badly needed. However, the compositional structure in the loss function brings unique challenges to designing efficient decentralized optimization algorithms. In particular, our study shows that the standard gossip communication strategy cannot achieve linear speedup for decentralized compositional minimax problems due to the large consensus error about the inner-level function. To address this issue, we developed a novel decentralized stochastic compositional gradient descent ascent with momentum algorithm to reduce the consensus error in the inner-level function. As such, our theoretical results demonstrate that it is able to achieve linear speedup with respect to the number of workers. We believe this novel algorithmic design could benefit the development of decentralized compositional optimization. Finally, we applied our methods to the imbalanced classification problem. The extensive experimental results provide evidence for the effectiveness of our algorithm.

A signal processing interpretation of noise-reduction convolutional neural networks

  • paper_url: http://arxiv.org/abs/2307.13425
  • repo_url: None
  • paper_authors: Luis A. Zavala-Mondragón, Peter H. N. de With, Fons van der Sommen
  • for: Providing theoretical underpinnings for encoding-decoding CNNs used in data-driven noise reduction and deep-learning algorithms, to better understand the internal operation of these architectures.
  • methods: Builds intuition on the theory of deep convolutional framelets and explains diverse encoding-decoding CNN architectures within a unified theoretical framework.
  • results: By connecting basic principles from signal processing to deep learning, the material offers significant guidance for designing robust and efficient novel encoding-decoding CNN architectures.
    Abstract Encoding-decoding CNNs play a central role in data-driven noise reduction and can be found within numerous deep-learning algorithms. However, the development of these CNN architectures is often done in ad-hoc fashion and theoretical underpinnings for important design choices is generally lacking. Up to this moment there are different existing relevant works that strive to explain the internal operation of these CNNs. Still, these ideas are either scattered and/or may require significant expertise to be accessible for a bigger audience. In order to open up this exciting field, this article builds intuition on the theory of deep convolutional framelets and explains diverse ED CNN architectures in a unified theoretical framework. By connecting basic principles from signal processing to the field of deep learning, this self-contained material offers significant guidance for designing robust and efficient novel CNN architectures.

Non Intrusive Intelligibility Predictor for Hearing Impaired Individuals using Self Supervised Speech Representations

  • paper_url: http://arxiv.org/abs/2307.13423
  • repo_url: None
  • paper_authors: George Close, Thomas Hain, Stefan Goetze
  • for: Extending non-intrusive speech-quality prediction based on self-supervised speech representations (SSSRs) to intelligibility prediction for hearing-impaired users.
  • methods: SSSRs are used as input features to non-intrusive models that predict intelligibility for hearing-impaired listeners.
  • results: SSSRs as input features achieve performance competitive with more complex systems; analysis across Clarity Prediction Challenge 1 listeners and enhancement systems indicates that more data may be needed for the models to generalise to unknown systems and (hearing-impaired) individuals.
    Abstract Self-supervised speech representations (SSSRs) have been successfully applied to a number of speech-processing tasks, e.g. as feature extractor for speech quality (SQ) prediction, which is, in turn, relevant for assessment and training speech enhancement systems for users with normal or impaired hearing. However, exact knowledge of why and how quality-related information is encoded well in such representations remains poorly understood. In this work, techniques for non-intrusive prediction of SQ ratings are extended to the prediction of intelligibility for hearing-impaired users. It is found that self-supervised representations are useful as input features to non-intrusive prediction models, achieving competitive performance to more complex systems. A detailed analysis of the performance depending on Clarity Prediction Challenge 1 listeners and enhancement systems indicates that more data might be needed to allow generalisation to unknown systems and (hearing-impaired) individuals

On the Learning Dynamics of Attention Networks

  • paper_url: http://arxiv.org/abs/2307.13421
  • repo_url: https://github.com/vashisht-rahul/on-the-learning-dynamics-of-attention-networks
  • paper_authors: Rahul Vashisht, Harish G. Ramaswamy
  • for: Studying the learning dynamics of three standard ways of training attention models -- soft attention, hard attention, and latent variable marginal likelihood (LVML) attention. All three seek a `focus' model that selects the right segment of the input and a `classification' model that maps the selected segment to the target label, but they aggregate the selected segments differently, leading to distinct dynamics and final results.
  • methods: Analyzes the soft attention, hard attention, and LVML losses, deriving closed-form expressions for the parameter trajectory under gradient flow in a simple setting.
  • results: Under the soft attention loss the focus model improves quickly at initialization and stalls later, whereas the hard attention loss behaves in the opposite fashion; based on these observations, a simple hybrid approach combining the advantages of the different losses is proposed and demonstrated on a collection of semi-synthetic and real-world datasets.
    Abstract Attention models are typically learned by optimizing one of three standard loss functions that are variously called -- soft attention, hard attention, and latent variable marginal likelihood (LVML) attention. All three paradigms are motivated by the same goal of finding two models -- a `focus' model that `selects' the right \textit{segment} of the input and a `classification' model that processes the selected segment into the target label. However, they differ significantly in the way the selected segments are aggregated, resulting in distinct dynamics and final results. We observe a unique signature of models learned using these paradigms and explain this as a consequence of the evolution of the classification model under gradient descent when the focus model is fixed. We also analyze these paradigms in a simple setting and derive closed-form expressions for the parameter trajectory under gradient flow. With the soft attention loss, the focus model improves quickly at initialization and splutters later on. On the other hand, hard attention loss behaves in the opposite fashion. Based on our observations, we propose a simple hybrid approach that combines the advantages of the different loss functions and demonstrates it on a collection of semi-synthetic and real-world datasets
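
To make the three training paradigms concrete, here is a hedged toy sketch for an input split into segments: soft attention classifies the attention-weighted combination of segments, hard attention classifies only the highest-scoring segment, and LVML marginalizes the classification likelihood over segments. The linear focus/classification models and all shapes are assumptions for illustration.

```python
# Toy sketch of soft-, hard-, and LVML-attention losses over input segments (assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
B, S, D, C = 8, 5, 16, 3                 # batch, segments, feature dim, classes
segments = torch.randn(B, S, D)
labels = torch.randint(0, C, (B,))

focus = nn.Linear(D, 1)                  # scores each segment
classifier = nn.Linear(D, C)             # classifies a (weighted) segment

attn = torch.softmax(focus(segments).squeeze(-1), dim=1)       # (B, S)
seg_logits = classifier(segments)                              # (B, S, C)

# Soft attention: classify the attention-weighted combination of segments.
soft_loss = F.cross_entropy(classifier((attn.unsqueeze(-1) * segments).sum(1)), labels)

# Hard attention: classify only the single highest-scoring segment.
best = attn.argmax(dim=1)
hard_loss = F.cross_entropy(seg_logits[torch.arange(B), best], labels)

# LVML: negative log of the marginal likelihood, sum_s p(s) * p(y | segment s).
log_p_y_given_s = F.log_softmax(seg_logits, dim=-1)[torch.arange(B), :, labels]  # (B, S)
lvml_loss = -torch.logsumexp(torch.log(attn + 1e-12) + log_p_y_given_s, dim=1).mean()

print(soft_loss.item(), hard_loss.item(), lvml_loss.item())
```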

Co-Design of Out-of-Distribution Detectors for Autonomous Emergency Braking Systems

  • paper_url: http://arxiv.org/abs/2307.13419
  • repo_url: None
  • paper_authors: Michael Yuhas, Arvind Easwaran
  • for: The paper aims to improve the safety of autonomous vehicles (AVs) by co-designing an out-of-distribution (OOD) detector and a learning-enabled component (LEC) to detect and mitigate potential failures in the LEC.
  • methods: The paper uses a risk model to analyze the impact of design parameters on both the OOD detector and the LEC, and co-designs the two components to minimize the risk of failure.
  • results: The paper demonstrates a 42.3% risk reduction in the system while maintaining equivalent resource utilization, indicating the effectiveness of the co-design methodology in improving the safety of AVs.
    Abstract Learning enabled components (LECs), while critical for decision making in autonomous vehicles (AVs), are likely to make incorrect decisions when presented with samples outside of their training distributions. Out-of-distribution (OOD) detectors have been proposed to detect such samples, thereby acting as a safety monitor, however, both OOD detectors and LECs require heavy utilization of embedded hardware typically found in AVs. For both components, there is a tradeoff between non-functional and functional performance, and both impact a vehicle's safety. For instance, giving an OOD detector a longer response time can increase its accuracy at the expense of the LEC. We consider an LEC with binary output like an autonomous emergency braking system (AEBS) and use risk, the combination of severity and occurrence of a failure, to model the effect of both components' design parameters on each other's functional and non-functional performance, as well as their impact on system safety. We formulate a co-design methodology that uses this risk model to find the design parameters for an OOD detector and LEC that decrease risk below that of the baseline system and demonstrate it on a vision based AEBS. Using our methodology, we achieve a 42.3% risk reduction while maintaining equivalent resource utilization.
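
A small, hypothetical sketch of the risk-driven co-design idea in the abstract above: for each candidate pair of OOD-detector response time and LEC threshold, risk is modelled as severity times occurrence of a failure, and the feasible pair with the lowest risk is chosen. The severity value, failure-probability model, and budget are made-up placeholders, not the paper's calibrated risk model.

```python
# Hypothetical co-design sweep: pick (detector response time, LEC threshold) minimizing risk.
import itertools

SEVERITY = 10.0                      # assumed cost of a missed braking event

def occurrence(detector_ms: float, lec_threshold: float) -> float:
    """Placeholder failure-probability model: a slower detector is more accurate,
    but it delays the LEC; the LEC threshold trades missed detections for false alarms."""
    ood_miss = 0.2 / (1.0 + detector_ms / 50.0)
    lec_miss = 0.05 + 0.1 * abs(lec_threshold - 0.5)
    delay_penalty = 0.001 * detector_ms
    return min(1.0, ood_miss * lec_miss + delay_penalty)

def risk(detector_ms: float, lec_threshold: float) -> float:
    return SEVERITY * occurrence(detector_ms, lec_threshold)

budget_ms = 80.0                     # assumed shared compute budget per frame
candidates = itertools.product([10, 20, 40, 80, 160], [0.3, 0.5, 0.7])
feasible = [(d, t) for d, t in candidates if d <= budget_ms]
best = min(feasible, key=lambda p: risk(*p))
print("chosen design:", best, "risk:", round(risk(*best), 4))
```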

Communication-Efficient Orchestrations for URLLC Service via Hierarchical Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.13415
  • repo_url: None
  • paper_authors: Wei Shi, Milad Ganjalizadeh, Hossein Shokri Ghadikolaei, Marina Petrova
  • for: Enabling ultra-reliable low-latency communication (URLLC) services in 5G through efficient allocation of wireless resources.
  • methods: A multi-agent hierarchical reinforcement learning (HRL) framework that implements multi-level policies with different control-loop timescales: agents with faster control loops are deployed closer to the base station, while agents with slower loops at the edge or closer to the core network provide high-level guidelines for low-level actions.
  • results: On a use case from the prior art, the HRL framework optimizes the maximum number of retransmissions and transmission power of industrial devices, matching or improving on the baseline single-agent RL method while incurring significantly less signaling overhead and delay.
    Abstract Ultra-reliable low latency communications (URLLC) service is envisioned to enable use cases with strict reliability and latency requirements in 5G. One approach for enabling URLLC services is to leverage Reinforcement Learning (RL) to efficiently allocate wireless resources. However, with conventional RL methods, the decision variables (though being deployed at various network layers) are typically optimized in the same control loop, leading to significant practical limitations on the control loop's delay as well as excessive signaling and energy consumption. In this paper, we propose a multi-agent Hierarchical RL (HRL) framework that enables the implementation of multi-level policies with different control loop timescales. Agents with faster control loops are deployed closer to the base station, while the ones with slower control loops are at the edge or closer to the core network providing high-level guidelines for low-level actions. On a use case from the prior art, with our HRL framework, we optimized the maximum number of retransmissions and transmission power of industrial devices. Our extensive simulation results on the factory automation scenario show that the HRL framework achieves better performance as the baseline single-agent RL method, with significantly less overhead of signal transmissions and delay compared to the one-agent RL methods.

Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation

  • paper_url: http://arxiv.org/abs/2307.13412
  • repo_url: None
  • paper_authors: Stylianos I. Venieris, Javier Fernandez-Marques, Nicholas D. Lane
  • for: Improving the performance and energy efficiency of FPGA-based Convolutional Neural Network (CNN) accelerators.
  • methods: unzipFPGA, a CNN inference system whose hardware architecture includes a weights generator module for on-chip, on-the-fly weight generation, alleviating the impact of limited bandwidth on memory-bound layers; an automated hardware-aware methodology tailors the weights generation mechanism to the target CNN-device pair, improving the accuracy-performance balance.
  • results: unzipFPGA achieves an average 2.57x performance-efficiency gain over highly optimised GPU designs under the same power constraints and up to 3.94x higher performance density than a range of state-of-the-art FPGA-based CNN accelerators.
    Abstract The unprecedented accuracy of convolutional neural networks (CNNs) across a broad range of AI tasks has led to their widespread deployment in mobile and embedded settings. In a pursuit for high-performance and energy-efficient inference, significant research effort has been invested in the design of FPGA-based CNN accelerators. In this context, single computation engines constitute a popular approach to support diverse CNN modes without the overhead of fabric reconfiguration. Nevertheless, this flexibility often comes with significantly degraded performance on memory-bound layers and resource underutilisation due to the suboptimal mapping of certain layers on the engine's fixed configuration. In this work, we investigate the implications in terms of CNN engine design for a class of models that introduce a pre-convolution stage to decompress the weights at run time. We refer to these approaches as on-the-fly. This paper presents unzipFPGA, a novel CNN inference system that counteracts the limitations of existing CNN engines. The proposed framework comprises a novel CNN hardware architecture that introduces a weights generator module that enables the on-chip on-the-fly generation of weights, alleviating the negative impact of limited bandwidth on memory-bound layers. We further enhance unzipFPGA with an automated hardware-aware methodology that tailors the weights generation mechanism to the target CNN-device pair, leading to an improved accuracy-performance balance. Finally, we introduce an input selective processing element (PE) design that balances the load between PEs in suboptimally mapped layers. The proposed framework yields hardware designs that achieve an average of 2.57x performance efficiency gain over highly optimised GPU designs for the same power constraints and up to 3.94x higher performance density over a diverse range of state-of-the-art FPGA-based CNN accelerators.

The Double-Edged Sword of Big Data and Information Technology for the Disadvantaged: A Cautionary Tale from Open Banking

  • paper_url: http://arxiv.org/abs/2307.13408
  • repo_url: None
  • paper_authors: Savina Dine Kim, Galina Andreeva, Michael Rovatsos
  • for: Examining the hidden fairness implications of Open Banking data combined with machine learning (ML), illustrated with a dataset from a UK FinTech lender.
  • methods: Three ML classifiers are compared in predicting the likelihood of financial vulnerability (FV), and clustering identifies groups exhibiting different magnitudes and forms of FV to highlight the effects of feature combination.
  • results: Engineered features of financial behaviour can be predictive of omitted personal information, particularly sensitive or protected characteristics, shedding light on the hidden dangers of Open Banking data.
    Abstract This research article analyses and demonstrates the hidden implications for fairness of seemingly neutral data coupled with powerful technology, such as machine learning (ML), using Open Banking as an example. Open Banking has ignited a revolution in financial services, opening new opportunities for customer acquisition, management, retention, and risk assessment. However, the granularity of transaction data holds potential for harm where unnoticed proxies for sensitive and prohibited characteristics may lead to indirect discrimination. Against this backdrop, we investigate the dimensions of financial vulnerability (FV), a global concern resulting from COVID-19 and rising inflation. Specifically, we look to understand the behavioral elements leading up to FV and its impact on at-risk, disadvantaged groups through the lens of fair interpretation. Using a unique dataset from a UK FinTech lender, we demonstrate the power of fine-grained transaction data while simultaneously cautioning its safe usage. Three ML classifiers are compared in predicting the likelihood of FV, and groups exhibiting different magnitudes and forms of FV are identified via clustering to highlight the effects of feature combination. Our results indicate that engineered features of financial behavior can be predictive of omitted personal information, particularly sensitive or protected characteristics, shedding light on the hidden dangers of Open Banking data. We discuss the implications and conclude fairness via unawareness is ineffective in this new technological environment.

Counterfactual Explanation via Search in Gaussian Mixture Distributed Latent Space

  • paper_url: http://arxiv.org/abs/2307.13390
  • repo_url: None
  • paper_authors: Xuan Zhao, Klaus Broelemann, Gjergji Kasneci
  • for: A new method for generating counterfactual explanations (CEs) that helps users understand the factors behind an automated decision and the feasible changes that would lead to a more favourable outcome.
  • methods: The latent space of an autoencoder is first shaped into a mixture of Gaussian distributions; CEs are then generated in latent space by linear interpolation between the query sample and the centroid of the target class.
  • results: On image and tabular datasets the method is competitive on several quality measures and efficiently returns results that lie closer to the original data manifold than three state-of-the-art methods.
    Abstract Counterfactual Explanations (CEs) are an important tool in Algorithmic Recourse for addressing two questions: 1. What are the crucial factors that led to an automated prediction/decision? 2. How can these factors be changed to achieve a more favorable outcome from a user's perspective? Thus, guiding the user's interaction with AI systems by proposing easy-to-understand explanations and easy-to-attain feasible changes is essential for the trustworthy adoption and long-term acceptance of AI systems. In the literature, various methods have been proposed to generate CEs, and different quality measures have been suggested to evaluate these methods. However, the generation of CEs is usually computationally expensive, and the resulting suggestions are unrealistic and thus non-actionable. In this paper, we introduce a new method to generate CEs for a pre-trained binary classifier by first shaping the latent space of an autoencoder to be a mixture of Gaussian distributions. CEs are then generated in latent space by linear interpolation between the query sample and the centroid of the target class. We show that our method maintains the characteristics of the input sample during the counterfactual search. In various experiments, we show that the proposed method is competitive based on different quality measures on image and tabular datasets -- efficiently returns results that are closer to the original data manifold compared to three state-of-the-art methods, which are essential for realistic high-dimensional machine learning applications.
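
A minimal sketch of the generation step described above: assuming a trained autoencoder whose latent space forms a Gaussian mixture with one component per class, a counterfactual is produced by linearly interpolating the query's latent code towards the target-class centroid and decoding the first point the classifier accepts. The untrained linear encoder/decoder and classifier below are stand-ins, not the paper's models.

```python
# Sketch (assumptions): counterfactual by latent interpolation towards the target-class centroid.
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, input_dim = 8, 20
encoder = nn.Linear(input_dim, latent_dim)      # stand-ins for a trained autoencoder
decoder = nn.Linear(latent_dim, input_dim)
classifier = nn.Linear(input_dim, 2)            # stand-in for the pre-trained binary classifier

# Assumed: per-class centroids of the (GMM-shaped) latent space.
class_centroids = torch.randn(2, latent_dim)

def counterfactual(x: torch.Tensor, target_class: int, steps: int = 20):
    z = encoder(x)
    centroid = class_centroids[target_class]
    for alpha in torch.linspace(0.0, 1.0, steps):
        z_cf = (1 - alpha) * z + alpha * centroid       # linear interpolation in latent space
        x_cf = decoder(z_cf)
        if classifier(x_cf).argmax().item() == target_class:
            return x_cf, alpha.item()                   # earliest accepted point stays close to x
    return decoder(centroid), 1.0

x_query = torch.randn(input_dim)
x_cf, alpha = counterfactual(x_query, target_class=1)
print("accepted at interpolation fraction:", alpha)
```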

BotHawk: An Approach for Bots Detection in Open Source Software Projects

  • paper_url: http://arxiv.org/abs/2307.13386
  • repo_url: https://github.com/bifenglin/bothawk
  • paper_authors: Fenglin Bi, Zhiwei Zhu, Wei Wang, Xiaoya Xia, Hassan Ali Khan, Peng Pu
  • for: Investigating bot behaviour in open-source software projects and identifying bot accounts with maximum possible accuracy.
  • methods: A rigorous data-collection workflow yields a dataset of 19,779 accounts that is accurate, generalizable, scalable, and up-to-date; on top of it, a bot-detection model named BotHawk is proposed.
  • results: Analyzing 17 features across 5 dimensions identifies four types of bot accounts; the number of followers, number of repositories, and tags are the most relevant features for identifying account type, and BotHawk outperforms other models with an AUC of 0.947 and an F1-score of 0.89.
    Abstract Social coding platforms have revolutionized collaboration in software development, leading to using software bots for streamlining operations. However, The presence of open-source software (OSS) bots gives rise to problems including impersonation, spamming, bias, and security risks. Identifying bot accounts and behavior is a challenging task in the OSS project. This research aims to investigate bots' behavior in open-source software projects and identify bot accounts with maximum possible accuracy. Our team gathered a dataset of 19,779 accounts that meet standardized criteria to enable future research on bots in open-source projects. We follow a rigorous workflow to ensure that the data we collect is accurate, generalizable, scalable, and up-to-date. We've identified four types of bot accounts in open-source software projects by analyzing their behavior across 17 features in 5 dimensions. Our team created BotHawk, a highly effective model for detecting bots in open-source software projects. It outperforms other models, achieving an AUC of 0.947 and an F1-score of 0.89. BotHawk can detect a wider variety of bots, including CI/CD and scanning bots. Furthermore, we find that the number of followers, number of repositories, and tags contain the most relevant features to identify the account type.

Scaff-PD: Communication Efficient Fair and Robust Federated Learning

  • paper_url: http://arxiv.org/abs/2307.13381
  • repo_url: None
  • paper_authors: Yaodong Yu, Sai Praneeth Karimireddy, Yi Ma, Michael I. Jordan
  • for: Improving fairness and robustness in federated learning, targeting resource-constrained and heterogeneous settings.
  • methods: An accelerated primal-dual (APD) algorithm that optimizes a family of distributionally robust objectives and uses bias-corrected local steps (as in Scaffold) to achieve significant gains in communication efficiency and convergence speed.
  • results: On several benchmark datasets, Scaff-PD improves fairness and robustness while maintaining competitive accuracy.
    Abstract We present Scaff-PD, a fast and communication-efficient algorithm for distributionally robust federated learning. Our approach improves fairness by optimizing a family of distributionally robust objectives tailored to heterogeneous clients. We leverage the special structure of these objectives, and design an accelerated primal dual (APD) algorithm which uses bias corrected local steps (as in Scaffold) to achieve significant gains in communication efficiency and convergence speed. We evaluate Scaff-PD on several benchmark datasets and demonstrate its effectiveness in improving fairness and robustness while maintaining competitive accuracy. Our results suggest that Scaff-PD is a promising approach for federated learning in resource-constrained and heterogeneous settings.

Submodular Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.13372
  • repo_url: https://github.com/manish-pra/non-additive-rl
  • paper_authors: Manish Prajapat, Mojmír Mutný, Melanie N. Zeilinger, Andreas Krause
  • for: Addressing reward modelling in reinforcement learning (RL): rewards are typically additive, but in many important applications, such as coverage control, experiment design, and informative path planning, rewards naturally exhibit diminishing returns.
  • methods: Submodular RL (SubRL), a paradigm that optimizes more general, non-additive (history-dependent) rewards modelled via submodular set functions capturing diminishing returns; in general this optimization problem is hard to approximate, even in tabular settings.
  • results: SubPO, a simple policy-gradient algorithm that handles non-additive rewards by greedily maximizing marginal gains, recovers optimal constant-factor approximations of submodular bandits under some assumptions on the underlying MDP and locally optimizes SubRL instances even in large state and action spaces; applications to biodiversity monitoring, Bayesian experiment design, informative path planning, and coverage maximization demonstrate sample efficiency and scalability.
    Abstract In reinforcement learning (RL), rewards of states are typically considered additive, and following the Markov assumption, they are $\textit{independent}$ of states visited previously. In many important applications, such as coverage control, experiment design and informative path planning, rewards naturally have diminishing returns, i.e., their value decreases in light of similar states visited previously. To tackle this, we propose $\textit{submodular RL}$ (SubRL), a paradigm which seeks to optimize more general, non-additive (and history-dependent) rewards modelled via submodular set functions which capture diminishing returns. Unfortunately, in general, even in tabular settings, we show that the resulting optimization problem is hard to approximate. On the other hand, motivated by the success of greedy algorithms in classical submodular optimization, we propose SubPO, a simple policy gradient-based algorithm for SubRL that handles non-additive rewards by greedily maximizing marginal gains. Indeed, under some assumptions on the underlying Markov Decision Process (MDP), SubPO recovers optimal constant factor approximations of submodular bandits. Moreover, we derive a natural policy gradient approach for locally optimizing SubRL instances even in large state- and action- spaces. We showcase the versatility of our approach by applying SubPO to several applications, such as biodiversity monitoring, Bayesian experiment design, informative path planning, and coverage maximization. Our results demonstrate sample efficiency, as well as scalability to high-dimensional state-action spaces.
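
To illustrate "greedily maximizing marginal gains", here is a toy rollout with a coverage-style submodular reward: at each step the agent moves to the successor state whose addition yields the largest marginal gain of the set function. This only illustrates the objective, not the paper's policy-gradient algorithm SubPO; the dynamics and coverage sets are invented.

```python
# Toy greedy rollout maximizing marginal gains of a submodular coverage reward (illustration).
import numpy as np

rng = np.random.default_rng(0)
n_states, horizon = 25, 8
# Each state covers a random subset of 40 "points of interest".
coverage_sets = [set(rng.choice(40, size=6, replace=False)) for _ in range(n_states)]

def F(visited: set) -> int:
    """Submodular set function: number of points covered by the visited states."""
    covered = set()
    for s in visited:
        covered |= coverage_sets[s]
    return len(covered)

def successors(state: int) -> list:
    return [(state + d) % n_states for d in (1, 3, 7)]   # toy deterministic dynamics

state, visited, trajectory = 0, {0}, [0]
for _ in range(horizon):
    # Greedy step: pick the action with the largest marginal gain F(S ∪ {s'}) - F(S).
    gains = {s2: F(visited | {s2}) - F(visited) for s2 in successors(state)}
    state = max(gains, key=gains.get)
    visited.add(state)
    trajectory.append(state)

print("trajectory:", trajectory, "total coverage:", F(visited))
```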

Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation

  • paper_url: http://arxiv.org/abs/2307.13371
  • repo_url: None
  • paper_authors: Fengxue Zhang, Jialin Song, James Bowden, Alexander Ladd, Yisong Yue, Thomas A. Desautels, Yuxin Chen
  • for: Bayesian optimization (BO) in high-dimensional and non-stationary scenarios.
  • methods: BALLET, a framework that adaptively filters for a high-confidence region of interest (ROI) using two probabilistic models: a coarse Gaussian process (GP) identifying the ROI as a superlevel set, and a localized GP for optimization within the ROI.
  • results: BALLET provably shrinks the search space and exhibits a tighter regret bound than standard BO without ROI filtering; experiments on synthetic and real-world optimization tasks demonstrate its effectiveness.
    Abstract We study Bayesian optimization (BO) in high-dimensional and non-stationary scenarios. Existing algorithms for such scenarios typically require extensive hyperparameter tuning, which limits their practical effectiveness. We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest (ROI) as a superlevel-set of a nonparametric probabilistic model such as a Gaussian process (GP). Our approach is easy to tune, and is able to focus on local region of the optimization space that can be tackled by existing BO methods. The key idea is to use two probabilistic models: a coarse GP to identify the ROI, and a localized GP for optimization within the ROI. We show theoretically that BALLET can efficiently shrink the search space, and can exhibit a tighter regret bound than standard BO without ROI filtering. We demonstrate empirically the effectiveness of BALLET on both synthetic and real-world optimization tasks.
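
A hedged sketch of the two-model idea: a coarse GP fitted on all observations defines a high-confidence region of interest as a superlevel set of its upper confidence bound, and a localized GP fitted near that region drives the next query inside it. The kernels, the top-20% threshold, the nearest-point refit, and the UCB acquisition are simplifying assumptions, not the paper's exact procedure.

```python
# Sketch (assumptions): coarse GP selects an ROI superlevel set; a local GP optimizes within it.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):                                        # unknown objective (toy 1-D example)
    return np.sin(3 * x) - 0.5 * (x - 0.5) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(12, 1))
y = f(X).ravel()
candidates = np.linspace(0, 3, 300).reshape(-1, 1)

# 1) Coarse GP over all data identifies the region of interest.
coarse = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)
mu_c, sd_c = coarse.predict(candidates, return_std=True)
ucb = mu_c + 2.0 * sd_c
roi = candidates[ucb >= np.quantile(ucb, 0.8)]   # superlevel set of the coarse UCB (top 20%)

# 2) Localized GP refit on the observations nearest the ROI; acquisition restricted to the ROI.
d = np.min(np.abs(X - roi.ravel()), axis=1)      # distance of each observation to the ROI
local_idx = np.argsort(d)[:6]                    # 6 nearest observations (assumption)
local = GaussianProcessRegressor(kernel=RBF(length_scale=0.3)).fit(X[local_idx], y[local_idx])
mu_l, sd_l = local.predict(roi, return_std=True)
next_x = float(roi[np.argmax(mu_l + 2.0 * sd_l)])
print("ROI size:", len(roi), "next query:", next_x)
```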

Computational Guarantees for Doubly Entropic Wasserstein Barycenters via Damped Sinkhorn Iterations

  • paper_url: http://arxiv.org/abs/2307.13370
  • repo_url: None
  • paper_authors: Lénaïc Chizat, Tomas Vaškevičius
  • for: Computation of doubly regularized Wasserstein barycenters
  • methods: Damped Sinkhorn iterations followed by exact maximization/minimization steps
  • results: Convergence guarantees for any choice of regularization parameters, and non-asymptotic convergence guarantees for approximating Wasserstein barycenters between discrete point clouds in the free-support/grid-free setting.
    Abstract We study the computation of doubly regularized Wasserstein barycenters, a recently introduced family of entropic barycenters governed by inner and outer regularization strengths. Previous research has demonstrated that various regularization parameter choices unify several notions of entropy-penalized barycenters while also revealing new ones, including a special case of debiased barycenters. In this paper, we propose and analyze an algorithm for computing doubly regularized Wasserstein barycenters. Our procedure builds on damped Sinkhorn iterations followed by exact maximization/minimization steps and guarantees convergence for any choice of regularization parameters. An inexact variant of our algorithm, implementable using approximate Monte Carlo sampling, offers the first non-asymptotic convergence guarantees for approximating Wasserstein barycenters between discrete point clouds in the free-support/grid-free setting.
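
For intuition, the sketch below computes an entropic Wasserstein barycenter of histograms on a shared grid with Sinkhorn-style iterative Bregman projections, damping the barycenter update by an exponent. The damping scheme and all parameters are illustrative assumptions and do not reproduce the doubly regularized algorithm analyzed in the paper.

```python
# Sketch (assumptions): entropic Wasserstein barycenter via damped Sinkhorn-style iterations.
import numpy as np

def sinkhorn_barycenter(hists, cost, eps=0.02, weights=None, iters=200, damping=0.8):
    """Iterative Bregman projections for the entropic barycenter of histograms on a
    shared grid, with a damping exponent on the barycenter update (illustrative)."""
    n, m = cost.shape[0], len(hists)
    weights = np.full(m, 1.0 / m) if weights is None else weights
    K = np.exp(-cost / eps)
    v = np.ones((m, n))
    b = np.full(n, 1.0 / n)
    for _ in range(iters):
        u = np.stack([hists[k] / (K @ v[k]) for k in range(m)])
        log_b_new = sum(weights[k] * np.log(K.T @ u[k]) for k in range(m))
        b = b ** (1.0 - damping) * np.exp(log_b_new) ** damping   # damped update
        v = np.stack([b / (K.T @ u[k]) for k in range(m)])
    return b / b.sum()

# Two 1-D Gaussians on a grid; their barycenter concentrates in between.
x = np.linspace(0, 1, 100)
cost = (x[:, None] - x[None, :]) ** 2
g = lambda mu: np.exp(-((x - mu) ** 2) / 0.005) + 1e-12
hists = [g(0.25) / g(0.25).sum(), g(0.75) / g(0.75).sum()]
bary = sinkhorn_barycenter(hists, cost)
print("barycenter mass peaks near:", x[np.argmax(bary)])
```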

Prot2Text: Multimodal Protein’s Function Generation with GNNs and Transformers

  • paper_url: http://arxiv.org/abs/2307.14367
  • repo_url: None
  • paper_authors: Hadi Abdine, Michail Chatzianastasis, Costas Bouyioukos, Michalis Vazirgiannis
  • for: Predicting protein function as free-text descriptions, moving beyond conventional binary or categorical classification.
  • methods: A multimodal encoder-decoder approach combining Graph Neural Networks (GNNs) and Large Language Models (LLMs) to integrate protein sequences, structures, and textual annotations into a holistic representation of protein function.
  • results: Evaluation on a multimodal protein dataset extracted from SwissProt demonstrates the effectiveness of Prot2Text in generating detailed and accurate functional descriptions.
    Abstract The complex nature of big biological systems pushed some scientists to classify its understanding under the inconceivable missions. Different leveled challenges complicated this task, one of which is the prediction of a protein's function. In recent years, significant progress has been made in this field through the development of various machine learning approaches. However, most existing methods formulate the task as a multi-classification problem, i.e., assigning predefined labels to proteins. In this work, we propose a novel approach, \textbf{Prot2Text}, which predicts a protein's function in a free-text style, moving beyond the conventional binary or categorical classifications. By combining Graph Neural Networks (GNNs) and Large Language Models (LLMs) in an encoder-decoder framework, our model effectively integrates diverse data types including proteins' sequences, structures, and textual annotations. This multimodal approach allows for a holistic representation of proteins' functions, enabling the generation of detailed and accurate descriptions. To evaluate our model, we extracted a multimodal protein dataset from SwissProt, and demonstrate empirically the effectiveness of Prot2Text. These results highlight the transformative impact of multimodal models, specifically the fusion of GNNs and LLMs, empowering researchers with powerful tools for more accurate prediction of proteins' functions. The code, the models and a demo will be publicly released.

High Dimensional Distributed Gradient Descent with Arbitrary Number of Byzantine Attackers

  • paper_url: http://arxiv.org/abs/2307.13352
  • repo_url: None
  • paper_authors: Puning Zhao, Zhiguo Wan
  • for: Robust distributed learning in the presence of Byzantine failures.
  • methods: A direct high-dimensional semi-verified mean estimation method that tolerates an arbitrary number of Byzantine attackers: a subspace is identified first, components of the mean perpendicular to it are estimated from gradient vectors uploaded by worker machines, and components within it are estimated using an auxiliary dataset.
  • results: The method achieves minimax-optimal statistical rates for high-dimensional problems, with a significantly improved dependence on dimensionality compared with previous works.
    Abstract Robust distributed learning with Byzantine failures has attracted extensive research interests in recent years. However, most of existing methods suffer from curse of dimensionality, which is increasingly serious with the growing complexity of modern machine learning models. In this paper, we design a new method that is suitable for high dimensional problems, under arbitrary number of Byzantine attackers. The core of our design is a direct high dimensional semi-verified mean estimation method. Our idea is to identify a subspace first. The components of mean value perpendicular to this subspace can be estimated via gradient vectors uploaded from worker machines, while the components within this subspace are estimated using auxiliary dataset. We then use our new method as the aggregator of distributed learning problems. Our theoretical analysis shows that the new method has minimax optimal statistical rates. In particular, the dependence on dimensionality is significantly improved compared with previous works.

Explainable Disparity Compensation for Efficient Fair Ranking

  • paper_url: http://arxiv.org/abs/2307.14366
  • repo_url: None
  • paper_authors: Abraham Gale, Amélie Marian
  • for: This paper aims to address the issue of disparate outcomes in decision systems, specifically in ranking functions, and proposes data-driven compensatory measures to improve fairness.
  • methods: The proposed measures rely on generating bonus points for members of underrepresented groups to address disparity in the ranking function. Efficient sampling-based algorithms are used to calculate the number of bonus points to minimize disparity.
  • results: The authors validate their algorithms using real-world school admissions and recidivism datasets, and compare their results with those of existing fair ranking algorithms. The results show that their proposed measures can effectively improve fairness in the ranking function.
    Abstract Ranking functions that are used in decision systems often produce disparate results for different populations because of bias in the underlying data. Addressing, and compensating for, these disparate outcomes is a critical problem for fair decision-making. Recent compensatory measures have mostly focused on opaque transformations of the ranking functions to satisfy fairness guarantees or on the use of quotas or set-asides to guarantee a minimum number of positive outcomes to members of underrepresented groups. In this paper we propose easily explainable data-driven compensatory measures for ranking functions. Our measures rely on the generation of bonus points given to members of underrepresented groups to address disparity in the ranking function. The bonus points can be set in advance, and can be combined, allowing for considering the intersections of representations and giving better transparency to stakeholders. We propose efficient sampling-based algorithms to calculate the number of bonus points to minimize disparity. We validate our algorithms using real-world school admissions and recidivism datasets, and compare our results with that of existing fair ranking algorithms.
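
A small, hypothetical sketch of the compensatory mechanism: members of an underrepresented group receive bonus points added to their ranking scores, and the bonus value is chosen by a simple search to minimize a disparity measure (here, the gap between the group's share of the top-k and its population share). The scoring model, disparity metric, and search grid are assumptions, not the paper's calibrated procedure.

```python
# Hypothetical sketch: choose bonus points for an underrepresented group to minimize disparity.
import numpy as np

rng = np.random.default_rng(1)
n, k = 1000, 100                                # applicants and available slots
group = rng.random(n) < 0.3                     # True = member of the underrepresented group
scores = rng.normal(0, 1, n) - 0.4 * group      # biased underlying ranking scores (assumption)

def top_k_disparity(scores, group, bonus, k):
    adjusted = scores + bonus * group
    selected = np.argsort(-adjusted)[:k]
    rate_g = np.isin(selected, np.flatnonzero(group)).mean()
    return abs(rate_g - group.mean())           # gap between selection share and population share

# Sampling/grid search over candidate bonuses (a fixed grid here for simplicity).
candidates = np.linspace(0.0, 1.0, 51)
disparities = [top_k_disparity(scores, group, b, k) for b in candidates]
best_bonus = candidates[int(np.argmin(disparities))]
print(f"bonus={best_bonus:.2f}, disparity={min(disparities):.3f}, "
      f"baseline disparity={disparities[0]:.3f}")
```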

Feature Importance Measurement based on Decision Tree Sampling

  • paper_url: http://arxiv.org/abs/2307.13333
  • repo_url: https://github.com/tsudalab/dt-sampler
  • paper_authors: Chao Huang, Diptesh Das, Koji Tsuda
  • for: Used to improve the interpretability and stability of feature importance in tree-based models.
  • methods: Uses SAT theory to test feature importance, with fewer parameters and higher interpretability, applicable to real-world problems.
  • results: In practical problems, DT-Sampler can provide higher interpretability and stability, and has fewer parameters than Random Forest.
    Abstract Random forest is effective for prediction tasks but the randomness of tree generation hinders interpretability in feature importance analysis. To address this, we proposed DT-Sampler, a SAT-based method for measuring feature importance in tree-based model. Our method has fewer parameters than random forest and provides higher interpretability and stability for the analysis in real-world problems. An implementation of DT-Sampler is available at https://github.com/tsudalab/DT-sampler.

The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation

  • paper_url: http://arxiv.org/abs/2307.13332
  • repo_url: None
  • paper_authors: Philip Amortila, Nan Jiang, Csaba Szepesvári
  • for: Studying linear off-policy value function estimation, in particular the optimal form of the function-approximation factor across different settings.
  • methods: Analyzes the approximation factor in a broad spectrum of settings, including the weighted $L_2$ norm (with the offline state distribution as the weighting), the $L_\infty$ norm, the presence versus absence of state aliasing, and full versus partial coverage of the state space.
  • results: Establishes the optimal asymptotic approximation factors (up to constants) for all of these settings; in particular, two instance-dependent factors for the $L_2(\mu)$ norm and one for the $L_\infty$ norm are shown to dictate the hardness of off-policy evaluation under misspecification.
    Abstract Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation. Yet, the nature of such \emph{approximation factors} -- especially their optimal form in a given learning problem -- is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain. We study the approximation factor in a broad spectrum of settings, such as with the weighted $L_2$-norm (where the weighting is the offline state distribution), the $L_\infty$ norm, the presence vs. absence of state aliasing, and full vs. partial coverage of the state space. We establish the optimal asymptotic approximation factors (up to constants) for all of these settings. In particular, our bounds identify two instance-dependent factors for the $L_2(\mu)$ norm and only one for the $L_\infty$ norm, which are shown to dictate the hardness of off-policy evaluation under misspecification.

Unleash the Power of Context: Enhancing Large-Scale Recommender Systems with Context-Based Prediction Models

  • paper_url: http://arxiv.org/abs/2308.01231
  • repo_url: None
  • paper_authors: Jan Hartman, Assaf Klein, Davorin Kopič, Natalia Silberstein
  • for: Enhancing the performance of large-scale commercial recommender systems, with broad implications for personalized recommendation.
  • methods: Context-based prediction models that estimate the probability of a user action from user and contextual features alone, without item-specific features; an auxiliary context-based model estimates click probability and its prediction is incorporated as a feature in CTR prediction models.
  • results: Experiments show significant improvements in offline and online business metrics with minimal impact on serving cost.
    Abstract In this work, we introduce the notion of Context-Based Prediction Models. A Context-Based Prediction Model determines the probability of a user's action (such as a click or a conversion) solely by relying on user and contextual features, without considering any specific features of the item itself. We have identified numerous valuable applications for this modeling approach, including training an auxiliary context-based model to estimate click probability and incorporating its prediction as a feature in CTR prediction models. Our experiments indicate that this enhancement brings significant improvements in offline and online business metrics while having minimal impact on the cost of serving. Overall, our work offers a simple and scalable, yet powerful approach for enhancing the performance of large-scale commercial recommender systems, with broad implications for the field of personalized recommendations.
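
A minimal sketch of the recipe described above: an auxiliary model trained only on user/context features produces a click-probability estimate, which is then appended as a feature to the main CTR model. The toy data, feature layout, and logistic-regression models are assumptions for illustration.

```python
# Sketch (assumptions): an auxiliary context-only model whose prediction feeds the CTR model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
context = rng.normal(size=(n, 4))            # user + contextual features (hour, device, ...)
item = rng.normal(size=(n, 6))               # item features
logits = 1.2 * context[:, 0] + 0.8 * item[:, 0] - 1.0
clicks = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# 1) Auxiliary context-based model: predicts click probability from context alone.
ctx_model = LogisticRegression().fit(context, clicks)
ctx_score = ctx_model.predict_proba(context)[:, 1].reshape(-1, 1)

# 2) Main CTR model consumes item features plus the compressed context signal.
with_ctx = LogisticRegression(max_iter=1000).fit(np.hstack([item, ctx_score]), clicks)
item_only = LogisticRegression(max_iter=1000).fit(item, clicks)
print("accuracy with context feature:", with_ctx.score(np.hstack([item, ctx_score]), clicks))
print("accuracy item features only  :", item_only.score(item, clicks))
```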

QuIP: 2-Bit Quantization of Large Language Models With Guarantees

  • paper_url: http://arxiv.org/abs/2307.13304
  • repo_url: https://github.com/jerry-chee/quip
  • paper_authors: Jerry Chee, Yaohui Cai, Volodymyr Kuleshov, Christopher De Sa
  • for: Post-training parameter quantization of large language models (LLMs).
  • methods: Quantization with incoherence processing (QuIP), consisting of two steps: (1) an adaptive rounding procedure that minimizes a quadratic proxy objective; (2) efficient pre- and post-processing that ensures incoherence of the weight and Hessian matrices via multiplication by random orthogonal matrices.
  • results: QuIP improves several existing quantization algorithms and yields the first LLM quantization methods that produce viable results using only two bits per weight; code is available at https://github.com/jerry-chee/QuIP.
    Abstract This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre- and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices. We complement QuIP with the first theoretical analysis for an LLM-scale quantization algorithm, and show that our theory also applies to an existing method, OPTQ. Empirically, we find that our incoherence preprocessing improves several existing quantization algorithms and yields the first LLM quantization methods that produce viable results using only two bits per weight. Our code can be found at https://github.com/jerry-chee/QuIP .
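
To illustrate the incoherence-processing step from the abstract above, the sketch below conjugates a weight matrix with random orthogonal matrices, rounds in the rotated basis to a 2-bit grid, and rotates back. Simple nearest rounding stands in for the paper's adaptive rounding procedure, and dense QR-based orthogonal matrices stand in for whatever structured transforms an efficient implementation would use; the outlier weight is an artificial example.

```python
# Sketch: incoherence processing with random orthogonal matrices + nearest rounding to 2 bits.
import torch

def random_orthogonal(n: int) -> torch.Tensor:
    q, r = torch.linalg.qr(torch.randn(n, n))
    return q * torch.sign(torch.diagonal(r))        # sign fix so Q is Haar-distributed

def quantize_2bit(w: torch.Tensor) -> torch.Tensor:
    """Round to 4 levels per tensor (a stand-in for the adaptive rounding step)."""
    scale = w.abs().max() / 1.5
    return torch.clamp(torch.round(w / scale + 1.5), 0, 3) * scale - 1.5 * scale

torch.manual_seed(0)
W = torch.randn(64, 128)
W[0, 0] = 50.0                                      # an outlier that inflates the plain scale
U, V = random_orthogonal(64), random_orthogonal(128)

# Incoherence processing: rotate, quantize in the rotated basis, rotate back.
W_rot = U @ W @ V.T
W_hat = U.T @ quantize_2bit(W_rot) @ V

err_plain = torch.linalg.norm(quantize_2bit(W) - W) / torch.linalg.norm(W)
err_incoh = torch.linalg.norm(W_hat - W) / torch.linalg.norm(W)
print(f"relative error, plain rounding: {err_plain:.3f}, with incoherence: {err_incoh:.3f}")
```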

Word Sense Disambiguation as a Game of Neurosymbolic Darts

  • paper_url: http://arxiv.org/abs/2307.16663
  • repo_url: None
  • paper_authors: Tiansi Dong, Rafet Sifa
  • for: Improving performance on Word Sense Disambiguation (WSD) and pushing past the glass ceiling of supervised deep-learning approaches.
  • methods: A neurosymbolic sense embedding in which senses are a configuration of nested balls in n-dimensional space: ball centres preserve word embeddings, inclusion relations among balls encode symbolic hypernym relations and enable simple logical deduction, and a Transformer is trained to map a contextualized word embedding to its sense ball embedding, like playing a game of darts.
  • results: Using pre-trained n-ball embeddings covering around 70% of the training data and 75% of the testing data in the benchmark WSD corpus, F1 scores range from 90.1% to 100.0% across all six groups of test datasets, suggesting the approach can break the ceiling of deep-learning methods for WSD.
    Abstract Word Sense Disambiguation (WSD) is one of the hardest tasks in natural language understanding and knowledge engineering. The glass ceiling of 80% F1 score is recently achieved through supervised deep-learning, enriched by a variety of knowledge graphs. Here, we propose a novel neurosymbolic methodology that is able to push the F1 score above 90%. The core of our methodology is a neurosymbolic sense embedding, in terms of a configuration of nested balls in n-dimensional space. The centre point of a ball well-preserves word embedding, which partially fix the locations of balls. Inclusion relations among balls precisely encode symbolic hypernym relations among senses, and enable simple logic deduction among sense embeddings, which cannot be realised before. We trained a Transformer to learn the mapping from a contextualized word embedding to its sense ball embedding, just like playing the game of darts (a game of shooting darts into a dartboard). A series of experiments are conducted by utilizing pre-training n-ball embeddings, which have the coverage of around 70% training data and 75% testing data in the benchmark WSD corpus. The F1 scores in experiments range from 90.1% to 100.0% in all six groups of test data-sets (each group has 4 testing data with different sizes of n-ball embeddings). Our novel neurosymbolic methodology has the potential to break the ceiling of deep-learning approaches for WSD. Limitations and extensions of our current works are listed.

Modify Training Directions in Function Space to Reduce Generalization Error

  • paper_url: http://arxiv.org/abs/2307.13290
  • repo_url: None
  • paper_authors: Yi Yu, Wenlian Lu, Boyu Chen
  • for: 提高神经网络模型的泛化性能
  • methods: 对神经网络函数空间中的修改后自然梯度下降方法进行理论分析，并利用特征分解（eigendecomposition）和统计理论推导所学神经网络函数的泛化误差
  • results: 提出一个基于 eigendecompositions 和统计学理论的泛化误差减少方法,并通过数学示例证明该方法可以改善神经网络模型的泛化性能。此外,这种 theoretically 方法还可以解释许多现有的泛化提高方法的效果。
    Abstract We propose theoretical analyses of a modified natural gradient descent method in the neural network function space based on the eigendecompositions of neural tangent kernel and Fisher information matrix. We firstly present analytical expression for the function learned by this modified natural gradient under the assumptions of Gaussian distribution and infinite width limit. Thus, we explicitly derive the generalization error of the learned neural network function using theoretical methods from eigendecomposition and statistics theory. By decomposing of the total generalization error attributed to different eigenspace of the kernel in function space, we propose a criterion for balancing the errors stemming from training set and the distribution discrepancy between the training set and the true data. Through this approach, we establish that modifying the training direction of the neural network in function space leads to a reduction in the total generalization error. Furthermore, We demonstrate that this theoretical framework is capable to explain many existing results of generalization enhancing methods. These theoretical results are also illustrated by numerical examples on synthetic data.
    摘要 我们提出了基于神经正切核与Fisher信息矩阵特征分解的修改后自然梯度下降方法在神经网络函数空间中的理论分析。在高斯分布和无限宽度极限的假设下，我们首先给出了该方法所学函数的解析表达式，从而可以利用特征分解和统计理论显式推导所学神经网络函数的泛化误差。通过将总泛化误差分解到函数空间中核的不同特征子空间，我们提出了一个权衡来自训练集的误差与训练集和真实数据分布差异所致误差的准则，并证明在函数空间中修改神经网络的训练方向可以降低总泛化误差。此外，该理论框架还能够解释许多现有的泛化提升方法，相关理论结果也通过合成数据上的数值示例得到了验证。
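An illustrative sketch (not the paper's derivation) of decomposing a training residual across kernel eigenspaces. An RBF Gram matrix stands in for the neural tangent kernel, and the toy regression task, kernel bandwidth, and ridge term are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(64, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(64)

K = np.exp(-((X - X.T) ** 2) / 0.1)            # Gram matrix stand-in for the NTK
eigvals, eigvecs = np.linalg.eigh(K)           # eigen-decomposition (ascending order)

alpha = np.linalg.solve(K + 1e-3 * np.eye(64), y)   # ridge-regularised kernel fit
residual = y - K @ alpha

# Energy of the residual in each eigenspace: directions with large residual
# energy (typically small eigenvalues) are where training makes little progress.
per_mode_error = (eigvecs.T @ residual) ** 2
for lam, err in list(zip(eigvals[::-1], per_mode_error[::-1]))[:5]:
    print(f"eigenvalue {lam:8.3f}  residual energy {err:.4f}")
```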

Curvature-based Transformer for Molecular Property Prediction

  • paper_url: http://arxiv.org/abs/2307.13275
  • repo_url: None
  • paper_authors: Yili Chen, Zhengyu Li, Zheng Wan, Hui Yu, Xian Wei
  • for: 提高基于人工智能的药物设计中分子属性预测的能力
  • methods: 引入Ricci曲率的离散化（Discretization of Ricci Curvature），以提高图Transformer神经网络模型对分子图数据结构信息的抽取能力
  • results: 在PCQM4M-LST、MoleculeNet等化学分子数据集上进行实验，与Uni-Mol、Graphormer等模型进行比较，结果表明该方法可以达到当前最佳结果。另外，离散化的Ricci曲率在描述分子图数据局部几何特征的同时，还能反映分子结构与功能的关系。
    Abstract The prediction of molecular properties is one of the most important and challenging tasks in the field of artificial intelligence-based drug design. Among the current mainstream methods, the most commonly used feature representation for training DNN models is based on SMILES and molecular graphs, although these methods are concise and effective, they also limit the ability to capture spatial information. In this work, we propose Curvature-based Transformer to improve the ability of Graph Transformer neural network models to extract structural information on molecular graph data by introducing Discretization of Ricci Curvature. To embed the curvature in the model, we add the curvature information of the graph as positional Encoding to the node features during the attention-score calculation. This method can introduce curvature information from graph data without changing the original network architecture, and it has the potential to be extended to other models. We performed experiments on chemical molecular datasets including PCQM4M-LST, MoleculeNet and compared with models such as Uni-Mol, Graphormer, and the results show that this method can achieve the state-of-the-art results. It is proved that the discretized Ricci curvature also reflects the structural and functional relationship while describing the local geometry of the graph molecular data.
    摘要 “分子性质预测是基于人工智能的药物设计中最重要且最具挑战性的任务之一。在目前的主流方法中，最常用的特征表示是基于SMILES和分子图，这些方法虽然简洁有效，但也限制了捕捉空间信息的能力。在这项工作中，我们提出了基于曲率的Transformer（Curvature-based Transformer），通过引入Ricci曲率的离散化来提高图Transformer神经网络模型对分子图数据结构信息的抽取能力。在计算注意力分数时，我们将图的Ricci曲率信息作为positional encoding添加到节点特征中，从而在不改变原始网络结构的情况下引入图数据的曲率信息，并且该方法具有扩展到其他模型的潜力。我们在PCQM4M-LST、MoleculeNet等化学分子数据集上进行了实验，与Uni-Mol、Graphormer等模型进行比较，结果显示该方法可以达到当前最佳结果。这也证明离散化的Ricci曲率在描述分子图数据局部几何特征的同时，还反映了分子的结构与功能关系。”
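A hedged sketch of attaching a discrete curvature signal to node features so an attention model can consume it as a positional encoding. The simplified Forman curvature 4 - deg(u) - deg(v) for an unweighted edge is used here as an assumption; the paper's exact discretization of Ricci curvature may differ.

```python
import networkx as nx
import numpy as np

def node_curvature_features(G):
    # simplified Forman curvature per edge, averaged onto the incident nodes
    edge_curv = {(u, v): 4 - G.degree(u) - G.degree(v) for u, v in G.edges()}
    feats = np.zeros(G.number_of_nodes())
    for (u, v), c in edge_curv.items():
        feats[u] += c / G.degree(u)
        feats[v] += c / G.degree(v)
    return feats.reshape(-1, 1)              # one extra feature column per node

G = nx.cycle_graph(6)                        # toy stand-in for a molecular graph
x = np.random.default_rng(0).standard_normal((6, 4))        # original node features
x_with_curv = np.concatenate([x, node_curvature_features(G)], axis=1)
print(x_with_curv.shape)                     # (6, 5): curvature appended as encoding
```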

Unbiased Weight Maximization

  • paper_url: http://arxiv.org/abs/2307.13270
  • repo_url: None
  • paper_authors: Stephen Chung
  • for: 本研究旨在提出一种生物学上合理的人工神经网络（ANN）训练方法，即将每个单元视为随机强化学习（RL）代理，从而将整个网络视为一个代理团队。
  • methods: 本方法使用REINFORCE算法，但由于结构化信用分配效率低下，学习过程缓慢，难以扩展到大规模网络。为解决这个问题，提出了Weight Maximization方法，即每个隐藏单元以其输出权重的范数代替全局奖励信号作为最大化目标。
  • results: 研究人员分析了Weight Maximization方法的理论性质,并提出了一种变体Unbiased Weight Maximization。这种新方法可以提供一种不偏学习规则,使学习速度加快,并在网络规模增加时保持良好的性能。具体来说,这是目前所知道的第一种可以快速学习、适用于大规模网络的不偏学习规则。
    Abstract A biologically plausible method for training an Artificial Neural Network (ANN) involves treating each unit as a stochastic Reinforcement Learning (RL) agent, thereby considering the network as a team of agents. Consequently, all units can learn via REINFORCE, a local learning rule modulated by a global reward signal, which aligns more closely with biologically observed forms of synaptic plasticity. Nevertheless, this learning method is often slow and scales poorly with network size due to inefficient structural credit assignment, since a single reward signal is broadcast to all units without considering individual contributions. Weight Maximization, a proposed solution, replaces a unit's reward signal with the norm of its outgoing weight, thereby allowing each hidden unit to maximize the norm of the outgoing weight instead of the global reward signal. In this research report, we analyze the theoretical properties of Weight Maximization and propose a variant, Unbiased Weight Maximization. This new approach provides an unbiased learning rule that increases learning speed and improves asymptotic performance. Notably, to our knowledge, this is the first learning rule for a network of Bernoulli-logistic units that is unbiased and scales well with the number of network's units in terms of learning speed.
    摘要 一种生物学上合理的人工神经网络（ANN）训练方法是将每个单元视为随机强化学习（RL）代理，从而把整个网络看作一个代理团队。这样，所有单元都可以通过REINFORCE这一由全局奖励信号调制的本地学习规则来学习，这与生物学上观察到的突触可塑性形式更为接近。然而，由于单一的奖励信号被广播到所有单元而不考虑各自的贡献，结构化信用分配效率低下，这种学习方法通常较慢，且随网络规模扩展性差。Weight Maximization作为一种解决方案，用单元输出权重的范数替代其奖励信号，使每个隐藏单元转而最大化自身输出权重的范数而非全局奖励。在这份研究报告中，我们分析了Weight Maximization的理论性质，并提出其变体Unbiased Weight Maximization。该新方法给出了一个无偏的学习规则，可提高学习速度并改善渐近性能。值得注意的是，据我们所知，这是首个针对Bernoulli-logistic单元网络的、在学习速度上能随网络单元数量良好扩展的无偏学习规则。
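A hedged, toy-scale sketch of the base Weight Maximization idea (not the unbiased variant, and not the authors' code): each hidden Bernoulli-logistic unit runs a REINFORCE-style update, but its "reward" is the squared norm of its outgoing weight instead of the global reward. The task, network sizes, and learning rates are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8
W1 = 0.1 * rng.standard_normal((n_hidden, n_in))   # input -> hidden
w2 = 0.1 * rng.standard_normal(n_hidden)            # hidden -> single output unit

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(500):
    x = rng.standard_normal(n_in)
    target = float(x.sum() > 0)                      # toy binary task
    p = sigmoid(W1 @ x)
    h = (rng.uniform(size=n_hidden) < p).astype(float)   # stochastic hidden units
    q = sigmoid(w2 @ h)
    o = float(rng.uniform() < q)                          # stochastic output unit
    reward = 1.0 if o == target else 0.0                  # global reward signal

    # Output unit: plain REINFORCE with the global reward.
    w2 += 0.1 * reward * (o - q) * h

    # Hidden units: REINFORCE, but each unit's reward is the squared norm of its
    # outgoing weight (a single weight here) -- the Weight Maximization idea.
    unit_reward = w2 ** 2
    W1 += 0.05 * (unit_reward * (h - p))[:, None] * x[None, :]
```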

Federated K-Means Clustering via Dual Decomposition-based Distributed Optimization

  • paper_url: http://arxiv.org/abs/2307.13267
  • repo_url: None
  • paper_authors: Vassilios Yfantis, Achim Wagner, Martin Ruskowski
  • for: This paper is written for researchers and practitioners interested in distributed optimization for machine learning, particularly in the context of $ K $-means clustering.
  • methods: The paper uses dual decomposition to solve the distributed training of $ K $-means clustering problems. The authors propose a mixed-integer quadratically constrained programming-based formulation of the clustering training problem and evaluate the performance of three optimization algorithms (subgradient method, bundle trust method, and quasi-Newton dual ascent algorithm) on a set of benchmark problems.
  • results: The paper demonstrates the potential of using dual decomposition for distributed training of $ K $-means clustering problems, but notes that the mixed-integer programming-based formulation of the clustering problems suffers from weak integer relaxations. The authors evaluate the performance of three optimization algorithms and show that the proposed approach can potentially enable an efficient solution in the future, both in a central and distributed setting.
    Abstract The use of distributed optimization in machine learning can be motivated either by the resulting preservation of privacy or the increase in computational efficiency. On the one hand, training data might be stored across multiple devices. Training a global model within a network where each node only has access to its confidential data requires the use of distributed algorithms. Even if the data is not confidential, sharing it might be prohibitive due to bandwidth limitations. On the other hand, the ever-increasing amount of available data leads to large-scale machine learning problems. By splitting the training process across multiple nodes its efficiency can be significantly increased. This paper aims to demonstrate how dual decomposition can be applied for distributed training of $ K $-means clustering problems. After an overview of distributed and federated machine learning, the mixed-integer quadratically constrained programming-based formulation of the $ K $-means clustering training problem is presented. The training can be performed in a distributed manner by splitting the data across different nodes and linking these nodes through consensus constraints. Finally, the performance of the subgradient method, the bundle trust method, and the quasi-Newton dual ascent algorithm are evaluated on a set of benchmark problems. While the mixed-integer programming-based formulation of the clustering problems suffers from weak integer relaxations, the presented approach can potentially be used to enable an efficient solution in the future, both in a central and distributed setting.
    摘要 在机器学习中使用分布式优化的动机可以来自隐私保护或计算效率的提升。一方面，训练数据可能存储在多个设备上，当每个节点只能访问自己的机密数据时，训练全局模型需要使用分布式算法；即使数据并非机密，带宽限制也可能使数据共享不可行。另一方面，可用数据量的不断增加导致了大规模机器学习问题，将训练过程拆分到多个节点可以显著提升效率。本文旨在演示如何将对偶分解应用于$ K $-means聚类问题的分布式训练。在概述分布式与联邦机器学习之后，文中给出了基于混合整数二次约束规划的$ K $-means聚类训练问题的形式化。通过将数据拆分到不同节点并以共识约束将这些节点连接起来，即可进行分布式训练。最后，文章在一组基准问题上评估了次梯度法、束信赖域法（bundle trust method）和拟牛顿对偶上升算法的性能。尽管基于混合整数规划的聚类问题形式存在较弱的整数松弛，所提出的方法仍有望在未来的集中式与分布式设置中实现高效求解。
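A hedged sketch of the dual-decomposition idea on a continuous relaxation of K-means: each node fits local centroid copies to its own data while dual variables push all copies toward consensus. The two-node setup, step sizes, and the ADMM-flavored Lloyd-style local step are illustrative assumptions and not the paper's mixed-integer formulation or its bundle/quasi-Newton solvers.

```python
import numpy as np

rng = np.random.default_rng(0)
data = [rng.normal(loc=m, size=(50, 2)) for m in (-2.0, 2.0)]   # one block per node
K, rho, steps = 2, 0.1, 100
Z = [rng.standard_normal((K, 2)) for _ in data]    # local centroid copies
lam = [np.zeros((K, 2)) for _ in data]             # dual variables, one set per node

for _ in range(steps):
    z_bar = np.mean(Z, axis=0)                     # consensus centroids
    for i, X in enumerate(data):
        # local step: Lloyd-style centroid update, pulled toward consensus by the duals
        d = ((X[:, None, :] - Z[i][None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        for k in range(K):
            pts = X[assign == k]
            mean_k = pts.mean(axis=0) if len(pts) else Z[i][k]
            Z[i][k] = (mean_k + rho * z_bar[k] - lam[i][k]) / (1 + rho)
        lam[i] += rho * (Z[i] - z_bar)             # dual (subgradient) ascent

print(np.round(np.mean(Z, axis=0), 2))             # approximate cluster centres
```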

Federated Split Learning with Only Positive Labels for resource-constrained IoT environment

  • paper_url: http://arxiv.org/abs/2307.13266
  • repo_url: None
  • paper_authors: Praveen Joshi, Chandra Thapa, Mohammed Hasanuzzaman, Ted Scully, Haithem Afli
  • for: 提高 IoT 设备数据隐私和提高模型训练效率
  • methods: 在分布式协同机器学习（DCML）框架下采用联邦分割学习（SFL），提出SFPL：在客户端将粉碎数据（smashed data）随机混洗后再送往服务器训练，并在推理阶段对客户端模型部分使用本地批归一化
  • results: SFPL 比 SFL 提高了模型训练效率和预测精度,具体达到了以下因素:(i)CIFAR-100 数据集上,SFPL 比 SFL 提高了 ResNet-56 和 ResNet-32 模型的训练效率,分别提高了51.54和32.57倍;(ii)CIFAR-10 数据集上,SFPL 比 SFL 提高了 ResNet-32 和 ResNet-8 模型的训练效率,分别提高了9.23和8.52倍。
    Abstract Distributed collaborative machine learning (DCML) is a promising method in the Internet of Things (IoT) domain for training deep learning models, as data is distributed across multiple devices. A key advantage of this approach is that it improves data privacy by removing the necessity for the centralized aggregation of raw data but also empowers IoT devices with low computational power. Among various techniques in a DCML framework, federated split learning, known as splitfed learning (SFL), is the most suitable for efficient training and testing when devices have limited computational capabilities. Nevertheless, when resource-constrained IoT devices have only positive labeled data, multiclass classification deep learning models in SFL fail to converge or provide suboptimal results. To overcome these challenges, we propose splitfed learning with positive labels (SFPL). SFPL applies a random shuffling function to the smashed data received from clients before supplying it to the server for model training. Additionally, SFPL incorporates the local batch normalization for the client-side model portion during the inference phase. Our results demonstrate that SFPL outperforms SFL: (i) by factors of 51.54 and 32.57 for ResNet-56 and ResNet-32, respectively, with the CIFAR-100 dataset, and (ii) by factors of 9.23 and 8.52 for ResNet-32 and ResNet-8, respectively, with CIFAR-10 dataset. Overall, this investigation underscores the efficacy of the proposed SFPL framework in DCML.
    摘要 分布式协同机器学习（DCML）是物联网（IoT）领域中一种有前景的深度学习模型训练方法，因为数据分布在多个设备上。DCML的一个关键优点是无需集中汇聚原始数据即可改善数据隐私，同时也使计算能力较低的IoT设备得以参与。在DCML框架的各种技术中，联邦分割学习（splitfed learning，SFL）最适合在设备计算能力受限时进行高效的训练与测试。然而，当资源受限的IoT设备只拥有正标签数据时，SFL中的多类分类深度学习模型会无法收敛或只能给出次优结果。为了解决这些挑战，我们提出仅使用正标签的联邦分割学习（SFPL）。SFPL在将客户端产生的粉碎数据提供给服务器训练之前，先对其应用随机混洗函数；此外，SFPL在推理阶段对客户端侧的模型部分引入本地批归一化。我们的结果表明：(i) 在CIFAR-100数据集上，SFPL相比SFL对ResNet-56和ResNet-32分别提升了51.54倍和32.57倍；(ii) 在CIFAR-10数据集上，对ResNet-32和ResNet-8分别提升了9.23倍和8.52倍。总体而言，这项研究证明了所提出的SFPL框架在DCML中的有效性。
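A hedged PyTorch sketch of two SFPL ingredients: jointly shuffling the smashed activations and labels before they leave the client, and keeping a BatchNorm layer in the client-side model portion. Layer sizes and the toy all-positive label batch are assumptions; the paper's inference-time handling of local batch-norm statistics is not shown.

```python
import torch
import torch.nn as nn

client_net = nn.Sequential(nn.Linear(32, 64), nn.BatchNorm1d(64), nn.ReLU())
server_net = nn.Sequential(nn.Linear(64, 10))

def client_forward(x, y):
    smashed = client_net(x)                 # cut-layer ("smashed") activations
    perm = torch.randperm(x.size(0))        # random shuffling function
    return smashed[perm], y[perm]           # what actually gets sent to the server

x = torch.randn(16, 32)
y = torch.zeros(16, dtype=torch.long)       # this client holds only positive labels
smashed, labels = client_forward(x, y)
loss = nn.functional.cross_entropy(server_net(smashed), labels)
loss.backward()                             # gradients still flow back to the client part
print(float(loss))
```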

Structural Credit Assignment with Coordinated Exploration

  • paper_url: http://arxiv.org/abs/2307.13256
  • repo_url: None
  • paper_authors: Stephen Chung
  • for: 本论文旨在提出一种生物学上合理的人工神经网络（ANN）训练方法，即将每个单元视为随机强化学习（RL）代理，从而将整个网络视为一个代理团队。
  • methods: 该方法使用由全局奖励信号调制的REINFORCE本地学习规则，这与生物学上观察到的突触可塑性更为一致。然而这种学习方法收敛缓慢，且随网络规模扩展性差，原因在于：(i) 所有单元独立地探索网络；(ii) 所有单元都使用同一个奖励来评估其动作。因此，改进结构化信用分配的方法大致可分为两类。
  • results: 我们提出使用玻尔兹曼机或循环网络进行协调探索，并发现训练玻尔兹曼机通常所需的负相位可以被去除，所得学习规则与奖励调制的赫布学习规则相似。实验结果表明，对多个基于REINFORCE的随机离散单元而言，协调探索的训练速度明显超过独立探索，甚至超过了直通估计器（STE）反向传播。
    Abstract A biologically plausible method for training an Artificial Neural Network (ANN) involves treating each unit as a stochastic Reinforcement Learning (RL) agent, thereby considering the network as a team of agents. Consequently, all units can learn via REINFORCE, a local learning rule modulated by a global reward signal, which aligns more closely with biologically observed forms of synaptic plasticity. However, this learning method tends to be slow and does not scale well with the size of the network. This inefficiency arises from two factors impeding effective structural credit assignment: (i) all units independently explore the network, and (ii) a single reward is used to evaluate the actions of all units. Accordingly, methods aimed at improving structural credit assignment can generally be classified into two categories. The first category includes algorithms that enable coordinated exploration among units, such as MAP propagation. The second category encompasses algorithms that compute a more specific reward signal for each unit within the network, like Weight Maximization and its variants. In this research report, our focus is on the first category. We propose the use of Boltzmann machines or a recurrent network for coordinated exploration. We show that the negative phase, which is typically necessary to train Boltzmann machines, can be removed. The resulting learning rules are similar to the reward-modulated Hebbian learning rule. Experimental results demonstrate that coordinated exploration significantly exceeds independent exploration in training speed for multiple stochastic and discrete units based on REINFORCE, even surpassing straight-through estimator (STE) backpropagation.
    摘要 一种生物学上合理的人工神经网络（ANN）训练方法是将每个单元视为随机强化学习（RL）代理，从而将整个网络看作一个代理团队。这样，每个单元都可以通过REINFORCE这一由全局奖励信号调制的本地学习规则来学习，这更符合生物学上观察到的突触可塑性。然而，这种学习方法往往较慢，且随网络规模扩展性差。其低效源于阻碍有效结构化信用分配的两个因素：(i) 所有单元独立地探索网络；(ii) 网络中所有单元的动作都由同一个奖励来评估。相应地，改进结构化信用分配的方法大致可分为两类：第一类是使单元之间能够协调探索的算法，如MAP传播；第二类是为网络中每个单元计算更有针对性的奖励信号的算法，如Weight Maximization及其变体。本研究报告聚焦于第一类方法。我们提议使用玻尔兹曼机或循环网络进行协调探索，并证明训练玻尔兹曼机通常所需的负相位可以被去除，所得学习规则类似于奖励调制的赫布学习规则。实验结果表明，对多个基于REINFORCE的随机离散单元而言，协调探索的训练速度显著超过独立探索，甚至超过了直通估计器（STE）反向传播。

RoSAS: Deep Semi-Supervised Anomaly Detection with Contamination-Resilient Continuous Supervision

  • paper_url: http://arxiv.org/abs/2307.13239
  • repo_url: https://github.com/xuhongzuo/rosas
  • paper_authors: Hongzuo Xu, Yijie Wang, Guansong Pang, Songlei Jian, Ning Liu, Yongjun Wang
  • for: 这篇论文的目的是提出一种新的半指导型异常检测方法,以提高异常检测的性能。
  • methods: 这篇论文提出了一种质量插值（mass interpolation）方法，将已标注的异常样本与未标注数据进行插值，从而生成带有连续异常程度标签的新样本；同时利用一个基于特征学习的目标函数对网络进行正则化，以提高其对异常污染的鲁棒性。
  • results: 这篇论文的实验结果显示,该方法可以在11个真实世界数据集上取得20%-30%的提升,并且在不同的异常污染水平和不同数量的标签异常下展现出更加稳定和更好的性能。
    Abstract Semi-supervised anomaly detection methods leverage a few anomaly examples to yield drastically improved performance compared to unsupervised models. However, they still suffer from two limitations: 1) unlabeled anomalies (i.e., anomaly contamination) may mislead the learning process when all the unlabeled data are employed as inliers for model training; 2) only discrete supervision information (such as binary or ordinal data labels) is exploited, which leads to suboptimal learning of anomaly scores that essentially take on a continuous distribution. Therefore, this paper proposes a novel semi-supervised anomaly detection method, which devises \textit{contamination-resilient continuous supervisory signals}. Specifically, we propose a mass interpolation method to diffuse the abnormality of labeled anomalies, thereby creating new data samples labeled with continuous abnormal degrees. Meanwhile, the contaminated area can be covered by new data samples generated via combinations of data with correct labels. A feature learning-based objective is added to serve as an optimization constraint to regularize the network and further enhance the robustness w.r.t. anomaly contamination. Extensive experiments on 11 real-world datasets show that our approach significantly outperforms state-of-the-art competitors by 20%-30% in AUC-PR and obtains more robust and superior performance in settings with different anomaly contamination levels and varying numbers of labeled anomalies. The source code is available at https://github.com/xuhongzuo/rosas/.
    摘要 半监督异常检测方法可以借助少量异常示例，取得远超无监督模型的性能。然而，它们仍然存在两个局限：1) 当所有未标注数据都被当作内点（inliers）用于模型训练时，其中混入的未标注异常（即异常污染）可能会误导学习过程；2) 只利用了离散的监督信息（如二元或有序标签），导致本质上服从连续分布的异常分数学习不充分。因此，本文提出了一种新的半监督异常检测方法，设计了“耐异常污染的连续监督信号”。具体而言，我们提出一种质量插值方法来扩散已标注异常的异常性，从而生成带有连续异常程度标签的新数据样本；同时，通过组合具有正确标签的数据生成新样本来覆盖被污染的区域。我们还加入了一个基于特征学习的目标函数作为优化约束，以正则化网络并进一步增强其对异常污染的鲁棒性。在11个真实世界数据集上的大量实验表明，我们的方法在AUC-PR上显著超越最新的竞争方法20%-30%，并在不同异常污染水平与不同标注异常数量的设置下表现更加稳健和优越。源代码可在 https://github.com/xuhongzuo/rosas/ 获取。
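A hedged sketch of turning a few discrete anomaly labels into continuous supervision by interpolating labeled anomalies with unlabeled samples, with the interpolation coefficient serving as the "abnormal degree" target. Data shapes and the uniform sampling of coefficients are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
x_unlabeled = rng.standard_normal((100, 8))     # treated as (mostly) inliers
x_anomaly = rng.standard_normal((5, 8)) + 4.0   # a handful of labeled anomalies

def interpolate_batch(n=32):
    a = rng.uniform(0.0, 1.0, size=n)            # continuous abnormal degrees
    xa = x_anomaly[rng.integers(len(x_anomaly), size=n)]
    xu = x_unlabeled[rng.integers(len(x_unlabeled), size=n)]
    x_new = a[:, None] * xa + (1 - a)[:, None] * xu
    return x_new, a                              # regression targets in [0, 1]

x_batch, degree = interpolate_batch()
print(x_batch.shape, degree[:4].round(2))
```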

Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation

  • paper_url: http://arxiv.org/abs/2307.13236
  • repo_url: None
  • paper_authors: Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya Zhang
  • for: 音视频分割（AVS）任务，即利用音频线索对视频帧中的发声物体进行分割
  • methods: 提出了一种新的AUDIO-aware query-enhanced TRANSFORMER(AuTR)方法,通过多模态变换架构和声音注意力机制来深度融合和聚合声音-视频特征
  • results: 在多声源和开放集场景下取得了超越以往方法的性能和更好的泛化能力
    Abstract The goal of the audio-visual segmentation (AVS) task is to segment the sounding objects in the video frames using audio cues. However, current fusion-based methods have the performance limitations due to the small receptive field of convolution and inadequate fusion of audio-visual features. To overcome these issues, we propose a novel \textbf{Au}dio-aware query-enhanced \textbf{TR}ansformer (AuTR) to tackle the task. Unlike existing methods, our approach introduces a multimodal transformer architecture that enables deep fusion and aggregation of audio-visual features. Furthermore, we devise an audio-aware query-enhanced transformer decoder that explicitly helps the model focus on the segmentation of the pinpointed sounding objects based on audio signals, while disregarding silent yet salient objects. Experimental results show that our method outperforms previous methods and demonstrates better generalization ability in multi-sound and open-set scenarios.
    摘要 音视频分割（AVS）任务的目标是利用音频线索对视频帧中的发声物体进行分割。然而，现有基于融合的方法由于卷积的感受野较小以及音视频特征融合不足而存在性能瓶颈。为解决这些问题，我们提出了一种新的音频感知、查询增强的Transformer（AuTR）。与现有方法不同，我们的方法引入多模态Transformer架构，实现音视频特征的深度融合与聚合。此外，我们设计了音频感知的查询增强Transformer解码器，使模型能够基于音频信号显式地聚焦于被定位的发声物体的分割，而忽略无声但显著的物体。实验结果表明，我们的方法优于以往方法，并在多声源和开放集场景中展现出更好的泛化能力。

Spectral-DP: Differentially Private Deep Learning through Spectral Perturbation and Filtering

  • paper_url: http://arxiv.org/abs/2307.13231
  • repo_url: None
  • paper_authors: Ce Feng, Nuo Xu, Wujie Wen, Parv Venkitasubramaniam, Caiwen Ding
  • For: This paper proposes a new approach to differentially private deep learning called Spectral-DP, which improves upon existing methods by achieving a desired privacy guarantee with a lower noise scale and thus better utility.
  • Methods: The paper uses a combination of gradient perturbation in the spectral domain and spectral filtering to achieve differential privacy, and develops methods for both convolutional and fully connected layers.
  • Results: The paper shows through comprehensive experiments that Spectral-DP has uniformly better utility performance compared to state-of-the-art DP-SGD based approaches, both in training from scratch and transfer learning settings.
  • for: 这篇论文提出了一种新的差分隐私深度学习方法，叫做Spectral-DP，它可以在实现所需隐私保证的同时，以更低的噪声规模获得更好的实用性。
  • methods: 该论文通过在谱域中进行梯度扰动并结合谱滤波来实现差分隐私，并针对卷积层和全连接层分别开发了相应方法。
  • results: 经过广泛的实验，论文显示无论是从头训练还是迁移学习设置，Spectral-DP的实用性均一致优于当前最先进的基于DP-SGD的方法。
    Abstract Differential privacy is a widely accepted measure of privacy in the context of deep learning algorithms, and achieving it relies on a noisy training approach known as differentially private stochastic gradient descent (DP-SGD). DP-SGD requires direct noise addition to every gradient in a dense neural network, the privacy is achieved at a significant utility cost. In this work, we present Spectral-DP, a new differentially private learning approach which combines gradient perturbation in the spectral domain with spectral filtering to achieve a desired privacy guarantee with a lower noise scale and thus better utility. We develop differentially private deep learning methods based on Spectral-DP for architectures that contain both convolution and fully connected layers. In particular, for fully connected layers, we combine a block-circulant based spatial restructuring with Spectral-DP to achieve better utility. Through comprehensive experiments, we study and provide guidelines to implement Spectral-DP deep learning on benchmark datasets. In comparison with state-of-the-art DP-SGD based approaches, Spectral-DP is shown to have uniformly better utility performance in both training from scratch and transfer learning settings.
    摘要 差分隐私（differential privacy）是深度学习算法中广泛接受的隐私度量，其实现依赖于一种带噪声的训练方法，即差分隐私随机梯度下降（DP-SGD）。DP-SGD需要对稠密神经网络中的每个梯度直接添加噪声，隐私的获得以显著的效用代价为代价。在这项工作中，我们提出Spectral-DP，一种新的差分隐私学习方法，它将谱域中的梯度扰动与谱滤波相结合，以更低的噪声规模实现所需的隐私保证，从而获得更好的效用。我们基于Spectral-DP为同时包含卷积层和全连接层的架构开发了差分隐私深度学习方法；特别地，对于全连接层，我们将基于块循环矩阵的空间重构与Spectral-DP相结合以获得更好的效用。通过全面的实验，我们研究并给出了在基准数据集上实现Spectral-DP深度学习的指导。与最先进的基于DP-SGD的方法相比，无论是从头训练还是迁移学习设置，Spectral-DP均表现出一致更好的效用性能。
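A hedged sketch of the spectral-perturbation-and-filtering idea on a single gradient vector: transform to the spectral (FFT) domain, add Gaussian noise, zero out a fraction of the components, and transform back. The noise scale and filter ratio below are illustrative and not calibrated to a formal (epsilon, delta) guarantee.

```python
import numpy as np

def spectral_perturb(grad, noise_std=0.5, keep_ratio=0.5, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    spec = np.fft.rfft(grad)                     # gradient in the spectral domain
    spec += noise_std * (rng.standard_normal(spec.shape)
                         + 1j * rng.standard_normal(spec.shape))
    cutoff = int(len(spec) * keep_ratio)         # spectral filtering: drop noisy tail
    spec[cutoff:] = 0.0
    return np.fft.irfft(spec, n=len(grad))       # back to the parameter domain

g = np.linspace(-1, 1, 64)                       # stand-in gradient
g_priv = spectral_perturb(g)
print(np.linalg.norm(g - g_priv))
```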

A Primer on the Data Cleaning Pipeline

  • paper_url: http://arxiv.org/abs/2307.13219
  • repo_url: None
  • paper_authors: Rebecca C. Steorts
  • for: 这篇论文主要是为了介绍数据整理管道(data cleaning pipeline)的科学,以便在下游任务、预测分析或统计分析中使用“净化数据”。
  • methods: 论文介绍了数据整理管道的四个阶段,包括数据预处理、数据整理、数据检查和数据净化。
  • results: 论文介绍了一些常用的数据整理方法和技术,以及在实际应用中的效果。
    Abstract The availability of both structured and unstructured databases, such as electronic health data, social media data, patent data, and surveys that are often updated in real time, among others, has grown rapidly over the past decade. With this expansion, the statistical and methodological questions around data integration, or rather merging multiple data sources, has also grown. Specifically, the science of the ``data cleaning pipeline'' contains four stages that allow an analyst to perform downstream tasks, predictive analyses, or statistical analyses on ``cleaned data.'' This article provides a review of this emerging field, introducing technical terminology and commonly used methods.
    摘要 “过去十年间，结构化与非结构化数据库（如电子健康数据、社交媒体数据、专利数据以及经常实时更新的调查数据等）的可用性快速增长。随着这种扩张，围绕数据整合（即合并多个数据源）的统计与方法论问题也随之增多。特别地，“数据清洗管道”（data cleaning pipeline）这一科学包含四个阶段，使分析人员能够在“清洗后的数据”上执行下游任务、预测分析或统计分析。本文对这一新兴领域进行综述，介绍相关技术术语和常用方法。”

FedMEKT: Distillation-based Embedding Knowledge Transfer for Multimodal Federated Learning

  • paper_url: http://arxiv.org/abs/2307.13214
  • repo_url: None
  • paper_authors: Huy Q. Le, Minh N. H. Nguyen, Chu Myaet Thwal, Yu Qiao, Chaoning Zhang, Choong Seon Hong
  • for: 这项研究旨在提出一个基于联邦学习的多模态机器学习框架，使多个客户端能够在不分享私有数据的情况下协同训练一个通用的全局模型。
  • methods: 该框架采用半监督学习方法来利用不同模态的表示，并包含一个基于蒸馏的多模态嵌入知识迁移机制（FedMEKT），使服务器与客户端能够基于一个小型多模态代理数据集交换其学习模型中提取的联合知识。
  • results: 在三个多模态人类活动识别数据集上的大量实验表明，FedMEKT在线性评估中能取得更好的全局编码器性能，在保护用户个人数据与模型参数隐私的同时，所需的通信成本也低于其他基线方法。
    Abstract Federated learning (FL) enables a decentralized machine learning paradigm for multiple clients to collaboratively train a generalized global model without sharing their private data. Most existing works simply propose typical FL systems for single-modal data, thus limiting its potential on exploiting valuable multimodal data for future personalized applications. Furthermore, the majority of FL approaches still rely on the labeled data at the client side, which is limited in real-world applications due to the inability of self-annotation from users. In light of these limitations, we propose a novel multimodal FL framework that employs a semi-supervised learning approach to leverage the representations from different modalities. Bringing this concept into a system, we develop a distillation-based multimodal embedding knowledge transfer mechanism, namely FedMEKT, which allows the server and clients to exchange the joint knowledge of their learning models extracted from a small multimodal proxy dataset. Our FedMEKT iteratively updates the generalized global encoders with the joint embedding knowledge from the participating clients. Thereby, to address the modality discrepancy and labeled data constraint in existing FL systems, our proposed FedMEKT comprises local multimodal autoencoder learning, generalized multimodal autoencoder construction, and generalized classifier learning. Through extensive experiments on three multimodal human activity recognition datasets, we demonstrate that FedMEKT achieves superior global encoder performance on linear evaluation and guarantees user privacy for personal data and model parameters while demanding less communication cost than other baselines.
    摘要 联邦学习（FL）实现了一种去中心化的机器学习范式，使多个客户端能够在不分享私有数据的情况下协同训练一个通用的全局模型。现有工作大多只针对单模态数据提出典型的FL系统，从而限制了其在未来个性化应用中利用宝贵多模态数据的潜力。此外，大多数FL方法仍依赖客户端侧的标注数据，而在实际应用中用户往往无法自行标注，可用标注数据有限。鉴于这些局限，我们提出了一种新的多模态FL框架，采用半监督学习方法来利用不同模态的表示。将这一思想落实到系统中，我们开发了一种基于蒸馏的多模态嵌入知识迁移机制，即FedMEKT，使服务器与客户端能够交换从一个小型多模态代理数据集中提取的学习模型联合知识。FedMEKT以参与客户端的联合嵌入知识迭代更新通用的全局编码器。为了解决现有FL系统中的模态差异与标注数据受限问题，FedMEKT包含本地多模态自编码器学习、通用多模态自编码器构建以及通用分类器学习。在三个多模态人类活动识别数据集上的大量实验表明，FedMEKT在线性评估中取得了更优的全局编码器性能，在保障用户个人数据与模型参数隐私的同时，所需通信成本低于其他基线。

Transferability of Graph Neural Networks using Graphon and Sampling Theories

  • paper_url: http://arxiv.org/abs/2307.13206
  • repo_url: None
  • paper_authors: A. Martina Neuman, Jason J. Bramburger
  • for: 本研究旨在应用graphon来提高graph neural network(GNN)的可转移性。
  • methods: 本研究使用了two-layer graphon neural network(WNN)架构,并证明了其能够高效地近似带限信号。
  • results: 研究表明,使用WNN架构可以在不同图形式的数据上保持高度的表现,而且可以在不同图大小之间进行可转移学习。
    Abstract Graph neural networks (GNNs) have become powerful tools for processing graph-based information in various domains. A desirable property of GNNs is transferability, where a trained network can swap in information from a different graph without retraining and retain its accuracy. A recent method of capturing transferability of GNNs is through the use of graphons, which are symmetric, measurable functions representing the limit of large dense graphs. In this work, we contribute to the application of graphons to GNNs by presenting an explicit two-layer graphon neural network (WNN) architecture. We prove its ability to approximate bandlimited signals within a specified error tolerance using a minimal number of network weights. We then leverage this result, to establish the transferability of an explicit two-layer GNN over all sufficiently large graphs in a sequence converging to a graphon. Our work addresses transferability between both deterministic weighted graphs and simple random graphs and overcomes issues related to the curse of dimensionality that arise in other GNN results. The proposed WNN and GNN architectures offer practical solutions for handling graph data of varying sizes while maintaining performance guarantees without extensive retraining.
    摘要 图神经网络（GNN）已成为各领域处理图结构信息的有力工具。GNN的一个理想特性是可迁移性，即训练好的网络可以换用来自另一个图的信息而无需重新训练，并保持其准确性。最近一种刻画GNN可迁移性的方法是使用图极限函数（graphon），即表示大型稠密图极限的对称可测函数。在这项工作中，我们通过给出一个显式的两层graphon神经网络（WNN）架构，推进graphon在GNN中的应用。我们证明该架构能够在给定误差容限内，用最少的网络权重逼近带限信号。随后我们利用这一结果，证明一个显式的两层GNN在收敛到graphon的图序列中所有足够大的图上具有可迁移性。我们的工作同时处理了确定性加权图与简单随机图之间的可迁移性，并克服了其他GNN结果中出现的维度灾难问题。所提出的WNN与GNN架构为处理不同规模的图数据提供了实用方案，在无需大量重新训练的情况下保持性能保证。

Federated Distributionally Robust Optimization with Non-Convex Objectives: Algorithm and Analysis

  • paper_url: http://arxiv.org/abs/2307.14364
  • repo_url: None
  • paper_authors: Yang Jiao, Kai Yang, Dongjin Song
  • For: The paper aims to solve the federated distributionally robust optimization (FDRO) problem, which is to find an optimal decision that minimizes the worst-case cost over the ambiguity set of probability distributions in a distributed environment.
  • Methods: The proposed algorithm is called the Asynchronous Single-looP alternatIve gRadient projEction (ASPIRE) algorithm with the itErative Active SEt method (EASE). The algorithm leverages the prior distribution using a new uncertainty set called the constrained D-norm uncertainty set.
  • Results: The proposed algorithm is guaranteed to converge and its iteration complexity is analyzed. Extensive empirical studies on real-world datasets demonstrate that the proposed method can achieve fast convergence, remain robust against data heterogeneity and malicious attacks, and trade off robustness with performance.
    Abstract Distributionally Robust Optimization (DRO), which aims to find an optimal decision that minimizes the worst case cost over the ambiguity set of probability distribution, has been widely applied in diverse applications, e.g., network behavior analysis, risk management, etc. However, existing DRO techniques face three key challenges: 1) how to deal with the asynchronous updating in a distributed environment; 2) how to leverage the prior distribution effectively; 3) how to properly adjust the degree of robustness according to different scenarios. To this end, we propose an asynchronous distributed algorithm, named Asynchronous Single-looP alternatIve gRadient projEction (ASPIRE) algorithm with the itErative Active SEt method (EASE) to tackle the federated distributionally robust optimization (FDRO) problem. Furthermore, a new uncertainty set, i.e., constrained D-norm uncertainty set, is developed to effectively leverage the prior distribution and flexibly control the degree of robustness. Finally, our theoretical analysis elucidates that the proposed algorithm is guaranteed to converge and the iteration complexity is also analyzed. Extensive empirical studies on real-world datasets demonstrate that the proposed method can not only achieve fast convergence, and remain robust against data heterogeneity as well as malicious attacks, but also tradeoff robustness with performance.
    摘要 分布鲁棒优化（DRO）旨在找到在概率分布模糊集上最小化最坏情况成本的最优决策，已被广泛应用于网络行为分析、风险管理等多种场景。然而，现有DRO技术面临三大挑战：1) 如何处理分布式环境中的异步更新；2) 如何有效利用先验分布；3) 如何根据不同场景适当调整鲁棒程度。为此，我们提出了一种异步分布式算法，即结合迭代活动集方法（EASE）的异步单循环交替梯度投影（ASPIRE）算法，用于求解联邦分布鲁棒优化（FDRO）问题。此外，我们还构建了一种新的不确定集，即受约束的D-范数不确定集，以有效利用先验分布并灵活控制鲁棒程度。最后，理论分析表明所提算法保证收敛，并给出了迭代复杂度分析。在真实数据集上的大量实验表明，该方法不仅收敛快速，对数据异质性和恶意攻击保持鲁棒，还能够在鲁棒性与性能之间进行权衡。

An Investigation into Glomeruli Detection in Kidney H&E and PAS Images using YOLO

  • paper_url: http://arxiv.org/abs/2307.13199
  • repo_url: https://github.com/AlexeyAB/darknet
  • paper_authors: Kimia Hemmatirad, Morteza Babaie, Jeffrey Hodgin, Liron Pantanowitz, H. R. Tizhoosh
  • for: This paper aims to assist pathologists in detecting glomeruli in human kidney images using computerized solutions, specifically by proposing an automated tissue structure detection and segmentation method using the YOLO-v4 object detector.
  • methods: The YOLO-v4 model was trained on whole slide images and fine-tuned on a private dataset from the University of Michigan for glomeruli detection. Multiple experiments were conducted using different training data and stains.
  • results: The model achieved average specificity and sensitivity for all experiments and outperformed existing segmentation methods on the same datasets. However, the design and validation for different stains still depend on the variability of public multi-stain datasets.
    Abstract Context: Analyzing digital pathology images is necessary to draw diagnostic conclusions by investigating tissue patterns and cellular morphology. However, manual evaluation can be time-consuming, expensive, and prone to inter- and intra-observer variability. Objective: To assist pathologists using computerized solutions, automated tissue structure detection and segmentation must be proposed. Furthermore, generating pixel-level object annotations for histopathology images is expensive and time-consuming. As a result, detection models with bounding box labels may be a feasible solution. Design: This paper studies. YOLO-v4 (You-Only-Look-Once), a real-time object detector for microscopic images. YOLO uses a single neural network to predict several bounding boxes and class probabilities for objects of interest. YOLO can enhance detection performance by training on whole slide images. YOLO-v4 has been used in this paper. for glomeruli detection in human kidney images. Multiple experiments have been designed and conducted based on different training data of two public datasets and a private dataset from the University of Michigan for fine-tuning the model. The model was tested on the private dataset from the University of Michigan, serving as an external validation of two different stains, namely hematoxylin and eosin (H&E) and periodic acid-Schiff (PAS). Results: Average specificity and sensitivity for all experiments, and comparison of existing segmentation methods on the same datasets are discussed. Conclusions: Automated glomeruli detection in human kidney images is possible using modern AI models. The design and validation for different stains still depends on variability of public multi-stain datasets.
    摘要 背景：分析数字病理图像对于通过研究组织模式与细胞形态得出诊断结论是必要的。然而，人工评估耗时、昂贵，且存在观察者间与观察者内的差异。目标：为了以计算机化方案辅助病理学家，需要提出自动化的组织结构检测与分割方法。此外，为组织病理图像生成像素级对象标注既昂贵又耗时，因此使用边界框标注的检测模型可能是一种可行的解决方案。设计：本文研究了YOLO-v4（You-Only-Look-Once），一种面向显微图像的实时目标检测器。YOLO使用单个神经网络为感兴趣目标预测多个边界框和类别概率，并可通过在全切片图像上训练来提升检测性能。本文将YOLO-v4用于人体肾脏图像中的肾小球检测，基于两个公共数据集和密歇根大学提供的一个私有数据集设计并开展了多组实验以微调模型。模型在密歇根大学的私有数据集上进行测试，作为对苏木精-伊红（H&E）与高碘酸-雪夫（PAS）两种染色的外部验证。结果：文中讨论了所有实验的平均特异性与敏感性，并与相同数据集上现有分割方法进行了比较。结论：利用现代AI模型自动检测人体肾脏图像中的肾小球是可行的，但针对不同染色的设计与验证仍取决于公共多染色数据集的多样性。

Knowledge-enhanced Neuro-Symbolic AI for Cybersecurity and Privacy

  • paper_url: http://arxiv.org/abs/2308.02031
  • repo_url: None
  • paper_authors: Aritran Piplai, Anantaa Kotal, Seyedreza Mohseni, Manas Gaur, Sudip Mittal, Anupam Joshi
  • for: This paper is written to explore the potential of Neuro-Symbolic Artificial Intelligence (AI) in addressing the challenges of explainability and safety in AI systems, particularly in the domains of cybersecurity and privacy.
  • methods: The paper uses a combination of neural networks and symbolic knowledge graphs to integrate the strengths of both approaches, enabling AI systems to reason, learn, and generalize in a manner understandable to experts.
  • results: The paper demonstrates the potential of Neuro-Symbolic AI to improve the explainability and safety of AI systems in complex environments, specifically in the domains of cybersecurity and privacy.
    Abstract Neuro-Symbolic Artificial Intelligence (AI) is an emerging and quickly advancing field that combines the subsymbolic strengths of (deep) neural networks and explicit, symbolic knowledge contained in knowledge graphs to enhance explainability and safety in AI systems. This approach addresses a key criticism of current generation systems, namely their inability to generate human-understandable explanations for their outcomes and ensure safe behaviors, especially in scenarios with \textit{unknown unknowns} (e.g. cybersecurity, privacy). The integration of neural networks, which excel at exploring complex data spaces, and symbolic knowledge graphs, which represent domain knowledge, allows AI systems to reason, learn, and generalize in a manner understandable to experts. This article describes how applications in cybersecurity and privacy, two most demanding domains in terms of the need for AI to be explainable while being highly accurate in complex environments, can benefit from Neuro-Symbolic AI.
    摘要 神经符号人工智能（Neuro-Symbolic AI）是一个新兴且快速发展的领域，它将（深度）神经网络的亚符号优势与知识图谱中显式的符号知识相结合，以提高AI系统的可解释性与安全性。这种方法回应了对当前一代系统的一个关键批评，即它们无法为其结果生成人类可理解的解释，也难以在存在“未知的未知”的场景（如网络安全、隐私）中确保行为安全。将擅长探索复杂数据空间的神经网络与表示领域知识的符号知识图谱相结合，可以使AI系统以专家可理解的方式进行推理、学习与泛化。本文阐述了网络安全与隐私这两个对AI既要求可解释又要求在复杂环境中高度准确的领域如何从神经符号AI中受益。

Counterfactual Explanation Policies in RL

  • paper_url: http://arxiv.org/abs/2307.13192
  • repo_url: None
  • paper_authors: Shripad V. Deshmukh, Srivatsan R, Supriti Vijay, Jayakumar Subramanian, Chirag Agarwal
  • for: 这篇论文旨在探讨如何使用反事实解释来分析RL策略，以便更好地理解RL策略的行为。
  • methods: 该论文提出了COUNTERPOL框架，通过在RL中将反事实引入监督学习、并以期望回报约束目标结果，来生成对策略的最小改动解释。
  • results: 在五个具有不同状态与动作空间的RL环境上的大量实验表明，COUNTERPOL能够在保持与原始策略接近的同时，为（未）学会的技能生成有效的解释。
    Abstract As Reinforcement Learning (RL) agents are increasingly employed in diverse decision-making problems using reward preferences, it becomes important to ensure that policies learned by these frameworks in mapping observations to a probability distribution of the possible actions are explainable. However, there is little to no work in the systematic understanding of these complex policies in a contrastive manner, i.e., what minimal changes to the policy would improve/worsen its performance to a desired level. In this work, we present COUNTERPOL, the first framework to analyze RL policies using counterfactual explanations in the form of minimal changes to the policy that lead to the desired outcome. We do so by incorporating counterfactuals in supervised learning in RL with the target outcome regulated using desired return. We establish a theoretical connection between Counterpol and widely used trust region-based policy optimization methods in RL. Extensive empirical analysis shows the efficacy of COUNTERPOL in generating explanations for (un)learning skills while keeping close to the original policy. Our results on five different RL environments with diverse state and action spaces demonstrate the utility of counterfactual explanations, paving the way for new frontiers in designing and developing counterfactual policies.
    摘要 随着强化学习（RL）智能体越来越多地被用于基于奖励偏好的各类决策问题，确保这些框架所学到的、将观测映射到可能动作概率分布的策略具有可解释性变得愈发重要。然而，目前几乎没有工作以对比的方式系统地理解这些复杂策略，即：对策略作出哪些最小改动可以将其性能提升或降低到期望水平。在这项工作中，我们提出COUNTERPOL，这是第一个利用反事实解释（即使策略达到期望结果的最小改动）来分析RL策略的框架。我们通过在RL的监督学习中引入反事实，并用期望回报来调控目标结果来实现这一点。我们建立了COUNTERPOL与RL中广泛使用的基于信赖域的策略优化方法之间的理论联系。大量实验分析表明，COUNTERPOL能够在保持与原始策略接近的同时，为（未）学会的技能生成解释。在五个具有不同状态与动作空间的RL环境上的结果展示了反事实解释的价值，为设计与开发反事实策略开辟了新方向。
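A hedged toy sketch of the counterfactual-policy idea: search for a minimal change to policy parameters that moves the expected return to a desired level, trading a return-matching term off against closeness to the original policy. A softmax bandit stands in for a full RL environment, and the rewards, target return, and penalty weight are made-up values; this is not the paper's algorithm.

```python
import numpy as np

r = np.array([1.0, 0.2, -0.5])        # per-action rewards in a toy one-step setting
theta0 = np.array([0.0, 0.5, 0.5])    # original policy logits
target_return, lam, lr = 0.8, 0.1, 0.5

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

theta = theta0.copy()
for _ in range(300):
    pi = softmax(theta)
    ret = pi @ r
    # d(ret)/d(theta) for a softmax policy is pi * (r - ret)
    grad = 2 * (ret - target_return) * pi * (r - ret) + 2 * lam * (theta - theta0)
    theta -= lr * grad

print("original return:", round(softmax(theta0) @ r, 3))
print("counterfactual return:", round(softmax(theta) @ r, 3))
print("parameter change:", np.round(theta - theta0, 3))
```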

Neural Memory Decoding with EEG Data and Representation Learning

  • paper_url: http://arxiv.org/abs/2307.13181
  • repo_url: None
  • paper_authors: Glenn Bruns, Michael Haidar, Federico Rubino
  • for: 本研究描述了一种对EEG数据进行神经记忆解码的方法，用于从EEG记录中识别被回忆的概念。
  • methods: 该方法使用深度表示学习,并使用supervised contrastive损失来将EEG记录转换到低维度空间中。
  • results: 该方法可以在EEG数据中准确地识别概念,其准确率为About 78.4%( chance 4%)。此外,该方法还应用于信息检索问题,可以生成基于EEG数据的预测文档列表。
    Abstract We describe a method for the neural decoding of memory from EEG data. Using this method, a concept being recalled can be identified from an EEG trace with an average top-1 accuracy of about 78.4% (chance 4%). The method employs deep representation learning with supervised contrastive loss to map an EEG recording of brain activity to a low-dimensional space. Because representation learning is used, concepts can be identified even if they do not appear in the training data set. However, reference EEG data must exist for each such concept. We also show an application of the method to the problem of information retrieval. In neural information retrieval, EEG data is captured while a user recalls the contents of a document, and a list of links to predicted documents is produced.
    摘要 我们描述了一种基于EEG数据的神经记忆解码方法。使用该方法，可以从一段EEG轨迹中识别出被回忆的概念，top-1平均准确率约为78.4%（随机水平为4%）。该方法利用带有监督对比损失的深度表示学习，将一段脑活动的EEG记录映射到低维空间。由于使用了表示学习，即使概念没有出现在训练数据集中也可以被识别，但每个这样的概念都需要存在参考EEG数据。我们还展示了该方法在信息检索问题上的应用：在神经信息检索中，当用户回忆某文档的内容时采集EEG数据，并据此生成预测文档的链接列表。
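A hedged PyTorch sketch of a supervised contrastive loss over embedded EEG windows: embeddings of recordings that share a recalled concept are pulled together, all others pushed apart. The batch shapes, label layout, and temperature are assumptions; the paper's encoder and training details are not reproduced here.

```python
import torch
import torch.nn.functional as F

def supcon_loss(z, labels, temperature=0.1):
    z = F.normalize(z, dim=1)                       # unit-norm embeddings, (B, d)
    sim = z @ z.T / temperature                     # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float("-inf")) # never contrast a sample with itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    # average log-probability of the positives for each anchor
    per_anchor = log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -per_anchor.mean()

z = torch.randn(8, 16, requires_grad=True)          # embeddings from an EEG encoder
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])     # recalled-concept ids
print(float(supcon_loss(z, labels)))
```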

Evaluating the reliability of automatically generated pedestrian and bicycle crash surrogates

  • paper_url: http://arxiv.org/abs/2307.13178
  • repo_url: None
  • paper_authors: Agnimitra Sengupta, S. Ilgin Guler, Vikash V. Gayah, Shannon Warchol
  • For: This study aims to assess the reliability of automatically generated surrogates in predicting confirmed conflicts involving vulnerable road users (VRUs) at signalized intersections.
  • Methods: The study uses a video-based event monitoring system to collect data on VRU and motor vehicle interactions at 15 signalized intersections in Pennsylvania. Advanced data-driven models are used to analyze the surrogate data, including automatically collectable variables such as speeds, movements, and post-encroachment time, as well as manually collected variables like signal states, lighting, and weather conditions.
  • Results: The findings highlight the varying importance of specific surrogates in predicting true conflicts, with some being more informative than others. The results can assist transportation agencies in prioritizing infrastructure investments, such as bike lanes and crosswalks, and evaluating their effectiveness.
    Abstract Vulnerable road users (VRUs), such as pedestrians and bicyclists, are at a higher risk of being involved in crashes with motor vehicles, and crashes involving VRUs also are more likely to result in severe injuries or fatalities. Signalized intersections are a major safety concern for VRUs due to their complex and dynamic nature, highlighting the need to understand how these road users interact with motor vehicles and deploy evidence-based countermeasures to improve safety performance. Crashes involving VRUs are relatively infrequent, making it difficult to understand the underlying contributing factors. An alternative is to identify and use conflicts between VRUs and motorized vehicles as a surrogate for safety performance. Automatically detecting these conflicts using a video-based systems is a crucial step in developing smart infrastructure to enhance VRU safety. The Pennsylvania Department of Transportation conducted a study using video-based event monitoring system to assess VRU and motor vehicle interactions at fifteen signalized intersections across Pennsylvania to improve VRU safety performance. This research builds on that study to assess the reliability of automatically generated surrogates in predicting confirmed conflicts using advanced data-driven models. The surrogate data used for analysis include automatically collectable variables such as vehicular and VRU speeds, movements, post-encroachment time, in addition to manually collected variables like signal states, lighting, and weather conditions. The findings highlight the varying importance of specific surrogates in predicting true conflicts, some being more informative than others. The findings can assist transportation agencies to collect the right types of data to help prioritize infrastructure investments, such as bike lanes and crosswalks, and evaluate their effectiveness.
    摘要 行人和骑自行车者等易受伤害的道路使用者（VRU）更容易与机动车发生碰撞，且涉及VRU的事故更可能造成严重伤亡。信号交叉口因其复杂和动态的特性而成为VRU安全的主要隐患，因此需要了解这些道路使用者与机动车如何交互，并部署基于证据的对策来改善安全表现。涉及VRU的事故相对少见，难以厘清其背后的成因；一种替代方案是识别并利用VRU与机动车之间的冲突作为安全表现的替代指标，而利用基于视频的系统自动检测这些冲突是建设智能基础设施、提升VRU安全的关键一步。宾夕法尼亚州交通部开展了一项研究，利用基于视频的事件监测系统评估了宾州15个信号交叉口处VRU与机动车的交互，以改善VRU的安全表现。本研究在该研究的基础上，利用先进的数据驱动模型评估自动生成的替代指标在预测已确认冲突方面的可靠性。用于分析的替代指标数据既包括可自动采集的变量（如机动车与VRU的速度、运动轨迹、侵入后时间），也包括人工采集的变量（如信号状态、照明和天气条件）。研究结果表明，不同替代指标在预测真实冲突方面的重要性各不相同，有些比其他的更具信息量。这些发现可以帮助交通管理部门采集合适类型的数据，以便为自行车道和人行横道等基础设施投资排定优先级并评估其成效。

Unsupervised reconstruction of accelerated cardiac cine MRI using Neural Fields

  • paper_url: http://arxiv.org/abs/2307.14363
  • repo_url: None
  • paper_authors: Tabita Catalán, Matías Courdurier, Axel Osses, René Botnar, Francisco Sahli Costabal, Claudia Prieto
  • for: 这项研究的目的是提出一种无监督的深度学习方法来加速卡ди亚维度MRI的重建。
  • methods: 该方法基于启发型神经场表示,并在实验中使用金字塔型多晶维度收集器进行受损样本收集。
  • results: 实验结果表明,该方法可以在实验室中实现高质量的卡ди亚维度MRI重建,并且比传统方法更具有空间和时间表示能力。
    Abstract Cardiac cine MRI is the gold standard for cardiac functional assessment, but the inherently slow acquisition process creates the necessity of reconstruction approaches for accelerated undersampled acquisitions. Several regularization approaches that exploit spatial-temporal redundancy have been proposed to reconstruct undersampled cardiac cine MRI. More recently, methods based on supervised deep learning have been also proposed to further accelerate acquisition and reconstruction. However, these techniques rely on usually large dataset for training, which are not always available. In this work, we propose an unsupervised approach based on implicit neural field representations for cardiac cine MRI (so called NF-cMRI). The proposed method was evaluated in in-vivo undersampled golden-angle radial multi-coil acquisitions for undersampling factors of 26x and 52x, achieving good image quality, and comparable spatial and improved temporal depiction than a state-of-the-art reconstruction technique.
    摘要 心脏电影MRI（cardiac cine MRI）是心脏功能评估的金标准，但其采集过程本身缓慢，因此需要针对加速欠采样采集的重建方法。目前已有多种利用时空冗余的正则化方法被提出用于重建欠采样的心脏电影MRI；最近，基于监督深度学习的方法也被提出以进一步加速采集与重建，但这些技术通常依赖大规模训练数据集，而这类数据并不总是可得。在这项工作中，我们提出了一种基于隐式神经场表示的无监督心脏电影MRI重建方法（称为NF-cMRI）。该方法在26倍和52倍欠采样因子的体内黄金角径向多线圈采集数据上进行了评估，获得了良好的图像质量，其空间表现与最先进的重建技术相当，时间表现则有所改善。

Multi-UAV Speed Control with Collision Avoidance and Handover-aware Cell Association: DRL with Action Branching

  • paper_url: http://arxiv.org/abs/2307.13158
  • repo_url: None
  • paper_authors: Zijiang Yan, Wael Jaafar, Bassant Selim, Hina Tabassum
  • for: 提高运输和通信性能，包括避免碰撞、保持连接和减少切换。
  • methods: 使用深度强化学习解决3D空中高速公路上多无人机的小区关联决策与移动速度优化问题，并将其形式化为马尔可夫决策过程（MDP），UAV状态由速度和通信数据速率定义。提出一种具有共享决策模块和多个网络分支的神经网络架构，每个分支专门处理2D运输-通信空间中的一个特定动作维度。
  • results: 仿真结果表明，与现有基准相比，该方法可带来18.32%的提升。
    Abstract This paper presents a deep reinforcement learning solution for optimizing multi-UAV cell-association decisions and their moving velocity on a 3D aerial highway. The objective is to enhance transportation and communication performance, including collision avoidance, connectivity, and handovers. The problem is formulated as a Markov decision process (MDP) with UAVs' states defined by velocities and communication data rates. We propose a neural architecture with a shared decision module and multiple network branches, each dedicated to a specific action dimension in a 2D transportation-communication space. This design efficiently handles the multi-dimensional action space, allowing independence for individual action dimensions. We introduce two models, Branching Dueling Q-Network (BDQ) and Branching Dueling Double Deep Q-Network (Dueling DDQN), to demonstrate the approach. Simulation results show a significant improvement of 18.32% compared to existing benchmarks.
    摘要 这篇论文提出了一种深度强化学习解决方案，用于优化3D空中高速公路上多无人机（UAV）的小区关联决策及其移动速度，目标是提高运输与通信性能，包括避免碰撞、保持连接和处理切换。该问题被形式化为马尔可夫决策过程（MDP），UAV的状态由速度和通信数据速率定义。为了处理多维动作空间，所提方案采用一种具有共享决策模块和多个网络分支的神经网络架构，每个分支专门处理2D运输-通信空间中的一个特定动作维度，使各动作维度可以独立决策。文中给出了分支对决Q网络（BDQ）与分支双重对决深度Q网络（Dueling DDQN）两种模型来展示该方法。仿真结果显示，与现有基准相比，所提方案带来了18.32%的显著提升。
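A hedged PyTorch sketch of an action-branching dueling Q-head: a shared trunk feeds one state-value head and one advantage head per action dimension (for instance, a speed-change branch and a cell-association branch). The observation size, hidden width, and branch sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class BranchingDuelingQ(nn.Module):
    def __init__(self, obs_dim, branch_sizes, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)
        self.branches = nn.ModuleList([nn.Linear(hidden, n) for n in branch_sizes])

    def forward(self, obs):
        h = self.trunk(obs)
        v = self.value(h)                                    # (B, 1)
        qs = []
        for branch in self.branches:
            adv = branch(h)                                  # (B, n_d)
            qs.append(v + adv - adv.mean(dim=1, keepdim=True))  # dueling aggregation
        return qs                                            # one Q-vector per action dimension

net = BranchingDuelingQ(obs_dim=10, branch_sizes=[5, 3])     # e.g. 5 speeds, 3 cells
q_speed, q_cell = net(torch.randn(4, 10))
action = [q.argmax(dim=1) for q in (q_speed, q_cell)]
print(q_speed.shape, q_cell.shape, action)
```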

Discovering interpretable elastoplasticity models via the neural polynomial method enabled symbolic regressions

  • paper_url: http://arxiv.org/abs/2307.13149
  • repo_url: None
  • paper_authors: Bahador Bahmani, Hyoung Suk Suh, WaiChing Sun
  • for: 这篇论文旨在提出一种两步机器学习方法,以帮助解释神经网络模型的含义。
  • methods: 这种方法首先通过监督学习获得一组单变量特征映射，然后使用符号回归将这些映射重新表示为数学形式。
  • results: 这种方法可以解决神经网络模型的解释性问题,同时提供了一些优点,如缩放问题的解决和代码的可重用性。
    Abstract Conventional neural network elastoplasticity models are often perceived as lacking interpretability. This paper introduces a two-step machine-learning approach that returns mathematical models interpretable by human experts. In particular, we introduce a surrogate model where yield surfaces are expressed in terms of a set of single-variable feature mappings obtained from supervised learning. A postprocessing step is then used to re-interpret the set of single-variable neural network mapping functions into mathematical form through symbolic regression. This divide-and-conquer approach provides several important advantages. First, it enables us to overcome the scaling issue of symbolic regression algorithms. From a practical perspective, it enhances the portability of learned models for partial differential equation solvers written in different programming languages. Finally, it enables us to have a concrete understanding of the attributes of the materials, such as convexity and symmetries of models, through automated derivations and reasoning. Numerical examples have been provided, along with an open-source code to enable third-party validation.
    摘要 The divide-and-conquer approach offers several advantages: (1) Scalability: the symbolic regression algorithms are able to handle large datasets. (2) Portability: the learned models can be easily integrated into partial differential equation solvers written in different programming languages. (3) Interpretability: the approach provides a concrete understanding of the attributes of the materials, such as convexity and symmetries of models, through automated derivations and reasoning. Numerical examples are provided, along with open-source code for third-party validation.
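A hedged sketch of the post-processing step only: once a single-variable feature mapping has been learned (a fitted callable stands in for the trained network here), sample it and recover a readable polynomial. np.polyfit is a simple stand-in for a full symbolic-regression pass, and the mapping below is a made-up placeholder.

```python
import numpy as np

def learned_feature_map(x):             # placeholder for a trained 1-D network mapping
    return 0.5 * x ** 2 + 0.1 * x

x = np.linspace(-1.0, 1.0, 200)
coeffs = np.polyfit(x, learned_feature_map(x), deg=3)
readable = " + ".join(f"{c:.3f}*x^{p}" for p, c in zip(range(3, -1, -1), coeffs))
print("recovered mapping:", readable)    # human-interpretable closed form
```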

Learnable wavelet neural networks for cosmological inference

  • paper_url: http://arxiv.org/abs/2307.14362
  • repo_url: https://github.com/chris-pedersen/learnablewavelets
  • paper_authors: Christian Pedersen, Michael Eickenberg, Shirley Ho
  • for: cosmological inference and marginalisation over astrophysical effects
  • methods: 使用可学习散射变换（learnable scattering transform），即一种以可训练小波作为滤波器的卷积神经网络
  • results: 性能优于CNN，在训练样本较少时优势尤为明显；并给出了一个高度可解释的轻量级散射网络
    Abstract Convolutional neural networks (CNNs) have been shown to both extract more information than the traditional two-point statistics from cosmological fields, and marginalise over astrophysical effects extremely well. However, CNNs require large amounts of training data, which is potentially problematic in the domain of expensive cosmological simulations, and it is difficult to interpret the network. In this work we apply the learnable scattering transform, a kind of convolutional neural network that uses trainable wavelets as filters, to the problem of cosmological inference and marginalisation over astrophysical effects. We present two models based on the scattering transform, one constructed for performance, and one constructed for interpretability, and perform a comparison with a CNN. We find that scattering architectures are able to outperform a CNN, significantly in the case of small training data samples. Additionally we present a lightweight scattering network that is highly interpretable.
    摘要 卷积神经网络（CNN）已被证明既能从宇宙学场中提取比传统两点统计更多的信息，又能很好地对天体物理效应进行边缘化。然而，CNN需要大量训练数据，这在宇宙学模拟代价高昂的场景下可能成为问题，而且网络难以解释。在这项工作中，我们将可学习散射变换（一种以可训练小波作为滤波器的卷积神经网络）应用于宇宙学参数推断以及对天体物理效应的边缘化问题。我们给出了两种基于散射变换的模型，一种面向性能，另一种面向可解释性，并与CNN进行了比较。我们发现散射架构能够超越CNN，在训练样本较少时优势尤为显著。此外，我们还给出了一个高度可解释的轻量级散射网络。

Extending Path-Dependent NJ-ODEs to Noisy Observations and a Dependent Observation Framework

  • paper_url: http://arxiv.org/abs/2307.13147
  • repo_url: https://github.com/floriankrach/pd-njode
  • paper_authors: William Andersson, Jakob Heiss, Florian Krach, Josef Teichmann
  • for: 预测连续时间随机过程,具有不规则和部分观测。
  • methods: 使用路径依赖神经跳跃ODE(PD-NJ-ODE)模型,在给定不规则采样、观测不完整的时间序列的情况下学习最优预测。
  • results: 提出了两种扩展,使模型能够处理有噪观测以及过程与观测时间相互依赖的情形,并给出了理论保证和实证示例。
    Abstract The Path-Dependent Neural Jump ODE (PD-NJ-ODE) is a model for predicting continuous-time stochastic processes with irregular and incomplete observations. In particular, the method learns optimal forecasts given irregularly sampled time series of incomplete past observations. So far the process itself and the coordinate-wise observation times were assumed to be independent and observations were assumed to be noiseless. In this work we discuss two extensions to lift these restrictions and provide theoretical guarantees as well as empirical examples for them.
    摘要 路径依赖神经跳跃ODE(PD-NJ-ODE)是一种用于预测具有不规则且不完整观测的连续时间随机过程的模型。具体而言,该方法在给定不规则采样、观测不完整的时间序列的情况下学习最优预测。此前,过程本身与各坐标的观测时间被假设为相互独立,且观测被假设为无噪声。在这项工作中,我们讨论了两种扩展以解除这些限制,并为其提供了理论保证和实证示例。

Does Progress On Object Recognition Benchmarks Improve Real-World Generalization?

  • paper_url: http://arxiv.org/abs/2307.13136
  • repo_url: None
  • paper_authors: Megan Richards, Polina Kirichenko, Diane Bouchacourt, Mark Ibrahim
  • for: 这项研究旨在考察对象识别模型在全球不同地区的真实世界泛化能力。
  • methods: 使用两个来自全球家庭物品的数据集,对近 100 个视觉模型(直至最新的基础模型)进行了大规模实证评估。
  • results: 发现标准基准上的进展并不能准确反映真实世界的泛化:所有模型在不同地区之间都存在较大的性能差距,而且标准基准上的进展往往会加剧这种差距;仅靠扩大规模并不足以保证对真实分布偏移的一致稳健性,初步实验表明,在更具代表性的精选数据上仅重新训练最后一层,即可将两个基准上的地区差距减少三分之二以上。
    Abstract For more than a decade, researchers have measured progress in object recognition on ImageNet-based generalization benchmarks such as ImageNet-A, -C, and -R. Recent advances in foundation models, trained on orders of magnitude more data, have begun to saturate these standard benchmarks, but remain brittle in practice. This suggests standard benchmarks, which tend to focus on predefined or synthetic changes, may not be sufficient for measuring real world generalization. Consequently, we propose studying generalization across geography as a more realistic measure of progress using two datasets of objects from households across the globe. We conduct an extensive empirical evaluation of progress across nearly 100 vision models up to most recent foundation models. We first identify a progress gap between standard benchmarks and real-world, geographical shifts: progress on ImageNet results in up to 2.5x more progress on standard generalization benchmarks than real-world distribution shifts. Second, we study model generalization across geographies by measuring the disparities in performance across regions, a more fine-grained measure of real world generalization. We observe all models have large geographic disparities, even foundation CLIP models, with differences of 7-20% in accuracy between regions. Counter to modern intuition, we discover progress on standard benchmarks fails to improve geographic disparities and often exacerbates them: geographic disparities between the least performant models and today's best models have more than tripled. Our results suggest scaling alone is insufficient for consistent robustness to real-world distribution shifts. Finally, we highlight in early experiments how simple last layer retraining on more representative, curated data can complement scaling as a promising direction of future work, reducing geographic disparity on both benchmarks by over two-thirds.
    摘要 We conducted an extensive empirical evaluation of progress across nearly 100 vision models, including the most recent foundation models. Our results reveal a significant gap between progress on ImageNet and real-world, geographical shifts. Specifically, we found that progress on ImageNet results in up to 2.5 times more progress on standard generalization benchmarks than real-world distribution shifts.Furthermore, we studied model generalization across geographies by measuring the disparities in performance across regions, providing a more fine-grained measure of real-world generalization. Our findings show that all models, including foundation CLIP models, exhibit large geographic disparities, with differences of 7-20% in accuracy between regions. Surprisingly, we found that progress on standard benchmarks does not improve geographic disparities and often exacerbates them. In fact, the geographic disparities between the least performant models and today's best models have more than tripled.Our results suggest that scaling alone is insufficient for achieving consistent robustness to real-world distribution shifts. However, we discovered that simple last layer retraining on more representative, curated data can complement scaling as a promising direction of future work. By doing so, we were able to reduce geographic disparity on both benchmarks by over two-thirds. Our findings highlight the importance of considering real-world distribution shifts when evaluating progress in object recognition and provide a new direction for future research.
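The paper's early experiments point to simple last-layer retraining on more representative, curated data. A generic version of that recipe looks like the sketch below; the frozen ResNet-50 backbone and the stand-in curated loader are illustrative assumptions, not the authors' setup.

```python
# Generic last-layer retraining sketch (assumptions: a torchvision ResNet-50 backbone and
# a toy stand-in for a curated, geographically representative data loader).
import torch
import torch.nn as nn
from torchvision.models import resnet50

backbone = resnet50(weights="IMAGENET1K_V2")            # pretrained features
for p in backbone.parameters():
    p.requires_grad = False                             # freeze everything ...
backbone.fc = nn.Linear(backbone.fc.in_features, 1000)  # ... except a fresh last layer

# Stand-in for a curated loader of more representative images.
curated_loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, 1000, (8,)))]

opt = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
backbone.train()
for images, labels in curated_loader:
    opt.zero_grad()
    loss = loss_fn(backbone(images), labels)
    loss.backward()
    opt.step()
```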

simPLE: a visuotactile method learned in simulation to precisely pick, localize, regrasp, and place objects

  • paper_url: http://arxiv.org/abs/2307.13133
  • repo_url: None
  • paper_authors: Maria Bauza, Antonia Bronars, Yifan Hou, Ian Taylor, Nikhil Chavan-Dafle, Alberto Rodriguez
  • for: 这篇论文旨在解决 robotic manipulation 中的一般性和精度之间的矛盾。
  • methods: 该论文提出了一种名为 simPLE 的解决方案,用于精确的 pick-and-place 任务。simPLE 包括三个主要组成部分:任务感知抓取、视触感知和重新抓取规划。
  • results: 在一台双臂机器人上,使用 simPLE 可以成功完成 15 种不同物体的 pick-and-place 任务,其中 6 种物体的成功率高于 90%,11 种物体的成功率高于 80%。视频可以在 http://mcube.mit.edu/research/simPLE.html 上查看。
    Abstract Existing robotic systems have a clear tension between generality and precision. Deployed solutions for robotic manipulation tend to fall into the paradigm of one robot solving a single task, lacking precise generalization, i.e., the ability to solve many tasks without compromising on precision. This paper explores solutions for precise and general pick-and-place. In precise pick-and-place, i.e. kitting, the robot transforms an unstructured arrangement of objects into an organized arrangement, which can facilitate further manipulation. We propose simPLE (simulation to Pick Localize and PLacE) as a solution to precise pick-and-place. simPLE learns to pick, regrasp and place objects precisely, given only the object CAD model and no prior experience. We develop three main components: task-aware grasping, visuotactile perception, and regrasp planning. Task-aware grasping computes affordances of grasps that are stable, observable, and favorable to placing. The visuotactile perception model relies on matching real observations against a set of simulated ones through supervised learning. Finally, we compute the desired robot motion by solving a shortest path problem on a graph of hand-to-hand regrasps. On a dual-arm robot equipped with visuotactile sensing, we demonstrate pick-and-place of 15 diverse objects with simPLE. The objects span a wide range of shapes and simPLE achieves successful placements into structured arrangements with 1mm clearance over 90% of the time for 6 objects, and over 80% of the time for 11 objects. Videos are available at http://mcube.mit.edu/research/simPLE.html .
    摘要 现有的 роботиче系统存在一种明显的矛盾:一个机器人可以解决一个特定任务,但缺乏高级别的总体化能力,即能够解决多个任务而不失去精度。这篇论文探讨精度和通用性的pick-and-place解决方案。在精度的pick-and-place任务中,机器人将无结构的物品排序成结构化的排序,以便进一步的操作。我们提出了simPLE(从模拟到选择和置入)解决方案,该方案可以帮助机器人准确地找到、重新抓取和置入物品,只需要物品的CAD模型,无需互联网经验。我们开发了三个主要组成部分:任务意识 grasping、视听感知和重新抓取规划。任务意识 grasping计算可行的抓取方式,以确保稳定、可见和置入的情况。视听感知模型通过对实际观察与 simulate 的对比,通过超过学习来学习实际观察。最后,我们解决了一个短路问题,以计算手中重新抓取的最佳动作。在配备视听感知的双臂机器人上,我们用simPLE实现了15种多样化的物品的pick-and-place任务,物品的形状覆盖广泛,simPLE在90%的时间内成功地将物品置入结构化的排序中,距离1毫米。视频可以在http://mcube.mit.edu/research/simPLE.html 中找到。

A Differentially Private Weighted Empirical Risk Minimization Procedure and its Application to Outcome Weighted Learning

  • paper_url: http://arxiv.org/abs/2307.13127
  • repo_url: None
  • paper_authors: Spencer Giddens, Yiwang Zhou, Kevin R. Krull, Tara M. Brinkman, Peter X. K. Song, Fang Liu
  • For: 这篇论文旨在提出一种在使用敏感数据时具有隐私保护的学习方法,以解决数据隐私问题。
  • Methods: 论文采用差分隐私(DP)框架,提出了首个具有隐私保证的加权经验风险最小化(wERM)算法,是对现有 DP-ERM 方法的推广。
  • Results: 模拟研究与真实临床试验表明,可以在 wERM 中施加 DP 保证来训练 OWL 模型,而不会明显牺牲模型性能。
    Abstract It is commonplace to use data containing personal information to build predictive models in the framework of empirical risk minimization (ERM). While these models can be highly accurate in prediction, results obtained from these models with the use of sensitive data may be susceptible to privacy attacks. Differential privacy (DP) is an appealing framework for addressing such data privacy issues by providing mathematically provable bounds on the privacy loss incurred when releasing information from sensitive data. Previous work has primarily concentrated on applying DP to unweighted ERM. We consider an important generalization to weighted ERM (wERM). In wERM, each individual's contribution to the objective function can be assigned varying weights. In this context, we propose the first differentially private wERM algorithm, backed by a rigorous theoretical proof of its DP guarantees under mild regularity conditions. Extending the existing DP-ERM procedures to wERM paves a path to deriving privacy-preserving learning methods for individualized treatment rules, including the popular outcome weighted learning (OWL). We evaluate the performance of the DP-wERM application to OWL in a simulation study and in a real clinical trial of melatonin for sleep health. All empirical results demonstrate the viability of training OWL models via wERM with DP guarantees while maintaining sufficiently useful model performance. Therefore, we recommend practitioners consider implementing the proposed privacy-preserving OWL procedure in real-world scenarios involving sensitive data.
    摘要 通常会使用包含个人信息的数据,在经验风险最小化(ERM)框架中构建预测模型。虽然这些模型可以在预测上具有非常高的准确性,但使用敏感数据得到的结果可能会遭受隐私攻击。差分隐私(DP)是一个有吸引力的框架,可以为从敏感数据发布信息时产生的隐私损失提供数学上可证明的约束。先前的研究主要集中在将DP应用于不加权的ERM。我们考虑了一个重要的推广,即加权ERM(wERM):其中每个个体对目标函数的贡献可以被赋予不同的权重。在这种情况下,我们提出了首个具有DP保证的wERM算法,并在温和的正则性条件下给出了理论证明。通过将现有的DP-ERM方法推广到wERM,我们为个体化治疗规则(包括流行的结果加权学习,OWL)的隐私保护学习方法开辟了道路。我们在一项模拟研究和一项关于褪黑素与睡眠健康的真实临床试验中评估了DP-wERM应用于OWL的性能。所有实验结果表明,可以通过具有DP保证的wERM来训练OWL模型,同时保持足够有用的模型性能。因此,我们建议实践者在涉及敏感数据的实际场景中考虑实施我们提出的隐私保护OWL过程。
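The paper's exact privacy mechanism for weighted ERM is not reproduced here; purely as an illustration of what "DP guarantees on a weighted empirical risk" can look like in code, the sketch below optimizes a weighted logistic loss with a DP-SGD-style update (per-example gradient clipping plus Gaussian noise). All data and hyperparameters are placeholders.

```python
# Illustration only (not the paper's algorithm): weighted logistic ERM trained with a
# DP-SGD-style update -- per-example gradients are clipped and Gaussian noise is added.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) + 0.1 * rng.normal(size=n) > 0).astype(float)
w = rng.uniform(0.5, 2.0, size=n)                 # per-individual weights (e.g. OWL-style)

theta = np.zeros(d)
clip, sigma, lr = 1.0, 1.0, 0.1                   # clipping norm, noise scale, step size
for step in range(200):
    idx = rng.choice(n, size=64, replace=False)
    p = 1.0 / (1.0 + np.exp(-X[idx] @ theta))
    per_ex_grad = w[idx, None] * (p - y[idx])[:, None] * X[idx]   # weighted logistic grads
    norms = np.maximum(1.0, np.linalg.norm(per_ex_grad, axis=1) / clip)
    clipped = per_ex_grad / norms[:, None]                        # clip each example
    noisy = clipped.sum(0) + sigma * clip * rng.normal(size=d)    # add Gaussian noise
    theta -= lr * noisy / len(idx)
print("fitted coefficients:", np.round(theta, 2))
```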

A Hybrid Machine Learning Model for Classifying Gene Mutations in Cancer using LSTM, BiLSTM, CNN, GRU, and GloVe

  • paper_url: http://arxiv.org/abs/2307.14361
  • repo_url: None
  • paper_authors: Sanad Aburass, Osama Dorgham, Jamil Al Shaqsi
  • for: 这个研究旨在使用Kaggle的个性化医学:重新定义癌症治疗数据集来类型基因变化。
  • methods: 该模型结合了 LSTM、BiLSTM、CNN、GRU 和 GloVe 来对基因突变进行分类。
  • results: 该模型在准确率、精确率、召回率、F1 分数和均方误差方面均优于其他对比模型,同时所需训练时间更少,实现了性能与效率的良好结合。
    Abstract This study presents an ensemble model combining LSTM, BiLSTM, CNN, GRU, and GloVe to classify gene mutations using Kaggle's Personalized Medicine: Redefining Cancer Treatment dataset. The results were compared against well-known transformers like as BERT, Electra, Roberta, XLNet, Distilbert, and their LSTM ensembles. Our model outperformed all other models in terms of accuracy, precision, recall, F1 score, and Mean Squared Error. Surprisingly, it also needed less training time, resulting in a perfect combination of performance and efficiency. This study demonstrates the utility of ensemble models for difficult tasks such as gene mutation classification.
    Glossary of model abbreviations: LSTM (Long Short-Term Memory), BiLSTM (Bidirectional LSTM), CNN (Convolutional Neural Network), GRU (Gated Recurrent Unit), GloVe (Global Vectors for Word Representation), BERT (Bidirectional Encoder Representations from Transformers), RoBERTa (Robustly Optimized BERT Pretraining Approach), DistilBERT (distilled BERT); ELECTRA and XLNet are further transformer baselines.
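A Keras sketch of how the named branches could be combined into one classifier is shown below. The vocabulary size, sequence length, and random embedding initialization are toy assumptions; a real setup would initialize the Embedding layer with pretrained GloVe vectors and tune each branch.

```python
# Sketch of a combined LSTM / BiLSTM / CNN / GRU text classifier in Keras (assumptions:
# toy vocabulary and sequence length; GloVe vectors would be plugged into Embedding).
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab, seq_len, n_classes = 20000, 300, 9       # the Kaggle dataset has 9 mutation classes
inp = layers.Input(shape=(seq_len,))
emb = layers.Embedding(vocab, 100)(inp)         # in practice: weights=[glove_matrix]

branches = [
    layers.LSTM(64)(emb),
    layers.Bidirectional(layers.LSTM(64))(emb),
    layers.GlobalMaxPooling1D()(layers.Conv1D(64, 5, activation="relu")(emb)),
    layers.GRU(64)(emb),
]
merged = layers.concatenate(branches)
hidden = layers.Dense(128, activation="relu")(merged)
out = layers.Dense(n_classes, activation="softmax")(hidden)

model = Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```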

Deep Bradley-Terry Rating: Quantifying Properties from Comparisons

  • paper_url: http://arxiv.org/abs/2307.13709
  • repo_url: None
  • paper_authors: Satoru Fujii
  • for: 这篇论文的目的是为了量化和评估未知对象的属性。
  • methods: 这篇论文使用了深度学习框架,并将布莱德利-泰勒模型 integrate 到 neural network 结构中。此外,它还推广到不平等环境下,以便更好地应用于实际场景。
  • results: 经过实验分析,这篇论文成功地量化和估算了欲要的属性。
    Abstract Many properties in the real world can't be directly observed, making them difficult to learn. To deal with this challenging problem, prior works have primarily focused on estimating those properties by using graded human scores as the target label in the training. Meanwhile, rating algorithms based on the Bradley-Terry model are extensively studied to evaluate the competitiveness of players based on their match history. In this paper, we introduce the Deep Bradley-Terry Rating (DBTR), a novel machine learning framework designed to quantify and evaluate properties of unknown items. Our method seamlessly integrates the Bradley-Terry model into the neural network structure. Moreover, we generalize this architecture further to asymmetric environments with unfairness, a condition more commonly encountered in real-world settings. Through experimental analysis, we demonstrate that DBTR successfully learns to quantify and estimate desired properties.
    摘要 多种Properties在现实世界中难以直接观察,这使得它们学习变得困难。以前的工作主要通过使用排名作为目标标签进行估算这些Properties。而BRADLEY-TERRY模型的评分算法在评估玩家的竞争力方面得到了广泛的研究。在这篇论文中,我们引入了深度BRADLEY-TERRY评分(DBTR),一种新的机器学习框架,用于评估和评价未知项目的属性。我们将BRADLEY-TERRY模型与神经网络结构结合,并将其扩展到偏袋环境中,以适应实际世界中更常见的不平等条件。通过实验分析,我们证明了DBTR成功地学习和估算所需的属性。
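The core of a Bradley-Terry rating network is small: a network maps item features to a scalar rating r, and the probability that item i beats item j is sigmoid(r_i - r_j), fit on observed comparison outcomes. The sketch below shows that basic idea on toy data; it does not include the paper's extension to asymmetric environments with unfairness.

```python
# Minimal Bradley-Terry rating network (basic idea only, toy comparison data).
import torch
import torch.nn as nn

rating_net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(rating_net.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

# Toy comparison data: features of item i, features of item j, and whether i won.
x_i, x_j = torch.randn(256, 8), torch.randn(256, 8)
wins = (x_i.sum(dim=1) > x_j.sum(dim=1)).float()   # stand-in "ground truth" preference

for _ in range(200):
    logits = rating_net(x_i).squeeze(-1) - rating_net(x_j).squeeze(-1)  # r_i - r_j
    loss = loss_fn(logits, wins)
    opt.zero_grad(); loss.backward(); opt.step()

print("learned rating of one item:", rating_net(x_i[:1]).item())
```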

Conformal prediction for frequency-severity modeling

  • paper_url: http://arxiv.org/abs/2307.13124
  • repo_url: https://github.com/heltongraziadei/conformal-fs
  • paper_authors: Helton Graziadei, Paulo C. Marques F., Eduardo F. L. de Melo, Rodrigo S. Targino
  • for: 预测保险索赔数量
  • methods: 非参数、模型无关的框架;分割保形预测(split conformal prediction);利用袋外(out-of-bag)机制实现自适应宽度的预测区间
  • results: 在仿真与真实数据集上均适用,可生成具有有限样本统计保证的预测区间
    Abstract We present a nonparametric model-agnostic framework for building prediction intervals of insurance claims, with finite sample statistical guarantees, extending the technique of split conformal prediction to the domain of two-stage frequency-severity modeling. The effectiveness of the framework is showcased with simulated and real datasets. When the underlying severity model is a random forest, we extend the two-stage split conformal prediction procedure, showing how the out-of-bag mechanism can be leveraged to eliminate the need for a calibration set and to enable the production of prediction intervals with adaptive width.
    摘要 我们提出了一种非参数化、模型无关的框架,用于构建具有有限样本统计保证的保险理赔预测区间,将分割保形预测技术推广到两阶段频率-严重度建模领域。我们通过仿真数据和真实数据展示了该框架的有效性。当底层严重度模型是随机森林时,我们进一步扩展了两阶段分割保形预测过程,说明如何利用袋外(out-of-bag)机制来消除对校准集的需求,并生成具有自适应宽度的预测区间。
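For reference, plain split conformal prediction for a single (severity-style) regression stage looks like the sketch below; the paper's two-stage frequency-severity construction and the out-of-bag variant go beyond this. Data and model choice here are illustrative only.

```python
# Generic split-conformal sketch for one regression stage (synthetic "claim severity" data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 4))
y = np.exp(1.0 + X[:, 0] + 0.5 * rng.normal(size=3000))        # skewed severity target

X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_fit, y_fit)

alpha = 0.1
scores = np.abs(y_cal - model.predict(X_cal))                   # calibration residuals
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))               # finite-sample quantile index
q = np.sort(scores)[min(k, len(scores)) - 1]

x_new = X[:5]
pred = model.predict(x_new)
print("90% prediction intervals:")
print(np.column_stack([pred - q, pred + q]).round(2))
```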

An Explainable Geometric-Weighted Graph Attention Network for Identifying Functional Networks Associated with Gait Impairment

  • paper_url: http://arxiv.org/abs/2307.13108
  • repo_url: https://github.com/favour-nerrise/xgw-gat
  • paper_authors: Favour Nerrise, Qingyu Zhao, Kathleen L. Poston, Kilian M. Pohl, Ehsan Adeli
  • for: 这项研究旨在更好地理解帕金森病(PD)的运动症状进展,特别是步态障碍和平衡问题的发展,通过识别与步态障碍相关的脑功能失调,推动更有效、更个性化的治疗。
  • methods: 提出一种可解释的、几何的、加权图注意力神经网络(xGW-GAT),以功能连接组(表示为黎曼流形上的对称正定矩阵)为输入,预测 MDS-UPDRS 量表上的多类步态障碍级别,并通过个体化的注意力掩码提供个体与群体层面的解释。
  • results: 在帕金森病患者的静息态功能磁共振成像(rs-fMRI)数据集上,xGW-GAT 识别出与步态障碍相关的功能连接模式,性能优于多种现有方法,同时揭示了具有临床意义的连接模式。
    Abstract One of the hallmark symptoms of Parkinson's Disease (PD) is the progressive loss of postural reflexes, which eventually leads to gait difficulties and balance problems. Identifying disruptions in brain function associated with gait impairment could be crucial in better understanding PD motor progression, thus advancing the development of more effective and personalized therapeutics. In this work, we present an explainable, geometric, weighted-graph attention neural network (xGW-GAT) to identify functional networks predictive of the progression of gait difficulties in individuals with PD. xGW-GAT predicts the multi-class gait impairment on the MDS Unified PD Rating Scale (MDS-UPDRS). Our computational- and data-efficient model represents functional connectomes as symmetric positive definite (SPD) matrices on a Riemannian manifold to explicitly encode pairwise interactions of entire connectomes, based on which we learn an attention mask yielding individual- and group-level explainability. Applied to our resting-state functional MRI (rs-fMRI) dataset of individuals with PD, xGW-GAT identifies functional connectivity patterns associated with gait impairment in PD and offers interpretable explanations of functional subnetworks associated with motor impairment. Our model successfully outperforms several existing methods while simultaneously revealing clinically-relevant connectivity patterns. The source code is available at https://github.com/favour-nerrise/xGW-GAT .
    摘要 一个典型的parkinson病(PD)的表现之一是慢慢地失去姿态反射,这会导致步态困难和平衡问题。确定潜在的脑功能干预在步态困难方面可能是理解PD motor进程的关键,从而开发更有效和个性化的治疗方法。在这项工作中,我们提出了一种可解释的、几何的、质量权重的神经网络(xGW-GAT),用于预测PD患者的步态困难级别。xGW-GAT预测了MDS联合PD评估rating scale(MDS-UPDRS)中的多个步态困难类型。我们的计算和数据有效的模型将功能连接矩阵(connectome)表示为对称正定的矩阵(SPD),并在RIemannian manifold上进行Explicitly编码对整个connectome的对称对应关系,基于这些对应关系我们学习一个注意力mask,以获得个体和组级别的解释。应用于我们的resting-state功能MRI(rs-fMRI)数据集中的PD患者,xGW-GAT确定了与步态困难相关的功能连接模式,并提供了可解释的功能子网络相关于运动障碍。我们的模型成功击败了一些现有的方法,同时揭示了临床有用的连接模式。模型代码可以在https://github.com/favour-nerrise/xGW-GAT 上获取。

Contrastive Example-Based Control

  • paper_url: http://arxiv.org/abs/2307.13101
  • repo_url: https://github.com/khatch31/laeo
  • paper_authors: Kyle Hatch, Benjamin Eysenbach, Rafael Rafailov, Tianhe Yu, Ruslan Salakhutdinov, Sergey Levine, Chelsea Finn
  • for: 本研究旨在提出一种基于例子的离线控制方法,可以学习多步转移的隐藏模型,而不需要 specify 奖励函数。
  • methods: 本方法使用了数据驱动的方法,通过从转移动力学中采样得到的例子来学习隐藏模型,并使用这些模型来预测转移的Q值。
  • results: 相比基elines,本方法在多个状态基于和图像基于的离线控制任务中表现出色,并且在数据集大小增加时显示了更好的稳定性和扩展性。
    Abstract While many real-world problems that might benefit from reinforcement learning, these problems rarely fit into the MDP mold: interacting with the environment is often expensive and specifying reward functions is challenging. Motivated by these challenges, prior work has developed data-driven approaches that learn entirely from samples from the transition dynamics and examples of high-return states. These methods typically learn a reward function from high-return states, use that reward function to label the transitions, and then apply an offline RL algorithm to these transitions. While these methods can achieve good results on many tasks, they can be complex, often requiring regularization and temporal difference updates. In this paper, we propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function. We show that this implicit model can represent the Q-values for the example-based control problem. Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions; additional experiments demonstrate improved robustness and scaling with dataset size.
    摘要 虽然许多实际问题可以受惠于强化学习,但这些问题很少遵循MDP模型:与环境交互通常是昂贵的,并且指定奖励函数是困难的。为了解决这些挑战,先前的工作已经开发了基于数据的方法,这些方法通过从转移动态和高返回状态中提取样本来学习。这些方法通常会学习一个奖励函数从高返回状态,使用这个奖励函数来标注转移,然后应用在这些转移上的离线RL算法。虽然这些方法可以在许多任务上达到良好的结果,但它们可能是复杂的,经常需要规则化和时间差更新。在这篇论文中,我们提出了一种离线、示例基于的控制方法,这种方法学习了多步转移的隐藏模型,而不是奖励函数。我们示示了这个隐藏模型可以表示示例基于的控制问题中的Q值。在一系列的状态基本和图像基本的离线控制任务上,我们的方法超过了基于学习的奖励函数的基eline,其他实验还证明了我们的方法具有更好的 Robustness 和数据集大小增长。

Label Noise: Correcting a Correction

  • paper_url: http://arxiv.org/abs/2307.13100
  • repo_url: None
  • paper_authors: William Toner, Amos Storkey
  • for: addressing the issue of overfitting in training neural network classifiers on datasets with label noise
  • methods: proposing a more direct approach to mitigate overfitting by imposing a lower bound on the empirical risk during training
  • results: providing theoretical results with explicit, easily computable bounds on the minimum achievable noisy risk for different loss functions, and demonstrating significantly enhanced robustness with virtually no additional computational cost.
    Abstract Training neural network classifiers on datasets with label noise poses a risk of overfitting them to the noisy labels. To address this issue, researchers have explored alternative loss functions that aim to be more robust. However, many of these alternatives are heuristic in nature and still vulnerable to overfitting or underfitting. In this work, we propose a more direct approach to tackling overfitting caused by label noise. We observe that the presence of label noise implies a lower bound on the noisy generalised risk. Building upon this observation, we propose imposing a lower bound on the empirical risk during training to mitigate overfitting. Our main contribution is providing theoretical results that yield explicit, easily computable bounds on the minimum achievable noisy risk for different loss functions. We empirically demonstrate that using these bounds significantly enhances robustness in various settings, with virtually no additional computational cost.
    摘要 在带有标签噪声的数据集上训练神经网络分类器,存在过拟合噪声标签的风险。为了解决这一问题,研究者们探索了一些旨在更加稳健的替代损失函数。然而,这些替代方案大多是启发式的,仍然容易出现过拟合或欠拟合。在这项工作中,我们提出了一种更直接的方法来应对标签噪声导致的过拟合。我们观察到,标签噪声的存在意味着含噪泛化风险存在一个下界。基于这一观察,我们提议在训练过程中对经验风险施加一个下界,以缓解过拟合。我们的主要贡献是给出理论结果,为不同的损失函数提供了明确且易于计算的最小可达含噪风险下界。实验表明,使用这些下界可以在多种设置下显著增强稳健性,而几乎不增加计算成本。
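One simple way to impose a lower bound b on the empirical risk during training is a flooding-style objective |L - b| + b, so optimization stops pushing the training loss below b. The sketch below is only that generic idea; the paper's contribution is deriving loss-specific, noise-dependent values of the bound, which are not reproduced here (the value of b below is a placeholder).

```python
# Sketch of training with a lower bound on the empirical risk (flooding-style objective).
# The bound below is a placeholder; the paper derives explicit minimum-noisy-risk bounds.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()
risk_lower_bound = 0.4                      # placeholder value of the noisy-risk bound

x = torch.randn(512, 20)
y = torch.randint(0, 10, (512,))            # labels assumed to be partially corrupted
for _ in range(100):
    loss = ce(model(x), y)
    bounded = (loss - risk_lower_bound).abs() + risk_lower_bound  # never optimize below b
    opt.zero_grad(); bounded.backward(); opt.step()
print(float(loss))
```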

Interpretable Ensemble Learning for Materials Property Prediction with Classical Interatomic Potentials: Carbon as an Example

  • paper_url: http://arxiv.org/abs/2308.10818
  • repo_url: None
  • paper_authors: Xinyu Jiang, Haofan Sun, Kamal Choudhary, Houlong Zhuang, Qiong Nian
  • for: 这篇论文目的是用机器学习(ML)技术预测晶体材料的性能。
  • methods: 该方法使用了回归树 ensemble learning,不需要任何描述符,直接使用分子动力学计算的物理性质作为输入。
  • results: 结果显示,在碳同素异形体的小规模数据集上,集成学习的预测结果比使用单一经典原子间势更准确,并能以 9 种经典势给出的相对准确性作为预测最终性质的依据。
    Abstract Machine learning (ML) is widely used to explore crystal materials and predict their properties. However, the training is time-consuming for deep-learning models, and the regression process is a black box that is hard to interpret. Also, the preprocess to transfer a crystal structure into the input of ML, called descriptor, needs to be designed carefully. To efficiently predict important properties of materials, we propose an approach based on ensemble learning consisting of regression trees to predict formation energy and elastic constants based on small-size datasets of carbon allotropes as an example. Without using any descriptor, the inputs are the properties calculated by molecular dynamics with 9 different classical interatomic potentials. Overall, the results from ensemble learning are more accurate than those from classical interatomic potentials, and ensemble learning can capture the relatively accurate properties from the 9 classical potentials as criteria for predicting the final properties.
    摘要 机器学习(ML)广泛应用于探索晶体材料和预测其性能。然而,训练深度学习模型需时间consuming, regression 过程是一个难以解释的黑盒子。此外,将晶体结构转换为 ML 的输入,即描述符,需要仔细设计。为了有效预测材料的重要性能,我们提出了基于集成学习的方法,包括回归树来预测基于小型 datasets of carbon allotropes 的形成能gy和弹性常数。无需使用任何描述符,输入是通过分子动力学计算的物理性质。总的来说, ensemble 学习的结果比 классиical interatomic potentials 更加准确,并且 ensemble 学习可以捕捉来自 nine classical potentials 的相对准确性作为预测最终性能的标准。
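The descriptor-free setup described above can be sketched with scikit-learn tree ensembles: each input column is the same property computed with one of several classical potentials, and the target is a reference value such as formation energy. The synthetic data below are a stand-in, not the paper's dataset.

```python
# Descriptor-free tree-ensemble sketch (synthetic stand-in data): columns are per-potential
# values of a property for each structure; target is a reference formation energy.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_structures, n_potentials = 200, 9
truth = rng.normal(-7.5, 0.8, size=n_structures)                 # stand-in reference (eV/atom)
# Each classical potential gives a noisy, biased view of the true value.
X = truth[:, None] + rng.normal(0, 0.3, size=(n_structures, n_potentials)) \
    + rng.normal(0, 0.2, size=n_potentials)
y = truth

model = GradientBoostingRegressor(n_estimators=300, max_depth=3, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
print("cross-validated MAE (eV/atom):", -scores.mean().round(3))
```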

Fairness Under Demographic Scarce Regime

  • paper_url: http://arxiv.org/abs/2307.13081
  • repo_url: None
  • paper_authors: Patrik Joslin Kenfack, Samira Ebrahimi Kahou, Ulrich Aïvodji
  • for: addressing the limitation of prior fairness work, which assumes full access to demographic information; in practice such information may be only partially available, or unavailable for privacy reasons (the demographic scarce regime).
  • methods: a framework for building attribute classifiers that incorporates uncertainty awareness and enforces fairness constraints only on samples whose demographic information is inferred with the lowest uncertainty.
  • results: experiments on two datasets show the framework yields significantly better fairness-accuracy trade-offs than classic attribute classifiers and, surprisingly, even outperforms models trained with constraints on the true sensitive attributes.
    Abstract Most existing works on fairness assume the model has full access to demographic information. However, there exist scenarios where demographic information is partially available because a record was not maintained throughout data collection or due to privacy reasons. This setting is known as demographic scarce regime. Prior research have shown that training an attribute classifier to replace the missing sensitive attributes (proxy) can still improve fairness. However, the use of proxy-sensitive attributes worsens fairness-accuracy trade-offs compared to true sensitive attributes. To address this limitation, we propose a framework to build attribute classifiers that achieve better fairness-accuracy trade-offs. Our method introduces uncertainty awareness in the attribute classifier and enforces fairness on samples with demographic information inferred with the lowest uncertainty. We show empirically that enforcing fairness constraints on samples with uncertain sensitive attributes is detrimental to fairness and accuracy. Our experiments on two datasets showed that the proposed framework yields models with significantly better fairness-accuracy trade-offs compared to classic attribute classifiers. Surprisingly, our framework outperforms models trained with constraints on the true sensitive attributes.
    摘要 现有大多数工作假设模型拥有完整的人口信息。然而,有些场景下人口信息部分可用,例如记录不完整或因隐私原因无法获取。这种情况被称为人口缺乏 regime。先前的研究表明,使用代理敏感特征来代替缺失的敏感特征可以改善公平。然而,使用代理敏感特征会对公平精度负面影响比使用真实的敏感特征更大。为解决这些限制,我们提出了一个框架,用于建立具有更好的公平精度负面影响的特征分类器。我们的方法在特征分类器中引入了不确定性意识,并在具有最低不确定性的人口信息上遵循公平约束。我们的实验表明,对不确定的敏感特征进行公平约束是对公平和准确性的负面影响。我们的方法在两个数据集上进行了实验,并显示了与经典特征分类器相比,我们的框架可以获得显著更好的公平精度负面影响。另外,我们的方法还超过使用约束的真实敏感特征模型。
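The sketch below is one reading of the recipe: fit a proxy attribute classifier on the small subset with known demographics, keep only samples whose inferred attribute is high-confidence, and apply a fairness penalty (a demographic-parity gap here) on those samples while training the task classifier. The data, penalty choice, and thresholds are illustrative assumptions; the paper's exact constraint and training procedure are not reproduced.

```python
# Sketch of uncertainty-aware proxy fairness (illustrative; not the paper's exact method).
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1024, 10)
y = (x[:, 0] + 0.3 * torch.randn(1024) > 0).float()             # task label
a = (x[:, 1] + 0.2 * torch.randn(1024) > 0).float()             # sensitive attribute
known = torch.arange(1024) < 256                                # demographics known for 25% only

# Step 1: proxy attribute classifier trained on the labelled-demographics subset.
attr_clf = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
a_opt = torch.optim.Adam(attr_clf.parameters(), lr=1e-2)
for _ in range(300):
    a_loss = nn.functional.binary_cross_entropy_with_logits(
        attr_clf(x[known]).squeeze(-1), a[known])
    a_opt.zero_grad(); a_loss.backward(); a_opt.step()

with torch.no_grad():
    p_a = torch.sigmoid(attr_clf(x)).squeeze(-1)
confident = (p_a - 0.5).abs() > 0.3                             # low-uncertainty samples only
a_hat = (p_a > 0.5).float()

# Step 2: task classifier with a demographic-parity penalty on confident samples only.
clf = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
for _ in range(300):
    logits = clf(x).squeeze(-1)
    task_loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
    p = torch.sigmoid(logits)[confident]
    gap = (p[a_hat[confident] == 1].mean() - p[a_hat[confident] == 0].mean()).abs()
    opt.zero_grad()
    (task_loss + gap).backward()
    opt.step()
```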

Adaptive Certified Training: Towards Better Accuracy-Robustness Tradeoffs

  • paper_url: http://arxiv.org/abs/2307.13078
  • repo_url: None
  • paper_authors: Zhakshylyk Nurlanov, Frank R. Schmidt, Florian Bernard
  • for: 改善模型在可证稳健性与标准准确率之间的权衡。
  • methods: 一种基于自适应认证半径的认证训练方法,通过同时提升模型的准确率与稳健性来实现更好的准确率-稳健性权衡。
  • results: 在 MNIST、CIFAR-10 和 TinyImageNet 数据集上,所提方法实现了更好的稳健性-准确率权衡;特别是在 CIFAR-10 和 TinyImageNet 上,在相同的标准准确率水平下,模型的可证稳健性(以测试集的平均认证半径衡量)最多可提高至两倍。
    Abstract As deep learning models continue to advance and are increasingly utilized in real-world systems, the issue of robustness remains a major challenge. Existing certified training methods produce models that achieve high provable robustness guarantees at certain perturbation levels. However, the main problem of such models is a dramatically low standard accuracy, i.e. accuracy on clean unperturbed data, that makes them impractical. In this work, we consider a more realistic perspective of maximizing the robustness of a model at certain levels of (high) standard accuracy. To this end, we propose a novel certified training method based on a key insight that training with adaptive certified radii helps to improve both the accuracy and robustness of the model, advancing state-of-the-art accuracy-robustness tradeoffs. We demonstrate the effectiveness of the proposed method on MNIST, CIFAR-10, and TinyImageNet datasets. Particularly, on CIFAR-10 and TinyImageNet, our method yields models with up to two times higher robustness, measured as an average certified radius of a test set, at the same levels of standard accuracy compared to baseline approaches.

General-Purpose Multi-Modal OOD Detection Framework

  • paper_url: http://arxiv.org/abs/2307.13069
  • repo_url: None
  • paper_authors: Viet Duong, Qiong Wu, Zhengyi Zhou, Eric Zavesky, Jiahe Chen, Xiangzhou Liu, Wen-Ling Hsu, Huajie Shao
  • for: 这项研究旨在实现对多种不同分布外(OOD)场景的检测,以保障机器学习系统的安全性和可靠性。
  • methods: 提出一个通用的弱监督 OOD 检测框架 WOOD,它将二分类器与对比学习组件结合,以兼得两者的优势。
  • results: 实验结果显示,WOOD 模型在多个真实世界数据集上表现出色,能够同时在三种不同的 OOD 场景中保持高准确率。
    Abstract Out-of-distribution (OOD) detection identifies test samples that differ from the training data, which is critical to ensuring the safety and reliability of machine learning (ML) systems. While a plethora of methods have been developed to detect uni-modal OOD samples, only a few have focused on multi-modal OOD detection. Current contrastive learning-based methods primarily study multi-modal OOD detection in a scenario where both a given image and its corresponding textual description come from a new domain. However, real-world deployments of ML systems may face more anomaly scenarios caused by multiple factors like sensor faults, bad weather, and environmental changes. Hence, the goal of this work is to simultaneously detect from multiple different OOD scenarios in a fine-grained manner. To reach this goal, we propose a general-purpose weakly-supervised OOD detection framework, called WOOD, that combines a binary classifier and a contrastive learning component to reap the benefits of both. In order to better distinguish the latent representations of in-distribution (ID) and OOD samples, we adopt the Hinge loss to constrain their similarity. Furthermore, we develop a new scoring metric to integrate the prediction results from both the binary classifier and contrastive learning for identifying OOD samples. We evaluate the proposed WOOD model on multiple real-world datasets, and the experimental results demonstrate that the WOOD model outperforms the state-of-the-art methods for multi-modal OOD detection. Importantly, our approach is able to achieve high accuracy in OOD detection in three different OOD scenarios simultaneously. The source code will be made publicly available upon publication.
    摘要 外部数据(OOD)检测可以识别测试样本与训练数据之间的差异,这是机器学习(ML)系统的安全性和可靠性的关键。虽然大量方法已经开发出来检测uni-modal OOD样本,但只有一些关注多模态 OOD 检测。当前的对比学习基于方法主要在一个给定的图像和其相应的文本描述来自新领域的情况下进行多模态 OOD 检测。然而,实际世界中 ML 系统的部署可能会遇到更多的异常情况,如感知器故障、坏天气和环境变化。因此,本研究的目标是同时从多个不同的 OOD 场景中进行细致的检测。为达到这个目标,我们提出了一种通用的弱监督 OOD 检测框架,称为 WOOD,该框架结合了一个二分类器和一个对比学习组件,以便充分利用它们的优势。为了更好地分别 ID 和 OOD 样本的幂本表示,我们采用了缺角损失来约束它们的相似性。此外,我们开发了一个新的评分指标,以集成 binary 分类器和对比学习的预测结果,以便更好地识别 OOD 样本。我们对多个实际世界数据集进行了实验,结果显示,提出的 WOOD 模型在多modal OOD 检测中高度超越了现状的方法。特别是,我们的方法能够同时高精度地识别 OOD 样本在三个不同的 OOD 场景中。代码将在发表后公开。

Personalized Category Frequency prediction for Buy It Again recommendations

  • paper_url: http://arxiv.org/abs/2308.01195
  • repo_url: None
  • paper_authors: Amit Pande, Kunal Ghosh, Rankyung Park
  • for: 提高用户体验和站点参与度,预测用户可能会购买的商品
  • methods: 提出一种层次PCIC模型,包括个性化类别模型(PC模型)和个性化类别内项模型(IC模型),模型使用存活模型和时间序列模型生成特征,然后使用类别划分的神经网络进行训练
  • results: 与12个基线比较,PCIC在四个标准开放数据集上提高了NDCG达16%,同时提高了回归率约2%,并在大规模数据集上进行了扩展和训练(超过8小时),并在一家大商户的官方网站上进行了AB测试,导致了用户参与度的显著提高
    Abstract Buy It Again (BIA) recommendations are crucial to retailers to help improve user experience and site engagement by suggesting items that customers are likely to buy again based on their own repeat purchasing patterns. Most existing BIA studies analyze guests personalized behavior at item granularity. A category-based model may be more appropriate in such scenarios. We propose a recommendation system called a hierarchical PCIC model that consists of a personalized category model (PC model) and a personalized item model within categories (IC model). PC model generates a personalized list of categories that customers are likely to purchase again. IC model ranks items within categories that guests are likely to consume within a category. The hierarchical PCIC model captures the general consumption rate of products using survival models. Trends in consumption are captured using time series models. Features derived from these models are used in training a category-grained neural network. We compare PCIC to twelve existing baselines on four standard open datasets. PCIC improves NDCG up to 16 percent while improving recall by around 2 percent. We were able to scale and train (over 8 hours) PCIC on a large dataset of 100M guests and 3M items where repeat categories of a guest out number repeat items. PCIC was deployed and AB tested on the site of a major retailer, leading to significant gains in guest engagement.
    摘要 Buy It Again (BIA) 建议是重要的 для零售商,帮助改善用户体验和网站参与度,通过建议客户可能会再次购买的商品,基于他们自己的重复购买模式。大多数现有的BIA研究分析客人个性化行为的项目粒度。我们提出一种推荐系统,即层次PCIC模型,包括个性化类别模型(PC模型)和个性化类别内容模型(IC模型)。PC模型生成个性化的类别列表,客户可能会购买的类别。IC模型将类别内容排名,客户可能会在类别内消耗的商品。层次PCIC模型捕捉产品的总消耗率,使用存生模型捕捉消耗趋势。这些特征被用于训练分类器。我们与十二个基准值进行比较,PCIC提高了NDCG达16%,同时提高了回归率约2%。我们可以在8小时内扩展和训练PCIC(超过100万客户和300万个商品),其中客户重复购买的类别大于客户重复购买的商品。PCIC在一家大型零售商的官方网站上进行了部署和AB测试,导致用户参与度显著提高。

Feature Gradient Flow for Interpreting Deep Neural Networks in Head and Neck Cancer Prediction

  • paper_url: http://arxiv.org/abs/2307.13061
  • repo_url: None
  • paper_authors: Yinzhu Jin, Jonathan C. Garneau, P. Thomas Fletcher
  • for: 这篇论文旨在介绍一种新的深度学习模型解释技术,即通过计算模型的梯度流来解释模型做出决策时使用的特征。
  • methods: 该技术使用了计算模型的梯度流来定义输入数据空间中的非线性坐标,从而解释模型做出决策时使用的信息。然后,通过比较特征的梯度流度量和基线噪声特征的梯度流度量来评估特征的重要性。
  • results: 在使用了该技术进行训练后,模型的解释性得到了提高。研究人员通过计算模型的梯度流来评估特征的重要性,并发现了一些有用的特征,例如肿瘤大小和形态等。
    Abstract This paper introduces feature gradient flow, a new technique for interpreting deep learning models in terms of features that are understandable to humans. The gradient flow of a model locally defines nonlinear coordinates in the input data space representing the information the model is using to make its decisions. Our idea is to measure the agreement of interpretable features with the gradient flow of a model. To then evaluate the importance of a particular feature to the model, we compare that feature's gradient flow measure versus that of a baseline noise feature. We then develop a technique for training neural networks to be more interpretable by adding a regularization term to the loss function that encourages the model gradients to align with those of chosen interpretable features. We test our method in a convolutional neural network prediction of distant metastasis of head and neck cancer from a computed tomography dataset from the Cancer Imaging Archive.
    摘要 这篇论文介绍了一种新的技术,即特征梯度流,用于以人类可理解的特征来解释深度学习模型。模型的梯度流在输入数据空间中局部定义了非线性坐标,表示模型做出决策所使用的信息。我们的思路是测量可解释特征与模型梯度流的一致程度;为评估某一特征对模型的重要性,我们将该特征的梯度流度量与一个基线噪声特征的度量进行比较。我们还开发了一种使神经网络更具可解释性的训练方法:在损失函数中添加一个正则化项,促使模型梯度与所选可解释特征的梯度对齐。我们在来自 Cancer Imaging Archive 的计算机断层扫描(CT)数据集上,用卷积神经网络预测头颈癌远处转移,以此测试了我们的方法。

MARIO: Model Agnostic Recipe for Improving OOD Generalization of Graph Contrastive Learning

  • paper_url: http://arxiv.org/abs/2307.13055
  • repo_url: https://github.com/zhuyun97/mario
  • paper_authors: Yun Zhu, Haizhou Shi, Zhenshuo Zhang, Siliang Tang
  • for: 本文研究无监督学习方法在图数据上的分布外(OOD)泛化问题,特别是图神经网络(GNN)在分布偏移下的敏感性问题。
  • methods: 提出了一种模型无关的方法 MARIO,用于提升无监督图对比学习方法的 OOD 泛化性能。MARIO 包含两条原则:用于获得可泛化表示的信息瓶颈(IB)原则,以及结合对抗数据增强以获得不变表示的不变性原则。
  • results: 大量实验表明,该方法在 OOD 测试集上取得了最先进的性能,同时在分布内测试集上保持了与现有方法相当的性能。代码见 https://github.com/ZhuYun97/MARIO。
    Abstract In this work, we investigate the problem of out-of-distribution (OOD) generalization for unsupervised learning methods on graph data. This scenario is particularly challenging because graph neural networks (GNNs) have been shown to be sensitive to distributional shifts, even when labels are available. To address this challenge, we propose a \underline{M}odel-\underline{A}gnostic \underline{R}ecipe for \underline{I}mproving \underline{O}OD generalizability of unsupervised graph contrastive learning methods, which we refer to as MARIO. MARIO introduces two principles aimed at developing distributional-shift-robust graph contrastive methods to overcome the limitations of existing frameworks: (i) Information Bottleneck (IB) principle for achieving generalizable representations and (ii) Invariant principle that incorporates adversarial data augmentation to obtain invariant representations. To the best of our knowledge, this is the first work that investigates the OOD generalization problem of graph contrastive learning, with a specific focus on node-level tasks. Through extensive experiments, we demonstrate that our method achieves state-of-the-art performance on the OOD test set, while maintaining comparable performance on the in-distribution test set when compared to existing approaches. The source code for our method can be found at: https://github.com/ZhuYun97/MARIO
    摘要 在这个工作中,我们研究了无监督学习方法在图数据上的 OUT-OF-DISTRIBUTION(OOD)泛化问题。这种情况特别是挑战性的,因为图神经网络(GNNs)已经被证明是 Distributional Shifts 敏感的,即使标签可用。为了解决这个挑战,我们提出了一种名为 MARIO 的模型无关的照片,用于提高无监督图对比学习方法的 OOD 泛化性。MARIO 包括两个原则:(i) 信息瓶颈(IB)原则,以实现泛化表示,以及(ii) 不变原则,通过对数据进行对抗式数据增强来获得不变表示。根据我们所知,这是首次研究 OOD 泛化问题的图对比学习方法,具体注重节点级任务。通过广泛的实验,我们证明了我们的方法在 OOD 测试集上具有最佳性能,而且与现有方法相比,在同一个测试集上保持了相似的性能。MARIO 的源代码可以在 GitHub 上找到:https://github.com/ZhuYun97/MARIO。

Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation

  • paper_url: http://arxiv.org/abs/2307.12983
  • repo_url: https://github.com/Improbable-AI/pql
  • paper_authors: Zechu Li, Tao Chen, Zhang-Wei Hong, Anurag Ajay, Pulkit Agrawal
  • For: 这篇论文是关于加速复杂任务的强化学习的研究,通过使用大量的训练数据来提高模型的性能。* Methods: 这篇论文使用了Isaac Gym提供的GPU基于的 simulate系统,通过并行收集数据、策略学习和价值学习来提高强化学习的效率。* Results: 这篇论文提出了一种并行$Q$-学习(PQL)方案,可以在几个工作站上并行进行数据收集、策略学习和价值学习,从而提高强化学习的效率。在实验中,PQL方案可以扩展到数以千计的并行环境,并调查了学习速度的重要因素。
    Abstract Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data. Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection thousands of times on a commodity GPU. Most prior works used on-policy methods like PPO due to their simplicity and ease of scaling. Off-policy methods are more data efficient but challenging to scale, resulting in a longer wall-clock training time. This paper presents a Parallel $Q$-Learning (PQL) scheme that outperforms PPO in wall-clock time while maintaining superior sample efficiency of off-policy learning. PQL achieves this by parallelizing data collection, policy learning, and value learning. Different from prior works on distributed off-policy learning, such as Apex, our scheme is designed specifically for massively parallel GPU-based simulation and optimized to work on a single workstation. In experiments, we demonstrate that $Q$-learning can be scaled to \textit{tens of thousands of parallel environments} and investigate important factors affecting learning speed. The code is available at https://github.com/Improbable-AI/pql.
    摘要 由于需要大量训练数据,强化学习在复杂任务上十分耗时。近期基于 GPU 的仿真(如 Isaac Gym)使得在一块普通 GPU 上的数据收集速度提升了数千倍。此前的大多数工作由于简单且易于扩展而采用 PPO 等 on-policy 方法;off-policy 方法的数据效率更高,但难以扩展,导致更长的训练时间。本文提出了一种并行 Q 学习(PQL)方案,在保持 off-policy 学习更高样本效率的同时,在实际训练时间上超越了 PPO。PQL 通过并行化数据收集、策略学习和价值学习来实现这一点。与 Apex 等先前的分布式 off-policy 工作不同,该方案专为大规模并行的 GPU 仿真设计,并针对单台工作站进行了优化。实验表明,Q 学习可以扩展到数以万计的并行环境,我们还研究了影响学习速度的重要因素。代码见 https://github.com/Improbable-AI/pql 。
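To make the idea of massively parallel data collection concrete, the toy sketch below steps thousands of environment copies as one batched tensor and feeds the transitions into a single Q-network update. The batched dynamics are made up and stand in for a GPU simulator such as Isaac Gym; this is not the PQL scheme itself.

```python
# Toy illustration of parallel data collection for off-policy Q-learning (not PQL itself).
import torch
import torch.nn as nn

n_envs, obs_dim, n_actions = 4096, 8, 4
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

obs = torch.randn(n_envs, obs_dim)                     # all envs advance in lock-step
for step in range(100):
    with torch.no_grad():
        actions = q_net(obs).argmax(dim=1)             # greedy actions for every env at once
    # Made-up batched dynamics and reward, standing in for a GPU simulator.
    next_obs = 0.9 * obs + 0.1 * torch.randn_like(obs)
    rewards = -(obs ** 2).mean(dim=1)

    with torch.no_grad():
        target = rewards + 0.99 * q_net(next_obs).max(dim=1).values
    q_sa = q_net(obs).gather(1, actions[:, None]).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, target)
    opt.zero_grad(); loss.backward(); opt.step()
    obs = next_obs
```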

3D-LLM: Injecting the 3D World into Large Language Models

  • paper_url: http://arxiv.org/abs/2307.12981
  • repo_url: https://github.com/UMass-Foundation-Model/3D-LLM
  • paper_authors: Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan
  • for: 这个研究想要尝试在大语言模型(LLM)和视力语言模型(VLM)上插入3D世界,以提高这些模型在多个任务上的表现,包括常识推理。
  • methods: 这个研究使用了三种提示机制,并利用3D特征提取器从渲染的多视图图像中提取3D特征来训练3D语言模型(3D-LLM)。
  • results: 研究表明,在ScanQA任务上,该模型的表现比状态正常基eline高出9%的BLEU-1分数。此外,在3D描述、任务组合和3D辅助对话等任务上,该模型也表现出优于2D VLM。 Qualitative例子也显示,该模型可以完成跨度外的任务。
    Abstract Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi- view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs. Project Page: : https://vis-www.cs.umass.edu/3dllm/.
    摘要 大型语言模型(LLM)和视觉语言模型(VLM)已经被证明可以在多个任务上表现出色,如常识 reasoning。强大的这些模型可以是,它们没有与3D物理世界相关的概念,包括空间关系、可用性、物理学、布局等。在这项工作中,我们提议将3D世界注入到大型语言模型中,并引入一个全新的3D-LLM家族。specifically,3D-LLM可以从3D点云和其特征输入,并完成一系列3D相关任务,包括captioning、dense captioning、3D问答、任务分解、3D定位、3D-assisted dialog、导航等。通过我们设计的三种提示机制,我们能够收集超过300k个3D语言数据覆盖这些任务。为了效率地训练3D-LLM,我们首先利用3D特征提取器从渲染多视图图像中获取3D特征。然后,我们使用2D VLM作为我们的背部来训练我们的3D-LLM。通过引入3D本地化机制,3D-LLM可以更好地捕捉3D空间信息。在ScanQA上进行实验,我们的模型比州态艺术基eline的基eline高出大幅度(例如,BLEU-1分数超过州态艺术基eline的分数 by 9%)。此外,在我们保留的数据集上进行3D captioning、任务组合和3D-assisted dialogue的实验中,我们的模型超过2D VLM。Qualitative例子也表明我们的模型可以完成现有LLM和VLM的任务之外的更多任务。项目页面:https://vis-www.cs.umass.edu/3dllm/.

An Isometric Stochastic Optimizer

  • paper_url: http://arxiv.org/abs/2307.12979
  • repo_url: None
  • paper_authors: Jacob Jackson
  • for: 这 paper 的目的是解释 Adam 优化器的成功,并基于这个原理提出一种新的优化器。
  • methods: 这 paper 使用了一种新的优化器 called Iso,它使每个参数的步长独立于其他参数的范数。 Additionally, the paper proposes a variant of Iso called IsoAdam, which allows for the transfer of optimal hyperparameters from Adam.
  • results: 实验结果表明,IsoAdam 在训练一个小型 Transformer 时比 Adam 快速。
    Abstract The Adam optimizer is the standard choice in deep learning applications. I propose a simple explanation of Adam's success: it makes each parameter's step size independent of the norms of the other parameters. Based on this principle I derive Iso, a new optimizer which makes the norm of a parameter's update invariant to the application of any linear transformation to its inputs and outputs. I develop a variant of Iso called IsoAdam that allows optimal hyperparameters to be transferred from Adam, and demonstrate that IsoAdam obtains a speedup over Adam when training a small Transformer.
    摘要 Adam 优化器是深度学习应用中的标准选择。我对 Adam 的成功提出一个简单的解释:它使每个参数的步长独立于其他参数的范数。基于这一原理,我推导出一种新的优化器 Iso,它使参数更新的范数不受其输入和输出的任何线性变换的影响。我还开发了 Iso 的一个变体 IsoAdam,它允许从 Adam 迁移最优超参数,并证明在训练一个小型 Transformer 时 IsoAdam 比 Adam 更快。
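The stated principle (each parameter's step size independent of the other parameters' norms) can be illustrated with a per-tensor normalized update, as in the sketch below. This is only my reading of the principle, not the paper's actual Iso or IsoAdam update rule.

```python
# Sketch of the stated principle only: each parameter tensor takes a step whose size does
# not depend on the norms of other tensors (not the exact Iso/IsoAdam update rule).
import torch

@torch.no_grad()
def normalized_step(params, lr=1e-3, eps=1e-12):
    for p in params:
        if p.grad is None:
            continue
        g = p.grad
        p -= lr * g / (g.norm() + eps)   # each tensor's update has norm ~= lr, independently

# Usage with any model:
model = torch.nn.Linear(4, 1)
loss = model(torch.randn(16, 4)).pow(2).mean()
loss.backward()
normalized_step(model.parameters(), lr=1e-2)
```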

Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems

  • paper_url: http://arxiv.org/abs/2307.12975
  • repo_url: None
  • paper_authors: Xiang Ji, Huazheng Wang, Minshuo Chen, Tuo Zhao, Mengdi Wang
  • for: 这个论文是关于决策问题中的奖励工程。在实际应用中,常有无明显的奖励函数选择的情况,因此引入人工反馈以帮助学习奖励函数的方法得到了广泛应用。
  • methods: 这个论文使用了人工反馈来学习奖励函数,并且提出了一种基于偏好的方法,该方法在recent empirical applications such as InstructGPT中表现出色。
  • results: 论文改进了直接在人工评分样本上运行策略学习方法时的建模与次优性分析,并将其与基于偏好的方法的次优性保证进行比较,证明基于偏好的方法具有更低的次优性。
    Abstract A crucial task in decision-making problems is reward engineering. It is common in practice that no obvious choice of reward function exists. Thus, a popular approach is to introduce human feedback during training and leverage such feedback to learn a reward function. Among all policy learning methods that use human feedback, preference-based methods have demonstrated substantial success in recent empirical applications such as InstructGPT. In this work, we develop a theory that provably shows the benefits of preference-based methods in offline contextual bandits. In particular, we improve the modeling and suboptimality analysis for running policy learning methods on human-scored samples directly. Then, we compare it with the suboptimality guarantees of preference-based methods and show that preference-based methods enjoy lower suboptimality.
    摘要 决策问题中一项至关重要的任务是奖励工程。在实践中,往往不存在明显的奖励函数选择。因此,一种流行的做法是在训练过程中引入人类反馈,并利用这些反馈来学习奖励函数。在所有利用人类反馈的策略学习方法中,基于偏好的方法在 InstructGPT 等近期实际应用中取得了显著成功。在这项工作中,我们建立了一套理论,可证明地展示了基于偏好的方法在离线上下文赌博机中的优势。具体而言,我们改进了直接在人工评分样本上运行策略学习方法时的建模与次优性分析;然后将其与基于偏好的方法的次优性保证进行比较,并证明基于偏好的方法具有更低的次优性。

Big Data - Supply Chain Management Framework for Forecasting: Data Preprocessing and Machine Learning Techniques

  • paper_url: http://arxiv.org/abs/2307.12971
  • repo_url: https://github.com/zeniSoida/pl1
  • paper_authors: Md Abrar Jahin, Md Sakib Hossain Shovon, Jungpil Shin, Istiyaque Ahmed Ridoy, Yoichi Tomioka, M. F. Mridha
  • For: This paper aims to systematically identify and comparatively analyze state-of-the-art supply chain (SC) forecasting strategies and technologies, and to propose a novel framework incorporating Big Data Analytics in SC Management.
  • Methods: The proposed framework includes problem identification, data sources, exploratory data analysis, machine-learning model training, hyperparameter tuning, performance evaluation, and optimization. The paper discusses the need for different types of forecasting according to the period or SC objective, and recommends SC KPIs and error-measurement systems to optimize the top-performing model.
  • Results: The paper illustrates the adverse effects of phantom inventory on forecasting and the dependence of managerial decisions on SC KPIs for determining model performance parameters and improving operations management, transparency, and planning efficiency. The cyclic connection within the framework introduces preprocessing optimization based on the post-process KPIs, optimizing the overall control process (inventory management, workforce determination, cost, production and capacity planning).
    Abstract This article intends to systematically identify and comparatively analyze state-of-the-art supply chain (SC) forecasting strategies and technologies. A novel framework has been proposed incorporating Big Data Analytics in SC Management (problem identification, data sources, exploratory data analysis, machine-learning model training, hyperparameter tuning, performance evaluation, and optimization), forecasting effects on human-workforce, inventory, and overall SC. Initially, the need to collect data according to SC strategy and how to collect them has been discussed. The article discusses the need for different types of forecasting according to the period or SC objective. The SC KPIs and the error-measurement systems have been recommended to optimize the top-performing model. The adverse effects of phantom inventory on forecasting and the dependence of managerial decisions on the SC KPIs for determining model performance parameters and improving operations management, transparency, and planning efficiency have been illustrated. The cyclic connection within the framework introduces preprocessing optimization based on the post-process KPIs, optimizing the overall control process (inventory management, workforce determination, cost, production and capacity planning). The contribution of this research lies in the standard SC process framework proposal, recommended forecasting data analysis, forecasting effects on SC performance, machine learning algorithms optimization followed, and in shedding light on future research.
    摘要 这篇文章的目的是系统地找出当前最佳实践的供应链(SC)预测策略和技术,并提出一种新的框架,该框架将大数据分析纳入SC管理。文章讨论了SC预测的数据收集方式和不同类型的预测,以及适用于不同的时间间隔或SC目标。文章还建议了SC指标和误差度量系统,以便优化表现最佳的模型。文章还描述了幻影库存对预测的不利影响,以及管理决策对SC指标的依赖,以确定模型性能参数并改善运营管理、透明度和规划效率。框架内部的循环连接引入了基于后处理指标的预处理优化,从而优化整体控制过程(库存管理、人力决策、成本、生产和产能规划)。本文的贡献在于提出了标准SC过程框架、推荐的预测数据分析、预测对SC性能的影响、机器学习算法优化,并为未来研究指明方向。

A Connection between One-Step Regularization and Critic Regularization in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.12968
  • repo_url: https://github.com/ben-eysenbach/ac-connection
  • paper_authors: Benjamin Eysenbach, Matthieu Geist, Sergey Levine, Ruslan Salakhutdinov
  • for: 这 paper 的目的是解释一些关于 Offline Reinforcement Learning 的问题,包括如何使用一步或多步的策略改进来避免过拟合。
  • methods: 这 paper 使用了一些不同的方法来进行策略改进,包括 advantage-weighted regression 和 conditional behavioral cloning。它们的主要区别在于,一步方法会在一步的策略改进后停止,而多步方法会通过多个步骤来进行策略改进。
  • results: 这 paper 的实验结果表明,一步 RL 可以与多步 RL 相比肩,但是它们在不同的问题上的表现可能不同。具体来说,一步 RL 在需要强regularization的问题上可能更 Competitive。
    Abstract As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting. One-step methods perform regularization by doing just a single step of policy improvement, while critic regularization methods do many steps of policy improvement with a regularized objective. These methods appear distinct. One-step methods, such as advantage-weighted regression and conditional behavioral cloning, truncate policy iteration after just one step. This "early stopping" makes one-step RL simple and stable, but can limit its asymptotic performance. Critic regularization typically requires more compute but has appealing lower-bound guarantees. In this paper, we draw a close connection between these methods: applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL. While practical implementations violate our assumptions and critic regularization is typically applied with smaller regularization coefficients, our experiments nevertheless show that our analysis makes accurate, testable predictions about practical offline RL methods (CQL and one-step RL) with commonly-used hyperparameters. Our results do not imply that every problem can be solved with a single step of policy improvement, but rather that one-step RL might be competitive with critic regularization on RL problems that demand strong regularization.
    摘要 如果机器学习问题受限于数据,有效的离线RL算法需要谨慎的补偿来避免过拟合。一步方法通过做一步策略改进来实现补偿,而批评补偿方法则通过多个步骤的策略改进来实现补偿。这些方法看起来很不同。一步方法,如优点权重回归和Conditional Behavioral Cloning,在策略迭代后 truncate 策略迭代。这个``早期停止''使一步RL简单和稳定,但可能限制其极限性表现。批评补偿通常需要更多的计算,但它具有吸引人的下界保证。在这篇论文中,我们 Draw a close connection between these methods:在应用多步批评补偿方法时,使用补偿系数为1就可以获得与一步RL相同的策略。虽然实际实现可能违反我们的假设,但我们的实验表明,我们的分析可以准确预测实际的离线RL方法(CQL和一步RL)在通用的 гиперпараметры下的表现。我们的结果表明,每个问题都可以通过一步策略改进来解决,但是一步RL可能与批评补偿在RL问题中具有强补偿需求的问题竞争。
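For readers unfamiliar with the one-step methods mentioned above, advantage-weighted regression fits the policy to the dataset actions with weights exp(advantage / beta) after a single policy-evaluation step. The sketch below shows that loss on a toy offline batch with discrete actions; states, actions, and advantages are random placeholders.

```python
# Sketch of a one-step method (advantage-weighted regression) on a toy offline batch.
import torch
import torch.nn as nn

obs_dim, n_actions, beta = 6, 3, 1.0
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Offline batch: states, dataset actions, and advantages from one policy-evaluation step
# of the behavior policy (random placeholders here).
s = torch.randn(512, obs_dim)
a = torch.randint(0, n_actions, (512,))
adv = torch.randn(512)

for _ in range(200):
    logp = torch.log_softmax(policy(s), dim=1).gather(1, a[:, None]).squeeze(1)
    weights = torch.clamp(torch.exp(adv / beta), max=20.0)   # exponentiated advantages
    loss = -(weights * logp).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```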

Learning Dense Correspondences between Photos and Sketches

  • paper_url: http://arxiv.org/abs/2307.12967
  • repo_url: https://github.com/cogtoolslab/photo-sketch-correspondence
  • paper_authors: Xuanchen Lu, Xiaolong Wang, Judith E Fan
  • for: 这个论文旨在研究计算机系统是如何模拟人类的简图理解能力?
  • methods: 作者提出了一种基于自我超vised学习的方法,使用尺寸变换网络来估计简图和照片之间的匹配关系。
  • results: 研究发现,该方法可以超过多个强化基elines,并且与其他抽象级别的方法相比,具有更高的准确率。然而,研究还发现了人类和机器系统之间的差异,提示了更多的研究空间。
    Abstract Humans effortlessly grasp the connection between sketches and real-world objects, even when these sketches are far from realistic. Moreover, human sketch understanding goes beyond categorization -- critically, it also entails understanding how individual elements within a sketch correspond to parts of the physical world it represents. What are the computational ingredients needed to support this ability? Towards answering this question, we make two contributions: first, we introduce a new sketch-photo correspondence benchmark, $\textit{PSC6k}$, containing 150K annotations of 6250 sketch-photo pairs across 125 object categories, augmenting the existing Sketchy dataset with fine-grained correspondence metadata. Second, we propose a self-supervised method for learning dense correspondences between sketch-photo pairs, building upon recent advances in correspondence learning for pairs of photos. Our model uses a spatial transformer network to estimate the warp flow between latent representations of a sketch and photo extracted by a contrastive learning-based ConvNet backbone. We found that this approach outperformed several strong baselines and produced predictions that were quantitatively consistent with other warp-based methods. However, our benchmark also revealed systematic differences between predictions of the suite of models we tested and those of humans. Taken together, our work suggests a promising path towards developing artificial systems that achieve more human-like understanding of visual images at different levels of abstraction. Project page: https://photo-sketch-correspondence.github.io
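
For readers wanting a concrete picture of the warp-estimation step, here is a minimal PyTorch-style sketch of predicting a dense flow between sketch and photo features and warping one onto the other. The module names, layer sizes, and backbone are illustrative assumptions on our part, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): estimate a dense warp between
# sketch and photo features with a small flow head, following the paper's
# description of a contrastive ConvNet backbone plus a warp-based correspondence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WarpEstimator(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int = 256):
        super().__init__()
        self.backbone = backbone            # shared encoder, assumed contrastively pre-trained
        self.flow_head = nn.Sequential(     # predicts a 2-channel flow field
            nn.Conv2d(2 * feat_dim, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 2, 3, padding=1),
        )

    def forward(self, sketch: torch.Tensor, photo: torch.Tensor):
        fs = self.backbone(sketch)          # (B, C, H, W) latent of the sketch
        fp = self.backbone(photo)           # (B, C, H, W) latent of the photo
        flow = self.flow_head(torch.cat([fs, fp], dim=1))  # (B, 2, H, W)

        # Build a sampling grid in [-1, 1] and add the predicted offsets.
        B, _, H, W = flow.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
        base = torch.stack([xs, ys], dim=-1).expand(B, H, W, 2).to(flow)
        grid = base + flow.permute(0, 2, 3, 1)

        warped_photo_feat = F.grid_sample(fp, grid, align_corners=True)
        return flow, warped_photo_feat      # flow encodes the dense correspondences
```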

Synthetic pre-training for neural-network interatomic potentials

  • paper_url: http://arxiv.org/abs/2307.15714
  • repo_url: https://github.com/jla-gardner/nnp-pre-training
  • paper_authors: John L. A. Gardner, Kathryn T. Baker, Volker L. Deringer
  • for: This work aims to improve the accuracy and stability of ML-based interatomic potentials for atomistic materials modelling by pre-training them on "synthetic" data.
  • methods: An equivariant graph-neural-network potential is pre-trained on a large synthetic dataset generated at scale with an existing ML potential and then fine-tuned on a much smaller quantum-mechanical reference dataset.
  • results: Pre-training on synthetic data improves numerical accuracy and stability in practice and eases the dependence on large quantum-mechanical reference datasets.
    Abstract Machine learning (ML) based interatomic potentials have transformed the field of atomistic materials modelling. However, ML potentials depend critically on the quality and quantity of quantum-mechanical reference data with which they are trained, and therefore developing datasets and training pipelines is becoming an increasingly central challenge. Leveraging the idea of "synthetic" (artificial) data that is common in other areas of ML research, we here show that synthetic atomistic data, themselves obtained at scale with an existing ML potential, constitute a useful pre-training task for neural-network interatomic potential models. Once pre-trained with a large synthetic dataset, these models can be fine-tuned on a much smaller, quantum-mechanical one, improving numerical accuracy and stability in computational practice. We demonstrate feasibility for a series of equivariant graph-neural-network potentials for carbon, and we carry out initial experiments to test the limits of the approach.
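
The pre-train-then-fine-tune recipe can be summarized with a short sketch. The model, data loaders, epoch counts, and learning rates below are placeholders rather than the authors' actual setup.

```python
# Minimal sketch of the pre-train / fine-tune recipe described above.
# `model`, `synthetic_loader`, and `dft_loader` are placeholders assumed to be
# defined elsewhere; epochs and learning rates are illustrative, not the paper's.
import torch
import torch.nn.functional as F

def train(model, loader, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in loader:                    # batch: atomic structures + energy labels
            pred = model(batch["positions"], batch["species"])
            loss = F.mse_loss(pred, batch["energy"])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# Stage 1: pre-train on a large dataset labelled by an existing ML potential ("synthetic" data).
model = train(model, synthetic_loader, epochs=50, lr=1e-3)
# Stage 2: fine-tune on the much smaller quantum-mechanical reference set,
# typically with a lower learning rate so that pre-trained features are preserved.
model = train(model, dft_loader, epochs=100, lr=1e-4)
```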

Efficiently Sampling the PSD Cone with the Metric Dikin Walk

  • paper_url: http://arxiv.org/abs/2307.12943
  • repo_url: None
  • paper_authors: Yunbum Kook, Santosh S. Vempala
  • for: This paper targets efficient sampling of semi-definite solutions, a problem at the frontier of efficient computation for semi-definite programs.
  • methods: The Dikin walk is adapted to general metrics, and suitable metrics are devised for the PSD cone with affine constraints; a refined notion of self-concordant matrix functions and rules for combining metrics are introduced.
  • results: The resulting mixing time and per-step complexity are considerably smaller, and with an appropriate choice of metric the dependence on the number of constraints becomes polylogarithmic.
    Abstract Semi-definite programs represent a frontier of efficient computation. While there has been much progress on semi-definite optimization, with moderate-sized instances currently solvable in practice by the interior-point method, the basic problem of sampling semi-definite solutions remains a formidable challenge. The direct application of known polynomial-time algorithms for sampling general convex bodies to semi-definite sampling leads to a prohibitively high running time. In addition, known general methods require an expensive rounding phase as pre-processing. Here we analyze the Dikin walk, by first adapting it to general metrics, then devising suitable metrics for the PSD cone with affine constraints. The resulting mixing time and per-step complexity are considerably smaller, and by an appropriate choice of the metric, the dependence on the number of constraints can be made polylogarithmic. We introduce a refined notion of self-concordant matrix functions and give rules for combining different metrics. Along the way, we further develop the theory of interior-point methods for sampling.
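
For context, the block below recalls the textbook Dikin walk step, where a barrier Hessian serves as the local metric (e.g., the log-det barrier on the PSD cone). The paper's contribution lies in the refined metrics and self-concordant matrix functions it substitutes for this standard choice, so this is background, not the paper's algorithm.

```latex
% Standard Dikin-walk step (textbook form, not the paper's refined metric).
% At the current point x, with barrier Hessian H(x) (e.g. of -\log\det X on the
% PSD cone), propose a Gaussian step in the local norm:
y \sim \mathcal{N}\!\Big(x,\; \tfrac{r^{2}}{n}\, H(x)^{-1}\Big)
% and accept with the Metropolis filter correcting for the asymmetric proposal:
\min\Bigg\{ 1,\;
  \frac{\sqrt{\det H(y)}\,\exp\!\big(-\tfrac{n}{2r^{2}}\|x-y\|_{H(y)}^{2}\big)}
       {\sqrt{\det H(x)}\,\exp\!\big(-\tfrac{n}{2r^{2}}\|x-y\|_{H(x)}^{2}\big)} \Bigg\}
% The paper replaces H with carefully chosen metrics for the PSD cone with affine
% constraints, making the dependence on the number of constraints polylogarithmic.
```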

On Privileged and Convergent Bases in Neural Network Representations

  • paper_url: http://arxiv.org/abs/2307.12941
  • repo_url: None
  • paper_authors: Davis Brown, Nikhil Vyas, Yamini Bansal
  • for: This study investigates whether the representations learned by neural networks possess a privileged and convergent basis, focusing on the significance of the feature directions represented by individual neurons.
  • methods: The authors test whether arbitrary rotations of neural representations can be inverted, and compare the bases of networks trained with the same parameters but different random initializations.
  • results: Neural networks do not converge to a unique basis, and basis correlation increases significantly when a few early layers are frozen identically. Linear Mode Connectivity improves with network width, but this improvement is not driven by increased basis correlation.
    Abstract In this study, we investigate whether the representations learned by neural networks possess a privileged and convergent basis. Specifically, we examine the significance of feature directions represented by individual neurons. First, we establish that arbitrary rotations of neural representations cannot be inverted (unlike linear networks), indicating that they do not exhibit complete rotational invariance. Subsequently, we explore the possibility of multiple bases achieving identical performance. To do this, we compare the bases of networks trained with the same parameters but with varying random initializations. Our study reveals two findings: (1) Even in wide networks such as WideResNets, neural networks do not converge to a unique basis; (2) Basis correlation increases significantly when a few early layers of the network are frozen identically. Furthermore, we analyze Linear Mode Connectivity, which has been studied as a measure of basis correlation. Our findings give evidence that while Linear Mode Connectivity improves with increased network width, this improvement is not due to an increase in basis correlation.
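
One simple way to probe basis alignment of the kind discussed above is to correlate per-neuron activations of two independently trained networks on shared inputs. The sketch below is our illustrative probe, not the authors' exact measurement.

```python
# Illustrative basis-correlation probe (names and details are ours, not the authors'):
# compare per-neuron activations of two independently trained networks on the same
# inputs and ask how strongly each neuron in one net aligns with its best match.
import torch

def basis_correlation(acts_a: torch.Tensor, acts_b: torch.Tensor) -> float:
    """acts_*: (num_examples, num_neurons) activations from the same layer and inputs."""
    a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    corr = (a.T @ b) / a.shape[0]          # (neurons_a, neurons_b) Pearson correlations
    # For each neuron in network A, take its best-matching neuron in network B.
    return corr.abs().max(dim=1).values.mean().item()

# A value near 1 would indicate a shared, privileged basis; the paper reports that
# independently initialized networks do not reach such agreement unless a few early
# layers are frozen identically.
```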

HOOD: Real-Time Robust Human Presence and Out-of-Distribution Detection with Low-Cost FMCW Radar

  • paper_url: http://arxiv.org/abs/2308.02396
  • repo_url: None
  • paper_authors: Sabri Mustafa Kahya, Muhammet Sami Yavuz, Eckehard Steinbach
  • For: Human presence detection in indoor environments with a 60 GHz short-range FMCW radar, framed as a real-time robust presence and out-of-distribution (OOD) detection problem (HOOD).
  • Methods: A reconstruction-based architecture that operates on macro and micro range-Doppler images (RDIs) from the 60 GHz short-range FMCW radar and solves presence detection and OOD detection in a single pipeline.
  • Results: On a dataset collected with the 60 GHz short-range FMCW radar, HOOD achieves an average AUROC of 94.36%, performs well across different human scenarios, and outperforms state-of-the-art OOD detection methods on common OOD metrics. Real-time experiments: https://muskahya.github.io/HOOD
    Abstract Human presence detection in indoor environments using millimeter-wave frequency-modulated continuous-wave (FMCW) radar is challenging due to the presence of moving and stationary clutters in indoor places. This work proposes "HOOD" as a real-time robust human presence and out-of-distribution (OOD) detection method by exploiting 60 GHz short-range FMCW radar. We approach the presence detection application as an OOD detection problem and solve the two problems simultaneously using a single pipeline. Our solution relies on a reconstruction-based architecture and works with radar macro and micro range-Doppler images (RDIs). HOOD aims to accurately detect the "presence" of humans in the presence or absence of moving and stationary disturbers. Since it is also an OOD detector, it aims to detect moving or stationary clutters as OOD in humans' absence and predicts the current scene's output as "no presence." HOOD is an activity-free approach that performs well in different human scenarios. On our dataset collected with a 60 GHz short-range FMCW Radar, we achieve an average AUROC of 94.36%. Additionally, our extensive evaluations and experiments demonstrate that HOOD outperforms state-of-the-art (SOTA) OOD detection methods in terms of common OOD detection metrics. Our real-time experiments are available at: https://muskahya.github.io/HOOD
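
A minimal sketch of a reconstruction-based detector in this spirit is given below. The autoencoder architecture, the assumption that training uses human-present RDIs only, and the fixed threshold are all our illustrative choices, not details taken from the paper.

```python
# Minimal sketch of a reconstruction-based presence/OOD detector (illustrative only;
# not the HOOD implementation). Assumes RDI height/width divisible by 4.
import torch
import torch.nn as nn

class RDIAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1))

    def forward(self, rdi):                  # rdi: (B, 1, H, W) range-Doppler image
        return self.dec(self.enc(rdi))

def detect_presence(model, rdi, threshold):
    # Assumption for this sketch: the autoencoder is trained on "human present"
    # RDIs only, so clutter-only scenes reconstruct poorly and are flagged as OOD.
    with torch.no_grad():
        err = torch.mean((model(rdi) - rdi) ** 2, dim=(1, 2, 3))
    return err < threshold                   # True -> human presence predicted
```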

Contextual Bandits and Imitation Learning via Preference-Based Active Queries

  • paper_url: http://arxiv.org/abs/2307.12926
  • repo_url: None
  • paper_authors: Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
  • for: The paper studies contextual bandits and imitation learning in which the learner receives no direct reward signal for the executed action and can instead query an expert each round for noisy preference feedback between two actions; the goal is to minimize the regret of the executed actions while also minimizing the number of expert queries.
  • methods: The proposed algorithm leverages an online regression oracle over a function class that can represent the expert's preference model under appropriate link functions, using it both to choose actions and to decide when to query.
  • results: In the contextual bandit setting the algorithm achieves a regret bound of $O(\min\{\sqrt{T}, d/\Delta\})$, where $T$ is the number of interactions, $d$ the eluder dimension of the function class, and $\Delta$ the minimum preference of the optimal action over any suboptimal action across all contexts, while making only $O(\min\{T, d^2/\Delta^2\})$ expert queries. Similar regret and query guarantees hold in the imitation learning setting; interestingly, the algorithm can even learn to outperform a suboptimal expert, highlighting a practical benefit of preference-based feedback.
    Abstract We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward. Instead, the learner can actively query an expert at each round to compare two actions and receive noisy preference feedback. The learner's objective is two-fold: to minimize the regret associated with the executed actions, while simultaneously, minimizing the number of comparison queries made to the expert. In this paper, we assume that the learner has access to a function class that can represent the expert's preference model under appropriate link functions, and provide an algorithm that leverages an online regression oracle with respect to this function class for choosing its actions and deciding when to query. For the contextual bandit setting, our algorithm achieves a regret bound that combines the best of both worlds, scaling as $O(\min\{\sqrt{T}, d/\Delta\})$, where $T$ represents the number of interactions, $d$ represents the eluder dimension of the function class, and $\Delta$ represents the minimum preference of the optimal action over any suboptimal action under all contexts. Our algorithm does not require the knowledge of $\Delta$, and the obtained regret bound is comparable to what can be achieved in the standard contextual bandits setting where the learner observes reward signals at each round. Additionally, our algorithm makes only $O(\min\{T, d^2/\Delta^2\})$ queries to the expert. We then extend our algorithm to the imitation learning setting, where the learning agent engages with an unknown environment in episodes of length $H$ each, and provide similar guarantees for regret and query complexity. Interestingly, our algorithm for imitation learning can even learn to outperform the underlying expert, when it is suboptimal, highlighting a practical benefit of preference-based feedback in imitation learning.
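
To give a feel for how an online regression oracle over preferences might drive both action selection and query decisions, here is a schematic sketch. The ranking rule, the uncertainty test, and the oracle interface are simplified assumptions and do not reproduce the paper's algorithm or its guarantees.

```python
# Schematic sketch of one interaction round (illustrative, not the paper's algorithm).
# oracle.predict(x, a, b) ~ estimated probability that action a is preferred to b;
# oracle.update(...) performs one step of online regression on the noisy label.
def run_round(context, actions, oracle, expert, gap_threshold):
    # Rank candidate actions by their predicted preference against a reference action.
    reference = actions[0]
    scores = {a: oracle.predict(context, a, reference) for a in actions}
    ranked = sorted(actions, key=lambda a: scores[a], reverse=True)
    best, runner_up = ranked[0], ranked[1]

    # Query only when the comparison between the two leading actions is uncertain.
    if abs(oracle.predict(context, best, runner_up) - 0.5) < gap_threshold:
        label = expert.compare(context, best, runner_up)   # noisy: 1 if best preferred
        oracle.update(context, best, runner_up, label)
    return best   # action executed this round
```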

  • paper_url: http://arxiv.org/abs/2307.14359
  • repo_url: None
  • paper_authors: Benny Wong
  • for: This research paper introduces a new optimization method, Gaussian Crunching Search (GCS), and explores its potential applications across domains.
  • methods: GCS is inspired by the behaviour of particles in a Gaussian distribution and aims to explore the solution space efficiently and converge towards the global optimum.
  • results: Experimental evaluations and comparisons with existing optimization methods highlight the advantages and strengths of GCS, making the paper a useful resource for researchers, practitioners, and students interested in optimization.
    Abstract Optimization methods are essential in solving complex problems across various domains. In this research paper, we introduce a novel optimization method called Gaussian Crunching Search (GCS). Inspired by the behaviour of particles in a Gaussian distribution, GCS aims to efficiently explore the solution space and converge towards the global optimum. We present a comprehensive analysis of GCS, including its working mechanism, and potential applications. Through experimental evaluations and comparisons with existing optimization methods, we highlight the advantages and strengths of GCS. This research paper serves as a valuable resource for researchers, practitioners, and students interested in optimization, providing insights into the development and potential of Gaussian Crunching Search as a new and promising approach.
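
Since the abstract describes GCS only at a high level, the block below shows a generic Gaussian-perturbation search loop to illustrate the general idea of Gaussian-driven exploration with a shrinking search radius. It is not the GCS algorithm itself, whose update rules are defined in the paper.

```python
# Generic Gaussian-perturbation search loop (illustrative only; not GCS itself).
import numpy as np

def gaussian_search(objective, x0, sigma=1.0, iters=1000, shrink=0.99, seed=0):
    rng = np.random.default_rng(seed)
    best_x = np.asarray(x0, dtype=float)
    best_f = objective(best_x)
    for _ in range(iters):
        candidate = best_x + rng.normal(scale=sigma, size=best_x.shape)
        f = objective(candidate)
        if f < best_f:                       # keep the better point (minimization)
            best_x, best_f = candidate, f
        sigma *= shrink                      # gradually shrink ("crunch") the search radius
    return best_x, best_f

# Example: minimizing a simple quadratic bowl
x_opt, f_opt = gaussian_search(lambda x: float(np.sum(x ** 2)), x0=[3.0, -2.0])
```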

Graph Neural Networks For Mapping Variables Between Programs – Extended Version

  • paper_url: http://arxiv.org/abs/2307.13014
  • repo_url: https://github.com/pmorvalho/ecai23-gnns-for-mapping-variables-between-programs
  • paper_authors: Pedro Orvalho, Jelle Piepenbrock, Mikoláš Janota, Vasco Manquinho
  • for: This work proposes a graph-neural-network (GNN) based approach for mapping the sets of variables between two programs.
  • methods: A GNN maps the variable sets of the two programs based on their abstract syntax trees (ASTs).
  • results: The approach correctly maps 83% of the variables in the evaluation dataset. Moreover, whereas the current state of the art in program repair, which depends heavily on program structure, fixes only about 72% of the incorrect programs, a repair approach based solely on these variable mappings fixes around 88.5%.
    Abstract Automated program analysis is a pivotal research domain in many areas of Computer Science -- Formal Methods and Artificial Intelligence, in particular. Due to the undecidability of the problem of program equivalence, comparing two programs is highly challenging. Typically, in order to compare two programs, a relation between both programs' sets of variables is required. Thus, mapping variables between two programs is useful for a panoply of tasks such as program equivalence, program analysis, program repair, and clone detection. In this work, we propose using graph neural networks (GNNs) to map the set of variables between two programs based on both programs' abstract syntax trees (ASTs). To demonstrate the strength of variable mappings, we present three use-cases of these mappings on the task of program repair to fix well-studied and recurrent bugs among novice programmers in introductory programming assignments (IPAs). Experimental results on a dataset of 4166 pairs of incorrect/correct programs show that our approach correctly maps 83% of the evaluation dataset. Moreover, our experiments show that the current state-of-the-art on program repair, greatly dependent on the programs' structure, can only repair about 72% of the incorrect programs. In contrast, our approach, which is solely based on variable mappings, can repair around 88.5%.
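
As an illustration of how per-variable GNN embeddings could be turned into a variable mapping, the sketch below matches variable embeddings from two ASTs with a one-to-one assignment. The embedding source and the assignment step are assumptions on our part, not the authors' pipeline.

```python
# Illustrative sketch (not the authors' model): given per-variable embeddings
# produced by a GNN over each program's AST, map variables by cosine similarity.
import torch
from scipy.optimize import linear_sum_assignment

def map_variables(emb_a: torch.Tensor, emb_b: torch.Tensor,
                  names_a: list[str], names_b: list[str]) -> dict[str, str]:
    """emb_*: (num_vars, dim) embeddings of the variable nodes of each program."""
    a = torch.nn.functional.normalize(emb_a, dim=1)
    b = torch.nn.functional.normalize(emb_b, dim=1)
    sim = a @ b.T                                   # cosine similarity matrix
    # Maximize total similarity with a one-to-one assignment (Hungarian algorithm).
    rows, cols = linear_sum_assignment(-sim.detach().numpy())
    return {names_a[i]: names_b[j] for i, j in zip(rows, cols)}
```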