cs.LG - 2023-08-15

Dyadic Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.07843
  • repo_url: https://github.com/statisticalreinforcementlearninglab/roadmap2.0testbed
  • paper_authors: Shuangning Li, Lluis Salvat Niell, Sung Won Choi, Inbal Nahum-Shani, Guy Shani, Susan Murphy
  • for: The paper aims to improve health outcomes by delivering mobile health interventions to individuals as they go about their daily lives.
  • methods: The paper develops dyadic RL, an online reinforcement learning algorithm that personalizes intervention delivery based on contextual factors and the past responses of a target person and their care partner.
  • results: The developed dyadic RL algorithm is Bayesian and hierarchical; the authors establish a regret bound and demonstrate its empirical performance in simulation studies on toy scenarios and on a realistic test bed built from mobile health study data.
    Abstract Mobile health aims to enhance health outcomes by delivering interventions to individuals as they go about their daily life. The involvement of care partners and social support networks often proves crucial in helping individuals manage burdensome medical conditions. This presents opportunities in mobile health to design interventions that target the dyadic relationship -- the relationship between a target person and their care partner -- with the aim of enhancing social support. In this paper, we develop dyadic RL, an online reinforcement learning algorithm designed to personalize intervention delivery based on contextual factors and past responses of a target person and their care partner. Here, multiple sets of interventions impact the dyad across multiple time intervals. The developed dyadic RL is Bayesian and hierarchical. We formally introduce the problem setup, develop dyadic RL and establish a regret bound. We demonstrate dyadic RL's empirical performance through simulation studies on both toy scenarios and on a realistic test bed constructed from data collected in a mobile health study.

Simple and Efficient Partial Graph Adversarial Attack: A New Perspective

  • paper_url: http://arxiv.org/abs/2308.07834
  • repo_url: https://github.com/pasalab/pga
  • paper_authors: Guanghui Zhu, Mengyu Chen, Chunfeng Yuan, Yihua Huang
  • for: Improving the robustness and security of graph neural networks by rethinking global attacks, which treat all nodes in the graph as attack targets.
  • methods: Proposes partial graph attack (PGA), which selects vulnerable nodes as attack targets, together with a hierarchical target selection policy, a cost-effective anchor-picking policy, and a more aggressive iterative greedy-based attack method (see the sketch after this entry).
  • results: Compared with existing global graph attack methods, PGA achieves significant improvements in both attack effect and attack efficiency.
    Abstract As the study of graph neural networks becomes more intensive and comprehensive, their robustness and security have received great research interest. The existing global attack methods treat all nodes in the graph as their attack targets. Although existing methods have achieved excellent results, there is still considerable space for improvement. The key problem is that the current approaches rigidly follow the definition of global attacks. They ignore an important issue, i.e., different nodes have different robustness and are not equally resilient to attacks. From a global attacker's view, we should arrange the attack budget wisely, rather than wasting them on highly robust nodes. To this end, we propose a totally new method named partial graph attack (PGA), which selects the vulnerable nodes as attack targets. First, to select the vulnerable items, we propose a hierarchical target selection policy, which allows attackers to only focus on easy-to-attack nodes. Then, we propose a cost-effective anchor-picking policy to pick the most promising anchors for adding or removing edges, and a more aggressive iterative greedy-based attack method to perform more efficient attacks. Extensive experimental results demonstrate that PGA can achieve significant improvements in both attack effect and attack efficiency compared to other existing graph global attack methods.
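The core idea of the hierarchical target selection policy is to spend the attack budget on nodes that are easy to attack rather than on highly robust ones. The sketch below scores nodes by the classification margin of a surrogate model; the margin-based scoring rule, the function `select_vulnerable_nodes`, and the toy data are illustrative assumptions rather than the paper's exact policy.

```python
import numpy as np

def select_vulnerable_nodes(logits: np.ndarray, labels: np.ndarray, budget: int) -> np.ndarray:
    """Rank nodes by classification margin and return the `budget` least robust ones.

    logits: (num_nodes, num_classes) surrogate-model outputs.
    labels: (num_nodes,) predicted or ground-truth labels.
    """
    correct = logits[np.arange(len(labels)), labels]
    masked = logits.copy()
    masked[np.arange(len(labels)), labels] = -np.inf
    runner_up = masked.max(axis=1)
    margin = correct - runner_up          # small margin -> easy to attack
    return np.argsort(margin)[:budget]    # indices of the most vulnerable nodes

# toy usage
rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 4))
labels = logits.argmax(axis=1)
print(select_vulnerable_nodes(logits, labels, budget=3))
```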

REFORMS: Reporting Standards for Machine Learning Based Science

  • paper_url: http://arxiv.org/abs/2308.07832
  • repo_url: None
  • paper_authors: Sayash Kapoor, Emily Cantrell, Kenny Peng, Thanh Hien Pham, Christopher A. Bail, Odd Erik Gundersen, Jake M. Hofman, Jessica Hullman, Michael A. Lones, Momin M. Malik, Priyanka Nanayakkara, Russell A. Poldrack, Inioluwa Deborah Raji, Michael Roberts, Matthew J. Salganik, Marta Serra-Garcia, Brandon M. Stewart, Gilles Vandewiele, Arvind Narayanan
  • For: The paper aims to provide clear reporting standards for machine learning (ML) based science to address issues of validity, reproducibility, and generalizability in scientific research.
  • Methods: The paper presents the REFORMS checklist (Reporting Standards For Machine Learning Based Science), a set of 32 questions and paired guidelines developed based on a consensus of 19 researchers from computer science, data science, mathematics, the social sciences, and the biomedical sciences.
  • Results: The REFORMS checklist can serve as a resource for researchers, referees, and journals to ensure transparency and reproducibility in ML-based scientific research.
    Abstract Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear reporting standards for ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist ($\textbf{Re}$porting Standards $\textbf{For}$ $\textbf{M}$achine Learning Based $\textbf{S}$cience). It consists of 32 questions and a paired set of guidelines. REFORMS was developed based on a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.

CMISR: Circular Medical Image Super-Resolution

  • paper_url: http://arxiv.org/abs/2308.08567
  • repo_url: None
  • paper_authors: Honggui Li, Maria Trocan, Dimitri Galayko, Mohamad Sawan
  • for: To improve the performance of medical image super-resolution (MISR), the paper proposes a closed-loop, feedback-based framework, circular MISR (CMISR).
  • methods: CMISR uses a global feedback mechanism (feedback being divided into local and global categories) with unambiguous under-resolution (UR) and super-resolution (SR) units; a mathematical model and closed-loop equation are derived, and a Taylor-series argument shows zero recovery error in steady state. The framework is plug-and-play and can be built on any existing MISR algorithm (see the sketch after this entry).
  • results: Experiments with three scale factors on three open medical image datasets show that CMISR outperforms open-loop MISR in reconstruction performance and is particularly suited to medical images with strong edges or intense contrast.
    Abstract Classical methods of medical image super-resolution (MISR) utilize open-loop architecture with implicit under-resolution (UR) unit and explicit super-resolution (SR) unit. The UR unit can always be given, assumed, or estimated, while the SR unit is elaborately designed according to various SR algorithms. The closed-loop feedback mechanism is widely employed in current MISR approaches and can efficiently improve their performance. The feedback mechanism may be divided into two categories: local and global feedback. Therefore, this paper proposes a global feedback-based closed-cycle framework, circular MISR (CMISR), with unambiguous UR and SR elements. Mathematical model and closed-loop equation of CMISR are built. Mathematical proof with Taylor-series approximation indicates that CMISR has zero recovery error in steady-state. In addition, CMISR holds plug-and-play characteristic which can be established on any existing MISR algorithms. Five CMISR algorithms are respectively proposed based on the state-of-the-art open-loop MISR algorithms. Experimental results with three scale factors and on three open medical image datasets show that CMISR is superior to MISR in reconstruction performance and is particularly suited to medical images with strong edges or intense contrast.
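A minimal numerical sketch of the closed-loop idea: the super-resolved estimate is re-degraded by the UR unit, the residual against the observed low-resolution image is fed back, and the loop repeats until the low-resolution mismatch vanishes. The average-pooling UR unit, pixel-replication upsampler, and step size below are placeholder choices, not the paper's actual UR/SR units.

```python
import numpy as np

def ur(img, s=2):
    """Under-resolution unit: simple s x s average pooling (placeholder)."""
    h, w = img.shape
    return img[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def up(img, s=2):
    """Naive upsampling by pixel replication (placeholder for the SR unit)."""
    return np.repeat(np.repeat(img, s, axis=0), s, axis=1)

def cmisr_loop(lr, iters=20, step=1.0, s=2):
    """Closed-loop correction: feed back the low-resolution residual each iteration."""
    hr = up(lr, s)                        # initial super-resolved estimate
    for _ in range(iters):
        residual = lr - ur(hr, s)         # mismatch in the low-resolution domain
        hr = hr + step * up(residual, s)  # global feedback correction
    return hr

lr = np.random.rand(16, 16)
hr = cmisr_loop(lr)
print(np.abs(ur(hr) - lr).max())          # steady-state recovery error shrinks toward zero
```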

Cerberus: A Deep Learning Hybrid Model for Lithium-Ion Battery Aging Estimation and Prediction Based on Relaxation Voltage Curves

  • paper_url: http://arxiv.org/abs/2308.07824
  • repo_url: None
  • paper_authors: Yue Xiang, Bo Jiang, Haifeng Dai
  • For: The paper aims to estimate and predict the capacity aging of lithium-ion batteries using a deep-learning-based hybrid model that can accurately forecast future capacity.
  • Methods: The model extracts salient aging-related features from charge and discharge relaxation processes and fuses them with historical capacity decay data to estimate present capacity and predict future capacity.
  • Results: The model achieves a mean absolute percentage error (MAPE) of 0.29% under a 0.25C charging condition, demonstrating its effectiveness in estimating and predicting capacity aging from real-world relaxation processes and historical capacity records within battery management systems (BMS).
    Abstract The degradation process of lithium-ion batteries is intricately linked to their entire lifecycle as power sources and energy storage devices, encompassing aspects such as performance delivery and cycling utilization. Consequently, the accurate and expedient estimation or prediction of the aging state of lithium-ion batteries has garnered extensive attention. Nonetheless, prevailing research predominantly concentrates on either aging estimation or prediction, neglecting the dynamic fusion of both facets. This paper proposes a hybrid model for capacity aging estimation and prediction based on deep learning, wherein salient features highly pertinent to aging are extracted from charge and discharge relaxation processes. By amalgamating historical capacity decay data, the model dynamically furnishes estimations of the present capacity and forecasts of future capacity for lithium-ion batteries. Our approach is validated against a novel dataset involving charge and discharge cycles at varying rates. Specifically, under a charging condition of 0.25C, a mean absolute percentage error (MAPE) of 0.29% is achieved. This outcome underscores the model's adeptness in harnessing relaxation processes commonly encountered in the real world and synergizing with historical capacity records within battery management systems (BMS), thereby affording estimations and prognostications of capacity decline with heightened precision.

Deep reinforcement learning for process design: Review and perspective

  • paper_url: http://arxiv.org/abs/2308.07822
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Qinghe Gao, Artur M. Schweidtmann
  • for: The paper examines how artificial intelligence can accelerate the chemical industry's transition toward renewable energy and feedstock supply through new conceptual process design approaches.
  • methods: The review surveys deep reinforcement learning, a subclass of machine learning, for complex decision-making problems in process design, organized around three elements: information representation, agent architecture, and environment and reward (see the sketch after this entry).
  • results: The paper summarizes and assesses state-of-the-art applications of reinforcement learning to process design and discusses underlying challenges and promising future work in chemical engineering.
    Abstract The transformation towards renewable energy and feedstock supply in the chemical industry requires new conceptual process design approaches. Recently, breakthroughs in artificial intelligence offer opportunities to accelerate this transition. Specifically, deep reinforcement learning, a subclass of machine learning, has shown the potential to solve complex decision-making problems and aid sustainable process design. We survey state-of-the-art research in reinforcement learning for process design through three major elements: (i) information representation, (ii) agent architecture, and (iii) environment and reward. Moreover, we discuss perspectives on underlying challenges and promising future works to unfold the full potential of reinforcement learning for process design in chemical engineering.
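To make the three survey elements concrete, here is a schematic, entirely hypothetical flowsheet-synthesis environment: the state encodes the information representation, a random policy stands in for the agent architecture, and the placeholder economics define environment and reward. The class name `FlowsheetEnv`, the action list, and the cost model are illustrative assumptions, not taken from the paper.

```python
import random

class FlowsheetEnv:
    """Toy process-design environment illustrating the three survey elements:
    (i) information representation (the state), (ii) the agent, (iii) environment and reward.
    Units, actions, and the reward are hypothetical placeholders."""

    ACTIONS = ["add_reactor", "add_column", "add_heat_exchanger", "terminate"]

    def reset(self):
        self.flowsheet = []          # information representation: sequence of unit operations
        return tuple(self.flowsheet)

    def step(self, action):
        done = action == "terminate" or len(self.flowsheet) >= 5
        if not done:
            self.flowsheet.append(action)
        reward = self._net_present_value() if done else 0.0   # reward only on termination
        return tuple(self.flowsheet), reward, done

    def _net_present_value(self):
        # stand-in for a rigorous process simulation / cost model
        return len(set(self.flowsheet)) - 0.5 * len(self.flowsheet)

# random agent as a stand-in for a deep RL agent
env = FlowsheetEnv()
state, done, total = env.reset(), False, 0.0
while not done:
    action = random.choice(FlowsheetEnv.ACTIONS)
    state, reward, done = env.step(action)
    total += reward
print(state, total)
```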

Quantifying the Cost of Learning in Queueing Systems

  • paper_url: http://arxiv.org/abs/2308.07817
  • repo_url: None
  • paper_authors: Daniel Freund, Thodoris Lykouris, Wentao Weng
  • For: Researchers and practitioners interested in queueing systems and their optimal control, particularly in the context of parameter uncertainty.
  • Methods: The paper proposes a new metric, the Cost of Learning in Queueing (CLQ), which quantifies the maximum increase in time-averaged queue length caused by parameter uncertainty, and a unified analysis framework that bridges Lyapunov and bandit analysis (an illustrative simulation follows this entry).
  • Results: The paper characterizes the CLQ of a single-queue multi-server system and extends the results to multi-queue multi-server systems and networks of queues, showing that CLQ is a useful metric for evaluating queueing systems under parameter uncertainty.
    Abstract Queueing systems are widely applicable stochastic models with use cases in communication networks, healthcare, service systems, etc. Although their optimal control has been extensively studied, most existing approaches assume perfect knowledge of system parameters. Of course, this assumption rarely holds in practice where there is parameter uncertainty, thus motivating a recent line of work on bandit learning for queueing systems. This nascent stream of research focuses on the asymptotic performance of the proposed algorithms. In this paper, we argue that an asymptotic metric, which focuses on late-stage performance, is insufficient to capture the intrinsic statistical complexity of learning in queueing systems which typically occurs in the early stage. Instead, we propose the Cost of Learning in Queueing (CLQ), a new metric that quantifies the maximum increase in time-averaged queue length caused by parameter uncertainty. We characterize the CLQ of a single-queue multi-server system, and then extend these results to multi-queue multi-server systems and networks of queues. In establishing our results, we propose a unified analysis framework for CLQ that bridges Lyapunov and bandit analysis, which could be of independent interest.
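The flavor of the CLQ metric can be conveyed with a small discrete-time simulation: compare the time-averaged queue length of an oracle that knows the service rates against a bandit learner that must discover them, and take the largest gap. The Bernoulli arrival/service model, the UCB routing rule, and the parameter values are illustrative assumptions; this is not the paper's formal definition or analysis.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, mus, T = 0.45, np.array([0.3, 0.5, 0.7]), 20000   # arrival rate, per-server service probabilities

def simulate(choose):
    q, q_sum, succ, tries = 0, 0.0, np.zeros(len(mus)), np.zeros(len(mus))
    avg = []
    for t in range(1, T + 1):
        q += rng.random() < lam                  # Bernoulli arrival
        if q > 0:
            k = choose(succ, tries, t)
            tries[k] += 1
            if rng.random() < mus[k]:            # service attempt on the chosen server
                succ[k] += 1
                q -= 1
        q_sum += q
        avg.append(q_sum / t)                    # time-averaged queue length
    return np.array(avg)

oracle = simulate(lambda s, n, t: int(np.argmax(mus)))   # knows the best server
ucb = simulate(lambda s, n, t: int(np.argmax(
    s / np.maximum(n, 1) + np.sqrt(2 * np.log(t) / np.maximum(n, 1)))))
print("estimated CLQ:", float(np.max(ucb - oracle)))     # worst-case gap in time-averaged queue length
```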

Fairness and Privacy in Federated Learning and Their Implications in Healthcare

  • paper_url: http://arxiv.org/abs/2308.07805
  • repo_url: https://github.com/UVA-MLSys/DS7406
  • paper_authors: Navya Annapareddy, Jade Preston, Judy Fox
  • for: This paper provides an overview of the typical lifecycle of fair federated learning in research and an updated taxonomy that accounts for the current state of fairness in implementations, with a focus on the healthcare domain.
  • methods: The paper examines federated learning, a decentralized approach to training machine learning models that addresses data security, privacy, and vulnerability considerations.
  • results: The paper highlights the challenges of implementing fairness in federated learning, including node data that is not independent and identically distributed (iid), high communication overhead between peers, and heterogeneity of clients within a network with respect to dataset bias and size, and provides added insight into the implications and challenges of implementing and supporting fairness in federated learning in the healthcare domain.
    Abstract Currently, many contexts exist where distributed learning is difficult or otherwise constrained by security and communication limitations. One common domain where this is a consideration is in Healthcare where data is often governed by data-use-ordinances like HIPAA. On the other hand, larger sample sizes and shared data models are necessary to allow models to better generalize on account of the potential for more variability and balancing underrepresented classes. Federated learning is a type of distributed learning model that allows data to be trained in a decentralized manner. This, in turn, addresses data security, privacy, and vulnerability considerations as data itself is not shared across a given learning network nodes. Three main challenges to federated learning include node data is not independent and identically distributed (iid), clients requiring high levels of communication overhead between peers, and there is the heterogeneity of different clients within a network with respect to dataset bias and size. As the field has grown, the notion of fairness in federated learning has also been introduced through novel implementations. Fairness approaches differ from the standard form of federated learning and also have distinct challenges and considerations for the healthcare domain. This paper endeavors to outline the typical lifecycle of fair federated learning in research as well as provide an updated taxonomy to account for the current state of fairness in implementations. Lastly, this paper provides added insight into the implications and challenges of implementing and supporting fairness in federated learning in the healthcare domain.

Adaptive Noise Covariance Estimation under Colored Noise using Dynamic Expectation Maximization

  • paper_url: http://arxiv.org/abs/2308.07797
  • repo_url: https://github.com/ajitham123/DEM_NCM
  • paper_authors: Ajith Anil Meera, Pablo Lanillos
  • for: The paper proposes a new brain-inspired algorithm for accurately and adaptively estimating the noise covariance matrix (NCM) of dynamic systems subjected to colored noise.
  • methods: The algorithm extends the Dynamic Expectation Maximization (DEM) algorithm to perform online noise covariance and state estimation by optimizing a free energy objective, with a mathematical proof that the NCM estimator converges to the global optimum of that objective (a simplified sketch follows this entry).
  • results: Randomized numerical simulations show that the estimator outperforms nine baseline methods with minimal noise covariance estimation error under colored noise, and outperforms the best baseline (Variational Bayes) in joint noise and state estimation for highly colored noise.
    Abstract The accurate estimation of the noise covariance matrix (NCM) in a dynamic system is critical for state estimation and control, as it has a major influence in their optimality. Although a large number of NCM estimation methods have been developed, most of them assume the noises to be white. However, in many real-world applications, the noises are colored (e.g., they exhibit temporal autocorrelations), resulting in suboptimal solutions. Here, we introduce a novel brain-inspired algorithm that accurately and adaptively estimates the NCM for dynamic systems subjected to colored noise. Particularly, we extend the Dynamic Expectation Maximization algorithm to perform both online noise covariance and state estimation by optimizing the free energy objective. We mathematically prove that our NCM estimator converges to the global optimum of this free energy objective. Using randomized numerical simulations, we show that our estimator outperforms nine baseline methods with minimal noise covariance estimation error under colored noise conditions. Notably, we show that our method outperforms the best baseline (Variational Bayes) in joint noise and state estimation for high colored noise. We foresee that the accuracy and the adaptive nature of our estimator make it suitable for online estimation in real-world applications.
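For readers unfamiliar with the task, the sketch below shows what online noise covariance estimation under colored noise amounts to in its simplest form: an exponentially weighted update of the covariance from innovation residuals. The forgetting-factor update and the AR(1) noise generator are illustrative assumptions; the paper's DEM-based, free-energy-optimizing estimator is considerably more sophisticated.

```python
import numpy as np

def update_ncm(ncm, residual, forgetting=0.98):
    """Exponentially weighted update of a noise covariance matrix from an innovation residual.
    A simple stand-in for the paper's free-energy-based DEM update."""
    return forgetting * ncm + (1.0 - forgetting) * np.outer(residual, residual)

# toy example with colored (AR(1)) measurement noise
rng = np.random.default_rng(0)
dim, steps = 2, 5000
true_cov = np.array([[1.0, 0.3], [0.3, 0.5]])
chol = np.linalg.cholesky(true_cov)
noise, ncm = np.zeros(dim), np.eye(dim)
for _ in range(steps):
    noise = 0.8 * noise + chol @ rng.normal(size=dim)   # temporally autocorrelated noise
    ncm = update_ncm(ncm, noise)
print(np.round(ncm, 2))   # tracks the stationary covariance of the colored noise
```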

Implementing Quantum Generative Adversarial Network (qGAN) and QCBM in Finance

  • paper_url: http://arxiv.org/abs/2308.08448
  • repo_url: None
  • paper_authors: Santanu Ganguly
  • for: The paper surveys upcoming research directions for applying quantum machine learning (QML) in finance and the QML models that have become areas of active interest in the financial world.
  • methods: Using real-world financial data and simulated environments, the paper compares models such as qGAN (quantum generative adversarial networks) and QCBM (quantum circuit Born machine), defining quantum circuits for the qGAN's discriminator and generator.
  • results: The study points to a possible future quantum advantage of QML in finance, with the qGAN showing promising behavior in the evaluated settings.
    Abstract Quantum machine learning (QML) is a cross-disciplinary subject made up of two of the most exciting research areas: quantum computing and classical machine learning (ML), with ML and artificial intelligence (AI) being projected as the first fields that will be impacted by the rise of quantum machines. Quantum computers are being used today in drug discovery, material & molecular modelling and finance. In this work, we discuss some upcoming active new research areas in application of quantum machine learning (QML) in finance. We discuss certain QML models that has become areas of active interest in the financial world for various applications. We use real world financial dataset and compare models such as qGAN (quantum generative adversarial networks) and QCBM (quantum circuit Born machine) among others, using simulated environments. For the qGAN, we define quantum circuits for discriminators and generators and show promises of future quantum advantage via QML in finance.

Informed Named Entity Recognition Decoding for Generative Language Models

  • paper_url: http://arxiv.org/abs/2308.07791
  • repo_url: None
  • paper_authors: Tobias Deußer, Lars Hillebrand, Christian Bauckhage, Rafet Sifa
  • for: The paper proposes a simple yet effective approach, Informed Named Entity Recognition Decoding (iNERD), which treats named entity recognition as a generative process.
  • methods: The method leverages the language understanding capabilities of recent generative models and employs an informed decoding scheme that folds the restricted nature of information extraction into open-ended text generation, improving performance and eliminating hallucinations; the model is coarse-tuned on a merged named-entity corpus (see the sketch after this entry).
  • results: Evaluating five generative language models on eight named entity recognition datasets, the approach achieves remarkable results, especially in settings where the entity class set is unknown, demonstrating its adaptability.
    Abstract Ever-larger language models with ever-increasing capabilities are by now well-established text processing tools. Alas, information extraction tasks such as named entity recognition are still largely unaffected by this progress as they are primarily based on the previous generation of encoder-only transformer models. Here, we propose a simple yet effective approach, Informed Named Entity Recognition Decoding (iNERD), which treats named entity recognition as a generative process. It leverages the language understanding capabilities of recent generative models in a future-proof manner and employs an informed decoding scheme incorporating the restricted nature of information extraction into open-ended text generation, improving performance and eliminating any risk of hallucinations. We coarse-tune our model on a merged named entity corpus to strengthen its performance, evaluate five generative language models on eight named entity recognition datasets, and achieve remarkable results, especially in an environment with an unknown entity class set, demonstrating the adaptability of the approach.
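One way to picture informed decoding is as a constraint that only allows outputs consistent with the extraction task. The filter below keeps only generated (type, span) pairs whose type belongs to the known schema and whose span literally occurs in the input; this post-hoc filter is an illustrative interpretation, not the paper's actual decoder-level scheme, and the example sentence and labels are made up.

```python
def informed_filter(candidates, sentence, entity_types):
    """Keep only generated (type, span) pairs that respect the constraints of extraction:
    the type must come from the known set and the span must literally occur in the input.
    This removes hallucinated spans by construction (an illustrative stand-in for
    constraining the decoder itself)."""
    kept = []
    for etype, span in candidates:
        if etype in entity_types and span in sentence:
            kept.append((etype, span))
    return kept

sentence = "Tim Cook announced new products at Apple headquarters in Cupertino."
entity_types = {"PER", "ORG", "LOC"}
raw_generations = [("PER", "Tim Cook"), ("ORG", "Apple"), ("LOC", "Silicon Valley"), ("DATE", "Tuesday")]
print(informed_filter(raw_generations, sentence, entity_types))
# -> [('PER', 'Tim Cook'), ('ORG', 'Apple')]  out-of-schema or hallucinated outputs are dropped
```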

DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding

  • paper_url: http://arxiv.org/abs/2308.07787
  • repo_url: https://github.com/joannahong/diffv2s
  • paper_authors: Jeongsoo Choi, Joanna Hong, Yong Man Ro
  • for: The goal is to improve the accuracy and intelligibility of video-to-speech synthesis so that high-quality speech can be reconstructed solely from visual input.
  • methods: The work introduces a vision-guided speaker embedding extractor built on a self-supervised pre-trained model with prompt tuning, so that no reference audio is needed at inference time, and a diffusion-based video-to-speech model, DiffV2S, conditioned on the extracted speaker embeddings and the visual representation of the input video frames.
  • results: DiffV2S preserves the phoneme details contained in the input video frames while producing a highly intelligible mel-spectrogram in which the identities of multiple speakers are preserved, achieving state-of-the-art performance compared with previous video-to-speech techniques.
    Abstract Recent research has demonstrated impressive results in video-to-speech synthesis which involves reconstructing speech solely from visual input. However, previous works have struggled to accurately synthesize speech due to a lack of sufficient guidance for the model to infer the correct content with the appropriate sound. To resolve the issue, they have adopted an extra speaker embedding as a speaking style guidance from a reference auditory information. Nevertheless, it is not always possible to obtain the audio information from the corresponding video input, especially during the inference time. In this paper, we present a novel vision-guided speaker embedding extractor using a self-supervised pre-trained model and prompt tuning technique. In doing so, the rich speaker embedding information can be produced solely from input visual information, and the extra audio information is not necessary during the inference time. Using the extracted vision-guided speaker embedding representations, we further develop a diffusion-based video-to-speech synthesis model, so called DiffV2S, conditioned on those speaker embeddings and the visual representation extracted from the input video. The proposed DiffV2S not only maintains phoneme details contained in the input video frames, but also creates a highly intelligible mel-spectrogram in which the speaker identities of the multiple speakers are all preserved. Our experimental results show that DiffV2S achieves the state-of-the-art performance compared to the previous video-to-speech synthesis technique.

Hierarchical generative modelling for autonomous robots

  • paper_url: http://arxiv.org/abs/2308.07775
  • repo_url: None
  • paper_authors: Kai Yuan, Noor Sajid, Karl Friston, Zhibin Li
  • for: Investigate the fundamental aspects of motor control in autonomous robotic operations and develop a hierarchical generative model for versatile sensorimotor control.
  • methods: Use hierarchical generative modelling with multi-level planning, mimicking the deep temporal architecture of human motor control, and evaluate it in numerical and physical simulation of autonomous task completion.
  • results: Demonstrate the effectiveness of human-inspired motor control algorithms: a humanoid robot can retrieve and transport a box, open and walk through a door, and approach and kick a football, while showing robust performance in the presence of body damage and ground irregularities.
    Abstract Humans can produce complex whole-body motions when interacting with their surroundings, by planning, executing and combining individual limb movements. We investigated this fundamental aspect of motor control in the setting of autonomous robotic operations. We approach this problem by hierarchical generative modelling equipped with multi-level planning-for autonomous task completion-that mimics the deep temporal architecture of human motor control. Here, temporal depth refers to the nested time scales at which successive levels of a forward or generative model unfold, for example, delivering an object requires a global plan to contextualise the fast coordination of multiple local movements of limbs. This separation of temporal scales also motivates robotics and control. Specifically, to achieve versatile sensorimotor control, it is advantageous to hierarchically structure the planning and low-level motor control of individual limbs. We use numerical and physical simulation to conduct experiments and to establish the efficacy of this formulation. Using a hierarchical generative model, we show how a humanoid robot can autonomously complete a complex task that necessitates a holistic use of locomotion, manipulation, and grasping. Specifically, we demonstrate the ability of a humanoid robot that can retrieve and transport a box, open and walk through a door to reach the destination, approach and kick a football, while showing robust performance in presence of body damage and ground irregularities. Our findings demonstrated the effectiveness of using human-inspired motor control algorithms, and our method provides a viable hierarchical architecture for the autonomous completion of challenging goal-directed tasks.

A Graph Encoder-Decoder Network for Unsupervised Anomaly Detection

  • paper_url: http://arxiv.org/abs/2308.07774
  • repo_url: None
  • paper_authors: Mahsa Mesgaran, A. Ben Hamza
  • for: Detecting abnormal nodes in graphs without supervision.
  • methods: An unsupervised graph encoder-decoder model learns an anomaly scoring function that ranks nodes by their degree of abnormality. The encoder uses a novel pooling mechanism, LCPool, which applies locality-constrained linear coding to obtain a cluster assignment matrix by solving a least-squares problem with a locality regularization term; being free of learnable parameters, it handles large graphs efficiently, is more interpretable, and yields a coarser graph representation that retains the most significant structural characteristics (see the sketch after this entry). The decoder uses LCUnpool to reconstruct both the structure and the node features of the original graph.
  • results: Empirical evaluations on six benchmark datasets with several metrics show that the method outperforms state-of-the-art anomaly detection approaches.
    Abstract A key component of many graph neural networks (GNNs) is the pooling operation, which seeks to reduce the size of a graph while preserving important structural information. However, most existing graph pooling strategies rely on an assignment matrix obtained by employing a GNN layer, which is characterized by trainable parameters, often leading to significant computational complexity and a lack of interpretability in the pooling process. In this paper, we propose an unsupervised graph encoder-decoder model to detect abnormal nodes from graphs by learning an anomaly scoring function to rank nodes based on their degree of abnormality. In the encoding stage, we design a novel pooling mechanism, named LCPool, which leverages locality-constrained linear coding for feature encoding to find a cluster assignment matrix by solving a least-squares optimization problem with a locality regularization term. By enforcing locality constraints during the coding process, LCPool is designed to be free from learnable parameters, capable of efficiently handling large graphs, and can effectively generate a coarser graph representation while retaining the most significant structural characteristics of the graph. In the decoding stage, we propose an unpooling operation, called LCUnpool, to reconstruct both the structure and nodal features of the original graph. We conduct empirical evaluations of our method on six benchmark datasets using several evaluation metrics, and the results demonstrate its superiority over state-of-the-art anomaly detection approaches.
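The locality-constrained coding step at the heart of LCPool can be sketched generically. The function below follows the usual closed-form recipe of locality-constrained linear coding (shift the anchors to the data point, regularize by a distance-dependent penalty, solve a small linear system, normalize); the anchor features, hyperparameters, and the exact form of the locality penalty are illustrative assumptions rather than the paper's formulation.

```python
import numpy as np

def lc_code(x, anchors, lam=1e-3, sigma=1.0):
    """Locality-constrained linear coding of a node feature x against cluster anchors.
    Approximately solves min_c ||x - anchors.T @ c||^2 + lam * ||d * c||^2 s.t. sum(c) = 1,
    where d grows with the distance to each anchor (locality penalty)."""
    m = anchors.shape[0]
    d = np.exp(np.linalg.norm(anchors - x, axis=1) / sigma)   # locality weights
    z = anchors - x                                           # shift anchors to the data point
    cov = z @ z.T + lam * np.diag(d)                          # regularized local covariance
    c = np.linalg.solve(cov, np.ones(m))
    return c / c.sum()                                        # shift-invariant assignment

rng = np.random.default_rng(0)
nodes = rng.normal(size=(6, 4))       # node features
anchors = rng.normal(size=(3, 4))     # cluster "anchor" features
assignment = np.stack([lc_code(x, anchors) for x in nodes])   # soft cluster-assignment matrix
print(np.round(assignment, 2))
```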

MOLE: MOdular Learning FramEwork via Mutual Information Maximization

  • paper_url: http://arxiv.org/abs/2308.07772
  • repo_url: None
  • paper_authors: Tianchao Li, Yulong Pei
  • for: The paper introduces an asynchronous and local learning framework for neural networks, the Modular Learning Framework (MOLE).
  • methods: The framework modularizes a neural network by layers, defines each module's training objective via mutual information, and sequentially trains each module by mutual information maximization, turning training into local optimization with gradients isolated across modules, which is argued to be more biologically plausible than backpropagation.
  • results: Experiments on vector-, grid-, and graph-type data show that MOLE trains effectively and can solve both node-level and graph-level tasks on graph data, demonstrating its applicability to different types of data.
    Abstract This paper is to introduce an asynchronous and local learning framework for neural networks, named Modular Learning Framework (MOLE). This framework modularizes neural networks by layers, defines the training objective via mutual information for each module, and sequentially trains each module by mutual information maximization. MOLE makes the training become local optimization with gradient-isolated across modules, and this scheme is more biologically plausible than BP. We run experiments on vector-, grid- and graph-type data. In particular, this framework is capable of solving both graph- and node-level tasks for graph-type data. Therefore, MOLE has been experimentally proven to be universally applicable to different types of data.

NeFL: Nested Federated Learning for Heterogeneous Clients

  • paper_url: http://arxiv.org/abs/2308.07761
  • repo_url: None
  • paper_authors: Honggu Kang, Seohyeon Cha, Jinwoo Shin, Jongmyeong Lee, Joonhyuk Kang
  • for: The work addresses how slow or incapable clients (stragglers) increase total training time and degrade performance in federated learning (FL), and aims to let resource-constrained clients effectively join the FL pipeline.
  • methods: The paper proposes nested federated learning (NeFL), a generalized framework that efficiently divides a model into submodels using both depthwise and widthwise scaling. NeFL interprets models as solving ordinary differential equations (ODEs) with adaptive step sizes, and decouples a few parameters to handle the inconsistency that arises when training submodels with different architectures.
  • results: A series of experiments shows that NeFL yields significant gains, especially for the worst-case submodel (e.g., an 8.33 improvement on CIFAR-10), and that NeFL aligns with recent studies in FL.
    Abstract Federated learning (FL) is a promising approach in distributed learning keeping privacy. However, during the training pipeline of FL, slow or incapable clients (i.e., stragglers) slow down the total training time and degrade performance. System heterogeneity, including heterogeneous computing and network bandwidth, has been addressed to mitigate the impact of stragglers. Previous studies split models to tackle the issue, but with less degree-of-freedom in terms of model architecture. We propose nested federated learning (NeFL), a generalized framework that efficiently divides a model into submodels using both depthwise and widthwise scaling. NeFL is implemented by interpreting models as solving ordinary differential equations (ODEs) with adaptive step sizes. To address the inconsistency that arises when training multiple submodels with different architecture, we decouple a few parameters. NeFL enables resource-constrained clients to effectively join the FL pipeline and the model to be trained with a larger amount of data. Through a series of experiments, we demonstrate that NeFL leads to significant gains, especially for the worst-case submodel (e.g., 8.33 improvement on CIFAR-10). Furthermore, we demonstrate NeFL aligns with recent studies in FL.

Forward-Backward Reasoning in Large Language Models for Verification

  • paper_url: http://arxiv.org/abs/2308.07758
  • repo_url: None
  • paper_authors: Weisen Jiang, Han Shi, Longhui Yu, Zhengying Liu, Yu Zhang, Zhenguo Li, James T. Kwok
  • for: Improving the verification of candidate answers in reasoning tasks with large language models.
  • methods: A combination of backward reasoning (masking a token in the question and asking the LLM to predict it given a candidate answer) and forward reasoning, named FOBAR, for estimating the probability of candidate answers (see the sketch after this entry).
  • results: Experiments on six datasets and three LLMs show that FOBAR achieves state-of-the-art performance on various reasoning benchmarks.
    Abstract Chain-of-Though (CoT) prompting has shown promising performance in various reasoning tasks. Recently, Self-Consistency \citep{wang2023selfconsistency} proposes to sample a diverse set of reasoning chains which may lead to different answers while the answer that receives the most votes is selected. In this paper, we propose a novel method to use backward reasoning in verifying candidate answers. We mask a token in the question by ${\bf x}$ and ask the LLM to predict the masked token when a candidate answer is provided by \textit{a simple template}, i.e., "\textit{\textbf{If we know the answer of the above question is \{a candidate answer\}, what is the value of unknown variable ${\bf x}$?}" Intuitively, the LLM is expected to predict the masked token successfully if the provided candidate answer is correct. We further propose FOBAR to combine forward and backward reasoning for estimating the probability of candidate answers. We conduct extensive experiments on six data sets and three LLMs. Experimental results demonstrate that FOBAR achieves state-of-the-art performance on various reasoning benchmarks.
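The backward template below mirrors the one quoted in the abstract; how forward votes and backward checks are combined into a single score is shown here as a simple equal-weight average, which is an assumption rather than the paper's exact estimator, and `backward_check` is a stub standing in for an actual LLM call.

```python
from collections import Counter

# Backward template: mask a token in the question as x and ask the model to recover it
# given a candidate answer.
BACKWARD_TEMPLATE = (
    "{question_with_x_masked}\n"
    "If we know the answer of the above question is {candidate}, "
    "what is the value of unknown variable x?"
)

def fobar_select(forward_answers, backward_check, candidates):
    """Combine forward voting (Self-Consistency style) with backward verification.
    `backward_check(candidate)` should return True when the LLM recovers the masked
    token x for that candidate; here it is a user-supplied callable / stub."""
    votes = Counter(forward_answers)
    scores = {}
    for cand in candidates:
        forward = votes[cand] / max(len(forward_answers), 1)
        backward = 1.0 if backward_check(cand) else 0.0
        scores[cand] = 0.5 * forward + 0.5 * backward   # simple equal-weight combination
    return max(scores, key=scores.get)

# toy usage: the prompt an LLM would receive, plus a stubbed verifier
prompt = BACKWARD_TEMPLATE.format(
    question_with_x_masked="A box holds x apples and 5 oranges, 17 pieces of fruit in total.",
    candidate="12",
)
print(prompt)
print(fobar_select(["12", "12", "15", "12", "15"], lambda c: c == "12", {"12", "15"}))  # -> 12
```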

Exploiting Sparsity in Automotive Radar Object Detection Networks

  • paper_url: http://arxiv.org/abs/2308.07748
  • repo_url: None
  • paper_authors: Marius Lippke, Maurice Quach, Sascha Braun, Daniel Köhler, Michael Ulrich, Bastian Bischoff, Wei Yap Tan
  • for: The paper proposes an object detection approach for automotive radar, a fundamental part of environment perception in autonomous driving systems, using sparse convolutional networks to keep compute requirements low.
  • methods: It combines grid-based detection with a sparse backbone architecture and introduces sparse kernel point pillars (SKPP) and dual voxel point convolutions (DVPC) to address radar-specific challenges in grid rendering and sparse backbones.
  • results: Evaluated on nuScenes, the SKPP-DPVCN architecture outperforms the baseline by 5.89% and the previous state of the art by 4.19% in Car AP4.0, and reduces the average scale error (ASE) by 21.41% over the baseline.
    Abstract Having precise perception of the environment is crucial for ensuring the secure and reliable functioning of autonomous driving systems. Radar object detection networks are one fundamental part of such systems. CNN-based object detectors showed good performance in this context, but they require large compute resources. This paper investigates sparse convolutional object detection networks, which combine powerful grid-based detection with low compute resources. We investigate radar specific challenges and propose sparse kernel point pillars (SKPP) and dual voxel point convolutions (DVPC) as remedies for the grid rendering and sparse backbone architectures. We evaluate our SKPP-DPVCN architecture on nuScenes, which outperforms the baseline by 5.89% and the previous state of the art by 4.19% in Car AP4.0. Moreover, SKPP-DPVCN reduces the average scale error (ASE) by 21.41% over the baseline.

Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World

  • paper_url: http://arxiv.org/abs/2308.07741
  • repo_url: None
  • paper_authors: Nico Gürtler, Felix Widmaier, Cansu Sancaktar, Sebastian Blaes, Pavel Kolev, Stefan Bauer, Manuel Wüthrich, Markus Wulfmeier, Martin Riedmiller, Arthur Allshire, Qiang Wang, Robert McCarthy, Hangyeol Kim, Jongchan Baek Pohang, Wookyong Kwon, Shanliang Qian, Yasunori Toshimitsu, Mike Yan Michelis, Amirhossein Kazemipour, Arman Raayatsanati, Hehui Zheng, Barnabasa Gavin Cangan, Bernhard Schölkopf, Georg Martius
  • for: The Real Robot Challenge 2022 aimed to bridge the reinforcement learning (RL) and robotics communities by letting participants experiment remotely with a real robot as easily as in simulation.
  • methods: Participants learned two dexterous manipulation tasks involving pushing, grasping, and in-hand orientation from provided real-robot datasets; extensive software documentation and an initial stage based on a simulation of the real setup made the competition accessible, and offline-learned policies were evaluated on a cluster of seven identical real TriFinger platforms.
  • results: The paper states the competition rules, presents the methods used by the winning teams, and compares their results with a benchmark of state-of-the-art offline RL algorithms on the challenge datasets.
    Abstract Experimentation on real robots is demanding in terms of time and costs. For this reason, a large part of the reinforcement learning (RL) community uses simulators to develop and benchmark algorithms. However, insights gained in simulation do not necessarily translate to real robots, in particular for tasks involving complex interactions with the environment. The Real Robot Challenge 2022 therefore served as a bridge between the RL and robotics communities by allowing participants to experiment remotely with a real robot - as easily as in simulation. In the last years, offline reinforcement learning has matured into a promising paradigm for learning from pre-collected datasets, alleviating the reliance on expensive online interactions. We therefore asked the participants to learn two dexterous manipulation tasks involving pushing, grasping, and in-hand orientation from provided real-robot datasets. An extensive software documentation and an initial stage based on a simulation of the real set-up made the competition particularly accessible. By giving each team plenty of access budget to evaluate their offline-learned policies on a cluster of seven identical real TriFinger platforms, we organized an exciting competition for machine learners and roboticists alike. In this work we state the rules of the competition, present the methods used by the winning teams and compare their results with a benchmark of state-of-the-art offline RL algorithms on the challenge datasets.

Domain-Aware Fine-Tuning: Enhancing Neural Network Adaptability

  • paper_url: http://arxiv.org/abs/2308.07728
  • repo_url: None
  • paper_authors: Seokhyeon Ha, Sunbeom Jung, Jungwoo Lee
  • For: The paper proposes a new method for adapting pre-trained models to different target domains with better performance and less feature distortion.
  • Methods: The approach, Domain-Aware Fine-Tuning (DAFT), combines a batch normalization conversion that reduces modifications to the network during fine-tuning with an integration of linear probing and fine-tuning that optimizes the head layer while gradually adapting the feature extractor (see the sketch after this entry).
  • Results: Compared with baseline methods, DAFT achieves better performance on both in-distribution and out-of-distribution data while significantly mitigating feature distortion.
    Abstract Fine-tuning pre-trained neural network models has become a widely adopted approach across various domains. However, it can lead to the distortion of pre-trained feature extractors that already possess strong generalization capabilities. Mitigating feature distortion during adaptation to new target domains is crucial. Recent studies have shown promising results in handling feature distortion by aligning the head layer on in-distribution datasets before performing fine-tuning. Nonetheless, a significant limitation arises from the treatment of batch normalization layers during fine-tuning, leading to suboptimal performance. In this paper, we propose Domain-Aware Fine-Tuning (DAFT), a novel approach that incorporates batch normalization conversion and the integration of linear probing and fine-tuning. Our batch normalization conversion method effectively mitigates feature distortion by reducing modifications to the neural network during fine-tuning. Additionally, we introduce the integration of linear probing and fine-tuning to optimize the head layer with gradual adaptation of the feature extractor. By leveraging batch normalization layers and integrating linear probing and fine-tuning, our DAFT significantly mitigates feature distortion and achieves improved model performance on both in-distribution and out-of-distribution datasets. Extensive experiments demonstrate that our method outperforms other baseline methods, demonstrating its effectiveness in not only improving performance but also mitigating feature distortion.
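A minimal PyTorch sketch of the two ingredients, assuming the batch normalization conversion amounts to freezing BN statistics and affine parameters (the paper's actual conversion may differ) and using a torchvision ResNet-18 purely as a placeholder backbone.

```python
import torch.nn as nn
from torchvision.models import resnet18

def convert_batchnorm(model: nn.Module) -> nn.Module:
    """Freeze every BatchNorm layer: rely on the stored running statistics and stop
    updating them, so fine-tuning perturbs the pre-trained feature extractor as little
    as possible. (A sketch of one possible 'batch normalization conversion'; the paper's
    exact conversion may differ.)"""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()             # use running mean/var instead of batch statistics
            m.momentum = 0.0     # never update the running statistics
            for p in m.parameters():
                p.requires_grad_(False)   # keep the affine scale/shift fixed as well
    return model

model = convert_batchnorm(resnet18(weights=None))   # swap in pre-trained weights in practice

# Stage 1 (linear probing): train only the head while the feature extractor stays fixed.
for p in model.parameters():
    p.requires_grad_(False)
for p in model.fc.parameters():
    p.requires_grad_(True)
# Stage 2 (fine-tuning): re-enable gradients for non-BN parameters and train end to end,
# re-applying convert_batchnorm if model.train() has been called in between.
```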

Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening

  • paper_url: http://arxiv.org/abs/2308.07707
  • repo_url: https://github.com/if-loops/selective-synaptic-dampening
  • paper_authors: Jack Foster, Stefan Schoepf, Alexandra Brintrup
  • for: The work addresses machine unlearning, i.e., making a trained model forget specific information, to comply with data privacy regulations and to remove harmful, manipulated, or outdated information.
  • methods: It proposes Selective Synaptic Dampening (SSD), a two-step, post hoc, retrain-free approach that is fast, performant, and does not require long-term storage of the training data: the Fisher information of the training and forgetting data selects parameters that are disproportionately important to the forget set, which are then dampened in proportion to their relative importance (see the sketch after this entry).
  • results: Compared against several existing unlearning methods, SSD's performance is competitive with retrain-based post hoc methods, demonstrating the viability of retrain-free post hoc unlearning.
    Abstract Machine unlearning, the ability for a machine learning model to forget, is becoming increasingly important to comply with data privacy regulations, as well as to remove harmful, manipulated, or outdated information. The key challenge lies in forgetting specific information while protecting model performance on the remaining data. While current state-of-the-art methods perform well, they typically require some level of retraining over the retained data, in order to protect or restore model performance. This adds computational overhead and mandates that the training data remain available and accessible, which may not be feasible. In contrast, other methods employ a retrain-free paradigm, however, these approaches are prohibitively computationally expensive and do not perform on par with their retrain-based counterparts. We present Selective Synaptic Dampening (SSD), a novel two-step, post hoc, retrain-free approach to machine unlearning which is fast, performant, and does not require long-term storage of the training data. First, SSD uses the Fisher information matrix of the training and forgetting data to select parameters that are disproportionately important to the forget set. Second, SSD induces forgetting by dampening these parameters proportional to their relative importance to the forget set with respect to the wider training data. We evaluate our method against several existing unlearning methods in a range of experiments using ResNet18 and Vision Transformer. Results show that the performance of SSD is competitive with retrain-based post hoc methods, demonstrating the viability of retrain-free post hoc unlearning approaches.
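A sketch of the selection-and-dampening recipe described in the abstract, using diagonal Fisher estimates (mean squared gradients). The threshold `alpha`, the dampening constant `lam`, and the exact clamping rule are illustrative assumptions; consult the paper and repository for the actual formulation.

```python
import torch

def selective_synaptic_dampening(model, fisher_forget, fisher_full, alpha=10.0, lam=1.0):
    """Dampen parameters that are disproportionately important to the forget set.
    fisher_forget / fisher_full are dicts of diagonal Fisher estimates (per-parameter
    squared gradients averaged over the forget set and the full training set)."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            f_forget, f_full = fisher_forget[name], fisher_full[name]
            selected = f_forget > alpha * f_full                  # important mainly for the forget set
            beta = torch.clamp(lam * f_full / (f_forget + 1e-12), max=1.0)
            param[selected] *= beta[selected]                     # proportional dampening

def diagonal_fisher(model, loader, loss_fn):
    """Diagonal Fisher estimate: average of squared gradients over a data loader."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(loader), 1) for n, f in fisher.items()}
```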

Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models

  • paper_url: http://arxiv.org/abs/2308.07706
  • repo_url: None
  • paper_authors: Kanchan Poudel, Manish Dhakal, Prasiddha Bhandari, Rabin Adhikari, Safal Thapaliya, Bishesh Khanal
  • for: The study aims to improve medical image segmentation by using textual guidance to enhance visual features.
  • methods: Multimodal vision-language models are used to capture semantic information from image descriptions and images, enabling segmentation of diverse medical images; variations of image descriptions (prompts) are also generated for previously unseen images to probe their effect on performance.
  • results: Vision-language models trained on open-domain images do not transfer directly to the medical domain; performance varies notably with the generated prompts and can be increased by fine-tuning on medical datasets. The paper reports zero-shot and fine-tuned segmentation performance of 4 VLMs on 11 medical datasets using 9 types of prompts derived from 14 attributes.
    Abstract Medical Image Segmentation is crucial in various clinical applications within the medical domain. While state-of-the-art segmentation models have proven effective, integrating textual guidance to enhance visual features for this task remains an area with limited progress. Existing segmentation models that utilize textual guidance are primarily trained on open-domain images, raising concerns about their direct applicability in the medical domain without manual intervention or fine-tuning. To address these challenges, we propose using multimodal vision-language models for capturing semantic information from image descriptions and images, enabling the segmentation of diverse medical images. This study comprehensively evaluates existing vision language models across multiple datasets to assess their transferability from the open domain to the medical field. Furthermore, we introduce variations of image descriptions for previously unseen images in the dataset, revealing notable variations in model performance based on the generated prompts. Our findings highlight the distribution shift between the open-domain images and the medical domain and show that the segmentation models trained on open-domain images are not directly transferrable to the medical field. But their performance can be increased by finetuning them in the medical datasets. We report the zero-shot and finetuned segmentation performance of 4 Vision Language Models (VLMs) on 11 medical datasets using 9 types of prompts derived from 14 attributes.

Parametric entropy based Cluster Centriod Initialization for k-means clustering of various Image datasets

  • paper_url: http://arxiv.org/abs/2308.07705
  • repo_url: None
  • paper_authors: Faheem Hussayn, Shahid M Shah
  • for: The paper proposes a parametric-entropy-based cluster centroid initialization method for k-means to improve its performance on image data.
  • methods: Several parametric entropies (Taneja, Kapur, Aczel-Daroczy, and Sharma-Mittal entropy) are used to initialize the k-means cluster centroids, and the approach is tested on different image datasets (see the sketch after this entry).
  • results: Different entropies give better results than conventional initialization on different datasets; the proposed approach improves k-means performance on image data, evaluated on the Satellite, Toys, Fruits, Cars, Brain MRI, and Covid X-Ray datasets.
    Abstract One of the most employed yet simple algorithm for cluster analysis is the k-means algorithm. k-means has successfully witnessed its use in artificial intelligence, market segmentation, fraud detection, data mining, psychology, etc., only to name a few. The k-means algorithm, however, does not always yield the best quality results. Its performance heavily depends upon the number of clusters supplied and the proper initialization of the cluster centroids or seeds. In this paper, we conduct an analysis of the performance of k-means on image data by employing parametric entropies in an entropy based centroid initialization method and propose the best fitting entropy measures for general image datasets. We use several entropies like Taneja entropy, Kapur entropy, Aczel Daroczy entropy, Sharma Mittal entropy. We observe that for different datasets, different entropies provide better results than the conventional methods. We have applied our proposed algorithm on these datasets: Satellite, Toys, Fruits, Cars, Brain MRI, Covid X-Ray.
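To illustrate what a parametric entropy looks like and how it might drive seeding, the sketch below computes the Sharma-Mittal entropy of an intensity histogram (one common parameterization) and places initial centroids so that each accounts for an equal share of the histogram's entropic mass. The seeding rule, parameter values, and synthetic image are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def sharma_mittal_entropy(p, q=0.7, r=0.5):
    """Sharma-Mittal entropy of a discrete distribution p (for q, r != 1)."""
    p = p[p > 0]
    return (np.sum(p ** q) ** ((1 - r) / (1 - q)) - 1) / (1 - r)

def entropy_seeded_centroids(image, k, q=0.7):
    """Pick k initial k-means centroids from the intensity histogram: seeds are placed
    so that each accounts for an equal share of the per-bin terms p_i**q used by
    parametric entropies (an illustrative seeding rule, not the paper's procedure)."""
    hist, edges = np.histogram(image.ravel(), bins=256, density=True)
    p = hist / hist.sum()
    mass = np.cumsum(p ** q)
    mass /= mass[-1]
    targets = (np.arange(k) + 0.5) / k            # equal shares of entropic mass
    idx = np.searchsorted(mass, targets)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[idx]

rng = np.random.default_rng(0)
image = np.concatenate([rng.normal(60, 10, 5000), rng.normal(180, 15, 5000)]).clip(0, 255)
seeds = entropy_seeded_centroids(image, k=2)
hist = np.histogram(image, bins=256, density=True)[0]
print(seeds, sharma_mittal_entropy(hist / hist.sum()))
```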

Enhancing Network Initialization for Medical AI Models Using Large-Scale, Unlabeled Natural Images

  • paper_url: http://arxiv.org/abs/2308.07688
  • repo_url: None
  • paper_authors: Soroosh Tayebi Arasteh, Leo Misera, Jakob Nikolas Kather, Daniel Truhn, Sven Nebelung
  • for: Explores whether self-supervised learning (SSL) pre-training on non-medical images can improve the diagnostic accuracy of AI models for medical image analysis.
  • methods: A vision transformer is initialized with (i) SSL pre-training on natural images (DINOv2), (ii) supervised pre-training on natural images (ImageNet), and (iii) supervised pre-training on chest radiographs (MIMIC-CXR); a sketch of the three initializations follows this entry.
  • results: Tested on over 800,000 chest radiographs from six large global datasets covering more than 20 imaging findings, the SSL pre-training strategy outperforms ImageNet pre-training on all datasets (P<0.001) and in some cases even exceeds supervised pre-training on MIMIC-CXR.
    Abstract Pre-training datasets, like ImageNet, have become the gold standard in medical image analysis. However, the emergence of self-supervised learning (SSL), which leverages unlabeled data to learn robust features, presents an opportunity to bypass the intensive labeling process. In this study, we explored if SSL for pre-training on non-medical images can be applied to chest radiographs and how it compares to supervised pre-training on non-medical images and on medical images. We utilized a vision transformer and initialized its weights based on (i) SSL pre-training on natural images (DINOv2), (ii) SL pre-training on natural images (ImageNet dataset), and (iii) SL pre-training on chest radiographs from the MIMIC-CXR database. We tested our approach on over 800,000 chest radiographs from six large global datasets, diagnosing more than 20 different imaging findings. Our SSL pre-training on curated images not only outperformed ImageNet-based pre-training (P<0.001 for all datasets) but, in certain cases, also exceeded SL on the MIMIC-CXR dataset. Our findings suggest that selecting the right pre-training strategy, especially with SSL, can be pivotal for improving artificial intelligence (AI)'s diagnostic accuracy in medical imaging. By demonstrating the promise of SSL in chest radiograph analysis, we underline a transformative shift towards more efficient and accurate AI models in medical imaging.
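No official repository is listed, so the snippet below only sketches how the three weight initializations compared in the study could be set up in PyTorch. The torch.hub entry point for DINOv2 and the timm model name are the publicly documented ones; the MIMIC-CXR checkpoint path, the model size, and the linear classification head are placeholders and assumptions, not the authors' configuration.

```python
import torch
import timm

def build_model(init="dinov2", num_findings=20):
    if init == "dinov2":
        # (i) SSL pre-training on natural images: DINOv2 ViT-S/14 via torch.hub,
        # whose forward pass returns a pooled feature vector we classify with a linear head.
        backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
        return torch.nn.Sequential(backbone, torch.nn.Linear(backbone.embed_dim, num_findings))
    if init == "imagenet":
        # (ii) Supervised pre-training on natural images (ImageNet) via timm.
        return timm.create_model("vit_small_patch16_224", pretrained=True, num_classes=num_findings)
    if init == "mimic":
        # (iii) Supervised pre-training on chest radiographs: load a locally trained
        # MIMIC-CXR checkpoint (the path is a placeholder, not a distributed artifact).
        model = timm.create_model("vit_small_patch16_224", pretrained=False, num_classes=num_findings)
        model.load_state_dict(torch.load("mimic_cxr_vit_s.pt"), strict=False)
        return model
    raise ValueError(f"unknown init: {init}")

# Multi-label chest findings would typically be trained with BCEWithLogitsLoss on top of this.
```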

DiffGuard: Semantic Mismatch-Guided Out-of-Distribution Detection using Pre-trained Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.07687
  • repo_url: https://github.com/cure-lab/diffguard
  • paper_authors: Ruiyuan Gao, Chenchen Zhao, Lanqing Hong, Qiang Xu
  • for: Proposes a diffusion-model-based semantic out-of-distribution (OOD) detection method to improve the OOD detection performance of image classifiers.
  • methods: Builds on the idea of enlarging semantic mismatch in image space (previously realized with a conditional GAN), but instead uses pre-trained diffusion models to perform semantic mismatch-guided OOD detection; a schematic of the scoring idea follows this entry.
  • results: Experiments show that DiffGuard achieves state-of-the-art OOD detection on CIFAR-10 and on hard cases of large-scale ImageNet, and can be combined with existing OOD detection techniques for further gains.
    Abstract Given a classifier, the inherent property of semantic Out-of-Distribution (OOD) samples is that their contents differ from all legal classes in terms of semantics, namely semantic mismatch. There is a recent work that directly applies it to OOD detection, which employs a conditional Generative Adversarial Network (cGAN) to enlarge semantic mismatch in the image space. While achieving remarkable OOD detection performance on small datasets, it is not applicable to ImageNet-scale datasets due to the difficulty in training cGANs with both input images and labels as conditions. As diffusion models are much easier to train and amenable to various conditions compared to cGANs, in this work, we propose to directly use pre-trained diffusion models for semantic mismatch-guided OOD detection, named DiffGuard. Specifically, given an OOD input image and the predicted label from the classifier, we try to enlarge the semantic difference between the reconstructed OOD image under these conditions and the original input image. We also present several test-time techniques to further strengthen such differences. Experimental results show that DiffGuard is effective on both Cifar-10 and hard cases of the large-scale ImageNet, and it can be easily combined with existing OOD detection techniques to achieve state-of-the-art OOD detection results.
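The authors' implementation is in the linked repository; the snippet below is only a schematic of the scoring idea as described in the abstract: reconstruct the input under the classifier's predicted label with a label-conditional generator and treat a large gap between input and reconstruction as semantic mismatch. Here `conditional_reconstruct` is a placeholder for a pre-trained label-conditioned diffusion model, and the pixel-space distance and threshold calibration are assumptions.

```python
import torch

@torch.no_grad()
def semantic_mismatch_scores(x, classifier, conditional_reconstruct):
    """x: (B, C, H, W) images in [0, 1].
    conditional_reconstruct(x, labels) -> same-shaped images, e.g. a label-conditioned
    diffusion model run for a few noise/denoise steps (placeholder callable here)."""
    preds = classifier(x).argmax(dim=1)
    x_hat = conditional_reconstruct(x, preds)
    # In-distribution inputs should survive label-conditioned reconstruction largely intact;
    # a large distance signals semantic mismatch and hence a likely OOD sample.
    return (x - x_hat).flatten(1).norm(dim=1)

def flag_ood(scores, threshold):
    # The threshold would be calibrated on held-out in-distribution data (e.g. a high percentile).
    return scores > threshold
```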

Portfolio Selection via Topological Data Analysis

  • paper_url: http://arxiv.org/abs/2308.07944
  • repo_url: None
  • paper_authors: Petr Sokerin, Kristian Kuznetsov, Elizaveta Makhneva, Alexey Zaytsev
  • for: Portfolio management is a key part of investment decision-making, yet traditional methods often fail to deliver reasonable performance.
  • methods: A two-stage approach to portfolio construction: first generate time-series representations, then cluster them; the representations are built from Topological Data Analysis (TDA) features that expose the topological structure of the time-series data.
  • results: Experiments show the proposed method outperforms other approaches consistently across different time frames, demonstrating the stability and reliability of the gains and suggesting TDA as a powerful tool for portfolio selection.
    Abstract Portfolio management is an essential part of investment decision-making. However, traditional methods often fail to deliver reasonable performance. This problem stems from the inability of these methods to account for the unique characteristics of multivariate time series data from stock markets. We present a two-stage method for constructing an investment portfolio of common stocks. The method involves the generation of time series representations followed by their subsequent clustering. Our approach utilizes features based on Topological Data Analysis (TDA) for the generation of representations, allowing us to elucidate the topological structure within the data. Experimental results show that our proposed system outperforms other methods. This superior performance is consistent over different time frames, suggesting the viability of TDA as a powerful tool for portfolio selection.

Gradient-Based Post-Training Quantization: Challenging the Status Quo

  • paper_url: http://arxiv.org/abs/2308.07662
  • repo_url: None
  • paper_authors: Edouard Yvinec, Arnaud Dapogny, Kevin Bailly
  • for: Revisits gradient-based post-training quantization (GPTQ) to make the quantization of deep neural networks more efficient and scalable.
  • methods: Challenges common design choices in GPTQ methods and derives best practices for the problem formulation (loss, degrees of freedom, non-uniform quantization schemes) and the optimization process (choice of variables, optimizer, and calibration set), and proposes a novel importance-based mixed-precision technique.
  • results: The proposed guidelines yield significant performance improvements across state-of-the-art GPTQ methods and networks (e.g., +6.819 points on ViT for 4-bit quantization), demonstrating the feasibility and effectiveness of the approach.
    Abstract Quantization has become a crucial step for the efficient deployment of deep neural networks, where floating point operations are converted to simpler fixed point operations. In its most naive form, it simply consists in a combination of scaling and rounding transformations, leading to either a limited compression rate or a significant accuracy drop. Recently, Gradient-based post-training quantization (GPTQ) methods appear to constitute a suitable trade-off between such simple methods and more powerful, yet expensive Quantization-Aware Training (QAT) approaches, particularly when attempting to quantize LLMs, where scalability of the quantization process is of paramount importance. GPTQ essentially consists in learning the rounding operation using a small calibration set. In this work, we challenge common choices in GPTQ methods. In particular, we show that the process is, to a certain extent, robust to a number of variables (weight selection, feature augmentation, choice of calibration set). More importantly, we derive a number of best practices for designing more efficient and scalable GPTQ methods, regarding the problem formulation (loss, degrees of freedom, use of non-uniform quantization schemes) or optimization process (choice of variable and optimizer). Lastly, we propose a novel importance-based mixed-precision technique. Those guidelines lead to significant performance improvements on all the tested state-of-the-art GPTQ methods and networks (e.g. +6.819 points on ViT for 4-bit quantization), paving the way for the design of scalable, yet effective quantization methods.

Attention Is Not All You Need Anymore

  • paper_url: http://arxiv.org/abs/2308.07661
  • repo_url: https://github.com/rprokap/pset-9
  • paper_authors: Zhe Chen
  • for: Improving Transformer performance.
  • methods: Proposes a drop-in replacement for the self-attention mechanism, called the Extractor.
  • results: Experiments show that replacing self-attention with the Extractor improves Transformer performance, and the Extractor can potentially run faster thanks to its shorter critical path of computation.
    Abstract In recent years, the popular Transformer architecture has achieved great success in many application areas, including natural language processing and computer vision. Many existing works aim to reduce the computational and memory complexity of the self-attention mechanism in the Transformer by trading off performance. However, performance is key for the continuing success of the Transformer. In this paper, a drop-in replacement for the self-attention mechanism in the Transformer, called the Extractor, is proposed. Experimental results show that replacing the self-attention mechanism with the Extractor improves the performance of the Transformer. Furthermore, the proposed Extractor has the potential to run faster than the self-attention since it has a much shorter critical path of computation. Additionally, the sequence prediction problem in the context of text generation is formulated using variable-length discrete-time Markov chains, and the Transformer is reviewed based on our understanding.

From Commit Message Generation to History-Aware Commit Message Completion

  • paper_url: http://arxiv.org/abs/2308.07655
  • repo_url: https://github.com/jetbrains-research/commit_message_generation
  • paper_authors: Aleksandra Eliseeva, Yaroslav Sokolov, Egor Bogomolov, Yaroslav Golubev, Danny Dig, Timofey Bryksin
  • for: Improving the quality and personal nature of commit messages so that developers can track changes and collaborate more easily.
  • methods: Uses the previous commit history as additional context and studies both completion and generation settings for producing high-quality commit messages, introducing the CommitChronicle dataset for this purpose.
  • results: In some contexts, commit message completion outperforms generation, and historical information improves CMG models on the generation task as well as GPT-3.5-turbo in both generation and completion.
    Abstract Commit messages are crucial to software development, allowing developers to track changes and collaborate effectively. Despite their utility, most commit messages lack important information since writing high-quality commit messages is tedious and time-consuming. The active research on commit message generation (CMG) has not yet led to wide adoption in practice. We argue that if we could shift the focus from commit message generation to commit message completion and use previous commit history as additional context, we could significantly improve the quality and the personal nature of the resulting commit messages. In this paper, we propose and evaluate both of these novel ideas. Since the existing datasets lack historical data, we collect and share a novel dataset called CommitChronicle, containing 10.7M commits across 20 programming languages. We use this dataset to evaluate the completion setting and the usefulness of the historical context for state-of-the-art CMG models and GPT-3.5-turbo. Our results show that in some contexts, commit message completion shows better results than generation, and that while in general GPT-3.5-turbo performs worse, it shows potential for long and detailed messages. As for the history, the results show that historical information improves the performance of CMG models in the generation task, and the performance of GPT-3.5-turbo in both generation and completion.

Ternary Singular Value Decomposition as a Better Parameterized Form in Linear Mapping

  • paper_url: http://arxiv.org/abs/2308.07641
  • repo_url: None
  • paper_authors: Boyu Chen, Hanxuan Chen, Jiao He, Fengyu Sun, Shangling Jui
  • for: 这个论文的目的是提出一种简单 yet novel的线性映射方法来实现优秀的网络压缩性能。
  • methods: 这个论文使用的方法是一种叫做ternary SVD(TSVD)的 pseudo SVD,其限制了 $U$ 和 $V$ 矩阵在 SVD 中的形式为 ${\pm 1, 0}$ 的三元矩阵。这意味着在计算 $U(\cdot)$ 和 $V(\cdot)$ 时只需要使用加法操作。
  • results: 实验结果表明,TSVD 可以在不同类型的网络和任务中实现当今基eline模型如 ConvNext、Swim、BERT 和大型语言模型 OPT 的状态级压缩性能。
    Abstract We present a simple yet novel parameterized form of linear mapping to achieves remarkable network compression performance: a pseudo SVD called Ternary SVD (TSVD). Unlike vanilla SVD, TSVD limits the $U$ and $V$ matrices in SVD to ternary matrices form in $\{\pm 1, 0\}$. This means that instead of using the expensive multiplication instructions, TSVD only requires addition instructions when computing $U(\cdot)$ and $V(\cdot)$. We provide direct and training transition algorithms for TSVD like Post Training Quantization and Quantization Aware Training respectively. Additionally, we analyze the convergence of the direct transition algorithms in theory. In experiments, we demonstrate that TSVD can achieve state-of-the-art network compression performance in various types of networks and tasks, including current baseline models such as ConvNext, Swim, BERT, and large language model like OPT.
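To make the factorized form concrete, here is a naive post-hoc ternarization of the SVD factors with a least-squares refit of the diagonal, illustrating W ≈ U_t diag(s) V_t^T with U_t, V_t in {-1, 0, +1}. The sparsity threshold and per-component refit are assumptions; the paper's direct and training transition algorithms (and their convergence analysis) are not reproduced here.

```python
import numpy as np

def ternarize(M, sparsity=0.3):
    """Map a real matrix to {-1, 0, +1}: zero out the smallest-magnitude entries, keep signs."""
    thresh = np.quantile(np.abs(M), sparsity)
    T = np.sign(M)
    T[np.abs(M) < thresh] = 0
    return T

def ternary_svd(W, rank):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    Ut, Vt_t = ternarize(U[:, :rank]), ternarize(Vt[:rank, :])
    # Refit the diagonal so Ut @ diag(s_new) @ Vt_t matches W as well as possible,
    # compensating for the scale lost by ternarization (a simple per-component refit).
    s_new = np.zeros(rank)
    for i in range(rank):
        u, v = Ut[:, i], Vt_t[i, :]
        denom = (u @ u) * (v @ v)
        s_new[i] = (u @ W @ v) / denom if denom > 0 else 0.0
    return Ut, s_new, Vt_t

# Example: approximate a random linear layer and measure the relative error.
W = np.random.randn(256, 128)
Ut, s, Vt_t = ternary_svd(W, rank=64)
rel_err = np.linalg.norm(W - Ut @ np.diag(s) @ Vt_t) / np.linalg.norm(W)
print(f"relative reconstruction error: {rel_err:.3f}")
```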

Backpropagation Path Search On Adversarial Transferability

  • paper_url: http://arxiv.org/abs/2308.07625
  • repo_url: None
  • paper_authors: Zhuoer Xu, Zhangxuan Gu, Jianping Zhang, Shiwen Cui, Changhua Meng, Weiqiang Wang
  • for: Deep neural networks are vulnerable to adversarial examples, so a model's robustness must be tested before deployment.
  • methods: Transfer-based attackers craft adversarial examples against surrogate models and transfer them to victim models deployed in a black-box setting; structure-based attackers further adjust the backpropagation path to avoid overfitting the surrogate, but existing ones ignore the convolution module and modify the backpropagation graph heuristically.
  • results: The proposed backPropagation pAth Search (PAS) addresses both issues: SkipConv adjusts the backpropagation path of convolutions via structural reparameterization, and a DAG-based search space with one-step approximation for path evaluation and Bayesian optimization finds the optimal path. Extensive experiments across transfer settings show that PAS greatly improves the attack success rate on both normally trained and defended models.
    Abstract Deep neural networks are vulnerable to adversarial examples, dictating the imperativeness to test the model's robustness before deployment. Transfer-based attackers craft adversarial examples against surrogate models and transfer them to victim models deployed in the black-box situation. To enhance the adversarial transferability, structure-based attackers adjust the backpropagation path to avoid the attack from overfitting the surrogate model. However, existing structure-based attackers fail to explore the convolution module in CNNs and modify the backpropagation graph heuristically, leading to limited effectiveness. In this paper, we propose backPropagation pAth Search (PAS), solving the aforementioned two problems. We first propose SkipConv to adjust the backpropagation path of convolution by structural reparameterization. To overcome the drawback of heuristically designed backpropagation paths, we further construct a DAG-based search space, utilize one-step approximation for path evaluation and employ Bayesian Optimization to search for the optimal path. We conduct comprehensive experiments in a wide range of transfer settings, showing that PAS improves the attack success rate by a huge margin for both normally trained and defense models.

A Multilayer Perceptron-based Fast Sunlight Assessment for the Conceptual Design of Residential Neighborhoods under Chinese Policy

  • paper_url: http://arxiv.org/abs/2308.07616
  • repo_url: None
  • paper_authors: Can Jiang, Xiong Liang, Yu-Cheng Zhou, Yong Tian, Shengli Xu, Jia-Rui Lin, Zhiliang Ma, Shiji Yang, Hao Zhou
  • for: Applies deep learning to accelerate sunlight-hour simulations at the conceptual design stage of residential buildings, reducing computation time and improving design efficiency.
  • methods: A multilayer perceptron (MLP)-based one-stage prediction approach that outputs the shading time interval caused by an input cuboid-form building; the sunlight hours of a site are then obtained from the union of the sunlight intervals (complements of the shading intervals) over all buildings, as the sketch after this entry illustrates.
  • results: Three numerical experiments (horizontal-level and slope analysis, and simulation-based optimization) show that the method reduces computation time to 1/84~1/50 while retaining 96.5%~98% accuracy; a residential-neighborhood layout planning plug-in for Rhino 7/Grasshopper was also developed on top of the proposed model.
    Abstract In Chinese building codes, it is required that residential buildings receive a minimum number of hours of natural, direct sunlight on a specified winter day, which represents the worst sunlight condition in a year. This requirement is a prerequisite for obtaining a building permit during the conceptual design of a residential project. Thus, officially sanctioned software is usually used to assess the sunlight performance of buildings. These software programs predict sunlight hours based on repeated shading calculations, which is time-consuming. This paper proposed a multilayer perceptron-based method, a one-stage prediction approach, which outputs a shading time interval caused by the inputted cuboid-form building. The sunlight hours of a site can be obtained by calculating the union of the sunlight time intervals (complement of shading time interval) of all the buildings. Three numerical experiments, i.e., horizontal level and slope analysis, and simulation-based optimization are carried out; the results show that the method reduces the computation time to 1/84~1/50 with 96.5%~98% accuracies. A residential neighborhood layout planning plug-in for Rhino 7/Grasshopper is also developed based on the proposed model. This paper indicates that deep learning techniques can be adopted to accelerate sunlight hour simulations at the conceptual design phase.
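A small helper makes the one-stage idea concrete: if a predictor (the paper's MLP, represented below by a placeholder callable) maps each cuboid building to the shading interval it casts on a site point, the site's direct-sunlight hours on the assessment day are the statutory window minus the union of those intervals. The window bounds and the clipping rule are assumptions for illustration.

```python
def merge_intervals(intervals):
    """Union of (start_hour, end_hour) intervals within a single day."""
    merged = []
    for s, e in sorted(i for i in intervals if i[1] > i[0]):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

def sunlight_hours(site_point, buildings, predict_shading, day=(8.0, 16.0)):
    """predict_shading(site_point, cuboid) -> (start, end) shading interval in hours,
    e.g. the paper's one-stage MLP (a placeholder callable here).
    Returns direct-sunlight hours within the assessment window `day`."""
    shaded = []
    for cuboid in buildings:
        s, e = predict_shading(site_point, cuboid)
        shaded.append((max(s, day[0]), min(e, day[1])))   # clip to the assessment window
    total_shade = sum(e - s for s, e in merge_intervals(shaded))
    return (day[1] - day[0]) - total_shade
```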

Searching for Novel Chemistry in Exoplanetary Atmospheres using Machine Learning for Anomaly Detection

  • paper_url: http://arxiv.org/abs/2308.07604
  • repo_url: None
  • paper_authors: Roy T. Forestano, Konstantin T. Matchev, Katia Matcheva, Eyup B. Unlu
  • for: Develops fast and efficient machine learning methods for flagging anomalous planets in telescope observations, aiming to find planets with unusual chemical composition and possible biosignatures.
  • methods: Two popular anomaly detection methods are applied: Local Outlier Factor and One-Class Support Vector Machine (a minimal scikit-learn sketch follows this entry).
  • results: Both methods are successfully applied to a large public database of synthetic spectra, and their performance is quantified and compared using ROC curves across several levels of instrumental noise.
    Abstract The next generation of telescopes will yield a substantial increase in the availability of high-resolution spectroscopic data for thousands of exoplanets. The sheer volume of data and number of planets to be analyzed greatly motivate the development of new, fast and efficient methods for flagging interesting planets for reobservation and detailed analysis. We advocate the application of machine learning (ML) techniques for anomaly (novelty) detection to exoplanet transit spectra, with the goal of identifying planets with unusual chemical composition and even searching for unknown biosignatures. We successfully demonstrate the feasibility of two popular anomaly detection methods (Local Outlier Factor and One Class Support Vector Machine) on a large public database of synthetic spectra. We consider several test cases, each with different levels of instrumental noise. In each case, we use ROC curves to quantify and compare the performance of the two ML techniques.
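Both detectors named in the methods bullet are standard scikit-learn estimators, so a compact end-to-end sketch is possible; the random arrays below merely stand in for the synthetic transit spectra (rows = planets, columns = wavelength bins), and the hyperparameters are illustrative rather than the authors' settings.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 50))                       # "regular" training spectra (stand-in)
X_test = np.vstack([rng.normal(size=(900, 50)),             # regular test spectra
                    rng.normal(loc=1.5, size=(100, 50))])   # anomalous chemistry (stand-in)
y_test = np.r_[np.zeros(900), np.ones(100)]                 # 1 = anomaly

scaler = StandardScaler().fit(X_train)
Xtr, Xte = scaler.transform(X_train), scaler.transform(X_test)

lof = LocalOutlierFactor(n_neighbors=35, novelty=True).fit(Xtr)
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(Xtr)

for name, model in [("LOF", lof), ("OC-SVM", ocsvm)]:
    scores = -model.decision_function(Xte)                  # higher = more anomalous
    print(name, "ROC AUC:", round(roc_auc_score(y_test, scores), 3))
```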

Generating Personas for Games with Multimodal Adversarial Imitation Learning

  • paper_url: http://arxiv.org/abs/2308.07598
  • repo_url: None
  • paper_authors: William Ahlberg, Alessandro Sestini, Konrad Tollmar, Linus Gisslén
  • for: Develops an imitation learning approach that generates multiple persona policies for playtesting, so as to model the diverse playstyles of human players.
  • methods: Multimodal Generative Adversarial Imitation Learning (MultiGAIL) learns distinct expert personas within a single-agent model, using an auxiliary input parameter and multiple discriminators as reward models to infer the environment reward.
  • results: Experiments show that MultiGAIL generates multiple distinct persona policies and performs well in two environments with continuous and discrete action spaces.
    Abstract Reinforcement learning has been widely successful in producing agents capable of playing games at a human level. However, this requires complex reward engineering, and the agent's resulting policy is often unpredictable. Going beyond reinforcement learning is necessary to model a wide range of human playstyles, which can be difficult to represent with a reward function. This paper presents a novel imitation learning approach to generate multiple persona policies for playtesting. Multimodal Generative Adversarial Imitation Learning (MultiGAIL) uses an auxiliary input parameter to learn distinct personas using a single-agent model. MultiGAIL is based on generative adversarial imitation learning and uses multiple discriminators as reward models, inferring the environment reward by comparing the agent and distinct expert policies. The reward from each discriminator is weighted according to the auxiliary input. Our experimental analysis demonstrates the effectiveness of our technique in two environments with continuous and discrete action spaces.

High-Probability Risk Bounds via Sequential Predictors

  • paper_url: http://arxiv.org/abs/2308.07588
  • repo_url: None
  • paper_authors: Dirk van der Hoeven, Nikita Zhivotovskiy, Nicolò Cesa-Bianchi
  • for: Shows how online learning methods, which give sequential regret bounds and in-expectation risk bounds under minimal assumptions, can also yield tight high-probability risk bounds in the statistical setting.
  • methods: Online-to-batch conversions applied to general online learning algorithms, combined with a second-order correction to the loss function that defines the regret.
  • results: Nearly optimal high-probability risk bounds for several classical statistical estimation problems, such as discrete distribution estimation, linear regression, logistic regression, and conditional density estimation.
    Abstract Online learning methods yield sequential regret bounds under minimal assumptions and provide in-expectation risk bounds for statistical learning. However, despite the apparent advantage of online guarantees over their statistical counterparts, recent findings indicate that in many important cases, regret bounds may not guarantee tight high-probability risk bounds in the statistical setting. In this work we show that online to batch conversions applied to general online learning algorithms can bypass this limitation. Via a general second-order correction to the loss function defining the regret, we obtain nearly optimal high-probability risk bounds for several classical statistical estimation problems, such as discrete distribution estimation, linear regression, logistic regression, and conditional density estimation. Our analysis relies on the fact that many online learning algorithms are improper, as they are not restricted to use predictors from a given reference class. The improper nature of our estimators enables significant improvements in the dependencies on various problem parameters. Finally, we discuss some computational advantages of our sequential algorithms over their existing batch counterparts.

Temporal Interest Network for Click-Through Rate Prediction

  • paper_url: http://arxiv.org/abs/2308.08487
  • repo_url: https://github.com/shenweichen/DSIN
  • paper_authors: Haolin Zhou, Junwei Pan, Xinyi Zhou, Xihua Chen, Jie Jiang, Xiaofeng Gao, Guihai Chen
  • for: Click-through rate (CTR) prediction, where the authors study the quadruple correlation in user behavior history (behavior semantics, target semantics, behavior temporal, target temporal) and its effect on performance.
  • methods: The proposed Temporal Interest Network (TIN) combines semantic embedding with target-aware temporal encoding, and uses target-aware attention and target-aware representation to explicitly conduct the 4-way interaction (a schematic follows this entry).
  • results: Existing methods fail to learn this quadruple correlation, especially the temporal part, whereas TIN captures it effectively; on the Amazon and Alibaba datasets, TIN outperforms the best-performing baselines by 0.43% and 0.29%, respectively.
    Abstract The history of user behaviors constitutes one of the most significant characteristics in predicting the click-through rate (CTR), owing to their strong semantic and temporal correlation with the target item. While the literature has individually examined each of these correlations, research has yet to analyze them in combination, that is, the quadruple correlation of (behavior semantics, target semantics, behavior temporal, and target temporal). The effect of this correlation on performance and the extent to which existing methods learn it remain unknown. To address this gap, we empirically measure the quadruple correlation and observe intuitive yet robust quadruple patterns. We measure the learned correlation of several representative user behavior methods, but to our surprise, none of them learn such a pattern, especially the temporal one. In this paper, we propose the Temporal Interest Network (TIN) to capture the quadruple semantic and temporal correlation between behaviors and the target. We achieve this by incorporating target-aware temporal encoding, in addition to semantic embedding, to represent behaviors and the target. Furthermore, we deploy target-aware attention, along with target-aware representation, to explicitly conduct the 4-way interaction. We performed comprehensive evaluations on the Amazon and Alibaba datasets. Our proposed TIN outperforms the best-performing baselines by 0.43\% and 0.29\% on two datasets, respectively. Comprehensive analysis and visualization show that TIN is indeed capable of learning the quadruple correlation effectively, while all existing methods fail to do so. We provide our implementation of TIN in Tensorflow.
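The listed repository is DSIN rather than TIN itself, so the module below is only a schematic of the two ingredients named in the abstract: target-aware temporal encoding (an embedding of each behavior's position relative to the target, added to its semantic embedding) and target-aware attention/representation (the target embedding acts as the query and interacts with the pooled interest). Embedding sizes, the scaled dot-product form, and the final logit are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetAwareInterest(nn.Module):
    def __init__(self, n_items, n_positions, dim=32):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)      # behavior / target semantics
        self.pos_emb = nn.Embedding(n_positions, dim)   # position of a behavior relative to the target time
        self.dim = dim

    def forward(self, behaviours, rel_positions, target):
        """behaviours, rel_positions: (B, L) int tensors; target: (B,) int tensor."""
        h = self.item_emb(behaviours) + self.pos_emb(rel_positions)   # target-aware temporal encoding
        t = self.item_emb(target)                                     # (B, dim)
        attn = F.softmax((h * t.unsqueeze(1)).sum(-1) / self.dim ** 0.5, dim=1)  # target as the query
        # Target-aware representation: weight each encoded behavior by its attention to the
        # target, then interact the pooled interest with the target embedding once more.
        interest = (attn.unsqueeze(-1) * h).sum(dim=1)                # (B, dim)
        return (interest * t).sum(-1)                                 # one logit per example
```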

IoT Data Trust Evaluation via Machine Learning

  • paper_url: http://arxiv.org/abs/2308.11638
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Timothy Tadj, Reza Arablouei, Volkan Dedeoglu
  • for: This paper aims to address the lack of publicly-available datasets for evaluating the trustworthiness of IoT data by proposing a data synthesis method called random walk infilling (RWI) to augment existing trustworthy datasets with untrustworthy data.
  • methods: The proposed method uses RWI to generate untrustworthy data from existing trustworthy datasets (a sketch follows this entry), and extracts new features from IoT time-series sensor data that capture its auto-correlation and cross-correlation with neighboring sensors. These features are used to learn ML models for recognizing the trustworthiness of IoT sensor data.
  • results: The proposed method outperforms existing ML-based approaches to IoT data trust evaluation, and a semi-supervised ML approach that requires only about 10% of the data labeled offers competitive performance while being more practical. The results also show that the ML models learned from datasets augmented via RWI generalize well to unseen data.
    Abstract Various approaches based on supervised or unsupervised machine learning (ML) have been proposed for evaluating IoT data trust. However, assessing their real-world efficacy is hard mainly due to the lack of related publicly-available datasets that can be used for benchmarking. Since obtaining such datasets is challenging, we propose a data synthesis method, called random walk infilling (RWI), to augment IoT time-series datasets by synthesizing untrustworthy data from existing trustworthy data. Thus, RWI enables us to create labeled datasets that can be used to develop and validate ML models for IoT data trust evaluation. We also extract new features from IoT time-series sensor data that effectively capture its auto-correlation as well as its cross-correlation with the data of the neighboring (peer) sensors. These features can be used to learn ML models for recognizing the trustworthiness of IoT sensor data. Equipped with our synthesized ground-truth-labeled datasets and informative correlation-based feature, we conduct extensive experiments to critically examine various approaches to evaluating IoT data trust via ML. The results reveal that commonly used ML-based approaches to IoT data trust evaluation, which rely on unsupervised cluster analysis to assign trust labels to unlabeled data, perform poorly. This poor performance can be attributed to the underlying unsubstantiated assumption that clustering provides reliable labels for data trust, a premise that is found to be untenable. The results also show that the ML models learned from datasets augmented via RWI while using the proposed features generalize well to unseen data and outperform existing related approaches. Moreover, we observe that a semi-supervised ML approach that requires only about 10% of the data labeled offers competitive performance while being practically more appealing compared to the fully-supervised approaches.
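RWI is only described at a high level in the abstract, so the function below is one plausible reading, not the authors' procedure: cut a window out of a trustworthy sensor series and replace it with a random walk bridged between the surrounding real samples, yielding a labeled untrustworthy segment. The bridge construction and noise scale are assumptions.

```python
import numpy as np

def random_walk_infill(series, start, length, noise_scale=None, rng=None):
    """Copy `series` and replace its [start, start+length) window with a random walk
    that still connects to the surrounding trustworthy samples."""
    if rng is None:
        rng = np.random.default_rng()
    out = np.asarray(series, dtype=float).copy()
    if noise_scale is None:
        noise_scale = np.std(np.diff(out))                 # step size matched to the real signal
    steps = rng.normal(0.0, noise_scale, size=length).cumsum()
    # Brownian-bridge style correction so the walk starts and ends near the real neighbours.
    left = out[start - 1]
    right = out[min(start + length, len(out) - 1)]
    drift = np.linspace(left, right, length + 2)[1:-1]
    out[start:start + length] = drift + steps - np.linspace(0, steps[-1], length)
    return out

# Example: poison one window of a clean sensor trace and keep per-sample trust labels.
t = np.linspace(0, 10, 500)
clean = 20 + 3 * np.sin(t) + np.random.default_rng(1).normal(0, 0.1, t.size)
poisoned = random_walk_infill(clean, start=200, length=100)
labels = np.ones(t.size)
labels[200:300] = 0            # 1 = trustworthy, 0 = untrustworthy (synthesised via RWI)
```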

Story Visualization by Online Text Augmentation with Context Memory

  • paper_url: http://arxiv.org/abs/2308.07575
  • repo_url: https://github.com/yonseivnl/cmota
  • paper_authors: Daechul Ahn, Daneul Kim, Gwangmo Song, Seung Hwan Kim, Honglak Lee, Dongyeop Kang, Jonghyun Choi
  • for: Improving robustness to language variation in the story visualization (text-to-image generation) task.
  • methods: A memory architecture for the bi-directional Transformer framework with online text augmentation that generates multiple pseudo-descriptions as supplementary supervision during training.
  • results: On the two popular SV benchmarks Pororo-SV and Flintstones-SV, the proposed method outperforms the state of the art on several metrics, including FID, character F1, frame accuracy, BLEU-2/3, and R-precision, with similar or less computational complexity.
    Abstract Story visualization (SV) is a challenging text-to-image generation task for the difficulty of not only rendering visual details from the text descriptions but also encoding a long-term context across multiple sentences. While prior efforts mostly focus on generating a semantically relevant image for each sentence, encoding a context spread across the given paragraph to generate contextually convincing images (e.g., with a correct character or with a proper background of the scene) remains a challenge. To this end, we propose a novel memory architecture for the Bi-directional Transformer framework with an online text augmentation that generates multiple pseudo-descriptions as supplementary supervision during training for better generalization to the language variation at inference. In extensive experiments on the two popular SV benchmarks, i.e., the Pororo-SV and Flintstones-SV, the proposed method significantly outperforms the state of the arts in various metrics including FID, character F1, frame accuracy, BLEU-2/3, and R-precision with similar or less computational complexity.

Synthetic data generation method for hybrid image-tabular data using two generative adversarial networks

  • paper_url: http://arxiv.org/abs/2308.07573
  • repo_url: None
  • paper_authors: Tomohiro Kikuchi, Shouhei Hanaoka, Takahiro Nakao, Tomomi Takenaga, Yukihiro Nomura, Harushi Mori, Takeharu Yoshikawa
  • for: Provides a new method for generating synthetic hybrid medical records of chest X-ray images (CXRs) and structured tabular data (anthropometric data and laboratory tests), addressing privacy concerns and enabling data sharing in medicine.
  • methods: An auto-encoding GAN (αGAN) trained on a large public database (pDB) reduces the dimensionality of CXRs; its encoder is then applied to images in the original database (oDB) to obtain latent vectors, which are combined with the tabular data to train a conditional tabular GAN (CTGAN).
  • results: Diverse synthetic records of paired CXR and tabular data are generated while preserving the correspondence between them, as assessed by visual inspection, the distribution of inter-record distances, and classification tasks.
    Abstract The generation of synthetic medical records using generative adversarial networks (GANs) has become increasingly important for addressing privacy concerns and promoting data sharing in the medical field. In this paper, we propose a novel method for generating synthetic hybrid medical records consisting of chest X-ray images (CXRs) and structured tabular data (including anthropometric data and laboratory tests) using an auto-encoding GAN ({\alpha}GAN) and a conditional tabular GAN (CTGAN). Our approach involves training a {\alpha}GAN model on a large public database (pDB) to reduce the dimensionality of CXRs. We then applied the trained encoder of the GAN model to the images in original database (oDB) to obtain the latent vectors. These latent vectors were combined with tabular data in oDB, and these joint data were used to train the CTGAN model. We successfully generated diverse synthetic records of hybrid CXR and tabular data, maintaining correspondence between them. We evaluated this synthetic database (sDB) through visual assessment, distribution of interrecord distances, and classification tasks. Our evaluation results showed that the sDB captured the features of the oDB while maintaining the correspondence between the images and tabular data. Although our approach relies on the availability of a large-scale pDB containing a substantial number of images with the same modality and imaging region as those in the oDB, this method has the potential for the public release of synthetic datasets without compromising the secondary use of data.

Ske2Grid: Skeleton-to-Grid Representation Learning for Action Recognition

  • paper_url: http://arxiv.org/abs/2308.07571
  • repo_url: https://github.com/osvai/ske2grid
  • paper_authors: Dongqi Cai, Yangyuxuan Kang, Anbang Yao, Yurong Chen
  • for: Proposes Ske2Grid, a representation learning framework for skeleton-based action recognition that improves recognition accuracy.
  • methods: Three novel designs: a graph-node index transform (GIT) that constructs a fixed-size grid patch from the skeleton graph, an up-sampling transform (UPT) that interpolates skeleton nodes to fill the grid patch, and a progressive learning strategy (PLS) that decouples the UPT into multiple steps to avoid an overly aggressive one-step up-sampling and to progressively strengthen the representation.
  • results: Experiments on six mainstream skeleton-based action recognition datasets show that Ske2Grid significantly outperforms existing GCN-based solutions without bells and whistles; code and models are available at https://github.com/OSVAI/Ske2Grid.
    Abstract This paper presents Ske2Grid, a new representation learning framework for improved skeleton-based action recognition. In Ske2Grid, we define a regular convolution operation upon a novel grid representation of human skeleton, which is a compact image-like grid patch constructed and learned through three novel designs. Specifically, we propose a graph-node index transform (GIT) to construct a regular grid patch through assigning the nodes in the skeleton graph one by one to the desired grid cells. To ensure that GIT is a bijection and enrich the expressiveness of the grid representation, an up-sampling transform (UPT) is learned to interpolate the skeleton graph nodes for filling the grid patch to the full. To resolve the problem when the one-step UPT is aggressive and further exploit the representation capability of the grid patch with increasing spatial size, a progressive learning strategy (PLS) is proposed which decouples the UPT into multiple steps and aligns them to multiple paired GITs through a compact cascaded design learned progressively. We construct networks upon prevailing graph convolution networks and conduct experiments on six mainstream skeleton-based action recognition datasets. Experiments show that our Ske2Grid significantly outperforms existing GCN-based solutions under different benchmark settings, without bells and whistles. Code and models are available at https://github.com/OSVAI/Ske2Grid

Semi-Supervised Learning with Multiple Imputations on Non-Random Missing Labels

  • paper_url: http://arxiv.org/abs/2308.07562
  • repo_url: None
  • paper_authors: Jason Lu, Michael Ma, Huaze Xu, Zixi Xu
  • for: Addresses the three main missing-label regimes in semi-supervised learning (SSL): missing at random (MAR), missing completely at random (MCAR), and missing not at random (MNAR).
  • methods: Two new ways of combining multiple imputation models to improve accuracy and reduce bias: (1) use several imputation models, build confidence intervals, and apply a threshold to ignore low-confidence pseudo-labels (sketched after this entry); (2) SSL with De-biased Imputations (SSL-DI), which filters out inaccurate data to find an accurate, reliable subset that is then imputed into another SSL model to reduce bias.
  • results: Experiments show that the proposed methods reduce bias effectively in both MCAR and MNAR situations and outperform existing methods in classification accuracy.
    Abstract Semi-Supervised Learning (SSL) is implemented when algorithms are trained on both labeled and unlabeled data. This is a very common application of ML as it is unrealistic to obtain a fully labeled dataset. Researchers have tackled three main issues: missing at random (MAR), missing completely at random (MCAR), and missing not at random (MNAR). The MNAR problem is the most challenging of the three as one cannot safely assume that all class distributions are equal. Existing methods, including Class-Aware Imputation (CAI) and Class-Aware Propensity (CAP), mostly overlook the non-randomness in the unlabeled data. This paper proposes two new methods of combining multiple imputation models to achieve higher accuracy and less bias. 1) We use multiple imputation models, create confidence intervals, and apply a threshold to ignore pseudo-labels with low confidence. 2) Our new method, SSL with De-biased Imputations (SSL-DI), aims to reduce bias by filtering out inaccurate data and finding a subset that is accurate and reliable. This subset of the larger dataset could be imputed into another SSL model, which will be less biased. The proposed models have been shown to be effective in both MCAR and MNAR situations, and experimental results show that our methodology outperforms existing methods in terms of classification accuracy and reducing bias.
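A schematic of the first method in the bullets (multiple imputation models combined with a confidence threshold). The agreement rule, the ensemble averaging, and the threshold value are assumptions, and SSL-DI's de-biasing filter is not shown.

```python
import numpy as np

def confident_pseudo_labels(prob_list, threshold=0.9):
    """prob_list: list of (N, C) class-probability arrays, one per imputation model.
    Returns pseudo-labels (int array) and a boolean mask of examples confident enough to keep."""
    probs = np.stack(prob_list)                               # (M, N, C)
    mean_p = probs.mean(axis=0)                               # ensemble mean probability
    labels = mean_p.argmax(axis=1)
    agree = (probs.argmax(axis=2) == labels).all(axis=0)      # every model picks the same class
    confident = mean_p.max(axis=1) >= threshold               # and the ensemble is confident
    return labels, agree & confident

# Usage: fit M models on the labelled subset, predict on the unlabelled pool, then add only
# (X_unlab[mask], labels[mask]) to the next round of semi-supervised training.
```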

A User-Centered Evaluation of Spanish Text Simplification

  • paper_url: http://arxiv.org/abs/2308.07556
  • repo_url: None
  • paper_authors: Adrian de Wynter, Anthony Hevia, Si-Qing Chen
  • for: Evaluates Spanish text simplification (TS) for a production system, using two corpora focused on complex-sentence and complex-word identification.
  • methods: Compares the most prevalent Spanish-specific readability scores with neural networks, finding that the latter are consistently better at predicting user preferences regarding TS.
  • results: Multilingual models underperform equivalent Spanish-only models on the same task, and all models focus too often on spurious statistical features such as sentence length.
    Abstract We present an evaluation of text simplification (TS) in Spanish for a production system, by means of two corpora focused in both complex-sentence and complex-word identification. We compare the most prevalent Spanish-specific readability scores with neural networks, and show that the latter are consistently better at predicting user preferences regarding TS. As part of our analysis, we find that multilingual models underperform against equivalent Spanish-only models on the same task, yet all models focus too often on spurious statistical features, such as sentence length. We release the corpora in our evaluation to the broader community with the hopes of pushing forward the state-of-the-art in Spanish natural language processing.

Enhancing the Antidote: Improved Pointwise Certifications against Poisoning Attacks

  • paper_url: http://arxiv.org/abs/2308.07553
  • repo_url: None
  • paper_authors: Shijie Liu, Andrew C. Cullen, Paul Montague, Sarah M. Erfani, Benjamin I. P. Rubinstein
  • for: Certifying model behavior against poisoning attacks that modify the training corpus.
  • methods: Exploits differential privacy and the Sampled Gaussian Mechanism to guarantee that each test instance's prediction is invariant to a finite number of poisoned training examples (pointwise certification).
  • results: Provides adversarial robustness guarantees more than twice as large as those of prior certifications.
    Abstract Poisoning attacks can disproportionately influence model behaviour by making small changes to the training corpus. While defences against specific poisoning attacks do exist, they in general do not provide any guarantees, leaving them potentially countered by novel attacks. In contrast, by examining worst-case behaviours Certified Defences make it possible to provide guarantees of the robustness of a sample against adversarial attacks modifying a finite number of training samples, known as pointwise certification. We achieve this by exploiting both Differential Privacy and the Sampled Gaussian Mechanism to ensure the invariance of prediction for each testing instance against finite numbers of poisoned examples. In doing so, our model provides guarantees of adversarial robustness that are more than twice as large as those provided by prior certifications.

Domain Adaptation via Minimax Entropy for Real/Bogus Classification of Astronomical Alerts

  • paper_url: http://arxiv.org/abs/2308.07538
  • repo_url: None
  • paper_authors: Guillermo Cabrera-Vives, César Bolivar, Francisco Förster, Alejandra M. Muñoz Arancibia, Manuel Pérez-Carrasco, Esteban Reyes
  • for: Studies domain adaptation (DA) for the real/bogus classification of astronomical alerts, aimed at real-time analysis of massive astronomical data streams.
  • methods: Uses four datasets (HiTS, DES, ATLAS, and ZTF) to study the domain shift between them, and improves a naive deep learning classifier via fine-tuning and semi-supervised deep DA with Minimax Entropy (MME); a sketch of the MME loss follows this entry.
  • results: Both the fine-tuned and MME models significantly improve the base model on the target dataset with as few as one labeled item per class, and MME does so without compromising performance on the source dataset.
    Abstract Time domain astronomy is advancing towards the analysis of multiple massive datasets in real time, prompting the development of multi-stream machine learning models. In this work, we study Domain Adaptation (DA) for real/bogus classification of astronomical alerts using four different datasets: HiTS, DES, ATLAS, and ZTF. We study the domain shift between these datasets, and improve a naive deep learning classification model by using a fine tuning approach and semi-supervised deep DA via Minimax Entropy (MME). We compare the balanced accuracy of these models for different source-target scenarios. We find that both the fine tuning and MME models improve significantly the base model with as few as one labeled item per class coming from the target dataset, but that the MME does not compromise its performance on the source dataset.
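Minimax Entropy (MME) is an existing semi-supervised DA method: the classifier head maximizes the prediction entropy of unlabeled target examples while the feature extractor minimizes it, and a gradient-reversal layer lets both be expressed as a single minimized loss. The sketch below shows that loss for a real/bogus (two-class) alert classifier; the temperature, the λ weight, and the split into `features`/`classifier` modules are assumptions.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

def mme_step(features, classifier, x_src, y_src, x_tgt, lam=0.1, T=0.05):
    """features: CNN stamp encoder; classifier: linear/cosine head over 2 classes (real/bogus)."""
    # Supervised loss on labelled source alerts.
    loss_cls = F.cross_entropy(classifier(features(x_src)), y_src)
    # Entropy term on unlabelled target alerts: minimizing the negative entropy makes the
    # classifier maximize prediction entropy, while the reversed gradient makes the
    # feature extractor minimize it (the minimax game).
    f_tgt = GradReverse.apply(features(x_tgt), 1.0)
    p_tgt = F.softmax(classifier(f_tgt) / T, dim=1)
    neg_entropy = (p_tgt * torch.log(p_tgt + 1e-8)).sum(dim=1).mean()
    return loss_cls + lam * neg_entropy
```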

KMF: Knowledge-Aware Multi-Faceted Representation Learning for Zero-Shot Node Classification

  • paper_url: http://arxiv.org/abs/2308.08563
  • repo_url: None
  • paper_authors: Likang Wu, Junji Jiang, Hongke Zhao, Hao Wang, Defu Lian, Mengdi Zhang, Enhong Chen
  • for: zero-shot node classification (ZNC) task in graph data analysis, to predict nodes from unseen classes
  • methods: Knowledge-Aware Multi-Faceted (KMF) framework that enhances label semantics via extracted KG-based topics, and reconstructs node content to a topic-level representation
  • results: extensive experiments on several public graph datasets, demonstrating effectiveness and generalization of KMF compared to state-of-the-art baselines, and an application of zero-shot cross-domain recommendation.
    Abstract Recently, Zero-Shot Node Classification (ZNC) has been an emerging and crucial task in graph data analysis. This task aims to predict nodes from unseen classes which are unobserved in the training process. Existing work mainly utilizes Graph Neural Networks (GNNs) to associate features' prototypes and labels' semantics thus enabling knowledge transfer from seen to unseen classes. However, the multi-faceted semantic orientation in the feature-semantic alignment has been neglected by previous work, i.e. the content of a node usually covers diverse topics that are relevant to the semantics of multiple labels. It's necessary to separate and judge the semantic factors that tremendously affect the cognitive ability to improve the generality of models. To this end, we propose a Knowledge-Aware Multi-Faceted framework (KMF) that enhances the richness of label semantics via the extracted KG (Knowledge Graph)-based topics. And then the content of each node is reconstructed to a topic-level representation that offers multi-faceted and fine-grained semantic relevancy to different labels. Due to the particularity of the graph's instance (i.e., node) representation, a novel geometric constraint is developed to alleviate the problem of prototype drift caused by node information aggregation. Finally, we conduct extensive experiments on several public graph datasets and design an application of zero-shot cross-domain recommendation. The quantitative results demonstrate both the effectiveness and generalization of KMF with the comparison of state-of-the-art baselines.

Projection-Free Methods for Stochastic Simple Bilevel Optimization with Convex Lower-level Problem

  • paper_url: http://arxiv.org/abs/2308.07536
  • repo_url: None
  • paper_authors: Jincheng Cao, Ruichen Jiang, Nazanin Abolfazli, Erfan Yazdandoost Hamedani, Aryan Mokhtari
  • for: Studies a class of stochastic bilevel optimization problems, known as stochastic simple bilevel optimization, where a smooth stochastic objective is minimized over the optimal solution set of another stochastic convex optimization problem.
  • methods: Proposes novel stochastic bilevel methods that locally approximate the lower-level solution set via a stochastic cutting plane and then run a conditional gradient update with variance reduction to control the error induced by stochastic gradients.
  • results: For a convex upper-level function, the method needs $\tilde{\mathcal{O}}(\max\{1/\epsilon_f^{2},1/\epsilon_g^{2}\})$ stochastic oracle queries to reach a solution that is $\epsilon_f$-optimal for the upper level and $\epsilon_g$-optimal for the lower level, improving on the previous best $\mathcal{O}(\max\{1/\epsilon_f^{4},1/\epsilon_g^{4}\})$; for a non-convex upper-level function, at most $\tilde{\mathcal{O}}(\max\{1/\epsilon_f^{3},1/\epsilon_g^{3}\})$ queries suffice to find an $(\epsilon_f,\epsilon_g)$-stationary point. In the finite-sum setting, the method requires $\tilde{\mathcal{O}}(\sqrt{n}/\epsilon)$ and $\tilde{\mathcal{O}}(\sqrt{n}/\epsilon^{2})$ oracle calls in the convex and non-convex cases, respectively, where $\epsilon=\min\{\epsilon_f,\epsilon_g\}$.
    Abstract In this paper, we study a class of stochastic bilevel optimization problems, also known as stochastic simple bilevel optimization, where we minimize a smooth stochastic objective function over the optimal solution set of another stochastic convex optimization problem. We introduce novel stochastic bilevel optimization methods that locally approximate the solution set of the lower-level problem via a stochastic cutting plane, and then run a conditional gradient update with variance reduction techniques to control the error induced by using stochastic gradients. For the case that the upper-level function is convex, our method requires $\tilde{\mathcal{O}}(\max\{1/\epsilon_f^{2},1/\epsilon_g^{2}\})$ stochastic oracle queries to obtain a solution that is $\epsilon_f$-optimal for the upper-level and $\epsilon_g$-optimal for the lower-level. This guarantee improves the previous best-known complexity of $\mathcal{O}(\max\{1/\epsilon_f^{4},1/\epsilon_g^{4}\})$. Moreover, for the case that the upper-level function is non-convex, our method requires at most $\tilde{\mathcal{O}}(\max\{1/\epsilon_f^{3},1/\epsilon_g^{3}\})$ stochastic oracle queries to find an $(\epsilon_f, \epsilon_g)$-stationary point. In the finite-sum setting, we show that the number of stochastic oracle calls required by our method are $\tilde{\mathcal{O}}(\sqrt{n}/\epsilon)$ and $\tilde{\mathcal{O}}(\sqrt{n}/\epsilon^{2})$ for the convex and non-convex settings, respectively, where $\epsilon=\min \{\epsilon_f,\epsilon_g\}$.

Inverse Lithography Physics-informed Deep Neural Level Set for Mask Optimization

  • paper_url: http://arxiv.org/abs/2308.12299
  • repo_url: None
  • paper_authors: Xing-Yu Ma, Shaogang Hao
  • for: Proposes an inverse lithography physics-informed deep neural level set (ILDLS) approach for mask optimization.
  • methods: Incorporates inverse lithography physics into a deep learning (DL) framework by using level set-based inverse lithography technology (ILT) as a layer, within which mask prediction and correction are carried out iteratively.
  • results: ILDLS significantly enhances printability and the process window (PW) compared with pure DL and ILT, while reducing computation time by a few orders of magnitude versus ILT, providing an efficient mask optimization solution.
    Abstract As the feature size of integrated circuits continues to decrease, optical proximity correction (OPC) has emerged as a crucial resolution enhancement technology for ensuring high printability in the lithography process. Recently, level set-based inverse lithography technology (ILT) has drawn considerable attention as a promising OPC solution, showcasing its powerful pattern fidelity, especially in advanced process. However, massive computational time consumption of ILT limits its applicability to mainly correcting partial layers and hotspot regions. Deep learning (DL) methods have shown great potential in accelerating ILT. However, lack of domain knowledge of inverse lithography limits the ability of DL-based algorithms in process window (PW) enhancement and etc. In this paper, we propose an inverse lithography physics-informed deep neural level set (ILDLS) approach for mask optimization. This approach utilizes level set based-ILT as a layer within the DL framework and iteratively conducts mask prediction and correction to significantly enhance printability and PW in comparison with results from pure DL and ILT. With this approach, computation time is reduced by a few orders of magnitude versus ILT. By gearing up DL with knowledge of inverse lithography physics, ILDLS provides a new and efficient mask optimization solution.

FeatGeNN: Improving Model Performance for Tabular Data with Correlation-based Feature Extraction

  • paper_url: http://arxiv.org/abs/2308.07527
  • repo_url: None
  • paper_authors: Sammuel Ramos Silva, Rodrigo Silva
  • for: 提高机器学习模型性能和统计分析中获取更多信息
  • methods: 使用 corr 函数作为池化函数,从数据矩阵中提取和创建新特征
  • results: 在多个标准测试集上比较 FeatGeNN 与现有 AutoFE 方法,显示 FeatGeNN 可以提高模型性能。 corr 函数可以作为 tabular 数据中 pooling 函数的有力的替代方案。
    Abstract Automated Feature Engineering (AutoFE) has become an important task for any machine learning project, as it can help improve model performance and gain more information for statistical analysis. However, most current approaches for AutoFE rely on manual feature creation or use methods that can generate a large number of features, which can be computationally intensive and lead to overfitting. To address these challenges, we propose a novel convolutional method called FeatGeNN that extracts and creates new features using correlation as a pooling function. Unlike traditional pooling functions like max-pooling, correlation-based pooling considers the linear relationship between the features in the data matrix, making it more suitable for tabular data. We evaluate our method on various benchmark datasets and demonstrate that FeatGeNN outperforms existing AutoFE approaches regarding model performance. Our results suggest that correlation-based pooling can be a promising alternative to max-pooling for AutoFE in tabular data applications.
    摘要 自动Feature工程(AutoFE)已成为机器学习项目中重要的任务,因为它可以提高模型性能并提供更多的统计分析信息。然而,现有的AutoFE方法大多依赖于手动创建特征或使用生成大量特征的方法,这可能会占用大量计算资源并导致过拟合。为解决这些挑战,我们提出了一种新的卷积方法 called FeatGeNN,它通过对数据矩阵中特征之间的相关性进行抽象,从而生成新的特征。与传统的最大值抽取函数不同,相关性基于的抽取函数更适合逻辑数据。我们在多个 benchmark 数据集上评估了我们的方法,并证明了 FeatGeNN 在AutoFE中超过了现有的方法。我们的结果表明,相关性基于的抽取函数可以成为 tabular 数据应用中 AutoFE 中的一个有前途的代替方案。
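
To illustrate the correlation-as-pooling idea, the NumPy sketch below pools each window of adjacent features with a correlation-weighted average; this is one plausible reading of the mechanism, for illustration only, and not the authors' implementation.

```python
import numpy as np

def corr_pool(X: np.ndarray, window: int = 3, stride: int = 1) -> np.ndarray:
    """One plausible correlation-based pooling for a tabular matrix X of shape (n_samples, n_features).

    For every window of adjacent features, each feature is weighted by how strongly it
    correlates (on average, in absolute value) with the other features in the window,
    and each sample's window is pooled by the weighted mean. Max-pooling would instead
    take X[:, lo:hi].max(axis=1).
    """
    n, d = X.shape
    pooled = []
    for lo in range(0, d - window + 1, stride):
        W = X[:, lo:lo + window]                         # (n_samples, window)
        C = np.abs(np.corrcoef(W, rowvar=False))         # (window, window) Pearson correlations
        weights = (C.sum(axis=0) - 1.0) / (window - 1)   # mean |corr| with the other features
        weights = weights / (weights.sum() + 1e-12)
        pooled.append(W @ weights)                       # correlation-weighted pooling per sample
    return np.stack(pooled, axis=1)

# Toy usage: 100 samples, 8 features -> 6 pooled features with window=3, stride=1.
X = np.random.default_rng(0).normal(size=(100, 8))
print(corr_pool(X).shape)  # (100, 6)
```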

Potential of Deep Operator Networks in Digital Twin-enabling Technology for Nuclear System

  • paper_url: http://arxiv.org/abs/2308.07523
  • repo_url: None
  • paper_authors: Kazuma Kobayashi, Syed Bahauddin Alam
  • for: 这个研究旨在提出一种可靠且高精度的数据零层级模型(DeepONet),用于数位双(DT)系统中的核工程应用。
  • methods: 这个研究使用DeepONet方法,将函数作为输入数据,从训练数据中构建了算子G。
  • results: DeepONet方法在解决困难的粒子运输问题上展现了杰出的预测精度,比传统机器学习方法更高。
    Abstract This research introduces the Deep Operator Network (DeepONet) as a robust surrogate modeling method within the context of digital twin (DT) systems for nuclear engineering. With the increasing importance of nuclear energy as a carbon-neutral solution, adopting DT technology has become crucial to enhancing operational efficiencies, safety, and predictive capabilities in nuclear engineering applications. DeepONet exhibits remarkable prediction accuracy, outperforming traditional ML methods. Through extensive benchmarking and evaluation, this study showcases the scalability and computational efficiency of DeepONet in solving a challenging particle transport problem. By taking functions as input data and constructing the operator $G$ from training data, DeepONet can handle diverse and complex scenarios effectively. However, the application of DeepONet also reveals challenges related to optimal sensor placement and model evaluation, critical aspects of real-world implementation. Addressing these challenges will further enhance the method's practicality and reliability. Overall, DeepONet presents a promising and transformative tool for nuclear engineering research and applications. Its accurate prediction and computational efficiency capabilities can revolutionize DT systems, advancing nuclear engineering research. This study marks an important step towards harnessing the power of surrogate modeling techniques in critical engineering domains.
    摘要 这项研究介绍了深度操作网络(DeepONet)作为核动力工程中数字双(DT)系统中的稳定和准确的模拟方法。随着核能作为碳中和解方案的重要性增加,采用DT技术已经成为核动力工程应用中提高操作效率、安全性和预测能力的关键。DeepONet在多个核心问题上表现出了惊人的预测精度,超越传统机器学习方法。通过广泛的 benchmarking 和评估,这项研究展示了 DeepONet 在解决复杂的粒子传输问题时的扩展性和计算效率。通过将函数作为输入数据,从训练数据中构建操作符 $G$,DeepONet 可以有效地处理多样化和复杂的场景。然而, DeepONet 的应用也暴露了优化传感器布局和模型评估的挑战,这些挑战在实际应用中是关键的。解决这些挑战将进一步提高 DeepONet 的实用性和可靠性。总的来说,DeepONet 提供了核动力工程研究和应用中的一个可靠和转型的工具。它的准确预测和计算效率能力可以对 DT 系统进行革命性的改进,推动核动力工程研究。这项研究标志着使用表达式模拟技术在关键工程领域的应用的重要一步。
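
A minimal NumPy sketch of the DeepONet structure the abstract refers to: a branch net encodes the input function sampled at fixed sensors, a trunk net encodes the query coordinate, and the operator output G(u)(y) is their inner product. The two-layer nets, widths, and sensor count are illustrative choices, and the weights are left untrained.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

def mlp(sizes):
    """Random (untrained) MLP weights; training would fit them to operator data."""
    return [(rng.normal(size=(a, b)) / np.sqrt(a), np.zeros(b)) for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = relu(x)
    return x

m, p = 50, 32                        # number of input-function sensors, latent width
branch = mlp([m, 64, p])             # encodes u(x_1), ..., u(x_m)
trunk  = mlp([1, 64, p])             # encodes a query coordinate y

def deeponet(u_sensors, y):
    """G(u)(y) ~ <branch(u), trunk(y)>: the learned operator evaluated at y."""
    b = forward(branch, u_sensors)          # (p,)
    t = forward(trunk, np.atleast_2d(y))    # (1, p)
    return (t @ b).item()

u = np.sin(np.linspace(0, np.pi, m))        # an input function sampled at the sensors
print(deeponet(u, y=np.array([0.3])))       # untrained output; training minimizes MSE to G(u)(y)
```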

Nonlinearity, Feedback and Uniform Consistency in Causal Structural Learning

  • paper_url: http://arxiv.org/abs/2308.07520
  • repo_url: None
  • paper_authors: Shuyan Wang
  • for: 这个论文的目的是找到自动搜索方法,用于从观察数据中学习 causal structure。
  • methods: 这个论文使用的方法包括 modifying Strong Faithfulness 和 relaxing sufficiency assumption,以扩展 causal discovery 方法的应用范围。
  • results: 这个论文的研究结果表明,通过 relaxing 简化假设,可以扩展 causal discovery 方法的应用范围,并且可以学习 causal structure with latent variables。
    Abstract The goal of Causal Discovery is to find automated search methods for learning causal structures from observational data. In some cases all variables of the causal mechanism of interest are measured, and the task is to predict the effects one measured variable has on another. In contrast, sometimes the variables of primary interest are not directly observable but instead inferred from their manifestations in the data. These are referred to as latent variables. One commonly known example is the psychological construct of intelligence, which cannot be directly measured, so researchers try to assess it through various indicators such as IQ tests. In this case, causal discovery algorithms can uncover underlying patterns and structures to reveal the causal connections between the latent variables and between the latent and observed variables. This thesis focuses on two questions in causal discovery: providing an alternative definition of k-Triangle Faithfulness that (i) is weaker than strong faithfulness when applied to the Gaussian family of distributions, (ii) can be applied to non-Gaussian families of distributions, and (iii) under the assumption that the modified version of Strong Faithfulness holds, can be used to show the uniform consistency of a modified causal discovery algorithm; and relaxing the sufficiency assumption to learn causal structures with latent variables. Given the importance of inferring cause-and-effect relationships for understanding and forecasting complex systems, the work in this thesis of relaxing various simplification assumptions is expected to extend causal discovery methods to be applicable in a wider range of settings with diversified causal mechanisms and statistical phenomena.
    摘要 目的是找到自动搜寻方法,以了解观察资料中的 causal 结构。在某些情况下,所有兴趣的 causal 机制中的所有变量都是观察到的,而任务是预测一个观察到的变量对另一个变量的影响。相比之下,有时候兴趣的变量不是直接观察到的,而是从资料中的现象推断出来的。这些被称为潜在变量。例如,心理学中的智商不能直接观察,因此研究人员会尝试透过不同的指标,如智商测验,来评估。在这种情况下, causal 发现算法可以探测到底层结构和联系,以显示 causal 连接between latent 变量和观察到的变量。本论文关注两个问题:提供一个弱 faithfulness 定义,其中(i)在 Gaussian 家族中的分布下是弱 faithfulness 的,(ii)可以应用到非 Gaussian 家族的分布,以及(iii)在 modified 稳定假设下,可以用来证明一个修改版的 causal 发现算法的均匀一致性。将适当的假设放宽,以学习包含潜在变量的 causal 结构。由于推断 causal 关系的重要性,这项工作预期能够将 causal 发现方法扩展到更广泛的应用,包括多元的 causal 机制和 statistically 现象。

Distilling Knowledge from Resource Management Algorithms to Neural Networks: A Unified Training Assistance Approach

  • paper_url: http://arxiv.org/abs/2308.07511
  • repo_url: None
  • paper_authors: Longfei Ma, Nan Cheng, Xiucheng Wang, Zhisheng Yin, Haibo Zhou, Wei Quan
  • for: 提高多用户通信系统中信号干扰比例(SINR)优化的精度和速度。
  • methods: 使用知识储存(KD)技术和人工神经网络(NN)结合 traditional SINR 优化方法,以提高无监督和强化学习方法的性能和速度。
  • results: 在模拟结果中,提出的AD方法比传统学习方法更高效,并且能够解决无监督学习和强化学习中常见的问题,如获取优质解决方案和避免过度适应。
    Abstract As a fundamental problem, numerous methods are dedicated to the optimization of signal-to-interference-plus-noise ratio (SINR), in a multi-user setting. Although traditional model-based optimization methods achieve strong performance, the high complexity raises the research of neural network (NN) based approaches to trade-off the performance and complexity. To fully leverage the high performance of traditional model-based methods and the low complexity of the NN-based method, a knowledge distillation (KD) based algorithm distillation (AD) method is proposed in this paper to improve the performance and convergence speed of the NN-based method, where traditional SINR optimization methods are employed as ``teachers" to assist the training of NNs, which are ``students", thus enhancing the performance of unsupervised and reinforcement learning techniques. This approach aims to alleviate common issues encountered in each of these training paradigms, including the infeasibility of obtaining optimal solutions as labels and overfitting in supervised learning, ensuring higher convergence performance in unsupervised learning, and improving training efficiency in reinforcement learning. Simulation results demonstrate the enhanced performance of the proposed AD-based methods compared to traditional learning methods. Remarkably, this research paves the way for the integration of traditional optimization insights and emerging NN techniques in wireless communication system optimization.
    摘要 在多用户场景下,已有大量方法致力于优化信干噪比(SINR)。虽然传统的基于模型的优化方法性能强,但其复杂度很高。为了解决这一问题,本文提出了一种基于知识蒸馏(KD)的算法蒸馏(AD)方法:将传统的 SINR 优化方法作为“教师”,辅助作为“学生”的神经网络(NN)的训练,从而提升 NN 方法的性能与收敛速度。该方法可以缓解各类训练范式中的常见问题,例如监督学习中难以获得最优解作为标签以及过拟合的问题,保证无监督学习获得更好的收敛性能,并提高强化学习的训练效率。仿真结果表明,所提出的基于 AD 的方法优于传统的学习方法。这项研究为在无线通信系统优化中融合传统优化思想与新兴神经网络技术铺平了道路。
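
A hedged sketch of the algorithm-distillation idea: a conventional SINR optimizer acts as the teacher for a small policy network, whose loss mixes the task objective (negative sum rate) with a distillation term toward the teacher's power allocation. The network size, the stubbed teacher_power routine, and the weighting lam are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

K = 4                                             # number of users/links
net = nn.Sequential(nn.Linear(K * K, 64), nn.ReLU(), nn.Linear(64, K), nn.Sigmoid())
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def sum_rate(G, p, noise=1e-2):
    """Sum of log2(1 + SINR_k) for channel-gain matrix G (K, K) and transmit powers p (K,)."""
    signal = torch.diagonal(G) * p
    interference = G @ p - signal
    return torch.log2(1.0 + signal / (interference + noise)).sum()

def teacher_power(G):
    """Stand-in for a conventional SINR optimizer (e.g. a WMMSE-style routine)."""
    return torch.full((K,), 0.5)                  # placeholder allocation, not a real teacher

lam = 0.5                                          # weight of the distillation term
for step in range(200):
    G = torch.rand(K, K)                           # one random interference-channel realization
    p_student = net(G.flatten())
    loss = -sum_rate(G, p_student) + lam * ((p_student - teacher_power(G)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```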

Data Race Detection Using Large Language Models

  • paper_url: http://arxiv.org/abs/2308.07505
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Le Chen, Xianzhong Ding, Murali Emani, Tristan Vanderbruggen, Pei-hung Lin, Chuanhua Liao
  • for: 本文旨在探讨使用大语言模型(LLMs)来检测数据竞争问题,以代替手动创建资源密集的工具。
  • methods: 本文提出了一种基于提示工程和精度调整技术的新的数据竞争检测方法,使用了特制的DRB-ML数据集,并对代表性的LLMs和开源LLMs进行了评估和精度调整。
  • results: 实验表明,LLMs可以成为数据竞争检测的可能的方法,但是它们仍无法与传统的数据竞争检测工具相比提供详细的变量对 causing数据竞争的信息。
    Abstract Large language models (LLMs) are demonstrating significant promise as an alternate strategy to facilitate analyses and optimizations of high-performance computing programs, circumventing the need for resource-intensive manual tool creation. In this paper, we explore a novel LLM-based data race detection approach combining prompting engineering and fine-tuning techniques. We create a dedicated dataset named DRB-ML, which is derived from DataRaceBench, with fine-grain labels showing the presence of data race pairs and their associated variables, line numbers, and read/write information. DRB-ML is then used to evaluate representative LLMs and fine-tune open-source ones. Our experiment shows that LLMs can be a viable approach to data race detection. However, they still cannot compete with traditional data race detection tools when we need detailed information about variable pairs causing data races.
    摘要 大型语言模型(LLM)正在展示出很大的承诺,用作高性能计算程序分析和优化的代替策略,减少手动工具的创建成本。在这篇论文中,我们探索了一种基于提示工程和精细调整技术的新型LLM数据竞争检测方法。我们创建了一个专门的数据集名为DRB-ML,它是基于DataRaceBench的数据集,并具有精细的标签,显示数据竞争对的存在、相关变量、行号、读写信息等。然后,我们使用DRB-ML评估了一些代表性的LLM,并对开源LLM进行了精细调整。我们的实验表明,LLM可以成为数据竞争检测的有效方法,但是它们仍无法与传统的数据竞争检测工具相比,提供详细的变量对 causing 数据竞争的信息。
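
The prompt-engineering side of such a pipeline reduces to string construction plus light parsing of the model's reply; the prompt wording, reply format, and snippet below are illustrative placeholders rather than the prompts or labels used in DRB-ML.

```python
def build_prompt(code: str) -> str:
    """Assemble a data-race-detection prompt for an LLM (wording is illustrative)."""
    return (
        "You are a program analysis assistant. Analyze the following parallel code for data races.\n"
        "Reply in exactly this format:\n"
        "race: yes|no\n"
        "variables: <comma-separated variable pairs or 'none'>\n"
        "lines: <comma-separated line numbers or 'none'>\n\n"
        f"```c\n{code}\n```"
    )

def parse_reply(reply: str) -> dict:
    """Parse the structured reply into a small dict; tolerant of missing fields."""
    out = {"race": None, "variables": [], "lines": []}
    for line in reply.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "race":
            out["race"] = value.lower().startswith("yes")
        elif key == "variables" and value.lower() != "none":
            out["variables"] = [v.strip() for v in value.split(",")]
        elif key == "lines" and value.lower() != "none":
            out["lines"] = [int(v) for v in value.split(",") if v.strip().isdigit()]
    return out

snippet = "#pragma omp parallel for\nfor (int i = 0; i < n; i++) sum += a[i];"
prompt = build_prompt(snippet)          # send `prompt` to an LLM of your choice (call not shown)
print(parse_reply("race: yes\nvariables: sum,sum\nlines: 2"))
```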

ST-MLP: A Cascaded Spatio-Temporal Linear Framework with Channel-Independence Strategy for Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2308.07496
  • repo_url: None
  • paper_authors: Zepu Wang, Yuqi Nie, Peng Sun, Nam H. Nguyen, John Mulvey, H. Vincent Poor
  • For: 本研究旨在提高智能交通系统中的流量管理优化,通过提出一种简洁准确的交通预测模型。* Methods: 本文提出了一种基于多层感知器(MLP)模块和线性层的简洁空间时间图 neural network(STGNN)模型,利用时间信息、空间信息和预定的图结构,并实现了通道独立策略。* Results: 实验结果显示,ST-MLP模型在准确率和计算效率两个方面都有较高的表现,比其他模型和现有的STGNNs架构更高。
    Abstract The criticality of prompt and precise traffic forecasting in optimizing traffic flow management in Intelligent Transportation Systems (ITS) has drawn substantial scholarly focus. Spatio-Temporal Graph Neural Networks (STGNNs) have been lauded for their adaptability to road graph structures. Yet, current research on STGNNs architectures often prioritizes complex designs, leading to elevated computational burdens with only minor enhancements in accuracy. To address this issue, we propose ST-MLP, a concise spatio-temporal model solely based on cascaded Multi-Layer Perceptron (MLP) modules and linear layers. Specifically, we incorporate temporal information, spatial information and predefined graph structure with a successful implementation of the channel-independence strategy - an effective technique in time series forecasting. Empirical results demonstrate that ST-MLP outperforms state-of-the-art STGNNs and other models in terms of accuracy and computational efficiency. Our finding encourages further exploration of more concise and effective neural network architectures in the field of traffic forecasting.
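
A compact sketch of a channel-independent MLP forecaster in the spirit described above: one shared MLP maps each traffic sensor's (channel's) past window to its future window independently. Layer sizes and the omission of the spatial/graph component are simplifications on our part.

```python
import torch
import torch.nn as nn

class ChannelIndependentMLP(nn.Module):
    """Maps (batch, channels, past_len) -> (batch, channels, horizon), sharing one MLP across channels."""
    def __init__(self, past_len: int, horizon: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(past_len, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, t = x.shape
        y = self.mlp(x.reshape(b * c, t))      # each channel is forecast independently
        return y.reshape(b, c, -1)

model = ChannelIndependentMLP(past_len=12, horizon=3)
x = torch.randn(8, 207, 12)                    # e.g. 207 road sensors, 12 past time steps
print(model(x).shape)                          # torch.Size([8, 207, 3])
```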

Adaptive Tracking of a Single-Rigid-Body Character in Various Environments

  • paper_url: http://arxiv.org/abs/2308.07491
  • repo_url: None
  • paper_authors: Taesoo Kwon, Taehong Gu, Jaewon Ahn, Yoonsang Lee
  • for: This paper proposes a deep reinforcement learning method for simulating full-body human motions in various scenarios, with the goal of adapting to unobserved environmental changes and controller transitions without requiring additional learning.
  • methods: The proposed method uses the centroidal dynamics model (CDM) to express the full-body character as a single rigid body (SRB) and trains a policy to track a reference motion using deep reinforcement learning. The SRB simulation is formulated as a quadratic programming (QP) problem, and the policy outputs an action that allows the SRB character to follow the reference motion.
  • results: The proposed method is demonstrated to be sample-efficient and able to cope with environments that have not been experienced during learning, such as running on uneven terrain or pushing a box, and transitions between learned policies, without any additional learning. The policy can be efficiently trained within 30 minutes on an ultraportable laptop.
    Abstract Since the introduction of DeepMimic [Peng et al. 2018], subsequent research has focused on expanding the repertoire of simulated motions across various scenarios. In this study, we propose an alternative approach for this goal, a deep reinforcement learning method based on the simulation of a single-rigid-body character. Using the centroidal dynamics model (CDM) to express the full-body character as a single rigid body (SRB) and training a policy to track a reference motion, we can obtain a policy that is capable of adapting to various unobserved environmental changes and controller transitions without requiring any additional learning. Due to the reduced dimension of state and action space, the learning process is sample-efficient. The final full-body motion is kinematically generated in a physically plausible way, based on the state of the simulated SRB character. The SRB simulation is formulated as a quadratic programming (QP) problem, and the policy outputs an action that allows the SRB character to follow the reference motion. We demonstrate that our policy, efficiently trained within 30 minutes on an ultraportable laptop, has the ability to cope with environments that have not been experienced during learning, such as running on uneven terrain or pushing a box, and transitions between learned policies, without any additional learning.

O-1: Self-training with Oracle and 1-best Hypothesis

  • paper_url: http://arxiv.org/abs/2308.07486
  • repo_url: None
  • paper_authors: Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Kartik Audhkhasi
  • for: 本研究旨在提出一种新的自训练目标函数 O-1,以减少训练偏差并统一训练与评估指标。
  • methods: O-1是EMBR的快速变体,可以使用 Both supervised和Unsupervised数据,并且可以提高 oracle假设。
  • results: 对于SpeechStew数据集和一个大规模的内部数据集,O-1对识别效果有13%-25%的相对提升,与EMBR相比,O-1在不同的SpeechStew数据集上提高了80%的相对幅度,并在内部数据集上与oracle WER之间减少了12%的差距。总的来说,O-1在大规模数据集上实现了9%的相对提升。
    Abstract We introduce O-1, a new self-training objective to reduce training bias and unify training and evaluation metrics for speech recognition. O-1 is a faster variant of Expected Minimum Bayes Risk (EMBR), that boosts the oracle hypothesis and can accommodate both supervised and unsupervised data. We demonstrate the effectiveness of our approach in terms of recognition on publicly available SpeechStew datasets and a large-scale, in-house data set. On Speechstew, the O-1 objective closes the gap between the actual and oracle performance by 80\% relative compared to EMBR which bridges the gap by 43\% relative. O-1 achieves 13\% to 25\% relative improvement over EMBR on the various datasets that SpeechStew comprises of, and a 12\% relative gap reduction with respect to the oracle WER over EMBR training on the in-house dataset. Overall, O-1 results in a 9\% relative improvement in WER over EMBR, thereby speaking to the scalability of the proposed objective for large-scale datasets.
    摘要 我们引入 O-1,一个新的自训练目标,用于减少语音识别中的训练偏差并统一训练与评估指标。O-1 是 EMBR 的更快变体,可以提升 oracle 假设,并同时利用监督和无监督数据。我们使用 O-1 目标,在公开可用的 SpeechStew 数据集和大规模内部数据集上验证了其效果。在 SpeechStew 上,O-1 目标将实际性能与 oracle 性能之间的差距相对缩小了 80%,而 EMBR 仅缩小 43%。在 SpeechStew 所包含的各数据集上,O-1 相比 EMBR 取得 13% 至 25% 的相对改善,并在内部数据集上相对 EMBR 训练将与 oracle WER 的差距缩小 12%。总体而言,O-1 相比 EMBR 带来 9% 的相对 WER 改善,说明了该目标在大规模数据集上的可扩展性。

OCDaf: Ordered Causal Discovery with Autoregressive Flows

  • paper_url: http://arxiv.org/abs/2308.07480
  • repo_url: https://github.com/vahidzee/ocdaf
  • paper_authors: Hamidreza Kamkari, Vahid Zehtab, Vahid Balazadeh, Rahul G. Krishnan
  • for: 从观察数据中学习 causal graphs
  • methods: 使用 order-based 方法,通过 continuous search 算法找到 causal structures
  • results: 在 Sachs 和 SynTReN benchmark 上达到 state-of-the-art 性能,并在多种 parametric 和 nonparametric synthetic datasets 中验证了 identifiability theory 的正确性。
    Abstract We propose OCDaf, a novel order-based method for learning causal graphs from observational data. We establish the identifiability of causal graphs within multivariate heteroscedastic noise models, a generalization of additive noise models that allow for non-constant noise variances. Drawing upon the structural similarities between these models and affine autoregressive normalizing flows, we introduce a continuous search algorithm to find causal structures. Our experiments demonstrate state-of-the-art performance across the Sachs and SynTReN benchmarks in Structural Hamming Distance (SHD) and Structural Intervention Distance (SID). Furthermore, we validate our identifiability theory across various parametric and nonparametric synthetic datasets and showcase superior performance compared to existing baselines.
    摘要 我们提出了 OCDaf,一种基于顺序的方法,用于从观察数据中学习 causal 图。我们证明了在多变量异方差噪声模型(加性噪声模型的推广,允许噪声方差非恒定)下,causal 图是可识别的。基于这些模型与 affine autoregressive normalizing flows 之间的结构相似性,我们引入了一种连续搜索算法来寻找 causal 结构。我们的实验表明,该方法在 Sachs 和 SynTReN 基准上,在 Structural Hamming Distance (SHD) 和 Structural Intervention Distance (SID) 指标下均达到了最先进的性能。此外,我们还在多种参数和非参数合成数据集上验证了我们的可识别性理论,并展示了相对现有基线的更优表现。
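
The heteroscedastic noise model behind the identifiability result takes the form x_j = f_j(pa(j)) + sigma_j(pa(j)) * eps_j. The toy two-variable NumPy example below (linear mean, log-linear scale, Gaussian noise) only illustrates how fits under the two candidate orderings can be compared; it is not the continuous search or the flow model of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = 1.5 * x1 + np.exp(0.4 * x1) * rng.normal(size=n)   # heteroscedastic noise: x1 -> x2

def gaussian_ll(resid, log_sigma):
    """Average Gaussian log-likelihood of residuals with per-point scale exp(log_sigma)."""
    return -0.5 * np.mean((resid / np.exp(log_sigma)) ** 2 + 2 * log_sigma + np.log(2 * np.pi))

def fit_ll(parent, child):
    """Fit child = a*parent + b with noise scale exp(c*parent + d); crude least-squares stand-in."""
    a, b = np.polyfit(parent, child, 1)
    resid = child - (a * parent + b)
    c, d = np.polyfit(parent, np.log(np.abs(resid) + 1e-8), 1)   # rough proxy for a learned sigma net
    marginal = gaussian_ll(parent, np.full_like(parent, np.log(parent.std())))
    return gaussian_ll(resid, c * parent + d) + marginal

print("order x1->x2:", fit_ll(x1, x2))
print("order x2->x1:", fit_ll(x2, x1))   # the correct order typically attains the higher likelihood
```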

Symphony: Optimized Model Serving using Centralized Orchestration

  • paper_url: http://arxiv.org/abs/2308.07470
  • repo_url: None
  • paper_authors: Lequn Chen, Weixin Deng, Anirudh Canumalla, Yu Xin, Matthai Philipose, Arvind Krishnamurthy
  • for: 提高深度神经网络(DNN)模型推理的加速率和服务级别目标(SLO)。
  • methods: 使用中央化调度系统,可以扩展到百万个请求每秒,并将万个GPU进行协调。使用非工作保存的调度算法,可以实现高批处理效率,同时也可以启用灵活自适应扩展。
  • results: 通过广泛的实验表明,Symphony系统比前一代系统高效性可以达到4.7倍。
    Abstract The orchestration of deep neural network (DNN) model inference on GPU clusters presents two significant challenges: achieving high accelerator efficiency given the batching properties of model inference while meeting latency service level objectives (SLOs), and adapting to workload changes both in terms of short-term fluctuations and long-term resource allocation. To address these challenges, we propose Symphony, a centralized scheduling system that can scale to millions of requests per second and coordinate tens of thousands of GPUs. Our system utilizes a non-work-conserving scheduling algorithm capable of achieving high batch efficiency while also enabling robust autoscaling. Additionally, we developed an epoch-scale algorithm that allocates models to sub-clusters based on the compute and memory needs of the models. Through extensive experiments, we demonstrate that Symphony outperforms prior systems by up to 4.7x higher goodput.
    摘要 深度神经网络(DNN)模型推理在GPU集群中的协调表现两大挑战:实现批处理性质下的加速器效率,同时满足响应时间服务水平目标(SLO)。为解决这些挑战,我们提议了Symphony,一个可扩展到百万个请求每秒的中央调度系统,可以协调数万个GPU。我们的系统使用非工作保持式调度算法,可以实现高批处理效率,同时也允许自动扩缩。此外,我们还开发了一种时间尺度算法,将模型分配到子集群基于计算和内存需求。通过广泛的实验,我们证明Symphony比先前系统高效性更高,最高可以达4.7倍。
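
The non-work-conserving intuition, deliberately holding requests to form larger and more efficient batches while staying inside the latency SLO, can be sketched in a few lines; the SLO, service-time constant, and dispatch rule below are invented for illustration and are not Symphony's actual policy.

```python
import heapq

SLO_MS = 100.0
SERVICE_MS = 20.0          # assumed per-batch GPU time (illustrative)
MAX_BATCH = 8

def maybe_dispatch(queue, now_ms):
    """Non-work-conserving: hold requests to grow the batch, unless the oldest one is about to miss its SLO."""
    if not queue:
        return None
    slack = SLO_MS - (now_ms - queue[0]) - SERVICE_MS     # waiting budget left for the oldest request
    if len(queue) >= MAX_BATCH or slack <= 0:
        return [heapq.heappop(queue) for _ in range(min(MAX_BATCH, len(queue)))]
    return None                                           # deliberately idle the GPU to batch more work later

queue = []
for arrival_ms in [0.0, 3.0, 5.0, 30.0, 31.0]:
    heapq.heappush(queue, arrival_ms)

print(maybe_dispatch(queue, now_ms=40.0))    # None: still enough slack, keep batching
print(maybe_dispatch(queue, now_ms=85.0))    # [0.0, 3.0, 5.0, 30.0, 31.0]: dispatch before the SLO is violated
```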

Omega-Regular Reward Machines

  • paper_url: http://arxiv.org/abs/2308.07469
  • repo_url: None
  • paper_authors: Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak
  • for: 这篇论文旨在探讨如何透过奖励机制来训练智能代理人(Agent)来完成任务,但是设计合适的奖励机制是训练成功的关键。
  • methods: 这篇论文使用了两种表达奖励的形式:奖励机器(reward machines)和 omega-regular 语言,分别用于表达量化和定性的学习目标。
  • results: 论文提出了 omega-regular 奖励机器,将奖励机器与 omega-regular 语言集成,以实现更具表达力且有效的奖励机制,并提出了一种无模型强化学习算法来计算针对 omega-regular 奖励机器的 ε-最优策略。实验证明了所提方法的有效性。
    Abstract Reinforcement learning (RL) is a powerful approach for training agents to perform tasks, but designing an appropriate reward mechanism is critical to its success. However, in many cases, the complexity of the learning objectives goes beyond the capabilities of the Markovian assumption, necessitating a more sophisticated reward mechanism. Reward machines and omega-regular languages are two formalisms used to express non-Markovian rewards for quantitative and qualitative objectives, respectively. This paper introduces omega-regular reward machines, which integrate reward machines with omega-regular languages to enable an expressive and effective reward mechanism for RL. We present a model-free RL algorithm to compute epsilon-optimal strategies against omega-regular reward machines and evaluate the effectiveness of the proposed algorithm through experiments.
    摘要 强化学习(RL)是一种训练智能体执行任务的强大方法,但设计合适的奖励机制是其成功的关键。然而,在许多情况下,学习目标的复杂性超出了马尔可夫假设的能力范围,需要更复杂的奖励机制。奖励机器和 omega-regular 语言是两种用于表达非马尔可夫奖励的形式化方法,分别针对量化目标和定性目标。本文提出了 omega-regular 奖励机器,它将奖励机器与 omega-regular 语言集成,以在 RL 中实现富有表达力且有效的奖励机制。我们提出了一种无模型 RL 算法,用于计算针对 omega-regular 奖励机器的 ε-最优策略,并通过实验评估所提算法的效果。
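
For readers unfamiliar with reward machines, a minimal implementation helps: a finite automaton over high-level propositions whose transitions emit rewards, run alongside the environment. The two-state "reach a, then b" machine below is a generic textbook-style example, not one of the omega-regular constructions from the paper.

```python
class RewardMachine:
    """Finite-state machine over label sets; delta maps (state, frozenset_of_propositions) -> (next_state, reward)."""
    def __init__(self, initial_state, delta):
        self.u = initial_state
        self.delta = delta

    def step(self, labels: frozenset):
        # Unknown (state, labels) pairs keep the current state and emit zero reward.
        self.u, reward = self.delta.get((self.u, labels), (self.u, 0.0))
        return reward

# "Visit a, then b" task: reward 1 when b is reached after a.
delta = {
    ("u0", frozenset({"a"})): ("u1", 0.0),
    ("u1", frozenset({"b"})): ("u2", 1.0),
}
rm = RewardMachine("u0", delta)
for labels in [frozenset(), frozenset({"a"}), frozenset(), frozenset({"b"})]:
    print(rm.step(labels), end=" ")   # 0.0 0.0 0.0 1.0 -> the RL agent is trained on env-state x RM-state
```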

There Is a Digital Art History

  • paper_url: http://arxiv.org/abs/2308.07464
  • repo_url: https://github.com/Gracetyty/art-gallery
  • paper_authors: Leonardo Impett, Fabian Offert
  • for: 本文重新评问 Johanna Drucker 十年前的问题:是否有一种数字艺术历史?在大规模变换器基础模型的出现下,传统类型的神经网络已经成为数字艺术历史的一部分,但是这些模型的知识价值和方法价值尚未得到系统的分析。
  • methods: 本文主要分析了两个方面:一是大规模视模型中新编码的视觉文化,对数字艺术历史有很大的影响;二是使用当今大规模视模型研究艺术史和城市规划等领域的技术案例,提出了一种新的批判方法,该方法考虑模型和其应用之间的知识杂糅。
  • results: 本文的结果表明,大规模视模型在艺术史和城市规划等领域的应用需要一种新的批判方法,该方法可以通过读取研究数据集和训练数据集的视觉意识来批判模型和其应用的知识杂糅。
    Abstract In this paper, we revisit Johanna Drucker's question, "Is there a digital art history?" -- posed exactly a decade ago -- in the light of the emergence of large-scale, transformer-based vision models. While more traditional types of neural networks have long been part of digital art history, and digital humanities projects have recently begun to use transformer models, their epistemic implications and methodological affordances have not yet been systematically analyzed. We focus our analysis on two main aspects that, together, seem to suggest a coming paradigm shift towards a "digital" art history in Drucker's sense. On the one hand, the visual-cultural repertoire newly encoded in large-scale vision models has an outsized effect on digital art history. The inclusion of significant numbers of non-photographic images allows for the extraction and automation of different forms of visual logics. Large-scale vision models have "seen" large parts of the Western visual canon mediated by Net visual culture, and they continuously solidify and concretize this canon through their already widespread application in all aspects of digital life. On the other hand, based on two technical case studies of utilizing a contemporary large-scale visual model to investigate basic questions from the fields of art history and urbanism, we suggest that such systems require a new critical methodology that takes into account the epistemic entanglement of a model and its applications. This new methodology reads its corpora through a neural model's training data, and vice versa: the visual ideologies of research datasets and training datasets become entangled.

Inductive Knowledge Graph Completion with GNNs and Rules: An Analysis

  • paper_url: http://arxiv.org/abs/2308.07942
  • repo_url: https://github.com/anilakash/indkgc
  • paper_authors: Akash Anil, Víctor Gutiérrez-Basulto, Yazmín Ibañéz-García, Steven Schockaert
  • for: 本研究旨在解释 inductive knowledge graph completion 任务中,模型如何学习推理规则,并用于预测测试图库中的链接。
  • methods: 本研究使用了规则基于的方法,并研究了一些变种来解决具体的问题。
  • results: 研究发现,变种方法可以减少不可靠的实体的影响,并且可以保持 interpretability 优势。而且,一种变种方法,可以不断地利用整个知识图,并且一直高于 NBFNet 的性能。
    Abstract The task of inductive knowledge graph completion requires models to learn inference patterns from a training graph, which can then be used to make predictions on a disjoint test graph. Rule-based methods seem like a natural fit for this task, but in practice they significantly underperform state-of-the-art methods based on Graph Neural Networks (GNNs), such as NBFNet. We hypothesise that the underperformance of rule-based methods is due to two factors: (i) implausible entities are not ranked at all and (ii) only the most informative path is taken into account when determining the confidence in a given link prediction answer. To analyse the impact of these factors, we study a number of variants of a rule-based approach, which are specifically aimed at addressing the aforementioned issues. We find that the resulting models can achieve a performance which is close to that of NBFNet. Crucially, the considered variants only use a small fraction of the evidence that NBFNet relies on, which means that they largely keep the interpretability advantage of rule-based methods. Moreover, we show that a further variant, which does look at the full KG, consistently outperforms NBFNet.
    摘要 任务是完成 inductive 知识图结构需要模型学习从训练图中推导出推理模式,以便在测试图上进行预测。 规则基本方法似乎适合这个任务,但在实践中它们在比基于图神经网络(GNNS)的方法下表现出现下降。我们认为这是因为两个因素:(i)不可能的实体没有被排序,(ii)只考虑最有用的路径来确定链接预测答案的信任度。为了分析这些因素的影响,我们研究了一些 variants of rule-based approach,它们专门解决这些问题。我们发现这些模型可以达到与 NBFNet 相似的性能,而且它们只使用了一小部分的证据,这意味着它们保留了规则基本方法的解释性优势。此外,我们还证明了一个 variant,它查看了整个知识图,可以一直高于 NBFNet 的性能。
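
The aggregation issue analysed above, scoring with only the single most informative rule versus combining every firing rule, can be made concrete with a tiny scorer that contrasts max aggregation with a noisy-or. The toy knowledge graph, rules, and confidences below are invented.

```python
# Toy KG as (head, relation, tail) triples and two invented rules predicting livesIn(x, z).
triples = {("anna", "worksAt", "acme"), ("acme", "locatedIn", "berlin"),
           ("anna", "bornIn", "berlin")}
entities = {t[0] for t in triples} | {t[2] for t in triples}

rules = [  # (confidence, body relations): worksAt(x,y) & locatedIn(y,z) => livesIn(x,z), etc.
    (0.7, ["worksAt", "locatedIn"]),
    (0.5, ["bornIn"]),
]

def rule_fires(body, x, z):
    if len(body) == 1:
        return (x, body[0], z) in triples
    return any((x, body[0], y) in triples and (y, body[1], z) in triples for y in entities)

def score(x, z, aggregate="noisy_or"):
    confs = [c for c, body in rules if rule_fires(body, x, z)]
    if not confs:
        return 0.0
    if aggregate == "max":                 # "only the most informative path"
        return max(confs)
    p_none = 1.0
    for c in confs:                        # noisy-or: every firing rule contributes evidence
        p_none *= 1.0 - c
    return 1.0 - p_none

print(score("anna", "berlin", "max"))       # 0.7
print(score("anna", "berlin", "noisy_or"))  # 0.85 -- combining rules raises the confidence
```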

GRU-D-Weibull: A Novel Real-Time Individualized Endpoint Prediction

  • paper_url: http://arxiv.org/abs/2308.07452
  • repo_url: None
  • paper_authors: Xiaoyang Ruan, Liwei Wang, Charat Thongprayoon, Wisit Cheungpasitporn, Hongfang Liu
  • for: 这个研究的目的是提出一种新的方法,GRU-D-Weibull,用于模型Weibull分布,以实现个人化终点预测和人口水平风险管理。
  • methods: 这个方法使用了门控Recurrent Unit(GRU)和衰减(D)来模型Weibull分布,并实现了实时个人化终点预测和人口水平风险管理。
  • results: 使用了6879名CKD4阶段4患者的 cohort,我们评估了GRU-D-Weibull在终点预测中的表现。GRU-D-Weibull在终点预测中的C-指数在指定日期为0.7,而后续4.3年的跟踪中为0.77,与随机生存树相当。我们的方法实现了终点预测中的绝对L1损失为1.1年(SD 0.95),并在4年的跟踪中达到最低值为0.45年(SD0.3),与其他方法相比显著出众。GRU-D-Weibull consistently constrained the predicted survival probability within a smaller and more fixed range compared to other models throughout the follow-up period.
    Abstract Accurate prediction models for individual-level endpoints and time-to-endpoints are crucial in clinical practice. In this study, we propose a novel approach, GRU-D-Weibull, which combines gated recurrent units with decay (GRU-D) to model the Weibull distribution. Our method enables real-time individualized endpoint prediction and population-level risk management. Using a cohort of 6,879 patients with stage 4 chronic kidney disease (CKD4), we evaluated the performance of GRU-D-Weibull in endpoint prediction. The C-index of GRU-D-Weibull was ~0.7 at the index date and increased to ~0.77 after 4.3 years of follow-up, similar to random survival forest. Our approach achieved an absolute L1-loss of ~1.1 years (SD 0.95) at the CKD4 index date and a minimum of ~0.45 years (SD0.3) at 4 years of follow-up, outperforming competing methods significantly. GRU-D-Weibull consistently constrained the predicted survival probability at the time of an event within a smaller and more fixed range compared to other models throughout the follow-up period. We observed significant correlations between the error in point estimates and missing proportions of input features at the index date (correlations from ~0.1 to ~0.3), which diminished within 1 year as more data became available. By post-training recalibration, we successfully aligned the predicted and observed survival probabilities across multiple prediction horizons at different time points during follow-up. Our findings demonstrate the considerable potential of GRU-D-Weibull as the next-generation architecture for endpoint risk management, capable of generating various endpoint estimates for real-time monitoring using clinical data.
    摘要 准确的预测模型对个体级终点和时间到终点是临床实践中非常重要。在本研究中,我们提出了一种新的方法,即GRU-D-Weibull,它将闭包隐藏单元(GRU-D)与减少(Decay)结合以模型Weibull分布。我们的方法可以实现实时个体化终点预测和人口级风险管理。使用6,879名CKD4阶段4慢性肾病患者的 cohort,我们评估了GRU-D-Weibull在终点预测中的性能。GRU-D-Weibull的C指数在指定日期为 approximately 0.7,并在4.3年后跟踪 period 后提高到approximately 0.77,与随机生存森林相似。我们的方法在终点预测中实现了约1.1年的绝对L1损失(SD 0.95),并在4年后的跟踪期间保持在约0.45年(SD0.3)以上,与其他方法相比显著超越。GRU-D-Weibull在跟踪期间一直压制了预测生存概率的误差,并在不同的跟踪时间点保持在更小和固定的范围内表现出色。我们发现在指定日期的输入特征损失率和缺失比例之间存在显著相关性(相关性从approximately 0.1到approximately 0.3),这种相关性随着更多数据的获得而逐渐减少。通过后期重新训练,我们成功地将预测和观测生存概率Alignment在不同的预测时间点和跟踪时间点。我们的发现表明GRU-D-Weibull可以作为下一代结构,用于实时终点风险管理,可以通过临床数据生成多个终点预测。
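
The Weibull output layer can be illustrated independently of the GRU-D encoder: given per-patient shape k and scale lambda (produced here by a stub in place of the recurrent network), the survival function is S(t) = exp(-(t/lambda)^k) and training maximizes the right-censored log-likelihood. Apart from the standard Weibull formulas, the numbers below are illustrative.

```python
import numpy as np

def weibull_survival(t, k, lam):
    """S(t) = P(T > t) for a Weibull(k, lam) time-to-event distribution."""
    return np.exp(-np.power(t / lam, k))

def censored_nll(t, event, k, lam, eps=1e-8):
    """Negative log-likelihood with right censoring: density for observed events, survival for censored ones."""
    log_pdf = np.log(k / lam + eps) + (k - 1) * np.log(t / lam + eps) - np.power(t / lam, k)
    log_surv = -np.power(t / lam, k)
    return -np.mean(event * log_pdf + (1 - event) * log_surv)

# Stub in place of the GRU-D encoder: per-patient (k, lambda) predictions in years.
k_hat   = np.array([1.2, 1.2, 0.9])
lam_hat = np.array([3.0, 5.0, 2.0])
t_obs   = np.array([2.5, 4.0, 6.0])      # follow-up times (years)
event   = np.array([1.0, 0.0, 1.0])      # 1 = endpoint observed, 0 = censored

print(censored_nll(t_obs, event, k_hat, lam_hat))
print(weibull_survival(2.0, k_hat, lam_hat))   # individualized survival probability at 2 years
```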

Open-set Face Recognition using Ensembles trained on Clustered Data

  • paper_url: http://arxiv.org/abs/2308.07445
  • repo_url: None
  • paper_authors: Rafael Henrique Vareto, William Robson Schwartz
  • for: 开放集面识别场景下,Unknown人物会在测试阶段出现,需要精准识别个人并有效地处理不熟悉的面孔。这篇论文描述了一种可扩展的开放集面识别方法,用于千计多个人的 галерее。
  • methods: 方法包括聚类和一个ensemble of binary learning algorithms,用于确定查询面孔样本是否属于face gallery,并且正确地确定它们的身份。
  • results: 实验表明,即使targeting scalability,也可以达到竞争性的性能。
    Abstract Open-set face recognition describes a scenario where unknown subjects, unseen during the training stage, appear on test time. Not only it requires methods that accurately identify individuals of interest, but also demands approaches that effectively deal with unfamiliar faces. This work details a scalable open-set face identification approach to galleries composed of hundreds and thousands of subjects. It is composed of clustering and an ensemble of binary learning algorithms that estimates when query face samples belong to the face gallery and then retrieves their correct identity. The approach selects the most suitable gallery subjects and uses the ensemble to improve prediction performance. We carry out experiments on well-known LFW and YTF benchmarks. Results show that competitive performance can be achieved even when targeting scalability.
    摘要 开放集 face recognition 描述一种情况,在训练阶段未看到的未知人脸在测试阶段出现。不仅需要准确地识别关心人脸,还需要采用有效地处理不熟悉的人脸方法。这篇文章介绍了一种可扩展的开放集face标识方法,用于数百或千个主题的图库。它包括 clustering 和一个 ensemble of binary learning algorithms,可以确定测试人脸样本是否属于图库,并且可以 accurately retrieve their correct identity。该方法选择了最适合的图库主题,并使用 ensemble 进行改进预测性能。我们在well-known LFW 和 YTF benchmark上进行了实验,结果显示,可以达到竞争性的性能,即使targeting scalability。

Physics-Informed Deep Learning to Reduce the Bias in Joint Prediction of Nitrogen Oxides

  • paper_url: http://arxiv.org/abs/2308.07441
  • repo_url: None
  • paper_authors: Lianfa Li, Roxana Khalili, Frederick Lurmann, Nathan Pavlovic, Jun Wu, Yan Xu, Yisi Liu, Karl O’Sharkey, Beate Ritz, Luke Oman, Meredith Franklin, Theresa Bastain, Shohreh F. Farzan, Carrie Breton, Rima Habre
  • for: 这个论文主要是为了提高地面氮氧化物(NOx)的预测,以便更好地了解它们对健康和环境的影响。
  • methods: 这篇论文使用机器学习(ML)方法,但是这些方法缺乏物理和化学知识,因此可能会产生高度估计偏差。作者们提出了一种physics-informed deep learning框架,该框架可以编码扩散-扩散机制和流体动力约束,以提高NO2和NOx预测的准确性。
  • results: 作者们发现,该框架可以减少ML模型的估计偏差,并且可以提供精确的空气质量推算和明确的不确定性评估。此外,该框架还可以捕捉NO2和NOx的细致传输,并提供了可靠的空间抽象。
    Abstract Atmospheric nitrogen oxides (NOx) primarily from fuel combustion have recognized acute and chronic health and environmental effects. Machine learning (ML) methods have significantly enhanced our capacity to predict NOx concentrations at ground-level with high spatiotemporal resolution but may suffer from high estimation bias since they lack physical and chemical knowledge about air pollution dynamics. Chemical transport models (CTMs) leverage this knowledge; however, accurate predictions of ground-level concentrations typically necessitate extensive post-calibration. Here, we present a physics-informed deep learning framework that encodes advection-diffusion mechanisms and fluid dynamics constraints to jointly predict NO2 and NOx and reduce ML model bias by 21-42%. Our approach captures fine-scale transport of NO2 and NOx, generates robust spatial extrapolation, and provides explicit uncertainty estimation. The framework fuses knowledge-driven physicochemical principles of CTMs with the predictive power of ML for air quality exposure, health, and policy applications. Our approach offers significant improvements over purely data-driven ML methods and has unprecedented bias reduction in joint NO2 and NOx prediction.
    摘要 燃烧产生的大气氮氧化物(NOx)已被认定为有急性和长期健康和环境影响。机器学习(ML)方法已经大幅提高我们预测地面NOx浓度的能力,但这些方法可能会受到高估偏差因为没有物理和化学知识关于空气污染动力学。化学运输模型(CTM)利用这些知识,但精确预测地面浓度通常需要广泛的后调整。在这里,我们介绍一个具有物理知识的深度学习框架,该框架编码了扩散运输机制和流体动力学约束,以预测NO2和NOx的 JOINT 预测和减少ML模型偏差21-42%。我们的方法能够捕捉精确的NO2和NOx的细胞运输,生成坚固的空间推导,并提供明确的 uncertainty 估计。这个框架融合了物理化学知识驱动的CTMs和ML的预测力,实现了空气质量露地暴露、健康和政策应用中的融合。我们的方法具有与对纯数据驱动ML方法的比较,无 precedent 的偏差减少在 JOINT NO2和NOx 预测中。
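
The physics-informed part can be sketched as an extra loss term: the network's pollutant field u(t, x) is differentiated with autograd and penalized for violating a 1-D advection-diffusion equation u_t + v*u_x - D*u_xx = 0. The network size, constant wind speed v, diffusivity D, and zero source term are placeholder assumptions, not the paper's operator or constraints.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
v, D = 1.0, 0.1                                     # assumed constant wind speed and diffusivity

def pde_residual(tx: torch.Tensor) -> torch.Tensor:
    """Residual of u_t + v*u_x - D*u_xx = 0 at collocation points tx = (t, x)."""
    tx = tx.clone().requires_grad_(True)
    u = net(tx)
    grads = torch.autograd.grad(u, tx, grad_outputs=torch.ones_like(u), create_graph=True)[0]
    u_t, u_x = grads[:, 0:1], grads[:, 1:2]
    u_xx = torch.autograd.grad(u_x, tx, grad_outputs=torch.ones_like(u_x), create_graph=True)[0][:, 1:2]
    return u_t + v * u_x - D * u_xx

# Combined loss: fit monitor observations and softly enforce the transport physics.
tx_obs, y_obs = torch.rand(128, 2), torch.rand(128, 1)
tx_col = torch.rand(512, 2)                          # collocation points for the physics term
loss = nn.functional.mse_loss(net(tx_obs), y_obs) + 0.1 * pde_residual(tx_col).pow(2).mean()
loss.backward()
print(float(loss))
```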

Interaction-Aware Personalized Vehicle Trajectory Prediction Using Temporal Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.07439
  • repo_url: None
  • paper_authors: Amr Abdelraouf, Rohit Gupta, Kyungtae Han
  • for: 预测汽车轨迹的精度是自动驾驶系统和高级驾驶助手系统中的关键。现有方法主要依靠大规模的数据集来生成通用的轨迹预测,忽略了每位司机的个性驾驶模式。
  • methods: 我们提出了一种基于交互aware的个性化轨迹预测方法,该方法利用temporal graph neural networks(GCN)和Long Short-Term Memory(LSTM)模型了target vehicles和它们周围的交通之间的空间-时间交互。为了个性化预测,我们设立了一个管道,该管道通过转移学习来使模型在大规模轨迹数据集上进行初始化预training,然后在每位司机的具体驾驶数据上进行细化调整。
  • results: 我们的个性化GCN-LSTM模型在较长的预测时间范围内表现出优于其通用版本,并且与没有预training的个体模型相比,具有更高的预测精度。此外,我们的个性化模型还能够避免过拟合现象,强调了大规模数据集的预training对个性化预测的重要性。通过个性化,我们的方法提高了轨迹预测精度。
    Abstract Accurate prediction of vehicle trajectories is vital for advanced driver assistance systems and autonomous vehicles. Existing methods mainly rely on generic trajectory predictions derived from large datasets, overlooking the personalized driving patterns of individual drivers. To address this gap, we propose an approach for interaction-aware personalized vehicle trajectory prediction that incorporates temporal graph neural networks. Our method utilizes Graph Convolution Networks (GCN) and Long Short-Term Memory (LSTM) to model the spatio-temporal interactions between target vehicles and their surrounding traffic. To personalize the predictions, we establish a pipeline that leverages transfer learning: the model is initially pre-trained on a large-scale trajectory dataset and then fine-tuned for each driver using their specific driving data. We employ human-in-the-loop simulation to collect personalized naturalistic driving trajectories and corresponding surrounding vehicle trajectories. Experimental results demonstrate the superior performance of our personalized GCN-LSTM model, particularly for longer prediction horizons, compared to its generic counterpart. Moreover, the personalized model outperforms individual models created without pre-training, emphasizing the significance of pre-training on a large dataset to avoid overfitting. By incorporating personalization, our approach enhances trajectory prediction accuracy.
    摘要 准确的车辆轨迹预测对高级驾驶辅助系统和自动驾驶汽车至关重要。现有方法主要依靠在大规模数据集上得到的通用轨迹预测,忽略了每位驾驶员个性化的驾驶模式。为了解决这一差距,我们提出了一种交互感知的个性化车辆轨迹预测方法,该方法利用图卷积网络(GCN)和长短期记忆网络(LSTM)来建模目标车辆与周围交通之间的时空交互。为了实现个性化预测,我们建立了一条基于迁移学习的流程:模型首先在大规模轨迹数据集上进行预训练,然后使用每位驾驶员的具体驾驶数据进行微调。我们使用人在环(human-in-the-loop)仿真来收集个性化的自然驾驶轨迹以及相应的周围车辆轨迹。实验结果表明,我们的个性化 GCN-LSTM 模型优于其通用版本,在较长的预测时域上优势尤为明显。此外,个性化模型也优于未经预训练而单独训练的个体模型,说明在大规模数据集上预训练对于避免过拟合的重要性。通过引入个性化,我们的方法提高了轨迹预测精度。
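
The personalization pipeline, pretrain one shared predictor on the pooled trajectory data and then fine-tune a copy per driver at a smaller learning rate, is easy to express; the generic MLP below stands in for the GCN-LSTM, and all sizes and rates are illustrative.

```python
import copy
import torch
import torch.nn as nn

def train(model, loader, lr, epochs):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            loss = nn.functional.mse_loss(model(x), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return model

base = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))   # stand-in for the GCN-LSTM

pooled = [(torch.randn(32, 20), torch.randn(32, 10)) for _ in range(50)]    # large generic dataset
driver = [(torch.randn(8, 20), torch.randn(8, 10)) for _ in range(5)]       # one driver's small dataset

generic  = train(base, pooled, lr=1e-3, epochs=2)                      # step 1: pretraining
personal = train(copy.deepcopy(generic), driver, lr=1e-4, epochs=5)    # step 2: per-driver fine-tuning
```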

A Hybrid Deep Spatio-Temporal Attention-Based Model for Parkinson’s Disease Diagnosis Using Resting State EEG Signals

  • paper_url: http://arxiv.org/abs/2308.07436
  • repo_url: None
  • paper_authors: Niloufar Delfan, Mohammadreza Shahsavari, Sadiq Hussain, Robertas Damaševičius, U. Rajendra Acharya
  • for: 这个研究的目的是为了开发一个自动化的 Parkinson’s disease 诊断模型,使用休息状态 EEG 信号。
  • methods: 这个模型使用了一种混合模型,包括卷积神经网络 (CNN)、双向关键缓冲网络 (Bi-GRU) 和注意机制。
  • results: 研究结果显示,提案的模型可以高度准确地诊断 Parkinson’s disease,并且在不同测试数据上也能够获得高性能。此外,模型还能够对部分输入信息的损失具有耐性。
    Abstract Parkinson's disease (PD), a severe and progressive neurological illness, affects millions of individuals worldwide. For effective treatment and management of PD, an accurate and early diagnosis is crucial. This study presents a deep learning-based model for the diagnosis of PD using resting state electroencephalogram (EEG) signal. The objective of the study is to develop an automated model that can extract complex hidden nonlinear features from EEG and demonstrate its generalizability on unseen data. The model is designed using a hybrid model, consists of convolutional neural network (CNN), bidirectional gated recurrent unit (Bi-GRU), and attention mechanism. The proposed method is evaluated on three public datasets (Uc San Diego Dataset, PRED-CT, and University of Iowa (UI) dataset), with one dataset used for training and the other two for evaluation. The results show that the proposed model can accurately diagnose PD with high performance on both the training and hold-out datasets. The model also performs well even when some part of the input information is missing. The results of this work have significant implications for patient treatment and for ongoing investigations into the early detection of Parkinson's disease. The suggested model holds promise as a non-invasive and reliable technique for PD early detection utilizing resting state EEG.
    摘要 帕金森病(PD)是一种严重且进行性的神经系统疾病,影响全球数百万人。为了有效地治疗和管理 PD,准确的早期诊断至关重要。本研究提出了一种基于深度学习的 PD 诊断模型,使用静息态脑电图(EEG)信号。研究的目标是开发一个自动化模型,能够从 EEG 中提取复杂的隐藏非线性特征,并在未见过的数据上展示其泛化能力。模型采用混合结构,包括卷积神经网络(CNN)、双向门控循环单元(Bi-GRU)和注意力机制。该方法在三个公开数据集(UC San Diego 数据集、PRED-CT 数据集和 University of Iowa(UI)数据集)上进行评估,其中一个数据集用于训练,另外两个用于评估。结果表明,所提模型可以准确地诊断 PD,在训练集和留出数据集上均表现出色;即使部分输入信息缺失,模型仍能保持良好的性能。本研究结果对患者治疗以及 PD 早期检测的研究具有重要意义。所提模型有望成为一种利用静息态 EEG 进行 PD 早期检测的非侵入且可靠的技术。
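
A skeletal CNN + Bi-GRU + attention classifier for multichannel EEG windows is shown below; the channel count, kernel size, and simple additive attention are our own choices rather than the exact architecture of the paper.

```python
import torch
import torch.nn as nn

class EEGNetHybrid(nn.Module):
    def __init__(self, n_channels=32, n_classes=2, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.gru = nn.GRU(32, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                      # x: (batch, channels, time)
        h = self.cnn(x).transpose(1, 2)        # (batch, time', 32)
        h, _ = self.gru(h)                     # (batch, time', 2*hidden)
        w = torch.softmax(self.attn(h), dim=1) # attention weights over time steps
        context = (w * h).sum(dim=1)           # (batch, 2*hidden)
        return self.head(context)

model = EEGNetHybrid()
logits = model(torch.randn(4, 32, 512))        # 4 windows of 32-channel resting-state EEG
print(logits.shape)                            # torch.Size([4, 2])
```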

Addressing Distribution Shift in RTB Markets via Exponential Tilting

  • paper_url: http://arxiv.org/abs/2308.07424
  • repo_url: None
  • paper_authors: Minji Kim, Seong Jin Lee, Bumsik Kim
  • for: This paper aims to address the issue of distribution shift in machine learning models, specifically in the context of Real-Time Bidding (RTB) market models.
  • methods: The proposed method is called Exponential Tilt Reweighting Alignment (ExTRA), which uses importance weights to minimize the KL divergence between the weighted source and target datasets. The ExTRA method can operate using labeled source data and unlabeled target data.
  • results: The paper evaluates the effectiveness of the ExTRA method through simulated real-world data, demonstrating its ability to address distribution shift and improve the performance of machine learning models.
    Abstract Distribution shift in machine learning models can be a primary cause of performance degradation. This paper delves into the characteristics of these shifts, primarily motivated by Real-Time Bidding (RTB) market models. We emphasize the challenges posed by class imbalance and sample selection bias, both potent instigators of distribution shifts. This paper introduces the Exponential Tilt Reweighting Alignment (ExTRA) algorithm, as proposed by Marty et al. (2023), to address distribution shifts in data. The ExTRA method is designed to determine the importance weights on the source data, aiming to minimize the KL divergence between the weighted source and target datasets. A notable advantage of this method is its ability to operate using labeled source data and unlabeled target data. Through simulated real-world data, we investigate the nature of distribution shift and evaluate the applicability of the proposed model.
    摘要 机器学习模型中的分布偏移(distribution shift)可能是性能下降的主要原因。本文以实时竞价(RTB)市场模型为主要背景,探讨这类偏移的特点。我们强调类别不平衡和样本选择偏差带来的挑战,这两者都是引起分布偏移的重要因素。本文介绍了 Marty 等人(2023)提出的指数倾斜重加权对齐(ExTRA)算法,用于应对数据中的分布偏移。ExTRA 方法通过确定源数据上的重要性权重,来最小化加权后的源数据与目标数据之间的 KL 散度。该方法的一个显著优点是只需标注的源数据和无标注的目标数据即可运行。我们通过模拟的真实世界数据研究了分布偏移的本质,并评估了所提模型的适用性。
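
The exponential-tilting mechanic itself is compact: weight each source point by w(x) proportional to exp(theta^T T(x)) and choose theta so the weighted source statistics match the unlabeled target statistics, which corresponds to minimizing a KL divergence between the tilted source and the target. The sketch below uses raw features as T(x) and a crude moment-matching loop; it illustrates the tilting mechanics, not the ExTRA algorithm as published.

```python
import numpy as np

rng = np.random.default_rng(0)
Xs = rng.normal(loc=0.0, size=(2000, 2))      # labeled source features
Xt = rng.normal(loc=0.7, size=(2000, 2))      # unlabeled target features (shifted)

def tilt_weights(X, theta):
    logits = X @ theta
    w = np.exp(logits - logits.max())         # exponential tilt, numerically stabilized
    return w / w.mean()                       # normalized importance weights (mean 1)

theta = np.zeros(2)
for _ in range(500):                          # moment matching: weighted source mean -> target mean
    w = tilt_weights(Xs, theta)
    weighted_mean = (w[:, None] * Xs).mean(axis=0) / w.mean()
    theta -= 0.1 * (weighted_mean - Xt.mean(axis=0))

w = tilt_weights(Xs, theta)
print("weighted source mean:", np.round((w[:, None] * Xs).mean(axis=0) / w.mean(), 3))
print("target mean:         ", np.round(Xt.mean(axis=0), 3))
# Downstream, a model is fit on the source data with per-sample weights w.
```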

U-Turn Diffusion

  • paper_url: http://arxiv.org/abs/2308.07421
  • repo_url: None
  • paper_authors: Hamidreza Behjoo, Michael Chertkov
  • for: 这种 Diffusion 模型是用于生成人工图像的。
  • methods: 这些模型基于由随机微分方程驱动的动态辅助时间机制,其中 Score 函数来自输入图像。
  • results: 我们的研究给出了评估 Diffusion 模型效率的一个标准:生成能力取决于在反向(去噪)阶段解构快速相关性的能力,这直接影响生成图像质量。此外,我们还提出了“U-Turn 扩散”技术,该技术通过组合前向、U-turn 和后向过程,生成一个接近独立同分布(i.i.d.)的样本。
    Abstract We present a comprehensive examination of score-based diffusion models of AI for generating synthetic images. These models hinge upon a dynamic auxiliary time mechanism driven by stochastic differential equations, wherein the score function is acquired from input images. Our investigation unveils a criterion for evaluating efficiency of the score-based diffusion models: the power of the generative process depends on the ability to de-construct fast correlations during the reverse/de-noising phase. To improve the quality of the produced synthetic images, we introduce an approach coined "U-Turn Diffusion". The U-Turn Diffusion technique starts with the standard forward diffusion process, albeit with a condensed duration compared to conventional settings. Subsequently, we execute the standard reverse dynamics, initialized with the concluding configuration from the forward process. This U-Turn Diffusion procedure, combining forward, U-turn, and reverse processes, creates a synthetic image approximating an independent and identically distributed (i.i.d.) sample from the probability distribution implicitly described via input samples. To analyze relevant time scales we employ various analytical tools, including auto-correlation analysis, weighted norm of the score-function analysis, and Kolmogorov-Smirnov Gaussianity test. The tools guide us to establishing that the Kernel Intersection Distance, a metric comparing the quality of synthetic samples with real data samples, is minimized at the optimal U-turn time.
    摘要 我们对基于分数的扩散模型生成合成图像进行了全面的检验。这些模型基于由随机微分方程驱动的动态辅助时间机制,其中分数函数从输入图像中获取。我们的研究揭示了评估基于分数的扩散模型效率的一个标准:生成过程的能力取决于在反向(去噪)阶段解构快速相关性的能力。为了提高生成的合成图像质量,我们提出了“U-Turn 扩散”技术。U-Turn 扩散从标准的前向扩散过程开始,但与传统设置相比持续时间被压缩;随后,我们以前向过程的最终状态为初始条件,执行标准的反向动力学。这种结合前向、U-turn 和反向过程的 U-Turn 扩散流程,生成的合成图像近似于由输入样本隐式描述的概率分布中的独立同分布(i.i.d.)样本。为了分析相关的时间尺度,我们使用了多种分析工具,包括自相关分析、分数函数的加权范数分析以及 Kolmogorov-Smirnov 高斯性检验。这些工具帮助我们确定:衡量合成样本与真实数据样本质量差异的 Kernel Intersection Distance 在最优的 U-turn 时间处达到最小。
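
The U-turn schedule can be demonstrated on a one-dimensional toy where the score of the noised marginal is known in closed form: run the variance-preserving forward noising only up to an intermediate U-turn time, then reverse from that very state. The Gaussian data, exact score, and Euler-Maruyama discretization are simplifications; real models use a learned score network.

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, sigma0, beta, dt = 3.0, 0.5, 1.0, 1e-3
x = rng.normal(mu0, sigma0, size=5000)          # "data": a 1-D Gaussian stand-in for images

def score(x, t):                                 # exact score of the noised marginal (Gaussian data only)
    a = np.exp(-0.5 * beta * t)
    var = (sigma0 * a) ** 2 + (1.0 - a ** 2)
    return -(x - mu0 * a) / var

T_uturn = 1.0                                    # condensed forward duration (the "U-turn" time)
t = 0.0
while t < T_uturn:                               # forward VP diffusion: dx = -0.5*beta*x dt + sqrt(beta) dW
    x = x - 0.5 * beta * x * dt + np.sqrt(beta * dt) * rng.normal(size=x.shape)
    t += dt
while t > 0:                                     # reverse from the U-turn state using the score
    x = x + (0.5 * beta * x + beta * score(x, t)) * dt + np.sqrt(beta * dt) * rng.normal(size=x.shape)
    t -= dt

print(round(x.mean(), 2), round(x.std(), 2))     # close to the data statistics (3.0, 0.5)
```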

Locally Adaptive and Differentiable Regression

  • paper_url: http://arxiv.org/abs/2308.07418
  • repo_url: None
  • paper_authors: Mingxuan Han, Varun Shankar, Jeff M Phillips, Chenglong Ye
  • for: 提出了一种基于本地学习模型的全球连续可导模型框架,以寻求处理数据中存在不同密度或函数值规模的问题。
  • methods: 使用权重加权平均方法将本地学习模型在相应的地方进行连续拟合,以实现全球连续可导模型。
  • results: 在推理中,该模型可以更快地达到统计准确性,并在各种实际应用中提供了改进的表现。
    Abstract Over-parameterized models like deep nets and random forests have become very popular in machine learning. However, the natural goals of continuity and differentiability, common in regression models, are now often ignored in modern overparametrized, locally-adaptive models. We propose a general framework to construct a global continuous and differentiable model based on a weighted average of locally learned models in corresponding local regions. This model is competitive in dealing with data with different densities or scales of function values in different local regions. We demonstrate that when we mix kernel ridge and polynomial regression terms in the local models, and stitch them together continuously, we achieve faster statistical convergence in theory and improved performance in various practical settings.
    摘要 在现代机器学习中,深度网络和随机森林等过参数化模型已经非常流行。然而,回归模型中常见的连续性和可微性目标,在这些现代的过参数化、局部自适应模型中往往被忽略。我们提出了一种通用框架,通过对局部区域上学习到的局部模型进行加权平均,来构建全局连续且可微的模型。该模型能够很好地处理不同局部区域中数据密度或函数值尺度不同的情形。我们证明,当在局部模型中混合核岭回归(kernel ridge)与多项式回归项并将它们连续拼接时,理论上可以获得更快的统计收敛速度,并在多种实际场景中取得更好的表现。
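
The stitching construction can be illustrated in one dimension: fit a simple local model on each overlapping region (plain local polynomials standing in for the kernel-ridge-plus-polynomial mix) and blend their predictions with smooth normalized Gaussian weights, which yields a globally continuous and differentiable fit. Region placement, bandwidth, and degree are arbitrary demo choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 300))
y = np.where(x < 0.5, np.sin(8 * np.pi * x), 0.2 * x) + 0.05 * rng.normal(size=x.size)

centers = np.linspace(0, 1, 6)                    # local regions with different behaviour
bandwidth, degree = 0.15, 3
local_fits = []
for c in centers:
    mask = np.abs(x - c) < 2 * bandwidth          # fit each local polynomial on nearby points only
    local_fits.append(np.polyfit(x[mask], y[mask], degree))

def predict(xq):
    w = np.exp(-0.5 * ((xq[:, None] - centers[None, :]) / bandwidth) ** 2)
    w = w / w.sum(axis=1, keepdims=True)          # smooth partition-of-unity weights
    preds = np.stack([np.polyval(p, xq) for p in local_fits], axis=1)
    return (w * preds).sum(axis=1)                # weighted average of local models -> smooth in xq

xq = np.linspace(0, 1, 5)
print(np.round(predict(xq), 3))
```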

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models

  • paper_url: http://arxiv.org/abs/2308.07395
  • repo_url: None
  • paper_authors: Shaan Bijwadia, Shuo-yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath
  • for: 提高auxiliary任务性能(非ASR任务)
  • methods: 使用文本注入法(JEIT)对ASR模型进行训练,并同时完成两个auxiliary任务
  • results: 提高了长尾数据的首字母排序性能,以及提高了对话转接检测的受检率
    Abstract Text injection for automatic speech recognition (ASR), wherein unpaired text-only data is used to supplement paired audio-text data, has shown promising improvements for word error rate. This study examines the use of text injection for auxiliary tasks, which are the non-ASR tasks often performed by an E2E model. In this work, we use joint end-to-end and internal language model training (JEIT) as our text injection algorithm to train an ASR model which performs two auxiliary tasks. The first is capitalization, which is a de-normalization task. The second is turn-taking prediction, which attempts to identify whether a user has completed their conversation turn in a digital assistant interaction. We show results demonstrating that our text injection method boosts capitalization performance for long-tail data, and improves turn-taking detection recall.
    摘要 文本注入技术用于自动语音识别(ASR),其中不带文本数据用于补充带有音频文本数据的训练,已经显示出了词错率的明显改善。本研究探讨了文本注入的应用于 auxiliary task,这些任务通常由一个端到端模型完成。在这个工作中,我们使用联合端到端和内部语言模型训练算法(JEIT)来训练一个 ASR 模型,该模型同时完成了两个 auxiliary task。第一个是字母大小Normalization 任务,第二个是对话交换预测任务,它们是在数字助手交互中确定用户是否已经完成了对话转换。我们的文本注入方法可以提高长尾数据的字母大小正确率,并提高对话交换检测精度。

DISBELIEVE: Distance Between Client Models is Very Essential for Effective Local Model Poisoning Attacks

  • paper_url: http://arxiv.org/abs/2308.07387
  • repo_url: None
  • paper_authors: Indu Joshi, Priyank Upadhya, Gaurav Kumar Nayak, Peter Schüffler, Nassir Navab
  • for: 该研究旨在探讨 Federated Learning 如何解决医疗数据隐私问题,并研究如何防止恶意客户端攻击 Federated 系统。
  • methods: 该研究提出了一种新的本地模型欺骗攻击(DISBELIEVE),该攻击可以在 Robust Aggregation 方法下降低本地模型的性能,从而影响全局模型的性能。
  • results: 实验结果表明,DISBELIEVE 攻击可以在三个公共可用的医疗图像集上显著降低 Robust Aggregation 方法的性能,并且在自然图像集上也有较好的效果。
    Abstract Federated learning is a promising direction to tackle the privacy issues related to sharing patients' sensitive data. Often, federated systems in the medical image analysis domain assume that the participating local clients are \textit{honest}. Several studies report mechanisms through which a set of malicious clients can be introduced that can poison the federated setup, hampering the performance of the global model. To overcome this, robust aggregation methods have been proposed that defend against those attacks. We observe that most of the state-of-the-art robust aggregation methods are heavily dependent on the distance between the parameters or gradients of malicious clients and benign clients, which makes them prone to local model poisoning attacks when the parameters or gradients of malicious and benign clients are close. Leveraging this, we introduce DISBELIEVE, a local model poisoning attack that creates malicious parameters or gradients such that their distance to benign clients' parameters or gradients is low respectively but at the same time their adverse effect on the global model's performance is high. Experiments on three publicly available medical image datasets demonstrate the efficacy of the proposed DISBELIEVE attack as it significantly lowers the performance of the state-of-the-art \textit{robust aggregation} methods for medical image analysis. Furthermore, compared to state-of-the-art local model poisoning attacks, DISBELIEVE attack is also effective on natural images where we observe a severe drop in classification performance of the global model for multi-class classification on benchmark dataset CIFAR-10.
    摘要 Federated learning 是一个有前途的方向,以解决分享患者敏感数据时的隐私问题。在医疗影像分析领域,联邦系统经常假设参与的本地客户端是诚实的。然而,一些研究表明,可以引入一组恶意客户端,使联邦设置受损,global模型性能下降。为解决这个问题,一些robust汇集方法被提出,以防止这些攻击。我们发现,大多数当前的state-of-the-art robust汇集方法都是依赖本地客户端和良好客户端之间的距离,这使得它们容易受到本地模型毒 poisoning攻击,当本地客户端和良好客户端之间的参数或梯度距离很近时。基于这一点,我们介绍了DISBELIEVE,一种本地模型毒 poisoning攻击,可以创造出谎言的参数或梯度,使其与良好客户端之间的距离很近,但同时对全局模型性能产生严重的影响。我们在三个公共可用的医疗影像数据集上进行了实验,并证明了我们提出的DISBELIEVE攻击的有效性,可以在医疗影像分析领域对当前的robust汇集方法进行重要的攻击。此外,我们还发现,DISBELIEVE攻击也能够在自然图像领域中效果,在CIFAR-10benchmark数据集上,全局模型的多类分类性能下降很严重。
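
The attack's core constraint, keep the malicious update within a benign-looking distance of the honest clients while pushing the aggregate in a harmful direction, can be written as a tiny construction; the harmful direction used here (simply opposing the benign consensus) and the radius are stand-ins, not the paper's exact objective.

```python
import numpy as np

rng = np.random.default_rng(0)
benign_updates = rng.normal(size=(9, 1000)) * 0.01 + 0.05   # updates from 9 honest clients
mu = benign_updates.mean(axis=0)

radius = np.linalg.norm(benign_updates - mu, axis=1).mean()  # stay within the typical benign distance
harmful_direction = -mu / (np.linalg.norm(mu) + 1e-12)       # stand-in: oppose the benign consensus

malicious = mu + radius * harmful_direction                  # close to the benign clients, yet adversarial

dists = np.linalg.norm(benign_updates - mu, axis=1)
print("benign distances to mean:  ", np.round(dists[:3], 3), "...")
print("malicious distance to mean:", round(float(np.linalg.norm(malicious - mu)), 3))
# A distance-based robust aggregator sees nothing unusual, yet the aggregate is dragged away from mu.
```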

DiffSED: Sound Event Detection with Denoising Diffusion

  • paper_url: http://arxiv.org/abs/2308.07293
  • repo_url: None
  • paper_authors: Swapnil Bhosale, Sauradip Nag, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu
  • for: The paper recasts sound event temporal boundary detection as a generative learning problem, aiming to improve the accuracy and efficiency of sound event detection.
  • methods: A denoising diffusion process, implemented within a Transformer decoder framework, refines noisy boundary proposals into high-quality event temporal boundaries, conditioned on the target audio sample.
  • results: On the Urban-SED and EPIC-Sounds datasets the model outperforms existing alternatives, with over 40% faster convergence during training.
    Abstract Sound Event Detection (SED) aims to predict the temporal boundaries of all the events of interest and their class labels, given an unconstrained audio sample. Taking either the split-and-classify (i.e., frame-level) strategy or the more principled event-level modeling approach, all existing methods consider the SED problem from the discriminative learning perspective. In this work, we reformulate the SED problem by taking a generative learning perspective. Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process, conditioned on a target audio sample. During training, our model learns to reverse the noising process by converting noisy latent queries to the groundtruth versions in the elegant Transformer decoder framework. Doing so enables the model to generate accurate event boundaries from even noisy queries during inference. Extensive experiments on the Urban-SED and EPIC-Sounds datasets demonstrate that our model significantly outperforms existing alternatives, with 40+% faster convergence in training.
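A toy sketch of the generative formulation may help: noisy boundary queries are iteratively refined into (start, end) proposals by a conditional denoiser, given an audio embedding. The module below is a placeholder stand-in, not the DiffSED Transformer decoder; shapes and step count are assumptions.

```python
# Toy sketch of the generative view: event boundaries (start, end) in [0, 1] are
# latent queries; starting from Gaussian noise, a conditional denoiser is applied
# for a few reverse steps to recover proposals given an audio embedding.
import torch
import torch.nn as nn

class BoundaryDenoiser(nn.Module):
    def __init__(self, audio_dim=128, n_queries=10):
        super().__init__()
        self.n_queries = n_queries
        self.net = nn.Sequential(
            nn.Linear(audio_dim + 2 * n_queries + 1, 256), nn.ReLU(),
            nn.Linear(256, 2 * n_queries))

    def forward(self, noisy_boundaries, audio_emb, t):
        x = torch.cat([noisy_boundaries.flatten(1), audio_emb, t], dim=1)
        return self.net(x).view(-1, self.n_queries, 2)   # predicted clean (start, end)

denoiser = BoundaryDenoiser()
audio_emb = torch.randn(1, 128)                          # clip-level conditioning
queries = torch.randn(1, 10, 2)                          # pure-noise proposals
for step in reversed(range(4)):                          # a few reverse steps
    t = torch.full((1, 1), float(step))
    queries = denoiser(queries, audio_emb, t).clamp(0, 1)
print(queries[0, :3])                                    # refined (start, end) pairs
```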

The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation

  • paper_url: http://arxiv.org/abs/2308.07286
  • repo_url: None
  • paper_authors: Patrick Fernandes, Daniel Deutsch, Mara Finkelstein, Parker Riley, André F. T. Martins, Graham Neubig, Ankush Garg, Jonathan H. Clark, Markus Freitag, Orhan Firat
  • for: Improving the automatic evaluation of machine translation quality.
  • methods: AutoMQM, a prompting technique that leverages the reasoning and in-context learning abilities of large language models, asking them to identify and categorize translation errors.
  • results: Compared with score-only prompting, AutoMQM improves performance, with particularly large gains for larger models, and provides interpretability through error spans that align with human annotations.
    Abstract Automatic evaluation of machine translation (MT) is a critical tool driving the rapid iterative development of MT systems. While considerable progress has been made on estimating a single scalar quality score, current metrics lack the informativeness of more detailed schemes that annotate individual errors, such as Multidimensional Quality Metrics (MQM). In this paper, we help fill this gap by proposing AutoMQM, a prompting technique which leverages the reasoning and in-context learning capabilities of large language models (LLMs) and asks them to identify and categorize errors in translations. We start by evaluating recent LLMs, such as PaLM and PaLM-2, through simple score prediction prompting, and we study the impact of labeled data through in-context learning and finetuning. We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores (with particularly large gains for larger models) while providing interpretability through error spans that align with human annotations.
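As a rough illustration of AutoMQM-style prompting, the snippet below shows the kind of prompt used to elicit error annotations and how an MQM-like score can be derived from them. The prompt wording, error categories, and severity weights are assumptions for illustration, and the call to an actual LLM plus output parsing is omitted.

```python
# Hedged sketch of AutoMQM-style prompting: the model is asked to list error spans
# with a category and severity, and an MQM-like score is derived by weighting the
# severities. The prompt and weights are illustrative, not the paper's.
PROMPT = """You are an expert translator. List the translation errors.
Source ({src_lang}): {src}
Translation ({tgt_lang}): {hyp}
For each error give: span, category (accuracy/fluency/terminology/other), severity (major/minor)."""

SEVERITY_WEIGHTS = {"major": -5, "minor": -1}            # MQM-style penalties

def automqm_score(errors):
    """errors: list of dicts like {"span": ..., "category": ..., "severity": ...}"""
    return sum(SEVERITY_WEIGHTS.get(e["severity"], 0) for e in errors)

# Hand-written error list standing in for parsed LLM output:
errors = [{"span": "bank", "category": "accuracy", "severity": "major"},
          {"span": "a the", "category": "fluency", "severity": "minor"}]
print(automqm_score(errors))                             # -6
```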

Cross-Attribute Matrix Factorization Model with Shared User Embedding

  • paper_url: http://arxiv.org/abs/2308.07284
  • repo_url: None
  • paper_authors: Wen Liang, Zeng Fan, Youzhi Liang, Jianguo Jia
  • for: Improving the accuracy and robustness of recommender systems, especially for "long-tail" users and items.
  • methods: A neural architecture that captures user-item interactions while also modeling the attributes of users and items, using a shared user embedding to address the cold-start problem.
  • results: On the MovieLens and Pinterest datasets, the proposed Cross-Attribute Matrix Factorization model outperforms common baselines, particularly in sparse-data scenarios.
    Abstract Over the past few years, deep learning has firmly established its prowess across various domains, including computer vision, speech recognition, and natural language processing. Motivated by its outstanding success, researchers have been directing their efforts towards applying deep learning techniques to recommender systems. Neural collaborative filtering (NCF) and Neural Matrix Factorization (NeuMF) refreshes the traditional inner product in matrix factorization with a neural architecture capable of learning complex and data-driven functions. While these models effectively capture user-item interactions, they overlook the specific attributes of both users and items. This can lead to robustness issues, especially for items and users that belong to the "long tail". Such challenges are commonly recognized in recommender systems as a part of the cold-start problem. A direct and intuitive approach to address this issue is by leveraging the features and attributes of the items and users themselves. In this paper, we introduce a refined NeuMF model that considers not only the interaction between users and items, but also crossing associated attributes. Moreover, our proposed architecture features a shared user embedding, seamlessly integrating with user embeddings to improve the robustness and effectively address the cold-start problem. Rigorous experiments on both the Movielens and Pinterest datasets demonstrate the superiority of our Cross-Attribute Matrix Factorization model, particularly in scenarios characterized by higher dataset sparsity.
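A compact sketch of the idea follows: a single shared user embedding feeds both a classical matrix-factorization branch and an attribute branch that also consumes item attribute vectors. Layer sizes and the fusion scheme are illustrative assumptions rather than the authors' exact architecture.

```python
# Minimal sketch (not the authors' exact model): a NeuMF-style scorer where one
# shared user embedding feeds both the interaction branch and an attribute branch
# that also sees item attribute vectors, so cold items can be scored from attributes.
import torch
import torch.nn as nn

class CrossAttrMF(nn.Module):
    def __init__(self, n_users, n_items, n_item_attrs, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)       # shared user embedding
        self.item_emb = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim + n_item_attrs, 64), nn.ReLU(), nn.Linear(64, dim))
        self.out = nn.Linear(2 * dim, 1)

    def forward(self, user, item, item_attrs):
        u = self.user_emb(user)
        gmf = u * self.item_emb(item)                    # classic MF interaction
        attr = self.mlp(torch.cat([u, item_attrs], dim=1))  # user x item-attribute branch
        return torch.sigmoid(self.out(torch.cat([gmf, attr], dim=1)))

model = CrossAttrMF(n_users=1000, n_items=500, n_item_attrs=18)
score = model(torch.tensor([3]), torch.tensor([42]), torch.rand(1, 18))
```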

Data-Efficient Energy-Aware Participant Selection for UAV-Enabled Federated Learning

  • paper_url: http://arxiv.org/abs/2308.07273
  • repo_url: None
  • paper_authors: Youssra Cheriguene, Wael Jaafar, Chaker Abdelaziz Kerrache, Halim Yanikomeroglu, Fatima Zohra Bousbaa, Nasreddine Lagraa
  • for: Improving the accuracy of edge federated learning (FL) models by selecting suitable UAV participants while accounting for UAV energy consumption, communication quality, and the heterogeneity of local datasets.
  • methods: A data-efficient energy-aware participant selection strategy (DEEPS) that picks the best FL participant in each sub-region based on the structural similarity index measure (SSIM) average score of its local dataset and its power consumption profile.
  • results: Experiments show that DEEPS improves model accuracy and reduces training time and UAV energy consumption compared with the benchmark random selection method.
    Abstract Unmanned aerial vehicle (UAV)-enabled edge federated learning (FL) has sparked a rise in research interest as a result of the massive and heterogeneous data collected by UAVs, as well as the privacy concerns related to UAV data transmissions to edge servers. However, due to the redundancy of UAV collected data, e.g., imaging data, and non-rigorous FL participant selection, the convergence time of the FL learning process and bias of the FL model may increase. Consequently, we investigate in this paper the problem of selecting UAV participants for edge FL, aiming to improve the FL model's accuracy, under UAV constraints of energy consumption, communication quality, and local datasets' heterogeneity. We propose a novel UAV participant selection scheme, called data-efficient energy-aware participant selection strategy (DEEPS), which consists of selecting the best FL participant in each sub-region based on the structural similarity index measure (SSIM) average score of its local dataset and its power consumption profile. Through experiments, we demonstrate that the proposed selection scheme is superior to the benchmark random selection method, in terms of model accuracy, training time, and UAV energy consumption.
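The selection rule can be illustrated with a few lines of Python: within each sub-region, candidates are ranked by their local dataset's average SSIM score and a normalized energy cost, and the top UAV is kept. The scoring function and weight below are assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of per-sub-region participant selection: higher SSIM average
# and lower (normalized) energy give a higher score; keep the best UAV per region.
# The linear score and `energy_weight` are assumed for illustration.
def select_participants(candidates, energy_weight=0.5):
    """candidates: {region: [{"uav": id, "ssim_avg": float, "energy": joules}, ...]}"""
    selected = {}
    for region, uavs in candidates.items():
        max_e = max(u["energy"] for u in uavs) or 1.0
        best = max(uavs, key=lambda u: u["ssim_avg"] - energy_weight * u["energy"] / max_e)
        selected[region] = best["uav"]
    return selected

candidates = {"A": [{"uav": 1, "ssim_avg": 0.71, "energy": 120.0},
                    {"uav": 2, "ssim_avg": 0.68, "energy": 60.0}],
              "B": [{"uav": 3, "ssim_avg": 0.80, "energy": 150.0}]}
print(select_participants(candidates))                   # e.g. {'A': 2, 'B': 3}
```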

Dialogue for Prompting: a Policy-Gradient-Based Discrete Prompt Optimization for Few-shot Learning

  • paper_url: http://arxiv.org/abs/2308.07272
  • repo_url: None
  • paper_authors: Chengzhengxu Li, Xiaoming Liu, Yichen Wang, Duyi Li, Yu Lan, Chao Shen
  • for: Improving few-shot performance on natural language processing (NLP) tasks and addressing the shortcomings of existing discrete and continuous prompt optimization methods.
  • methods: A dialogue alignment strategy (based on GPT-4) generates a readable prompt set, an efficient screening metric with linear complexity selects high-quality prompts, and a policy-gradient reinforcement learning framework matches prompts to inputs.
  • results: In the few-shot setting, DP_2O outperforms the state-of-the-art method by 1.52% accuracy on average across four open-source datasets, and shows good universality, robustness, and generalization across tasks and datasets.
    Abstract Prompt-based pre-trained language models (PLMs) paradigm have succeeded substantially in few-shot natural language processing (NLP) tasks. However, prior discrete prompt optimization methods require expert knowledge to design the base prompt set and identify high-quality prompts, which is costly, inefficient, and subjective. Meanwhile, existing continuous prompt optimization methods improve the performance by learning the ideal prompts through the gradient information of PLMs, whose high computational cost, and low readability and generalizability are often concerning. To address the research gap, we propose a Dialogue-comprised Policy-gradient-based Discrete Prompt Optimization ($DP_2O$) method. We first design a multi-round dialogue alignment strategy for readability prompt set generation based on GPT-4. Furthermore, we propose an efficient prompt screening metric to identify high-quality prompts with linear complexity. Finally, we construct a reinforcement learning (RL) framework based on policy gradients to match the prompts to inputs optimally. By training a policy network with only 0.67% of the PLM parameter size on the tasks in the few-shot setting, $DP_2O$ outperforms the state-of-the-art (SOTA) method by 1.52% in accuracy on average on four open-source datasets. Moreover, subsequent experiments also demonstrate that $DP_2O$ has good universality, robustness, and generalization ability.
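A minimal REINFORCE-style sketch of the prompt-matching step is given below: a small policy network scores a fixed prompt pool for each input and is updated with the log-probability-weighted task reward. The network, the reward placeholder, and the hyperparameters are illustrative assumptions, not the DP_2O implementation.

```python
# Compact REINFORCE-style sketch of matching discrete prompts to inputs: a tiny
# policy network scores a fixed prompt pool, a prompt is sampled, and the policy
# is updated with the log-prob-weighted task reward. Reward is a placeholder.
import torch
import torch.nn as nn

n_prompts, emb_dim = 8, 64
policy = nn.Linear(emb_dim, n_prompts)                   # tiny policy network
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def task_reward(prompt_id, x):
    return torch.rand(())                                # placeholder: real reward = downstream accuracy

for step in range(100):
    x = torch.randn(emb_dim)                             # input embedding (e.g., from a frozen PLM)
    probs = torch.softmax(policy(x), dim=-1)
    dist = torch.distributions.Categorical(probs)
    a = dist.sample()                                    # pick a prompt from the pool
    r = task_reward(a.item(), x)
    loss = -dist.log_prob(a) * r                         # policy gradient (REINFORCE)
    opt.zero_grad(); loss.backward(); opt.step()
```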

EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models

  • paper_url: http://arxiv.org/abs/2308.07269
  • repo_url: https://github.com/zjunlp/easyedit
  • paper_authors: Peng Wang, Ningyu Zhang, Xin Xie, Yunzhi Yao, Bozhong Tian, Mengru Wang, Zekun Xi, Siyuan Cheng, Kangwei Liu, Guozhou Zheng, Huajun Chen
  • for: To provide an easy-to-use knowledge editing framework so that a range of cutting-edge knowledge editing methods can be applied to large language models (LLMs).
  • methods: A unified framework that supports various state-of-the-art knowledge editing approaches and can be readily applied to well-known LLMs such as T5, GPT-J, and LlaMA.
  • results: Experiments on LlaMA-2 show that knowledge editing surpasses traditional fine-tuning in terms of reliability and generalization.
    Abstract Large Language Models (LLMs) usually suffer from knowledge cutoff or fallacy issues, which means they are unaware of unseen events or generate text with incorrect facts owing to the outdated/noisy data. To this end, many knowledge editing approaches for LLMs have emerged -- aiming to subtly inject/edit updated knowledge or adjust undesired behavior while minimizing the impact on unrelated inputs. Nevertheless, due to significant differences among various knowledge editing methods and the variations in task setups, there is no standard implementation framework available for the community, which hinders practitioners from applying knowledge editing to applications. To address these issues, we propose EasyEdit, an easy-to-use knowledge editing framework for LLMs. It supports various cutting-edge knowledge editing approaches and can be readily applied to many well-known LLMs such as T5, GPT-J, LlaMA, etc. Empirically, we report the knowledge editing results on LlaMA-2 with EasyEdit, demonstrating that knowledge editing surpasses traditional fine-tuning in terms of reliability and generalization. We have released the source code on GitHub at https://github.com/zjunlp/EasyEdit, along with Google Colab tutorials and comprehensive documentation for beginners to get started. Besides, we present an online system for real-time knowledge editing, and a demo video at http://knowlm.zjukg.cn/easyedit.mp4.
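The snippet below is a conceptual sketch of what a knowledge-edit request and its evaluation look like, using the reliability and generalization criteria mentioned in the abstract. It deliberately does not use the EasyEdit API; all names are hypothetical, so consult the linked repository for the real interface.

```python
# Conceptual sketch only -- NOT the EasyEdit API (see the linked repo for the real
# interface). It illustrates an edit request and two checks from the abstract:
# reliability (the edited prompt gives the new answer) and generalization
# (paraphrases do too). `model_answer` stands in for querying the edited LLM.
edit_request = {
    "prompt": "The capital of Australia is",
    "target_new": "Canberra",
    "paraphrases": ["Australia's capital city is", "What is the capital of Australia?"],
}

def model_answer(prompt):
    return "Canberra"                                    # placeholder for the edited model

def evaluate_edit(req):
    reliability = model_answer(req["prompt"]) == req["target_new"]
    generalization = sum(model_answer(p) == req["target_new"]
                         for p in req["paraphrases"]) / len(req["paraphrases"])
    return {"reliability": reliability, "generalization": generalization}

print(evaluate_edit(edit_request))                       # {'reliability': True, 'generalization': 1.0}
```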

LCE: An Augmented Combination of Bagging and Boosting in Python

  • paper_url: http://arxiv.org/abs/2308.07250
  • repo_url: https://github.com/localcascadeensemble/lce
  • paper_authors: Kevin Fauvel, Élisa Fromont, Véronique Masson, Philippe Faverdin, Alexandre Termier
  • for: The work presents lcensemble, a high-performing, scalable, and user-friendly Python package for general classification and regression tasks.
  • methods: The package implements the Local Cascade Ensemble (LCE) machine learning method, which combines the strengths of the state-of-the-art Random Forest and XGBoost methods with a complementary diversification approach to obtain a better generalizing predictor.
  • results: lcensemble is compatible with scikit-learn, so it interacts with scikit-learn pipelines and model selection tools, and it performs well on large-scale data.
    Abstract lcensemble is a high-performing, scalable and user-friendly Python package for the general tasks of classification and regression. The package implements Local Cascade Ensemble (LCE), a machine learning method that further enhances the prediction performance of the current state-of-the-art methods Random Forest and XGBoost. LCE combines their strengths and adopts a complementary diversification approach to obtain a better generalizing predictor. The package is compatible with scikit-learn, therefore it can interact with scikit-learn pipelines and model selection tools. It is distributed under the Apache 2.0 license, and its source code is available at https://github.com/LocalCascadeEnsemble/LCE.
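A short usage sketch: since the abstract states the estimators are scikit-learn compatible, they should plug into standard pipelines and model selection. The import path below is assumed from the project README and may differ between versions.

```python
# Usage sketch under the assumption that the package exposes a scikit-learn-style
# LCEClassifier (import path assumed from the project README; check the repo).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from lce import LCEClassifier                            # assumed import path

X, y = load_iris(return_X_y=True)
clf = LCEClassifier(n_jobs=-1, random_state=0)           # scikit-learn style estimator
print(cross_val_score(clf, X, y, cv=3).mean())           # works with sklearn model selection
```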

Can we Agree? On the Rashōmon Effect and the Reliability of Post-Hoc Explainable AI

  • paper_url: http://arxiv.org/abs/2308.07247
  • repo_url: None
  • paper_authors: Clement Poiret, Antoine Grigis, Justin Thomas, Marion Noulhiane
  • for: This study examines the challenge of deriving reliable knowledge with SHAP from models in a Rashomon set.
  • methods: Experiments on 5 public datasets show that explanations gradually converge as the sample size increases. With fewer than 128 samples, explanations are highly variable, limiting reliable knowledge extraction; with more data, agreement between models improves, allowing consensus. Bagging ensembles often show higher agreement.
  • results: The results provide guidance on how much data is needed to trust explanations: variability at low sample sizes suggests that conclusions may be unreliable without validation. Further work is needed with more model types, data domains, and explanation methods, and testing convergence in neural networks and with model-specific explanation methods would be impactful. The approaches explored here point towards principled techniques for eliciting knowledge from ambiguous models.
    Abstract The Rashōmon effect poses challenges for deriving reliable knowledge from machine learning models. This study examined the influence of sample size on explanations from models in a Rashōmon set using SHAP. Experiments on 5 public datasets showed that explanations gradually converged as the sample size increased. Explanations from <128 samples exhibited high variability, limiting reliable knowledge extraction. However, agreement between models improved with more data, allowing for consensus. Bagging ensembles often had higher agreement. The results provide guidance on sufficient data to trust explanations. Variability at low samples suggests that conclusions may be unreliable without validation. Further work is needed with more model types, data domains, and explanation methods. Testing convergence in neural networks and with model-specific explanation methods would be impactful. The approaches explored here point towards principled techniques for eliciting knowledge from ambiguous models.
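The experimental idea can be sketched as follows: fit several models on subsamples of increasing size, compute SHAP global feature importances for each, and track inter-model agreement via rank correlation. The dataset, model choice, and agreement metric below are assumptions, not the study's exact protocol.

```python
# Minimal sketch of the experimental idea (assumptions, not the study's protocol):
# fit several models on subsamples of growing size, compute SHAP global feature
# importances, and measure inter-model agreement via Spearman rank correlation.
import numpy as np
import shap
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

def mean_abs_shap(model, X_bg):
    sv = shap.TreeExplainer(model).shap_values(X_bg)
    if isinstance(sv, list):                             # older shap: one array per class
        sv = sv[1]
    return np.abs(np.asarray(sv)).mean(axis=0).ravel()   # global importance per feature

for n in (64, 128, 512, 2000):
    importances = []
    for seed in range(5):                                # a small "Rashomon set" proxy
        idx = np.random.RandomState(seed).choice(len(X), n, replace=False)
        m = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X[idx], y[idx])
        importances.append(mean_abs_shap(m, X[:200]))
    rho = np.mean([spearmanr(importances[i], importances[j])[0]
                   for i in range(5) for j in range(i + 1, 5)])
    print(n, round(rho, 3))                              # agreement tends to rise with n
```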

A Unifying Generator Loss Function for Generative Adversarial Networks

  • paper_url: http://arxiv.org/abs/2308.07233
  • repo_url: None
  • paper_authors: Justin Veiner, Fady Alajaji, Bahman Gharesifard
  • for: The paper studies an $\alpha$-parametrized generator loss function for a dual-objective generative adversarial network (GAN) that keeps a canonical (classical) discriminator loss such as the one in the original GAN (VanillaGAN) system.
  • methods: A generator loss based on a symmetric class probability estimation type function, $\mathcal{L}_\alpha$, is proposed, and the resulting system is termed the $\mathcal{L}_\alpha$-GAN.
  • results: Under an optimal discriminator, the generator's optimization problem reduces to minimizing a Jensen-$f_\alpha$-divergence, a natural generalization of the Jensen-Shannon divergence, where $f_\alpha$ is a convex function expressed in terms of $\mathcal{L}_\alpha$. The $\mathcal{L}_\alpha$-GAN recovers as special cases VanillaGAN, LSGAN, L$k$GAN, and the recently introduced $({\alpha_D},{\alpha_G})$-GAN with $\alpha_D=1$. Experiments on MNIST, CIFAR-10, and Stacked MNIST illustrate the performance of several instances of the $\mathcal{L}_\alpha$-GAN system.
    Abstract A unifying $\alpha$-parametrized generator loss function is introduced for a dual-objective generative adversarial network (GAN), which uses a canonical (or classical) discriminator loss function such as the one in the original GAN (VanillaGAN) system. The generator loss function is based on a symmetric class probability estimation type function, $\mathcal{L}_\alpha$, and the resulting GAN system is termed $\mathcal{L}_\alpha$-GAN. Under an optimal discriminator, it is shown that the generator's optimization problem consists of minimizing a Jensen-$f_\alpha$-divergence, a natural generalization of the Jensen-Shannon divergence, where $f_\alpha$ is a convex function expressed in terms of the loss function $\mathcal{L}_\alpha$. It is also demonstrated that this $\mathcal{L}_\alpha$-GAN problem recovers as special cases a number of GAN problems in the literature, including VanillaGAN, Least Squares GAN (LSGAN), Least $k$th order GAN (L$k$GAN) and the recently introduced $(\alpha_D,\alpha_G)$-GAN with $\alpha_D=1$. Finally, experimental results are conducted on three datasets, MNIST, CIFAR-10, and Stacked MNIST to illustrate the performance of various examples of the $\mathcal{L}_\alpha$-GAN system.
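For reference, the Jensen-Shannon divergence that the vanilla GAN generator minimizes (up to constants) is written out below; per the abstract, the Jensen-$f_\alpha$-divergence generalizes it, with $f_\alpha$ a convex function built from $\mathcal{L}_\alpha$ (the exact definition is given in the paper, not here).

```latex
% The Jensen-Shannon divergence recovered in the VanillaGAN special case, written
% out for reference; the paper's Jensen-f_alpha divergence is described as its
% generalization, with f_alpha derived from the loss L_alpha.
\[
  \mathrm{JSD}(P \,\|\, Q)
  = \tfrac{1}{2}\, D_{\mathrm{KL}}\!\left(P \,\middle\|\, \tfrac{P+Q}{2}\right)
  + \tfrac{1}{2}\, D_{\mathrm{KL}}\!\left(Q \,\middle\|\, \tfrac{P+Q}{2}\right),
  \qquad
  D_{\mathrm{KL}}(P \,\|\, M) = \int p \,\log\frac{p}{m}\, \mathrm{d}\mu .
\]
```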