cs.LG - 2023-08-04

Enhancing Cell Tracking with a Time-Symmetric Deep Learning Approach

  • paper_url: http://arxiv.org/abs/2308.03887
  • repo_url: None
  • paper_authors: Gergely Szabó, Paolo Bonaiuti, Andrea Ciliberto, András Horváth
  • for: This paper proposes a deep-learning-based method for tracking the movement of cells in video microscopy recordings.
  • methods: The method relies solely on the spatio-temporal neighborhood of cells rather than on consecutive-frame assumptions, so it can be applied across a wide range of biological applications.
  • results: Validated with multiple biologically motivated strategies and compared against state-of-the-art trackers, the method tracks cell movement effectively and can handle large numbers of video frames with heavy artifacts.
    Abstract The accurate tracking of live cells using video microscopy recordings remains a challenging task for popular state-of-the-art image processing based object tracking methods. In recent years, several existing and new applications have attempted to integrate deep-learning based frameworks for this task, but most of them still heavily rely on consecutive frame based tracking embedded in their architecture or other premises that hinder generalized learning. To address this issue, we aimed to develop a new deep-learning based tracking method that relies solely on the assumption that cells can be tracked based on their spatio-temporal neighborhood, without restricting it to consecutive frames. The proposed method has the additional benefit that the motion patterns of the cells can be learned completely by the predictor without any prior assumptions, and it has the potential to handle a large number of video frames with heavy artifacts. The efficacy of the proposed method is demonstrated through multiple biologically motivated validation strategies and compared against several state-of-the-art cell tracking methods.

Learning Optimal Admission Control in Partially Observable Queueing Networks

  • paper_url: http://arxiv.org/abs/2308.02391
  • repo_url: None
  • paper_authors: Jonatha Anselmi, Bruno Gaujal, Louis-Sébastien Rebuffi
  • for: This work develops an efficient reinforcement learning algorithm for computing the optimal admission control policy in a partially observable queueing network.
  • methods: The approach combines Norton's equivalent theorem for closed product-form queueing networks with an efficient reinforcement learning algorithm for MDPs with birth-and-death structure, addressing the underlying POMDP.
  • results: The algorithm optimizes the average holding/rejection cost, and its regret bound grows only sub-linearly in the maximal number of jobs $S$, rather than depending on the diameter of the underlying MDP as in prior analyses.
    Abstract We present an efficient reinforcement learning algorithm that learns the optimal admission control policy in a partially observable queueing network. Specifically, only the arrival and departure times from the network are observable, and optimality refers to the average holding/rejection cost in infinite horizon. While reinforcement learning in Partially Observable Markov Decision Processes (POMDP) is prohibitively expensive in general, we show that our algorithm has a regret that only depends sub-linearly on the maximal number of jobs in the network, $S$. In particular, in contrast with existing regret analyses, our regret bound does not depend on the diameter of the underlying Markov Decision Process (MDP), which in most queueing systems is at least exponential in $S$. The novelty of our approach is to leverage Norton's equivalent theorem for closed product-form queueing networks and an efficient reinforcement learning algorithm for MDPs with the structure of birth-and-death processes.

Scaling Survival Analysis in Healthcare with Federated Survival Forests: A Comparative Study on Heart Failure and Breast Cancer Genomics

  • paper_url: http://arxiv.org/abs/2308.02382
  • repo_url: None
  • paper_authors: Alberto Archetti, Francesca Ieva, Matteo Matteucci
  • for: The paper proposes a federated-learning approach to survival analysis, addressing the incomplete, censored, distributed, and confidential nature of real-world survival data.
  • methods: It extends the Federated Survival Forest algorithm into FedSurF++, investigating several new methods for sampling trees from client forests to improve performance and privacy preservation.
  • results: Experiments show that FedSurF++ achieves performance comparable to existing methods while requiring only a single communication round, with gains in efficiency, robustness, and privacy. Additionally, the paper demonstrates the success of FedSurF++ on two real-world datasets, highlighting its potential for improving the scalability and effectiveness of survival analysis in distributed settings while preserving user privacy.
    Abstract Survival analysis is a fundamental tool in medicine, modeling the time until an event of interest occurs in a population. However, in real-world applications, survival data are often incomplete, censored, distributed, and confidential, especially in healthcare settings where privacy is critical. The scarcity of data can severely limit the scalability of survival models to distributed applications that rely on large data pools. Federated learning is a promising technique that enables machine learning models to be trained on multiple datasets without compromising user privacy, making it particularly well-suited for addressing the challenges of survival data and large-scale survival applications. Despite significant developments in federated learning for classification and regression, many directions remain unexplored in the context of survival analysis. In this work, we propose an extension of the Federated Survival Forest algorithm, called FedSurF++. This federated ensemble method constructs random survival forests in heterogeneous federations. Specifically, we investigate several new tree sampling methods from client forests and compare the results with state-of-the-art survival models based on neural networks. The key advantage of FedSurF++ is its ability to achieve comparable performance to existing methods while requiring only a single communication round to complete. The extensive empirical investigation results in a significant improvement from the algorithmic and privacy preservation perspectives, making the original FedSurF algorithm more efficient, robust, and private. We also present results on two real-world datasets demonstrating the success of FedSurF++ in real-world healthcare studies. Our results underscore the potential of FedSurF++ to improve the scalability and effectiveness of survival analysis in distributed settings while preserving user privacy.

Harnessing the Web and Knowledge Graphs for Automated Impact Investing Scoring

  • paper_url: http://arxiv.org/abs/2308.02622
  • repo_url: None
  • paper_authors: Qingzhi Hu, Daniel Daza, Laurens Swinkels, Kristina Ūsaitė, Robbert-Jan ‘t Hoen, Paul Groth
  • for: This paper aims to automate the process of creating an SDG framework for companies.
  • methods: The proposed system uses a data-driven approach, collecting and filtering a dataset of texts from various web sources and a knowledge graph, and then training classifiers to predict SDG scores for a given company.
  • results: The best performing model achieved a micro average F1 score of 0.89, demonstrating the effectiveness of the proposed solution. Additionally, the system provides explanations in the form of data relevant to the predicted score, facilitating its use by humans.
    Abstract The Sustainable Development Goals (SDGs) were introduced by the United Nations in order to encourage policies and activities that help guarantee human prosperity and sustainability. SDG frameworks produced in the finance industry are designed to provide scores that indicate how well a company aligns with each of the 17 SDGs. This scoring enables a consistent assessment of investments that have the potential of building an inclusive and sustainable economy. As a result of the high quality and reliability required by such frameworks, the process of creating and maintaining them is time-consuming and requires extensive domain expertise. In this work, we describe a data-driven system that seeks to automate the process of creating an SDG framework. First, we propose a novel method for collecting and filtering a dataset of texts from different web sources and a knowledge graph relevant to a set of companies. We then implement and deploy classifiers trained with this data for predicting scores of alignment with SDGs for a given company. Our results indicate that our best performing model can accurately predict SDG scores with a micro average F1 score of 0.89, demonstrating the effectiveness of the proposed solution. We further describe how the integration of the models for its use by humans can be facilitated by providing explanations in the form of data relevant to a predicted score. We find that our proposed solution enables access to a large amount of information that analysts would normally not be able to process, resulting in an accurate prediction of SDG scores at a fraction of the cost.

A Machine Learning Method for Predicting Traffic Signal Timing from Probe Vehicle Data

  • paper_url: http://arxiv.org/abs/2308.02370
  • repo_url: None
  • paper_authors: Juliette Ugirumurera, Joseph Severino, Erik A. Bensen, Qichao Wang, Jane Macfarlane
  • for: This paper aims to estimate traffic signal timing information from vehicle probe data using machine learning techniques.
  • methods: The authors use Extreme Gradient Boosting (XGBoost) model to estimate signal cycle lengths and a neural network model to determine the corresponding red times per phase from probe data.
  • results: The authors achieve an error of less than 0.56 seconds for cycle length predictions, and red time predictions within 7.2 seconds of error on average.
    Abstract Traffic signals play an important role in transportation by enabling traffic flow management, and ensuring safety at intersections. In addition, knowing the traffic signal phase and timing data can allow optimal vehicle routing for time and energy efficiency, eco-driving, and the accurate simulation of signalized road networks. In this paper, we present a machine learning (ML) method for estimating traffic signal timing information from vehicle probe data. To the authors' best knowledge, very few works have presented ML techniques for determining traffic signal timing parameters from vehicle probe data. In this work, we develop an Extreme Gradient Boosting (XGBoost) model to estimate signal cycle lengths and a neural network model to determine the corresponding red times per phase from probe data. The green times are then derived from the cycle length and red times. Our results show an error of less than 0.56 sec for cycle length, and red times predictions within 7.2 sec error on average.
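To make the estimation pipeline concrete, below is a minimal sketch (not the authors' code) of fitting an XGBoost regressor to predict cycle length from probe-derived features; the feature set and data here are hypothetical placeholders.

```python
# Minimal sketch (not the authors' code): estimating signal cycle length with
# XGBoost from probe-vehicle features. Feature names and data are hypothetical.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Hypothetical aggregated probe features per intersection approach:
# mean stop duration, stop rate, headway statistics, time-of-day encoding, ...
X = rng.normal(size=(n, 6))
cycle_length = 60 + 30 * rng.random(n)          # synthetic target, in seconds

X_tr, X_te, y_tr, y_te = train_test_split(X, cycle_length, random_state=0)
model = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
model.fit(X_tr, y_tr)
mae = np.mean(np.abs(model.predict(X_te) - y_te))
print(f"cycle-length MAE: {mae:.2f} s")
```

Per the abstract, the corresponding green time per phase would then be obtained as the estimated cycle length minus the predicted red time.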

Color Image Recovery Using Generalized Matrix Completion over Higher-Order Finite Dimensional Algebra

  • paper_url: http://arxiv.org/abs/2308.02621
  • repo_url: None
  • paper_authors: Liang Liao, Zhuang Guo, Qi Gao, Yan Wang, Fajun Yu, Qifeng Zhao, Stephen John Maybank
  • for: To improve the accuracy of color image completion when entries are missing.
  • methods: A recovery method based on generalized higher-order scalars: the traditional second-order matrix model is extended to a more comprehensive higher-order "t-matrix" model, which uses a pixel-neighborhood expansion strategy to capture local pixel constraints, and commonly used matrix and tensor completion algorithms are then extended to their higher-order versions.
  • results: Extensive experiments on simulated data and publicly available images, compared against conventional matrix and tensor completion algorithms, show that the generalized matrix completion model and the corresponding algorithms compare favorably with their lower-order tensor and conventional matrix counterparts in accuracy and efficiency.
    Abstract To improve the accuracy of color image completion with missing entries, we present a recovery method based on generalized higher-order scalars. We extend the traditional second-order matrix model to a more comprehensive higher-order matrix equivalent, called the "t-matrix" model, which incorporates a pixel neighborhood expansion strategy to characterize the local pixel constraints. This "t-matrix" model is then used to extend some commonly used matrix and tensor completion algorithms to their higher-order versions. We perform extensive experiments on various algorithms using simulated data and publicly available images and compare their performance. The results show that our generalized matrix completion model and the corresponding algorithm compare favorably with their lower-order tensor and conventional matrix counterparts.

Intensity-free Integral-based Learning of Marked Temporal Point Processes

  • paper_url: http://arxiv.org/abs/2308.02360
  • repo_url: https://github.com/stepinsilence/ifib
  • paper_authors: Sishun Liu, Ke Deng, Xiuzhen Zhang, Yongli Ren
  • for: This work aims to model, with high fidelity, the conditional joint PDF $p^*(m,t)$ for discrete events whose marks are categorical or numeric in a multi-dimensional continuous space.
  • methods: It proposes IFIB (intensity-free integral-based process), which models the conditional joint PDF $p^*(m,t)$ directly, without defining an intensity function, using a simple architecture.
  • results: Experiments on real-world and synthetic datasets show that IFIB performs better than existing approaches and captures event characteristics more faithfully; the code is available on GitHub.
    Abstract In the marked temporal point processes (MTPP), a core problem is to parameterize the conditional joint PDF (probability distribution function) $p^*(m,t)$ for inter-event time $t$ and mark $m$, conditioned on the history. The majority of existing studies predefine intensity functions. Their utility is challenged by specifying the intensity function's proper form, which is critical to balance expressiveness and processing efficiency. Recently, there are studies moving away from predefining the intensity function -- one models $p^*(t)$ and $p^*(m)$ separately, while the other focuses on temporal point processes (TPPs), which do not consider marks. This study aims to develop high-fidelity $p^*(m,t)$ for discrete events where the event marks are either categorical or numeric in a multi-dimensional continuous space. We propose a solution framework IFIB (\underline{I}ntensity-\underline{f}ree \underline{I}ntegral-\underline{b}ased process) that models conditional joint PDF $p^*(m,t)$ directly without intensity functions. It remarkably simplifies the process to compel the essential mathematical restrictions. We show the desired properties of IFIB and the superior experimental results of IFIB on real-world and synthetic datasets. The code is available at \url{https://github.com/StepinSilence/IFIB}.

ChatGPT for GTFS: From Words to Information

  • paper_url: http://arxiv.org/abs/2308.02618
  • repo_url: https://github.com/utel-uiuc/gtfs_llm
  • paper_authors: Saipraneeth Devunuri, Shirin Qiam, Lewis Lehe
  • for: This research aims to determine if current large language models (LLMs) can retrieve information from the General Transit Feed Specification (GTFS) using natural language instructions.
  • methods: The research uses ChatGPT (GPT-3.5) to test its understanding of the GTFS specification and to perform information extraction from a filtered GTFS feed with 4 routes. The research compares zero-shot and program synthesis for information retrieval.
  • results: GPT-3.5 answers 77% of multiple-choice questions correctly, and program synthesis achieves ~90% accuracy on simple questions and ~40% accuracy on complex questions for information retrieval.
    Abstract The General Transit Feed Specification (GTFS) standard for publishing transit data is ubiquitous. GTFS being tabular data, with information spread across different files, necessitates specialized tools or packages to retrieve information. Concurrently, the use of Large Language Models for text and information retrieval is growing. The idea of this research is to see if the current widely adopted LLMs (ChatGPT) are able to retrieve information from GTFS using natural language instructions. We first test whether ChatGPT (GPT-3.5) understands the GTFS specification. GPT-3.5 answers 77% of our multiple-choice questions (MCQ) correctly. Next, we task the LLM with information extractions from a filtered GTFS feed with 4 routes. For information retrieval, we compare zero-shot and program synthesis. Program synthesis works better, achieving ~90% accuracy on simple questions and ~40% accuracy on complex questions.
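For context on the program-synthesis variant: instead of answering directly, the LLM is asked to emit code over the standard GTFS text files (e.g. trips.txt, routes.txt). The snippet below is an illustrative example of the kind of pandas program one would expect for the hypothetical question "how many trips serve each route?"; it is not taken from the paper.

```python
# Illustrative example of the program-synthesis route (not the paper's prompts):
# the LLM emits pandas code over standard GTFS files such as trips.txt and
# routes.txt to answer the hypothetical question "how many trips serve each route?".
import pandas as pd

trips = pd.read_csv("gtfs/trips.txt")      # columns include route_id, trip_id
routes = pd.read_csv("gtfs/routes.txt")    # columns include route_id, route_short_name

trips_per_route = (
    trips.groupby("route_id")["trip_id"].nunique()
         .rename("n_trips")
         .reset_index()
         .merge(routes[["route_id", "route_short_name"]], on="route_id")
)
print(trips_per_route.sort_values("n_trips", ascending=False).head())
```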

Multi-attacks: Many images $+$ the same adversarial attack $\to$ many target labels

  • paper_url: http://arxiv.org/abs/2308.03792
  • repo_url: https://github.com/stanislavfort/multi-attacks
  • paper_authors: Stanislav Fort
  • for: This work introduces multi-attacks: a single adversarial perturbation that simultaneously changes the predicted classes of many images.
  • methods: One perturbation is designed to move multiple images to desired target classes at once, with control over the perturbation's scale (intensity).
  • results: A single perturbation can re-label up to hundreds of images toward chosen target classes; the resulting class can depend on the attack's intensity, and ensembling reduces susceptibility to multi-attacks.
    Abstract We show that we can easily design a single adversarial perturbation $P$ that changes the class of $n$ images $X_1,X_2,\dots,X_n$ from their original, unperturbed classes $c_1, c_2,\dots,c_n$ to desired (not necessarily all the same) classes $c^*_1,c^*_2,\dots,c^*_n$ for up to hundreds of images and target classes at once. We call these \textit{multi-attacks}. Characterizing the maximum $n$ we can achieve under different conditions such as image resolution, we estimate the number of regions of high class confidence around a particular image in the space of pixels to be around $10^{\mathcal{O}(100)}$, posing a significant problem for exhaustive defense strategies. We show several immediate consequences of this: adversarial attacks that change the resulting class based on their intensity, and scale-independent adversarial examples. To demonstrate the redundancy and richness of class decision boundaries in the pixel space, we look for its two-dimensional sections that trace images and spell words using particular classes. We also show that ensembling reduces susceptibility to multi-attacks, and that classifiers trained on random labels are more susceptible. Our code is available on GitHub.
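A minimal sketch of the multi-attack idea as stated in the abstract: one perturbation $P$, shared across $n$ images, is optimized so that each image is pushed toward its own target class. The model, images, step count, and norm bound below are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of a "multi-attack" (one shared perturbation, per-image targets),
# assuming a pretrained classifier and placeholder images/targets.
import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)                  # only the perturbation is optimized

images = torch.rand(8, 3, 224, 224)          # placeholder batch of n=8 images
targets = torch.randint(0, 1000, (8,))       # desired (not necessarily equal) classes

P = torch.zeros(1, 3, 224, 224, requires_grad=True)   # one perturbation for all images
opt = torch.optim.Adam([P], lr=1e-2)

for step in range(200):
    logits = model((images + P).clamp(0, 1))
    loss = F.cross_entropy(logits, targets)  # push every image toward its own target
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        P.clamp_(-8 / 255, 8 / 255)          # keep the attack small in L-infinity norm
```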

Adapting to Change: Robust Counterfactual Explanations in Dynamic Data Landscapes

  • paper_url: http://arxiv.org/abs/2308.02353
  • repo_url: https://github.com/bardhprenkaj/hansel
  • paper_authors: Bardh Prenkaj, Mario Villaizan-Vallelado, Tobias Leemann, Gjergji Kasneci
  • for: This paper proposes a novel semi-supervised method for counterfactual explanation, called Dynamic GRAph Counterfactual Explainer (DyGRACE), which can be used to identify counterfactuals in graph-structured data.
  • methods: DyGRACE uses two graph autoencoders (GAEs) to learn the representation of each class in a binary classification scenario, and optimizes a parametric density function (implemented as a logistic regression function) to identify counterfactuals by maximizing the factual autoencoder’s reconstruction error.
  • results: The paper shows that DyGRACE is effective in identifying counterfactuals and can act as a drift detector, identifying distributional drift based on differences in reconstruction errors between iterations. It also avoids reliance on the oracle’s predictions in successive iterations, increasing the efficiency of counterfactual discovery.
    Abstract We introduce a novel semi-supervised Graph Counterfactual Explainer (GCE) methodology, Dynamic GRAph Counterfactual Explainer (DyGRACE). It leverages initial knowledge about the data distribution to search for valid counterfactuals while avoiding using information from potentially outdated decision functions in subsequent time steps. Employing two graph autoencoders (GAEs), DyGRACE learns the representation of each class in a binary classification scenario. The GAEs minimise the reconstruction error between the original graph and its learned representation during training. The method involves (i) optimising a parametric density function (implemented as a logistic regression function) to identify counterfactuals by maximising the factual autoencoder's reconstruction error, (ii) minimising the counterfactual autoencoder's error, and (iii) maximising the similarity between the factual and counterfactual graphs. This semi-supervised approach is independent of an underlying black-box oracle. A logistic regression model is trained on a set of graph pairs to learn weights that aid in finding counterfactuals. At inference, for each unseen graph, the logistic regressor identifies the best counterfactual candidate using these learned weights, while the GAEs can be iteratively updated to represent the continual adaptation of the learned graph representation over iterations. DyGRACE is quite effective and can act as a drift detector, identifying distributional drift based on differences in reconstruction errors between iterations. It avoids reliance on the oracle's predictions in successive iterations, thereby increasing the efficiency of counterfactual discovery. DyGRACE, with its capacity for contrastive learning and drift detection, will offer new avenues for semi-supervised learning and explanation generation.

RobustMQ: Benchmarking Robustness of Quantized Models

  • paper_url: http://arxiv.org/abs/2308.02350
  • repo_url: None
  • paper_authors: Yisong Xiao, Aishan Liu, Tianyuan Zhang, Haotong Qin, Jinyang Guo, Xianglong Liu
  • for: To benchmark the robustness and reliability of quantized neural network models deployed on resource-constrained devices.
  • methods: A comprehensive evaluation of quantized models against different noise types (adversarial attacks, natural corruptions, and systematic noises), following established robustness-evaluation principles to provide complete and meaningful findings.
  • results: Quantized models show different resistance to different noise types: they are more robust to adversarial attacks than their floating-point counterparts but more vulnerable to natural corruptions and systematic noises; as the quantization bit-width increases, adversarial robustness decreases while natural and systematic robustness increase.
    Abstract Quantization has emerged as an essential technique for deploying deep neural networks (DNNs) on devices with limited resources. However, quantized models exhibit vulnerabilities when exposed to various noises in real-world applications. Despite the importance of evaluating the impact of quantization on robustness, existing research on this topic is limited and often disregards established principles of robustness evaluation, resulting in incomplete and inconclusive findings. To address this gap, we thoroughly evaluated the robustness of quantized models against various noises (adversarial attacks, natural corruptions, and systematic noises) on ImageNet. The comprehensive evaluation results empirically provide valuable insights into the robustness of quantized models in various scenarios, for example: (1) quantized models exhibit higher adversarial robustness than their floating-point counterparts, but are more vulnerable to natural corruptions and systematic noises; (2) in general, increasing the quantization bit-width results in a decrease in adversarial robustness, an increase in natural robustness, and an increase in systematic robustness; (3) among corruption methods, \textit{impulse noise} and \textit{glass blur} are the most harmful to quantized models, while \textit{brightness} has the least impact; (4) among systematic noises, the \textit{nearest neighbor interpolation} has the highest impact, while bilinear interpolation, cubic interpolation, and area interpolation are the three least harmful. Our research contributes to advancing the robust quantization of models and their deployment in real-world scenarios.

Vehicles Control: Collision Avoidance using Federated Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.02614
  • repo_url: None
  • paper_authors: Badr Ben Elallid, Amine Abouaomar, Nabil Benamar, Abdellatif Kobbane
  • for: With growing urban populations and increasing numbers of vehicles, managing transportation and ensuring safety have become critical, motivating intelligent vehicle control systems.
  • methods: This study applies Federated Deep Reinforcement Learning (FDRL) to optimize vehicle control for collision avoidance.
  • results: The federated FDDPG algorithm controls vehicles and prevents collisions more effectively than DDPG, reducing travel delays and improving average speed.
    Abstract In the face of growing urban populations and the escalating number of vehicles on the roads, managing transportation efficiently and ensuring safety have become critical challenges. To tackle these issues, the development of intelligent control systems for vehicles is paramount. This paper presents a comprehensive study on vehicle control for collision avoidance, leveraging the power of Federated Deep Reinforcement Learning (FDRL) techniques. Our main goal is to minimize travel delays and enhance the average speed of vehicles while prioritizing safety and preserving data privacy. To accomplish this, we conducted a comparative analysis between the local model, Deep Deterministic Policy Gradient (DDPG), and the global model, Federated Deep Deterministic Policy Gradient (FDDPG), to determine their effectiveness in optimizing vehicle control for collision avoidance. The results obtained indicate that the FDDPG algorithm outperforms DDPG in terms of effectively controlling vehicles and preventing collisions. Significantly, the FDDPG-based algorithm demonstrates substantial reductions in travel delays and notable improvements in average speed compared to the DDPG algorithm.
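A conceptual sketch of the federated step one would expect in an FDDPG-style scheme (not the paper's exact protocol): each vehicle performs local DDPG updates, and a server averages the actor parameters to form the global policy.

```python
# Conceptual sketch of the aggregation behind an FDDPG-style scheme (not the
# paper's exact protocol): local DDPG actors are averaged into a global actor.
import copy
import torch
import torch.nn as nn

def make_actor(obs_dim: int = 8, act_dim: int = 2) -> nn.Module:
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                         nn.Linear(64, act_dim), nn.Tanh())

def federated_average(actors: list[nn.Module]) -> nn.Module:
    global_actor = copy.deepcopy(actors[0])
    avg_state = {k: torch.stack([a.state_dict()[k] for a in actors]).mean(dim=0)
                 for k in global_actor.state_dict()}
    global_actor.load_state_dict(avg_state)
    return global_actor

local_actors = [make_actor() for _ in range(5)]   # e.g., 5 vehicles after local DDPG updates
global_actor = federated_average(local_actors)
```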

Recurrent Neural Networks with more flexible memory: better predictions than rough volatility

  • paper_url: http://arxiv.org/abs/2308.08550
  • repo_url: None
  • paper_authors: Damien Challet, Vincent Ragel
  • for: This paper aims to improve the prediction of processes with long memory, such as asset price volatility.
  • methods: It extends LSTM networks so that each dimension of their output has several flexible timescales.
  • results: Extended LSTMs need about half as many training epochs as vanilla LSTMs, with much smaller variation of validation and test losses across models sharing the same hyperparameters; the model with the smallest validation loss systematically outperforms rough volatility predictions by about 20%.
    Abstract We extend recurrent neural networks to include several flexible timescales for each dimension of their output, which mechanically improves their abilities to account for processes with long memory or with highly disparate time scales. We compare the ability of vanilla and extended long short term memory networks (LSTMs) to predict asset price volatility, known to have a long memory. Generally, the number of epochs needed to train extended LSTMs is divided by two, while the variation of validation and test losses among models with the same hyperparameters is much smaller. We also show that the model with the smallest validation loss systematically outperforms rough volatility predictions by about 20% when trained and tested on a dataset with multiple time series.

Stability and Generalization of Hypergraph Collaborative Networks

  • paper_url: http://arxiv.org/abs/2308.02347
  • repo_url: None
  • paper_authors: Michael Ng, Hanrui Wu, Andy Yip
  • for: This paper establishes stability and generalization guarantees for the core layer of Hypergraph Collaborative Networks (HCNNs) used in semi-supervised learning tasks.
  • methods: It analyzes the algorithmic stability of the collaborative network's core layer, shedding light on how the data and hypergraph filters should be scaled to achieve uniform stability of the learning process.
  • results: Experiments on real-world datasets illustrate the theory, showing that HCNNs achieve good stability and generalization in semi-supervised learning tasks.
    Abstract Graph neural networks have been shown to be very effective in utilizing pairwise relationships across samples. Recently, there have been several successful proposals to generalize graph neural networks to hypergraph neural networks to exploit more complex relationships. In particular, the hypergraph collaborative networks yield superior results compared to other hypergraph neural networks for various semi-supervised learning tasks. The collaborative network can provide high quality vertex embeddings and hyperedge embeddings together by formulating them as a joint optimization problem and by using their consistency in reconstructing the given hypergraph. In this paper, we aim to establish the algorithmic stability of the core layer of the collaborative network and provide generalization guarantees. The analysis sheds light on the design of hypergraph filters in collaborative networks, for instance, how the data and hypergraph filters should be scaled to achieve uniform stability of the learning process. Some experimental results on real-world datasets are presented to illustrate the theory.

Learning Networks from Gaussian Graphical Models and Gaussian Free Fields

  • paper_url: http://arxiv.org/abs/2308.02344
  • repo_url: None
  • paper_authors: Subhro Ghosh, Soumendu Sundar Mukherjee, Hoang-Son Tran, Ujan Gangopadhyay
  • for: The paper addresses estimating the structure of a weighted network from repeated measurements of a Gaussian Graphical Model (GGM) on the network.
  • methods: It proposes a novel estimator of the weighted network (equivalently, its Laplacian) based on the Fourier-analytic properties of the Gaussian Free Field (GFF), built from complex-valued statistics of the observed data.
  • results: Concrete recovery guarantees and sample-complexity bounds show that the estimator achieves the parametric rate for fixed network size; for Erdos-Renyi random graphs $G(d,p)$ above the connectivity threshold, the network is recovered with high probability as soon as the sample size $n$ satisfies $n \gg d^4 \log d \cdot p^{-2}$.
    Abstract We investigate the problem of estimating the structure of a weighted network from repeated measurements of a Gaussian Graphical Model (GGM) on the network. In this vein, we consider GGMs whose covariance structures align with the geometry of the weighted network on which they are based. Such GGMs have been of longstanding interest in statistical physics, and are referred to as the Gaussian Free Field (GFF). In recent years, they have attracted considerable interest in the machine learning and theoretical computer science. In this work, we propose a novel estimator for the weighted network (equivalently, its Laplacian) from repeated measurements of a GFF on the network, based on the Fourier analytic properties of the Gaussian distribution. In this pursuit, our approach exploits complex-valued statistics constructed from observed data, that are of interest on their own right. We demonstrate the effectiveness of our estimator with concrete recovery guarantees and bounds on the required sample complexity. In particular, we show that the proposed statistic achieves the parametric rate of estimation for fixed network size. In the setting of networks growing with sample size, our results show that for Erdos-Renyi random graphs $G(d,p)$ above the connectivity threshold, we demonstrate that network recovery takes place with high probability as soon as the sample size $n$ satisfies $n \gg d^4 \log d \cdot p^{-2}$.
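As background for the estimation target (standard GFF facts, not the paper's new statistic): the GFF on a weighted graph with Laplacian $L = D - W$, restricted to the mean-zero subspace since $L$ is singular, is the centered Gaussian vector with density proportional to $\exp(-\tfrac{1}{2} x^{\top} L x)$, so its covariance is the pseudo-inverse $L^{+}$ and recovering the network amounts to estimating $L$ from repeated samples:

$$X \sim \mathcal{N}\bigl(0,\; L^{+}\bigr), \qquad L = D - W .$$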

RAHNet: Retrieval Augmented Hybrid Network for Long-tailed Graph Classification

  • paper_url: http://arxiv.org/abs/2308.02335
  • repo_url: None
  • paper_authors: Zhengyang Mao, Wei Ju, Yifang Qin, Xiao Luo, Ming Zhang
  • for: To improve performance on long-tailed graph classification tasks.
  • methods: Proposes the Retrieval Augmented Hybrid Network (RAHNet), which jointly learns a robust feature extractor and an unbiased classifier in a decoupled manner; during feature-extractor training, a graph retrieval module enriches intra-class diversity for the tail classes, and a category-centered supervised contrastive loss yields discriminative representations.
  • results: Experiments on several popular benchmarks show clear advantages of the proposed method over state-of-the-art approaches.
    Abstract Graph classification is a crucial task in many real-world multimedia applications, where graphs can represent various multimedia data types such as images, videos, and social networks. Previous efforts have applied graph neural networks (GNNs) in balanced situations where the class distribution is balanced. However, real-world data typically exhibit long-tailed class distributions, resulting in a bias towards the head classes when using GNNs and limited generalization ability over the tail classes. Recent approaches mainly focus on re-balancing different classes during model training, which fails to explicitly introduce new knowledge and sacrifices the performance of the head classes. To address these drawbacks, we propose a novel framework called Retrieval Augmented Hybrid Network (RAHNet) to jointly learn a robust feature extractor and an unbiased classifier in a decoupled manner. In the feature extractor training stage, we develop a graph retrieval module to search for relevant graphs that directly enrich the intra-class diversity for the tail classes. Moreover, we innovatively optimize a category-centered supervised contrastive loss to obtain discriminative representations, which is more suitable for long-tailed scenarios. In the classifier fine-tuning stage, we balance the classifier weights with two weight regularization techniques, i.e., Max-norm and weight decay. Experiments on various popular benchmarks verify the superiority of the proposed method against state-of-the-art approaches.

Interoperable synthetic health data with SyntHIR to enable the development of CDSS tools

  • paper_url: http://arxiv.org/abs/2308.02613
  • repo_url: https://github.com/potter-coder89/synthir
  • paper_authors: Pavitra Chauhan, Mohsen Gamal Saad Askar, Bjørn Fjukstad, Lars Ailo Bongo, Edvard Pedersen
  • for: The paper proposes SyntHIR, an architecture for generating and using synthetic EHR data so that clinical decision support system (CDSS) tools can be developed and tested before integration into clinical workflows.
  • methods: SyntHIR uses the Fast Healthcare Interoperability Resources (FHIR) standard for data interoperability, the Gretel framework for generating synthetic data, the Microsoft Azure FHIR server as the FHIR-based EHR system, and the SMART on FHIR framework for tool transportability.
  • results: The authors developed a machine-learning-based CDSS tool using data from the Norwegian Patient Register (NPR) and Norwegian Patient Prescriptions (NorPD), built it on SyntHIR, and then lifted it to the Open DIPS environment. SyntHIR thus provides a generic architecture and testing environment for CDSS tool development with synthetic FHIR data, although the quality of the generated synthetic data can still be improved. The code is available at https://github.com/potter-coder89/SyntHIR.git.
    Abstract There is a great opportunity to use high-quality patient journals and health registers to develop machine learning-based Clinical Decision Support Systems (CDSS). To implement a CDSS tool in a clinical workflow, there is a need to integrate, validate and test this tool on the Electronic Health Record (EHR) systems used to store and manage patient data. However, it is often not possible to get the necessary access to an EHR system due to legal compliance. We propose an architecture for generating and using synthetic EHR data for CDSS tool development. The architecture is implemented in a system called SyntHIR. The SyntHIR system uses the Fast Healthcare Interoperability Resources (FHIR) standards for data interoperability, the Gretel framework for generating synthetic data, the Microsoft Azure FHIR server as the FHIR-based EHR system and SMART on FHIR framework for tool transportability. We demonstrate the usefulness of SyntHIR by developing a machine learning-based CDSS tool using data from the Norwegian Patient Register (NPR) and Norwegian Patient Prescriptions (NorPD). We demonstrate the development of the tool on the SyntHIR system and then lift it to the Open DIPS environment. In conclusion, SyntHIR provides a generic architecture for CDSS tool development using synthetic FHIR data and a testing environment before implementing it in a clinical setting. However, there is scope for improvement in terms of the quality of the synthetic data generated. The code is open source and available at https://github.com/potter-coder89/SyntHIR.git.

Deep learning for spike detection in deep brain stimulation surgery

  • paper_url: http://arxiv.org/abs/2308.05755
  • repo_url: None
  • paper_authors: Arkadiusz Nowacki, Ewelina Kołpa, Mateusz Szychiewicz, Konrad Ciecierski
  • for: This paper presents a deep-learning method for analyzing recordings of neuronal activity acquired during deep brain stimulation (DBS) neurosurgery, a procedure used to treat conditions such as Parkinson's disease.
  • methods: A convolutional neural network (CNN) classifies, for each time window, whether neuronal activity (a spike) is present.
  • results: The classifier reaches a maximum accuracy of 98.98% and an area under the receiver operating characteristic curve (AUC) of 0.9898.
    Abstract Deep brain stimulation (DBS) is a neurosurgical procedure successfully used to treat conditions such as Parkinson's disease. Electrostimulation, carried out by implanting electrodes into an identified focus in the brain, makes it possible to reduce the symptoms of the disease significantly. In this paper, a method for analyzing recordings of neuronal activity acquired during DBS neurosurgery using deep learning is presented. We tested using a convolutional neural network (CNN) for this purpose. Based on the time window, the classifier assesses whether neuronal activity (spike) is present. The maximum accuracy value for the classifier was 98.98%, and the area under the receiver operating characteristic curve (AUC) was 0.9898. The method made it possible to obtain a classification without using data preprocessing.
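An illustrative 1D-CNN window classifier of the kind described; the architecture, window length, and channel counts below are assumptions, not the paper's network.

```python
# Illustrative 1D-CNN spike/no-spike classifier over a fixed time window
# (architecture and window length are assumptions, not the paper's network).
import torch
import torch.nn as nn

class SpikeDetector(nn.Module):
    def __init__(self, window_len: int = 512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (window_len // 16), 64), nn.ReLU(),
            nn.Linear(64, 1),          # logit for "spike present in this window"
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

windows = torch.randn(4, 1, 512)        # 4 recording windows, 1 channel
probs = torch.sigmoid(SpikeDetector()(windows))
```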

A stochastic optimization approach to train non-linear neural networks with a higher-order variation regularization

  • paper_url: http://arxiv.org/abs/2308.02293
  • repo_url: https://github.com/oknakfm/hovr
  • paper_authors: Akifumi Okuno
  • for: This study addresses overfitting in highly expressive parametric models, such as deep neural networks, by introducing a $(k,q)$th order variation regularization ($(k,q)$-VR).
  • methods: It provides a stochastic optimization algorithm that can efficiently train general models, including deep neural networks, with the $(k,q)$-VR penalty without explicit numerical integration, requiring only stochastic gradient descent and automatic differentiation.
  • results: Numerical experiments show that neural networks trained with the $(k,q)$-VR terms are more "resilient" than those trained with conventional parameter regularization; the algorithm also extends to physics-informed training of neural networks (PINNs).
    Abstract While highly expressive parametric models including deep neural networks have an advantage to model complicated concepts, training such highly non-linear models is known to yield a high risk of notorious overfitting. To address this issue, this study considers a $(k,q)$th order variation regularization ($(k,q)$-VR), which is defined as the $q$th-powered integral of the absolute $k$th order derivative of the parametric models to be trained; penalizing the $(k,q)$-VR is expected to yield a smoother function, which is expected to avoid overfitting. Particularly, $(k,q)$-VR encompasses the conventional (general-order) total variation with $q=1$. While the $(k,q)$-VR terms applied to general parametric models are computationally intractable due to the integration, this study provides a stochastic optimization algorithm, that can efficiently train general models with the $(k,q)$-VR without conducting explicit numerical integration. The proposed approach can be applied to the training of even deep neural networks whose structure is arbitrary, as it can be implemented by only a simple stochastic gradient descent algorithm and automatic differentiation. Our numerical experiments demonstrate that the neural networks trained with the $(k,q)$-VR terms are more ``resilient'' than those with the conventional parameter regularization. The proposed algorithm also can be extended to the physics-informed training of neural networks (PINNs).
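In symbols, the penalty described in the abstract and a Monte Carlo surrogate that a stochastic optimizer can use in place of explicit integration (the uniform sampling over a domain $\Omega$ is an assumption of this sketch, not necessarily the paper's choice):

$$R_{k,q}(f_\theta) \;=\; \int_{\Omega} \bigl| f_\theta^{(k)}(x) \bigr|^{q}\, \mathrm{d}x \;\approx\; \frac{|\Omega|}{B} \sum_{b=1}^{B} \bigl| f_\theta^{(k)}(x_b) \bigr|^{q}, \qquad x_b \sim \mathrm{Unif}(\Omega),$$

with $f_\theta^{(k)}$ obtained by automatic differentiation and training minimizing the empirical risk plus $\lambda R_{k,q}(f_\theta)$; with $q=1$ this reduces to a (general-order) total variation penalty, as noted in the abstract.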

Frustratingly Easy Model Generalization by Dummy Risk Minimization

  • paper_url: http://arxiv.org/abs/2308.02287
  • repo_url: None
  • paper_authors: Juncheng Wang, Jindong Wang, Xixu Hu, Shujun Wang, Xing Xie
  • for: To improve the generalization ability of machine learning models.
  • methods: Dummy Risk Minimization (DuRM): simply enlarge the dimension of the output logits and then optimize with standard gradient descent.
  • results: Across a range of tasks, including conventional classification, semantic segmentation, out-of-distribution generalization, adversarial training, and long-tailed recognition, DuRM consistently improves performance and is compatible with existing generalization techniques.
    Abstract Empirical risk minimization (ERM) is a fundamental machine learning paradigm. However, its generalization ability is limited in various tasks. In this paper, we devise Dummy Risk Minimization (DuRM), a frustratingly easy and general technique to improve the generalization of ERM. DuRM is extremely simple to implement: just enlarging the dimension of the output logits and then optimizing using standard gradient descent. Moreover, we validate the efficacy of DuRM on both theoretical and empirical analysis. Theoretically, we show that DuRM derives greater variance of the gradient, which facilitates model generalization by observing better flat local minima. Empirically, we conduct evaluations of DuRM across different datasets, modalities, and network architectures on diverse tasks, including conventional classification, semantic segmentation, out-of-distribution generalization, adverserial training, and long-tailed recognition. Results demonstrate that DuRM could consistently improve the performance under all tasks with an almost free lunch manner. Furthermore, we show that DuRM is compatible with existing generalization techniques and we discuss possible limitations. We hope that DuRM could trigger new interest in the fundamental research on risk minimization.
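A minimal sketch of DuRM as described in the abstract: append extra ("dummy") dimensions to the output logits and train with the usual loss. The number of dummy logits and the toy model below are illustrative choices, not the paper's settings.

```python
# Minimal sketch of Dummy Risk Minimization: enlarge the output-logit dimension
# with dummy classes and train with ordinary cross-entropy and gradient descent.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, num_dummy = 10, 3               # 3 dummy logits is an illustrative choice

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, num_classes + num_dummy),  # enlarged output dimension
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(64, 1, 28, 28)                # placeholder batch
y = torch.randint(0, num_classes, (64,))      # real labels never hit the dummy slots

logits = model(x)
loss = F.cross_entropy(logits, y)             # standard training objective
loss.backward()
opt.step()

# At inference, predictions are taken over the real classes only.
pred = logits[:, :num_classes].argmax(dim=1)
```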

DIVERSIFY: A General Framework for Time Series Out-of-distribution Detection and Generalization

  • paper_url: http://arxiv.org/abs/2308.02282
  • repo_url: None
  • paper_authors: Wang Lu, Jindong Wang, Xinwei Sun, Yiqiang Chen, Xiangyang Ji, Qiang Yang, Xing Xie
  • for: This paper addresses machine learning on time series data, in particular out-of-distribution (OOD) detection and generalization under distribution shift over time (non-stationarity).
  • methods: It proposes DIVERSIFY, a framework that iteratively obtains the "worst-case" latent distribution scenario via adversarial training and then reduces the gap between these latent distributions; it is implemented by combining existing OOD detection methods based on extracted features or model outputs, with outputs also used directly for classification.
  • results: Experiments show that DIVERSIFY learns more generalized features and significantly outperforms other baselines for both OOD detection and generalization on time series.
    Abstract Time series remains one of the most challenging modalities in machine learning research. The out-of-distribution (OOD) detection and generalization on time series tend to suffer due to its non-stationary property, i.e., the distribution changes over time. The dynamic distributions inside time series pose great challenges to existing algorithms to identify invariant distributions since they mainly focus on the scenario where the domain information is given as prior knowledge. In this paper, we attempt to exploit subdomains within a whole dataset to counteract issues induced by non-stationary for generalized representation learning. We propose DIVERSIFY, a general framework, for OOD detection and generalization on dynamic distributions of time series. DIVERSIFY takes an iterative process: it first obtains the "worst-case" latent distribution scenario via adversarial training, then reduces the gap between these latent distributions. We implement DIVERSIFY via combining existing OOD detection methods according to either extracted features or outputs of models for detection while we also directly utilize outputs for classification. In addition, theoretical insights illustrate that DIVERSIFY is theoretically supported. Extensive experiments are conducted on seven datasets with different OOD settings across gesture recognition, speech commands recognition, wearable stress and affect detection, and sensor-based human activity recognition. Qualitative and quantitative results demonstrate that DIVERSIFY learns more generalized features and significantly outperforms other baselines.

Adaptive Proximal Gradient Method for Convex Optimization

  • paper_url: http://arxiv.org/abs/2308.02261
  • repo_url: None
  • paper_authors: Yura Malitsky, Konstantin Mishchenko
  • for: This work studies two fundamental first-order algorithms in convex optimization, gradient descent (GD) and the proximal gradient method (ProxGD), with the goal of making them fully adaptive by exploiting the local curvature of smooth functions.
  • methods: Adaptive versions of GD and ProxGD are proposed whose step sizes are based on observed gradient differences, adding no computational cost; convergence is proved assuming only local Lipschitzness of the gradient.
  • results: The proposed versions allow even larger step sizes than those initially suggested in [MM20].
    Abstract In this paper, we explore two fundamental first-order algorithms in convex optimization, namely, gradient descent (GD) and proximal gradient method (ProxGD). Our focus is on making these algorithms entirely adaptive by leveraging local curvature information of smooth functions. We propose adaptive versions of GD and ProxGD that are based on observed gradient differences and, thus, have no added computational costs. Moreover, we prove convergence of our methods assuming only local Lipschitzness of the gradient. In addition, the proposed versions allow for even larger stepsizes than those initially suggested in [MM20].
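For reference, a step-size rule of this kind from [MM20], which estimates local curvature from observed gradient differences (the exact rule and constants proposed in this paper may differ):

$$\lambda_k = \min\!\left\{ \sqrt{1+\theta_{k-1}}\,\lambda_{k-1},\; \frac{\lVert x_k - x_{k-1}\rVert}{2\,\lVert \nabla f(x_k) - \nabla f(x_{k-1})\rVert} \right\}, \qquad \theta_k = \frac{\lambda_k}{\lambda_{k-1}}, \qquad x_{k+1} = x_k - \lambda_k \nabla f(x_k).$$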

Finding Tori: Self-supervised Learning for Analyzing Korean Folk Song

  • paper_url: http://arxiv.org/abs/2308.02249
  • repo_url: https://github.com/danbinaerinhan/finding-tori
  • paper_authors: Danbinaerin Han, Rafael Caro Repetto, Dasaem Jeong
  • for: This paper presents a computational analysis of a field-recording dataset of roughly 700 hours of Korean folk songs recorded around the 1980s-90s.
  • methods: Because most songs were sung by non-expert musicians without accompaniment, the authors use self-supervised learning with a convolutional neural network based on pitch contour, and analyze how the model captures tori, a classification system defined by a specific scale, ornamental notes, and an idiomatic melodic contour.
  • results: Experiments show that the approach captures the characteristics of tori better than traditional pitch histograms, allowing the authors to examine how musical discussions from existing scholarship manifest in the actual field recordings of Korean folk songs.
    Abstract In this paper, we introduce a computational analysis of the field recording dataset of approximately 700 hours of Korean folk songs, which were recorded around 1980-90s. Because most of the songs were sung by non-expert musicians without accompaniment, the dataset provides several challenges. To address this challenge, we utilized self-supervised learning with convolutional neural network based on pitch contour, then analyzed how the musical concept of tori, a classification system defined by a specific scale, ornamental notes, and an idiomatic melodic contour, is captured by the model. The experimental result shows that our approach can better capture the characteristics of tori compared to traditional pitch histograms. Using our approaches, we have examined how musical discussions proposed in existing academia manifest in the actual field recordings of Korean folk songs.

Deep neural networks from the perspective of ergodic theory

  • paper_url: http://arxiv.org/abs/2308.03888
  • repo_url: None
  • paper_authors: Fan Zhang
  • for: This paper offers a way of understanding deep neural network design by viewing the network as the time evolution of a dynamical system and applying ergodic theory.
  • methods: Each layer is treated as a temporal instance of the dynamical system, and ergodic-theoretic considerations are tentatively adopted on top of this view.
  • results: Under this perspective, some rules of thumb that might otherwise appear mysterious can be attributed to heuristics.
    Abstract The design of deep neural networks remains somewhat of an art rather than precise science. By tentatively adopting ergodic theory considerations on top of viewing the network as the time evolution of a dynamical system, with each layer corresponding to a temporal instance, we show that some rules of thumb, which might otherwise appear mysterious, can be attributed heuristics.

Self-Normalizing Neural Network, Enabling One Shot Transfer Learning for Modeling EDFA Wavelength Dependent Gain

  • paper_url: http://arxiv.org/abs/2308.02233
  • repo_url: None
  • paper_authors: Agastya Raj, Zehao Wang, Frank Slyne, Tingjun Chen, Dan Kilper, Marco Ruffini
  • for: This paper presents an ML framework for modeling the wavelength-dependent gain of multiple EDFAs.
  • methods: It uses semi-supervised, self-normalizing neural networks, enabling one-shot transfer learning.
  • results: Experiments on 22 EDFAs in the Open Ireland and COSMOS testbeds show high-accuracy transfer learning even across different amplifier types.
    Abstract We present a novel ML framework for modeling the wavelength-dependent gain of multiple EDFAs, based on semi-supervised, self-normalizing neural networks, enabling one-shot transfer learning. Our experiments on 22 EDFAs in Open Ireland and COSMOS testbeds show high-accuracy transfer-learning even when operated across different amplifier types.
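"Self-normalizing neural networks" commonly refers to SELU-based MLPs; below is a hedged sketch of such a regressor for per-channel gain, where the input features and the number of WDM channels are illustrative assumptions rather than the paper's exact setup.

```python
# Hedged sketch of a self-normalizing (SELU-based) MLP for per-channel EDFA gain
# regression; the input features (e.g., per-channel input powers) and the number
# of WDM channels are illustrative assumptions.
import torch
import torch.nn as nn

num_channels = 95            # illustrative number of WDM channels

model = nn.Sequential(
    nn.Linear(num_channels, 256), nn.SELU(), nn.AlphaDropout(0.05),
    nn.Linear(256, 256), nn.SELU(), nn.AlphaDropout(0.05),
    nn.Linear(256, num_channels),         # predicted gain per channel (dB)
)

x = torch.randn(32, num_channels)         # placeholder input power spectra
gain = model(x)
loss = nn.functional.mse_loss(gain, torch.randn(32, num_channels))
```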

Likelihood-ratio-based confidence intervals for neural networks

  • paper_url: http://arxiv.org/abs/2308.02221
  • repo_url: https://github.com/laurenssluyterman/likelihood_ratio_intervals
  • paper_authors: Laurens Sluijterman, Eric Cator, Tom Heskes
  • for: This paper presents a first implementation of a likelihood-ratio-based approach for constructing confidence intervals for neural networks.
  • methods: The method, DeepLR, builds confidence intervals from likelihood ratios; it can construct asymmetric intervals that expand in regions with little data and inherently incorporates factors such as training time, network architecture, and regularization techniques.
  • results: The current implementation is too expensive for many deep-learning applications, but the cost may already be justified in fields such as medical prediction or astrophysics, where a reliable uncertainty estimate for a single prediction is essential; the work highlights the potential of likelihood-ratio-based uncertainty estimates and opens a promising avenue for future research.
    Abstract This paper introduces a first implementation of a novel likelihood-ratio-based approach for constructing confidence intervals for neural networks. Our method, called DeepLR, offers several qualitative advantages: most notably, the ability to construct asymmetric intervals that expand in regions with a limited amount of data, and the inherent incorporation of factors such as the amount of training time, network architecture, and regularization techniques. While acknowledging that the current implementation of the method is prohibitively expensive for many deep-learning applications, the high cost may already be justified in specific fields like medical predictions or astrophysics, where a reliable uncertainty estimate for a single prediction is essential. This work highlights the significant potential of a likelihood-ratio-based uncertainty estimate and establishes a promising avenue for future research.
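For context, the classical likelihood-ratio construction that such intervals invert (standard statistics; the paper's deep-learning-specific adaptation may differ): a candidate value $\mu$ for the prediction at a test input is retained when the constrained fit is not too far below the unconstrained maximum,

$$\mathrm{CI}_{1-\alpha} \;=\; \Bigl\{ \mu \;:\; 2\bigl[\ell(\hat{\theta}) - \ell(\hat{\theta}_{\mu})\bigr] \le \chi^{2}_{1,\,1-\alpha} \Bigr\},$$

where $\hat{\theta}$ is the unconstrained maximum-likelihood fit and $\hat{\theta}_{\mu}$ is the best fit constrained to predict $\mu$ at that input.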

Knowledge-Driven Multi-Agent Reinforcement Learning for Computation Offloading in Cybertwin-Enabled Internet of Vehicles

  • paper_url: http://arxiv.org/abs/2308.02603
  • repo_url: None
  • paper_authors: Ruijin Sun, Xiao Yang, Nan Cheng, Xiucheng Wang, Changle Li
  • for: 降低 cybertwin 赋能车联网(IoV)中车辆任务卸载的时延
  • methods: 提出知识驱动的多智能体强化学习(KMARL)方法为每辆车选择最优卸载选项;采用图神经网络,将图结构通信拓扑与置换不变性等领域知识嵌入网络之中
  • results: 与其他方法相比,KMARL 获得更高的奖励并具有更好的可扩展性,这得益于领域知识的融入
    Abstract By offloading computation-intensive tasks of vehicles to roadside units (RSUs), mobile edge computing (MEC) in the Internet of Vehicles (IoV) can relieve the onboard computation burden. However, existing model-based task offloading methods suffer from heavy computational complexity with the increase of vehicles and data-driven methods lack interpretability. To address these challenges, in this paper, we propose a knowledge-driven multi-agent reinforcement learning (KMARL) approach to reduce the latency of task offloading in cybertwin-enabled IoV. Specifically, in the considered scenario, the cybertwin serves as a communication agent for each vehicle to exchange information and make offloading decisions in the virtual space. To reduce the latency of task offloading, a KMARL approach is proposed to select the optimal offloading option for each vehicle, where graph neural networks are employed by leveraging domain knowledge concerning graph-structure communication topology and permutation invariance into neural networks. Numerical results show that our proposed KMARL yields higher rewards and demonstrates improved scalability compared with other methods, benefitting from the integration of domain knowledge.
    摘要 通过将计算密集的车辆任务卸载到路侧单元(RSU),车联网(IoV)中的移动边缘计算(MEC)可以减轻车载计算负担。然而,现有基于模型的任务卸载方法随车辆数量增加而计算复杂度急剧上升,而数据驱动方法则缺乏可解释性。为解决这些挑战,本文提出一种知识驱动的多智能体强化学习(KMARL)方法,以降低 cybertwin 赋能车联网中的任务卸载时延。在所考虑的场景中,cybertwin 作为每辆车的通信代理,在虚拟空间中交换信息并做出卸载决策。为降低卸载时延,KMARL 方法为每辆车选择最优卸载选项,并采用图神经网络,将图结构通信拓扑和置换不变性等领域知识嵌入到神经网络中。数值结果表明,得益于领域知识的融入,所提出的 KMARL 相比其他方法可获得更高的奖励,并表现出更好的可扩展性。
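
A key point in KMARL is that the policy network should not depend on how neighboring vehicles/RSUs happen to be ordered, which a mean-aggregation message-passing layer gives for free. A minimal sketch follows; the feature sizes and the single-layer update are illustrative, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class MeanMessagePassing(nn.Module):
    """One GNN layer: h_i' = MLP([h_i, mean_{j in N(i)} h_j]).

    Mean aggregation makes the layer equivariant to node permutations.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.update = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, h, adj):
        # h: (N, dim) node features; adj: (N, N) 0/1 adjacency (no self-loops)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neighbor_mean = adj @ h / deg
        return self.update(torch.cat([h, neighbor_mean], dim=-1))

h = torch.randn(5, 16)                        # 5 vehicles/RSUs with 16-d states
adj = (torch.rand(5, 5) > 0.5).float()
adj.fill_diagonal_(0)
layer = MeanMessagePassing(16)
out1 = layer(h, adj)

perm = torch.randperm(5)                      # permuting nodes permutes outputs identically
out2 = layer(h[perm], adj[perm][:, perm])
print(torch.allclose(out1[perm], out2, atol=1e-5))   # True
```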

A Survey of Spanish Clinical Language Models

  • paper_url: http://arxiv.org/abs/2308.02199
  • repo_url: None
  • paper_authors: Guillem García Subies, Álvaro Barbero Jiménez, Paloma Martínez Fernández
  • for: 这篇论文主要针对的是使用语言模型解决西班牙语医疗领域中的任务。
  • methods: 论文回顾了 17 个主要面向临床任务的语料库,并梳理了最相关的西班牙语语言模型和西班牙语临床语言模型。
  • results: 研究在精心挑选的语料子集上对这些模型进行了严格的基准测试,以确定表现最佳的模型;本研究共微调了 3000 多个模型。
    Abstract This survey focuses on encoder Language Models for solving tasks in the clinical domain in the Spanish language. We review the contributions of 17 corpora focused mainly on clinical tasks, then list the most relevant Spanish Language Models and Spanish Clinical Language models. We perform a thorough comparison of these models by benchmarking them over a curated subset of the available corpora, in order to find the best-performing ones; in total more than 3000 models were fine-tuned for this study. All the tested corpora and the best models are made publicly available in an accessible way, so that the results can be reproduced by independent teams or challenged in the future when new Spanish Clinical Language models are created.
    摘要 本综述聚焦于用于解决西班牙语临床领域任务的编码器语言模型。我们回顾了 17 个主要面向临床任务的语料库,随后梳理了最相关的西班牙语语言模型与西班牙语临床语言模型。我们在精心挑选的语料子集上对这些模型进行了严格的基准比较,以找出表现最佳的模型;本研究共微调了 3000 多个模型。所有测试语料和最佳模型均以便于获取的方式公开,以便独立团队复现结果,或在未来出现新的西班牙语临床语言模型时对结果提出挑战。

AutoML4ETC: Automated Neural Architecture Search for Real-World Encrypted Traffic Classification

  • paper_url: http://arxiv.org/abs/2308.02182
  • repo_url: https://github.com/orangeuw/automl4etc
  • paper_authors: Navid Malekghaini, Elham Akbari, Mohammad A. Salahuddin, Noura Limam, Raouf Boutaba, Bertrand Mathieu, Stephanie Moteau, Stephane Tuffin
  • for: 这个论文旨在提出一种自动化神经网络设计方法,以提高加密网络流量分类器的性能。
  • methods: 该论文使用了自动化机器学习(AutoML)技术,定义了一个特定适用于加密网络流量分类的搜索空间,并使用不同的搜索策略来生成高性能的神经网络 architecture。
  • results: 实验结果表明,AutoML4ETC 生成的加密流量分类器在多个数据集(包括公开基准数据集以及来自 Orange 移动网络的真实 TLS 和 QUIC 流量)上的准确率超过了现有最先进的分类器;同时,这些模型的参数量显著更少,更加轻量。
    Abstract Deep learning (DL) has been successfully applied to encrypted network traffic classification in experimental settings. However, in production use, it has been shown that a DL classifier's performance inevitably decays over time. Re-training the model on newer datasets has been shown to only partially improve its performance. Manually re-tuning the model architecture to meet the performance expectations on newer datasets is time-consuming and requires domain expertise. We propose AutoML4ETC, a novel tool to automatically design efficient and high-performing neural architectures for encrypted traffic classification. We define a novel, powerful search space tailored specifically for the near real-time classification of encrypted traffic using packet header bytes. We show that with different search strategies over our search space, AutoML4ETC generates neural architectures that outperform the state-of-the-art encrypted traffic classifiers on several datasets, including public benchmark datasets and real-world TLS and QUIC traffic collected from the Orange mobile network. In addition to being more accurate, AutoML4ETC's architectures are significantly more efficient and lighter in terms of the number of parameters. Finally, we make AutoML4ETC publicly available for future research.
    摘要 深度学习(DL)已在实验环境中成功应用于加密网络流量分类。然而在生产环境中,DL 分类器的性能会随时间不可避免地衰退,而在更新的数据集上重新训练模型也只能部分恢复其性能。人工重新调整模型架构以满足新数据集上的性能要求既耗时又需要领域专业知识。我们提出 AutoML4ETC,一种能自动设计高效、高性能神经网络架构用于加密流量分类的新工具。我们定义了一个专门面向基于数据包头字节的近实时加密流量分类的强大搜索空间。实验表明,在该搜索空间上采用不同的搜索策略,AutoML4ETC 生成的神经网络架构在多个数据集(包括公开基准数据集以及从 Orange 移动网络采集的真实 TLS 与 QUIC 流量)上均优于现有最先进的加密流量分类器。除了更准确之外,AutoML4ETC 的架构在参数量上也显著更少、更轻量。最后,我们将 AutoML4ETC 公开,供后续研究使用。

Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology

  • paper_url: http://arxiv.org/abs/2308.02180
  • repo_url: None
  • paper_authors: Cliff Wong, Sheng Zhang, Yu Gu, Christine Moung, Jacob Abel, Naoto Usuyama, Roshanthi Weerasinghe, Brian Piening, Tristan Naumann, Carlo Bifulco, Hoifung Poon
  • for: 这篇论文的目的是研究如何使用大语言模型(LLM)扩大临床试验匹配,以提高健康交付和发现的效率。
  • methods: 该论文基于一个正在美国一家大型医疗网络中试部署的临床试验匹配系统,研究了以 GPT-4 为代表的前沿大语言模型;这些模型开箱即可结构化复杂的临床试验入组标准,并抽取复杂的匹配逻辑(例如,嵌套的 AND/OR/NOT)。
  • results: 研究显示,LLM 明显优于先前的强基线,可在人工参与的前提下用于初筛患者-试验候选;但 LLM 仍远非完美,研究还指出了将 LLM 应用于端到端临床试验匹配时有待改进之处,例如上下文长度限制与准确率,尤其是从纵向医疗记录中结构化患者信息。
    Abstract Clinical trial matching is a key process in health delivery and discovery. In practice, it is plagued by overwhelming unstructured data and unscalable manual processing. In this paper, we conduct a systematic study on scaling clinical trial matching using large language models (LLMs), with oncology as the focus area. Our study is grounded in a clinical trial matching system currently in test deployment at a large U.S. health network. Initial findings are promising: out of box, cutting-edge LLMs, such as GPT-4, can already structure elaborate eligibility criteria of clinical trials and extract complex matching logic (e.g., nested AND/OR/NOT). While still far from perfect, LLMs substantially outperform prior strong baselines and may serve as a preliminary solution to help triage patient-trial candidates with humans in the loop. Our study also reveals a few significant growth areas for applying LLMs to end-to-end clinical trial matching, such as context limitation and accuracy, especially in structuring patient information from longitudinal medical records.
    摘要 临床试验匹配是医疗服务与新药发现中的关键环节。在实践中,它受制于海量的非结构化数据和难以扩展的人工处理。本文以肿瘤学为重点,对利用大语言模型(LLM)扩展临床试验匹配进行了系统研究。我们的研究基于一个正在美国一家大型医疗网络中试部署的临床试验匹配系统。初步结果令人鼓舞:GPT-4 等前沿 LLM 开箱即可结构化复杂的入组标准,并抽取复杂的匹配逻辑(例如,嵌套的 AND/OR/NOT)。虽然仍远非完美,但 LLM 已明显超越先前的强基线,可以作为在人工参与下初筛患者-试验候选的初步方案。我们的研究还揭示了将 LLM 应用于端到端临床试验匹配的若干重要改进方向,例如上下文限制与准确率,尤其是从纵向医疗记录中结构化患者信息。
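
Once an LLM has turned free-text eligibility criteria into a nested AND/OR/NOT structure, matching a patient against it is plain boolean evaluation over structured patient facts. Below is a small sketch of such an evaluator; the JSON-like criterion schema, the leaf operators, and the example patient record are hypothetical illustrations, not the schema used by the system described in the paper.

```python
def evaluate(criterion: dict, patient: dict) -> bool:
    """Recursively evaluate a nested AND/OR/NOT eligibility tree."""
    op = criterion["op"]
    if op == "AND":
        return all(evaluate(c, patient) for c in criterion["args"])
    if op == "OR":
        return any(evaluate(c, patient) for c in criterion["args"])
    if op == "NOT":
        return not evaluate(criterion["args"][0], patient)
    if op == "HAS":      # leaf: a coded fact is present in a list-valued field
        return criterion["value"] in patient.get(criterion["field"], [])
    if op == "LE":       # leaf: numeric field <= threshold
        return patient.get(criterion["field"], float("inf")) <= criterion["value"]
    raise ValueError(f"unknown op {op!r}")

# e.g. "breast cancer, HER2-positive, ECOG <= 1, no prior trastuzumab"
criterion = {"op": "AND", "args": [
    {"op": "HAS", "field": "diagnoses", "value": "breast_cancer"},
    {"op": "HAS", "field": "biomarkers", "value": "HER2+"},
    {"op": "LE", "field": "ecog", "value": 1},
    {"op": "NOT", "args": [
        {"op": "HAS", "field": "prior_therapies", "value": "trastuzumab"}]},
]}
patient = {"diagnoses": ["breast_cancer"], "biomarkers": ["HER2+"],
           "prior_therapies": [], "ecog": 1}
print(evaluate(criterion, patient))   # True
```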

High-Accuracy Prediction of Metal-Insulator-Metal Metasurface with Deep Learning

  • paper_url: http://arxiv.org/abs/2308.04450
  • repo_url: None
  • paper_authors: Kaizhu Liu, Hsiang-Chen Chui, Changsen Sun, Xue Han
  • for: 预测电磁软件计算结果
  • methods: 使用 ResNets-10 模型预测等离激元超表面的 S11 参数,采用 k 折交叉验证与小学习率进行两阶段训练
  • results: 对铝、金、银三种金属-绝缘体-金属超表面的预测损失分别为 -48.45、-46.47 和 -35.54;由于误差极低,所提网络可在一定结构范围内取代传统电磁计算方法,且训练过程不到 1,100 个 epoch 即可完成
    Abstract Deep learning prediction of electromagnetic software calculation results has been a widely discussed issue in recent years. But the prediction accuracy was still one of the challenges to be solved. In this work, we proposed that the ResNets-10 model was used for predicting plasmonic metasurface S11 parameters. The two-stage training was performed by the k-fold cross-validation and small learning rate. After the training was completed, the prediction loss for aluminum, gold, and silver metal-insulator-metal metasurfaces was -48.45, -46.47, and -35.54, respectively. Due to the ultralow error value, the proposed network can replace the traditional electromagnetic computing method for calculation within a certain structural range. Besides, this network can finish the training process less than 1,100 epochs. This means that the network training process can effectively lower the design process time. The ResNets-10 model we proposed can also be used to design meta-diffractive devices and biosensors, thereby reducing the time required for the calculation process. The ultralow error of the network indicates that this work contributes to the development of future artificial intelligence electromagnetic computing software.
    摘要 近年来,利用深度学习预测电磁仿真软件的计算结果受到广泛讨论,但预测精度仍是有待解决的难题之一。本工作提出使用 ResNets-10 模型预测等离激元超表面的 S11 参数,并通过 k 折交叉验证与小学习率进行两阶段训练。训练完成后,对铝、金、银三种金属-绝缘体-金属超表面的预测损失分别为 -48.45、-46.47 和 -35.54。由于误差极低,所提网络可在一定的结构范围内取代传统电磁计算方法;此外,网络在不到 1,100 个 epoch 内即可完成训练,有效缩短了设计流程所需的时间。我们提出的 ResNets-10 模型还可用于设计超衍射器件与生物传感器,从而减少计算过程所需时间。网络的极低误差表明,这项工作有助于未来人工智能电磁计算软件的发展。

Diffusion probabilistic models enhance variational autoencoder for crystal structure generative modeling

  • paper_url: http://arxiv.org/abs/2308.02165
  • repo_url: None
  • paper_authors: Teerachote Pakornchote, Natthaphon Choomphon-anomakhun, Sorrjit Arrerut, Chayanon Atthapak, Sakarn Khamkaeo, Thiparat Chotibut, Thiti Bovornratanaraks
  • for: 生成真实的晶体结构,保持晶体对称性
  • methods: 使用新型扩散概率(DP)模型对原子坐标去噪,以取代标准的得分匹配方法
  • results: 能够重建并生成质量上与原始 CDVAE 相当的晶体结构;更重要的是,与密度泛函理论(DFT)弛豫得到的碳结构相比,DP-CDVAE 生成的结构更接近其对应的基态,与真实基态之间的能量差平均比原始 CDVAE 低 68.1 meV/atom,表明 DP-CDVAE 能生成更好地代表基态构型的晶体结构
    Abstract The crystal diffusion variational autoencoder (CDVAE) is a machine learning model that leverages score matching to generate realistic crystal structures that preserve crystal symmetry. In this study, we leverage novel diffusion probabilistic (DP) models to denoise atomic coordinates rather than adopting the standard score matching approach in CDVAE. Our proposed DP-CDVAE model can reconstruct and generate crystal structures whose qualities are statistically comparable to those of the original CDVAE. Furthermore, notably, when comparing the carbon structures generated by the DP-CDVAE model with relaxed structures obtained from density functional theory calculations, we find that the DP-CDVAE generated structures are remarkably closer to their respective ground states. The energy differences between these structures and the true ground states are, on average, 68.1 meV/atom lower than those generated by the original CDVAE. This significant improvement in the energy accuracy highlights the effectiveness of the DP-CDVAE model in generating crystal structures that better represent their ground-state configurations.
    摘要 晶体扩散变分自编码器(CDVAE)是一种利用得分匹配生成保持晶体对称性的真实晶体结构的机器学习模型。在本研究中,我们利用新的扩散概率(DP)模型对原子坐标去噪,而不是采用 CDVAE 中的标准得分匹配方法。我们提出的 DP-CDVAE 模型能够重建并生成质量在统计意义上与原始 CDVAE 相当的晶体结构。更重要的是,将 DP-CDVAE 生成的碳结构与密度泛函理论计算得到的弛豫结构相比较,我们发现 DP-CDVAE 生成的结构明显更接近其对应的基态;这些结构与真实基态之间的能量差平均比原始 CDVAE 低 68.1 meV/atom。这一能量精度上的显著提升,表明 DP-CDVAE 模型能够生成更好地代表基态构型的晶体结构。

Speaker Diarization of Scripted Audiovisual Content

  • paper_url: http://arxiv.org/abs/2308.02160
  • repo_url: None
  • paper_authors: Yogesh Virkar, Brian Thompson, Rohit Paturi, Sundararajan Srinivasan, Marcello Federico
  • for: 提高电视剧和电影外语配音与字幕脚本制作的效率。
  • methods: 利用拍摄过程中使用的制作脚本提取伪标注数据,并提出一种新的半监督说话人日志(diarization)方法。
  • results: 在 66 部剧集的测试集上,相对两个无监督基线模型取得 51.7% 的相对提升。
    Abstract The media localization industry usually requires a verbatim script of the final film or TV production in order to create subtitles or dubbing scripts in a foreign language. In particular, the verbatim script (i.e. as-broadcast script) must be structured into a sequence of dialogue lines each including time codes, speaker name and transcript. Current speech recognition technology alleviates the transcription step. However, state-of-the-art speaker diarization models still fall short on TV shows for two main reasons: (i) their inability to track a large number of speakers, (ii) their low accuracy in detecting frequent speaker changes. To mitigate this problem, we present a novel approach to leverage production scripts used during the shooting process, to extract pseudo-labeled data for the speaker diarization task. We propose a novel semi-supervised approach and demonstrate improvements of 51.7% relative to two unsupervised baseline models on our metrics on a 66 show test set.
    摘要 媒体本地化行业通常需要最终影视成品的逐字脚本(即播出版脚本),以便制作外语字幕或配音脚本。逐字脚本必须组织为一系列对白行,每行包含时间码、说话人姓名和文字内容。现有的语音识别技术已经减轻了转写环节的负担;然而,最先进的说话人日志模型在电视剧上仍然表现不佳,原因有二:(i)难以跟踪大量说话人;(ii)在说话人频繁切换时检测准确率较低。为缓解这一问题,我们提出一种新方法,利用拍摄过程中使用的制作脚本为说话人日志任务提取伪标注数据。我们提出一种新的半监督方法,并在 66 部剧集的测试集上,相对两个无监督基线模型取得 51.7% 的相对提升。

Improved Order Analysis and Design of Exponential Integrator for Diffusion Models Sampling

  • paper_url: http://arxiv.org/abs/2308.02157
  • repo_url: None
  • paper_authors: Qinsheng Zhang, Jiaming Song, Yongxin Chen
  • for: 这篇论文提出了一种改进的指数积分器设计,以提升扩散模型(DM)的采样质量。
  • methods: 对现有基于指数积分器(EI)的高阶采样算法进行阶数分析,指出其退化源于缺失关键的阶数条件,并据此设计满足全部阶数条件的改进求解器(RES),以提升采样质量与稳定性。
  • results: 与现有高阶指数积分器相比,所提方法在理论上具有更优的误差界,并在实践中带来更高的采样效率与稳定性,避免了因步长调度等看似无害的设计选择导致的质量退化。例如,在一个 ImageNet 扩散模型上,当函数求值次数(NFE)为 9 时,将单步 DPM-Solver++ 替换为满足阶数条件的 RES 求解器,可使数值缺陷减少 25.2%,并将 FID 改善 25.4%(16.77 → 12.51)。
    Abstract Efficient differential equation solvers have significantly reduced the sampling time of diffusion models (DMs) while retaining high sampling quality. Among these solvers, exponential integrators (EI) have gained prominence by demonstrating state-of-the-art performance. However, existing high-order EI-based sampling algorithms rely on degenerate EI solvers, resulting in inferior error bounds and reduced accuracy in contrast to the theoretically anticipated results under optimal settings. This situation makes the sampling quality extremely vulnerable to seemingly innocuous design choices such as timestep schedules. For example, an inefficient timestep scheduler might necessitate twice the number of steps to achieve a quality comparable to that obtained through carefully optimized timesteps. To address this issue, we reevaluate the design of high-order differential solvers for DMs. Through a thorough order analysis, we reveal that the degeneration of existing high-order EI solvers can be attributed to the absence of essential order conditions. By reformulating the differential equations in DMs and capitalizing on the theory of exponential integrators, we propose refined EI solvers that fulfill all the order conditions, which we designate as Refined Exponential Solver (RES). Utilizing these improved solvers, RES exhibits more favorable error bounds theoretically and achieves superior sampling efficiency and stability in practical applications. For instance, a simple switch from the single-step DPM-Solver++ to our order-satisfied RES solver when Number of Function Evaluations (NFE) $=9$, results in a reduction of numerical defects by $25.2\%$ and FID improvement of $25.4\%$ (16.77 vs 12.51) on a pre-trained ImageNet diffusion model.
    摘要 高效的微分方程求解器大幅缩短了扩散模型(DM)的采样时间,同时保持了较高的采样质量。其中,指数积分器(EI)凭借最先进的性能而备受关注。然而,现有基于 EI 的高阶采样算法依赖于退化的 EI 求解器,导致误差界劣于最优设置下理论上可达到的结果。这使得采样质量对步长调度等看似无害的设计选择极为敏感:例如,低效的步长调度可能需要两倍的步数才能达到经仔细优化的步长所能取得的质量。为解决这一问题,我们重新审视了面向 DM 的高阶微分求解器设计。通过细致的阶数分析,我们发现现有高阶 EI 求解器的退化源于缺失关键的阶数条件。通过重写 DM 中的微分方程并借助指数积分器理论,我们提出了满足全部阶数条件的改进求解器,称为 Refined Exponential Solver(RES)。RES 在理论上具有更优的误差界,并在实际应用中取得更高的采样效率与稳定性。例如,当函数求值次数(NFE)为 9 时,仅将单步 DPM-Solver++ 替换为满足阶数条件的 RES 求解器,即可在预训练的 ImageNet 扩散模型上使数值缺陷减少 25.2%,并将 FID 改善 25.4%(16.77 → 12.51)。
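
For orientation, a first-order exponential integrator for the diffusion probability-flow ODE handles the linear drift exactly and freezes the learned noise term over the step; with an epsilon-prediction network this reduces to the familiar DDIM-style update. The sketch below shows that single first-order step only (not the higher-order RES scheme itself); `eps_model` and the cosine-style alpha/sigma schedules are placeholders, not the paper's configuration.

```python
import torch

def exp_integrator_step(x_t, t, s, eps_model, alpha, sigma):
    """One first-order exponential-integrator step from time t to s (s < t).

    alpha(t), sigma(t) are the signal/noise coefficients of the forward process
    x_t = alpha(t) * x_0 + sigma(t) * noise; eps_model predicts the noise.
    """
    eps = eps_model(x_t, t)
    x0_hat = (x_t - sigma(t) * eps) / alpha(t)    # predicted clean sample
    return alpha(s) * x0_hat + sigma(s) * eps     # exact in the linear part

# Toy usage with a cosine-style schedule and a dummy noise model.
alpha = lambda t: torch.cos(torch.tensor(t) * torch.pi / 2)
sigma = lambda t: torch.sin(torch.tensor(t) * torch.pi / 2)
eps_model = lambda x, t: torch.zeros_like(x)      # stand-in for a trained network

x = torch.randn(4, 3, 8, 8)
times = [0.9, 0.6, 0.3, 0.0]
for t, s in zip(times[:-1], times[1:]):
    x = exp_integrator_step(x, t, s, eps_model, alpha, sigma)
print(x.shape)
```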

Optimization on Pareto sets: On a theory of multi-objective optimization

  • paper_url: http://arxiv.org/abs/2308.02145
  • repo_url: None
  • paper_authors: Abhishek Roy, Geelon So, Yi-An Ma
  • for: 这篇论文研究多目标优化,其中单个决策向量需要在多个目标函数之间权衡取舍。
  • methods: 论文研究求解这一受约束优化问题的局部方法;该问题的约束集(Pareto 集)是隐式定义的,而且通常非凸、非光滑,即使各目标函数本身是凸且光滑的。
  • results: 论文定义了相应的最优性与驻点概念,并给出一种算法:当目标函数强凸且 Lipschitz 光滑时,其最后一次迭代以 $O(K^{-1/2})$ 的速率收敛到驻点。
    Abstract In multi-objective optimization, a single decision vector must balance the trade-offs between many objectives. Solutions achieving an optimal trade-off are said to be Pareto optimal: these are decision vectors for which improving any one objective must come at a cost to another. But as the set of Pareto optimal vectors can be very large, we further consider a more practically significant Pareto-constrained optimization problem, where the goal is to optimize a preference function constrained to the Pareto set. We investigate local methods for solving this constrained optimization problem, which poses significant challenges because the constraint set is (i) implicitly defined, and (ii) generally non-convex and non-smooth, even when the objectives are. We define notions of optimality and stationarity, and provide an algorithm with a last-iterate convergence rate of $O(K^{-1/2})$ to stationarity when the objectives are strongly convex and Lipschitz smooth.
    摘要 在多目标优化中,单个决策向量必须在多个目标之间权衡取舍。实现最优权衡的解被称为 Pareto 最优:对于这样的决策向量,改进任何一个目标都必须以牺牲另一个目标为代价。由于 Pareto 最优向量的集合可能非常庞大,我们进一步考虑一个更具实际意义的 Pareto 约束优化问题,其目标是在 Pareto 集上优化一个偏好函数。我们研究求解该约束优化问题的局部方法;这一问题颇具挑战性,因为约束集(i)是隐式定义的,并且(ii)通常非凸、非光滑,即使各目标函数本身是凸且光滑的。我们定义了相应的最优性与驻点概念,并给出一种算法:当目标函数强凸且 Lipschitz 光滑时,其最后一次迭代以 $O(K^{-1/2})$ 的速率收敛到驻点。
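
As a concrete reminder of the objects involved: over a finite sample of decision vectors, the Pareto set can be obtained by simple dominance filtering, and "optimization on the Pareto set" then means picking the point of that set that is best under a preference function. A small numpy sketch for two minimization objectives (the quadratic objectives and the sum-based preference are toy choices, not the paper's setting):

```python
import numpy as np

def pareto_mask(F):
    """F: (n, m) objective values (to minimize). Returns a boolean mask of
    Pareto-optimal rows: no other row is <= in every objective and < in one."""
    n = F.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        dominated = np.any(np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1))
        mask[i] = not dominated
    return mask

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))                  # decision vectors
F = np.stack([np.sum((X - 1) ** 2, axis=1),            # objective 1
              np.sum((X + 1) ** 2, axis=1)], axis=1)   # objective 2
pareto_X = X[pareto_mask(F)]

# "Optimization on the Pareto set": pick the Pareto point best for a preference.
preference = pareto_X.sum(axis=1)                      # toy preference function
print(pareto_X[np.argmin(preference)])
```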

Event-based Dynamic Graph Representation Learning for Patent Application Trend Prediction

  • paper_url: http://arxiv.org/abs/2308.09780
  • repo_url: None
  • paper_authors: Tao Zou, Le Yu, Leilei Sun, Bowen Du, Deqing Wang, Fuzhen Zhuang
  • for: 预测公司在未来一段时间内将申请哪些类型的专利,从而洞察其发展策略,并提前发现潜在的合作伙伴或竞争对手。
  • methods: 基于公司和专利分类代码的可记忆表示,结合事件驱动的动态图学习与层次消息传递机制,实现专利申请趋势预测。
  • results: 在多种实验条件下,该方法都能有效预测专利申请趋势,同时还能学习分类代码的语义并跟踪公司的技术发展轨迹。
    Abstract Accurate prediction of what types of patents that companies will apply for in the next period of time can figure out their development strategies and help them discover potential partners or competitors in advance. Although important, this problem has been rarely studied in previous research due to the challenges in modelling companies' continuously evolving preferences and capturing the semantic correlations of classification codes. To fill in this gap, we propose an event-based dynamic graph learning framework for patent application trend prediction. In particular, our method is founded on the memorable representations of both companies and patent classification codes. When a new patent is observed, the representations of the related companies and classification codes are updated according to the historical memories and the currently encoded messages. Moreover, a hierarchical message passing mechanism is provided to capture the semantic proximities of patent classification codes by updating their representations along the hierarchical taxonomy. Finally, the patent application trend is predicted by aggregating the representations of the target company and classification codes from static, dynamic, and hierarchical perspectives. Experiments on real-world data demonstrate the effectiveness of our approach under various experimental conditions, and also reveal the abilities of our method in learning semantics of classification codes and tracking technology developing trajectories of companies.
    摘要 准确预测公司在未来一段时间内将申请哪些类型的专利,有助于洞察其发展策略,并提前发现潜在的合作伙伴或竞争对手。尽管这一问题十分重要,但由于难以建模公司持续演化的偏好并捕捉分类代码之间的语义关联,以往的研究很少涉及。为填补这一空白,我们提出一种基于事件的动态图学习框架用于专利申请趋势预测。具体而言,该方法建立在公司和专利分类代码的可记忆表示之上:当观察到一项新专利时,相关公司和分类代码的表示会根据历史记忆和当前编码的消息进行更新。此外,我们提供了一种层次消息传递机制,沿层次分类体系更新分类代码的表示,以捕捉它们之间的语义邻近性。最后,通过从静态、动态和层次三个视角汇聚目标公司与分类代码的表示来预测专利申请趋势。真实数据上的实验证明了该方法在多种实验条件下的有效性,并展示了其学习分类代码语义、跟踪公司技术发展轨迹的能力。

Learning the solution operator of two-dimensional incompressible Navier-Stokes equations using physics-aware convolutional neural networks

  • paper_url: http://arxiv.org/abs/2308.02137
  • repo_url: None
  • paper_authors: Viktor Grimm, Alexander Heinlein, Axel Klawonn
  • for: 近年来,将物理知识引入机器学习十分流行,但现有方法大多局限于单一几何或一组可参数化的几何。本文提出一种无需参数化即可在变化几何上学习稳态 Navier-Stokes 方程近似解的方法。
  • methods: 该方法将类 U-Net 的 CNN 与有限差分法中成熟的离散化方法相结合。
  • results: 将所提出的物理感知 CNN 与当前最先进的数据驱动方法进行比较,并展示了二者结合使用时的性能。
    Abstract In recent years, the concept of introducing physics to machine learning has become widely popular. Most physics-inclusive ML-techniques however are still limited to a single geometry or a set of parametrizable geometries. Thus, there remains the need to train a new model for a new geometry, even if it is only slightly modified. With this work we introduce a technique with which it is possible to learn approximate solutions to the steady-state Navier--Stokes equations in varying geometries without the need of parametrization. This technique is based on a combination of a U-Net-like CNN and well established discretization methods from the field of the finite difference method.The results of our physics-aware CNN are compared to a state-of-the-art data-based approach. Additionally, it is also shown how our approach performs when combined with the data-based approach.
    摘要 近年来,将物理知识引入机器学习的思想广为流行。然而,大多数包含物理的机器学习技术仍局限于单一几何或一组可参数化的几何,因此即便几何只是稍作修改,也仍需为其重新训练模型。本工作提出一种无需参数化即可在变化几何上学习稳态纳维-斯托克斯(Navier-Stokes)方程近似解的技术。该技术将类 U-Net 的 CNN 与有限差分法中成熟的离散化方法相结合。我们将这一物理感知 CNN 的结果与当前最先进的数据驱动方法进行了比较,并进一步展示了该方法与数据驱动方法结合使用时的表现。
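
The "physics-aware" ingredient in approaches like this typically enters as a finite-difference residual of the governing equations evaluated on the network's output, added to (or replacing) a data loss. Below is a minimal sketch of a continuity-equation (divergence-free) residual on a predicted 2-D velocity field using central differences; the grid spacing, field shapes, and the stream-function test field are illustrative, and this is not the paper's exact loss.

```python
import torch

def divergence_residual(u, v, h=1.0):
    """Central-difference div(u, v) on interior points of a regular grid.

    u, v: (B, H, W) predicted velocity components; h: grid spacing.
    Returns the mean-squared continuity residual, usable as a physics loss.
    """
    du_dx = (u[:, 1:-1, 2:] - u[:, 1:-1, :-2]) / (2 * h)
    dv_dy = (v[:, 2:, 1:-1] - v[:, :-2, 1:-1]) / (2 * h)
    return ((du_dx + dv_dy) ** 2).mean()

# A divergence-free test field u = dpsi/dy, v = -dpsi/dx should give ~0 residual.
H = W = 64
y, x = torch.meshgrid(torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij")
psi = torch.sin(2 * torch.pi * x) * torch.sin(2 * torch.pi * y)
h = 1.0 / (H - 1)
u = torch.gradient(psi, spacing=h, dim=0)[0].unsqueeze(0)     # dpsi/dy
v = (-torch.gradient(psi, spacing=h, dim=1)[0]).unsqueeze(0)  # -dpsi/dx
print(divergence_residual(u, v, h=h))                          # small number
```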

Can Attention Be Used to Explain EHR-Based Mortality Prediction Tasks: A Case Study on Hemorrhagic Stroke

  • paper_url: http://arxiv.org/abs/2308.05110
  • repo_url: None
  • paper_authors: Qizhang Feng, Jiayi Yuan, Forhan Bin Emdad, Karim Hanna, Xia Hu, Zhe He
  • for: 预测出血性脑卒中死亡率,提高患者护理质量与风险评估的精度。
  • methods: 提出一种可解释的、基于注意力的 Transformer 模型,以提升早期卒中死亡率预测的准确性与可解释性,并使用 Shapley 值和注意力分数比较模型的忠实性与可解释性。
  • results: 研究发现,该模型相比传统方法具有更高的准确性与可解释性,并能给出清晰的特征重要性。
    Abstract Stroke is a significant cause of mortality and morbidity, necessitating early predictive strategies to minimize risks. Traditional methods for evaluating patients, such as Acute Physiology and Chronic Health Evaluation (APACHE II, IV) and Simplified Acute Physiology Score III (SAPS III), have limited accuracy and interpretability. This paper proposes a novel approach: an interpretable, attention-based transformer model for early stroke mortality prediction. This model seeks to address the limitations of previous predictive models, providing both interpretability (providing clear, understandable explanations of the model) and fidelity (giving a truthful explanation of the model's dynamics from input to output). Furthermore, the study explores and compares fidelity and interpretability scores using Shapley values and attention-based scores to improve model explainability. The research objectives include designing an interpretable attention-based transformer model, evaluating its performance compared to existing models, and providing feature importance derived from the model.
    摘要 卒中是导致死亡和残疾的重要原因,需要早期预测策略来降低风险。传统的患者评估方法,如急性生理与慢性健康评估(APACHE II、IV)和简化急性生理评分 III(SAPS III),在准确性和可解释性上都有限。本文提出一种新方法:一种可解释的、基于注意力的 Transformer 模型,用于早期卒中死亡率预测。该模型旨在克服以往预测模型的局限,兼顾可解释性(给出清晰、易懂的解释)与忠实性(对模型从输入到输出的动态给出真实的解释)。此外,研究还利用 Shapley 值和基于注意力的分数,探索并比较忠实性与可解释性得分,以提升模型的可解释性。研究目标包括设计一种可解释的基于注意力的 Transformer 模型,评估其相对现有模型的性能,并给出由模型得出的特征重要性。

Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity

  • paper_url: http://arxiv.org/abs/2308.03521
  • repo_url: None
  • paper_authors: Xuefeng Han, Jun Li, Wen Chen, Zhen Mei, Kang Wei, Ming Ding, H. Vincent Poor
  • for: 面向智能移动设备普及背景下联邦学习(FL)在无线网络中的应用,重点解决客户端数据分布非独立同分布、训练数据量各异所带来的挑战。
  • methods: 推导了考虑数据异构性的联邦学习损失函数上界的闭式表达式,并在长期能耗与时延约束下联合优化客户端调度、无线资源分配与本地训练轮数(CRE),进一步利用 Lyapunov 漂移技术将该优化问题转化为一系列易于求解的子问题。
  • results: 真实数据集上的大量实验表明,所提算法在学习精度与能耗方面均优于其他基准方案。
    Abstract With the rapid proliferation of smart mobile devices, federated learning (FL) has been widely considered for application in wireless networks for distributed model training. However, data heterogeneity, e.g., non-independently identically distributions and different sizes of training data among clients, poses major challenges to wireless FL. Limited communication resources complicate the implementation of fair scheduling which is required for training on heterogeneous data, and further deteriorate the overall performance. To address this issue, this paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation. Specifically, we first develop a closed-form expression for an upper bound on the FL loss function, with a particular emphasis on data heterogeneity described by a dataset size vector and a data divergence vector. Then we formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE). Next, via the Lyapunov drift technique, we transform the CRE optimization problem into a series of tractable problems. Extensive experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of the learning accuracy and energy consumption.
    摘要 随着智能移动设备的迅速普及,联邦学习(FL)被广泛考虑用于无线网络中的分布式模型训练。然而,数据异构性(例如非独立同分布以及各客户端训练数据量不同)给无线联邦学习带来了重大挑战;有限的通信资源又使得在异构数据上训练所需的公平调度难以实现,进一步恶化整体性能。针对这一问题,本文结合无线资源分配,对考虑数据异构性的无线联邦学习进行性能分析与优化。具体而言,我们首先推导出联邦学习损失函数上界的闭式表达式,重点刻画由数据集规模向量和数据散度向量描述的数据异构性;然后,在长期能耗与时延约束下建立损失函数最小化问题,并联合优化客户端调度、资源分配与本地训练轮数(CRE);接着,借助 Lyapunov 漂移技术,将 CRE 优化问题转化为一系列易于求解的问题。真实数据集上的大量实验表明,所提算法在学习精度与能耗方面均优于其他基准方案。

Branched Latent Neural Operators

  • paper_url: http://arxiv.org/abs/2308.02599
  • repo_url: https://github.com/stanfordcbcl/blno.jl
  • paper_authors: Matteo Salvador, Alison Lesley Marsden
  • for: 这篇论文旨在为数字孪生等工程应用构建可靠且高效的降阶模型。
  • methods: 提出分支隐变量神经算子(BLNO)来学习刻画复杂物理过程的输入-输出映射。BLNO 采用简单紧凑的前馈部分连接神经网络,在结构上将具有不同内在角色的输入(如微分方程的时间变量与模型参数)解耦,并映射到感兴趣的通用场;其可解释的隐变量输出有助于增强所学的动力学。
  • results: BLNO 在小训练集与短训练时间(单处理器)下即表现出优异的泛化性能,且其泛化误差与测试阶段采用的离散化方式基本无关;部分连接结构也显著减少了可调参数量。在一个患左心发育不良综合征的儿科患者的双心室心脏模型(包含用于快速传导的浦肯野网络与心脏-躯干几何)上,BLNO 使用 150 条计算生成的 12 导联心电图进行训练,覆盖从细胞尺度、器官层面到电失同步的 7 个模型参数。尽管 12 导联心电图呈现出梯度陡峭的快速动态,经自动超参数调优后的最优 BLNO 仅有 7 个隐藏层、每层 19 个神经元,在单个 CPU 上不到 3 小时即完成训练;在由另外 50 次电生理仿真组成的独立测试集上,均方误差量级约为 $10^{-4}$。
    Abstract We introduce Branched Latent Neural Operators (BLNOs) to learn input-output maps encoding complex physical processes. A BLNO is defined by a simple and compact feedforward partially-connected neural network that structurally disentangles inputs with different intrinsic roles, such as the time variable from model parameters of a differential equation, while transferring them into a generic field of interest. BLNOs leverage interpretable latent outputs to enhance the learned dynamics and break the curse of dimensionality by showing excellent generalization properties with small training datasets and short training times on a single processor. Indeed, their generalization error remains comparable regardless of the adopted discretization during the testing phase. Moreover, the partial connections, in place of a fully-connected structure, significantly reduce the number of tunable parameters. We show the capabilities of BLNOs in a challenging test case involving biophysically detailed electrophysiology simulations in a biventricular cardiac model of a pediatric patient with hypoplastic left heart syndrome. The model includes a purkinje network for fast conduction and a heart-torso geometry. Specifically, we trained BLNOs on 150 in silico generated 12-lead electrocardiograms (ECGs) while spanning 7 model parameters, covering cell-scale, organ-level and electrical dyssynchrony. Although the 12-lead ECGs manifest very fast dynamics with sharp gradients, after automatic hyperparameter tuning the optimal BLNO, trained in less than 3 hours on a single CPU, retains just 7 hidden layers and 19 neurons per layer. The mean square error is on the order of $10^{-4}$ on an independent test dataset comprised of 50 additional electrophysiology simulations. This paper provides a novel computational tool to build reliable and efficient reduced-order models for digital twinning in engineering applications.
    摘要 我们提出分支隐变量神经算子(BLNO),用于学习刻画复杂物理过程的输入-输出映射。BLNO 由简单紧凑的前馈部分连接神经网络构成,在结构上将具有不同内在角色的输入(如微分方程的时间变量与模型参数)解耦,并将它们映射到感兴趣的通用场。BLNO 借助可解释的隐变量输出增强所学的动力学,并凭借小训练集、单处理器上的短训练时间即可获得优异的泛化性能,从而缓解维度灾难;其泛化误差与测试阶段采用的离散化方式基本无关。此外,部分连接(而非全连接)结构显著减少了可调参数的数量。我们在一个具有挑战性的算例上展示了 BLNO 的能力:对一名患左心发育不良综合征的儿科患者的双心室心脏模型进行生物物理细节的电生理仿真,模型包含用于快速传导的浦肯野网络和心脏-躯干几何。具体而言,我们使用 150 条计算生成的 12 导联心电图训练 BLNO,覆盖细胞尺度、器官层面与电失同步等 7 个模型参数。尽管 12 导联心电图呈现出梯度陡峭的快速动态,经自动超参数调优后的最优 BLNO 仅保留 7 个隐藏层、每层 19 个神经元,在单个 CPU 上不到 3 小时即完成训练;在由另外 50 次电生理仿真组成的独立测试集上,均方误差量级约为 $10^{-4}$。本文为工程应用中的数字孪生提供了一种构建可靠、高效降阶模型的新型计算工具。

Eva: A General Vectorized Approximation Framework for Second-order Optimization

  • paper_url: http://arxiv.org/abs/2308.02123
  • repo_url: None
  • paper_authors: Lin Zhang, Shaohuai Shi, Bo Li
  • for: 这个论文目的是提高深度学习模型训练效率。
  • methods: 该论文提出了两项新技术:1)在小批量训练数据上用小型随机向量的 Kronecker 分解构造二阶信息,以降低内存消耗;2)借助 Sherman-Morrison 公式推导更新公式,避免显式计算矩阵的逆。
  • results: 在多种模型和数据集上的大量实验表明,与一阶 SGD 以及其他二阶算法(K-FAC 和 Shampoo)相比,使用 Eva 可将端到端训练时间分别缩短最多 2.05 倍和 2.42 倍。
    Abstract Second-order optimization algorithms exhibit excellent convergence properties for training deep learning models, but often incur significant computation and memory overheads. This can result in lower training efficiency than the first-order counterparts such as stochastic gradient descent (SGD). In this work, we present a memory- and time-efficient second-order algorithm named Eva with two novel techniques: 1) we construct the second-order information with the Kronecker factorization of small stochastic vectors over a mini-batch of training data to reduce memory consumption, and 2) we derive an efficient update formula without explicitly computing the inverse of matrices using the Sherman-Morrison formula. We further extend Eva to a general vectorized approximation framework to improve the compute and memory efficiency of two existing second-order algorithms (FOOF and Shampoo) without affecting their convergence performance. Extensive experimental results on different models and datasets show that Eva reduces the end-to-end training time up to 2.05x and 2.42x compared to first-order SGD and second-order algorithms (K-FAC and Shampoo), respectively.
    摘要 二阶优化算法在训练深度学习模型时具有出色的收敛性质,但通常带来可观的计算与内存开销,导致其训练效率低于随机梯度下降(SGD)等一阶方法。在这项工作中,我们提出一种内存与时间都高效的二阶算法 Eva,包含两项新技术:1)在一个小批量训练数据上,用小型随机向量的 Kronecker 分解构造二阶信息,以降低内存消耗;2)借助 Sherman-Morrison 公式推导高效的更新公式,无需显式计算矩阵的逆。我们进一步将 Eva 扩展为一个通用的向量化近似框架,在不影响收敛性能的前提下,提升现有两种二阶算法(FOOF 与 Shampoo)的计算与内存效率。在多种模型和数据集上的大量实验表明,与一阶 SGD 及二阶算法(K-FAC 与 Shampoo)相比,Eva 可将端到端训练时间分别缩短最多 2.05 倍和 2.42 倍。
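
The second trick rests on the Sherman-Morrison identity: for an invertible matrix A and a rank-one update u v^T, the product (A + u v^T)^{-1} b can be formed from A^{-1} b and A^{-1} u without ever materializing the new inverse. A small numpy sketch verifying the identity; the damped-diagonal A and the random vectors are illustrative stand-ins, not Eva's actual Kronecker-factored statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
A = np.eye(d) * 2.0                      # e.g. a damped/diagonal base matrix
u = rng.normal(size=d)
v = rng.normal(size=d)
b = rng.normal(size=d)

A_inv = np.linalg.inv(A)                 # trivially cheap for a diagonal A

# Sherman-Morrison:
# (A + u v^T)^{-1} b = A^{-1} b - (A^{-1} u)(v^T A^{-1} b) / (1 + v^T A^{-1} u)
Ainv_b = A_inv @ b
Ainv_u = A_inv @ u
x_sm = Ainv_b - Ainv_u * (v @ Ainv_b) / (1.0 + v @ Ainv_u)

x_direct = np.linalg.solve(A + np.outer(u, v), b)
print(np.allclose(x_sm, x_direct))       # True
```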

Model Provenance via Model DNA

  • paper_url: http://arxiv.org/abs/2308.02121
  • repo_url: None
  • paper_authors: Xin Mu, Yu Wang, Yehong Zhang, Jiaqi Zhang, Hui Wang, Yang Xiang, Yue Yu
  • for: 本研究关注机器学习模型生命周期(模型的来源、训练与使用)中的一个新问题,即模型溯源(Model Provenance,MP):判断某个源模型是否是目标模型的预训练模型。该问题对保障机器学习模型的安全性和知识产权具有重要意义,但在文献中尚未得到足够关注。
  • methods: 我们提出模型 DNA 这一新概念来刻画机器学习模型的独特特征,并采用数据驱动与模型驱动相结合的表示学习方法,将模型的训练数据和输入输出信息编码为紧凑而全面的模型表示(即 DNA),在此基础上构建高效的模型溯源识别框架。
  • results: 我们在计算机视觉和自然语言处理任务上,使用多种模型、数据集和场景进行评估,证明该方法能够准确识别模型来源。
    Abstract Understanding the life cycle of the machine learning (ML) model is an intriguing area of research (e.g., understanding where the model comes from, how it is trained, and how it is used). This paper focuses on a novel problem within this field, namely Model Provenance (MP), which concerns the relationship between a target model and its pre-training model and aims to determine whether a source model serves as the provenance for a target model. This is an important problem that has significant implications for ensuring the security and intellectual property of machine learning models but has not received much attention in the literature. To fill in this gap, we introduce a novel concept of Model DNA which represents the unique characteristics of a machine learning model. We utilize a data-driven and model-driven representation learning method to encode the model's training data and input-output information as a compact and comprehensive representation (i.e., DNA) of the model. Using this model DNA, we develop an efficient framework for model provenance identification, which enables us to identify whether a source model is a pre-training model of a target model. We conduct evaluations on both computer vision and natural language processing tasks using various models, datasets, and scenarios to demonstrate the effectiveness of our approach in accurately identifying model provenance.
    摘要 理解机器学习(ML)模型的生命周期(例如模型来自哪里、如何训练、如何使用)是一个有趣的研究方向。本文关注该领域中的一个新问题,即模型溯源(Model Provenance,MP):它研究目标模型与其预训练模型之间的关系,旨在判断某个源模型是否是目标模型的来源。该问题对保障机器学习模型的安全性和知识产权具有重要意义,但在文献中尚未受到足够关注。为填补这一空白,我们提出模型 DNA 这一新概念来表示机器学习模型的独特特征。我们采用数据驱动与模型驱动相结合的表示学习方法,将模型的训练数据和输入输出信息编码为紧凑而全面的模型表示(即 DNA)。基于模型 DNA,我们开发了一个高效的模型溯源识别框架,用于判断某个源模型是否是目标模型的预训练模型。我们在计算机视觉和自然语言处理任务上,使用多种模型、数据集和场景进行评估,结果表明该方法能够准确识别模型来源。

Designing a Deep Learning-Driven Resource-Efficient Diagnostic System for Metastatic Breast Cancer: Reducing Long Delays of Clinical Diagnosis and Improving Patient Survival in Developing Countries

  • paper_url: http://arxiv.org/abs/2308.02597
  • repo_url: None
  • paper_authors: William Gao, Dayong Wang, Yi Huang
  • for: 这项研究旨在解决发展中国家(特别是撒哈拉以南非洲、南亚和南美洲)乳腺癌患者面临的长时间诊断延误问题。
  • methods: 研究利用深度学习技术开发了一个高精度的转移性乳腺癌诊断系统,该系统足够轻量,可部署于资源匮乏的医疗机构。
  • results: 评估表明,基于 MobileNetV2 的诊断模型在诊断准确性、模型泛化能力和训练效率上均优于更复杂的 VGG16、ResNet50 和 ResNet101 模型;可视化比较还表明,MobileNetV2 模型能够识别嵌入在大片正常细胞中的微小癌变结节,而这对人工图像分析而言十分困难。
    Abstract Breast cancer is one of the leading causes of cancer mortality. Breast cancer patients in developing countries, especially sub-Saharan Africa, South Asia, and South America, suffer from the highest mortality rate in the world. One crucial factor contributing to the global disparity in mortality rate is long delay of diagnosis due to a severe shortage of trained pathologists, which consequently has led to a large proportion of late-stage presentation at diagnosis. The delay between the initial development of symptoms and the receipt of a diagnosis could stretch upwards 15 months. To tackle this critical healthcare disparity, this research has developed a deep learning-based diagnosis system for metastatic breast cancer that can achieve high diagnostic accuracy as well as computational efficiency. Based on our evaluation, the MobileNetV2-based diagnostic model outperformed the more complex VGG16, ResNet50 and ResNet101 models in diagnostic accuracy, model generalization, and model training efficiency. The visual comparisons between the model prediction and ground truth have demonstrated that the MobileNetV2 diagnostic models can identify very small cancerous nodes embedded in a large area of normal cells which is challenging for manual image analysis. Equally Important, the light weighted MobleNetV2 models were computationally efficient and ready for mobile devices or devices of low computational power. These advances empower the development of a resource-efficient and high performing AI-based metastatic breast cancer diagnostic system that can adapt to under-resourced healthcare facilities in developing countries. This research provides an innovative technological solution to address the long delays in metastatic breast cancer diagnosis and the consequent disparity in patient survival outcome in developing countries.
    摘要 乳腺癌是癌症死亡的主要原因之一。发展中国家(特别是撒哈拉以南非洲、南亚和南美洲)的乳腺癌患者死亡率居全球之首。造成这一全球死亡率差距的关键因素之一是诊断的长期延误:训练有素的病理学家严重短缺,导致大量患者在确诊时已处于晚期,从出现症状到获得诊断的间隔可能长达 15 个月。为应对这一严峻的医疗差距,本研究开发了一个基于深度学习的转移性乳腺癌诊断系统,兼顾高诊断准确率与计算效率。评估结果表明,基于 MobileNetV2 的诊断模型在诊断准确性、模型泛化能力和训练效率上均优于更复杂的 VGG16、ResNet50 和 ResNet101 模型。模型预测与人工标注的可视化比较表明,MobileNetV2 诊断模型能够识别嵌入在大片正常细胞中的微小癌变结节,而这对人工图像分析而言十分困难。同样重要的是,轻量化的 MobileNetV2 模型计算高效,适合移动设备或算力有限的设备。这些进展使得开发资源高效、性能优异的 AI 转移性乳腺癌诊断系统成为可能,并能适应发展中国家资源匮乏的医疗机构。本研究为解决转移性乳腺癌诊断的长期延误及由此导致的患者生存差距提供了一种创新的技术方案。
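
For the architecture comparison, the lightweight branch amounts to a MobileNetV2 backbone whose classification head is replaced for the binary tissue label and then fine-tuned. Below is a hedged, generic transfer-learning sketch in torchvision; the head size, input resolution, optimizer, and dummy batch are standard defaults for illustration, not details confirmed by the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# MobileNetV2 backbone with a new 2-class head (metastatic vs. normal tissue).
# In practice one would load ImageNet-pretrained weights before fine-tuning.
model = models.mobilenet_v2()
model.classifier[1] = nn.Linear(model.last_channel, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of 224x224 image patches.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
model.train()
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```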

VQGraph: Graph Vector-Quantization for Bridging GNNs and MLPs

  • paper_url: http://arxiv.org/abs/2308.02117
  • repo_url: https://github.com/yangling0818/vqgraph
  • paper_authors: Ling Yang, Ye Tian, Minkai Xu, Zhongyi Liu, Shenda Hong, Wei Qu, Wentao Zhang, Bin Cui, Muhan Zhang, Jure Leskovec
  • for: 本文提出一个新框架 VQGraph,用于在图神经网络(GNN)与多层感知器(MLP)之间架起桥梁,以便将 GNN 的结构知识蒸馏到推理高效的 MLP 中。
  • methods: 本文采用一种基于向量量化变分自编码器(VQ-VAE)编码器变体的结构感知图 tokenizer 来表示图中的节点,并提出一种基于软 token 分配的新型蒸馏目标,将结构知识从 GNN 充分传递到 MLP。
  • results: 大量实验与分析表明,VQGraph 在七个图数据集上取得了新的最先进性能;其推理速度比 GNN 快 828 倍,同时相比 GNN 和独立的 MLP 平均分别提升 3.90% 和 28.05% 的准确率。
    Abstract Graph Neural Networks (GNNs) conduct message passing which aggregates local neighbors to update node representations. Such message passing leads to scalability issues in practical latency-constrained applications. To address this issue, recent methods adopt knowledge distillation (KD) to learn computationally-efficient multi-layer perceptron (MLP) by mimicking the output of GNN. However, the existing GNN representation space may not be expressive enough for representing diverse local structures of the underlying graph, which limits the knowledge transfer from GNN to MLP. Here we present a novel framework VQGraph to learn a powerful graph representation space for bridging GNNs and MLPs. We adopt the encoder of a variant of a vector-quantized variational autoencoder (VQ-VAE) as a structure-aware graph tokenizer, which explicitly represents the nodes of diverse local structures as numerous discrete tokens and constitutes a meaningful codebook. Equipped with the learned codebook, we propose a new token-based distillation objective based on soft token assignments to sufficiently transfer the structural knowledge from GNN to MLP. Extensive experiments and analyses demonstrate the strong performance of VQGraph, where we achieve new state-of-the-art performance on GNN-MLP distillation in both transductive and inductive settings across seven graph datasets. We show that VQGraph with better performance infers faster than GNNs by 828x, and also achieves accuracy improvement over GNNs and stand-alone MLPs by 3.90% and 28.05% on average, respectively. Code: https://github.com/YangLing0818/VQGraph.
    摘要 图神经网络(GNN)通过消息传递聚合局部邻居以更新节点表示,但这种消息传递在对时延敏感的实际应用中带来可扩展性问题。为此,近期方法采用知识蒸馏(KD),通过模仿 GNN 的输出来学习计算高效的多层感知器(MLP)。然而,现有 GNN 的表示空间可能不足以刻画图中多样的局部结构,从而限制了从 GNN 到 MLP 的知识传递。本文提出一个新框架 VQGraph,用于学习一个强大的图表示空间来桥接 GNN 与 MLP。我们采用向量量化变分自编码器(VQ-VAE)的一个变体的编码器作为结构感知的图 tokenizer,将具有不同局部结构的节点显式表示为大量离散 token,构成一个有意义的码本。基于学得的码本,我们提出一种基于软 token 分配的新型蒸馏目标,将结构知识从 GNN 充分传递到 MLP。大量实验与分析表明,VQGraph 在七个图数据集的直推式与归纳式设置下,均在 GNN-MLP 蒸馏任务上取得了新的最先进性能。VQGraph 的推理速度比 GNN 快 828 倍,同时相比 GNN 和独立 MLP 的准确率平均分别提升 3.90% 和 28.05%。代码:https://github.com/YangLing0818/VQGraph。
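
The token-based distillation objective can be pictured as follows: each node representation, from the GNN teacher and from the MLP student, is softly assigned to the learned codebook via a softmax over negative distances, and the student is trained to match the teacher's assignment distribution. A minimal sketch of that soft-assignment KL term; the codebook size, temperature, and squared-distance choice here are illustrative assumptions, not VQGraph's exact formulation.

```python
import torch
import torch.nn.functional as F

def soft_assignments(h, codebook, tau=1.0):
    """h: (N, d) node embeddings; codebook: (K, d) code vectors.
    Returns (N, K) soft assignment distributions over codebook entries."""
    d2 = torch.cdist(h, codebook) ** 2             # squared distance to each code
    return F.softmax(-d2 / tau, dim=-1)

def token_distill_loss(h_teacher, h_student, codebook, tau=1.0):
    p_t = soft_assignments(h_teacher, codebook, tau).detach()    # teacher is fixed
    log_p_s = torch.log(soft_assignments(h_student, codebook, tau) + 1e-12)
    return F.kl_div(log_p_s, p_t, reduction="batchmean")

N, d, K = 32, 16, 64
codebook = torch.randn(K, d)
h_gnn = torch.randn(N, d)                           # teacher (GNN) embeddings
h_mlp = torch.randn(N, d, requires_grad=True)       # student (MLP) embeddings
loss = token_distill_loss(h_gnn, h_mlp, codebook)
loss.backward()
print(float(loss), h_mlp.grad.shape)
```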

Breast Ultrasound Tumor Classification Using a Hybrid Multitask CNN-Transformer Network

  • paper_url: http://arxiv.org/abs/2308.02101
  • repo_url: None
  • paper_authors: Bryar Shareef, Min Xian, Aleksandar Vakanski, Haotian Wang
  • for: 这项研究旨在提高乳腺超声(BUS)图像分类的准确性。
  • methods: 本研究提出一种混合多任务深度学习网络 Hybrid-MT-ESTAN,结合 CNN 和 Swin Transformer 组件,同时进行 BUS 图像分类与分割。
  • results: 与九种 BUS 分类方法相比,Hybrid-MT-ESTAN 取得了最高的准确率、敏感度和 F1 分数,分别为 82.7%、86.4% 和 86.0%。
    Abstract Capturing global contextual information plays a critical role in breast ultrasound (BUS) image classification. Although convolutional neural networks (CNNs) have demonstrated reliable performance in tumor classification, they have inherent limitations for modeling global and long-range dependencies due to the localized nature of convolution operations. Vision Transformers have an improved capability of capturing global contextual information but may distort the local image patterns due to the tokenization operations. In this study, we proposed a hybrid multitask deep neural network called Hybrid-MT-ESTAN, designed to perform BUS tumor classification and segmentation using a hybrid architecture composed of CNNs and Swin Transformer components. The proposed approach was compared to nine BUS classification methods and evaluated using seven quantitative metrics on a dataset of 3,320 BUS images. The results indicate that Hybrid-MT-ESTAN achieved the highest accuracy, sensitivity, and F1 score of 82.7%, 86.4%, and 86.0%, respectively.
    摘要 捕捉全局上下文信息在乳腺超声(BUS)图像分类中至关重要。卷积神经网络(CNN)虽然在肿瘤分类中表现可靠,但由于卷积操作的局部性,在建模全局与长程依赖方面存在固有局限;视觉 Transformer 捕捉全局上下文信息的能力更强,却可能因 token 化操作而扭曲局部图像模式。本研究提出一种混合多任务深度神经网络 Hybrid-MT-ESTAN,采用由 CNN 与 Swin Transformer 组件构成的混合架构,同时完成 BUS 肿瘤分类与分割。我们在包含 3,320 幅 BUS 图像的数据集上,使用 7 个量化指标将所提方法与 9 种 BUS 分类方法进行了比较。结果表明,Hybrid-MT-ESTAN 取得了最高的准确率、敏感度和 F1 分数,分别为 82.7%、86.4% 和 86.0%。

Efficient Model Adaptation for Continual Learning at the Edge

  • paper_url: http://arxiv.org/abs/2308.02084
  • repo_url: None
  • paper_authors: Zachary A. Daniels, Jun Hu, Michael Lomnitz, Phil Miller, Aswin Raghavan, Joe Zhang, Michael Piacentino, David Zhang
  • for: 这篇论文提出一个非平稳的自动机器学习(AutoML)框架,以便在不断变化的环境中进行高效的持续学习。
  • methods: 该框架使用固定的深度神经网络(DNN)特征编码器,在其之上训练浅层网络来处理新数据;通过将 DNN 与超维计算(HDC)结合来检测分布外(OOD)数据,并利用零样本神经架构搜索(ZS-NAS)寻找低参数量的神经适配器来适应新数据。
  • results: 论文在多个域适应基准数据集上系统地评估了该方法,并与现有的 OOD 检测以及少样本/零样本 NAS 算法进行比较,结果显示其在 OOD 检测与模型适配方面表现出色,能够在不断变化的环境中实现高效的持续学习。
    Abstract Most machine learning (ML) systems assume stationary and matching data distributions during training and deployment. This is often a false assumption. When ML models are deployed on real devices, data distributions often shift over time due to changes in environmental factors, sensor characteristics, and task-of-interest. While it is possible to have a human-in-the-loop to monitor for distribution shifts and engineer new architectures in response to these shifts, such a setup is not cost-effective. Instead, non-stationary automated ML (AutoML) models are needed. This paper presents the Encoder-Adaptor-Reconfigurator (EAR) framework for efficient continual learning under domain shifts. The EAR framework uses a fixed deep neural network (DNN) feature encoder and trains shallow networks on top of the encoder to handle novel data. The EAR framework is capable of 1) detecting when new data is out-of-distribution (OOD) by combining DNNs with hyperdimensional computing (HDC), 2) identifying low-parameter neural adaptors to adapt the model to the OOD data using zero-shot neural architecture search (ZS-NAS), and 3) minimizing catastrophic forgetting on previous tasks by progressively growing the neural architecture as needed and dynamically routing data through the appropriate adaptors and reconfigurators for handling domain-incremental and class-incremental continual learning. We systematically evaluate our approach on several benchmark datasets for domain adaptation and demonstrate strong performance compared to state-of-the-art algorithms for OOD detection and few-/zero-shot NAS.
    摘要 大多数机器学习(ML)系统假设训练与部署期间的数据分布保持平稳且一致,但这往往并不成立。当 ML 模型部署到真实设备上时,环境因素、传感器特性和目标任务的变化常常使数据分布随时间漂移。虽然可以由人工参与监控分布漂移并针对性地设计新架构,但这种方式成本过高,因此需要非平稳的自动机器学习(AutoML)模型。本文提出 Encoder-Adaptor-Reconfigurator(EAR)框架,用于在域漂移下进行高效的持续学习。EAR 框架使用固定的深度神经网络(DNN)特征编码器,并在其之上训练浅层网络来处理新数据。该框架能够:1)通过将 DNN 与超维计算(HDC)相结合,检测新数据是否为分布外(OOD)数据;2)利用零样本神经架构搜索(ZS-NAS)找到低参数量的神经适配器,使模型适应 OOD 数据;3)按需逐步扩展神经架构,并将数据动态路由到相应的适配器与重构器,以处理域增量和类增量持续学习,从而最大限度地减少对先前任务的灾难性遗忘。我们在多个域适应基准数据集上系统地评估了该方法,相较于现有的 OOD 检测与少样本/零样本 NAS 算法表现出色。
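
The OOD-detection piece pairs the frozen DNN encoder with hyperdimensional computing: encoder features are projected into very high-dimensional bipolar hypervectors, class prototypes are built by bundling (summing and re-binarizing) the hypervectors of training samples, and a query whose best cosine similarity to all prototypes falls below a threshold is flagged as out-of-distribution. A small numpy sketch under those assumptions; the dimensions, the random-projection encoding, and the synthetic features are illustrative, not the EAR implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, hd_dim = 64, 10_000
projection = rng.normal(size=(feat_dim, hd_dim))         # fixed random projection

def encode(features):
    """Map DNN features to bipolar {-1, +1} hypervectors."""
    return np.sign(features @ projection)

def build_prototypes(features, labels, n_classes):
    """Bundle (sum + re-binarize) the hypervectors of each class."""
    hv = encode(features)
    return np.stack([np.sign(hv[labels == c].sum(axis=0)) for c in range(n_classes)])

def ood_score(features, prototypes):
    """1 - max cosine similarity to any class prototype; higher = more OOD."""
    hv = encode(features)
    sims = (hv @ prototypes.T) / (np.linalg.norm(hv, axis=1, keepdims=True)
                                  * np.linalg.norm(prototypes, axis=1))
    return 1.0 - sims.max(axis=1)

# In-distribution data: two well-separated synthetic feature clusters.
X_in = np.concatenate([rng.normal(+2, 1, (200, feat_dim)),
                       rng.normal(-2, 1, (200, feat_dim))])
y_in = np.array([0] * 200 + [1] * 200)
protos = build_prototypes(X_in, y_in, n_classes=2)

X_query_in = rng.normal(+2, 1, (5, feat_dim))
X_query_ood = rng.normal(0, 8, (5, feat_dim))             # nothing like the training data
print(ood_score(X_query_in, protos).mean(), ood_score(X_query_ood, protos).mean())
```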

Target specification bias, counterfactual prediction, and algorithmic fairness in healthcare

  • paper_url: http://arxiv.org/abs/2308.02081
  • repo_url: None
  • paper_authors: Eran Tal
  • for: 这篇论文旨在描述机器学习(ML)在医疗领域中的偏见来源,并提出一种新的方法来解决这种偏见。
  • methods: 该论文使用了一种新的方法,即“target specification bias”,来描述医疗领域中ML模型的偏见来源。该方法基于决策者对假设enario的预测,而不是实际情况的预测。
  • results: 该论文的结果表明,target specification bias是医疗领域中ML模型偏见的一种常见来源,并且可能导致医疗资源的不合理使用和伤害病人。同时,该论文还提出了一种新的方法来解决这种偏见,即通过对目标变量的操作来减少target specification bias。
    Abstract Bias in applications of machine learning (ML) to healthcare is usually attributed to unrepresentative or incomplete data, or to underlying health disparities. This article identifies a more pervasive source of bias that affects the clinical utility of ML-enabled prediction tools: target specification bias. Target specification bias arises when the operationalization of the target variable does not match its definition by decision makers. The mismatch is often subtle, and stems from the fact that decision makers are typically interested in predicting the outcomes of counterfactual, rather than actual, healthcare scenarios. Target specification bias persists independently of data limitations and health disparities. When left uncorrected, it gives rise to an overestimation of predictive accuracy, to inefficient utilization of medical resources, and to suboptimal decisions that can harm patients. Recent work in metrology - the science of measurement - suggests ways of counteracting target specification bias and avoiding its harmful consequences.
    摘要 机器学习(ML)在医疗应用中的偏差通常被归因于数据缺乏代表性或不完整,或潜在的健康差距。本文指出一种更普遍的偏差来源,它影响 ML 预测工具的临床效用:目标设定偏差(target specification bias)。当目标变量的操作化定义与决策者所定义的目标不一致时,就会产生目标设定偏差。这种不一致通常很微妙,源于决策者通常希望预测的是反事实(而非实际发生的)医疗情景的结果。目标设定偏差独立于数据局限和健康差距而存在;若不加纠正,会导致预测准确率被高估、医疗资源使用低效,以及可能伤害患者的次优决策。计量学(测量科学)的最新研究为抵消目标设定偏差、避免其有害后果提供了思路。

Causality Guided Disentanglement for Cross-Platform Hate Speech Detection

  • paper_url: http://arxiv.org/abs/2308.02080
  • repo_url: https://github.com/paras2612/catch
  • paper_authors: Paras Sheth, Tharindu Kumarage, Raha Moraffah, Aman Chadha, Huan Liu
  • for: 本研究旨在开发一种可以在多个平台上训练和生成恐吓言语检测模型,以解决现有模型过于依赖特定语言信号和分类词的局限性,以及缺乏高质量标注数据的问题。
  • methods: 本研究使用了分解输入表示的方法,将输入表示分为不同平台的特征和平台独特的特征,以学习不同平台上的恐吓言语表示。此外,研究还利用了 causal 关系学习来更好地理解不同环境下的恐吓言语表示。
  • results: 研究的实验结果表明, compared to 现有状态的方法,本研究的模型在多个平台上的恐吓言语检测效果明显更高。
    Abstract Social media platforms, despite their value in promoting open discourse, are often exploited to spread harmful content. Current deep learning and natural language processing models used for detecting this harmful content overly rely on domain-specific terms affecting their capabilities to adapt to generalizable hate speech detection. This is because they tend to focus too narrowly on particular linguistic signals or the use of certain categories of words. Another significant challenge arises when platforms lack high-quality annotated data for training, leading to a need for cross-platform models that can adapt to different distribution shifts. Our research introduces a cross-platform hate speech detection model capable of being trained on one platform's data and generalizing to multiple unseen platforms. To achieve good generalizability across platforms, one way is to disentangle the input representations into invariant and platform-dependent features. We also argue that learning causal relationships, which remain constant across diverse environments, can significantly aid in understanding invariant representations in hate speech. By disentangling input into platform-dependent features (useful for predicting hate targets) and platform-independent features (used to predict the presence of hate), we learn invariant representations resistant to distribution shifts. These features are then used to predict hate speech across unseen platforms. Our extensive experiments across four platforms highlight our model's enhanced efficacy compared to existing state-of-the-art methods in detecting generalized hate speech.
    摘要 社交媒体平台尽管在促进公开讨论方面具有价值,却常被滥用来传播有害内容。目前用于检测此类内容的深度学习与自然语言处理模型过度依赖特定领域的词汇,往往只关注特定的语言信号或某些词类,因而难以泛化到一般的仇恨言论检测。另一个重要挑战是,当平台缺乏高质量标注数据时,需要能够适应不同分布漂移的跨平台模型。本研究提出一种跨平台仇恨言论检测模型,可在一个平台的数据上训练,并泛化到多个未见平台。为实现良好的跨平台泛化性,一种途径是将输入表示解耦为不变特征与平台相关特征。我们还论证,学习在不同环境中保持不变的因果关系,有助于理解仇恨言论的不变表示。通过将输入解耦为平台相关特征(用于预测仇恨目标)和平台无关特征(用于预测是否存在仇恨),我们学习到对分布漂移具有鲁棒性的不变表示,并用这些特征在未见平台上预测仇恨言论。在四个平台上的大量实验表明,与现有最先进方法相比,我们的模型在泛化仇恨言论检测上效果明显更好。

Specious Sites: Tracking the Spread and Sway of Spurious News Stories at Scale

  • paper_url: http://arxiv.org/abs/2308.02068
  • repo_url: None
  • paper_authors: Hans W. A. Hanley, Deepak Kumar, Zakir Durumeric
  • for: 这项研究旨在自动跟踪在线平台上传播的新闻叙事,以帮助检测和抵御虚假信息。
  • methods: 研究利用大型语言模型 MPNet 和 DP-Means 聚类,对 1,404 个不可靠新闻网站进行每日抓取,自动分离并分析在线生态系统中传播的叙事。
  • results: 研究识别出这些网站上的 55,301 条叙事,描述了 2022 年传播最广的叙事,并找出了最具影响力的、叙事的源头与放大者网站;此外,研究还展示了该系统可以检测源自不可靠新闻网站的新叙事,帮助 Politifact、Reuters 和 AP News 等事实核查机构更快地处置虚假信息。
    Abstract Misinformation, propaganda, and outright lies proliferate on the web, with some narratives having dangerous real-world consequences on public health, elections, and individual safety. However, despite the impact of misinformation, the research community largely lacks automated and programmatic approaches for tracking news narratives across online platforms. In this work, utilizing daily scrapes of 1,404 unreliable news websites, the large-language model MPNet, and DP-Means clustering, we introduce a system to automatically isolate and analyze the narratives spread within online ecosystems. Identifying 55,301 narratives on these 1,404 websites, we describe the most prevalent narratives spread in 2022 and identify the most influential websites that originate and magnify narratives. Finally, we show how our system can be utilized to detect new narratives originating from unreliable news websites and aid fact-checkers like Politifact, Reuters, and AP News in more quickly addressing misinformation stories.
    摘要 互联网上充斥着错误信息、宣传和彻头彻尾的谎言,其中一些叙事对公共健康、选举和个人安全造成了现实世界中的危险后果。然而,尽管错误信息影响巨大,研究社区仍缺乏跨在线平台自动化、程序化跟踪新闻叙事的方法。在这项工作中,我们利用对 1,404 个不可靠新闻网站的每日抓取、大型语言模型 MPNet 以及 DP-Means 聚类,构建了一个自动分离并分析在线生态系统中传播叙事的系统。我们在这 1,404 个网站上识别出 55,301 条叙事,描述了 2022 年传播最广的叙事,并找出了最具影响力的、叙事的源头与放大者网站。最后,我们展示了该系统可用于检测源自不可靠新闻网站的新叙事,帮助 Politifact、Reuters 和 AP News 等事实核查机构更快地处置虚假信息。
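
DP-Means behaves like k-means except that a new cluster is spawned whenever a point lies farther than a penalty λ from every existing centroid, so the number of narrative clusters is discovered rather than fixed in advance. Below is a compact numpy sketch of the algorithm (Kulis & Jordan) applied to toy embedding vectors; in the actual system the inputs would be MPNet sentence embeddings of article passages, and λ would need tuning.

```python
import numpy as np

def dp_means(X, lam, n_iter=20):
    """DP-means: k-means-style updates plus a new-cluster penalty lam."""
    centroids = [X.mean(axis=0)]
    assign = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest centroid, or a brand-new cluster if too far.
        for i, x in enumerate(X):
            d2 = np.array([np.sum((x - c) ** 2) for c in centroids])
            if d2.min() > lam:
                centroids.append(x.copy())
                assign[i] = len(centroids) - 1
            else:
                assign[i] = int(d2.argmin())
        # Update step: drop empty clusters, re-index, recompute means.
        kept = [k for k in range(len(centroids)) if np.any(assign == k)]
        remap = {old: new for new, old in enumerate(kept)}
        assign = np.array([remap[a] for a in assign])
        centroids = [X[assign == k].mean(axis=0) for k in range(len(kept))]
    return np.array(centroids), assign

rng = np.random.default_rng(0)
# Three synthetic "narratives" as 5-d embedding clusters.
X = np.concatenate([rng.normal(c, 0.3, (100, 5)) for c in (-3, 0, 3)])
centroids, assign = dp_means(X, lam=4.0)
print(len(centroids), np.bincount(assign))   # typically recovers the three clusters
```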

Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing with Non-Learnable Primitives

  • paper_url: http://arxiv.org/abs/2308.02066
  • repo_url: https://github.com/zhichao-lu/etr-nlp-mtl
  • paper_authors: Chuntao Ding, Zhichao Lu, Shangguang Wang, Ran Cheng, Vishnu Naresh Boddeti
  • for: This paper proposes a method to mitigate task interference in multi-task learning (MTL) models by combining non-learnable primitives (NLPs) and explicit task routing (ETR).
  • methods: The proposed ETR-NLP model employs non-learnable primitives to extract task-agnostic features and recombine them into a shared branch common to all tasks and explicit task-specific branches reserved for each task.
  • results: The proposed ETR-NLP model significantly outperforms state-of-the-art baselines with fewer learnable parameters and similar FLOPs across all datasets.
    Abstract Multi-task learning (MTL) seeks to learn a single model to accomplish multiple tasks by leveraging shared information among the tasks. Existing MTL models, however, have been known to suffer from negative interference among tasks. Efforts to mitigate task interference have focused on either loss/gradient balancing or implicit parameter partitioning with partial overlaps among the tasks. In this paper, we propose ETR-NLP to mitigate task interference through a synergistic combination of non-learnable primitives (NLPs) and explicit task routing (ETR). Our key idea is to employ non-learnable primitives to extract a diverse set of task-agnostic features and recombine them into a shared branch common to all tasks and explicit task-specific branches reserved for each task. The non-learnable primitives and the explicit decoupling of learnable parameters into shared and task-specific ones afford the flexibility needed for minimizing task interference. We evaluate the efficacy of ETR-NLP networks for both image-level classification and pixel-level dense prediction MTL problems. Experimental results indicate that ETR-NLP significantly outperforms state-of-the-art baselines with fewer learnable parameters and similar FLOPs across all datasets. Code is available at this \href{https://github.com/zhichao-lu/etr-nlp-mtl}.
    摘要 多任务学习(MTL)旨在利用任务间的共享信息,用单一模型完成多个任务;然而,现有 MTL 模型常常受到任务间负向干扰的影响。已有的缓解手段集中于损失/梯度平衡,或在任务间部分重叠的隐式参数划分。本文提出 ETR-NLP,通过不可学习基元(NLP)与显式任务路由(ETR)的协同组合来缓解任务干扰。其核心思想是:利用不可学习基元提取一组多样的、与任务无关的特征,并将其重新组合到一条所有任务共享的分支和为每个任务保留的显式任务专属分支中。不可学习基元以及将可学习参数显式划分为共享与任务专属两部分,为最小化任务干扰提供了所需的灵活性。我们在图像级分类和像素级稠密预测两类 MTL 问题上评估了 ETR-NLP 网络的有效性。实验结果表明,在所有数据集上,ETR-NLP 都以更少的可学习参数和相近的 FLOPs 显著优于最先进的基线。代码见 https://github.com/zhichao-lu/etr-nlp-mtl。
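
The architectural idea can be illustrated with a toy version: a bank of frozen (non-learnable) random convolution filters extracts task-agnostic features, which are then routed through one learnable branch shared by all tasks plus one small learnable branch reserved for each task; only the recombination is trained. A PyTorch sketch under those assumptions follows; the primitive choice, layer sizes, and routing here are simplified stand-ins, not ETR-NLP's actual design.

```python
import torch
import torch.nn as nn

class ToyETRNLP(nn.Module):
    def __init__(self, n_tasks=2, n_classes=10):
        super().__init__()
        # Non-learnable primitives: a frozen bank of random conv filters.
        self.primitives = nn.Conv2d(3, 32, kernel_size=3, padding=1, bias=False)
        self.primitives.weight.requires_grad_(False)
        # Explicit task routing: one shared branch + one branch per task.
        self.shared = nn.Sequential(nn.Conv2d(32, 32, 1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1))
        self.task_branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(32, 32, 1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
            for _ in range(n_tasks))
        self.heads = nn.ModuleList(nn.Linear(64, n_classes) for _ in range(n_tasks))

    def forward(self, x, task_id: int):
        feats = torch.relu(self.primitives(x))                    # task-agnostic features
        shared = self.shared(feats).flatten(1)                    # used by every task
        specific = self.task_branches[task_id](feats).flatten(1)  # reserved for this task
        return self.heads[task_id](torch.cat([shared, specific], dim=1))

model = ToyETRNLP()
x = torch.randn(4, 3, 32, 32)
print(model(x, task_id=0).shape, model(x, task_id=1).shape)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("trainable parameters:", trainable)   # frozen primitives are excluded
```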

On the Biometric Capacity of Generative Face Models

  • paper_url: http://arxiv.org/abs/2308.02065
  • repo_url: https://github.com/human-analysis/capacity-generative-face-models
  • paper_authors: Vishnu Naresh Boddeti, Gautam Sreekumar, Arun Ross
  • for: 这篇论文的目的是估计生成式人脸模型的生物特征容量上限,以便评估和比较不同生成式人脸模型的可扩展性。
  • methods: 该论文提出一种统计方法,在超球面特征空间中估计生成人脸图像可区分身份的数量,从而得到生成式人脸模型的生物特征容量上限。
  • results: 结果显示,在 ArcFace 表征、0.1% 的 False Acceptance Rate(FAR)下,StyleGAN3 和 DCFace 的容量上限分别为 $1.43\times10^6$ 和 $1.190\times10^4$;对 StyleGAN3 而言,在 FAR 为 1% 和 10% 时,容量估计分别为 $1.796\times10^4$ 和 $562$。此外,部分生成式人脸模型的容量在年龄维度上存在明显差异,但在性别维度上没有明显差异。
    Abstract There has been tremendous progress in generating realistic faces with high fidelity over the past few years. Despite this progress, a crucial question remains unanswered: "Given a generative face model, how many unique identities can it generate?" In other words, what is the biometric capacity of the generative face model? A scientific basis for answering this question will benefit evaluating and comparing different generative face models and establish an upper bound on their scalability. This paper proposes a statistical approach to estimate the biometric capacity of generated face images in a hyperspherical feature space. We employ our approach on multiple generative models, including unconditional generators like StyleGAN, Latent Diffusion Model, and "Generated Photos," as well as DCFace, a class-conditional generator. We also estimate capacity w.r.t. demographic attributes such as gender and age. Our capacity estimates indicate that (a) under ArcFace representation at a false acceptance rate (FAR) of 0.1%, StyleGAN3 and DCFace have a capacity upper bound of $1.43\times10^6$ and $1.190\times10^4$, respectively; (b) the capacity reduces drastically as we lower the desired FAR with an estimate of $1.796\times10^4$ and $562$ at FAR of 1% and 10%, respectively, for StyleGAN3; (c) there is no discernible disparity in the capacity w.r.t gender; and (d) for some generative models, there is an appreciable disparity in the capacity w.r.t age. Code is available at https://github.com/human-analysis/capacity-generative-face-models.
    摘要 “过去几年来,生成高效精确的脸部模型有了很大的进步。然而,一个重要的问题仍然没有答案:“将生成的脸部模型如何生成多少个独特的身份?”这个问题的科学基础可以帮助评估和比较不同的生成脸部模型,并且可以定义生成脸部模型的扩展性上限。本文提出了一个统计方法来估算生成脸部模型中的生物特征容量。我们使用这个方法评估多个生成模型,包括StyleGAN、Latent Diffusion Model和“Generated Photos”等,以及DCFace,一个基于类别的生成模型。我们还估算了基于人口特征如年龄和性别的容量。我们的容量估算表明:(a)在ArcFace表现下,StyleGAN3和DCFace的容量上限为1.43×10^6和1.190×10^4,分别;(b)随着欲求False Acceptance Rate(FAR)下降,StyleGAN3的容量几乎急遽减少, estimate为1.796×10^4和562,分别;(c)与性别无显著差异;(d)某些生成模型对年龄有明显差异。相关代码可以在https://github.com/human-analysis/capacity-generative-face-models 取得。”
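
As a rough intuition for "biometric capacity", the snippet below estimates how many non-overlapping identity regions of a given angular radius fit on a unit hypersphere, using the exact cap-area formula. This is only a back-of-envelope packing bound under a uniformity assumption, not the paper's data-driven estimator; the dimension and threshold are made-up values.

```python
# Back-of-envelope packing bound, not the paper's estimator: if embeddings lived
# uniformly on the unit hypersphere and two faces matched whenever their angle is
# below `theta`, capacity can be bounded by total sphere area / area of one cap.
import numpy as np
from scipy.special import betainc

def cap_fraction(d, theta):
    """Fraction of the unit (d-1)-sphere covered by a cap of half-angle theta <= pi/2."""
    return 0.5 * betainc((d - 1) / 2.0, 0.5, np.sin(theta) ** 2)

def packing_capacity(d, theta):
    return 1.0 / cap_fraction(d, theta)

d = 16                        # toy feature dimension (real face embeddings are larger)
theta = np.deg2rad(40.0)      # hypothetical match threshold implied by a target FAR
# Note: uniform packing bounds are far looser than the paper's data-driven estimates.
print(f"approximate capacity: {packing_capacity(d, theta):.3e}")
```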

Accurate Neural Network Pruning Requires Rethinking Sparse Optimization

  • paper_url: http://arxiv.org/abs/2308.02060
  • repo_url: None
  • paper_authors: Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh
  • for: 本研究旨在探讨高稀疏度对模型训练的影响,以及如何调整标准随机优化方法来训练稀疏网络。
  • methods: 本研究在标准的计算机视觉和自然语言处理稀疏基准上进行训练实验,并提出若干新方法来缓解稀疏训练中的欠训练问题。
  • results: 结果显示,直接沿用稠密训练的标准配方进行稀疏训练并不理想,会导致模型欠训练;作者提出的新方法使高稀疏度下的视觉模型预训练(如 ResNet50/ImageNet)和语言模型微调(如 BERT/GLUE)均达到了最先进的结果。
    Abstract Obtaining versions of deep neural networks that are both highly-accurate and highly-sparse is one of the main challenges in the area of model compression, and several high-performance pruning techniques have been investigated by the community. Yet, much less is known about the interaction between sparsity and the standard stochastic optimization techniques used for training sparse networks, and most existing work uses standard dense schedules and hyperparameters for training sparse networks. In this work, we examine the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks. We begin by showing that using standard dense training recipes for sparse training is suboptimal, and results in under-training. We provide new approaches for mitigating this issue for both sparse pre-training of vision models (e.g. ResNet50/ImageNet) and sparse fine-tuning of language models (e.g. BERT/GLUE), achieving state-of-the-art results in both settings in the high-sparsity regime, and providing detailed analyses for the difficulty of sparse training in both scenarios. Our work sets a new threshold in terms of the accuracies that can be achieved under high sparsity, and should inspire further research into improving sparse model training, to reach higher accuracies under high sparsity, but also to do so efficiently.
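
For readers unfamiliar with the setting, the sketch below shows the basic mechanics of global magnitude pruning (building a sparsity mask and re-applying it so the network stays sparse). It is generic background, not the paper's training recipe, and the sparsity level is arbitrary.

```python
# Generic sketch of global magnitude pruning in PyTorch; the paper's contribution is
# the training recipe used *on top of* such sparsity, not the masking itself.
import torch

@torch.no_grad()
def global_magnitude_masks(model, sparsity=0.9):
    weights = [p for p in model.parameters() if p.dim() > 1]   # prune matrices/kernels only
    scores = torch.cat([w.abs().flatten() for w in weights])
    k = int(sparsity * scores.numel())
    threshold = scores.kthvalue(k).values if k > 0 else torch.tensor(0.0)
    return [(w.abs() > threshold).float() for w in weights]

@torch.no_grad()
def apply_masks(model, masks):
    weights = [p for p in model.parameters() if p.dim() > 1]
    for w, m in zip(weights, masks):
        w.mul_(m)   # call after every optimizer step to keep pruned weights at zero

model = torch.nn.Sequential(torch.nn.Linear(784, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10))
masks = global_magnitude_masks(model, sparsity=0.95)
apply_masks(model, masks)
```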

Incorporating Recklessness to Collaborative Filtering based Recommender Systems

  • paper_url: http://arxiv.org/abs/2308.02058
  • repo_url: https://github.com/knodis-research-group/recklessness-regularization
  • paper_authors: Diego Pérez-López, Fernando Ortega, Ángel González-Prieto, Jorge Dueñas-Lerín
  • for: 提高 Matrix Factorization 基于的个性化推荐系统的风险评估和决策精度。
  • methods: 提出一个新的 recklessness 项,用于控制 Matrix Factorization 系统在判断预测可靠性时愿意承担的风险水平。
  • results: 实验结果表明,recklessness 不仅可以控制风险,还可以提高推荐系统的预测量和质量。
    Abstract Recommender systems that include some reliability measure of their predictions tend to be more conservative in forecasting, due to their constraint to preserve reliability. This leads to a significant drop in the coverage and novelty that these systems can provide. In this paper, we propose the inclusion of a new term in the learning process of matrix factorization-based recommender systems, called recklessness, which enables the control of the risk level desired when making decisions about the reliability of a prediction. Experimental results demonstrate that recklessness not only allows for risk regulation but also improves the quantity and quality of predictions provided by the recommender system.
    摘要 推荐系统通常会因为保持可靠性的约束而变得更加保守,这会导致涵盖率和新颖性的下降。在这篇论文中,我们提议在基于矩阵分解的推荐系统学习过程中添加一个新的项,即"recklessness",以控制判断预测可靠性时所期望的风险水平。实验结果表明,recklessness 不仅可以调节风险,还可以提高推荐系统预测的质量和数量。

Seasonality Based Reranking of E-commerce Autocomplete Using Natural Language Queries

  • paper_url: http://arxiv.org/abs/2308.02055
  • repo_url: None
  • paper_authors: Prateek Verma, Shan Zhong, Xiaoyu Liu, Adithya Rajan
  • for: 提高搜索引擎中的Query Autocomplete(QAC)功能的准确率和商业指标,使用神经网络基于自然语言处理(NLP)算法以吸收季节性信号。
  • methods: 使用神经网络基于NLP算法,将季节性信号纳入QAC排名模型中,实现终端评估。
  • results: 研究表明,吸收季节性信号可以提高QAC relevance和商业指标,提供了一种新的方法来评估和优化QAC模型。
    Abstract Query autocomplete (QAC) also known as typeahead, suggests list of complete queries as user types prefix in the search box. It is one of the key features of modern search engines specially in e-commerce. One of the goals of typeahead is to suggest relevant queries to users which are seasonally important. In this paper we propose a neural network based natural language processing (NLP) algorithm to incorporate seasonality as a signal and present end to end evaluation of the QAC ranking model. Incorporating seasonality into autocomplete ranking model can improve autocomplete relevance and business metric.
    摘要 查询自动完成(QAC)也称为预先提示,在搜索框中提供完整的查询列表,根据用户输入前缀。这是现代搜索引擎的一个关键功能,尤其在电商领域。QAC的一个目标是提供相关的查询,以便用户在不同的季节中搜索相关的内容。在这篇论文中,我们提出一种基于神经网络自然语言处理(NLP)算法的方法,以吸收季节信号并对QAC排名模型进行综合评估。在吸收季节信号的情况下,QAC排名模型的相关性和业务指标都可以得到改善。
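
A minimal sketch of the reranking idea follows: blend each completion's base relevance score with a per-month seasonality signal. The linear blend, the weights, and the toy popularity table are assumptions for illustration, not the paper's neural ranking model.

```python
# Illustrative reranking sketch (not the paper's neural model): mix a base relevance
# score with the query's historical popularity for the current month.
def rerank_with_seasonality(candidates, month, seasonal_popularity, alpha=0.7):
    """candidates: list of (query, base_score); seasonal_popularity[query][month] in [0, 1]."""
    def blended(item):
        query, base_score = item
        season = seasonal_popularity.get(query, {}).get(month, 0.0)
        return alpha * base_score + (1 - alpha) * season
    return sorted(candidates, key=blended, reverse=True)

candidates = [("swimsuit", 0.62), ("sweater", 0.60), ("swim goggles", 0.55)]
seasonal_popularity = {"sweater": {12: 0.9, 7: 0.1}, "swimsuit": {12: 0.1, 7: 0.95}}
print(rerank_with_seasonality(candidates, month=12, seasonal_popularity=seasonal_popularity))
```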

Robust Independence Tests with Finite Sample Guarantees for Synchronous Stochastic Linear Systems

  • paper_url: http://arxiv.org/abs/2308.02054
  • repo_url: None
  • paper_authors: Ambrus Tamás, Dániel Ágoston Bálint, Balázs Csanád Csáji
  • for: 这篇论文旨在为同步的随机线性时不变(SLTI)系统开发鲁棒的独立性检验,用于检测系统输出之间是否存在依赖关系,并给出非渐近的显著性水平保证。
  • methods: 该方法结合置信域估计与置换检验,并利用希尔伯特-施密特独立性准则(HSIC)和距离协方差等一般性依赖度量,来检测 SLTI 系统输出之间的非线性依赖关系。
  • results: 论文给出了与分布无关的第一类错误概率上界,在温和假设下证明了检验的一致性,并以自回归系统为例演示了该方法。
    Abstract The paper introduces robust independence tests with non-asymptotically guaranteed significance levels for stochastic linear time-invariant systems, assuming that the observed outputs are synchronous, which means that the systems are driven by jointly i.i.d. noises. Our method provides bounds for the type I error probabilities that are distribution-free, i.e., the innovations can have arbitrary distributions. The algorithm combines confidence region estimates with permutation tests and general dependence measures, such as the Hilbert-Schmidt independence criterion and the distance covariance, to detect any nonlinear dependence between the observed systems. We also prove the consistency of our hypothesis tests under mild assumptions and demonstrate the ideas through the example of autoregressive systems.
    摘要 文章介绍了一种Robust Independence Test,可以确定温馈线性时间不变系统中的独立性,无需假设抽象分布。我们的方法提供了对type I error probability的分布不受限制的 bounds,可以处理任何输出的异常分布。我们的算法结合了信任区间估计和 permutation tests,以及一些普遍的依赖度量,如希尔伯特-尚德独立性标准和距离协方差,来检测系统中的非线性依赖关系。我们还证明了我们的假设测试在某些轻度的假设下是一致的。我们通过示例描述了抽象系统的应用。
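
The snippet below illustrates one of the ingredients named in the abstract: a permutation test built on the sample distance covariance. It is a generic reference implementation of that dependence measure, not the paper's finite-sample construction for SLTI systems.

```python
# Permutation test with the (biased, V-statistic) sample distance covariance.
import numpy as np

def _centered_dists(z):
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()  # double centering

def distance_covariance(x, y):
    A, B = _centered_dists(x), _centered_dists(y)
    return np.sqrt(max((A * B).mean(), 0.0))

def permutation_pvalue(x, y, num_perm=500, seed=0):
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = distance_covariance(x, y)
    exceed = sum(distance_covariance(x, y[rng.permutation(len(y))]) >= observed
                 for _ in range(num_perm))
    return (exceed + 1) / (num_perm + 1)

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = x ** 2 + 0.1 * rng.normal(size=200)   # nonlinear dependence, zero linear correlation
print("p-value:", permutation_pvalue(x, y))
```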

A Graphical Approach to Document Layout Analysis

  • paper_url: http://arxiv.org/abs/2308.02051
  • repo_url: None
  • paper_authors: Jilin Wang, Michael Krumdick, Baojia Tong, Hamima Halim, Maxim Sokolov, Vadym Barda, Delphine Vendryes, Chris Tanner
  • for: 文章的目的是提出一种基于图神经网络的文档布局分析模型(GLAM),用于解决文档布局分析(DLA)问题。
  • methods: GLAM 直接利用 PDF 文档中的元数据,将每个 PDF 页面表示为一个结构化的图,并将 DLA 问题表述为图的分割与节点分类问题。
  • results: 在两个具有挑战性的 DLA 数据集上,这个仅有 400 万参数的模型在 DocLayNet 的 11 个类别中有 5 个类别上超过了拥有 1.4 亿以上参数的计算机视觉模型;两者的简单集成将 DocLayNet 的 mAP 从 76.8 提升到 80.8,达到新的最先进水平。同时,GLAM 的效率比 SOTA 模型高 5 倍以上,是 DLA 任务中更有利的工程选择。
    Abstract Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure). DLA pipelines enable users to convert documents into structured machine-readable formats that can then be used for many useful downstream tasks. Most existing state-of-the-art (SOTA) DLA models represent documents as images, discarding the rich metadata available in electronically generated PDFs. Directly leveraging this metadata, we represent each PDF page as a structured graph and frame the DLA problem as a graph segmentation and classification problem. We introduce the Graph-based Layout Analysis Model (GLAM), a lightweight graph neural network competitive with SOTA models on two challenging DLA datasets - while being an order of magnitude smaller than existing models. In particular, the 4-million parameter GLAM model outperforms the leading 140M+ parameter computer vision-based model on 5 of the 11 classes on the DocLayNet dataset. A simple ensemble of these two models achieves a new state-of-the-art on DocLayNet, increasing mAP from 76.8 to 80.8. Overall, GLAM is over 5 times more efficient than SOTA models, making GLAM a favorable engineering choice for DLA tasks.

SMARLA: A Safety Monitoring Approach for Deep Reinforcement Learning Agents

  • paper_url: http://arxiv.org/abs/2308.02594
  • repo_url: None
  • paper_authors: Amirhossein Zolfagharian, Manel Abdellatif, Lionel C. Briand, Ramesh S
  • for: 这篇论文旨在解决深度强化学习(DRL)智能体在安全关键系统中的安全监控问题。
  • methods: 本文提出了一种基于机器学习的安全监控方法 SMARLA。SMARLA 是一种黑盒方法(不需要访问智能体的内部),并利用状态抽象来压缩状态空间,从而更容易从智能体的状态中学习安全违规预测模型。
  • results: 在两个常见的强化学习案例研究中,SMARLA 能够以很低的误报率准确预测安全违规,并且能在违规发生之前、大约在智能体执行过程的中段就给出预测。
    Abstract Deep reinforcement learning algorithms (DRL) are increasingly being used in safety-critical systems. Ensuring the safety of DRL agents is a critical concern in such contexts. However, relying solely on testing is not sufficient to ensure safety as it does not offer guarantees. Building safety monitors is one solution to alleviate this challenge. This paper proposes SMARLA, a machine learning-based safety monitoring approach designed for DRL agents. For practical reasons, SMARLA is designed to be black-box (as it does not require access to the internals of the agent) and leverages state abstraction to reduce the state space and thus facilitate the learning of safety violation prediction models from agent's states. We validated SMARLA on two well-known RL case studies. Empirical analysis reveals that SMARLA achieves accurate violation prediction with a low false positive rate, and can predict safety violations at an early stage, approximately halfway through the agent's execution before violations occur.

FuNToM: Functional Modeling of RF Circuits Using a Neural Network Assisted Two-Port Analysis Method

  • paper_url: http://arxiv.org/abs/2308.02050
  • repo_url: None
  • paper_authors: Morteza Fayazi, Morteza Tavakoli Taba, Amirata Tabatabavakili, Ehsan Afshari, Ronald Dreslinski
  • for: 这个论文的目的是提出一种基于人工智能的电路模型化方法,以提高电路设计的效率和准确性。
  • methods: 这个方法基于二端口(two-port)分析,可以对多种电路拓扑进行建模,并使用神经网络来预测电路的行为。
  • results: 相比于现有的方法,这个方法可以将训练数据量降低至 2.8x - 10.9x,并且对于实体电路的模型化需要更少的时间(176.8x - 188.6x)。
    Abstract Automatic synthesis of analog and Radio Frequency (RF) circuits is a trending approach that requires an efficient circuit modeling method. This is due to the expensive cost of running a large number of simulations at each synthesis cycle. Artificial intelligence methods are promising approaches for circuit modeling due to their speed and relative accuracy. However, existing approaches require a large amount of training data, which is still collected using simulation runs. In addition, such approaches collect a whole separate dataset for each circuit topology even if a single element is added or removed. These matters are only exacerbated by the need for post-layout modeling simulations, which take even longer. To alleviate these drawbacks, in this paper, we present FuNToM, a functional modeling method for RF circuits. FuNToM leverages the two-port analysis method for modeling multiple topologies using a single main dataset and multiple small datasets. It also leverages neural networks which have shown promising results in predicting the behavior of circuits. Our results show that for multiple RF circuits, in comparison to the state-of-the-art works, while maintaining the same accuracy, the required training data is reduced by 2.8x - 10.9x. In addition, FuNToM needs 176.8x - 188.6x less time for collecting the training set in post-layout modeling.
    摘要 自动生成分析和无线频率(RF)Circuit是一种升温的方法,需要有效的电路模型方法。这是因为在每个合成周期中运行大量的 simulations 的成本高昂。人工智能方法是可靠的approach для电路模型,因为它们的速度和相对准确性。然而,现有的方法需要大量的训练数据,这些数据仍然通过 simulation runs 收集。此外,这些方法每个电路拓扑都需要一个分立的数据集,即使只是添加或删除一个元素。这些问题被加速器了,因为需要后处理模拟,这些模拟需要更长的时间。为了缓解这些缺点,在这篇论文中,我们提出了 FuNToM,一种功能模型方法 для RF Circuit。FuNToM 利用了两个端口分析方法,用于模型多种拓扑,使用单个主数据集和多个小数据集。它还利用了人工神经网络,这些神经网络在预测电路行为方面表现出色。我们的结果表明,对多个 RF Circuit,与现状的工作相比,保持同样的准确性,需要的训练数据被减少了2.8倍 - 10.9倍。此外,FuNToM 在后处理模拟中收集训练集的时间需要176.8倍 - 188.6倍。
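
For context, the sketch below shows the classical two-port (ABCD-matrix) analysis that the abstract builds on: a topology is modelled by cascading per-element ABCD matrices. The element values are arbitrary; in FuNToM the behaviour of the blocks would come from learned models rather than these ideal formulas.

```python
# Classical two-port ABCD cascade; element values below are arbitrary illustrations.
import numpy as np

def series_impedance(Z):
    return np.array([[1.0, Z], [0.0, 1.0]], dtype=complex)

def shunt_admittance(Y):
    return np.array([[1.0, 0.0], [Y, 1.0]], dtype=complex)

def cascade(*abcd):
    total = np.eye(2, dtype=complex)
    for m in abcd:
        total = total @ m           # overall ABCD matrix of the chained two-ports
    return total

def input_impedance(abcd, Z_load):
    (A, B), (C, D) = abcd
    return (A * Z_load + B) / (C * Z_load + D)

w = 2 * np.pi * 2.4e9               # 2.4 GHz operating point (illustrative)
net = cascade(series_impedance(1j * w * 2e-9),    # 2 nH series inductor
              shunt_admittance(1j * w * 1e-12),   # 1 pF shunt capacitor
              series_impedance(5.0))              # 5 ohm series resistor
print("Zin =", input_impedance(net, Z_load=50.0))
```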

Deep Maxout Network-based Feature Fusion and Political Tangent Search Optimizer enabled Transfer Learning for Thalassemia Detection

  • paper_url: http://arxiv.org/abs/2308.02029
  • repo_url: None
  • paper_authors: Hemn Barzan Abdalla, Awder Ahmed, Guoquan Li, Nasser Mustafa, Abdur Rashid Sangi
  • for: The paper targets detection of thalassemia, a heritable blood disorder, and understanding the frequency of its occurrence and reliable mutations in order to prevent, control, and treat the disease.
  • methods: The paper proposes a Political Tangent Search Optimizer based Transfer Learning (PTSO_TL) method for thalassemia detection, comprising quantile-based data normalization, feature fusion with a Deep Maxout Network, oversampling-based data augmentation, and a convolutional neural network (CNN) with hyperparameters transferred from a trained model such as Xception.
  • results: The PTSO_TL method obtained maximal precision, recall, and f-measure values of about 94.3%, 96.1%, and 95.2%, respectively.
  • 用途: 本研究旨在检测地中海贫血这一遗传性血液疾病,并了解其发生频率和可靠的突变,以便预防、控制和治疗该疾病。
  • 方法: 本研究提出了一种基于政治切线搜索优化器的迁移学习(PTSO_TL)方法,包括数据归一化、特征融合、数据增强,以及采用来自 Xception 等已训练模型超参数的卷积神经网络(CNN)。
  • 结果: PTSO_TL 方法取得了约 94.3% 的精确率、96.1% 的召回率和 95.2% 的 F 值。
    Abstract Thalassemia is a heritable blood disorder which is the outcome of a genetic defect causing lack of production of hemoglobin polypeptide chains. However, there is less understanding of the precise frequency as well as sharing in these areas. Knowing about the frequency of thalassemia occurrence and dependable mutations is thus a significant step in preventing, controlling, and treatment planning. Here, Political Tangent Search Optimizer based Transfer Learning (PTSO_TL) is introduced for thalassemia detection. Initially, input data obtained from a particular dataset is normalized in the data normalization stage. Quantile normalization is utilized in the data normalization stage, and the data are then passed to the feature fusion phase, in which Weighted Euclidean Distance with Deep Maxout Network (DMN) is utilized. Thereafter, data augmentation is performed using the oversampling method to increase data dimensionality. Lastly, thalassemia detection is carried out by TL, wherein a convolutional neural network (CNN) is utilized with hyperparameters from a trained model such as Xception. TL is tuned by PTSO, and the training algorithm PTSO is presented by merging of Political Optimizer (PO) and Tangent Search Algorithm (TSA). Furthermore, PTSO_TL obtained maximal precision, recall, and f-measure values of about 94.3%, 96.1%, and 95.2%, respectively.
    摘要 地中海贫血是一种遗传性血液疾病,其发病原因是基因缺陷导致血红蛋白多肽链无法正常生成。然而,其确切的发病频率以及在相关地区的分布仍了解不足。因此,了解地中海贫血的发病频率和可靠的突变,是预防、控制和治疗规划的重要一步。在这里,我们引入基于政治切线搜索优化器的迁移学习(PTSO_TL)用于地中海贫血检测。首先,对特定数据集中的输入数据进行归一化,归一化阶段采用分位数归一化;随后将数据传递给特征融合阶段,该阶段使用加权欧氏距离与 Deep Maxout Network(DMN)。接着,我们使用过采样方法进行数据增强,以增加数据量。最后,我们使用迁移学习进行地中海贫血检测,其中使用卷积神经网络(CNN),并采用来自已训练模型 Xception 的超参数;迁移学习由 PTSO 调优,PTSO 训练算法由政治优化器(PO)与切线搜索算法(TSA)融合而成。PTSO_TL 的精确率、召回率和 F 值最高分别达到约 94.3%、96.1% 和 95.2%。

Federated Representation Learning for Automatic Speech Recognition

  • paper_url: http://arxiv.org/abs/2308.02013
  • repo_url: None
  • paper_authors: Guruprasad V Ramesh, Gopinath Chennupati, Milind Rao, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo
  • for: 这个研究是用来探讨 Federated Learning (FL) 技术如何保护数据隐私性的,同时学习 robust audio representation。
  • methods: 这篇论文使用了 Self-supervised Learning (SSL) 和 FL 技术,使用 Libri-Light 数据集中的无标签音频数据,通过 simulate non-IID speaker-siloed data distributions 来预训练 LSTM 编码器。
  • results: 研究表明,使用 FL 预训练模型可以达到与中央预训练模型相同的性能水平,并且在新语言 French 中进行适应性提高了20% (WER)。
    Abstract Federated Learning (FL) is a privacy-preserving paradigm, allowing edge devices to learn collaboratively without sharing data. Edge devices like Alexa and Siri are prospective sources of unlabeled audio data that can be tapped to learn robust audio representations. In this work, we bring Self-supervised Learning (SSL) and FL together to learn representations for Automatic Speech Recognition respecting data privacy constraints. We use the speaker and chapter information in the unlabeled speech dataset, Libri-Light, to simulate non-IID speaker-siloed data distributions and pre-train an LSTM encoder with the Contrastive Predictive Coding framework with FedSGD. We show that the pre-trained ASR encoder in FL performs as well as a centrally pre-trained model and produces an improvement of 12-15% (WER) compared to no pre-training. We further adapt the federated pre-trained models to a new language, French, and show a 20% (WER) improvement over no pre-training.
    摘要 federated learning (FL) 是一种隐私保护的 paradigm,allowing edge devices 学习合作而无需共享数据。 edge devices 如 Alexa 和 Siri 是可能的无标签音频数据的来源,可以用来学习Robust audio representations。在这项工作中,我们将Self-supervised Learning (SSL) 和 FL 结合以学习保持数据隐私的 Automatic Speech Recognition 表示。我们使用 Libri-Light 无标签speech 数据集中的speaker和chapter信息来模拟非ID的speaker-siloed 数据分布,并在 FedSGD 框架中预训练 LSTM Encoder。我们发现预训练 ASR Encoder 在 FL 中表现与中央预训练模型相当,并且生成了无预训练比较的 12-15% (WER) 的改进。我们进一步适应了联邦预训练模型到一种新语言,法语,并显示了无预训练比较的 20% (WER) 的改进。
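
The toy example below shows the FedSGD-style aggregation pattern mentioned above: each speaker-silo computes a local gradient on its own data and only the averaged update reaches the server. The linear model and plain averaging are stand-ins for the paper's LSTM encoder and training recipe.

```python
# FedSGD-style round on a toy linear model; raw data never leaves the client silos.
import numpy as np

def local_gradient(w, x, y):
    return 2 * x.T @ (x @ w - y) / len(x)        # gradient of mean squared error

def fedsgd_round(w, client_data, lr=0.05):
    grads = [local_gradient(w, x, y) for x, y in client_data]
    return w - lr * np.mean(grads, axis=0)       # server averages client gradients

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
clients = []
for _ in range(5):                                # 5 non-IID "speaker silos"
    x = rng.normal(loc=0.5 * rng.normal(), size=(50, 2))
    clients.append((x, x @ true_w + 0.01 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(300):
    w = fedsgd_round(w, clients)
print("estimated weights:", w)
```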

Memory capacity of two layer neural networks with smooth activations

  • paper_url: http://arxiv.org/abs/2308.02001
  • repo_url: None
  • paper_authors: Liam Madden, Christos Thrampoulidis
  • for: 该论文探讨了两层神经网络的内存容量问题,即神经网络可以储存的最大通用数据大小。
  • methods: 作者考虑非多项式的实解析激活函数(如 sigmoid 和平滑 ReLU),并通过计算网络雅可比矩阵的秩来分析内存容量。
  • results: 作者证明了 md/2 是这类两层神经网络内存容量的下界,且该下界在相差约 2 倍的因子内是最优的。这些结果比此前的工作更为一般,并有望推广到更深的模型和其他架构。
    Abstract Determining the memory capacity of two-layer neural networks with m hidden neurons and input dimension d (i.e., md+m total trainable parameters), which refers to the largest size of general data the network can memorize, is a fundamental machine-learning question. For non-polynomial real analytic activation functions, such as sigmoids and smoothed rectified linear units (smoothed ReLUs), we establish a lower bound of md/2 and optimality up to a factor of approximately 2. Analogous prior results were limited to Heaviside and ReLU activations, with results for smooth activations suffering from logarithmic factors and requiring random data. To analyze the memory capacity, we examine the rank of the network's Jacobian by computing the rank of matrices involving both Hadamard powers and the Khati-Rao product. Our computation extends classical linear algebraic facts about the rank of Hadamard powers. Overall, our approach differs from previous works on memory capacity and holds promise for extending to deeper models and other architectures.
    摘要 To analyze the memory capacity, we examine the rank of the network's Jacobian by computing the rank of matrices involving both Hadamard powers and the Khati-Rao product. Our approach differs from previous works on memory capacity and holds promise for extending to deeper models and other architectures.Here is the translation in Simplified Chinese:确定两层神经网络的内存容量(即md+m总可训练参数)是机器学习的基本问题。对于非多项实数 activation functions,如sigmoid和smoothed ReLU,我们设下md/2的下界和优化因子约为2。这与之前的结果有所不同,它们仅适用于Heaviside和ReLU激活函数,并且受到了对数因子的影响,需要随机数据。为了分析内存容量,我们研究了神经网络的雅可比矩阵的排名,通过计算包括 Hadamard powers 和 Khati-Rao 乘积的矩阵的排名。我们的方法与之前关于内存容量的工作不同,并且可能扩展到更深的模型和其他架构。
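
The quantity at stake can be probed numerically: memorization of n generic points hinges on the Jacobian of the n network outputs with respect to all md + m parameters having rank n. The check below uses tanh as a stand-in smooth activation and n = md/2; it is a sanity illustration, not a proof.

```python
# Numerical sanity check of the Jacobian-rank quantity behind the md/2 bound.
import torch

torch.manual_seed(0)
d, m = 6, 4                      # input dimension, hidden width
n = (m * d) // 2                 # number of data points at the md/2 level
X = torch.randn(n, d)            # generic inputs

def outputs(params):
    W = params[: m * d].reshape(m, d)
    a = params[m * d:]
    return torch.tanh(X @ W.T) @ a        # smooth activation, one scalar output per point

params = torch.randn(m * d + m)
J = torch.autograd.functional.jacobian(outputs, params)     # shape (n, md + m)
print(f"n = {n}, parameters = {m * d + m}, Jacobian rank = {torch.linalg.matrix_rank(J).item()}")
```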

On the Transition from Neural Representation to Symbolic Knowledge

  • paper_url: http://arxiv.org/abs/2308.02000
  • repo_url: None
  • paper_authors: Junyan Cheng, Peter Chin
  • for: 本研究旨在 bridge 神经网络和符号表示之间的巨大差距,使神经网络能够吸收符号思维的元素。
  • methods: 我们提出了一种启用EM算法学习数据的转换表示框架,以压缩输入数据中的高维信息,并自然地发现数据中隐藏的逻辑结构。
  • results: 我们在3个抽象compositional visual objects dataset上进行了广泛的实验,并证明了学习的表示可以准确地分解视觉输入,并且在下游任务中具有顺滑的适应性。
    Abstract Bridging the huge disparity between neural and symbolic representation can potentially enable the incorporation of symbolic thinking into neural networks from essence. Motivated by how human gradually builds complex symbolic representation from the prototype symbols that are learned through perception and environmental interactions. We propose a Neural-Symbolic Transitional Dictionary Learning (TDL) framework that employs an EM algorithm to learn a transitional representation of data that compresses high-dimension information of visual parts of an input into a set of tensors as neural variables and discover the implicit predicate structure in a self-supervised way. We implement the framework with a diffusion model by regarding the decomposition of input as a cooperative game, then learn predicates by prototype clustering. We additionally use RL enabled by the Markovian of diffusion models to further tune the learned prototypes by incorporating subjective factors. Extensive experiments on 3 abstract compositional visual objects datasets that require the model to segment parts without any visual features like texture, color, or shadows apart from shape and 3 neural/symbolic downstream tasks demonstrate the learned representation enables interpretable decomposition of visual input and smooth adaption to downstream tasks which are not available by existing methods.

Explainable unsupervised multi-modal image registration using deep networks

  • paper_url: http://arxiv.org/abs/2308.01994
  • repo_url: None
  • paper_authors: Chengjia Wang, Giorgos Papanastasiou
  • for: 这篇论文的目的是提出一种基于深度学习的多模态 MRI 图像配准方法,以便用于临床决策。
  • methods: 论文构建了一个基于深度学习的图像配准流程,涵盖模态内与模态间的 MRI 配准,并在每个主要模块中加入基于 Grad-CAM 的可解释性框架。
  • results: 该方法在不同模态、时间点和切片之间的图像配准上取得了优于现有标准方法(Syn)的表现;同时,基于 Grad-CAM 的可解释性框架使模型与数据之间的行为可以被解释。
    Abstract Clinical decision making from magnetic resonance imaging (MRI) combines complementary information from multiple MRI sequences (defined as 'modalities'). MRI image registration aims to geometrically 'pair' diagnoses from different modalities, time points and slices. Both intra- and inter-modality MRI registration are essential components in clinical MRI settings. Further, an MRI image processing pipeline that can address both afine and non-rigid registration is critical, as both types of deformations may be occuring in real MRI data scenarios. Unlike image classification, explainability is not commonly addressed in image registration deep learning (DL) methods, as it is challenging to interpet model-data behaviours against transformation fields. To properly address this, we incorporate Grad-CAM-based explainability frameworks in each major component of our unsupervised multi-modal and multi-organ image registration DL methodology. We previously demonstrated that we were able to reach superior performance (against the current standard Syn method). In this work, we show that our DL model becomes fully explainable, setting the framework to generalise our approach on further medical imaging data.
    摘要 临床决策从磁共振成像(MRI)结合多种MRI序列(定义为“modalities”)的信息。MRI图像对接目标是将不同modalities、时间点和 slice中的诊断进行几何对应。在临床MRI setting中, both intra-和inter-modalities MRI对接是关键组件。此外,一个能够处理both afine和非RIGID对接的MRI图像处理管道是重要的,因为这两种类型的变形都可能出现在实际MRI数据场景中。与图像分类不同,在图像对接深度学习(DL)方法中,解释性不常被考虑,因为很难从变换场景中解释模型与数据之间的行为。为了正确地解决这个问题,我们在每个主要组件中都包含了基于Grad-CAM的解释框架。在我们的无监督多modal和多器官图像对接DL方法中,我们之前已经达到了与当前标准Syn方法相比的更高性能。在这个工作中,我们展示了我们的DL模型已经变得可解释,设置了框架来普遍化我们的方法在更多的医疗影像数据上。

CartiMorph: a framework for automated knee articular cartilage morphometrics

  • paper_url: http://arxiv.org/abs/2308.01981
  • repo_url: https://github.com/yongchengyao/cartimorph
  • paper_authors: Yongcheng Yao, Junru Zhong, Liping Zhang, Sheheryar Khan, Weitian Chen
  • for: 这个论文是为了提出一种自动计算膝关节软骨变形的框架,以便促进膝关节疾病的成像生物标志物的发现。
  • methods: 这个论文使用了深度学习模型来实现层次图像特征表示,并进行了图像分割、模板建立和图像到模板匹配等步骤。
  • results: 该论文对膝关节图像进行自动分割、模板构建和配准,实现了对软骨的量化测量,包括全层软骨缺损比例(FCL)、软骨厚度、表面积和体积;其软骨厚度图在薄区和外周区域的误差较小。
    Abstract We introduce CartiMorph, a framework for automated knee articular cartilage morphometrics. It takes an image as input and generates quantitative metrics for cartilage subregions, including the percentage of full-thickness cartilage loss (FCL), mean thickness, surface area, and volume. CartiMorph leverages the power of deep learning models for hierarchical image feature representation. Deep learning models were trained and validated for tissue segmentation, template construction, and template-to-image registration. We established methods for surface-normal-based cartilage thickness mapping, FCL estimation, and rule-based cartilage parcellation. Our cartilage thickness map showed less error in thin and peripheral regions. We evaluated the effectiveness of the adopted segmentation model by comparing the quantitative metrics obtained from model segmentation and those from manual segmentation. The root-mean-squared deviation of the FCL measurements was less than 8%, and strong correlations were observed for the mean thickness (Pearson's correlation coefficient $\rho \in [0.82,0.97]$), surface area ($\rho \in [0.82,0.98]$) and volume ($\rho \in [0.89,0.98]$) measurements. We compared our FCL measurements with those from a previous study and found that our measurements deviated less from the ground truths. We observed superior performance of the proposed rule-based cartilage parcellation method compared with the atlas-based approach. CartiMorph has the potential to promote imaging biomarkers discovery for knee osteoarthritis.
    摘要 我们介绍CartiMorph,一个框架 для自动计算膝关节软骨cartilage的形态特征。它可以从图像中提取量化特征,包括软骨损伤率(FCL)、软骨厚度、表面积和体积。CartiMorph利用深度学习模型来表示图像特征。我们训练了和验证了深度学习模型,用于识别、模板生成和模板与图像对齐。我们开发了基于表面法则的软骨厚度映射、FCL估计和软骨分割方法。我们评估了采用的分割模型的效果,并发现其与人工分割的量化特征具有强相关性(Pearson correlation coefficient $\rho \in [0.82,0.97]$)。我们对我们的FCL测量与之前的研究中的参照值进行比较,发现我们的测量与参照值之间存在较小的差异。我们发现我们的软骨分割方法比使用 Atlases-based方法表现出更高的性能。CartiMorph具有推动膝关节风湿病影像生物标志的潜力。

Unmasking Parkinson’s Disease with Smile: An AI-enabled Screening Framework

  • paper_url: http://arxiv.org/abs/2308.02588
  • repo_url: None
  • paper_authors: Tariq Adnan, Md Saiful Islam, Wasifur Rahman, Sangwu Lee, Sutapa Dey Tithi, Kazi Noshin, Imran Sarker, M Saifur Rahman, Ehsan Hoque
  • for: 这项研究旨在开发一种基于微表情的parkinson病诊断方法,以提高诊断的可靠性和效率。
  • methods: 研究人员通过利用人脸特征和动作单元,从多个数据源中收集了3871个视频,包括256名自reported pd患者。然后,他们使用一个 ensemble 模型来提取有关 hypomimia 的特征,并实现了89.7%的准确率和89.3%的接收操作特征曲线(AUROC)。
  • results: 研究人员发现,基于笑脸视频的特征alone可以实现相当的性能,即使在两个外部测试集上,这些模型在训练过程中从来没有看到的数据上也能够达到类似的性能,这表明pd风险评估可能可以通过笑脸自拍视频进行。
    Abstract Parkinson's disease (PD) diagnosis remains challenging due to lacking a reliable biomarker and limited access to clinical care. In this study, we present an analysis of the largest video dataset containing micro-expressions to screen for PD. We collected 3,871 videos from 1,059 unique participants, including 256 self-reported PD patients. The recordings are from diverse sources encompassing participants' homes across multiple countries, a clinic, and a PD care facility in the US. Leveraging facial landmarks and action units, we extracted features relevant to Hypomimia, a prominent symptom of PD characterized by reduced facial expressions. An ensemble of AI models trained on these features achieved an accuracy of 89.7% and an Area Under the Receiver Operating Characteristic (AUROC) of 89.3% while being free from detectable bias across population subgroups based on sex and ethnicity on held-out data. Further analysis reveals that features from the smiling videos alone lead to comparable performance, even on two external test sets the model has never seen during training, suggesting the potential for PD risk assessment from smiling selfie videos.

Domain specificity and data efficiency in typo tolerant spell checkers: the case of search in online marketplaces

  • paper_url: http://arxiv.org/abs/2308.01976
  • repo_url: None
  • paper_authors: Dayananda Ubrangala, Juhi Sharma, Ravi Prasad Kondapalli, Kiran R, Amit Agarwala, Laurent Boué
  • for: correction of typographical errors in online marketplaces
  • methods: data augmentation, training of recurrent neural network for context-limited domain-specific embeddings
  • results: real-time inferencing API for finding the closest match between misspelled user queries and available product names, with high accuracy using controlled and high-quality synthetic data.
    Abstract Typographical errors are a major source of frustration for visitors of online marketplaces. Because of the domain-specific nature of these marketplaces and the very short queries users tend to search for, traditional spell cheking solutions do not perform well in correcting typos. We present a data augmentation method to address the lack of annotated typo data and train a recurrent neural network to learn context-limited domain-specific embeddings. Those embeddings are deployed in a real-time inferencing API for the Microsoft AppSource marketplace to find the closest match between a misspelled user query and the available product names. Our data efficient solution shows that controlled high quality synthetic data may be a powerful tool especially considering the current climate of large language models which rely on prohibitively huge and often uncontrolled datasets.
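
The augmentation side of this approach can be illustrated with a few lines: synthetic misspellings are generated from clean product names and paired with them as training data for the embedding model. The edit operations and example names below are generic assumptions, not the production augmentation policy.

```python
# Toy synthetic-typo generator for (misspelled, clean) training pairs.
import random

def make_typo(word, rng=random.Random(0)):
    if len(word) < 3:
        return word
    i = rng.randrange(len(word) - 1)
    op = rng.choice(["swap", "drop", "repeat", "replace"])
    if op == "swap":
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "drop":
        return word[:i] + word[i + 1:]
    if op == "repeat":
        return word[:i] + word[i] + word[i:]
    return word[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + word[i + 1:]

product_names = ["power bi connector", "dynamics 365 sales", "azure backup agent"]
training_pairs = [(make_typo(name), name) for name in product_names for _ in range(3)]
for noisy, clean in training_pairs[:5]:
    print(f"{noisy!r} -> {clean!r}")
```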

Synthesising Rare Cataract Surgery Samples with Guided Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.02587
  • repo_url: https://github.com/meclabtuda/catasynth
  • paper_authors: Yannik Frisch, Moritz Fuchs, Antoine Sanner, Felix Anton Ucar, Marius Frenzel, Joana Wasielica-Poslednik, Adrian Gericke, Felix Mathias Wagner, Thomas Dratsch, Anirban Mukhopadhyay
  • for: This paper aims to address the challenges of gathering and annotating data for training automated assistance systems for cataract surgery, by analyzing cataract surgery video data and utilizing a conditional generative model to synthesize diverse, high-quality examples of surgical phases and tool use.
  • methods: The authors use a conditional generative model based on Denoising Diffusion Implicit Models (DDIM) and Classifier-Free Guidance (CFG) to synthesize realistic examples of surgical phases and tool use, addressing the imbalances and data sparsity issues in the publicly available data.
  • results: The authors demonstrate that their approach can generate valuable unseen examples, allowing the tool classifier to improve by up to 10% for rare cases, and provide a reliable source of realistic synthetic data for the development of automated assistance systems for cataract surgery.
    Abstract Cataract surgery is a frequently performed procedure that demands automation and advanced assistance systems. However, gathering and annotating data for training such systems is resource intensive. The publicly available data also comprises severe imbalances inherent to the surgical process. Motivated by this, we analyse cataract surgery video data for the worst-performing phases of a pre-trained downstream tool classifier. The analysis demonstrates that imbalances deteriorate the classifier's performance on underrepresented cases. To address this challenge, we utilise a conditional generative model based on Denoising Diffusion Implicit Models (DDIM) and Classifier-Free Guidance (CFG). Our model can synthesise diverse, high-quality examples based on complex multi-class multi-label conditions, such as surgical phases and combinations of surgical tools. We affirm that the synthesised samples display tools that the classifier recognises. These samples are hard to differentiate from real images, even for clinical experts with more than five years of experience. Further, our synthetically extended data can improve the data sparsity problem for the downstream task of tool classification. The evaluations demonstrate that the model can generate valuable unseen examples, allowing the tool classifier to improve by up to 10% for rare cases. Overall, our approach can facilitate the development of automated assistance systems for cataract surgery by providing a reliable source of realistic synthetic data, which we make available for everyone.
    摘要 白内障手术是一种非常常见的手术,需要自动化和先进的辅助系统。然而,为训练此类系统收集和标注数据需要大量资源,公开可用的数据也带有手术流程本身固有的严重类别不平衡。受此启发,我们针对一个预训练的下游器械分类器表现最差的手术阶段,对白内障手术视频数据进行了分析。分析表明,类别不平衡会使分类器在代表性不足的情形下性能下降。为了解决这一挑战,我们使用基于去噪扩散隐式模型(DDIM)和无分类器引导(CFG)的条件生成模型。该模型可以根据复杂的多类别、多标签条件(如手术阶段和手术器械组合)生成多样、高质量的样本。我们证实生成样本中的器械能够被分类器识别,并且即使是拥有五年以上经验的临床专家也难以将这些样本与真实图像区分开来。此外,合成扩充的数据能够缓解下游器械分类任务中的数据稀疏问题:评估表明,该模型可以生成有价值的未见样本,使器械分类器在罕见类别上的性能提升最多 10%。总体而言,我们的方法通过提供可靠且真实的合成数据来源,促进白内障手术自动辅助系统的开发,并将这些数据向所有人公开。
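
The two sampling ingredients named in the abstract can be written down compactly: classifier-free guidance mixes conditional and unconditional noise predictions, and a deterministic DDIM step converts the guided prediction into the next latent. The toy denoiser and schedule values below are placeholders, not the authors' trained model.

```python
# CFG mixing plus one deterministic DDIM (eta = 0) update, against a toy denoiser.
import torch

def cfg_noise(model, x_t, t, cond, guidance_scale):
    eps_uncond = model(x_t, t, cond=None)
    eps_cond = model(x_t, t, cond=cond)            # e.g. surgical phase + tool condition
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    x0_pred = (x_t - torch.sqrt(1 - alpha_bar_t) * eps) / torch.sqrt(alpha_bar_t)
    return torch.sqrt(alpha_bar_prev) * x0_pred + torch.sqrt(1 - alpha_bar_prev) * eps

def toy_denoiser(x_t, t, cond=None):               # stand-in just to make this runnable
    return 0.1 * x_t if cond is None else 0.1 * x_t + 0.01 * cond

x = torch.randn(1, 3, 8, 8)
cond = torch.ones(1, 3, 8, 8)                      # placeholder conditioning tensor
eps = cfg_noise(toy_denoiser, x, t=10, cond=cond, guidance_scale=3.0)
x_prev = ddim_step(x, eps, alpha_bar_t=torch.tensor(0.5), alpha_bar_prev=torch.tensor(0.6))
print(x_prev.shape)
```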

Aligning Agent Policy with Externalities: Reward Design via Bilevel RL

  • paper_url: http://arxiv.org/abs/2308.02585
  • repo_url: None
  • paper_authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Furong Huang, Mengdi Wang
  • for: 本文旨在解决强化学习(RL)策略优化过程中忽略状态空间覆盖、安全性等重要外部因素的问题;这种忽略可能导致不期望的涌现行为和与整体目标不一致的策略。
  • methods: 本文将该问题表述为一个双层优化问题,并将其与主体-代理人(principal-agent)框架联系起来:主体在上层给出系统的更广泛目标和约束,代理人在下层求解马尔可夫决策过程(MDP)。
  • results: 本文提出了主体驱动策略对准(PPA-BRL),它可以有效地将代理人的策略与主体的目标相吻合。 authors还证明了PPA-BRL的收敛性,并采用了多个示例来说明该框架的优势。
    Abstract In reinforcement learning (RL), a reward function is often assumed at the outset of a policy optimization procedure. Learning in such a fixed reward paradigm in RL can neglect important policy optimization considerations, such as state space coverage and safety. Moreover, it can fail to encompass broader impacts in terms of social welfare, sustainability, or market stability, potentially leading to undesirable emergent behavior and potentially misaligned policy. To mathematically encapsulate the problem of aligning RL policy optimization with such externalities, we consider a bilevel optimization problem and connect it to a principal-agent framework, where the principal specifies the broader goals and constraints of the system at the upper level and the agent solves a Markov Decision Process (MDP) at the lower level. The upper-level deals with learning a suitable reward parametrization corresponding to the broader goals and the lower-level deals with learning the policy for the agent. We propose Principal driven Policy Alignment via Bilevel RL (PPA-BRL), which efficiently aligns the policy of the agent with the principal's goals. We explicitly analyzed the dependence of the principal's trajectory on the lower-level policy, prove the convergence of PPA-BRL to the stationary point of the problem. We illuminate the merits of this framework in view of alignment with several examples spanning energy-efficient manipulation tasks, social welfare-based tax design, and cost-effective robotic navigation.
    摘要 在增强学习(RL)中,常常假设一个奖金函数在政策优化过程的开始。这种固定奖金的假设可能忽略了重要的政策优化考虑因素,如状态空间覆盖率和安全性。此外,它可能不包括更广泛的影响,如社会福利、可持续发展和市场稳定性,可能导致不жела的 Emergent 行为和不一致的政策。为了数学地表达RL政策优化和这些外部因素之间的关系,我们考虑了一个双层优化问题,并将其连接到一个主体-代理人模型。在主体级别上,我们学习一个适合主体的奖金参数化,而在代理人级别上,我们学习一个Markov决策过程(MDP)。主体级别处理主体的更广泛目标和约束,而代理人级别处理代理人的政策。我们提出了由主体驱动的政策对齐方法(PPA-BRL),可以有效地将代理人的政策与主体的目标相对应。我们明确分析了主体的轨迹对下一级政策的依赖关系,并证明PPA-BRL的确 converge 到问题的站点点。我们在各种示例中,如能源减少的操作任务、基于社会福利的税制设计和cost-effective的机器人导航等方面,ILLUMINATE 了该框架的优势。

Reasoning in Large Language Models Through Symbolic Math Word Problems

  • paper_url: http://arxiv.org/abs/2308.01906
  • repo_url: None
  • paper_authors: Vedant Gaur, Nikunj Saunshi
  • for: 这篇论文旨在研究大语言模型(LLM)在数学问题中的推理能力。
  • methods: 研究者使用了一个符号版本的 SVAMP 数据集,并发现 GPT-3 模型在符号问题上也有良好的零学习精度。
  • results: 研究者发现,使用自我提示法可以使 LLM 提供一个准确和可证明的推理,并且自我提示还可以提高符号准确率,从而实现一种拓展效果。
    Abstract Large language models (LLMs) have revolutionized NLP by solving downstream tasks with little to no labeled data. Despite their versatile abilities, the larger question of their ability to reason remains ill-understood. This paper addresses reasoning in math word problems (MWPs) by studying symbolic versions of the numeric problems, since a symbolic expression is a "concise explanation" of the numeric answer. We create and use a symbolic version of the SVAMP dataset and find that GPT-3's davinci-002 model also has good zero-shot accuracy on symbolic MWPs. To evaluate the faithfulness of the model's reasoning, we go beyond accuracy and additionally evaluate the alignment between the final answer and the outputted reasoning, which correspond to numeric and symbolic answers respectively for MWPs. We explore a self-prompting approach to encourage the symbolic reasoning to align with the numeric answer, thus equipping the LLM with the ability to provide a concise and verifiable reasoning and making it more interpretable. Surprisingly, self-prompting also improves the symbolic accuracy to be higher than both the numeric and symbolic accuracies, thus providing an ensembling effect. The SVAMP_Sym dataset will be released for future research on symbolic math problems.

Revisiting Deformable Convolution for Depth Completion

  • paper_url: http://arxiv.org/abs/2308.01905
  • repo_url: None
  • paper_authors: Xinglong Sun, Jean Ponce, Yu-Xiong Wang
  • for: 从稀疏深度图生成高质量的稠密深度图(深度补全)。
  • methods: 将可变形卷积(deformable convolution)作为单次(single-pass)精化模块,并系统地研究多种代表性策略;研究发现,可变形卷积应施加在密度相对较高的估计深度图上才能获得更好的效果。
  • results: 在大规模的 KITTI 数据集上,模型在精度和推理速度两方面均达到了最先进水平。
    Abstract Depth completion, which aims to generate high-quality dense depth maps from sparse depth maps, has attracted increasing attention in recent years. Previous work usually employs RGB images as guidance, and introduces iterative spatial propagation to refine estimated coarse depth maps. However, most of the propagation refinement methods require several iterations and suffer from a fixed receptive field, which may contain irrelevant and useless information with very sparse input. In this paper, we address these two challenges simultaneously by revisiting the idea of deformable convolution. We propose an effective architecture that leverages deformable kernel convolution as a single-pass refinement module, and empirically demonstrate its superiority. To better understand the function of deformable convolution and exploit it for depth completion, we further systematically investigate a variety of representative strategies. Our study reveals that, different from prior work, deformable convolution needs to be applied on an estimated depth map with a relatively high density for better performance. We evaluate our model on the large-scale KITTI dataset and achieve state-of-the-art level performance in both accuracy and inference speed. Our code is available at https://github.com/AlexSunNik/ReDC.
    摘要 “深度补充”,即从粗略深度图生成高质量的稠密深度图,在过去几年内吸引了越来越多的关注。先前的工作通常使用RGB图像作为引导,并通过迭代的空间填充来精细化估算的粗略深度图。然而,大多数的填充精细方法需要几个迭代,并且受到固定的见识场的限制,可能包含无关和无用的信息,特别是在非常罕见的输入中。在这篇论文中,我们解决了这两个挑战,并同时提出了一种有效的架构。我们提议使用可变核函数卷积,作为单 passes 精细化模块,并经验证了其优越性。为了更好地理解可变核函数的作用,并在深度补充中利用它,我们进一步系统地研究了一些代表性的策略。我们的研究发现,与之前的工作不同,可变核函数需要在估算的深度图上进行高密度应用,以获得更好的性能。我们在大规模的KITTI dataset上评估了我们的模型,并 achieved state-of-the-art 水平的准确率和执行速度。我们的代码可以在https://github.com/AlexSunNik/ReDC中找到。

How many preprints have actually been printed and why: a case study of computer science preprints on arXiv

  • paper_url: http://arxiv.org/abs/2308.01899
  • repo_url: None
  • paper_authors: Jialiang Lin, Yao Yu, Yu Zhou, Zhiyang Zhou, Xiaodong Shi
  • for: 本研究使用 case study 方法探讨了自2008年至2017年的计算机科学预印在 arXiv 上的发布情况,以衡量这些预印 eventually 被正式出版的可能性。
  • methods: 本研究使用了 semantics-based mapping method,使用 Bidirectional Encoder Representations from Transformers (BERT) 来匹配预印和最终发布的 manuscript。
  • results: 研究发现,66% 的预印本以原标题正式发表,11% 以不同标题并经其他修改后发表。进一步的分析表明,获得发表的预印本通常具有充分的修订、多位作者、详细的摘要和引言、广泛而权威的参考文献以及可用的源代码等特征。
    Abstract Preprints play an increasingly critical role in academic communities. There are many reasons driving researchers to post their manuscripts to preprint servers before formal submission to journals or conferences, but the use of preprints has also sparked considerable controversy, especially surrounding the claim of priority. In this paper, a case study of computer science preprints submitted to arXiv from 2008 to 2017 is conducted to quantify how many preprints have eventually been printed in peer-reviewed venues. Among those published manuscripts, some are published under different titles and without an update to their preprints on arXiv. In the case of these manuscripts, the traditional fuzzy matching method is incapable of mapping the preprint to the final published version. In view of this issue, we introduce a semantics-based mapping method with the employment of Bidirectional Encoder Representations from Transformers (BERT). With this new mapping method and a plurality of data sources, we find that 66% of all sampled preprints are published under unchanged titles and 11% are published under different titles and with other modifications. A further analysis was then performed to investigate why these preprints but not others were accepted for publication. Our comparison reveals that in the field of computer science, published preprints feature adequate revisions, multiple authorship, detailed abstract and introduction, extensive and authoritative references and available source code.
    摘要 《Preprints在学术社区中发挥越来越重要的作用。有很多原因使研究者们将 manuscrips 上载到 preprint 服务器之前,而不是正式提交到期刊或会议,但使用 preprints 也引发了一些争议,特别是在优先权方面。本文通过对 computer science 领域自2008年至2017年的 arXiv 上的 preprints 进行 caso study,以计算这些投稿 eventually 被 peer-reviewed 出版的数量。 Among 发表的投稿中,一些发表于不同的标题和未更新 arXiv 上的投稿。在这些投稿中,传统的杂合匹配方法无法将投稿映射到最终发表的版本。为解决这个问题,我们引入 semantics-based 映射方法,使用 Bidirectional Encoder Representations from Transformers (BERT)。With 这种新的映射方法和多种数据源,我们发现:66% 的投稿被发表于不变的标题下,11% 的投稿被发表于不同的标题和其他修改。进一步的分析表明,在 computer science 领域中,发表的 preprints 具有充分的修改、多个作者、详细的摘要和引言、详细的参考文献和可用的源代码。》

Improving Replay Sample Selection and Storage for Less Forgetting in Continual Learning

  • paper_url: http://arxiv.org/abs/2308.01895
  • repo_url: None
  • paper_authors: Daniel Brignac, Niels Lobo, Abhijit Mahalanobis
  • for: 这篇研究旨在解决深度学习中的不断学习问题,尤其是当学习多个任务时,避免Catastrophic Forgetting这个问题。
  • methods: 这篇研究使用了储存部分经验的方法,并与多种替代方案进行比较,以找出最佳的储存数量和最佳的储存样本。
  • results: 这篇研究获得了一些有用的结果,包括提出了一种新的储存数量选择方法,并进行了详细的分析,以帮助找到最佳的储存数量和储存样本。
    Abstract Continual learning seeks to enable deep learners to train on a series of tasks of unknown length without suffering from the catastrophic forgetting of previous tasks. One effective solution is replay, which involves storing few previous experiences in memory and replaying them when learning the current task. However, there is still room for improvement when it comes to selecting the most informative samples for storage and determining the optimal number of samples to be stored. This study aims to address these issues with a novel comparison of the commonly used reservoir sampling to various alternative population strategies and providing a novel detailed analysis of how to find the optimal number of stored samples.
    摘要 持续学习旨在让深度学习模型在任务数量未知的情况下依次学习一系列任务,而不会对先前任务产生灾难性遗忘。一种有效的解决方案是重放(replay),即将少量过去经验存储在内存中,并在学习当前任务时进行重放。然而,在选择最有信息量的存储样本以及确定最佳存储数量方面仍有提升空间。本研究通过将常用的蓄水池采样与多种替代的样本保留策略进行新的比较,并提供一种细致的分析来寻找最佳的存储样本数量,以解决这些问题。
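
For reference, the reservoir-sampling baseline that the paper compares against can be stated in a few lines: every example seen so far ends up in the fixed-size replay buffer with equal probability, regardless of how many tasks have streamed past.

```python
# Reservoir sampling (Algorithm R) for a fixed-size replay buffer.
import random

class ReservoirBuffer:
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = self.rng.randrange(self.seen)   # uniform over everything seen so far
            if j < self.capacity:
                self.buffer[j] = example

buf = ReservoirBuffer(capacity=100)
for task in range(5):                            # stream of 5 tasks, 1000 examples each
    for i in range(1000):
        buf.add((task, i))
print("kept per task:", [sum(1 for t, _ in buf.buffer if t == task) for task in range(5)])
```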

Exact identification of nonlinear dynamical systems by Trimmed Lasso

  • paper_url: http://arxiv.org/abs/2308.01891
  • repo_url: None
  • paper_authors: Shawn L. Kiser, Mikhail Guskov, Marc Rébillat, Nicolas Ranc
  • for: 本研究旨在从有限且含噪声的数据中精确辨识非线性动力系统,并给出能够精确恢复真实模型的方法。
  • methods: 本研究在 SINDy 框架下引入 Trimmed Lasso(TRIM)作为稀疏回归估计器,使其在有限、含噪数据以及多重共线性条件下仍能实现精确恢复;其稀疏性参数可由凸优化求解器高效求解,计算代价与 STLS 渐近相当。
  • results: 在 Lorenz 63 系统、Bouc-Wen 振子以及描述刀具切削动力学的时滞系统上,论文对比了 STLS、reweighted $\ell_1$ minimization 与 Trimmed Lasso,结果表明 TRIM 在更严重的噪声、有限数据和多重共线性条件下仍可实现精确恢复。
    Abstract Identification of nonlinear dynamical systems has been popularized by sparse identification of the nonlinear dynamics (SINDy) via the sequentially thresholded least squares (STLS) algorithm. Many extensions SINDy have emerged in the literature to deal with experimental data which are finite in length and noisy. Recently, the computationally intensive method of ensembling bootstrapped SINDy models (E-SINDy) was proposed for model identification, handling finite, highly noisy data. While the extensions of SINDy are numerous, their sparsity-promoting estimators occasionally provide sparse approximations of the dynamics as opposed to exact recovery. Furthermore, these estimators suffer under multicollinearity, e.g. the irrepresentable condition for the Lasso. In this paper, we demonstrate that the Trimmed Lasso for robust identification of models (TRIM) can provide exact recovery under more severe noise, finite data, and multicollinearity as opposed to E-SINDy. Additionally, the computational cost of TRIM is asymptotically equal to STLS since the sparsity parameter of the TRIM can be solved efficiently by convex solvers. We compare these methodologies on challenging nonlinear systems, specifically the Lorenz 63 system, the Bouc Wen oscillator from the nonlinear dynamics benchmark of No\"el and Schoukens, 2016, and a time delay system describing tool cutting dynamics. This study emphasizes the comparisons between STLS, reweighted $\ell_1$ minimization, and Trimmed Lasso in identification with respect to problems faced by practitioners: the problem of finite and noisy data, the performance of the sparse regression of when the library grows in dimension (multicollinearity), and automatic methods for choice of regularization parameters.
    摘要 非线性动力系统的标识已经得到了广泛的普及,通过稀疏标识非线性动力学(SINDy)viaSequentially Thresholded Least Squares(STLS)算法。在文献中,许多SINDy的扩展出现了,以处理实验数据的限制和噪声。最近,对Bootstrapped SINDy模型的ensemble(E-SINDy)计算昂贵方法被提出,用于模型标识,面对限制、高噪声数据。然而,SINDy扩展的 sparse 估计器 occasional 提供稀疏approximation of the dynamics 而不是精确的回归。此外,这些估计器会在多icollinearity 下表现不佳,例如lasso 的不可 Representable condition。在这篇文章中,我们展示了Trimmed Lasso 可以在更严重的噪声、限制和多icollinearity下提供精确的回归,而不是E-SINDy。此外,TRIM 的计算成本与 STLS 相同,可以由 convex 算法有效地解决约束参数。我们在非线性系统中进行了对抗样本,包括 Lorenz 63 系统、Bouc Wen 振荡器和时延系统,以及2016年非线性动力学 benchmark 中的 No\"el 和 Schoukens 的测试集。本研究强调了 STLS、重量化 $\ell_1$ 最小化和 Trimmed Lasso 在实际应用中遇到的问题的比较:噪声和限制的数据 finite 问题、约束参数的自动选择问题以及库存在多个参数时的多icollinearity 问题。
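
As background for the comparison above, the sequentially thresholded least squares (STLS) step at the core of SINDy is sketched below on a toy library; the candidate functions and threshold are illustrative, and the Trimmed Lasso estimator studied in the paper would replace this sparse-regression step.

```python
# STLS sparse regression: alternately zero small coefficients and refit on the support.
import numpy as np

def stls(Theta, dXdt, threshold=0.1, iters=10):
    xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dXdt.shape[1]):
            support = ~small[:, k]
            if support.any():
                xi[support, k] = np.linalg.lstsq(Theta[:, support], dXdt[:, k], rcond=None)[0]
    return xi

# Toy example: recover dx/dt = -2x + 3y from noisy data with a [1, x, y, x*y] library.
rng = np.random.default_rng(0)
x, y = rng.normal(size=500), rng.normal(size=500)
Theta = np.column_stack([np.ones_like(x), x, y, x * y])
dXdt = (-2 * x + 3 * y + 0.01 * rng.normal(size=500))[:, None]
print(stls(Theta, dXdt, threshold=0.2).ravel())    # approximately [0, -2, 3, 0]
```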

DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations

  • paper_url: http://arxiv.org/abs/2308.01890
  • repo_url: None
  • paper_authors: Ping Hu, Ximeng Sun, Stan Sclaroff, Kate Saenko
  • for: 多Label图像识别任务中,适用于低标签场景的研究,具有很大的挑战性和实际 significanc。
  • methods: 我们利用了 millions of auxiliary image-text pairs 预训练的强大对应关系,并提出了一种高效的框架,即 Evidence-guided Dual Context Optimization (DualCoOp++),用于解决 partial-label 和 zero-shot multi-label recognition 问题。
  • results: 我们在标准的多Label图像识别benchmark上进行了实验,并证明了我们的方法在低标签场景下的表现superiority,比state-of-the-art方法更高。
    Abstract Multi-label image recognition in the low-label regime is a task of great challenge and practical significance. Previous works have focused on learning the alignment between textual and visual spaces to compensate for limited image labels, yet may suffer from reduced accuracy due to the scarcity of high-quality multi-label annotations. In this research, we leverage the powerful alignment between textual and visual features pretrained with millions of auxiliary image-text pairs. We introduce an efficient and effective framework called Evidence-guided Dual Context Optimization (DualCoOp++), which serves as a unified approach for addressing partial-label and zero-shot multi-label recognition. In DualCoOp++ we separately encode evidential, positive, and negative contexts for target classes as parametric components of the linguistic input (i.e., prompts). The evidential context aims to discover all the related visual content for the target class, and serves as guidance to aggregate positive and negative contexts from the spatial domain of the image, enabling better distinguishment between similar categories. Additionally, we introduce a Winner-Take-All module that promotes inter-class interaction during training, while avoiding the need for extra parameters and costs. As DualCoOp++ imposes minimal additional learnable overhead on the pretrained vision-language framework, it enables rapid adaptation to multi-label recognition tasks with limited annotations and even unseen classes. Experiments on standard multi-label recognition benchmarks across two challenging low-label settings demonstrate the superior performance of our approach compared to state-of-the-art methods.
    摘要 多标签图像识别在低标签场景下是一项具有挑战性和实际意义的任务。先前的工作强调学习图像和文本空间之间的对应关系,以做到因为有限的多标签注释而减少精度。在本研究中,我们利用已经预训练的图像和文本特征之间的强大对应关系,提出一种高效的框架called Evidence-guided Dual Context Optimization (DualCoOp++)。DualCoOp++是一种统一的方法,用于解决 partial-label 和 zero-shot 多标签识别问题。在 DualCoOp++ 中,我们将目标类的 evidential、正例和负例上下文分别编码为文本输入(即提示)的 parametric 组件。evidential 上下文的目的是找到target类相关的所有视觉内容,并作为指导将正例和负例上下文从图像空间的空间域聚合,以提高类别之间的区分。此外,我们还引入了一个 Winner-Take-All 模块,通过在训练时间提高 между类交互,以避免添加额外参数和成本。由于 DualCoOp++ 对已经预训练的视觉语言框架做出最小的额外学习压力,因此它可以快速适应多标签识别任务,即使是有限的注释和未知类。在标准多标签识别benchmark上,我们的方法与状态之前的方法相比,显示出更高的性能。

Cream Skimming the Underground: Identifying Relevant Information Points from Online Forums

  • paper_url: http://arxiv.org/abs/2308.02581
  • repo_url: None
  • paper_authors: Felipe Moreno-Vera, Mateus Nogueira, Cainã Figueiredo, Daniel Sadoc Menasché, Miguel Bicudo, Ashton Woiwood, Enrico Lovat, Anton Kocheturov, Leandro Pfleger de Aguiar
  • for: Proposes a machine learning-based approach for detecting the exploitation of vulnerabilities in the wild.
  • methods: Uses data scraped from multiple underground hacking forums and supervised machine learning models to filter and label the content of threads and posts.
  • results: Random forests achieve accuracy, precision, and recall above 0.99 on the classification task; the study also analyzes the differences between weaponization and exploitation, as well as profits and other aspects of the hacking communities.
    Abstract This paper proposes a machine learning-based approach for detecting the exploitation of vulnerabilities in the wild by monitoring underground hacking forums. The increasing volume of posts discussing exploitation in the wild calls for an automatic approach to process threads and posts that will eventually trigger alarms depending on their content. To illustrate the proposed system, we use the CrimeBB dataset, which contains data scraped from multiple underground forums, and develop a supervised machine learning model that can filter threads citing CVEs and label them as Proof-of-Concept, Weaponization, or Exploitation. Leveraging random forests, we indicate that accuracy, precision and recall above 0.99 are attainable for the classification task. Additionally, we provide insights into the difference in nature between weaponization and exploitation, e.g., interpreting the output of a decision tree, and analyze the profits and other aspects related to the hacking communities. Overall, our work sheds insight into the exploitation of vulnerabilities in the wild and can be used to provide additional ground truth to models such as EPSS and Expected Exploitability.
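A hedged sketch of the kind of supervised pipeline the abstract describes (text features plus a random forest over the three labels); the example posts, feature choices, and hyperparameters are invented for illustration and do not come from CrimeBB.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical forum posts that mention CVEs, with coarse labels mirroring the
# paper's three classes (Proof-of-Concept / Weaponization / Exploitation).
posts = [
    "PoC script for CVE-2021-44228, prints a warning only",
    "full exploit kit bundling CVE-2021-44228 payload delivery",
    "we popped several servers in the wild using CVE-2021-44228",
    "minimal reproduction of CVE-2017-0144 crash, no payload",
]
labels = ["PoC", "Weaponization", "Exploitation", "PoC"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
clf.fit(posts, labels)
print(clf.predict(["working exploit chain for CVE-2021-44228 with shellcode"]))
```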

Statistical Estimation Under Distribution Shift: Wasserstein Perturbations and Minimax Theory

  • paper_url: http://arxiv.org/abs/2308.01853
  • repo_url: https://github.com/patrickrchao/dist_shift_exp
  • paper_authors: Patrick Chao, Edgar Dobriban
  • for: Studies Wasserstein distribution shifts, in which every data point may undergo a slight perturbation.
  • methods: Formulates joint distribution shifts, in which the per-observation perturbations can be coordinated.
  • results: For mean estimation and linear regression, the sample mean and least squares estimators are shown to be optimal; for other problems, nearly optimal estimators and precise finite-sample bounds are provided, together with tools for bounding the minimax risk under distribution shift, such as a smoothing technique for location families and generalizations of classical tools.
    Abstract Distribution shifts are a serious concern in modern statistical learning as they can systematically change the properties of the data away from the truth. We focus on Wasserstein distribution shifts, where every data point may undergo a slight perturbation, as opposed to the Huber contamination model where a fraction of observations are outliers. We formulate and study shifts beyond independent perturbations, exploring Joint Distribution Shifts, where the per-observation perturbations can be coordinated. We analyze several important statistical problems, including location estimation, linear regression, and non-parametric density estimation. Under a squared loss for mean estimation and prediction error in linear regression, we find the exact minimax risk, a least favorable perturbation, and show that the sample mean and least squares estimators are respectively optimal. This holds for both independent and joint shifts, but the least favorable perturbations and minimax risks differ. For other problems, we provide nearly optimal estimators and precise finite-sample bounds. We also introduce several tools for bounding the minimax risk under distribution shift, such as a smoothing technique for location families, and generalizations of classical tools including least favorable sequences of priors, the modulus of continuity, Le Cam's, Fano's, and Assouad's methods.
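For readers who want the objective spelled out, one way to write the minimax risk under a Wasserstein shift of budget epsilon is sketched below; the notation (W for the Wasserstein distance, the loss function, and the target parameter) is our own shorthand and may differ from the paper's.

```latex
% Illustrative notation only: P_0 is the unperturbed distribution, W a Wasserstein
% distance, \epsilon the shift budget, and \ell the loss (squared error for mean
% estimation and prediction error in linear regression).
\[
  \mathcal{R}_n(\epsilon)
  \;=\;
  \inf_{\hat{\theta}}\;
  \sup_{Q \,:\, W(Q,\, P_0) \le \epsilon}\;
  \mathbb{E}_{X_1, \dots, X_n \sim Q}
  \Bigl[\, \ell\bigl(\hat{\theta}(X_1, \dots, X_n),\; \theta(P_0)\bigr) \Bigr]
\]
```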

Curricular Transfer Learning for Sentence Encoded Tasks

  • paper_url: http://arxiv.org/abs/2308.01849
  • repo_url: None
  • paper_authors: Jader Martins Camboim de Sá, Matheus Ferraroni Sanches, Rafael Roque de Souza, Júlio Cesar dos Reis, Leandro Aparecido Villas
  • for: Improving performance on conversational AI tasks by adapting to shifts in distribution between source and target tasks.
  • methods: Proposes a gradual-adaptation (curriculum) sequence of pre-training steps guided by "data hacking" and grammar analysis.
  • results: Achieves a considerable improvement on the MultiWoZ task compared to other known pre-training approaches.
    Abstract Fine-tuning language models in a downstream task is the standard approach for many state-of-the-art methodologies in the field of NLP. However, when the distribution between the source task and target task drifts, \textit{e.g.}, conversational environments, these gains tend to be diminished. This article proposes a sequence of pre-training steps (a curriculum) guided by "data hacking" and grammar analysis that allows further gradual adaptation between pre-training distributions. In our experiments, we acquire a considerable improvement from our method compared to other known pre-training approaches for the MultiWoZ task.
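A toy sketch of the staged ("curricular") adaptation idea: each stage fine-tunes on a corpus whose distribution sits between generic pre-training text and the target dialogue task. The corpus names and the `fine_tune` stub are placeholders, not the paper's pipeline.

```python
def fine_tune(model_state, corpus, epochs):
    """Stand-in for a real fine-tuning routine (e.g. masked- or causal-LM training)."""
    return {**model_state, "history": model_state["history"] + [(corpus, epochs)]}

# Stages ordered from the broadest distribution toward the target task.
curriculum = [
    ("generic_web_text", 1),         # broad pre-training distribution
    ("task_oriented_dialogues", 2),  # intermediate, dialogue-shaped data
    ("multiwoz_train", 3),           # target task (MultiWoZ)
]

model = {"history": []}
for corpus, epochs in curriculum:
    model = fine_tune(model, corpus, epochs)

print(model["history"])  # stages applied in order, smallest-to-largest shift
```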

Probabilistic Deep Supervision Network: A Noise-Resilient Approach for QoS Prediction

  • paper_url: http://arxiv.org/abs/2308.02580
  • repo_url: https://github.com/hotfrom/pds-net
  • paper_authors: Ziliang Wang, Xiaohong Zhang, Sheng Huang, Wei Zhang, Dan Yang, Meng Yan
  • for: Improving the accuracy of Quality of Service (QoS) prediction in recommendation systems and thereby user satisfaction.
  • methods: Proposes the Probabilistic Deep Supervision Network (PDS-Net), which supervises intermediate layers in a Gaussian-based probabilistic space, learns probability spaces for both known features and true labels, and uses a condition-based multitask loss to identify objects with noisy data, yielding more robust QoS prediction.
  • results: Experimental evaluations on two real-world QoS datasets demonstrate the effectiveness of the approach, which outperforms state-of-the-art baselines.
    Abstract Quality of Service (QoS) prediction is an essential task in recommendation systems, where accurately predicting unknown QoS values can improve user satisfaction. However, existing QoS prediction techniques may perform poorly in the presence of noise data, such as fake location information or virtual gateways. In this paper, we propose the Probabilistic Deep Supervision Network (PDS-Net), a novel framework for QoS prediction that addresses this issue. PDS-Net utilizes a Gaussian-based probabilistic space to supervise intermediate layers and learns probability spaces for both known features and true labels. Moreover, PDS-Net employs a condition-based multitasking loss function to identify objects with noise data and applies supervision directly to deep features sampled from the probability space by optimizing the Kullback-Leibler distance between the probability space of these objects and the real-label probability space. Thus, PDS-Net effectively reduces errors resulting from the propagation of corrupted data, leading to more accurate QoS predictions. Experimental evaluations on two real-world QoS datasets demonstrate that the proposed PDS-Net outperforms state-of-the-art baselines, validating the effectiveness of our approach.
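A rough PyTorch-style sketch of the probabilistic deep supervision idea: an intermediate feature is mapped to a Gaussian and pulled toward a Gaussian centred on the true label via a KL term, applied only to samples flagged as likely noisy. Layer sizes, the noise condition, and the label variance are assumptions, not the released PDS-Net code.

```python
import torch
import torch.nn as nn

class ProbSupervisionHead(nn.Module):
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.logvar = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, feats, y_true, label_std=0.1, resid_threshold=0.5):
        mu, logvar = self.mu(feats), self.logvar(feats)
        var = logvar.exp()
        # Closed-form KL( N(mu, var) || N(y_true, label_std^2) ) for Gaussians.
        kl = (torch.log(torch.tensor(label_std)) - 0.5 * logvar
              + (var + (mu - y_true) ** 2) / (2 * label_std ** 2) - 0.5)
        # Condition-based weighting: only supervise samples whose point prediction
        # deviates strongly (used here as a stand-in proxy for noisy inputs).
        noisy = ((mu - y_true).abs() > resid_threshold).float()
        return mu, (noisy * kl).mean()

head = ProbSupervisionHead(feat_dim=16)
feats, y = torch.randn(8, 16), torch.rand(8, 1)
pred, aux_loss = head(feats, y)
print(pred.shape, float(aux_loss))
```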

URET: Universal Robustness Evaluation Toolkit (for Evasion)

  • paper_url: http://arxiv.org/abs/2308.01840
  • repo_url: https://github.com/ibm/uret
  • paper_authors: Kevin Eykholt, Taesung Lee, Douglas Schales, Jiyong Jang, Ian Molloy, Masha Zorin
  • for: Providing a framework that can generate adversarial inputs across a wide range of input types and task domains.
  • methods: Given an input and a set of pre-defined input transformations, the proposed framework discovers a sequence of transformations that produces an adversarial input satisfying semantic and functional constraints.
  • results: The method is validated on several diverse machine learning tasks with various input representations, demonstrating the importance of generating adversarial examples for building safe and robust AI systems.
    Abstract Machine learning models are known to be vulnerable to adversarial evasion attacks as illustrated by image classification models. Thoroughly understanding such attacks is critical in order to ensure the safety and robustness of critical AI tasks. However, most evasion attacks are difficult to deploy against a majority of AI systems because they have focused on image domain with only few constraints. An image is composed of homogeneous, numerical, continuous, and independent features, unlike many other input types to AI systems used in practice. Furthermore, some input types include additional semantic and functional constraints that must be observed to generate realistic adversarial inputs. In this work, we propose a new framework to enable the generation of adversarial inputs irrespective of the input type and task domain. Given an input and a set of pre-defined input transformations, our framework discovers a sequence of transformations that result in a semantically correct and functional adversarial input. We demonstrate the generality of our approach on several diverse machine learning tasks with various input representations. We also show the importance of generating adversarial examples as they enable the deployment of mitigation techniques.
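A toy sketch of the core idea of searching over pre-defined input transformations until the classifier's decision flips; the string-domain example, transformation set, and greedy scoring are illustrative assumptions and not URET's actual interface.

```python
def find_evasive_sequence(x, transforms, predict, target_label, max_depth=3):
    """Greedily pick, at each step, the named transformation whose output reaches
    (or moves toward) the target label; ties resolve to the first candidate."""
    current, applied = x, []
    for _ in range(max_depth):
        if predict(current) == target_label:
            return current, applied
        best = max(transforms, key=lambda nf: predict(nf[1](current)) == target_label)
        current, applied = best[1](current), applied + [best[0]]
    return (current, applied) if predict(current) == target_label else (None, applied)

# Toy string-domain example: a keyword "classifier" and simple text edits.
def toy_predict(s):
    return "malicious" if "exploit" in s.lower().replace("0", "o") else "benign"

transforms = [
    ("insert_space", lambda s: s.replace("exploit", "exp loit")),
    ("leet_substitute", lambda s: s.replace("o", "0")),
]
print(find_evasive_sequence("download exploit kit", transforms, toy_predict, "benign"))
```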