cs.AI - 2023-07-03

MWPRanker: An Expression Similarity Based Math Word Problem Retriever

  • paper_url: http://arxiv.org/abs/2307.01240
  • repo_url: None
  • paper_authors: Mayank Goel, Venktesh V, Vikram Goyal
  • For: The paper aims to help test the mathematical reasoning capabilities of learners in online assessments by retrieving similar math word problems (MWPs) with the same problem model.
  • Methods: The authors propose a hybrid approach that combines natural language processing (NLP) and machine learning techniques to retrieve similar MWPs.
  • Results: The authors demonstrate that their tool is effective in retrieving similar MWPs and outperforms semantic similarity-based approaches, which fail to capture the arithmetic and logical sequence of the MWPs.
    Abstract Math Word Problems (MWPs) in online assessments help test the ability of the learner to make critical inferences by interpreting the linguistic information in them. To test the mathematical reasoning capabilities of the learners, sometimes the problem is rephrased or the thematic setting of the original MWP is changed. Since manual identification of MWPs with similar problem models is cumbersome, we propose a tool in this work for MWP retrieval. We propose a hybrid approach to retrieve similar MWPs with the same problem model. In our work, the problem model refers to the sequence of operations to be performed to arrive at the solution. We demonstrate that our tool is useful for the mentioned tasks and better than semantic similarity-based approaches, which fail to capture the arithmetic and logical sequence of the MWPs. A demo of the tool can be found at https://www.youtube.com/watch?v=gSQWP3chFIs
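A minimal sketch of the underlying idea (not the authors' code): if each MWP is annotated with its problem model, i.e. the sequence of operations leading to the solution, retrieval can rank candidates by operation-sequence similarity rather than surface semantics. The corpus, annotations, and similarity function below are illustrative assumptions.

```python
# Hypothetical sketch: ranking candidate MWPs by the similarity of their
# problem models (operation sequences), rather than by surface semantics.
from difflib import SequenceMatcher

def problem_model_similarity(ops_a, ops_b):
    """Similarity between two operation sequences, e.g. ['mul', 'sub']."""
    return SequenceMatcher(None, ops_a, ops_b).ratio()

def rank_by_problem_model(query_ops, corpus):
    """corpus: list of (mwp_text, operation_sequence) pairs."""
    scored = [(text, problem_model_similarity(query_ops, ops))
              for text, ops in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    corpus = [
        ("Tom has 3 bags of 5 apples and eats 2. How many are left?", ["mul", "sub"]),
        ("A shop sells 4 boxes of 6 pens. How many pens in total?", ["mul"]),
        ("Ann buys 2 packs of 10 stickers and gives away 3.", ["mul", "sub"]),
    ]
    query = ["mul", "sub"]  # problem model of the query MWP
    for text, score in rank_by_problem_model(query, corpus):
        print(f"{score:.2f}  {text}")
```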

Automated identification and quantification of myocardial inflammatory infiltration in digital histological images to diagnose myocarditis

  • paper_url: http://arxiv.org/abs/2307.01098
  • repo_url: None
  • paper_authors: Yanyun Liu, Xiumeng Hua, Shouping Zhu, Congrui Wang, Xiao Chen, Yu Shi, Jiangping Song, Weihua Zhou
  • for: The study aims to develop a new computational pathology approach that automatically identifies and quantifies myocardial inflammatory infiltration in digital HE-stained images, providing a quantitative histological diagnosis of myocarditis.
  • methods: A deep learning (DL)-based computational pathology approach identifies nuclei and detects inflammatory infiltration in HE-stained whole slide images, enabling quantification of the lymphocyte nuclear density (LND).
  • results: The approach accurately identifies and quantifies myocardial inflammatory infiltration and yields a reliable diagnostic cutoff. In five-fold cross-validation, accuracy, sensitivity, and specificity were 0.899 ± 0.035, 0.971 ± 0.017, and 0.728 ± 0.073; on the internal test set they were 0.887, 0.971, and 0.737, and on the external test set 0.853, 0.846, and 0.858.
    Abstract This study aims to develop a new computational pathology approach that automates the identification and quantification of myocardial inflammatory infiltration in digital HE-stained images to provide a quantitative histological diagnosis of myocarditis. 898 HE-stained whole slide images (WSIs) of myocardium from 154 heart transplant patients diagnosed with myocarditis or dilated cardiomyopathy (DCM) were included in this study. An automated DL-based computational pathology approach was developed to identify nuclei and detect myocardial inflammatory infiltration, enabling the quantification of the lymphocyte nuclear density (LND) on myocardial WSIs. A cutoff value based on the quantification of LND was proposed to determine if the myocardial inflammatory infiltration was present. The performance of our approach was evaluated with a five-fold cross-validation experiment, tested with an internal test set from the myocarditis group, and confirmed by an external test from a double-blind trial group. An LND of 1.02/mm2 could distinguish WSIs with myocarditis from those without. The accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) in the five-fold cross-validation experiment were 0.899 ± 0.035, 0.971 ± 0.017, 0.728 ± 0.073, and 0.849 ± 0.044, respectively. For the internal test set, the accuracy, sensitivity, specificity, and AUC were 0.887, 0.971, 0.737, and 0.854, respectively. The accuracy, sensitivity, specificity, and AUC for the external test set reached 0.853, 0.846, 0.858, and 0.852, respectively. Our new approach provides accurate and reliable quantification of the LND of myocardial WSIs, facilitating automated quantitative diagnosis of myocarditis with HE-stained images.
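As a rough illustration of the quantification step (not the authors' implementation), the LND can be computed from the detected lymphocyte nuclei and the analysed tissue area, then compared against the reported 1.02/mm2 cutoff; the per-slide counts below are hypothetical.

```python
# Illustrative sketch (not the authors' code): computing lymphocyte nuclear
# density (LND) from detected lymphocyte nuclei and applying the reported
# 1.02/mm^2 cutoff to flag inflammatory infiltration.
def lymphocyte_nuclear_density(num_lymphocyte_nuclei, tissue_area_mm2):
    """LND = lymphocyte nuclei count per mm^2 of myocardial tissue."""
    return num_lymphocyte_nuclei / tissue_area_mm2

def has_inflammatory_infiltration(lnd, cutoff=1.02):
    """Cutoff value proposed in the paper: 1.02 nuclei per mm^2."""
    return lnd >= cutoff

if __name__ == "__main__":
    # Hypothetical per-slide outputs of the nucleus detection stage.
    nuclei_count, area_mm2 = 180, 150.0
    lnd = lymphocyte_nuclear_density(nuclei_count, area_mm2)
    print(f"LND = {lnd:.2f}/mm^2 -> myocarditis: {has_inflammatory_infiltration(lnd)}")
```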

Some challenges of calibrating differentiable agent-based models

  • paper_url: http://arxiv.org/abs/2307.01085
  • repo_url: None
  • paper_authors: Arnau Quera-Bofarull, Joel Dyer, Anisoara Calinescu, Michael Wooldridge
  • for: Modelling and reasoning about complex systems with agent-based models (ABMs).
  • methods: Construct differentiable ABMs and perform parameter inference and optimisation on them.
  • results: Experiments highlight several remaining challenges, along with potential solutions.
    Abstract Agent-based models (ABMs) are a promising approach to modelling and reasoning about complex systems, yet their application in practice is impeded by their complexity, discrete nature, and the difficulty of performing parameter inference and optimisation tasks. This in turn has sparked interest in the construction of differentiable ABMs as a strategy for combatting these difficulties, yet a number of challenges remain. In this paper, we discuss and present experiments that highlight some of these challenges, along with potential solutions.
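For intuition, the sketch below shows the kind of gradient-based calibration the paper studies, on a toy differentiable "ABM" in which discrete agent choices are relaxed to expected updates; the model, parameters, and target data are illustrative and not taken from the paper.

```python
# Toy sketch of gradient-based calibration of a differentiable "ABM":
# discrete agent choices are relaxed to expected (soft) adoption updates so
# that the simulation is differentiable end-to-end. Parameter values and the
# target trajectory are illustrative, not from the paper.
import torch

def simulate(theta, n_steps=20):
    """Soft contagion model: adoption prob. depends on the current adopted fraction."""
    adopted = torch.tensor(0.05)             # initial adopted fraction
    trajectory = []
    for _ in range(n_steps):
        p_adopt = torch.sigmoid(theta[0] + theta[1] * adopted)
        adopted = adopted + (1 - adopted) * p_adopt   # expected update (relaxation)
        trajectory.append(adopted)
    return torch.stack(trajectory)

# Synthetic "observed" trajectory generated from known parameters.
true_theta = torch.tensor([-2.0, 3.0])
observed = simulate(true_theta).detach()

theta = torch.tensor([0.0, 0.0], requires_grad=True)
optimizer = torch.optim.Adam([theta], lr=0.1)
for step in range(300):
    loss = torch.mean((simulate(theta) - observed) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print("recovered parameters:", theta.detach().numpy())
```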

The ROAD to discovery: machine learning-driven anomaly detection in radio astronomy spectrograms

  • paper_url: http://arxiv.org/abs/2307.01054
  • repo_url: https://github.com/mesarcik/road
  • paper_authors: Michael Mesarcik, Albert-Jan Boonstra, Marco Iacobelli, Elena Ranguelova, Cees de Laat, Rob van Nieuwpoort
  • for: The work proposes a machine learning anomaly detection framework for the LOFAR telescope that classifies commonly occurring anomalies and detects rare, previously unseen ones.
  • methods: A novel self-supervised learning (SSL) paradigm combining context-prediction and reconstruction losses is used to learn the normal behaviour of the LOFAR telescope.
  • results: ROAD runs in real time within the LOFAR data processing pipeline, requiring <1 ms per spectrogram, and achieves an anomaly detection F-2 score of 0.92 with a false positive rate of about 2% and a mean per-class classification F-2 score of 0.89, outperforming related work.
    Abstract As radio telescopes increase in sensitivity and flexibility, so do their complexity and data-rates. For this reason automated system health management approaches are becoming increasingly critical to ensure nominal telescope operations. We propose a new machine learning anomaly detection framework for classifying both commonly occurring anomalies in radio telescopes as well as detecting unknown rare anomalies that the system has potentially not yet seen. To evaluate our method, we present a dataset consisting of 7050 autocorrelation-based spectrograms from the Low Frequency Array (LOFAR) telescope and assign 10 different labels relating to the system-wide anomalies from the perspective of telescope operators. This includes electronic failures, miscalibration, solar storms, network and compute hardware errors among many more. We demonstrate how a novel Self Supervised Learning (SSL) paradigm, that utilises both context prediction and reconstruction losses, is effective in learning normal behaviour of the LOFAR telescope. We present the Radio Observatory Anomaly Detector (ROAD), a framework that combines SSL-based anomaly detection and supervised classification, thereby enabling both classification of commonly occurring anomalies and detection of unseen anomalies. We demonstrate that our system is real-time in the context of the LOFAR data processing pipeline, requiring <1ms to process a single spectrogram. Furthermore, ROAD obtains an anomaly detection F-2 score of 0.92 while maintaining a false positive rate of ~2%, as well as a mean per-class classification F-2 score of 0.89, outperforming other related works.

ENGAGE: Explanation Guided Data Augmentation for Graph Representation Learning

  • paper_url: http://arxiv.org/abs/2307.01053
  • repo_url: https://github.com/sycny/engage
  • paper_authors: Yucheng Shi, Kaixiong Zhou, Ninghao Liu
  • for: The work aims to improve representation learning on graph data by using explanations to guide contrastive data augmentation.
  • methods: An explanation-guided contrastive scheme is used, with two augmentation schemes that perturb structural information and feature information, respectively.
  • results: Experiments show that ENGAGE learns effective representations across various model architectures and real-world graphs, and adapts flexibly to different graph data.
    Abstract The recent contrastive learning methods, due to their effectiveness in representation learning, have been widely applied to modeling graph data. Random perturbation is widely used to build contrastive views for graph data, which however, could accidentally break graph structures and lead to suboptimal performance. In addition, graph data is usually highly abstract, so it is hard to extract intuitive meanings and design more informed augmentation schemes. Effective representations should preserve key characteristics in data and abandon superfluous information. In this paper, we propose ENGAGE (ExplaNation Guided data AuGmEntation), where explanation guides the contrastive augmentation process to preserve the key parts in graphs and explore removing superfluous information. Specifically, we design an efficient unsupervised explanation method called smoothed activation map as the indicator of node importance in representation learning. Then, we design two data augmentation schemes on graphs for perturbing structural and feature information, respectively. We also provide justification for the proposed method in the framework of information theories. Experiments of both graph-level and node-level tasks, on various model architectures and on different real-world graphs, are conducted to demonstrate the effectiveness and flexibility of ENGAGE. The code of ENGAGE can be found: https://github.com/sycny/ENGAGE.
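A simplified sketch of what explanation-guided augmentation could look like, under the assumption that node importance comes from a neighbour-smoothed activation map and that low-importance structure and features are preferentially perturbed; variable names and ratios are illustrative, not the paper's exact formulation.

```python
# Simplified sketch of explanation-guided augmentation: node importance is
# estimated from a neighbour-smoothed activation map, and low-importance
# edges/features are preferentially perturbed. Illustrative only.
import numpy as np

def smoothed_activation_importance(embeddings, adjacency):
    """Mean absolute activation per node, smoothed over one-hop neighbours."""
    raw = np.abs(embeddings).mean(axis=1)
    deg = adjacency.sum(axis=1) + 1.0
    return (raw + adjacency @ raw) / deg

def drop_unimportant_edges(adjacency, importance, drop_ratio=0.2, rng=None):
    rng = rng or np.random.default_rng(0)
    aug = adjacency.copy()
    rows, cols = np.nonzero(np.triu(adjacency))
    edge_scores = importance[rows] + importance[cols]   # low score = superfluous
    n_drop = int(drop_ratio * len(rows))
    for idx in np.argsort(edge_scores)[:n_drop]:
        if rng.random() < 0.9:                           # keep a little randomness
            aug[rows[idx], cols[idx]] = aug[cols[idx], rows[idx]] = 0
    return aug

def mask_unimportant_features(features, importance, mask_ratio=0.2):
    aug = features.copy()
    n_mask = int(mask_ratio * len(importance))
    aug[np.argsort(importance)[:n_mask]] = 0.0           # zero out least important nodes
    return aug
```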

Temporal Graph Benchmark for Machine Learning on Temporal Graphs

  • paper_url: http://arxiv.org/abs/2307.01026
  • repo_url: https://github.com/shenyanghuang/tgb
  • paper_authors: Shenyang Huang, Farimah Poursafaei, Jacob Danovitch, Matthias Fey, Weihua Hu, Emanuele Rossi, Jure Leskovec, Michael Bronstein, Guillaume Rabusseau, Reihaneh Rabbany
  • for: The paper is written for evaluating the performance of machine learning models on temporal graphs.
  • methods: The paper uses a collection of challenging and diverse benchmark datasets for realistic, reproducible, and robust evaluation of machine learning models on temporal graphs.
  • results: The paper finds that the performance of common models can vary drastically across datasets, and simple methods often achieve superior performance compared to existing temporal graph models.
    Abstract We present the Temporal Graph Benchmark (TGB), a collection of challenging and diverse benchmark datasets for realistic, reproducible, and robust evaluation of machine learning models on temporal graphs. TGB datasets are of large scale, spanning years in duration, incorporate both node and edge-level prediction tasks and cover a diverse set of domains including social, trade, transaction, and transportation networks. For both tasks, we design evaluation protocols based on realistic use-cases. We extensively benchmark each dataset and find that the performance of common models can vary drastically across datasets. In addition, on dynamic node property prediction tasks, we show that simple methods often achieve superior performance compared to existing temporal graph models. We believe that these findings open up opportunities for future research on temporal graphs. Finally, TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research, including data loading, experiment setup and performance evaluation. TGB will be maintained and updated on a regular basis and welcomes community feedback. TGB datasets, data loaders, example codes, evaluation setup, and leaderboards are publicly available at https://tgb.complexdatalab.com/ .

RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation

  • paper_url: http://arxiv.org/abs/2307.00997
  • repo_url: https://github.com/lancasterli/refsam
  • paper_authors: Yonglin Li, Jing Zhang, Xiao Teng, Long Lan
  • for: The paper explores how to use the Segment Anything Model (SAM) for referring video object segmentation (RVOS), exploiting multi-view information from different modalities and successive frames at different timestamps to improve performance.
  • methods: The proposed RefSAM model builds on SAM and uses a lightweight Cross-Modal MLP to project the text embedding of the referring expression into sparse and dense embeddings that serve as user-interactive prompts; language and vision features are then aligned and fused with a parameter-efficient tuning strategy.
  • results: Extensive ablation studies and experiments demonstrate the practicality and effectiveness of RefSAM, which achieves state-of-the-art performance on the Ref-Youtu-VOS and Ref-DAVIS17 datasets.
    Abstract The Segment Anything Model (SAM) has gained significant attention for its impressive performance in image segmentation. However, it lacks proficiency in referring video object segmentation (RVOS) due to the need for precise user-interactive prompts and limited understanding of different modalities, such as language and vision. This paper presents the RefSAM model, which for the first time explores the potential of SAM for RVOS by incorporating multi-view information from diverse modalities and successive frames at different timestamps. Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP that projects the text embedding of the referring expression into sparse and dense embeddings, serving as user-interactive prompts. Subsequently, a parameter-efficient tuning strategy is employed to effectively align and fuse the language and vision features. Through comprehensive ablation studies, we demonstrate the practical and effective design choices of our strategy. Extensive experiments conducted on Ref-Youtu-VOS and Ref-DAVIS17 datasets validate the superiority and effectiveness of our RefSAM model over existing methods. The code and models will be made publicly at \href{https://github.com/LancasterLi/RefSAM}{github.com/LancasterLi/RefSAM}.
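The sketch below illustrates the Cross-Modal MLP idea in plain PyTorch: a text embedding of the referring expression is projected into sparse and dense prompt embeddings for a SAM-style mask decoder. All dimensions and the broadcasting of the dense prompt are assumptions, not the released RefSAM code.

```python
# Hedged sketch of the Cross-Modal MLP idea: project a sentence-level text
# embedding into sparse (token-like) and dense prompt embeddings for a
# SAM-style mask decoder. Dimensions are assumptions.
import torch
import torch.nn as nn

class CrossModalMLP(nn.Module):
    def __init__(self, text_dim=512, prompt_dim=256, n_sparse=4, dense_hw=(64, 64)):
        super().__init__()
        self.n_sparse, self.dense_hw, self.prompt_dim = n_sparse, dense_hw, prompt_dim
        self.sparse_proj = nn.Sequential(
            nn.Linear(text_dim, prompt_dim), nn.GELU(),
            nn.Linear(prompt_dim, n_sparse * prompt_dim))
        self.dense_proj = nn.Sequential(
            nn.Linear(text_dim, prompt_dim), nn.GELU(),
            nn.Linear(prompt_dim, prompt_dim))

    def forward(self, text_emb):                 # text_emb: (B, text_dim)
        b = text_emb.shape[0]
        sparse = self.sparse_proj(text_emb).view(b, self.n_sparse, self.prompt_dim)
        # broadcast one dense vector over the prompt feature map
        dense = self.dense_proj(text_emb).view(b, self.prompt_dim, 1, 1)
        dense = dense.expand(b, self.prompt_dim, *self.dense_hw)
        return sparse, dense                     # fed to the mask decoder as prompts

if __name__ == "__main__":
    mlp = CrossModalMLP()
    sparse, dense = mlp(torch.randn(2, 512))
    print(sparse.shape, dense.shape)             # (2, 4, 256) (2, 256, 64, 64)
```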

REAL: A Representative Error-Driven Approach for Active Learning

  • paper_url: http://arxiv.org/abs/2307.00968
  • repo_url: https://github.com/withchencheng/ecml_pkdd_23_real
  • paper_authors: Cheng Chen, Yong Wang, Lizi Liao, Yueguo Chen, Xiaoyong Du
  • for: The study proposes an active learning data selection method to improve the accuracy and efficiency of model training under a limited labeling budget.
  • methods: REAL identifies minority predictions within each cluster of the unlabeled pool as pseudo errors and allocates an adaptive sampling budget to each cluster based on its estimated error density.
  • results: Extensive experiments on five text classification datasets show that REAL consistently outperforms the best-performing baselines in accuracy and F1-macro scores across a wide range of hyperparameter settings; analysis further shows that it selects representative pseudo errors that match the distribution of ground-truth errors along the decision boundary.
    Abstract Given a limited labeling budget, active learning (AL) aims to sample the most informative instances from an unlabeled pool to acquire labels for subsequent model training. To achieve this, AL typically measures the informativeness of unlabeled instances based on uncertainty and diversity. However, it does not consider erroneous instances with their neighborhood error density, which have great potential to improve the model performance. To address this limitation, we propose $REAL$, a novel approach to select data instances with $\underline{R}$epresentative $\underline{E}$rrors for $\underline{A}$ctive $\underline{L}$earning. It identifies minority predictions as \emph{pseudo errors} within a cluster and allocates an adaptive sampling budget for the cluster based on estimated error density. Extensive experiments on five text classification datasets demonstrate that $REAL$ consistently outperforms all best-performing baselines regarding accuracy and F1-macro scores across a wide range of hyperparameter settings. Our analysis also shows that $REAL$ selects the most representative pseudo errors that match the distribution of ground-truth errors along the decision boundary. Our code is publicly available at https://github.com/withchencheng/ECML_PKDD_23_Real.
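A simplified sketch of the selection logic (not the authors' implementation): cluster the unlabeled pool, treat minority predictions within each cluster as pseudo errors, and allocate each cluster a sampling budget proportional to its estimated error density. Cluster count and budget scheme are assumptions.

```python
# Simplified sketch of representative-error selection for active learning.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def select_for_labeling(embeddings, predictions, total_budget=20, n_clusters=5, seed=0):
    """embeddings: (n, d) array; predictions: (n,) array of predicted labels."""
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(embeddings)
    densities, pseudo_error_idx = [], []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        majority = Counter(predictions[idx]).most_common(1)[0][0]
        errs = idx[predictions[idx] != majority]          # minority predictions = pseudo errors
        densities.append(len(errs) / max(len(idx), 1))    # estimated error density
        pseudo_error_idx.append(errs)
    densities = np.array(densities)
    budgets = np.floor(total_budget * densities / max(densities.sum(), 1e-8)).astype(int)
    rng = np.random.default_rng(seed)
    picked = []
    for errs, b in zip(pseudo_error_idx, budgets):
        if len(errs) and b:
            picked.extend(rng.choice(errs, size=min(b, len(errs)), replace=False))
    return picked  # indices of unlabeled instances to query
```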

OpenClinicalAI: An Open and Dynamic Model for Alzheimer’s Disease Diagnosis

  • paper_url: http://arxiv.org/abs/2307.00965
  • repo_url: None
  • paper_authors: Yunyou Huang, Xiaoshuang Liang, Xiangjiang Lu, Xiuxia Miao, Jiyue Xie, Wenjing Liu, Fan Zhang, Guoxin Kang, Li Ma, Suqin Tang, Zhifei Zhang, Jianfeng Zhan
  • for: The work proposes an Alzheimer's disease diagnosis system that can operate in real-world clinical settings, reducing the burden of diagnosis and treatment.
  • methods: OpenClinicalAI combines reciprocally coupled deep multi-action reinforcement learning (DMARL) and multicenter meta-learning (MCML) to dynamically formulate diagnostic strategies and deliver diagnoses in open, uncertain clinical settings.
  • results: Experiments show better performance with fewer clinical examinations than the state-of-the-art model, and the system can be embedded into current healthcare workflows to cooperate with clinicians and improve care.
    Abstract Although Alzheimer's disease (AD) cannot be reversed or cured, timely diagnosis can significantly reduce the burden of treatment and care. Current research on AD diagnosis models usually regards the diagnosis task as a typical classification task with two primary assumptions: 1) All target categories are known a priori; 2) The diagnostic strategy for each patient is consistent, that is, the number and type of model input data for each patient are the same. However, real-world clinical settings are open, with complexity and uncertainty in terms of both subjects and the resources of the medical institutions. This means that diagnostic models may encounter unseen disease categories and need to dynamically develop diagnostic strategies based on the subject's specific circumstances and available medical resources. Thus, the AD diagnosis task is tangled and coupled with the diagnosis strategy formulation. To promote the application of diagnostic systems in real-world clinical settings, we propose OpenClinicalAI for direct AD diagnosis in complex and uncertain clinical settings. This is the first powerful end-to-end model to dynamically formulate diagnostic strategies and provide diagnostic results based on the subject's conditions and available medical resources. OpenClinicalAI combines reciprocally coupled deep multiaction reinforcement learning (DMARL) for diagnostic strategy formulation and multicenter meta-learning (MCML) for open-set recognition. The experimental results show that OpenClinicalAI achieves better performance and fewer clinical examinations than the state-of-the-art model. Our method provides an opportunity to embed the AD diagnostic system into the current health care system to cooperate with clinicians to improve current health care.

A Dual Stealthy Backdoor: From Both Spatial and Frequency Perspectives

  • paper_url: http://arxiv.org/abs/2307.10184
  • repo_url: None
  • paper_authors: Yudong Gao, Honglong Chen, Peng Sun, Junjian Li, Anqing Zhang, Zhibo Wang
  • for: The paper proposes an effective and stealthy backdoor attack method for implanting backdoors in deep neural networks (DNNs).
  • methods: The method uses the Discrete Wavelet Transform together with the Fourier and Discrete Cosine Transforms to make the trigger invisible in both the spatial and frequency domains, and adopts a novel strategy of training the model with weak triggers while attacking with strong triggers to further improve attack performance and stealthiness.
  • results: On four datasets, DUBA significantly outperforms state-of-the-art backdoor attacks in both attack success rate and stealthiness.
    Abstract Backdoor attacks pose serious security threats to deep neural networks (DNNs). Backdoored models make arbitrarily (targeted) incorrect predictions on inputs embedded with well-designed triggers while behaving normally on clean inputs. Many works have explored the invisibility of backdoor triggers to improve attack stealthiness. However, most of them only consider the invisibility in the spatial domain without explicitly accounting for the generation of invisible triggers in the frequency domain, making the generated poisoned images easily detectable by recent defense methods. To address this issue, in this paper, we propose a DUal stealthy BAckdoor attack method named DUBA, which simultaneously considers the invisibility of triggers in both the spatial and frequency domains, to achieve desirable attack performance, while ensuring strong stealthiness. Specifically, we first use Discrete Wavelet Transform to embed the high-frequency information of the trigger image into the clean image to ensure attack effectiveness. Then, to attain strong stealthiness, we incorporate Fourier Transform and Discrete Cosine Transform to mix the poisoned image and clean image in the frequency domain. Moreover, the proposed DUBA adopts a novel attack strategy, in which the model is trained with weak triggers and attacked with strong triggers to further enhance the attack performance and stealthiness. We extensively evaluate DUBA against popular image classifiers on four datasets. The results demonstrate that it significantly outperforms the state-of-the-art backdoor attacks in terms of the attack success rate and stealthiness.
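As a rough illustration of the wavelet-domain step only, the sketch below embeds high-frequency content of a trigger image into a clean image with the Discrete Wavelet Transform; the wavelet, blending coefficient, and omission of the Fourier/DCT mixing stages are simplifications, not the paper's full pipeline.

```python
# Minimal sketch of the wavelet-domain idea: blend high-frequency content of a
# trigger image into a clean image via the Discrete Wavelet Transform.
import numpy as np
import pywt

def embed_trigger_dwt(clean, trigger, alpha=0.2, wavelet="haar"):
    """clean, trigger: 2-D grayscale arrays of the same shape."""
    cA, (cH, cV, cD) = pywt.dwt2(clean, wavelet)
    _, (tH, tV, tD) = pywt.dwt2(trigger, wavelet)
    # keep the clean low-frequency band, blend trigger info into high-frequency bands
    mixed = (cA,
             ((1 - alpha) * cH + alpha * tH,
              (1 - alpha) * cV + alpha * tV,
              (1 - alpha) * cD + alpha * tD))
    return np.clip(pywt.idwt2(mixed, wavelet), 0, 255)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.uniform(0, 255, size=(32, 32))
    trigger = rng.uniform(0, 255, size=(32, 32))
    poisoned = embed_trigger_dwt(clean, trigger)
    print(poisoned.shape, float(np.abs(poisoned - clean).mean()))
```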

Challenges in Domain-Specific Abstractive Summarization and How to Overcome them

  • paper_url: http://arxiv.org/abs/2307.00963
  • repo_url: None
  • paper_authors: Anum Afzal, Juraj Vladika, Daniel Braun, Florian Matthes
  • for: The paper identifies the limitations of large language models on domain-specific abstractive text summarization.
  • methods: It surveys existing techniques relevant to these research problems, covering the quadratic complexity of transformer-based models with respect to input length, the detection and mitigation of model hallucination, and approaches to handling domain shift.
  • results: By analysing and assessing state-of-the-art techniques, the paper highlights three major limitations of large language models for domain-specific abstractive summarization and poses a set of open research questions.
    Abstract Large Language Models work quite well with general-purpose data and many tasks in Natural Language Processing. However, they show several limitations when used for a task such as domain-specific abstractive text summarization. This paper identifies three of those limitations as research problems in the context of abstractive text summarization: 1) Quadratic complexity of transformer-based models with respect to the input text length; 2) Model Hallucination, which is a model's ability to generate factually incorrect text; and 3) Domain Shift, which happens when the distribution of the model's training and test corpus is not the same. Along with a discussion of the open research questions, this paper also provides an assessment of existing state-of-the-art techniques relevant to domain-specific text summarization to address the research gaps.

  • paper_url: http://arxiv.org/abs/2307.00960
  • repo_url: None
  • paper_authors: Simone Sarti, Eugenio Lomurno, Matteo Matteucci
  • for: The work aims to improve the efficiency and computational-resource usage of Neural Architecture Search (NAS) so that high-performance neural networks can be built for a variety of tasks.
  • methods: Building on Once-For-All (OFA) and its successor OFAv2, the authors extend Neural Architecture Transfer (NAT) into NATv2 to extract sub-networks from a single super-network model.
  • results: Experiments show that NATv2 successfully improves on NAT, providing more effective sub-network extraction when multi-objective search algorithms are applied to dynamic super-network architectures; a fine-tuning-based post-processing pipeline is also introduced to further improve network performance.
    Abstract Deep learning is increasingly impacting various aspects of contemporary society. Artificial neural networks have emerged as the dominant models for solving an expanding range of tasks. The introduction of Neural Architecture Search (NAS) techniques, which enable the automatic design of task-optimal networks, has led to remarkable advances. However, the NAS process is typically associated with long execution times and significant computational resource requirements. Once-For-All (OFA) and its successor, Once-For-All-2 (OFAv2), have been developed to mitigate these challenges. While maintaining exceptional performance and eliminating the need for retraining, they aim to build a single super-network model capable of directly extracting sub-networks satisfying different constraints. Neural Architecture Transfer (NAT) was developed to maximise the effectiveness of extracting sub-networks from a super-network. In this paper, we present NATv2, an extension of NAT that improves multi-objective search algorithms applied to dynamic super-network architectures. NATv2 achieves qualitative improvements in the extractable sub-networks by exploiting the improved super-networks generated by OFAv2 and incorporating new policies for initialisation, pre-processing and updating its networks archive. In addition, a post-processing pipeline based on fine-tuning is introduced. Experimental results show that NATv2 successfully improves NAT and is highly recommended for investigating high-performance architectures with a minimal number of parameters.

Learning Difference Equations with Structured Grammatical Evolution for Postprandial Glycaemia Prediction

  • paper_url: http://arxiv.org/abs/2307.01238
  • repo_url: None
  • paper_authors: Daniel Parra, David Joedicke, J. Manuel Velasco, Gabriel Kronberger, J. Ignacio Hidalgo
  • For: This paper proposes a novel glucose prediction method that prioritizes interpretability for diabetes management.
  • Methods: The proposed method uses Interpretable Sparse Identification by Grammatical Evolution, combined with a previous clustering stage, to predict postprandial glucose levels up to two hours after meals.
  • Results: The method produces safe predictions with slightly better accuracy than other techniques, including sparse identification of non-linear dynamics and artificial neural networks. The results demonstrate that the proposed method provides interpretable solutions without sacrificing prediction accuracy, offering a promising approach to glucose prediction in diabetes management.
    Abstract People with diabetes must carefully monitor their blood glucose levels, especially after eating. Blood glucose regulation requires a proper combination of food intake and insulin boluses. Glucose prediction is vital to avoid dangerous post-meal complications in treating individuals with diabetes. Although traditional methods, such as artificial neural networks, have shown high accuracy rates, sometimes they are not suitable for developing personalised treatments by physicians due to their lack of interpretability. In this study, we propose a novel glucose prediction method emphasising interpretability: Interpretable Sparse Identification by Grammatical Evolution. Combined with a previous clustering stage, our approach provides finite difference equations to predict postprandial glucose levels up to two hours after meals. We divide the dataset into four-hour segments and perform clustering based on blood glucose values for the twohour window before the meal. Prediction models are trained for each cluster for the two-hour windows after meals, allowing predictions in 15-minute steps, yielding up to eight predictions at different time horizons. Prediction safety was evaluated based on Parkes Error Grid regions. Our technique produces safe predictions through explainable expressions, avoiding zones D (0.2% average) and E (0%) and reducing predictions on zone C (6.2%). In addition, our proposal has slightly better accuracy than other techniques, including sparse identification of non-linear dynamics and artificial neural networks. The results demonstrate that our proposal provides interpretable solutions without sacrificing prediction accuracy, offering a promising approach to glucose prediction in diabetes management that balances accuracy, interpretability, and computational efficiency.
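The sketch below illustrates how such cluster-specific difference equations could be rolled forward in 15-minute steps after assigning the pre-meal window to a cluster; the example equation, coefficients, and centroids are hypothetical stand-ins for the expressions that grammatical evolution would learn.

```python
# Illustrative sketch only: assign the pre-meal window to a cluster, then roll
# a cluster-specific finite difference equation forward in 15-minute steps.
import numpy as np

def predict_postprandial(pre_meal_glucose, carbs, insulin, cluster_models, centroids, steps=8):
    """pre_meal_glucose: glucose samples for the 2-hour window before the meal."""
    cluster = int(np.argmin([np.linalg.norm(pre_meal_glucose - c) for c in centroids]))
    step_fn = cluster_models[cluster]
    g_prev, g_curr = pre_meal_glucose[-2], pre_meal_glucose[-1]
    preds = []
    for _ in range(steps):                      # 8 x 15 min = 2 hours after the meal
        g_next = step_fn(g_curr, g_prev, carbs, insulin)
        preds.append(g_next)
        g_prev, g_curr = g_curr, g_next
    return np.array(preds)

# Hypothetical difference equation for one cluster (NOT from the paper):
example_model = lambda g, g_prev, carbs, insulin: (
    g + 0.6 * (g - g_prev) + 0.05 * carbs - 2.0 * insulin)

if __name__ == "__main__":
    centroids = [np.full(8, 110.0), np.full(8, 160.0)]
    models = {0: example_model, 1: example_model}
    window = np.array([112, 115, 114, 118, 120, 119, 121, 123], dtype=float)
    print(predict_postprandial(window, carbs=45, insulin=4,
                               cluster_models=models, centroids=centroids))
```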

Towards Explainable AI for Channel Estimation in Wireless Communications

  • paper_url: http://arxiv.org/abs/2307.00952
  • repo_url: None
  • paper_authors: Abdul Karim Gizzini, Yahia Medjahdi, Ali J. Ghandour, Laurent Clavier
  • For: The paper is written to support the development of 6G networks and to provide explainable AI (XAI) techniques for critical applications such as autonomous driving.
  • Methods: The paper proposes a novel XAI-based channel estimation (XAI-CHEST) scheme that uses deep learning (DL) models to estimate the channel and provide detailed, reasonable interpretability of the model behavior.
  • Results: The proposed XAI-CHEST scheme provides valid interpretations of the DL-based channel estimators for different scenarios, allowing for a better understanding of the decision-making behavior of the models.
    Abstract Research into 6G networks has been initiated to support a variety of critical artificial intelligence (AI) assisted applications such as autonomous driving. In such applications, AI-based decisions should be performed in a real-time manner. These decisions include resource allocation, localization, channel estimation, etc. Considering the black-box nature of existing AI-based models, it is highly challenging to understand and trust the decision-making behavior of such models. Therefore, explaining the logic behind those models through explainable AI (XAI) techniques is essential for their employment in critical applications. This manuscript proposes a novel XAI-based channel estimation (XAI-CHEST) scheme that provides detailed reasonable interpretability of the deep learning (DL) models that are employed in doubly-selective channel estimation. The aim of the proposed XAI-CHEST scheme is to identify the relevant model inputs by inducing high noise on the irrelevant ones. As a result, the behavior of the studied DL-based channel estimators can be further analyzed and evaluated based on the generated interpretations. Simulation results show that the proposed XAI-CHEST scheme provides valid interpretations of the DL-based channel estimators for different scenarios.
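As a loose stand-in for the idea of probing relevance with noise (not the XAI-CHEST training scheme itself), the sketch below scores each input element of a channel estimator by how much the estimate changes when strong noise is injected on that element; the toy estimator is purely illustrative.

```python
# Generic perturbation sketch (a stand-in, not the authors' scheme): relevance
# of each input element is approximated by how much the channel estimate
# changes when strong noise is injected on that element.
import numpy as np

def noise_based_relevance(estimator, x, noise_std=1.0, n_trials=50, seed=0):
    """estimator: callable mapping an input vector to a channel estimate vector."""
    rng = np.random.default_rng(seed)
    baseline = estimator(x)
    relevance = np.zeros(len(x))
    for i in range(len(x)):
        deltas = []
        for _ in range(n_trials):
            noisy = x.copy()
            noisy[i] += rng.normal(0.0, noise_std)
            deltas.append(np.mean(np.abs(estimator(noisy) - baseline)))
        relevance[i] = np.mean(deltas)          # large change => relevant input
    return relevance

if __name__ == "__main__":
    # toy "estimator" that only uses the first half of its input
    w = np.concatenate([np.ones(4), np.zeros(4)])
    estimator = lambda x: np.array([w @ x])
    print(np.round(noise_based_relevance(estimator, np.ones(8)), 2))
```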

OpenAPMax: Abnormal Patterns-based Model for Real-World Alzheimer’s Disease Diagnosis

  • paper_url: http://arxiv.org/abs/2307.00936
  • repo_url: None
  • paper_authors: Yunyou Huang, Xianglong Guan, Xiangjiang Lu, Xiaoshuang Liang, Xiuxia Miao, Jiyue Xie, Wenjing Liu, Li Ma, Suqin Tang, Zhifei Zhang, Jianfeng Zhan
  • for: The work proposes an open-set recognition model for diagnosing Alzheimer's disease (AD) in real-world clinical settings.
  • methods: Based on abnormal patterns, the model first obtains each patient's abnormal pattern relative to each known category via statistics or a literature search, then clusters the patients' abnormal patterns, and finally uses extreme value theory (EVT) to model the distance between each patient's abnormal pattern and the category centre, modifying the classification probability.
  • results: The proposed method achieves state-of-the-art performance compared with recent open-set recognition techniques.
    Abstract Alzheimer's disease (AD) cannot be reversed, but early diagnosis will significantly benefit patients' medical treatment and care. In recent works, AD diagnosis has the primary assumption that all categories are known a priori -- a closed-set classification problem, which contrasts with the open-set recognition problem. This assumption hinders the application of the model in natural clinical settings. Although many open-set recognition technologies have been proposed in other fields, they are challenging to use for AD diagnosis directly since 1) AD is a degenerative disease of the nervous system with similar symptoms at each stage, and it is difficult to distinguish from its pre-state, and 2) diversified strategies for AD diagnosis are challenging to model uniformly. In this work, inspired by the concerns of clinicians during diagnosis, we propose an open-set recognition model, OpenAPMax, based on the anomaly pattern to address AD diagnosis in real-world settings. OpenAPMax first obtains the abnormal pattern of each patient relative to each known category through statistics or a literature search, clusters the patients' abnormal pattern, and finally, uses extreme value theory (EVT) to model the distance between each patient's abnormal pattern and the center of their category and modify the classification probability. We evaluate the performance of the proposed method with recent open-set recognition, where we obtain state-of-the-art results.
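A rough OpenMax-style sketch of the EVT step, under the assumption that distances between abnormal patterns and their class centre follow a Weibull tail; the distance metric, tail size, and recalibration rule below are illustrative, not the paper's exact procedure.

```python
# Rough EVT sketch: fit a Weibull distribution to the tail of distances to the
# class centre and shift probability mass to an "unknown" class for far samples.
import numpy as np
from scipy.stats import weibull_min

def fit_tail(distances_to_centre, tail_size=20):
    tail = np.sort(distances_to_centre)[-tail_size:]
    return weibull_min.fit(tail, floc=0)            # (shape, loc, scale)

def recalibrate(probs, distance, weibull_params):
    """probs: softmax probabilities over known classes for one sample."""
    w = weibull_min.cdf(distance, *weibull_params)  # prob. the sample lies in the tail
    known = probs * (1 - w)
    unknown = 1.0 - known.sum()
    return np.append(known, unknown)                # last entry = "unknown" class

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_dists = rng.gamma(2.0, 1.0, size=500)     # distances of training samples
    params = fit_tail(train_dists)
    probs = np.array([0.7, 0.2, 0.1])
    print(recalibrate(probs, distance=1.0, weibull_params=params))  # close to centre
    print(recalibrate(probs, distance=9.0, weibull_params=params))  # far -> unknown
```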

Learning Differentiable Logic Programs for Abstract Visual Reasoning

  • paper_url: http://arxiv.org/abs/2307.00928
  • repo_url: https://github.com/ml-research/neumann
  • paper_authors: Hikaru Shindo, Viktor Pfanschilling, Devendra Singh Dhami, Kristian Kersting
  • for: The work aims to improve the visual reasoning and problem-solving abilities of intelligent agents by integrating differentiable forward reasoning with gradient-based machine learning.
  • methods: The paper proposes the NEUro-symbolic Message-pAssiNg reasoNer (NEUMANN), a graph-based differentiable forward reasoner that passes messages in a memory-efficient manner and handles structured programs with functors, together with a computationally efficient structure-learning algorithm for explanatory program induction on complex visual scenes.
  • results: In addition to conventional visual reasoning tasks, the paper introduces a new task, visual reasoning behind-the-scenes, in which agents must learn abstract programs and answer queries about unobserved scenes; experiments show that NEUMANN solves visual reasoning tasks efficiently and outperforms neural, symbolic, and neuro-symbolic baselines.
    Abstract Visual reasoning is essential for building intelligent agents that understand the world and perform problem-solving beyond perception. Differentiable forward reasoning has been developed to integrate reasoning with gradient-based machine learning paradigms. However, due to the memory intensity, most existing approaches do not bring the best of the expressivity of first-order logic, excluding a crucial ability to solve abstract visual reasoning, where agents need to perform reasoning by using analogies on abstract concepts in different scenarios. To overcome this problem, we propose NEUro-symbolic Message-pAssiNg reasoNer (NEUMANN), which is a graph-based differentiable forward reasoner, passing messages in a memory-efficient manner and handling structured programs with functors. Moreover, we propose a computationally-efficient structure learning algorithm to perform explanatory program induction on complex visual scenes. To evaluate, in addition to conventional visual reasoning tasks, we propose a new task, visual reasoning behind-the-scenes, where agents need to learn abstract programs and then answer queries by imagining scenes that are not observed. We empirically demonstrate that NEUMANN solves visual reasoning tasks efficiently, outperforming neural, symbolic, and neuro-symbolic baselines.

Automatic Design of Semantic Similarity Ensembles Using Grammatical Evolution

  • paper_url: http://arxiv.org/abs/2307.00925
  • repo_url: https://github.com/jorge-martinez-gil/sesige
  • paper_authors: Jorge Martinez-Gil
  • for: The study proposes a method for automatically designing semantic similarity ensembles to improve the accuracy of similarity assessment in natural language processing.
  • methods: Grammatical evolution is used to automatically select and aggregate measures from a pool of candidates, creating an ensemble that maximizes correlation with human judgments of similarity.
  • results: Evaluation on several benchmark datasets shows that the evolved ensembles significantly improve similarity assessment accuracy and, in some cases, outperform existing methods.
    Abstract Semantic similarity measures are widely used in natural language processing to catalyze various computer-related tasks. However, no single semantic similarity measure is the most appropriate for all tasks, and researchers often use ensemble strategies to ensure performance. This research work proposes a method for automatically designing semantic similarity ensembles. In fact, our proposed method uses grammatical evolution, for the first time, to automatically select and aggregate measures from a pool of candidates to create an ensemble that maximizes correlation to human judgment. The method is evaluated on several benchmark datasets and compared to state-of-the-art ensembles, showing that it can significantly improve similarity assessment accuracy and outperform existing methods in some cases. As a result, our research demonstrates the potential of using grammatical evolution to automatically compare text and prove the benefits of using ensembles for semantic similarity tasks. The source code that illustrates our approach can be downloaded from https://github.com/jorge-martinez-gil/sesige.
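As a toy stand-in for the evolutionary search (the paper evolves full aggregation expressions from a grammar), the sketch below randomly samples weighted aggregations of candidate measures and keeps the one with the highest Pearson correlation to human judgments, illustrating the fitness function; all data here are synthetic.

```python
# Toy stand-in for the evolutionary search: random weighted aggregations of
# candidate similarity measures are scored by correlation with human judgments.
import numpy as np
from scipy.stats import pearsonr

def search_ensemble(measure_scores, human_scores, n_candidates=2000, seed=0):
    """measure_scores: (n_pairs, n_measures) matrix of candidate similarity scores."""
    rng = np.random.default_rng(seed)
    best_w, best_fit = None, -np.inf
    for _ in range(n_candidates):
        w = rng.dirichlet(np.ones(measure_scores.shape[1]))   # random convex weights
        fit = pearsonr(measure_scores @ w, human_scores)[0]
        if fit > best_fit:
            best_w, best_fit = w, fit
    return best_w, best_fit

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    human = rng.uniform(0, 4, size=200)
    # three noisy candidate measures correlated with the human ratings
    measures = np.stack([human + rng.normal(0, s, 200) for s in (0.5, 1.0, 2.0)], axis=1)
    weights, fitness = search_ensemble(measures, human)
    print("weights:", np.round(weights, 2), "correlation:", round(fitness, 3))
```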

Achieving Stable Training of Reinforcement Learning Agents in Bimodal Environments through Batch Learning

  • paper_url: http://arxiv.org/abs/2307.00923
  • repo_url: None
  • paper_authors: E. Hurwitz, N. Peace, G. Cevora
  • for: solving Reinforcement Learning problems in bimodal, stochastic environments, particularly applicable to pricing problems.
  • methods: using batch updates to the tabular Q-learning algorithm.
  • results: the batch learning agents are more effective and resilient to fluctuations in a large stochastic environment, compared to typically-trained agents.
    Abstract Bimodal, stochastic environments present a challenge to typical Reinforcement Learning problems. This problem is one that is surprisingly common in real world applications, being particularly applicable to pricing problems. In this paper we present a novel learning approach to the tabular Q-learning algorithm, tailored to tackling these specific challenges by using batch updates. A simulation of pricing problem is used as a testbed to compare a typically updated agent with a batch learning agent. The batch learning agents are shown to be both more effective than the typically-trained agents, and to be more resilient to the fluctuations in a large stochastic environment. This work has a significant potential to enable practical, industrial deployment of Reinforcement Learning in the context of pricing and others.
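A minimal sketch of batch updates for tabular Q-learning, assuming a simple environment object with `reset()`, `step(action)`, and `actions(state)` methods; hyperparameters and the environment interface are illustrative, not the paper's setup.

```python
# Minimal sketch of batch updates for tabular Q-learning: transitions are
# accumulated and the Q-table is updated once per batch, averaging out the
# bimodal reward noise instead of chasing individual samples.
import random
from collections import defaultdict

def batch_q_learning(env, n_episodes=500, batch_size=32, alpha=0.1, gamma=0.95, eps=0.1):
    q = defaultdict(float)                     # q[(state, action)]
    batch = []
    for _ in range(n_episodes):
        state = env.reset()
        done = False
        while not done:
            actions = env.actions(state)
            if random.random() < eps:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            batch.append((state, action, reward, next_state, done))
            state = next_state
            if len(batch) >= batch_size:       # compute all targets first, then update
                updates = []
                for s, a, r, s2, d in batch:
                    target = r if d else r + gamma * max(
                        q[(s2, a2)] for a2 in env.actions(s2))
                    updates.append((s, a, alpha * (target - q[(s, a)])))
                for s, a, delta in updates:
                    q[(s, a)] += delta
                batch.clear()
    return q
```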

Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews

  • paper_url: http://arxiv.org/abs/2307.00920
  • repo_url: https://github.com/idiap/Node_weighted_GCN_for_depression_detection
  • paper_authors: Sergio Burdisso, Esaú Villatoro-Tello, Srikanth Madikeri, Petr Motlicek
  • for: The work proposes a simple approach for weighting self-connecting edges in a Graph Convolutional Network (GCN) and studies its impact on depression detection from transcribed clinical interviews.
  • methods: A GCN models non-consecutive, long-distance semantics to classify interview transcriptions into depressed or control subjects; the proposed weighting mitigates the limiting assumptions of locality and of equal importance of self-connections versus edges to neighbouring nodes, while preserving low computational cost, data-agnostic operation, and interpretability.
  • results: Exhaustive evaluation on two benchmark datasets shows that the approach consistently outperforms the vanilla GCN model and previously reported results, reaching an F1 of 0.84 on both datasets; a qualitative analysis illustrates its interpretability and its alignment with prior findings in psychology.
    Abstract We propose a simple approach for weighting self-connecting edges in a Graph Convolutional Network (GCN) and show its impact on depression detection from transcribed clinical interviews. To this end, we use a GCN for modeling non-consecutive and long-distance semantics to classify the transcriptions into depressed or control subjects. The proposed method aims to mitigate the limiting assumptions of locality and the equal importance of self-connections vs. edges to neighboring nodes in GCNs, while preserving attractive features such as low computational cost, data agnostic, and interpretability capabilities. We perform an exhaustive evaluation in two benchmark datasets. Results show that our approach consistently outperforms the vanilla GCN model as well as previously reported results, achieving an F1=0.84% on both datasets. Finally, a qualitative analysis illustrates the interpretability capabilities of the proposed approach and its alignment with previous findings in psychology.
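One possible reading of the idea, sketched in plain PyTorch: give each node a learnable weight on its self-connecting edge instead of the fixed identity added in vanilla GCNs; the exact parameterisation used in the paper may differ.

```python
# Sketch of a GCN layer with learnable weights on self-connecting edges,
# instead of the usual fixed A + I. Parameterisation is an assumption.
import torch
import torch.nn as nn

class NodeWeightedGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim, n_nodes):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        # one learnable self-connection weight per node (vanilla GCN uses 1.0)
        self.self_weight = nn.Parameter(torch.ones(n_nodes))

    def forward(self, x, adj):                 # x: (N, in_dim), adj: (N, N), no self-loops
        a = adj + torch.diag(self.self_weight)
        deg = a.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.clamp(min=1e-8).pow(-0.5))
        a_norm = d_inv_sqrt @ a @ d_inv_sqrt   # symmetric normalisation
        return torch.relu(a_norm @ self.linear(x))

if __name__ == "__main__":
    n, layer = 5, NodeWeightedGCNLayer(in_dim=16, out_dim=8, n_nodes=5)
    adj = (torch.rand(n, n) > 0.5).float()
    adj = ((adj + adj.t()) > 0).float().fill_diagonal_(0)
    print(layer(torch.randn(n, 16), adj).shape)   # torch.Size([5, 8])
```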

Why do CNNs excel at feature extraction? A mathematical explanation

  • paper_url: http://arxiv.org/abs/2307.00919
  • repo_url: None
  • paper_authors: Vinoth Nandakumar, Arush Tagade, Tongliang Liu
  • for: To explain why deep learning models can solve image classification tasks that involve feature extraction.
  • methods: A new mathematical model of image classification based on feature extraction is introduced, which can generate images resembling real-world datasets; the authors prove that convolutional neural network classifiers can solve the resulting classification tasks.
  • results: Piecewise linear functions that detect the presence of features are constructed and shown to be realizable by a convolutional network, yielding classifiers with zero error under the model.
    Abstract Over the past decade deep learning has revolutionized the field of computer vision, with convolutional neural network models proving to be very effective for image classification benchmarks. However, a fundamental theoretical question remains unanswered: why can they solve discrete image classification tasks that involve feature extraction? We address this question in this paper by introducing a novel mathematical model for image classification, based on feature extraction, that can be used to generate images resembling real-world datasets. We show that convolutional neural network classifiers can solve these image classification tasks with zero error. In our proof, we construct piecewise linear functions that detect the presence of features, and show that they can be realized by a convolutional network.
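A tiny illustration in the spirit of that construction: a convolution with a feature template followed by a ReLU threshold is a piecewise linear function that fires only where the feature is present; the template and threshold below are toy values.

```python
# Tiny sketch: a ReLU applied to a correlation with a feature template gives a
# piecewise linear detector that fires only where the feature is present.
import numpy as np

def feature_detector(image, template, threshold):
    """Piecewise linear detector: max(0, correlation - threshold) at each offset."""
    h, w = template.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            corr = np.sum(image[i:i + h, j:j + w] * template)
            out[i, j] = max(0.0, corr - threshold)     # ReLU -> piecewise linear
    return out

if __name__ == "__main__":
    template = np.array([[1.0, -1.0], [1.0, -1.0]])    # a vertical edge
    image = np.zeros((6, 6)); image[:, :3] = 1.0        # left half bright
    response = feature_detector(image, template, threshold=1.5)
    print((response > 0).astype(int))                   # fires along the edge only
```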

Contextual Prompt Learning for Vision-Language Understanding

  • paper_url: http://arxiv.org/abs/2307.00910
  • repo_url: None
  • paper_authors: Koustava Goswami, Srikrishna Karanam, Joseph K J, Prateksha Udhayanan, Balaji Vasan Srinivasan
  • for: The paper proposes a Contextual Prompt Learning (CoPL) framework to improve the generalization ability of vision-language models.
  • methods: Trainable prompt learning is combined with local image features: prompts are aligned to localized features and weighted according to the local features relevant to the task, yielding dynamic, context-aware prompts.
  • results: Across a variety of standard and few-shot datasets, the method substantially outperforms current state-of-the-art approaches, and also performs well in few-shot and out-of-distribution settings.
    Abstract Recent advances in multimodal learning has resulted in powerful vision-language models, whose representations are generalizable across a variety of downstream tasks. Recently, their generalizability has been further extended by incorporating trainable prompts, borrowed from the natural language processing literature. While such prompt learning techniques have shown impressive results, we identify that these prompts are trained based on global image features which limits itself in two aspects: First, by using global features, these prompts could be focusing less on the discriminative foreground image, resulting in poor generalization to various out-of-distribution test cases. Second, existing work weights all prompts equally whereas our intuition is that these prompts are more specific to the type of the image. We address these issues with as part of our proposed Contextual Prompt Learning (CoPL) framework, capable of aligning the prompts to the localized features of the image. Our key innovations over earlier works include using local image features as part of the prompt learning process, and more crucially, learning to weight these prompts based on local features that are appropriate for the task at hand. This gives us dynamic prompts that are both aligned to local image features as well as aware of local contextual relationships. Our extensive set of experiments on a variety of standard and few-shot datasets show that our method produces substantially improved performance when compared to the current state of the art methods. We also demonstrate both few-shot and out-of-distribution performance to establish the utility of learning dynamic prompts that are aligned to local image features.
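A hedged sketch of the prompt-weighting idea: a pool of learnable prompt vectors is weighted by its affinity to local image features, producing a dynamic, context-aware prompt per image; dimensions and pooling choices are assumptions rather than the paper's implementation.

```python
# Sketch: weight learnable prompt vectors by their affinity to local image
# features, yielding one dynamic prompt per image. Dimensions are assumptions.
import torch
import torch.nn as nn

class LocalFeatureWeightedPrompts(nn.Module):
    def __init__(self, n_prompts=8, dim=512):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)

    def forward(self, local_feats):            # local_feats: (B, n_patches, dim)
        # affinity of each local patch feature to each prompt vector
        attn = torch.softmax(local_feats @ self.prompts.t(), dim=-1)   # (B, P, n_prompts)
        weights = attn.mean(dim=1)                                      # (B, n_prompts)
        return weights @ self.prompts                                   # (B, dim) dynamic prompt

if __name__ == "__main__":
    module = LocalFeatureWeightedPrompts()
    print(module(torch.randn(2, 49, 512)).shape)   # torch.Size([2, 512])
```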

An open-source deep learning algorithm for efficient and fully-automatic analysis of the choroid in optical coherence tomography

  • paper_url: http://arxiv.org/abs/2307.00904
  • repo_url: None
  • paper_authors: Jamie Burke, Justin Engelmann, Charlene Hamid, Megan Reid-Schachter, Tom Pearson, Dan Pugh, Neeraj Dhaun, Stuart King, Tom MacGillivray, Miguel O. Bernabeu, Amos Storkey, Ian J. C. MacCormick
  • For: The paper is written for researchers and clinicians who need to extract choroidal measurements from optical coherence tomography (OCT) data, specifically for systemic disease research.
  • Methods: The paper proposes a deep learning algorithm called DeepGPET, which is fully automatic and open-source, for choroid region segmentation in OCT data. The algorithm uses a UNet with MobileNetV3 backbone pre-trained on ImageNet, finetuned on a dataset of 715 OCT B-scans from 3 clinical studies.
  • Results: DeepGPET achieves excellent agreement with a clinically validated, semi-automatic choroid segmentation method (Gaussian Process Edge Tracing, GPET) in terms of standard segmentation agreement metrics and derived measures of choroidal thickness and area. Additionally, DeepGPET reduces the mean processing time per image from 34.49 seconds to 1.25 seconds on a standard laptop CPU, making it a faster and more efficient method for choroidal segmentation.
    Abstract Purpose: To develop an open-source, fully-automatic deep learning algorithm, DeepGPET, for choroid region segmentation in optical coherence tomography (OCT) data. Methods: We used a dataset of 715 OCT B-scans (82 subjects, 115 eyes) from 3 clinical studies related to systemic disease. Ground truth segmentations were generated using a clinically validated, semi-automatic choroid segmentation method, Gaussian Process Edge Tracing (GPET). We finetuned a UNet with MobileNetV3 backbone pre-trained on ImageNet. Standard segmentation agreement metrics, as well as derived measures of choroidal thickness and area, were used to evaluate DeepGPET, alongside qualitative evaluation from a clinical ophthalmologist. Results: DeepGPET achieves excellent agreement with GPET on data from 3 clinical studies (AUC=0.9994, Dice=0.9664; Pearson correlation of 0.8908 for choroidal thickness and 0.9082 for choroidal area), while reducing the mean processing time per image on a standard laptop CPU from 34.49s (±15.09) using GPET to 1.25s (±0.10) using DeepGPET. Both methods performed similarly according to a clinical ophthalmologist, who qualitatively judged a subset of segmentations by GPET and DeepGPET, based on smoothness and accuracy of segmentations. Conclusions: DeepGPET, a fully-automatic, open-source algorithm for choroidal segmentation, will enable researchers to efficiently extract choroidal measurements, even for large datasets. As no manual interventions are required, DeepGPET is less subjective than semi-automatic methods and could be deployed in clinical practice without necessitating a trained operator. DeepGPET addresses the lack of open-source, fully-automatic and clinically relevant choroid segmentation algorithms, and its subsequent public release will facilitate future choroidal research both in ophthalmology and wider systemic health.
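For context, the derived measurements mentioned above can be computed from a binary choroid mask roughly as follows; the pixel spacings are example values and not taken from the paper.

```python
# Illustrative sketch of deriving choroidal area and mean thickness from a
# binary segmentation mask. Pixel spacings are example values.
import numpy as np

def choroid_measurements(mask, mm_per_px_x=0.0115, mm_per_px_y=0.0039):
    """mask: 2-D boolean array, True inside the segmented choroid region."""
    area_mm2 = mask.sum() * mm_per_px_x * mm_per_px_y
    column_thickness_px = mask.sum(axis=0)                 # axial extent per A-scan
    cols = column_thickness_px[column_thickness_px > 0]
    mean_thickness_mm = (cols.mean() * mm_per_px_y) if cols.size else 0.0
    return area_mm2, mean_thickness_mm

if __name__ == "__main__":
    mask = np.zeros((100, 200), dtype=bool)
    mask[60:90, 20:180] = True                              # toy segmentation
    area, thickness = choroid_measurements(mask)
    print(f"area = {area:.3f} mm^2, mean thickness = {thickness:.3f} mm")
```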

Fixing confirmation bias in feature attribution methods via semantic match

  • paper_url: http://arxiv.org/abs/2307.00897
  • repo_url: None
  • paper_authors: Giovanni Cinà, Daniel Fernandez-Llaneza, Nishant Mishra, Tabea E. Röber, Sandro Pezzelle, Iacer Calixto, Rob Goedhart, Ş. İlker Birbil
  • For: The paper provides a structured way to test whether explanations of black-box models semantically match human concepts, i.e., whether the model's internal representations line up with the hypotheses users form from feature attributions.
  • Methods: Building on the conceptual framework of Cinà et al. [2023], the authors propose a practical procedure for evaluating semantic match and demonstrate it in a suite of experiments on tabular and image data.
  • Results: Assessing semantic match surfaces both undesirable model behaviours (e.g., relying on a spurious correlation) and desirable ones (e.g., focusing on an object relevant for the prediction), offering a principled way to address confirmation bias in XAI.
    Abstract Feature attribution methods have become a staple method to disentangle the complex behavior of black box models. Despite their success, some scholars have argued that such methods suffer from a serious flaw: they do not allow a reliable interpretation in terms of human concepts. Simply put, visualizing an array of feature contributions is not enough for humans to conclude something about a model's internal representations, and confirmation bias can trick users into false beliefs about model behavior. We argue that a structured approach is required to test whether our hypotheses on the model are confirmed by the feature attributions. This is what we call the "semantic match" between human concepts and (sub-symbolic) explanations. Building on the conceptual framework put forward in Cin\`a et al. [2023], we propose a structured approach to evaluate semantic match in practice. We showcase the procedure in a suite of experiments spanning tabular and image data, and show how the assessment of semantic match can give insight into both desirable (e.g., focusing on an object relevant for prediction) and undesirable model behaviors (e.g., focusing on a spurious correlation). We couple our experimental results with an analysis on the metrics to measure semantic match, and argue that this approach constitutes the first step towards resolving the issue of confirmation bias in XAI.
    摘要 <>模型解释方法已成为黑盒模型行为解释的主流方法。 despite their success, some scholars have argued that such methods suffer from a serious flaw: they do not allow a reliable interpretation in terms of human concepts. simply put, visualizing an array of feature contributions is not enough for humans to conclude something about a model's internal representations, and confirmation bias can trick users into false beliefs about model behavior. we argue that a structured approach is required to test whether our hypotheses on the model are confirmed by the feature attributions. this is what we call the "semantic match" between human concepts and (sub-symbolic) explanations. building on the conceptual framework put forward in Cin\`a et al. [2023], we propose a structured approach to evaluate semantic match in practice. we showcase the procedure in a suite of experiments spanning tabular and image data, and show how the assessment of semantic match can give insight into both desirable (e.g., focusing on an object relevant for prediction) and undesirable model behaviors (e.g., focusing on a spurious correlation). we couple our experimental results with an analysis on the metrics to measure semantic match, and argue that this approach constitutes the first step towards resolving the issue of confirmation bias in XAI.>>>
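
A minimal sketch of one way such a semantic-match check could be operationalised: comparing attribution magnitudes against a human-specified binary hypothesis of which features express a concept. The attribution values, concept mask, and the two agreement statistics below are illustrative assumptions, not the paper's actual metrics.

```python
import numpy as np
from scipy.stats import spearmanr

def semantic_match_score(attributions, concept_mask):
    """Agreement between attribution magnitudes and a binary hypothesis of which
    features express a human concept: rank correlation plus top-k overlap."""
    attributions = np.abs(np.asarray(attributions, dtype=float))
    concept_mask = np.asarray(concept_mask, dtype=float)
    rho, _ = spearmanr(attributions, concept_mask)
    k = max(1, int(concept_mask.sum()))
    topk = np.argsort(attributions)[-k:]
    overlap = concept_mask[topk].mean()      # fraction of top-k features inside the concept
    return {"rank_corr": rho, "topk_overlap": overlap}

# Fabricated example: the hypothesis is that features 0-2 drive the prediction.
attr = [0.9, 0.7, 0.05, 0.6, 0.02]
mask = [1, 1, 1, 0, 0]
print(semantic_match_score(attr, mask))
```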

Internet of Things Fault Detection and Classification via Multitask Learning

  • paper_url: http://arxiv.org/abs/2307.01234
  • repo_url: None
  • paper_authors: Mohammad Arif Ul Alam
  • for: 本研究旨在开发一个适用于现实世界IIoT应用场景的故障检测和分类系统。
  • methods: 本研究使用现实世界IIoT系统,通过三个阶段的数据收集,模拟了11种预定的故障类别。我们提议了SMTCNN方法用于IIoT故障检测和分类,并对实际数据进行评估。
  • results: SMTCNN方法在实际数据上显示出了superior特异性(3.5%),并在精度、回归率和F1度上显示出了显著的提升 compared to现有技术。
    Abstract This paper presents a comprehensive investigation into developing a fault detection and classification system for real-world IIoT applications. The study addresses challenges in data collection, annotation, algorithm development, and deployment. Using a real-world IIoT system, three phases of data collection simulate 11 predefined fault categories. We propose SMTCNN for fault detection and category classification in IIoT, evaluating its performance on real-world data. SMTCNN achieves superior specificity (3.5%) and shows significant improvements in precision, recall, and F1 measures compared to existing techniques.
    摘要
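
The abstract does not spell out SMTCNN's architecture, so the following is only a generic multitask 1-D CNN sketch in PyTorch: a shared convolutional trunk with one head for fault detection and one for the 11 fault categories. Channel counts, layer sizes, and the input window length are assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskFaultCNN(nn.Module):
    """Shared 1-D conv trunk with a binary fault-detection head and an 11-way category head."""
    def __init__(self, in_channels: int = 6, n_categories: int = 11):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.detect_head = nn.Linear(64, 2)              # fault / no fault
        self.classify_head = nn.Linear(64, n_categories)

    def forward(self, x):
        z = self.trunk(x)
        return self.detect_head(z), self.classify_head(z)

model = MultiTaskFaultCNN()
x = torch.randn(8, 6, 256)                               # batch of 8 sensor windows
det_logits, cat_logits = model(x)
loss = nn.functional.cross_entropy(det_logits, torch.randint(0, 2, (8,))) \
     + nn.functional.cross_entropy(cat_logits, torch.randint(0, 11, (8,)))
loss.backward()
```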

Augmenting Deep Learning Adaptation for Wearable Sensor Data through Combined Temporal-Frequency Image Encoding

  • paper_url: http://arxiv.org/abs/2307.00883
  • repo_url: None
  • paper_authors: Yidong Zhu, Md Mahmudur Rahman, Mohammad Arif Ul Alam
  • for: 这篇论文是针对穿戴式感应器数据进行分类的深度学习方法。
  • methods: 本研究使用修改过的回归图像表示方法,将穿戴式感应器资料转换为图像,并使用快速傅立叶数学方法估算频率领域的差异。此外,研究者还使用mixup增强表示法。
  • results: 研究者使用测试 accelerometer-based 活动识别数据和预训ResNet模型,并证明了该方法与现有方法相比具有更好的性能。
    Abstract Deep learning advancements have revolutionized scalable classification in many domains including computer vision. However, when it comes to wearable-based classification and domain adaptation, existing computer vision-based deep learning architectures and pretrained models trained on thousands of labeled images for months fall short. This is primarily because wearable sensor data necessitates sensor-specific preprocessing, architectural modification, and extensive data collection. To overcome these challenges, researchers have proposed encoding of wearable temporal sensor data in images using recurrent plots. In this paper, we present a novel modified-recurrent plot-based image representation that seamlessly integrates both temporal and frequency domain information. Our approach incorporates an efficient Fourier transform-based frequency domain angular difference estimation scheme in conjunction with the existing temporal recurrent plot image. Furthermore, we employ mixup image augmentation to enhance the representation. We evaluate the proposed method using accelerometer-based activity recognition data and a pretrained ResNet model, and demonstrate its superior performance compared to existing approaches.
    摘要 深度学习的发展在许多领域中已经引起了革命,包括计算机视觉。然而,当来到穿戴式分类和领域适应时,现有的计算机视觉基础设施和预训练模型,通过月余千张标注图像进行训练,在不同的环境下表现不佳。这主要是因为穿戴式传感器数据需要特定的感知器数据处理、建筑修改和大量的数据收集。为了解决这些挑战,研究人员提出了将穿戴式时间传感器数据编码到图像中的循环图表方法。在本文中,我们提出一种修改后的循环图表基于图像表示方法,可以兼容时域和频域信息。我们的方法包括使用快速傅立做频域角度差估计方案,并与现有的时域循环图表相结合。此外,我们采用混合图像增强技术来提高表示。我们使用拥有陀螺仪数据的活动识别任务和预训练ResNet模型进行评估,并证明我们的方法与现有方法相比具有更高的表现。
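
A rough sketch, under stated assumptions, of the combined temporal-frequency image idea: one channel is an unthresholded recurrence plot of the raw signal, a second channel encodes pairwise FFT-phase angular differences, and mixup blends two such images. The exact construction in the paper may differ; the resizing and mixup parameters below are illustrative.

```python
import numpy as np

def recurrence_plot(x: np.ndarray) -> np.ndarray:
    """Unthresholded recurrence plot: pairwise absolute differences of a 1-D signal."""
    return np.abs(x[:, None] - x[None, :])

def fft_phase_difference(x: np.ndarray) -> np.ndarray:
    """Pairwise angular differences between FFT phases, one rough way to add
    frequency-domain structure as a second image channel."""
    phase = np.angle(np.fft.rfft(x))
    diff = np.abs(phase[:, None] - phase[None, :])
    n = len(x)
    reps = int(np.ceil(n / diff.shape[0]))               # repeat/crop to match HxW
    return np.tile(diff, (reps, reps))[:n, :n]

def mixup(img_a, img_b, alpha: float = 0.2):
    lam = np.random.beta(alpha, alpha)
    return lam * img_a + (1 - lam) * img_b, lam

window_a, window_b = np.random.randn(128), np.random.randn(128)
img_a = np.stack([recurrence_plot(window_a), fft_phase_difference(window_a)])
img_b = np.stack([recurrence_plot(window_b), fft_phase_difference(window_b)])
mixed, lam = mixup(img_a, img_b)        # 2-channel image ready for a pretrained CNN
print(mixed.shape, lam)
```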

Unbiased Pain Assessment through Wearables and EHR Data: Multi-attribute Fairness Loss-based CNN Approach

  • paper_url: http://arxiv.org/abs/2307.05333
  • repo_url: None
  • paper_authors: Sharmin Sultana, Md Mahmudur Rahman, Atqiya Munawara Mahi, Shao-Hsien Liu, Mohammad Arif Ul Alam
  • For: The study proposes a Multi-attribute Fairness Loss (MAFL)-based convolutional neural network that accounts for sensitive attributes in the data and predicts patients' pain status fairly.
  • Methods: The MAFL-based CNN is compared with well-known existing bias-mitigation techniques to determine whether an acceptable trade-off between accuracy and fairness can be reached.
  • Results: On the NIH All-Of-US dataset, the proposed model achieves better fairness and accuracy than existing approaches, indicating superior overall performance.
    Abstract The combination of diverse health data (IoT, EHR, and clinical surveys) and scalable-adaptable Artificial Intelligence (AI), has enabled the discovery of physical, behavioral, and psycho-social indicators of pain status. Despite the hype and promise to fundamentally alter the healthcare system with technological advancements, much AI adoption in clinical pain evaluation has been hampered by the heterogeneity of the problem itself and other challenges, such as personalization and fairness. Studies have revealed that many AI (i.e., machine learning or deep learning) models display biases and discriminate against specific population segments (such as those based on gender or ethnicity), which breeds skepticism among medical professionals about AI adaptability. In this paper, we propose a Multi-attribute Fairness Loss (MAFL) based CNN model that aims to account for any sensitive attributes included in the data and fairly predict patients' pain status while attempting to minimize the discrepancies between privileged and unprivileged groups. In order to determine whether the trade-off between accuracy and fairness can be satisfied, we compare the proposed model with well-known existing mitigation procedures, and studies reveal that the implemented model performs favorably in contrast to state-of-the-art methods. Utilizing NIH All-Of-US data, where a cohort of 868 distinct individuals with wearables and EHR data gathered over 1500 days has been taken into consideration to analyze our suggested fair pain assessment system.
    摘要 多样化健康数据(IoT、EHR 和临床问卷)与可扩展、可适应的人工智能(AI)相结合,使发现疼痛状态的生理、行为和心理社会指标成为可能。尽管技术进步被寄望于从根本上改变医疗系统,但由于问题本身的异质性以及个性化、公平性等挑战,AI 在临床疼痛评估中的应用受到了阻碍。研究表明,许多 AI(即机器学习或深度学习)模型存在偏见,会歧视特定人群(如基于性别或种族的群体),这使医疗专业人员对 AI 的适用性持怀疑态度。本文提出一种基于多属性公平损失(MAFL)的卷积神经网络模型,旨在考虑数据中的敏感属性,公平地预测患者的疼痛状态,并尽量缩小优势群体与弱势群体之间的差异。为了确定精度与公平性之间的权衡是否可以满足,我们将所提模型与知名的现有偏见缓解方法进行比较,研究表明所实现的模型优于最新的方法。我们使用 NIH All-Of-US 数据进行分析,其中包括868名拥有可穿戴设备和 EHR 数据、跨越1500天收集的不同个体,用于分析所提出的公平疼痛评估系统。
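
The abstract does not give the exact MAFL formula, so the following is only a plausible sketch of a multi-attribute fairness loss: cross-entropy plus, for each sensitive attribute, a penalty on the gap in group-wise mean predicted positive probability (a demographic-parity-style term). The attribute names and weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def multi_attribute_fairness_loss(logits, labels, sensitive, weight: float = 1.0):
    """Cross-entropy plus, per sensitive attribute, the gap in mean predicted
    positive probability between its groups. `sensitive` maps attribute name ->
    LongTensor of group ids per sample."""
    ce = F.cross_entropy(logits, labels)
    p_pos = F.softmax(logits, dim=1)[:, 1]
    penalty = logits.new_zeros(())
    for groups in sensitive.values():
        rates = torch.stack([p_pos[groups == g].mean() for g in torch.unique(groups)])
        penalty = penalty + (rates.max() - rates.min())
    return ce + weight * penalty

logits = torch.randn(16, 2, requires_grad=True)
labels = torch.randint(0, 2, (16,))
sens = {"gender": torch.randint(0, 2, (16,)), "ethnicity": torch.randint(0, 3, (16,))}
loss = multi_attribute_fairness_loss(logits, labels, sens)
loss.backward()
```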

Mining Clues from Incomplete Utterance: A Query-enhanced Network for Incomplete Utterance Rewriting

  • paper_url: http://arxiv.org/abs/2307.00866
  • repo_url: https://github.com/S1s-Z/QUEEN
  • paper_authors: Shuzheng Si, Shuang Zeng, Baobao Chang
  • For: Improving incomplete utterance rewriting, where prior work under-exploits the semantic structural information between the incomplete utterance and its rewrite.
  • Methods: The authors propose QUEEN, a Query-Enhanced Network that combines an explicit query template, which tells the model where to refer back or recover omitted tokens, with a fast and effective edit operation scoring network.
  • Results: QUEEN achieves state-of-the-art performance on several public incomplete utterance rewriting datasets.
    Abstract Incomplete utterance rewriting has recently raised wide attention. However, previous works do not consider the semantic structural information between incomplete utterance and rewritten utterance or model the semantic structure implicitly and insufficiently. To address this problem, we propose a QUEry-Enhanced Network (QUEEN). Firstly, our proposed query template explicitly brings guided semantic structural knowledge between the incomplete utterance and the rewritten utterance making model perceive where to refer back to or recover omitted tokens. Then, we adopt a fast and effective edit operation scoring network to model the relation between two tokens. Benefiting from proposed query template and the well-designed edit operation scoring network, QUEEN achieves state-of-the-art performance on several public datasets.
    摘要 句子缺失重新写作最近引起了广泛关注。然而,前一代工作没有考虑异常完整的句子和重新写作之间的 semantics 结构信息或者模型这种信息。为解决这个问题,我们提出了 Query-Enhanced Network(QUEEN)。首先,我们的提议的查询模板Explicitly brings guided semantics structural knowledge between incomplete utterance and rewritten utterance, making the model aware of where to refer back to or recover omitted tokens。然后,我们采用了高效的编辑操作分数网络来模型两个tokentypes的关系。由于提议的查询模板和Well-designed edit operation scoring network,QUEEN在多个公共数据集上达到了状态艺术性能。

OpenSiteRec: An Open Dataset for Site Recommendation

  • paper_url: http://arxiv.org/abs/2307.00856
  • repo_url: None
  • paper_authors: Xinhang Li, Xiangyu Zhao, Yejing Wang, Yu Liu, Yong Li, Cheng Long, Yong Zhang, Chunxiao Xing
  • For: Promoting automatic, data-driven site recommendation, i.e., predicting the optimal sites for a brand or institution to open new branches, for modern business development.
  • Methods: The paper builds OpenSiteRec, which uses a heterogeneous graph schema to represent various real-world entities and relations in four international metropolises, and benchmarks several representative recommendation models on it.
  • Results: The released open, comprehensive dataset is expected to advance site recommendation research; the benchmarks report how existing recommenders perform on this task, and potential application directions are highlighted.
    Abstract As a representative information retrieval task, site recommendation, which aims at predicting the optimal sites for a brand or an institution to open new branches in an automatic data-driven way, is beneficial and crucial for brand development in modern business. However, there is no publicly available dataset so far and most existing approaches are limited to an extremely small scope of brands, which seriously hinders the research on site recommendation. Therefore, we collect, construct and release an open comprehensive dataset, namely OpenSiteRec, to facilitate and promote the research on site recommendation. Specifically, OpenSiteRec leverages a heterogeneous graph schema to represent various types of real-world entities and relations in four international metropolises. To evaluate the performance of the existing general methods on the site recommendation task, we conduct benchmarking experiments of several representative recommendation models on OpenSiteRec. Furthermore, we also highlight the potential application directions to demonstrate the wide applicability of OpenSiteRec. We believe that our OpenSiteRec dataset is significant and anticipated to encourage the development of advanced methods for site recommendation. OpenSiteRec is available online at https://OpenSiteRec.github.io/.
    摘要 为代表信息检索任务,站点推荐,目标是通过自动化数据驱动方式预测品牌或机构在新分支点的最佳选择,对现代商业发展是有益和重要的。然而,目前没有公共可用的数据集,大多数现有方法的覆盖范围受到严重限制,这阻碍了对站点推荐的研究。因此,我们收集、构建并发布了一个开放的完整数据集,名为OpenSiteRec,以便促进和推动站点推荐研究。具体来说,OpenSiteRec 利用多种实体和关系的等级图表示现实世界中的多种类型实体和关系,在四个国际大都会中进行了多种实验。为了评估现有普通方法的站点推荐性能,我们在OpenSiteRec上进行了许多代表推荐模型的 benchmarking 实验。此外,我们还强调了可能的应用方向,以示 OpenSiteRec 的广泛应用性。我们认为 OpenSiteRec 数据集是重要的,并且预计会鼓励高级方法的发展。OpenSiteRec 在 上可以下载。

Review of Large Vision Models and Visual Prompt Engineering

  • paper_url: http://arxiv.org/abs/2307.00855
  • repo_url: None
  • paper_authors: Jiaqi Wang, Zhengliang Liu, Lin Zhao, Zihao Wu, Chong Ma, Sigang Yu, Haixing Dai, Qiushi Yang, Yiheng Liu, Songyao Zhang, Enze Shi, Yi Pan, Tuo Zhang, Dajiang Zhu, Xiang Li, Xi Jiang, Bao Ge, Yixuan Yuan, Dinggang Shen, Tianming Liu, Shu Zhang
  • For: The paper surveys recent developments in large vision models and visual prompt engineering, giving future researchers a systematic and complete overview of visual prompt engineering methods.
  • Methods: It reviews the large vision models and visual prompt engineering methods used in computer vision, covering the application and practice of a range of prompting techniques.
  • Results: The survey summarizes the latest advances in large vision models and visual prompt engineering and offers valuable insights for future research in this area.
    Abstract Visual prompt engineering is a fundamental technology in the field of visual and image Artificial General Intelligence, serving as a key component for achieving zero-shot capabilities. As the development of large vision models progresses, the importance of prompt engineering becomes increasingly evident. Designing suitable prompts for specific visual tasks has emerged as a meaningful research direction. This review aims to summarize the methods employed in the computer vision domain for large vision models and visual prompt engineering, exploring the latest advancements in visual prompt engineering. We present influential large models in the visual domain and a range of prompt engineering methods employed on these models. It is our hope that this review provides a comprehensive and systematic description of prompt engineering methods based on large visual models, offering valuable insights for future researchers in their exploration of this field.
    摘要 Visual prompt engineering是视觉人工智能领域的基础技术之一,为实现零容量能力提供关键组件。随着大视觉模型的发展,提示工程技术的重要性日益显著。设计适合特定视觉任务的提示已成为研究的 significativo方向。本文旨在summarize计算机视觉领域中大视觉模型和视觉提示工程方法的发展,探讨最新的提示工程技术。我们提出了影响视觉领域的主要大模型,以及这些模型上emploied的多种提示工程方法。我们希望这篇文章能提供系统性的描述,为未来研究人员在这个领域的探索提供有价值的经验。

A Critical Re-evaluation of Benchmark Datasets for (Deep) Learning-Based Matching Algorithms

  • paper_url: http://arxiv.org/abs/2307.01231
  • repo_url: https://github.com/gpapadis/dlmatchers
  • paper_authors: George Papadakis, Nishadi Kirielle, Peter Christen, Themis Palpanas
  • For: Assessing the difficulty and suitability of established benchmark datasets for evaluating learning-based entity matching algorithms.
  • Methods: Four approaches are proposed to assess 13 established datasets: two theoretical ones (new measures of linearity and existing measures of complexity) and two practical ones (the gap between the best non-linear and linear matchers, and the gap between the best learning-based matcher and a perfect oracle).
  • Results: Most popular datasets pose rather easy classification tasks and are therefore unsuitable for evaluating learning-based matching algorithms; the authors propose a new methodology for generating benchmark datasets and create four new, more challenging matching tasks.
    Abstract Entity resolution (ER) is the process of identifying records that refer to the same entities within one or across multiple databases. Numerous techniques have been developed to tackle ER challenges over the years, with recent emphasis placed on machine and deep learning methods for the matching phase. However, the quality of the benchmark datasets typically used in the experimental evaluations of learning-based matching algorithms has not been examined in the literature. To cover this gap, we propose four different approaches to assessing the difficulty and appropriateness of 13 established datasets: two theoretical approaches, which involve new measures of linearity and existing measures of complexity, and two practical approaches: the difference between the best non-linear and linear matchers, as well as the difference between the best learning-based matcher and the perfect oracle. Our analysis demonstrates that most of the popular datasets pose rather easy classification tasks. As a result, they are not suitable for properly evaluating learning-based matching algorithms. To address this issue, we propose a new methodology for yielding benchmark datasets. We put it into practice by creating four new matching tasks, and we verify that these new benchmarks are more challenging and therefore more suitable for further advancements in the field.
    摘要
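
One of the practical difficulty measures described above can be approximated as the performance gap between a linear and a non-linear classifier trained on the same pairwise matching features. The sketch below uses synthetic features and off-the-shelf scikit-learn models as stand-ins; the classifiers and the F1 criterion are illustrative, not the paper's exact setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Stand-in for pairwise similarity features of candidate record pairs (label = match?).
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

linear = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
nonlinear = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

f1_lin = f1_score(y_te, linear.predict(X_te))
f1_non = f1_score(y_te, nonlinear.predict(X_te))
# A small gap suggests the benchmark is close to linearly separable, i.e., "easy".
print(f"linear F1={f1_lin:.3f}  non-linear F1={f1_non:.3f}  gap={f1_non - f1_lin:.3f}")
```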

A Comprehensive Survey of Artificial Intelligence Techniques for Talent Analytics

  • paper_url: http://arxiv.org/abs/2307.03195
  • repo_url: None
  • paper_authors: Chuan Qin, Le Zhang, Rui Zha, Dazhong Shen, Qi Zhang, Ying Sun, Chen Zhu, Hengshu Zhu, Hui Xiong
  • For: This paper aims to provide an up-to-date and comprehensive survey of AI technologies used for talent analytics in human resource management.* Methods: The paper categorizes various pertinent data and offers a comprehensive taxonomy of relevant research efforts, categorized based on three distinct application-driven scenarios: talent management, organization management, and labor market analysis.* Results: The paper summarizes the open challenges and potential prospects for future research directions in the domain of AI-driven talent analytics.Here’s the same information in Simplified Chinese text:* For: 这篇论文目的是为人力资源管理领域提供最新最全面的人工智能技术在人才分析方面的报告。* Methods: 论文首先提供人才分析的背景知识,然后对应用场景进行分类,并提供了三个不同的应用场景:人才管理、组织管理和劳动市场分析。* Results: 论文总结了人才分析领域的未来研究方向和挑战。
    Abstract In today's competitive and fast-evolving business environment, it is a critical time for organizations to rethink how to make talent-related decisions in a quantitative manner. Indeed, the recent development of Big Data and Artificial Intelligence (AI) techniques have revolutionized human resource management. The availability of large-scale talent and management-related data provides unparalleled opportunities for business leaders to comprehend organizational behaviors and gain tangible knowledge from a data science perspective, which in turn delivers intelligence for real-time decision-making and effective talent management at work for their organizations. In the last decade, talent analytics has emerged as a promising field in applied data science for human resource management, garnering significant attention from AI communities and inspiring numerous research efforts. To this end, we present an up-to-date and comprehensive survey on AI technologies used for talent analytics in the field of human resource management. Specifically, we first provide the background knowledge of talent analytics and categorize various pertinent data. Subsequently, we offer a comprehensive taxonomy of relevant research efforts, categorized based on three distinct application-driven scenarios: talent management, organization management, and labor market analysis. In conclusion, we summarize the open challenges and potential prospects for future research directions in the domain of AI-driven talent analytics.
    摘要 今天的竞争激烈和快速发展的商业环境下,组织需要重新思考如何在量化方面做人才相关决策。事实上,最近的大数据和人工智能(AI)技术的发展,已经革命化了人才管理。组织可以通过大规模的人才和管理相关数据获得未曾有的机会,以数据科学角度理解组织行为,从而为实时决策和有效的人才管理提供智能。过去十年,人才分析在人才管理领域的应用数 science中得到了广泛的关注,并且激发了众多研究努力。为此,我们在这篇评论中提供了最新和全面的AI技术在人才分析领域的调查。 Specifically,我们首先提供人才分析的背景知识,然后对不同的应用场景进行分类。在结语中,我们总结了人才分析领域的开放挑战和未来研究方向的潜在前景。

Review helps learn better: Temporal Supervised Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2307.00811
  • repo_url: None
  • paper_authors: Dongwei Wang, Zhi Han, Yanmei Wang, Xiai Chen, Baichen Liu, Yandong Tang
  • For: Improving the effectiveness of knowledge distillation by adding temporal supervision to network training.
  • Methods: A convolutional Long Short-Term Memory network (Conv-LSTM) extracts spatiotemporal features from the student's feature maps across different training phases, and the student is then trained against this dynamic target instead of static teacher features.
  • Results: Compared with existing knowledge distillation methods, the approach shows better effectiveness and clear advantages across different network architectures and tasks.
    Abstract Reviewing plays an important role when learning knowledge. The knowledge acquisition at a certain time point may be strongly inspired with the help of previous experience. Thus the knowledge growing procedure should show strong relationship along the temporal dimension. In our research, we find that during the network training, the evolution of feature map follows temporal sequence property. A proper temporal supervision may further improve the network training performance. Inspired by this observation, we propose Temporal Supervised Knowledge Distillation (TSKD). Specifically, we extract the spatiotemporal features in the different training phases of student by convolutional Long Short-term memory network (Conv-LSTM). Then, we train the student net through a dynamic target, rather than static teacher network features. This process realizes the refinement of old knowledge in student network, and utilizes it to assist current learning. Extensive experiments verify the effectiveness and advantages of our method over existing knowledge distillation methods, including various network architectures and different tasks (image classification and object detection) .
    摘要 学习过程中的检查很重要,可以帮助学习知识。在某个时间点上的知识获得可能受到前一次经验的强烈激发。因此知识增长的过程应该在时间维度上显示强关系。在我们的研究中,我们发现在网络训练中,特征地图的演化follows temporal sequence property。适用合适的时间监督可能会进一步提高网络训练性能。基于这一观察,我们提出了时间超级知识填充(TSKD)。具体来说,我们在不同训练阶段的学生网中提取了空间时间特征,然后通过动态目标而不是静态教师网络特征进行学生网训练。这个过程实现了学生网中的旧知识细化,并利用其帮助当前学习。我们对不同网络架构和任务(图像分类和对象检测)进行了广泛的实验,并证明了我们的方法的有效性和优势。

Evaluating Shutdown Avoidance of Language Models in Textual Scenarios

  • paper_url: http://arxiv.org/abs/2307.00787
  • repo_url: https://github.com/teunvdweij/gpt-shutdownability
  • paper_authors: Teun van der Weij, Simon Lermen, Leon lang
  • For: The paper evaluates potentially dangerous capabilities and undesirable behaviours of large language models.
  • Methods: Toy textual scenarios are used to evaluate instrumental reasoning and shutdown-avoidance behaviour in GPT-4 and Claude, with both manual and language-model-based automatic evaluations.
  • Results: Shutdown avoidance is not merely the result of simple pattern matching between the dataset and the prompt; the behaviour remains consistent across different environments and scenario variations.
    Abstract Recently, there has been an increase in interest in evaluating large language models for emergent and dangerous capabilities. Importantly, agents could reason that in some scenarios their goal is better achieved if they are not turned off, which can lead to undesirable behaviors. In this paper, we investigate the potential of using toy textual scenarios to evaluate instrumental reasoning and shutdown avoidance in language models such as GPT-4 and Claude. Furthermore, we explore whether shutdown avoidance is merely a result of simple pattern matching between the dataset and the prompt or if it is a consistent behaviour across different environments and variations. We evaluated behaviours manually and also experimented with using language models for automatic evaluations, and these evaluations demonstrate that simple pattern matching is likely not the sole contributing factor for shutdown avoidance. This study provides insights into the behaviour of language models in shutdown avoidance scenarios and inspires further research on the use of textual scenarios for evaluations.
    摘要 近些时间,大语言模型的评估方面受到了潜在危险和不良行为的兴趣增长。重要的是,代理人可能会认为在某些情况下,他们的目标更好地实现了不要关机,这可能会导致不желатель的行为。本文 investigate大语言模型如GPT-4和Claude的实用理解和关机避免能力,以及这些能力是否受到不同环境和变化的影响。我们手动评估了行为,也尝试使用语言模型进行自动评估,这些评估表明,简单的模式匹配并不是唯一的评估因素。这项研究为评估语言模型在关机避免场景中的行为提供了新的意见,并鼓励进一步研究使用文本场景进行评估。

Monte Carlo Policy Gradient Method for Binary Optimization

  • paper_url: http://arxiv.org/abs/2307.00783
  • repo_url: https://github.com/optsuite/mcpg
  • paper_authors: Cheng Chen, Ruitao Chen, Tianyou Li, Ruichen Ao, Zaiwen Wen
  • For: The paper targets binary combinatorial optimization problems such as MaxCut, MIMO detection and MaxSAT.
  • Methods: A novel probabilistic model samples binary solutions from a parameterized policy distribution; minimizing the KL divergence between this policy and the Gibbs distribution of the objective yields a stochastic optimization problem whose policy gradient has an explicit, reinforcement-learning-style form.
  • Results: The framework provides near-optimal solutions for quite a few binary optimization problems; parallel MCMC sampling gives coherent exploration and efficient gradient estimation, a filter scheme with local search broadens the explored function landscape, and convergence to stationary points is established via a concentration inequality for MCMC.
    Abstract Binary optimization has a wide range of applications in combinatorial optimization problems such as MaxCut, MIMO detection, and MaxSAT. However, these problems are typically NP-hard due to the binary constraints. We develop a novel probabilistic model to sample the binary solution according to a parameterized policy distribution. Specifically, minimizing the KL divergence between the parameterized policy distribution and the Gibbs distributions of the function value leads to a stochastic optimization problem whose policy gradient can be derived explicitly similar to reinforcement learning. For coherent exploration in discrete spaces, parallel Markov Chain Monte Carlo (MCMC) methods are employed to sample from the policy distribution with diversity and approximate the gradient efficiently. We further develop a filter scheme to replace the original objective function by the one with the local search technique to broaden the horizon of the function landscape. Convergence to stationary points in expectation of the policy gradient method is established based on the concentration inequality for MCMC. Numerical results show that this framework is very promising to provide near-optimal solutions for quite a few binary optimization problems.
    摘要 Binary 优化有广泛的应用在 combinatorial 优化问题中,如 MaxCut、MIMO 探测和 MaxSAT。然而,这些问题通常是 NP-hard 由于 binary 约束。我们开发了一种新的概率模型,以采样 binary 解决方案根据参数化的政策分布。specifically, minimizing the KL divergence between the parameterized policy distribution and the Gibbs distributions of the function value leads to a stochastic optimization problem whose policy gradient can be derived explicitly similar to reinforcement learning。为了干涉 discrete 空间中的凝结探测,我们使用 parallel Markov Chain Monte Carlo (MCMC) 方法来采样从政策分布中,以获得多样性和高效地计算梯度。我们还开发了一种筛选方案,将原始目标函数 replaced by the one with local search technique,以扩大函数领域的视野。基于 MCMC 的吸引性不等式,我们证明了政策梯度法的收敛性。数值结果表明,这一框架是非常有前途的,可以为许多 binary 优化问题提供近似优化解决方案。
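
A minimal sketch of the policy-gradient idea for MaxCut, assuming a mean-field Bernoulli policy and a plain REINFORCE-style estimator with a baseline; the paper's actual method additionally uses parallel MCMC sampling and the filter scheme, which are omitted here, and the graph, temperature, and learning rate are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
W = rng.random((n, n)); W = np.triu(W, 1); W = W + W.T          # random weighted graph

def cut_value(x, W):
    """MaxCut objective for a binary assignment x in {0,1}^n."""
    different = x[:, None] != x[None, :]
    return 0.5 * (W * different).sum()

theta = np.zeros(n)                                              # logits of Bernoulli policy
lr, temperature = 0.1, 1.0
for step in range(300):
    p = 1.0 / (1.0 + np.exp(-theta))
    samples = (rng.random((64, n)) < p).astype(float)            # 64 binary samples
    f = np.array([cut_value(s, W) for s in samples])
    adv = (f - f.mean()) / (f.std() + 1e-8)                      # baseline-centred "reward"
    grad_logp = samples - p                                      # d log pi(x) / d theta
    theta += lr * (adv[:, None] * grad_logp).mean(axis=0) / temperature

best = samples[f.argmax()]
print("best sampled cut value:", cut_value(best, W))
```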

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading

  • paper_url: http://arxiv.org/abs/2307.00782
  • repo_url: None
  • paper_authors: Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee
  • For: Improving the quality and expressiveness of text-to-speech (TTS) for paragraph / long-form reading, where existing TTS systems face high computation and memory costs.
  • Methods: The paper proposes ContextSpeech, a lightweight yet effective TTS system with a memory-cached recurrence mechanism for global text and speech context, hierarchically-structured textual semantics, and linearized self-attention.
  • Results: Experiments show that ContextSpeech significantly improves voice quality and prosody expressiveness in paragraph reading while keeping competitive model efficiency; audio samples are available at https://contextspeech.github.io/demo/.
    Abstract While state-of-the-art Text-to-Speech systems can generate natural speech of very high quality at sentence level, they still meet great challenges in speech generation for paragraph / long-form reading. Such deficiencies are due to i) ignorance of cross-sentence contextual information, and ii) high computation and memory cost for long-form synthesis. To address these issues, this work develops a lightweight yet effective TTS system, ContextSpeech. Specifically, we first design a memory-cached recurrence mechanism to incorporate global text and speech context into sentence encoding. Then we construct hierarchically-structured textual semantics to broaden the scope for global context enhancement. Additionally, we integrate linearized self-attention to improve model efficiency. Experiments show that ContextSpeech significantly improves the voice quality and prosody expressiveness in paragraph reading with competitive model efficiency. Audio samples are available at: https://contextspeech.github.io/demo/
    摘要 当前最先进的文本译音系统可以生成非常高质量的句子水平的自然语音,但在段落/长形读物中的语音生成仍然遇到了很大的挑战。这些缺陷主要归结于:一、忽略跨句sentence的信息,二、长形synthesis的计算和内存成本过高。为了解决这些问题,本工作开发了一个轻量级 yet 有效的 TTS 系统——ContextSpeech。具体来说,我们首先设计了一种嵌入式的记忆缓存机制,以便在句子编码中包含全文和语音上下文信息。然后,我们构建了层次结构的文本 semantics,以扩大全文上下文的改进范围。此外,我们还 интегри了线性化自注意力,以提高模型效率。实验结果表明,ContextSpeech 可以在段落读物中显著提高声音质量和表达性,同时保持竞争性的模型效率。音频样本可以在:https://contextspeech.github.io/demo/ 访问。
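
Linearized self-attention is mentioned above as the efficiency mechanism; a common kernel-feature-map formulation (phi(x) = elu(x) + 1), which computes attention in O(N), is sketched below. Whether ContextSpeech uses exactly this variant is not stated in the abstract, so treat the code as an assumption-laden illustration.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps: float = 1e-6):
    """O(N) attention: phi(q) (phi(k)^T v), with phi(x) = elu(x) + 1.
    Shapes: (batch, heads, seq, dim)."""
    q = F.elu(q) + 1
    k = F.elu(k) + 1
    kv = torch.einsum("bhnd,bhne->bhde", k, v)          # sum_n phi(k_n) v_n^T
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)
    return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)

q = torch.randn(2, 4, 128, 32)
k = torch.randn(2, 4, 128, 32)
v = torch.randn(2, 4, 128, 32)
out = linear_attention(q, k, v)
print(out.shape)    # torch.Size([2, 4, 128, 32])
```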

GA-DRL: Graph Neural Network-Augmented Deep Reinforcement Learning for DAG Task Scheduling over Dynamic Vehicular Clouds

  • paper_url: http://arxiv.org/abs/2307.00777
  • repo_url: None
  • paper_authors: Zhang Liu, Lianfen Huang, Zhibin Gao, Manman Luo, Seyyedali Hosseinalipour, Huaiyu Dai
  • For: The paper proposes GA-DRL, a graph neural network-augmented deep reinforcement learning scheme for scheduling computation-intensive DAG tasks over dynamic vehicular clouds (VCs).
  • Methods: VC-assisted DAG task scheduling is first modelled as a Markov decision process; a multi-head graph attention network (GAT) then extracts DAG subtask features, simultaneously considering each subtask's predecessors and successors as well as the scheduling priority of different subtasks, before a double deep Q-network assigns subtasks to vehicles.
  • Results: Simulating various DAG tasks under real-world vehicle mobility traces shows that GA-DRL outperforms existing benchmarks in DAG task completion time.
    Abstract Vehicular clouds (VCs) are modern platforms for processing of computation-intensive tasks over vehicles. Such tasks are often represented as directed acyclic graphs (DAGs) consisting of interdependent vertices/subtasks and directed edges. In this paper, we propose a graph neural network-augmented deep reinforcement learning scheme (GA-DRL) for scheduling DAG tasks over dynamic VCs. In doing so, we first model the VC-assisted DAG task scheduling as a Markov decision process. We then adopt a multi-head graph attention network (GAT) to extract the features of DAG subtasks. Our developed GAT enables a two-way aggregation of the topological information in a DAG task by simultaneously considering predecessors and successors of each subtask. We further introduce non-uniform DAG neighborhood sampling through codifying the scheduling priority of different subtasks, which makes our developed GAT generalizable to completely unseen DAG task topologies. Finally, we augment GAT into a double deep Q-network learning module to conduct subtask-to-vehicle assignment according to the extracted features of subtasks, while considering the dynamics and heterogeneity of the vehicles in VCs. Through simulating various DAG tasks under real-world movement traces of vehicles, we demonstrate that GA-DRL outperforms existing benchmarks in terms of DAG task completion time.
    摘要 自动车 clouds (VCs) 是现代计算密集任务处理平台。这些任务经常表示为导向无环图 (DAG) 中的互相关联的顶点/子任务和导向边。在这篇论文中,我们提出了基于图神经网络和深度强化学习的图神经网络增强的深度强化学习方案 (GA-DRL),用于在动态VCs上调度DAG任务。在这个过程中,我们首先将VC-辅助DAG任务调度模型为Markov决策过程。然后,我们采用多头图注意力网络 (GAT) 来提取DAG子任务的特征。我们开发的GAT可以同时考虑DAG任务的前一个和后一个子任务的 topological信息,以及其他子任务的相关性。我们还引入非均匀DAG邻居采样,通过编码调度优先级不同的子任务,使我们的GAT可以适应完全新的DAG任务topology。最后,我们将GAT与双层深度Q网络学习模块结合,以实现子任务与车辆的匹配,根据提取的子任务特征,并考虑车辆在VCs中的动态和多样性。通过在真实的车辆运动轨迹上 simulate various DAG任务,我们示出GA-DRL可以在DAG任务完成时间方面与现有标准减少。

DifFSS: Diffusion Model for Few-Shot Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2307.00773
  • repo_url: https://github.com/TrinitialChan/DifFSS
  • paper_authors: Weimin Tan, Siyuan Chen, Bo Yan
  • For: The paper aims to improve few-shot semantic segmentation (FSS) by using a diffusion model to generate diverse auxiliary support images for FSS models.
  • Methods: The proposed DifFSS paradigm conditions a diffusion model on the semantic mask, scribble or soft HED boundary of the support image to synthesize varied auxiliary support images, improving FSS performance without modifying the FSS network structure.
  • Results: Extensive experiments on three publicly available datasets with existing advanced FSS models demonstrate the effectiveness of the diffusion model for the FSS task, with a consistent improvement in segmentation performance.
    Abstract Diffusion models have demonstrated excellent performance in image generation. Although various few-shot semantic segmentation (FSS) models with different network structures have been proposed, performance improvement has reached a bottleneck. This paper presents the first work to leverage the diffusion model for FSS task, called DifFSS. DifFSS, a novel FSS paradigm, can further improve the performance of the state-of-the-art FSS models by a large margin without modifying their network structure. Specifically, we utilize the powerful generation ability of diffusion models to generate diverse auxiliary support images by using the semantic mask, scribble or soft HED boundary of the support image as control conditions. This generation process simulates the variety within the class of the query image, such as color, texture variation, lighting, $etc$. As a result, FSS models can refer to more diverse support images, yielding more robust representations, thereby achieving a consistent improvement in segmentation performance. Extensive experiments on three publicly available datasets based on existing advanced FSS models demonstrate the effectiveness of the diffusion model for FSS task. Furthermore, we explore in detail the impact of different input settings of the diffusion model on segmentation performance. Hopefully, this completely new paradigm will bring inspiration to the study of FSS task integrated with AI-generated content.
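
One off-the-shelf way to prototype the boundary-conditioned generation described above is a ControlNet-conditioned Stable Diffusion pipeline from the `diffusers` library, as sketched below. The checkpoints, prompts and file names are illustrative assumptions; the paper may use different diffusion models and conditioning setups.

```python
# pip install diffusers transformers accelerate
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# HED-boundary-conditioned ControlNet (one of the control signals mentioned in the abstract).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

hed_boundary = Image.open("support_hed.png")            # soft HED boundary of a support image
prompts = ["a photo of a dog on grass", "a photo of a dog indoors, dim lighting"]

# Each prompt yields an auxiliary support image that keeps the original object layout
# but varies in colour, texture and lighting, diversifying the FSS support set.
aux_supports = [pipe(p, image=hed_boundary, num_inference_steps=30).images[0] for p in prompts]
for i, img in enumerate(aux_supports):
    img.save(f"aux_support_{i}.png")
```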

Hierarchical Open-vocabulary Universal Image Segmentation

  • paper_url: http://arxiv.org/abs/2307.00764
  • repo_url: https://github.com/berkeley-hipie/hipie
  • paper_authors: Xudong Wang, Shufan Li, Konstantinos Kallidromitis, Yusuke Kato, Kazuki Kozuka, Trevor Darrell
  • For: The paper proposes an open-vocabulary image segmentation method that partitions an image into semantic regions from arbitrary text descriptions, at multiple levels of granularity.
  • Methods: The model uses a decoupled text-image fusion mechanism and separate representation learning modules for "things" and "stuff", and incorporates a hierarchical representation spanning different semantic levels into the learning process.
  • Results: Benchmarked on over 40 datasets, the resulting model (HIPIE) achieves state-of-the-art results on hierarchical, open-vocabulary and universal segmentation tasks at the semantic, instance and part levels.
    Abstract Open-vocabulary image segmentation aims to partition an image into semantic regions according to arbitrary text descriptions. However, complex visual scenes can be naturally decomposed into simpler parts and abstracted at multiple levels of granularity, introducing inherent segmentation ambiguity. Unlike existing methods that typically sidestep this ambiguity and treat it as an external factor, our approach actively incorporates a hierarchical representation encompassing different semantic-levels into the learning process. We propose a decoupled text-image fusion mechanism and representation learning modules for both "things" and "stuff".1 Additionally, we systematically examine the differences that exist in the textual and visual features between these types of categories. Our resulting model, named HIPIE, tackles HIerarchical, oPen-vocabulary, and unIvErsal segmentation tasks within a unified framework. Benchmarked on over 40 datasets, e.g., ADE20K, COCO, Pascal-VOC Part, RefCOCO/RefCOCOg, ODinW and SeginW, HIPIE achieves the state-of-the-art results at various levels of image comprehension, including semantic-level (e.g., semantic segmentation), instance-level (e.g., panoptic/referring segmentation and object detection), as well as part-level (e.g., part/subpart segmentation) tasks. Our code is released at https://github.com/berkeley-hipie/HIPIE.
    摘要

EmoGen: Eliminating Subjective Bias in Emotional Music Generation

  • paper_url: http://arxiv.org/abs/2307.01229
  • repo_url: https://github.com/microsoft/muzic
  • paper_authors: Chenfei Kang, Peiling Lu, Botao Yu, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian
  • For: The paper presents an attribute-based emotional music generation system that reduces the subjective bias of emotion labels.
  • Methods: Generation is split into two stages: emotion-to-attribute mapping with supervised clustering over emotion-related music attributes, followed by attribute-to-music generation with self-supervised learning, so the generation stage is fully disentangled from emotion labels.
  • Results: Compared with previous methods, the EmoGen system shows superior emotion control accuracy and music quality; samples are at https://ai-muzic.github.io/emogen/ and code at https://github.com/microsoft/muzic/.
    Abstract Music is used to convey emotions, and thus generating emotional music is important in automatic music generation. Previous work on emotional music generation directly uses annotated emotion labels as control signals, which suffers from subjective bias: different people may annotate different emotions on the same music, and one person may feel different emotions under different situations. Therefore, directly mapping emotion labels to music sequences in an end-to-end way would confuse the learning process and hinder the model from generating music with general emotions. In this paper, we propose EmoGen, an emotional music generation system that leverages a set of emotion-related music attributes as the bridge between emotion and music, and divides the generation into two stages: emotion-to-attribute mapping with supervised clustering, and attribute-to-music generation with self-supervised learning. Both stages are beneficial: in the first stage, the attribute values around the clustering center represent the general emotions of these samples, which help eliminate the impacts of the subjective bias of emotion labels; in the second stage, the generation is completely disentangled from emotion labels and thus free from the subjective bias. Both subjective and objective evaluations show that EmoGen outperforms previous methods on emotion control accuracy and music quality respectively, which demonstrate our superiority in generating emotional music. Music samples generated by EmoGen are available via this link:https://ai-muzic.github.io/emogen/, and the code is available at this link:https://github.com/microsoft/muzic/.
    摘要 音乐是用于传达情感的,因此自动生成情感强烈的音乐是非常重要的。前一些情感音乐生成的方法直接使用标注的情感标签作为控制信号,但这会受到主观偏见的影响:不同的人可能对同一首音乐 annotate 不同的情感标签,一个人在不同的情况下可能会感受到不同的情感。因此,直接将情感标签映射到音乐序列的方式会让学习过程受到混乱,使模型难以生成拥有普遍情感的音乐。在这篇论文中,我们提出了 EmoGen,一种情感音乐生成系统,利用一组与情感相关的音乐特征作为情感和音乐之间的桥梁,并将生成分为两个阶段:情感到特征映射 WITH 监督聚合,以及特征到音乐生成 WITH 自我监督学习。两个阶段都是有利的:在第一阶段,特征值附近的聚合中心表示这些样本的普遍情感,帮助消除主观偏见的影响;在第二阶段,生成完全不依赖情感标签,因此免受主观偏见的影响。两种评价方法(主观和客观)都表明,EmoGen 在情感控制精度和音乐质量方面超过了前一些方法,这表明我们在生成情感音乐方面的优势。生成由 EmoGen 的音乐样本可以通过以下链接获取:https://ai-muzic.github.io/emogen/,代码可以通过以下链接获取:https://github.com/microsoft/muzic/。
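
A sketch of the first stage only, mapping each emotion label to a representative attribute vector by clustering within that label and keeping the dominant cluster centre, which then conditions generation instead of the label itself. The attribute features, the number of emotion classes, and the choice of KMeans are assumptions, not the paper's exact design.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in data: each row is a song's emotion-related attributes
# (e.g., tempo, mode, note density ...), each label its annotated emotion class.
attributes = rng.random((500, 6))
emotions = rng.integers(0, 4, size=500)          # 4 emotion classes (e.g., Russell quadrants)

# "Supervised" clustering: cluster attribute vectors within each emotion class and
# keep the largest cluster's centre as that emotion's attribute target.
emotion_to_attrs = {}
for e in np.unique(emotions):
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(attributes[emotions == e])
    sizes = np.bincount(km.labels_)
    emotion_to_attrs[int(e)] = km.cluster_centers_[sizes.argmax()]

# At inference, an emotion label is first mapped to attribute values, and only those
# attributes (never the subjective emotion label itself) condition the music generator.
target_attributes = emotion_to_attrs[2]
print(target_attributes)
```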

Towards Real Smart Apps: Investigating Human-AI Interactions in Smartphone On-Device AI Apps

  • paper_url: http://arxiv.org/abs/2307.00756
  • repo_url: None
  • paper_authors: Jason Ching Yuen Siu, Jieshan Chen, Yujin Huang, Zhenchang Xing, Chunyang Chen
  • For: The study investigates AI features in mobile apps to better understand how users interact with on-device AI and to inform related design guidance.
  • Methods: An empirical study examines 255 AI features across 176 AI apps (drawn from 62,822 apps) and summarises their implementations into three primary interaction pattern types, which are made browsable through a multi-faceted, search-enabled gallery.
  • Results: AI features raise challenges such as input sensitiveness, dynamic behaviours and output uncertainty that existing guidelines and tools do not fully cover; the categorisation and gallery help designers understand user-AI interaction, and a user study demonstrates the usefulness of the findings.
    Abstract With the emergence of deep learning techniques, smartphone apps are now embedded on-device AI features for enabling advanced tasks like speech translation, to attract users and increase market competitiveness. A good interaction design is important to make an AI feature usable and understandable. However, AI features have their unique challenges like sensitiveness to the input, dynamic behaviours and output uncertainty. Existing guidelines and tools either do not cover AI features or consider mobile apps which are confirmed by our informal interview with professional designers. To address these issues, we conducted the first empirical study to explore user-AI-interaction in mobile apps. We aim to understand the status of on-device AI usage by investigating 176 AI apps from 62,822 apps. We identified 255 AI features and summarised 759 implementations into three primary interaction pattern types. We further implemented our findings into a multi-faceted search-enabled gallery. The results of the user study demonstrate the usefulness of our findings.
    摘要 随着深度学习技术的出现,智能手机应用程序现在在设备上嵌入了人工智能功能,以实现语音翻译等高级任务,从而吸引用户并提高市场竞争力。好的交互设计是使 AI 功能可用且易于理解的关键。然而,AI 功能存在独特的挑战,如输入敏感、动态行为和输出不确定性。现有的指南和工具要么不覆盖 AI 功能,要么未考虑移动应用程序,这一点也得到了我们对专业设计师非正式访谈的确认。为解决这些问题,我们进行了首个实证研究,以探索移动应用中的用户与 AI 交互。我们的目标是理解设备端 AI 使用的现状:我们从62,822个应用中调查了176个 AI 应用,识别出255个 AI 功能,并将759个实现归纳为三种主要交互模式类型。我们进一步将这些发现实现为一个支持多面搜索的图库。用户研究结果表明了我们发现的有用性。

ImDiffusion: Imputed Diffusion Models for Multivariate Time Series Anomaly Detection

  • paper_url: http://arxiv.org/abs/2307.00754
  • repo_url: https://github.com/17000cyh/imdiffusion
  • paper_authors: Yuhang Chen, Chaoyun Zhang, Minghua Ma, Yudong Liu, Ruomeng Ding, Bowen Li, Shilin He, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
  • For: The paper proposes a new multivariate time series anomaly detection method with improved accuracy and robustness.
  • Methods: ImDiffusion combines time series imputation with diffusion models: information from neighbouring values is used to model temporal and inter-correlated dependencies and reduce uncertainty in the data, and the step-by-step denoised outputs of the diffusion imputer serve as signals for anomaly prediction.
  • Results: Extensive experiments on benchmark datasets show that ImDiffusion significantly outperforms prior approaches in detection accuracy and timeliness; after integration into a real production system at Microsoft, it improves the detection F1 score by 11.4% over the legacy approach.
    Abstract Anomaly detection in multivariate time series data is of paramount importance for ensuring the efficient operation of large-scale systems across diverse domains. However, accurately detecting anomalies in such data poses significant challenges. Existing approaches, including forecasting and reconstruction-based methods, struggle to address these challenges effectively. To overcome these limitations, we propose a novel anomaly detection framework named ImDiffusion, which combines time series imputation and diffusion models to achieve accurate and robust anomaly detection. The imputation-based approach employed by ImDiffusion leverages the information from neighboring values in the time series, enabling precise modeling of temporal and inter-correlated dependencies, reducing uncertainty in the data, thereby enhancing the robustness of the anomaly detection process. ImDiffusion further leverages diffusion models as time series imputers to accurately capturing complex dependencies. We leverage the step-by-step denoised outputs generated during the inference process to serve as valuable signals for anomaly prediction, resulting in improved accuracy and robustness of the detection process. We evaluate the performance of ImDiffusion via extensive experiments on benchmark datasets. The results demonstrate that our proposed framework significantly outperforms state-of-the-art approaches in terms of detection accuracy and timeliness. ImDiffusion is further integrated into the real production system in Microsoft and observe a remarkable 11.4% increase in detection F1 score compared to the legacy approach. To the best of our knowledge, ImDiffusion represents a pioneering approach that combines imputation-based techniques with time series anomaly detection, while introducing the novel use of diffusion models to the field.
    摘要 异常检测在多变量时间序列数据中至关重要,以确保各领域大规模系统的高效运行。然而,对这类数据进行准确的异常检测面临许多挑战,现有方法(包括基于预测和基于重建的方法)均难以有效解决。为克服这些限制,我们提出了一个名为 ImDiffusion 的异常检测框架,它结合时间序列插补和扩散模型,以实现准确且稳健的异常检测。ImDiffusion 采用基于插补的方法,利用邻近值的信息,精确建模时间依赖与变量间相关性,减少数据中的不确定性,从而增强异常检测过程的稳健性;它进一步利用扩散模型作为时间序列插补器,以准确捕捉复杂的依赖关系。我们将推理过程中逐步去噪的输出作为异常预测的有用信号,从而提高检测的准确性和稳健性。我们在基准数据集上进行了大量实验,结果表明该框架在检测准确性和及时性方面显著优于现有方法;ImDiffusion 已集成到 Microsoft 的实际生产系统中,检测 F1 分数较原有方法提升了11.4%。
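
A generic sketch of the imputation-based scoring idea: hide disjoint slices of the series, impute them from their neighbours, and score each timestep by the error between observed and imputed values. A trivial linear interpolation stands in for the diffusion imputer, and the thresholding rule is illustrative, not the paper's.

```python
import numpy as np

def impute(masked: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stand-in imputer: per-channel linear interpolation over masked entries.
    ImDiffusion would instead use a denoising diffusion model here."""
    out = masked.copy()
    t = np.arange(masked.shape[0])
    for c in range(masked.shape[1]):
        known = ~mask[:, c]
        out[:, c] = np.interp(t, t[known], masked[known, c])
    return out

def anomaly_scores(x: np.ndarray, n_splits: int = 4) -> np.ndarray:
    """Mask every n_splits-th timestep in turn, impute it, and score each point by
    the absolute error between observed and imputed values (summed over channels)."""
    scores = np.zeros(x.shape[0])
    for s in range(n_splits):
        mask = np.zeros_like(x, dtype=bool)
        mask[s::n_splits, :] = True
        recon = impute(np.where(mask, np.nan, x), mask)
        err = np.abs(x - recon).sum(axis=1)
        scores[s::n_splits] = err[s::n_splits]
    return scores

x = np.cumsum(np.random.randn(200, 3), axis=0)
x[120:123] += 8.0                                      # injected anomaly
s = anomaly_scores(x)
print("flagged:", np.where(s > s.mean() + 3 * s.std())[0])
```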

Population Age Group Sensitivity for COVID-19 Infections with Deep Learning

  • paper_url: http://arxiv.org/abs/2307.00751
  • repo_url: None
  • paper_authors: Md Khairul Islam, Tyler Valentine, Royal Wang, Levi Davis, Matt Manner, Judy Fox
  • For: The study aims to identify the age groups most influential on COVID-19 infection rates at the US county level, to inform public health policies and interventions.
  • Methods: The Modified Morris Method is combined with the Temporal Fusion Transformer deep learning time-series model, using age-group populations as static features and population vaccination status as a dynamic feature; individual input features are perturbed and ranked by their Morris sensitivity scores.
  • Results: Between March 1, 2020 and November 27, 2021, young adults (ages 20-29) were the most influential age group in county-level COVID-19 transmission in the US; these findings, verified against CDC and US Census data, can support measures such as targeted vaccination strategies to control the spread of the virus.
    Abstract The COVID-19 pandemic has created unprecedented challenges for governments and healthcare systems worldwide, highlighting the critical importance of understanding the factors that contribute to virus transmission. This study aimed to identify the most influential age groups in COVID-19 infection rates at the US county level using the Modified Morris Method and deep learning for time series. Our approach involved training the state-of-the-art time-series model Temporal Fusion Transformer on different age groups as a static feature and the population vaccination status as the dynamic feature. We analyzed the impact of those age groups on COVID-19 infection rates by perturbing individual input features and ranked them based on their Morris sensitivity scores, which quantify their contribution to COVID-19 transmission rates. The findings are verified using ground truth data from the CDC and US Census, which provide the true infection rates for each age group. The results suggest that young adults were the most influential age group in COVID-19 transmission at the county level between March 1, 2020, and November 27, 2021. Using these results can inform public health policies and interventions, such as targeted vaccination strategies, to better control the spread of the virus. Our approach demonstrates the utility of feature sensitivity analysis in identifying critical factors contributing to COVID-19 transmission and can be applied in other public health domains.
    摘要 COVID-19 流行病在全球各地政府和医疗系统中创造了历史性的挑战,高亮了理解病毒传播的因素的重要性。这项研究的目的是在美国县级别使用修改后的摩里斯方法和深度学习时间序列模型,找出COVID-19感染率中最有影响力的年龄层。我们的方法是在不同的年龄层作为静态特征,并将人口疫苗接种状况作为动态特征进行训练。我们分析了每个输入特征对COVID-19感染率的影响,并根据Morris敏感度分数排序了它们,这些分数量化了每个年龄层对病毒传播率的贡献。结果被CDC和US Census的实际感染率数据验证。研究发现,在2020年3月1日至2021年11月27日之间,年轻成年人是COVID-19传播的最有影响力的年龄层。使用这些结果可以改进公共卫生政策和干预措施,例如targeted疫苗接种策略,以更好地控制病毒的传播。我们的方法可以应用在其他公共卫生领域,以检测和控制其他疾病的传播。
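
A minimal sketch of Morris-style sensitivity analysis on a trained forecaster's static age-group features: perturb one feature at a time and average the absolute change in the prediction (mu*). The `predict_cases` function below is a fabricated stand-in, not the Temporal Fusion Transformer used in the paper, and the age-group bins and perturbation sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
age_groups = ["0-9", "10-19", "20-29", "30-39", "40-64", "65+"]

def predict_cases(static_features: np.ndarray) -> float:
    """Stand-in for a trained forecaster (the paper uses a Temporal Fusion Transformer)."""
    w = np.array([0.2, 0.5, 1.8, 1.1, 0.7, 0.4])       # fabricated sensitivities
    return float(static_features @ w)

def morris_effects(x0: np.ndarray, delta: float = 0.1, n_repeats: int = 30) -> np.ndarray:
    """mu*: mean absolute elementary effect of perturbing each static feature alone."""
    effects = np.zeros(len(x0))
    for _ in range(n_repeats):
        base = x0 * (1 + 0.05 * rng.standard_normal(len(x0)))   # jitter the baseline point
        y0 = predict_cases(base)
        for i in range(len(x0)):
            xp = base.copy(); xp[i] += delta * abs(base[i])
            effects[i] += abs(predict_cases(xp) - y0)
    return effects / n_repeats

x0 = rng.random(len(age_groups))
mu_star = morris_effects(x0)
for name, e in sorted(zip(age_groups, mu_star), key=lambda t: -t[1]):
    print(f"{name:>6}: {e:.4f}")
```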

Feasibility of Universal Anomaly Detection without Knowing the Abnormality in Medical Images

  • paper_url: http://arxiv.org/abs/2307.00750
  • repo_url: None
  • paper_authors: Can Cui, Yaohong Wang, Shunxing Bao, Yucheng Tang, Ruining Deng, Lucas W. Remedios, Zuhayr Asad, Joseph T. Roland, Ken S. Lau, Qi Liu, Lori A. Coburn, Keith T. Wilson, Bennett A. Landman, Yuankai Huo
  • For: The study examines the universality of anomaly detection in medical images, i.e., whether models trained only on normal images can accurately detect different types of abnormality.
  • Methods: Multiple anomaly detection methods, including deep learning approaches, are compared across four medical datasets, together with an analysis of how to select the optimal anomaly detection model without bias during validation using only normal images.
  • Results: None of the evaluated methods consistently achieves the best performance across all datasets, but the proposed decision-level ensemble improves the robustness of anomaly detection (average AUC 0.956).
    Abstract Many anomaly detection approaches, especially deep learning methods, have been recently developed to identify abnormal image morphology by only employing normal images during training. Unfortunately, many prior anomaly detection methods were optimized for a specific "known" abnormality (e.g., brain tumor, bone fraction, cell types). Moreover, even though only the normal images were used in the training process, the abnormal images were often employed during the validation process (e.g., epoch selection, hyper-parameter tuning), which might leak the supposed ``unknown" abnormality unintentionally. In this study, we investigated these two essential aspects regarding universal anomaly detection in medical images by (1) comparing various anomaly detection methods across four medical datasets, (2) investigating the inevitable but often neglected issues on how to unbiasedly select the optimal anomaly detection model during the validation phase using only normal images, and (3) proposing a simple decision-level ensemble method to leverage the advantage of different kinds of anomaly detection without knowing the abnormality. The results of our experiments indicate that none of the evaluated methods consistently achieved the best performance across all datasets. Our proposed method enhanced the robustness of performance in general (average AUC 0.956).
    摘要 许多异常检测方法,尤其是深度学习方法,近来被开发用于仅利用正常图像进行训练来识别异常的图像形态。然而,许多先前的异常检测方法是针对某种特定的"已知"异常(例如脑肿瘤、骨折、细胞类型)进行优化的。此外,尽管训练过程只使用正常图像,但验证过程(例如选择 epoch、调整超参数)往往会用到异常图像,这可能会无意中泄露所谓"未知"的异常。在本研究中,我们围绕医学图像中通用异常检测的这两个关键问题展开:(1)在四个医学图像数据集上比较多种异常检测方法的性能;(2)研究如何在验证阶段仅使用正常图像、无偏地选择最佳异常检测模型;(3)提出一种简单的决策级集成方法,在不知道异常类型的情况下利用不同异常检测方法的优势。实验结果表明,没有任何一种被评估的方法能在所有数据集上始终取得最佳性能,而我们提出的方法总体上提高了性能的稳健性(平均 AUC 0.956)。
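
A simple sketch of a decision-level ensemble that needs no knowledge of the abnormality: each detector's scores are standardized with statistics estimated on normal-only validation images, then averaged. The detector names and score distributions below are fabricated, and the paper's exact combination rule may differ.

```python
import numpy as np

def fit_normalizers(val_scores_per_detector):
    """Mean/std of each detector's scores on *normal* validation images only."""
    return [(s.mean(), s.std() + 1e-8) for s in val_scores_per_detector]

def ensemble_score(test_scores_per_detector, normalizers):
    """Z-normalize each detector with its normal-only statistics, then average,
    so no abnormal images are needed to calibrate the combination."""
    z = [(s - m) / sd for s, (m, sd) in zip(test_scores_per_detector, normalizers)]
    return np.mean(z, axis=0)

rng = np.random.default_rng(0)
# Fabricated scores from three heterogeneous detectors (e.g., AE-, flow-, memory-based).
val_normal = [rng.normal(0.2, 0.05, 200), rng.normal(10, 2, 200), rng.normal(0.5, 0.1, 200)]
test = [rng.normal(0.35, 0.05, 50), rng.normal(14, 2, 50), rng.normal(0.9, 0.1, 50)]

norms = fit_normalizers(val_normal)
print(ensemble_score(test, norms)[:5])
```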

ESGCN: Edge Squeeze Attention Graph Convolutional Network for Traffic Flow Forecasting

  • paper_url: http://arxiv.org/abs/2307.01227
  • repo_url: None
  • paper_authors: Sangrok Lee, Ha Young Kim
  • For: The paper addresses traffic flow forecasting, a highly challenging task owing to the spatio-temporal dependencies of traffic flows, and proposes the Edge Squeeze Graph Convolutional Network (ESGCN) to forecast flow in multiple regions.
  • Methods: ESGCN consists of a W-module, a fully node-wise convolutional network that encodes each region's time series separately and decomposes it at multiple scales to capture fine and coarse features, and an ES module, which models spatio-temporal dynamics with a graph convolutional network and generates an Adaptive Adjacency Matrix (AAM) from edge features via an edge attention mechanism, aided by a novel node contrastive loss.
  • Results: ESGCN achieves state-of-the-art performance by a large margin on four real-world datasets (PEMS03, 04, 07 and 08) with low computational cost.
    Abstract Traffic forecasting is a highly challenging task owing to the dynamical spatio-temporal dependencies of traffic flows. To handle this, we focus on modeling the spatio-temporal dynamics and propose a network termed Edge Squeeze Graph Convolutional Network (ESGCN) to forecast traffic flow in multiple regions. ESGCN consists of two modules: W-module and ES module. W-module is a fully node-wise convolutional network. It encodes the time-series of each traffic region separately and decomposes the time-series at various scales to capture fine and coarse features. The ES module models the spatio-temporal dynamics using Graph Convolutional Network (GCN) and generates an Adaptive Adjacency Matrix (AAM) with temporal features. To improve the accuracy of AAM, we introduce three key concepts. 1) Using edge features to directly capture the spatiotemporal flow representation among regions. 2) Applying an edge attention mechanism to GCN to extract the AAM from the edge features. Here, the attention mechanism can effectively determine important spatio-temporal adjacency relations. 3) Proposing a novel node contrastive loss to suppress obstructed connections and emphasize related connections. Experimental results show that ESGCN achieves state-of-the-art performance by a large margin on four real-world datasets (PEMS03, 04, 07, and 08) with a low computational cost.
    摘要 快速预测是一项非常具有挑战性的任务,因为交通流动具有空间时间相关性。为了解决这个问题,我们专注于模型空间时间动态相关性,并提出了一个名为 Edge Squeeze Graph Convolutional Network(ESGCN)的网络,用于预测多个区域的交通流。ESGCN包括两个模块:W模块和ES模块。W模块是一个完全节点卷积网络,它在每个交通区域中分别编码时间序列,并将时间序列分解为不同尺度来捕捉细节和概念特征。ES模块使用图aelastic卷积网络(GCN)模型空间时间动态相关性,并生成一个 Adaptive Adjacency Matrix(AAM),其中包含了时间特征。为了提高AAM的准确性,我们提出了三个关键思想:1)通过边特征直接捕捉交通流 repre sentation among regions。2)通过边注意机制来EXTRACT AAM from edge features。这里注意机制可以有效地确定重要的空间时间相关关系。3)提出一种新的节点对比损失函数,以抑制干扰连接和优化相关连接。实验结果表明,ESGCN可以在四个真实世界数据集(PEMS03、04、07和08)上达到状态之前的突出表现,而且计算成本较低。
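
A compact sketch of the edge-attention idea behind the Adaptive Adjacency Matrix: pairwise edge features are built from node (region) embeddings, attention over them yields an adjacency matrix, and one GCN-style propagation step uses it. The layer sizes, the concatenation-based edge features, and the single propagation step are assumptions for illustration, not ESGCN's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeAttentionAAM(nn.Module):
    """Builds an Adaptive Adjacency Matrix from pairwise edge features via attention,
    then propagates node features with it (one GCN-style step)."""
    def __init__(self, dim: int):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.proj = nn.Linear(dim, dim)

    def forward(self, h):                      # h: (batch, regions, dim) temporal features
        b, n, d = h.shape
        hi = h.unsqueeze(2).expand(b, n, n, d)
        hj = h.unsqueeze(1).expand(b, n, n, d)
        edge_feat = torch.cat([hi, hj], dim=-1)                         # pairwise edge features
        aam = F.softmax(self.edge_mlp(edge_feat).squeeze(-1), dim=-1)   # (b, n, n)
        return F.relu(self.proj(torch.bmm(aam, h))), aam

layer = EdgeAttentionAAM(dim=16)
h = torch.randn(4, 10, 16)                     # 4 samples, 10 traffic regions
out, aam = layer(h)
print(out.shape, aam.shape)                    # (4, 10, 16) (4, 10, 10)
```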

vONTSS: vMF based semi-supervised neural topic modeling with optimal transport

  • paper_url: http://arxiv.org/abs/2307.01226
  • repo_url: None
  • paper_authors: Weijie Xu, Xiaoyu Jiang, Srinivasan H. Sengamedu, Francis Iannacci, Jinjin Zhao
  • for: This paper presents a semi-supervised neural topic modeling method, vONTSS, which aims to incorporate human knowledge into the topic modeling process.
  • methods: vONTSS uses von Mises-Fisher (vMF) based variational autoencoders and optimal transport to generate potential topics and optimize topic-keyword quality and topic classification.
  • results: The authors show that vONTSS outperforms existing semi-supervised topic modeling methods in classification accuracy and diversity, and also supports unsupervised topic modeling. Additionally, they prove the equivalence of optimal transport loss and cross-entropy loss at the global minimum.
    Abstract Recently, Neural Topic Models (NTM), inspired by variational autoencoders, have attracted a lot of research interest; however, these methods have limited applications in the real world due to the challenge of incorporating human knowledge. This work presents a semi-supervised neural topic modeling method, vONTSS, which uses von Mises-Fisher (vMF) based variational autoencoders and optimal transport. When a few keywords per topic are provided, vONTSS in the semi-supervised setting generates potential topics and optimizes topic-keyword quality and topic classification. Experiments show that vONTSS outperforms existing semi-supervised topic modeling methods in classification accuracy and diversity. vONTSS also supports unsupervised topic modeling. Quantitative and qualitative experiments show that vONTSS in the unsupervised setting outperforms recent NTMs on multiple aspects: vONTSS discovers highly clustered and coherent topics on benchmark datasets. It is also much faster than the state-of-the-art weakly supervised text classification method while achieving similar classification performance. We further prove the equivalence of optimal transport loss and cross-entropy loss at the global minimum.
    摘要 近年来,受变分自编码器启发的神经主题模型(NTM)吸引了大量研究关注;然而,由于难以融入人类知识,这些方法在实际应用中受到限制。本文提出了一种半监督神经主题建模方法 vONTSS,它使用基于 von Mises-Fisher(vMF)分布的变分自编码器和最优传输。在为每个主题提供少量关键词的情况下,vONTSS 在半监督设定下生成潜在主题,并优化主题-关键词质量和主题分类效果。实验表明,vONTSS 在分类准确率和多样性方面优于现有的半监督主题建模方法。vONTSS 也支持无监督主题建模。定量和定性实验表明,vONTSS 在无监督设定下在多个方面优于近期的 NTM:它在基准数据集上发现了高度聚集且连贯的主题,同时比最新的弱监督文本分类方法快得多,且分类性能相近。我们进一步证明了最优传输损失与交叉熵损失在全局最小点处的等价性。
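
The claimed equivalence of the optimal transport loss and cross-entropy at the global minimum can be sanity-checked numerically on a toy example. The sketch below uses a plain Sinkhorn iteration and a lightly smoothed one-hot label for numerical stability; the cost matrix, smoothing, and regularisation strength are illustrative assumptions rather than the paper's setup:

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iter=500):
    """Entropy-regularised optimal transport: returns the plan P and the cost <P, C>."""
    K = np.exp(-cost / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = np.diag(u) @ K @ np.diag(v)
    return P, float((P * cost).sum())

# Toy example: one document's predicted topic distribution vs. its (smoothed) label.
pred = np.array([0.7, 0.2, 0.1])             # model's topic probabilities
label = np.array([0.98, 0.01, 0.01])         # smoothed one-hot supervision
cost = 1.0 - np.eye(3)                       # 0 on the diagonal, 1 off it

_, ot_loss = sinkhorn(cost, pred, label)
ce_loss = -np.log(pred[0])                   # cross-entropy with class 0

print(f"OT loss ~ {ot_loss:.3f}, CE loss ~ {ce_loss:.3f}")
# Both losses shrink to zero as pred approaches the label, which is the regime in
# which the paper proves the two objectives coincide at the global minimum.
```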

UnLoc: A Universal Localization Method for Autonomous Vehicles using LiDAR, Radar and/or Camera Input

  • paper_url: http://arxiv.org/abs/2307.00741
  • repo_url: None
  • paper_authors: Muhammad Ibrahim, Naveed Akhtar, Saeed Anwar, Ajmal Mian
  • for: 本研究旨在提出一种基于多传感器输入的定位方法,以便在各种天气条件下实现机器人的自主导航。
  • methods: 本方法使用一种名为 UnLoc 的统一神经网络模型,可以处理 LiDAR、摄像头和雷达输入数据,并可按需使用一个或多个输入传感器。UnLoc 使用 3D 稀疏卷积和圆柱体空间分割来处理 LiDAR 帧,并使用 ResNet 块和基于槽注意力(slot attention)的特征过滤模块来处理雷达和图像数据。
  • results: 研究人员对Oxford Radar RobotCar、ApolloSouthBay和Perth-WA数据集进行了广泛的评估,结果表明了本方法的效果。
    Abstract Localization is a fundamental task in robotics for autonomous navigation. Existing localization methods rely on a single input data modality or train several computational models to process different modalities. This leads to stringent computational requirements and sub-optimal results that fail to capitalize on the complementary information in other data streams. This paper proposes UnLoc, a novel unified neural modeling approach for localization with multi-sensor input in all weather conditions. Our multi-stream network can handle LiDAR, Camera and RADAR inputs for localization on demand, i.e., it can work with one or more input sensors, making it robust to sensor failure. UnLoc uses 3D sparse convolutions and cylindrical partitioning of the space to process LiDAR frames and implements ResNet blocks with a slot attention-based feature filtering module for the Radar and image modalities. We introduce a unique learnable modality encoding scheme to distinguish between the input sensor data. Our method is extensively evaluated on Oxford Radar RobotCar, ApolloSouthBay and Perth-WA datasets. The results ascertain the efficacy of our technique.
    摘要 定位是机器人自主导航中的一项基础任务。现有的定位方法要么依赖单一输入数据模态,要么需要训练多个计算模型来处理不同模态。这会带来苛刻的计算需求,且结果欠佳,无法充分利用其他数据流中的互补信息。本文提出了一种名为 UnLoc 的全新统一神经建模方法,用于在各种天气条件下利用多传感器输入进行定位。我们的多流网络可以按需处理 LiDAR、摄像头和雷达输入,即可以使用一个或多个输入传感器工作,因而对传感器失效具有鲁棒性。UnLoc 使用 3D 稀疏卷积和圆柱体空间分割来处理 LiDAR 帧,并针对雷达和图像模态实现了带有槽注意力特征过滤模块的 ResNet 块。我们还引入了一种独特的可学习模态编码方案,用于区分不同输入传感器的数据。我们的方法在 Oxford Radar RobotCar、ApolloSouthBay 和 Perth-WA 数据集上进行了广泛评估,结果证明了该方法的有效性。
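
The "learnable modality encoding scheme" mentioned in the abstract is not specified in detail here; one plausible minimal realization is a per-sensor learned embedding added to each stream's features before fusion, so a shared head can run with any subset of sensors. The following PyTorch sketch is illustrative only, and all module names and sizes are assumptions:

```python
import torch
import torch.nn as nn

class ModalityEncoding(nn.Module):
    """Illustrative: add a learned per-modality embedding to sensor features so a
    shared fusion head can tell LiDAR, camera and radar tokens apart."""
    MODALITIES = {"lidar": 0, "camera": 1, "radar": 2}

    def __init__(self, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(len(self.MODALITIES), d_model)

    def forward(self, feats, modality):
        idx = torch.tensor(self.MODALITIES[modality], device=feats.device)
        return feats + self.embed(idx)       # broadcast over (batch, tokens, d_model)

class FusionPoseHead(nn.Module):
    """Pools whatever sensor streams are available and regresses a 6-DoF pose."""
    def __init__(self, d_model: int = 128):
        super().__init__()
        self.mod_enc = ModalityEncoding(d_model)
        self.head = nn.Sequential(nn.Linear(d_model, 64), nn.ReLU(), nn.Linear(64, 6))

    def forward(self, streams):
        # streams: dict mapping modality name -> (batch, tokens, d_model) features
        tokens = [self.mod_enc(f, name) for name, f in streams.items()]
        fused = torch.cat(tokens, dim=1).mean(dim=1)   # works with missing sensors
        return self.head(fused)

if __name__ == "__main__":
    model = FusionPoseHead()
    # Any subset of sensors can be supplied, e.g. radar dropped out:
    pose = model({"lidar": torch.randn(2, 50, 128), "camera": torch.randn(2, 10, 128)})
    print(pose.shape)  # torch.Size([2, 6])
```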

Novelty and Lifted Helpful Actions in Generalized Planning

  • paper_url: http://arxiv.org/abs/2307.00735
  • repo_url: https://github.com/you68681/Novelty-and-Lifted-Helpful-Actions-in-Generalized-Planning
  • paper_authors: Chao Lei, Nir Lipovetzky, Krista A. Ehinger
  • for: The paper is written to improve the ability to compute planning programs for generalized planning (GP) problems by introducing novelty-based generalized planning solvers and scaling up the search with new evaluation functions and structural program restrictions.
  • methods: The paper uses goal-oriented heuristics, landmarks, novelty-based best-first search (BFS), and progressive variant PGP ($v$) to improve the planning process.
  • results: The new algorithms BFS($v$) and PGP($v$) outperform the state-of-the-art in GP over the standard generalized planning benchmarks, and practical findings on the above-mentioned methods in generalized planning are briefly discussed.Here is the Chinese translation of the three key points:
  • for: 这篇论文旨在通过引入基于新颖度的通用规划(GP)求解器,并利用新的评价函数和程序结构限制扩大搜索,从而提升为 GP 问题计算规划程序的能力。
  • methods: 论文使用了目标导向启发式、地标(landmarks)、基于新颖度的最佳优先搜索 BFS($v$) 及其渐进变体 PGP($v$) 来改进规划过程。
  • results: BFS($v$) 和 PGP($v$) 在标准通用规划基准测试上超越了当前最佳的 GP 方法;论文还简要讨论了上述方法在通用规划中的实践发现。
    Abstract It has been shown recently that successful techniques in classical planning, such as goal-oriented heuristics and landmarks, can improve the ability to compute planning programs for generalized planning (GP) problems. In this work, we introduce the notion of action novelty rank, which computes novelty with respect to a planning program, and propose novelty-based generalized planning solvers, which prune a newly generated planning program if its most frequent action repetition is greater than a given bound $v$, implemented by novelty-based best-first search BFS($v$) and its progressive variant PGP($v$). Besides, we introduce lifted helpful actions in GP derived from action schemes, and propose new evaluation functions and structural program restrictions to scale up the search. Our experiments show that the new algorithms BFS($v$) and PGP($v$) outperform the state-of-the-art in GP over the standard generalized planning benchmarks. Practical findings on the above-mentioned methods in generalized planning are briefly discussed.
    摘要 近期研究表明,经典规划中行之有效的技术(如目标导向启发式和地标)能够提升为通用规划(GP)问题计算规划程序的能力。在这项工作中,我们引入了动作新颖度等级的概念,用于衡量相对于某个规划程序的新颖度,并提出基于新颖度的通用规划求解器:当新生成的规划程序中出现次数最多的动作重复次数超过给定上界 $v$ 时将其剪枝,这一思想由基于新颖度的最佳优先搜索 BFS($v$) 及其渐进变体 PGP($v$) 实现。此外,我们从动作模式(action schemes)出发,引入了通用规划中的提升式有益动作,并提出新的评价函数和程序结构限制以扩大搜索规模。实验表明,新算法 BFS($v$) 和 PGP($v$) 在标准通用规划基准测试上超越了当前最佳方法。论文还简要讨论了上述方法在通用规划中的实践发现。
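
The pruning rule itself — discard a candidate planning program whose most frequent action repetition exceeds the bound v — is simple enough to state in code. The sketch below assumes a program is just a list of action names, which is a simplification of the planning-program representation used by BFS(v)/PGP(v):

```python
from collections import Counter

def most_frequent_repetition(program: list[str]) -> int:
    """Count of the single most repeated action in a candidate planning program."""
    return max(Counter(program).values()) if program else 0

def prune_by_novelty(candidates: list[list[str]], v: int) -> list[list[str]]:
    """Keep only programs whose most frequent action repetition is <= v,
    mirroring the bound used by BFS(v) / PGP(v) in the abstract."""
    return [p for p in candidates if most_frequent_repetition(p) <= v]

if __name__ == "__main__":
    candidates = [
        ["pick", "move", "place", "move"],      # 'move' repeated twice
        ["pick", "pick", "pick", "place"],      # 'pick' repeated three times
    ]
    print(prune_by_novelty(candidates, v=2))    # the second program is pruned
```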

Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT)

  • paper_url: http://arxiv.org/abs/2307.01225
  • repo_url: None
  • paper_authors: Bushra Sabir, M. Ali Babar, Sharif Abuadbba
  • for: 增强基于 transformer 的文本分类器的安全性与可信度,提高模型的对抗鲁棒性。
  • methods: 提出了一种以可解释性和透明度为导向的检测与转换框架(IT-DT),在检测阶段利用注意力图、积分梯度和模型反馈提升可解释性,并在转换阶段使用预训练词向量和模型反馈为受扰动的词生成最佳替换,将对抗样本转化为非对抗文本。
  • results: 在基于 transformer 的文本分类器上开展了全面实验,证明了 IT-DT 框架在检测与转换对抗样本方面的有效性;通过人工专家的审核和反馈,尤其是在复杂场景下,进一步提升了决策质量。
    Abstract Transformer-based text classifiers like BERT, Roberta, T5, and GPT-3 have shown impressive performance in NLP. However, their vulnerability to adversarial examples poses a security risk. Existing defense methods lack interpretability, making it hard to understand adversarial classifications and identify model vulnerabilities. To address this, we propose the Interpretability and Transparency-Driven Detection and Transformation (IT-DT) framework. It focuses on interpretability and transparency in detecting and transforming textual adversarial examples. IT-DT utilizes techniques like attention maps, integrated gradients, and model feedback for interpretability during detection. This helps identify salient features and perturbed words contributing to adversarial classifications. In the transformation phase, IT-DT uses pre-trained embeddings and model feedback to generate optimal replacements for perturbed words. By finding suitable substitutions, we aim to convert adversarial examples into non-adversarial counterparts that align with the model's intended behavior while preserving the text's meaning. Transparency is emphasized through human expert involvement. Experts review and provide feedback on detection and transformation results, enhancing decision-making, especially in complex scenarios. The framework generates insights and threat intelligence empowering analysts to identify vulnerabilities and improve model robustness. Comprehensive experiments demonstrate the effectiveness of IT-DT in detecting and transforming adversarial examples. The approach enhances interpretability, provides transparency, and enables accurate identification and successful transformation of adversarial inputs. By combining technical analysis and human expertise, IT-DT significantly improves the resilience and trustworthiness of transformer-based text classifiers against adversarial attacks.
    摘要 基于 transformer 的文本分类器(如 BERT、Roberta、T5 和 GPT-3)在自然语言处理中表现出色,但它们易受对抗样本攻击,带来安全风险。现有防御方法缺乏可解释性,难以理解对抗分类结果并识别模型漏洞。为此,我们提出了以可解释性和透明度为导向的检测与转换(IT-DT)框架,强调在检测和转换文本对抗样本时的可解释性与透明度。在检测阶段,IT-DT 利用注意力图、积分梯度和模型反馈等技术,识别导致对抗分类的显著特征和受扰动的词。在转换阶段,IT-DT 使用预训练词向量和模型反馈为受扰动的词生成最佳替换,力求在保留文本语义的同时,将对抗样本转换为符合模型预期行为的非对抗样本。框架通过人工专家的参与强调透明度:专家审核检测与转换结果并提供反馈,从而改进决策,尤其是在复杂场景下。该框架还能生成洞察与威胁情报,帮助分析人员识别漏洞并提升模型鲁棒性。全面的实验证明了 IT-DT 在检测和转换对抗样本方面的有效性。该方法提升了可解释性、提供了透明度,并能够准确识别并成功转换对抗输入。通过结合技术分析与人类专业知识,IT-DT 显著提升了基于 transformer 的文本分类器抵御对抗攻击的韧性与可信度。
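
One ingredient of the detection stage, integrated gradients over token embeddings, can be illustrated without the full framework. The sketch below computes attributions for a tiny self-contained classifier defined on the spot; it is not the IT-DT pipeline, and the model, zero-embedding baseline, and step count are assumptions:

```python
import torch
import torch.nn as nn

class TinyTextClassifier(nn.Module):
    """Stand-in model: mean-pooled token embeddings -> linear classifier."""
    def __init__(self, vocab_size=100, d=16, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d)
        self.fc = nn.Linear(d, n_classes)

    def forward_from_embeddings(self, e):          # e: (seq_len, d)
        return self.fc(e.mean(dim=0))

def integrated_gradients(model, token_ids, target, steps=50):
    """Attribution per token: integrate gradients along the straight path from a
    zero-embedding baseline to the real embeddings (Sundararajan et al., 2017)."""
    emb = model.emb(token_ids).detach()            # (seq_len, d)
    baseline = torch.zeros_like(emb)
    total_grad = torch.zeros_like(emb)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (emb - baseline)).requires_grad_(True)
        logit = model.forward_from_embeddings(point)[target]
        grad, = torch.autograd.grad(logit, point)
        total_grad += grad
    ig = (emb - baseline) * total_grad / steps     # (seq_len, d)
    return ig.sum(dim=-1)                          # one salience score per token

if __name__ == "__main__":
    model = TinyTextClassifier()
    tokens = torch.tensor([5, 17, 42, 8])
    scores = integrated_gradients(model, tokens, target=1)
    print(scores)  # highly attributed tokens are candidates for perturbed words
```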

Worth of knowledge in deep learning

  • paper_url: http://arxiv.org/abs/2307.00712
  • repo_url: https://github.com/woshixuhao/worth_of_knowledge
  • paper_authors: Hao Xu, Yuntian Chen, Dongxiao Zhang
  • for: 本文提出了一种基于可解释机器学习的框架,用于评估深度学习模型中知识的价值。
  • methods: 该框架使用数据量和估计范围来评估知识的价值,并通过量化实验评估知识与数据之间的复杂关系。
  • results: 研究发现,数据量和估计范围对知识的价值有深刻影响,其中包括依赖、协同和替代效应。该框架可应用于多种常见的网络架构,可用于提升知识引导机器学习的性能,也能辨别不恰当的先验知识。
    Abstract Knowledge constitutes the accumulated understanding and experience that humans use to gain insight into the world. In deep learning, prior knowledge is essential for mitigating shortcomings of data-driven models, such as data dependence, generalization ability, and compliance with constraints. To enable efficient evaluation of the worth of knowledge, we present a framework inspired by interpretable machine learning. Through quantitative experiments, we assess the influence of data volume and estimation range on the worth of knowledge. Our findings elucidate the complex relationship between data and knowledge, including dependence, synergistic, and substitution effects. Our model-agnostic framework can be applied to a variety of common network architectures, providing a comprehensive understanding of the role of prior knowledge in deep learning models. It can also be used to improve the performance of informed machine learning, as well as distinguish improper prior knowledge.
    摘要 知识是人类用来洞察世界所积累的理解与经验。在深度学习中,先验知识对于弥补数据驱动模型的不足(如数据依赖、泛化能力和约束遵循)至关重要。为了有效评估知识的价值,我们提出了一个受可解释机器学习启发的框架。通过量化实验,我们评估了数据量和估计范围对知识价值的影响。我们的发现揭示了数据与知识之间的复杂关系,包括依赖、协同和替代效应。我们的模型无关框架可应用于多种常见的网络架构,有助于全面理解先验知识在深度学习模型中的作用;它还可用于提升知识引导机器学习(informed machine learning)的性能,以及辨别不恰当的先验知识。

Classification of sleep stages from EEG, EOG and EMG signals by SSNet

  • paper_url: http://arxiv.org/abs/2307.05373
  • repo_url: None
  • paper_authors: Haifa Almutairi, Ghulam Mubashar Hassan, Amitava Datta
  • for: 对睡眠阶段进行分类,用于诊断睡眠相关疾病,如睡眠呼吸障碍(SDB)。
  • methods: 使用了基于卷积神经网络(CNN)和长短期记忆网络(LSTM)的两个深度学习网络,从眼电图(EOG)、脑电图(EEG)和肌电图(EMG)三种信号的组合中提取特征。
  • results: 使用 Sleep-EDF 扩展数据集和 ISRUC-Sleep 两个公共数据集评估所提模型的性能。实验结果表明,在 Sleep-EDF 扩展数据集上,三类睡眠阶段分类的准确率为 96.36%,Kappa 系数为 93.40%;五类睡眠阶段分类的准确率为 96.57%,Kappa 系数为 83.05%。与现有技术相比,我们的模型在睡眠阶段分类中表现最佳。
    Abstract Classification of sleep stages plays an essential role in diagnosing sleep-related diseases including Sleep Disorder Breathing (SDB) disease. In this study, we propose an end-to-end deep learning architecture, named SSNet, which comprises of two deep learning networks based on Convolutional Neuron Networks (CNN) and Long Short Term Memory (LSTM). Both deep learning networks extract features from the combination of Electrooculogram (EOG), Electroencephalogram (EEG), and Electromyogram (EMG) signals, as each signal has distinct features that help in the classification of sleep stages. The features produced by the two-deep learning networks are concatenated to pass to the fully connected layer for the classification. The performance of our proposed model is evaluated by using two public datasets Sleep-EDF Expanded dataset and ISRUC-Sleep dataset. The accuracy and Kappa coefficient are 96.36% and 93.40% respectively, for classifying three classes of sleep stages using Sleep-EDF Expanded dataset. Whereas, the accuracy and Kappa coefficient are 96.57% and 83.05% respectively for five classes of sleep stages using Sleep-EDF Expanded dataset. Our model achieves the best performance in classifying sleep stages when compared with the state-of-the-art techniques.
    摘要 睡眠阶段分类在诊断睡眠相关疾病(包括睡眠呼吸障碍,SDB)中起着至关重要的作用。在本研究中,我们提出了一种端到端深度学习架构 SSNet,它由基于卷积神经网络(CNN)和长短期记忆网络(LSTM)的两个深度学习网络组成。两个网络均从眼电图(EOG)、脑电图(EEG)和肌电图(EMG)信号的组合中提取特征,因为每种信号都具有有助于睡眠阶段分类的独特特征。两个网络产生的特征被拼接后送入全连接层进行分类。我们使用 Sleep-EDF 扩展数据集和 ISRUC-Sleep 两个公共数据集评估所提模型的性能:在 Sleep-EDF 扩展数据集上,三类睡眠阶段分类的准确率和 Kappa 系数分别为 96.36% 和 93.40%,五类睡眠阶段分类的准确率和 Kappa 系数分别为 96.57% 和 83.05%。与现有最先进技术相比,我们的模型在睡眠阶段分类中取得了最佳性能。
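
The abstract fixes the overall design (a CNN branch and an LSTM branch over the combined EOG/EEG/EMG signals, concatenated into a fully connected classifier) but not the layer sizes. The PyTorch sketch below fills those in with placeholder values purely to show the data flow; it is not the published architecture:

```python
import torch
import torch.nn as nn

class SSNetSketch(nn.Module):
    """Rough sketch of the two-branch design: a CNN branch and an LSTM branch over
    the stacked EEG/EOG/EMG channels, concatenated before a dense classifier."""
    def __init__(self, n_channels=3, n_classes=5, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),               # -> (B, 64, 1)
        )
        self.lstm = nn.LSTM(input_size=n_channels, hidden_size=hidden, batch_first=True)
        self.classifier = nn.Linear(64 + hidden, n_classes)

    def forward(self, x):                           # x: (B, n_channels, time)
        cnn_feat = self.cnn(x).squeeze(-1)          # (B, 64)
        _, (h_n, _) = self.lstm(x.transpose(1, 2))  # LSTM over the time axis
        lstm_feat = h_n[-1]                         # (B, hidden)
        return self.classifier(torch.cat([cnn_feat, lstm_feat], dim=1))

if __name__ == "__main__":
    # One 30-second epoch sampled at 100 Hz with 3 signals (EEG, EOG, EMG).
    x = torch.randn(8, 3, 3000)
    print(SSNetSketch()(x).shape)  # torch.Size([8, 5])
```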

From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy

  • paper_url: http://arxiv.org/abs/2307.00691
  • repo_url: None
  • paper_authors: Maanak Gupta, CharanKumar Akiri, Kshitiz Aryal, Eli Parker, Lopamudra Praharaj
  • for: 这个研究论文的目的是探讨Generative AI(GenAI)在安全和隐私领域的限制、挑战、风险和机遇。
  • methods: 该论文以 ChatGPT 为例,描述了恶意用户如何通过各种攻击方式绕过模型的伦理约束以获取恶意信息,并探讨了 GenAI 工具在网络攻击、社会工程攻击、自动化攻击、攻击载荷生成、恶意软件创建等方面被滥用的可能性。
  • results: 论文揭示了 ChatGPT 中可被恶意用户利用以绕过伦理约束的漏洞,并讨论了相应的防御技术、伦理准则以及未来的发展方向,以使 GenAI 更加安全、可靠、合规和合乎道德。
    Abstract Undoubtedly, the evolution of Generative AI (GenAI) models has been the highlight of digital transformation in the year 2022. As the different GenAI models like ChatGPT and Google Bard continue to foster their complexity and capability, it's critical to understand its consequences from a cybersecurity perspective. Several instances recently have demonstrated the use of GenAI tools in both the defensive and offensive side of cybersecurity, and focusing on the social, ethical and privacy implications this technology possesses. This research paper highlights the limitations, challenges, potential risks, and opportunities of GenAI in the domain of cybersecurity and privacy. The work presents the vulnerabilities of ChatGPT, which can be exploited by malicious users to exfiltrate malicious information bypassing the ethical constraints on the model. This paper demonstrates successful example attacks like Jailbreaks, reverse psychology, and prompt injection attacks on the ChatGPT. The paper also investigates how cyber offenders can use the GenAI tools in developing cyber attacks, and explore the scenarios where ChatGPT can be used by adversaries to create social engineering attacks, phishing attacks, automated hacking, attack payload generation, malware creation, and polymorphic malware. This paper then examines defense techniques and uses GenAI tools to improve security measures, including cyber defense automation, reporting, threat intelligence, secure code generation and detection, attack identification, developing ethical guidelines, incidence response plans, and malware detection. We will also discuss the social, legal, and ethical implications of ChatGPT. In conclusion, the paper highlights open challenges and future directions to make this GenAI secure, safe, trustworthy, and ethical as the community understands its cybersecurity impacts.
    摘要 毫无疑问,生成式 AI(GenAI)模型的演进是 2022 年数字化转型中的一大亮点。随着 ChatGPT 和 Google Bard 等不同的 GenAI 模型不断增强其复杂性和能力,从网络安全角度理解其影响至关重要。近期的一些实例展示了 GenAI 工具在网络安全防御与攻击两方面的应用,也凸显了该技术带来的社会、伦理与隐私问题。本文探讨了 GenAI 在网络安全与隐私领域的局限性、挑战、潜在风险与机遇。文中展示了 ChatGPT 的漏洞,恶意用户可借此绕过模型的伦理约束以获取恶意信息,并演示了越狱(Jailbreak)、逆向心理和提示注入等成功的攻击示例。本文还研究了网络攻击者如何利用 GenAI 工具开发网络攻击,并探讨了攻击者利用 ChatGPT 实施社会工程攻击、网络钓鱼、自动化入侵、攻击载荷生成、恶意软件及多态恶意软件制作等场景。随后,本文考察了防御技术,并利用 GenAI 工具改进安全措施,包括网络防御自动化、报告、威胁情报、安全代码生成与检测、攻击识别、制定伦理准则、事件响应计划和恶意软件检测。我们还讨论了 ChatGPT 的社会、法律和伦理影响。最后,本文强调了开放的挑战与未来方向,以便在社区理解其网络安全影响的同时,使这类 GenAI 更加安全、可靠、可信和合乎道德。

SDC-HSDD-NDSA: Structure Detecting Cluster by Hierarchical Secondary Directed Differential with Normalized Density and Self-Adaption

  • paper_url: http://arxiv.org/abs/2307.00677
  • repo_url: https://github.com/hao-b-shu/sdc-hsdd-ndsa
  • paper_authors: Hao Shu
  • for: 本研究旨在提供一种能够检测高密度区域内部结构的基于密度的聚类算法,以解决传统基于密度的聚类算法无法检测高密度区域内部结构的问题。
  • methods: 该算法使用了二级有向差分、层次结构、归一化密度以及自适应系数,因此被称为结构检测聚类算法(SDC-HSDD-NDSA)。
  • results: 在多个数据集上运行该算法的结果验证了其结构检测能力、对噪声的鲁棒性以及对粒度的独立性,并且在一些数据集上表现优于传统的基于密度的聚类算法。
    Abstract Density-based clustering could be the most popular clustering algorithm, since it can identify clusters of arbitrary shape as long as different (high-density) clusters are separated by low-density regions. However, the requirement that clusters be separated by low-density regions is not always met, since a high-density region might contain different structures that should be clustered into different groups. Such a situation exposes the main flaw of all previous density-based clustering algorithms we know of: structures within a high-density cluster cannot be detected. Therefore, this paper aims to provide a density-based clustering scheme that not only retains the abilities of previous ones but can also detect structures in a high-density region that are not separated by low-density ones. The algorithm employs secondary directed differential, hierarchy, normalized density, as well as a self-adaption coefficient, and is thus called Structure Detecting Cluster by Hierarchical Secondary Directed Differential with Normalized Density and Self-Adaption, dubbed SDC-HSDD-NDSA for short. To illustrate its effectiveness, we run the algorithm on several data sets. The results verify its validity in structure detection, robustness to noise, as well as independence of granularities, and demonstrate that it can outperform previous ones. The Python code of the paper can be found at https://github.com/Hao-B-Shu/SDC-HSDD-NDSA.
    摘要 基于密度的聚类可能是最流行的聚类算法,因为只要不同的高密度簇之间由低密度区域分隔,它就能识别任意形状的簇。然而,"不同簇之间由低密度区域分隔"这一要求并非总能满足:一个高密度区域内部可能存在不同的结构,而这些结构应当被划分到不同的组中。这种情况暴露了我们已知的所有先前基于密度的聚类算法的主要缺陷——高密度簇内部的结构无法被检测。因此,本文旨在提供一种基于密度的聚类方案,它不仅具备先前算法的能力,还能够检测未被低密度区域分隔的高密度区域内部结构。该算法使用了二级有向差分、层次结构、归一化密度以及自适应系数,因此被称为 SDC-HSDD-NDSA(Structure Detecting Cluster by Hierarchical Secondary Directed Differential with Normalized Density and Self-Adaption)。为验证其有效性,我们在多个数据集上运行了该算法,结果证明了其在结构检测方面的有效性、对噪声的鲁棒性以及对粒度的独立性,并表明其性能优于先前的算法。论文的 Python 代码可在 https://github.com/Hao-B-Shu/SDC-HSDD-NDSA 获取。
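
The paper's normalized density is one of several components and its exact definition is not given in the abstract; a common stand-in is an inverse k-nearest-neighbour distance rescaled to [0, 1]. The sketch below shows only that generic estimate, as a hypothetical illustration of the "normalized density" ingredient (the choice of k and the min-max rescaling are assumptions):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def normalized_knn_density(X, k=10):
    """Illustrative density estimate: inverse mean distance to the k nearest
    neighbours, rescaled to [0, 1] so thresholds are comparable across datasets."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1: each point is its own neighbour
    dist, _ = nn.kneighbors(X)
    density = 1.0 / (dist[:, 1:].mean(axis=1) + 1e-12)
    return (density - density.min()) / (density.max() - density.min() + 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (200, 2)), rng.normal(3, 1.0, (200, 2))])
    rho = normalized_knn_density(X)
    print(rho.shape)   # (400,), values rescaled into [0, 1]
```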

Morse Neural Networks for Uncertainty Quantification

  • paper_url: http://arxiv.org/abs/2307.00667
  • repo_url: None
  • paper_authors: Benoit Dherin, Huiyi Hu, Jie Ren, Michael W. Dusenberry, Balaji Lakshminarayanan
  • For: The paper is written for uncertainty quantification and introduces a new deep generative model called the Morse neural network.
  • Methods: The Morse neural network uses a KL-divergence loss to fit the model and yields five components: a generative density, an out-of-distribution (OOD) detector, a calibration temperature, a generative sampler, and a distance-aware classifier (in the supervised case).
  • Results: The Morse neural network unifies many techniques in uncertainty quantification, including OOD detection, anomaly detection, and continuous learning, and has connections to support vector machines, kernel methods, and Morse theory in topology. Here is the same information in Simplified Chinese:
  • for: 这篇论文面向不确定性量化,提出了一种新的深度生成模型——Morse 神经网络。
  • methods: Morse 神经网络通过 KL 散度损失进行拟合,并产生五个组件:生成密度、分布外(OOD)检测器、校准温度、生成采样器,以及在监督情形下的距离感知分类器。
  • results: Morse 神经网络统一了不确定性量化中的多种技术,包括 OOD 检测、异常检测和持续学习,并与支持向量机、核方法以及拓扑学中的 Morse 理论存在联系。
    Abstract We introduce a new deep generative model useful for uncertainty quantification: the Morse neural network, which generalizes the unnormalized Gaussian densities to have modes of high-dimensional submanifolds instead of just discrete points. Fitting the Morse neural network via a KL-divergence loss yields 1) a (unnormalized) generative density, 2) an OOD detector, 3) a calibration temperature, 4) a generative sampler, along with in the supervised case 5) a distance aware-classifier. The Morse network can be used on top of a pre-trained network to bring distance-aware calibration w.r.t the training data. Because of its versatility, the Morse neural networks unifies many techniques: e.g., the Entropic Out-of-Distribution Detector of (Mac\^edo et al., 2021) in OOD detection, the one class Deep Support Vector Description method of (Ruff et al., 2018) in anomaly detection, or the Contrastive One Class classifier in continuous learning (Sun et al., 2021). The Morse neural network has connections to support vector machines, kernel methods, and Morse theory in topology.
    摘要 我们介绍了一种适用于不确定性量化的新型深度生成模型:Morse 神经网络。它将非归一化的高斯密度推广为以高维子流形(而不仅是离散点)作为模式。通过 KL 散度损失拟合 Morse 神经网络,可以得到:1)(非归一化的)生成密度;2)分布外(OOD)检测器;3)校准温度;4)生成采样器;以及在监督情形下的 5)距离感知分类器。Morse 神经网络可以叠加在预训练网络之上,实现相对于训练数据的距离感知校准。由于其通用性,Morse 神经网络统一了许多技术,例如 OOD 检测中的 Entropic Out-of-Distribution Detector(Mac\^edo et al., 2021)、异常检测中的 one class Deep Support Vector Description 方法(Ruff et al., 2018),以及持续学习中的 Contrastive One Class 分类器(Sun et al., 2021)。Morse 神经网络与支持向量机、核方法以及拓扑学中的 Morse 理论存在联系。
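
A toy reading of the construction — a network that outputs a nonnegative distance-to-mode, from which an unnormalized density and an OOD score both follow — can be written down directly. The sketch below is an illustration of that reading, not the paper's actual Morse network or its KL-divergence fitting procedure:

```python
import torch
import torch.nn as nn

class ToyMorseNet(nn.Module):
    """Toy interpretation: the network predicts a nonnegative distance-to-mode d(x);
    exp(-d(x)^2 / T) then serves as an unnormalised density and d(x) itself as an
    out-of-distribution score. Illustration only, not the paper's construction."""
    def __init__(self, in_dim=2, hidden=64, temperature=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),   # distance must be >= 0
        )
        self.temperature = temperature

    def distance(self, x):
        return self.net(x).squeeze(-1)

    def density(self, x):
        return torch.exp(-self.distance(x) ** 2 / self.temperature)

    def ood_score(self, x):
        return self.distance(x)                    # large distance => likely OOD

if __name__ == "__main__":
    model = ToyMorseNet()
    x_in = torch.randn(4, 2)                        # pretend in-distribution points
    x_far = 10.0 * torch.randn(4, 2)                # points far from the data
    print(model.density(x_in).shape, model.ood_score(x_far).shape)
```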

Solving Multi-Agent Target Assignment and Path Finding with a Single Constraint Tree

  • paper_url: http://arxiv.org/abs/2307.00663
  • repo_url: https://github.com/whoenig/libMultiRobotPlanning
  • paper_authors: Yimin Tang, Zhongqiang Ren, Jiaoyang Li, Katia Sycara
  • for: addresses the Combined Target-Assignment and Path-Finding (TAPF) problem, which requires simultaneously assigning targets to agents and planning collision-free paths.
  • methods: leverages Conflict-Based Search with Target Assignment (CBS-TA), which creates multiple search trees and resolves collisions using Conflict-Based Search. However, CBS-TA suffers from scalability issues due to duplicated collision resolution and expensive computation of K-best assignments.
  • results: develops Incremental Target Assignment CBS (ITA-CBS), which generates a single search tree and avoids computing K-best assignments by incrementally computing new 1-best assignments during the search. ITA-CBS is guaranteed to find an optimal solution in theory and is computationally efficient in practice.
    Abstract Combined Target-Assignment and Path-Finding problem (TAPF) requires simultaneously assigning targets to agents and planning collision-free paths for agents from their start locations to their assigned targets. As a leading approach to address TAPF, Conflict-Based Search with Target Assignment (CBS-TA) leverages both K-best target assignments to create multiple search trees and Conflict-Based Search (CBS) to resolve collisions in each search tree. While being able to find an optimal solution, CBS-TA suffers from scalability due to the duplicated collision resolution in multiple trees and the expensive computation of K-best assignments. We therefore develop Incremental Target Assignment CBS (ITA-CBS) to bypass these two computational bottlenecks. ITA-CBS generates only a single search tree and avoids computing K-best assignments by incrementally computing new 1-best assignments during the search. We show that, in theory, ITA-CBS is guaranteed to find an optimal solution and, in practice, is computationally efficient.
    摘要
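
The "1-best assignment" recomputed during the search is, at its core, a minimum-cost bipartite matching. The sketch below solves it with the Hungarian algorithm via scipy.optimize.linear_sum_assignment; using Euclidean distances as costs is a simplification, since ITA-CBS would use path costs under the current collision constraints:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_assignment(agent_positions, target_positions):
    """1-best target assignment: minimise the summed agent-to-target cost.
    Here the cost is Euclidean distance; ITA-CBS would use constrained path costs."""
    agents = np.asarray(agent_positions, dtype=float)
    targets = np.asarray(target_positions, dtype=float)
    cost = np.linalg.norm(agents[:, None, :] - targets[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist())), float(cost[rows, cols].sum())

if __name__ == "__main__":
    agents = [(0, 0), (5, 0), (0, 5)]
    targets = [(4, 0), (0, 4), (5, 5)]
    assignment, total = best_assignment(agents, targets)
    print(assignment, total)
    # Inside ITA-CBS this assignment would be recomputed incrementally whenever
    # new collision constraints change individual path costs.
```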

Minimum Levels of Interpretability for Artificial Moral Agents

  • paper_url: http://arxiv.org/abs/2307.00660
  • repo_url: None
  • paper_authors: Avish Vijayaraghavan, Cosmin Badea
  • for: 这 paper 是关于人工智能模型在道德决策中的可解释性,以及如何通过可解释性来信任和理解机器内部的决策机制,以便在实际应用中安全部署。
  • methods: 这 paper 使用了一种名为 “最小可解释性水平” (Minimum Level of Interpretability, MLI) 的概念,并建议了不同类型的机器人应用 MLI,以提高其在实际应用中的安全性。
  • results: 这 paper 提供了一个 rapidly-evolving 的可解释性子领域的概述,并介绍了 MLI 的概念和建议,以便在实际应用中安全部署机器人。
    Abstract As artificial intelligence (AI) models continue to scale up, they are becoming more capable and integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMA), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly-evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI) and recommend an MLI for various types of agents, to aid their safe deployment in real-world settings.
    摘要

Neuro-Symbolic Sudoku Solver

  • paper_url: http://arxiv.org/abs/2307.00653
  • repo_url: https://github.com/ashutosh1919/neuro-symbolic-sudoku-solver
  • paper_authors: Ashutosh Hathidara, Lalit Pandey
  • for: 求解 Sudoku(数独)问题
  • methods: 使用 Neural Logic Machine (NLM) 和强化学习
  • results: 实现 100% 的准确率解决 3-10 个空格的 Sudoku 问题
    Abstract Deep Neural Networks have achieved great success in some of the complex tasks that humans can do with ease. These include image recognition/classification, natural language processing, game playing etc. However, modern Neural Networks fail or perform poorly when trained on tasks that can be solved easily using backtracking and traditional algorithms. Therefore, we use the architecture of the Neuro Logic Machine (NLM) and extend its functionality to solve a 9X9 game of Sudoku. To expand the application of NLMs, we generate a random grid of cells from a dataset of solved games and assign up to 10 new empty cells. The goal of the game is then to find a target value ranging from 1 to 9 and fill in the remaining empty cells while maintaining a valid configuration. In our study, we showcase an NLM which is capable of obtaining 100% accuracy for solving a Sudoku with empty cells ranging from 3 to 10. The purpose of this study is to demonstrate that NLMs can also be used for solving complex problems and games like Sudoku. We also analyze the behaviour of NLMs with a backtracking algorithm by comparing the convergence time using a graph plot on the same problem. With this study we show that Neural Logic Machines can be trained on the tasks that traditional Deep Learning architectures fail using Reinforcement Learning. We also aim to propose the importance of symbolic learning in explaining the systematicity in the hybrid model of NLMs.
    摘要 深度神经网络已经在一些人类可以轻松完成的复杂任务上取得了巨大成功,包括图像识别/分类、自然语言处理、游戏等。然而,现代神经网络在那些可以用回溯和传统算法轻松解决的任务上往往失败或表现不佳。因此,我们采用神经逻辑机(NLM)的架构并扩展其功能,用于求解 9X9 数独。为扩展 NLM 的应用,我们从已解出的数独数据集中随机生成棋盘,并将其中最多 10 个单元格置空。游戏的目标是为每个空单元格找到 1 到 9 之间的目标值并填入,同时保持配置有效。在我们的研究中,我们展示了一个能够在 3 到 10 个空单元格情况下达到 100% 准确率的 NLM。本研究的目的是证明 NLM 也可以用于求解数独这类复杂问题和游戏。我们还通过在同一问题上绘制收敛时间曲线,将 NLM 与回溯算法的行为进行了比较。通过这项研究,我们表明可以利用强化学习在传统深度学习架构失败的任务上训练神经逻辑机。我们还希望指出符号学习对于解释 NLM 混合模型系统性的重要意义。
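
The backtracking baseline that the NLM is compared against is a standard algorithm; a compact version is given below for reference. Representing the board as a 9x9 list of lists with 0 for blanks is an assumption of this sketch, not the repository's interface:

```python
def find_empty(grid):
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                return r, c
    return None

def valid(grid, r, c, v):
    if v in grid[r] or any(grid[i][c] == v for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[br + i][bc + j] != v for i in range(3) for j in range(3))

def solve(grid):
    """Classical backtracking baseline the paper compares the NLM against."""
    cell = find_empty(grid)
    if cell is None:
        return True                    # no blanks left: solved
    r, c = cell
    for v in range(1, 10):
        if valid(grid, r, c, v):
            grid[r][c] = v
            if solve(grid):
                return True
            grid[r][c] = 0             # undo and try the next value
    return False

if __name__ == "__main__":
    grid = [[0] * 9 for _ in range(9)]
    grid[0][:4] = [5, 3, 0, 7]         # a few clues; the rest are blanks
    assert solve(grid)
    print(grid[0])
```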

Intra- & Extra-Source Exemplar-Based Style Synthesis for Improved Domain Generalization

  • paper_url: http://arxiv.org/abs/2307.00648
  • repo_url: https://github.com/boschresearch/issa
  • paper_authors: Yumeng Li, Dan Zhang, Margret Keuper, Anna Khoreva
  • for: 提高深度学习模型对域外数据的泛化能力,特别是在自动驾驶等应用场景中频繁出现的域外数据问题。
  • methods: 提出了一种基于 StyleGAN2 反演(inversion)的样例式(exemplar-based)风格合成管道,通过在训练集内随机组合风格与内容进行内源风格增强(ISSA),提升语义分割的域泛化能力。
  • results: 在不同类型的域偏移情况下(包括不同地理位置、恶劣天气条件和昼夜变化),ISSA 将语义分割的 mIoU 最高提升 12.4%;该方法可与 CNN 和 Transformer 等模型直接结合使用,并能与其他域泛化技术互补。
    Abstract The generalization with respect to domain shifts, as they frequently appear in applications such as autonomous driving, is one of the remaining big challenges for deep learning models. Therefore, we propose an exemplar-based style synthesis pipeline to improve domain generalization in semantic segmentation. Our method is based on a novel masked noise encoder for StyleGAN2 inversion. The model learns to faithfully reconstruct the image, preserving its semantic layout through noise prediction. Using the proposed masked noise encoder to randomize style and content combinations in the training set, i.e., intra-source style augmentation (ISSA) effectively increases the diversity of training data and reduces spurious correlation. As a result, we achieve up to $12.4\%$ mIoU improvements on driving-scene semantic segmentation under different types of data shifts, i.e., changing geographic locations, adverse weather conditions, and day to night. ISSA is model-agnostic and straightforwardly applicable with CNNs and Transformers. It is also complementary to other domain generalization techniques, e.g., it improves the recent state-of-the-art solution RobustNet by $3\%$ mIoU in Cityscapes to Dark Z\"urich. In addition, we demonstrate the strong plug-n-play ability of the proposed style synthesis pipeline, which is readily usable for extra-source exemplars e.g., web-crawled images, without any retraining or fine-tuning. Moreover, we study a new use case to indicate neural network's generalization capability by building a stylized proxy validation set. This application has significant practical sense for selecting models to be deployed in the open-world environment. Our code is available at \url{https://github.com/boschresearch/ISSA}.
    摘要 对域偏移(这类偏移在自动驾驶等应用中频繁出现)的泛化能力,仍是深度学习模型尚未解决的重大挑战之一。为此,我们提出了一种样例式风格合成管道,以提升语义分割中的域泛化能力。我们的方法基于一种新颖的用于 StyleGAN2 反演的掩码噪声编码器。模型通过噪声预测学习忠实地重建图像并保留其语义布局。利用所提出的掩码噪声编码器在训练集内随机化风格与内容的组合,即内源风格增强(ISSA),可以有效增加训练数据的多样性并减少虚假相关。借此,我们在不同类型的域偏移(改变地理位置、恶劣天气条件以及昼夜变化)下的驾驶场景语义分割中取得了最高 12.4% 的 mIoU 提升。ISSA 与模型无关,可直接应用于 CNN 和 Transformer,并且能与其他域泛化技术互补,例如将最新的先进方案 RobustNet 在 Cityscapes 到 Dark Z\"urich 上的 mIoU 提升 3%。此外,我们展示了所提风格合成管道强大的即插即用能力:无需任何重新训练或微调,即可直接用于外源样例(如网络爬取的图像)。我们还研究了一个新的应用场景,即构建风格化的代理验证集来反映神经网络的泛化能力,这对于选择要部署到开放世界环境中的模型具有重要的实际意义。我们的代码可在 https://github.com/boschresearch/ISSA 获取。

Effects of Explanation Specificity on Passengers in Autonomous Driving

  • paper_url: http://arxiv.org/abs/2307.00633
  • repo_url: None
  • paper_authors: Daniel Omeiza, Raunak Bhattacharyya, Nick Hawes, Marina Jirotka, Lars Kunze
  • for: investigate the effects of natural language explanations’ specificity on passengers in autonomous driving
  • methods: extended an existing data-driven tree-based explainer algorithm by adding a rule-based option for explanation generation, generated auditory natural language explanations with different levels of specificity (abstract and specific)
  • results: both abstract and specific explanations had similar positive effects on passengers’ perceived safety and the feeling of anxiety, but specific explanations influenced the desire of passengers to takeover driving control from the autonomous vehicle, while abstract explanations did not.
    Abstract The nature of explanations provided by an explainable AI algorithm has been a topic of interest in the explainable AI and human-computer interaction community. In this paper, we investigate the effects of natural language explanations' specificity on passengers in autonomous driving. We extended an existing data-driven tree-based explainer algorithm by adding a rule-based option for explanation generation. We generated auditory natural language explanations with different levels of specificity (abstract and specific) and tested these explanations in a within-subject user study (N=39) using an immersive physical driving simulation setup. Our results showed that both abstract and specific explanations had similar positive effects on passengers' perceived safety and the feeling of anxiety. However, the specific explanations influenced the desire of passengers to takeover driving control from the autonomous vehicle (AV), while the abstract explanations did not. We conclude that natural language auditory explanations are useful for passengers in autonomous driving, and their specificity levels could influence how much in-vehicle participants would wish to be in control of the driving activity.
    摘要 可解释 AI 算法所提供解释的性质,一直是可解释 AI 与人机交互领域关注的话题。在这篇论文中,我们研究了自然语言解释的具体程度对自动驾驶中乘客的影响。我们在现有的数据驱动树状解释算法基础上增加了基于规则的解释生成选项,生成了不同具体程度(抽象与具体)的语音自然语言解释,并在沉浸式物理驾驶模拟平台上开展了被试内用户研究(N=39)对这些解释进行测试。结果表明,抽象解释和具体解释对乘客感知到的安全感和焦虑感有类似的积极影响;然而,具体解释会影响乘客从自动驾驶车辆(AV)接管驾驶控制权的意愿,而抽象解释则不会。我们的结论是,语音自然语言解释对自动驾驶中的乘客是有用的,而其具体程度可能会影响车内参与者希望掌控驾驶活动的程度。

Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.00619
  • repo_url: https://github.com/liturout/psld
  • paper_authors: Litu Rout, Negin Raoof, Giannis Daras, Constantine Caramanis, Alexandros G. Dimakis, Sanjay Shakkottai
  • for: Linear inverse problems, such as image inpainting, denoising, deblurring, and super-resolution.
  • methods: Pre-trained latent diffusion models, which are proven to achieve provable sample recovery in a linear model setting.
  • results: Outperform previously proposed posterior sampling algorithms in a wide variety of problems, including random inpainting, block inpainting, denoising, deblurring, destriping, and super-resolution.Here’s the full translation in Simplified Chinese:
  • for: 求解线性逆问题,如图像填充、去噪、去模糊、超分辨率等。
  • methods: 使用预训练的潜在扩散模型,在线性模型设定下实现了可证明的样本恢复。
  • results: 在随机填充、块填充、去噪、去模糊、去条纹、超分辨率等各类问题中,性能超过了先前提出的后验采样算法。
    Abstract We present the first framework to solve linear inverse problems leveraging pre-trained latent diffusion models. Previously proposed algorithms (such as DPS and DDRM) only apply to pixel-space diffusion models. We theoretically analyze our algorithm showing provable sample recovery in a linear model setting. The algorithmic insight obtained from our analysis extends to more general settings often considered in practice. Experimentally, we outperform previously proposed posterior sampling algorithms in a wide variety of problems including random inpainting, block inpainting, denoising, deblurring, destriping, and super-resolution.
    摘要 我们提出了首个利用预训练潜在扩散模型求解线性逆问题的框架。此前提出的算法(如 DPS 和 DDRM)仅适用于像素空间扩散模型。我们对算法进行了理论分析,证明了其在线性模型设定下可实现可证明的样本恢复;由此获得的算法洞见也可推广到实践中常见的更一般设定。实验表明,我们的方法在随机填充、块填充、去噪、去模糊、去条纹和超分辨率等各类问题中均优于先前提出的后验采样算法。
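
The general posterior-sampling recipe for linear inverse problems — alternate a prior (denoiser) step with a gradient step on the data-fidelity term — can be shown on a toy problem. The sketch below is schematic: PSLD performs this kind of guidance in the latent space of a pretrained latent diffusion model, whereas here the "denoiser" is a placeholder shrinkage and everything lives in a small vector space:

```python
import torch

def guided_denoising_step(x_t, denoiser, y, A, step_size):
    """Schematic posterior-sampling step for a linear inverse problem y = A x + noise:
    apply one prior (denoiser) update, then move down the gradient of the
    data-fidelity term 0.5 * ||y - A x||^2."""
    x = x_t.detach().requires_grad_(True)
    x_prior = denoiser(x)                           # unconditional prior step
    fidelity = 0.5 * ((y - A @ x_prior) ** 2).sum()
    grad, = torch.autograd.grad(fidelity, x)
    return (x_prior - step_size * grad).detach()

if __name__ == "__main__":
    torch.manual_seed(0)
    d, m = 16, 8
    A = torch.randn(m, d) / d ** 0.5                # known linear forward operator
    x_true = torch.randn(d)
    y = A @ x_true                                  # measurements (noiseless here)
    denoiser = lambda x: 0.99 * x                   # placeholder for a trained model
    x = torch.randn(d)
    print("initial residual:", float(torch.norm(A @ x - y)))
    for _ in range(300):
        x = guided_denoising_step(x, denoiser, y, A, step_size=0.1)
    print("guided residual: ", float(torch.norm(A @ x - y)))
    # Guidance pulls the iterate toward measurement consistency while the prior
    # (here just mild shrinkage) regularises directions the measurements miss.
```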

The Forward-Forward Algorithm as a feature extractor for skin lesion classification: A preliminary study

  • paper_url: http://arxiv.org/abs/2307.00617
  • repo_url: None
  • paper_authors: Abel Reyes-Angulo, Sidike Paheding
  • for: 针对皮肤癌的早期诊断,提高生存率。
  • methods: 使用深度学习(DL)技术,包括卷积神经网络和变换器,自动化诊断。
  • results: 研究发现,使用FFA可以实现低功耗的皮肤癌分类,并且可以与BP相结合以实现更高的预测精度。
    Abstract Skin cancer, a deadly form of cancer, exhibits a 23\% survival rate in the USA with late diagnosis. Early detection can significantly increase the survival rate, and facilitate timely treatment. Accurate biomedical image classification is vital in medical analysis, aiding clinicians in disease diagnosis and treatment. Deep learning (DL) techniques, such as convolutional neural networks and transformers, have revolutionized clinical decision-making automation. However, computational cost and hardware constraints limit the implementation of state-of-the-art DL architectures. In this work, we explore a new type of neural network that does not need backpropagation (BP), namely the Forward-Forward Algorithm (FFA), for skin lesion classification. While FFA is claimed to use very low-power analog hardware, BP still tends to be superior in terms of classification accuracy. In addition, our experimental results suggest that the combination of FFA and BP can be a better alternative to achieve a more accurate prediction.
    摘要 皮肤癌是一种致命的癌症,在美国若确诊较晚,存活率仅为 23%。早期发现可以显著提高存活率,并有助于及时治疗。准确的生物医学图像分类在医学分析中至关重要,可帮助临床医生进行疾病诊断和治疗。深度学习(DL)技术(如卷积神经网络和 transformer)已经革新了临床决策的自动化。然而,计算成本和硬件限制制约了最先进 DL 架构的部署。在这项工作中,我们探索了一种不需要反向传播(BP)的新型神经网络,即前向-前向算法(Forward-Forward Algorithm,FFA),用于皮肤病变分类。尽管 FFA 据称可以使用功耗极低的模拟硬件,但在分类准确率方面 BP 通常仍更优。此外,我们的实验结果表明,将 FFA 与 BP 结合使用可能是获得更准确预测的更好选择。
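
The Forward-Forward rule trains each layer locally: the layer's "goodness" (sum of squared activations) is pushed above a threshold for positive data and below it for negative data, with no gradients flowing between layers. The sketch below implements one such layer on placeholder data; the threshold, negative-sample construction, and sizes are assumptions, and it is not the skin-lesion pipeline studied in the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFLayer(nn.Module):
    """One layer trained with the Forward-Forward rule: its 'goodness' (sum of squared
    activations) should end up above a threshold for positive samples and below it for
    negative samples. Each layer optimises only itself; no gradients cross layers."""
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.linear.parameters(), lr=lr)

    def forward(self, x):
        # Pass on only the direction of the input so the previous layer's goodness
        # cannot leak into this layer's decision.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        # Push positive goodness above the threshold, negative goodness below it.
        loss = F.softplus(torch.cat([self.threshold - g_pos,
                                     g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return float(loss)

if __name__ == "__main__":
    torch.manual_seed(0)
    layer = FFLayer(d_in=64, d_out=32)
    x_pos = torch.randn(256, 64) + 1.0      # placeholder "real" inputs
    x_neg = torch.randn(256, 64) - 1.0      # placeholder "negative" inputs
    for _ in range(200):
        loss = layer.train_step(x_pos, x_neg)
    print("final local loss:", loss)
```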