cs.AI - 2023-08-02

Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation

  • paper_url: http://arxiv.org/abs/2308.01240
  • repo_url: None
  • paper_authors: Zhiqiang Yuan, Junwei Liu, Qiancheng Zi, Mingwei Liu, Xin Peng, Yiling Lou
  • for: 本研究评估了10个开源指定LM在四个代表性代码理解和生成任务上的性能。
  • methods: 我们使用了零shot、几个shot和精心调整的方法来评估指定LM的性能。
  • results: 我们发现,零shot设置下,指定LM在代码理解和生成任务上非常竞争力,有时 mêmebetter than小型SOTA模型专门为每个下游任务进行精心调整。此外,我们发现大型指定LM不总是在代码相关任务上更好。在几个shot设置下,我们发现,添加示例可以帮助指定LM在大多数代码理解和生成任务上表现更好,但是这些示例有时会导致模型的不稳定或worse性能。此外,我们发现广泛使用的BM25基于shot选择策略在生成问题上显著超过基本随机选择或固定选择。在精心调整设置下,我们发现,精心调整可以进一步提高模型在下游代码理解和生成任务上的性能,并且在同一个下游任务数据集上,指定LM在精心调整后表现更好于小型SOTA模型和无指定LM。
    Abstract In this work, we evaluate 10 open-source instructed LLMs on four representative code comprehension and generation tasks. We have the following main findings. First, for the zero-shot setting, instructed LLMs are very competitive on code comprehension and generation tasks and sometimes even better than small SOTA models specifically fine-tuned on each downstream task. We also find that larger instructed LLMs are not always better on code-related tasks. Second, for the few-shot setting, we find that adding demonstration examples substantially helps instructed LLMs perform better on most code comprehension and generation tasks; however, the examples would sometimes induce unstable or even worse performance. Furthermore, we find widely-used BM25-based shot selection strategy significantly outperforms the basic random selection or fixed selection only on generation problems. Third, for the fine-tuning setting, we find that fine-tuning could further improve the model performance on downstream code comprehension and generation tasks compared to the zero-shot/one-shot performance. In addition, after being fine-tuned on the same downstream task dataset, instructed LLMs outperform both the small SOTA models and similar-scaled LLMs without instruction tuning. Based on our findings, we further present practical implications on model and usage recommendation, performance and cost trade-offs, and future direction.
    摘要 在这项工作中,我们评估了10个开源指导大型语言模型(LLMs)在四个代表性的代码理解和生成任务上。我们发现以下主要结论:首先,在零次设定下,指导LLMs在代码理解和生成任务上非常竞争力,有时连小规模的特点领域模型特定 fine-tune 的每个下游任务都能够超越。我们还发现,更大的指导LLMs不总是在代码相关任务上更好。第二,在几次设定下,我们发现添加示例可以帮助指导LLMs在大多数代码理解和生成任务上表现更好,但有时示例会导致不稳定或甚至更差的表现。此外,我们发现广泛使用的BM25基于抽象选择策略可以在生成问题上显著超越基于随机选择或固定选择的策略。第三,在微调设定下,我们发现微调可以提高模型在下游代码理解和生成任务上的性能,并且在同一个下游任务数据集上微调后,指导LLMs可以超越小规模模型和无指导微调的同规模LLMs。根据我们的发现,我们进一步阐述了模型和使用建议、性能和成本贸易、未来方向等问题。

Do Multilingual Language Models Think Better in English?

  • paper_url: http://arxiv.org/abs/2308.01223
  • repo_url: https://github.com/juletx/self-translate
  • paper_authors: Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe
  • for: 提高多语言模型的性能
  • methods: 使用自动翻译系统
  • results: 自动翻译系统提高了模型的性能,但是模型无法充分利用其多语言潜力 when prompted in non-English languages.
    Abstract Translate-test is a popular technique to improve the performance of multilingual language models. This approach works by translating the input into English using an external machine translation system, and running inference over the translated input. However, these improvements can be attributed to the use of a separate translation system, which is typically trained on large amounts of parallel data not seen by the language model. In this work, we introduce a new approach called self-translate, which overcomes the need of an external translation system by leveraging the few-shot translation capabilities of multilingual language models. Experiments over 5 tasks show that self-translate consistently outperforms direct inference, demonstrating that language models are unable to leverage their full multilingual potential when prompted in non-English languages. Our code is available at https://github.com/juletx/self-translate.
    摘要 《翻译测试是一种广泛使用的技术,以提高多语言语音模型的性能。这种方法工作于将输入翻译成英语使用外部机器翻译系统,然后运行推理。然而,这些改进可以归功于使用分离的翻译系统,该系统通常在大量的并行数据上进行了训练。在这个工作中,我们介绍了一种新的方法called自动翻译(Self-translate),它超越了需要外部翻译系统的需求,通过多语言语音模型的几个shot翻译能力。经过5个任务的实验表明,自动翻译可以一直超越直接推理,表明语音模型在非英语语言下提问时无法全面发挥其多语言潜力。我们的代码可以在https://github.com/juletx/self-translate中找到。》

Calibration in Deep Learning: A Survey of the State-of-the-Art

  • paper_url: http://arxiv.org/abs/2308.01222
  • repo_url: None
  • paper_authors: Cheng Wang
  • For: This paper reviews the state-of-the-art calibration methods for deep neural models and provides an understanding of their principles for performing model calibration.* Methods: The paper introduces four categories of calibration methods, including post-hoc calibration, regularization methods, uncertainty estimation, and composition methods.* Results: The paper discusses recent advancements in calibrating large models, particularly large language models (LLMs), and highlights some open issues, challenges, and potential directions in model calibration.Here’s the full translation in simplified Chinese:* For: 这篇论文总结了现代神经网络模型的准确性calibration方法,并提供了这些方法的原理。* Methods: 论文介绍了四种准确性calibration方法,包括后期calibration、regularization方法、uncertainty估计和组合方法。* Results: 论文讨论了大型模型(特别是大语言模型)的准确性calibration方法,并提出了一些开放的问题、挑战和可能的方向。
    Abstract Calibrating deep neural models plays an important role in building reliable, robust AI systems in safety-critical applications. Recent work has shown that modern neural networks that possess high predictive capability are poorly calibrated and produce unreliable model predictions. Though deep learning models achieve remarkable performance on various benchmarks, the study of model calibration and reliability is relatively underexplored. Ideal deep models should have not only high predictive performance but also be well calibrated. There have been some recent methods proposed to calibrate deep models by using different mechanisms. In this survey, we review the state-of-the-art calibration methods and provide an understanding of their principles for performing model calibration. First, we start with the definition of model calibration and explain the root causes of model miscalibration. Then we introduce the key metrics that can measure this aspect. It is followed by a summary of calibration methods that we roughly classified into four categories: post-hoc calibration, regularization methods, uncertainty estimation, and composition methods. We also covered some recent advancements in calibrating large models, particularly large language models (LLMs). Finally, we discuss some open issues, challenges, and potential directions.
    摘要 <>传送文本到简化中文。<>深度神经网络调整扮演重要的角色在安全关键应用中建立可靠、Robust AI系统。最近的研究表明,现代神经网络具有高预测能力,但是受到不可靠的模型预测问题的影响。虽然深度学习模型在不同的标准概念上达到了惊人的性能,但是模型准确性和稳定性的研究尚未得到充分的探索。理想的深度模型应该不仅具有高预测性能,还应该具有良好的准确性。在这篇评论中,我们回顾了当前领域的状态对模型准确性的研究,并提供了这些方法的原理。首先,我们开始定义模型准确性,并解释了模型误准的根本原因。然后,我们介绍了用于评估这一方面的关键度量。接下来,我们简要概述了准确性方法,分为四类:后期准确化、规范方法、不确定度估计和组合方法。我们还讨论了大模型(LLMs)的准确性calibration。最后,我们讨论了一些未解决的问题、挑战和可能的方向。

Using ScrutinAI for Visual Inspection of DNN Performance in a Medical Use Case

  • paper_url: http://arxiv.org/abs/2308.01220
  • repo_url: None
  • paper_authors: Rebekka Görge, Elena Haedecke, Michael Mock
  • for: 本研究使用Visual Analytics(VA)工具ScrutinAI,以帮助人类分析员Investigate模型性能和数据集。模型性能受标签质量的影响很大,特别在医疗设置下,生成高质量标签需要深厚专业知识和很costly。经常情况下,数据集被收集由多个专家的意见来标签。我们使用ScrutinAI来分析标签变化 между不同专家对模型性能的影响。
  • methods: 我们使用了ScrutinAI工具来分析模型性能的原因,包括标签质量的变化和缺失对模型的影响。我们使用了一个公共可用的数据集来进行检测脑内出血和分类不同亚型的检测。
  • results: 我们的结果表明,ScrutinAI可以帮助分析员快速地发现模型性能的原因,并且可以分析标签变化的影响。我们发现,模型性能受标签质量的影响很大,而且在某些情况下,模型可能会受到缺失标签的影响。
    Abstract Our Visual Analytics (VA) tool ScrutinAI supports human analysts to investigate interactively model performanceand data sets. Model performance depends on labeling quality to a large extent. In particular in medical settings, generation of high quality labels requires in depth expert knowledge and is very costly. Often, data sets are labeled by collecting opinions of groups of experts. We use our VA tool to analyse the influence of label variations between different experts on the model performance. ScrutinAI facilitates to perform a root cause analysis that distinguishes weaknesses of deep neural network (DNN) models caused by varying or missing labeling quality from true weaknesses. We scrutinize the overall detection of intracranial hemorrhages and the more subtle differentiation between subtypes in a publicly available data set.
    摘要 我们的视觉分析工具ScrutinAI可以帮助人类分析员 investigate模型性能和数据集。模型性能受标签质量的影响很大,特别在医疗场景下,生成高质量标签需要深入的专业知识和非常昂贵。经常情况下,数据集由多名专家的意见集成标签。我们使用ScrutinAI来分析标签变化 между不同专家对模型性能的影响。ScrutinAI可以进行根本原因分析,从而分解深度神经网络模型的弱点,是因为标签质量变化或缺失,而不是真正的弱点。我们在一个公共可用的数据集中对脑出血的总检测和较为细微的差异分类进行了检验。

Mercury: An Automated Remote Side-channel Attack to Nvidia Deep Learning Accelerator

  • paper_url: http://arxiv.org/abs/2308.01193
  • repo_url: None
  • paper_authors: Xiaobei Yan, Xiaoxuan Lou, Guowen Xu, Han Qiu, Shangwei Guo, Chip Hong Chang, Tianwei Zhang
  • for: 防止深度学习模型在加速器上的泄露和攻击。
  • methods: 自动化的远程边频攻击,通过模型化侧频泄露为序列-到-序列问题,使用时间数字转换器(TDC)收集目标模型的执行轨迹,然后使用学习模型自动提取目标模型的架构细节。
  • results: 可以准确率低于1%地提取目标模型的架构细节。
    Abstract DNN accelerators have been widely deployed in many scenarios to speed up the inference process and reduce the energy consumption. One big concern about the usage of the accelerators is the confidentiality of the deployed models: model inference execution on the accelerators could leak side-channel information, which enables an adversary to preciously recover the model details. Such model extraction attacks can not only compromise the intellectual property of DNN models, but also facilitate some adversarial attacks. Although previous works have demonstrated a number of side-channel techniques to extract models from DNN accelerators, they are not practical for two reasons. (1) They only target simplified accelerator implementations, which have limited practicality in the real world. (2) They require heavy human analysis and domain knowledge. To overcome these limitations, this paper presents Mercury, the first automated remote side-channel attack against the off-the-shelf Nvidia DNN accelerator. The key insight of Mercury is to model the side-channel extraction process as a sequence-to-sequence problem. The adversary can leverage a time-to-digital converter (TDC) to remotely collect the power trace of the target model's inference. Then he uses a learning model to automatically recover the architecture details of the victim model from the power trace without any prior knowledge. The adversary can further use the attention mechanism to localize the leakage points that contribute most to the attack. Evaluation results indicate that Mercury can keep the error rate of model extraction below 1%.
    摘要 although previous works have demonstrated several side-channel techniques to extract models from DNN accelerators, they are not practical for two reasons. First, they only target simplified accelerator implementations, which have limited practicality in the real world. Second, they require heavy human analysis and domain knowledge.To overcome these limitations, this paper presents Mercury, the first automated remote side-channel attack against off-the-shelf Nvidia DNN accelerators. The key insight of Mercury is to model the side-channel extraction process as a sequence-to-sequence problem. The adversary can leverage a time-to-digital converter (TDC) to remotely collect the power trace of the target model's inference. Then he uses a learning model to automatically recover the architecture details of the victim model from the power trace without any prior knowledge. The adversary can further use the attention mechanism to localize the leakage points that contribute most to the attack.Evaluation results indicate that Mercury can keep the error rate of model extraction below 1%.

Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2308.01189
  • repo_url: None
  • paper_authors: Yongkang He, Mingjin Chen, Zhijing Yang, Yongyi Lu
  • for: Addressing the dense labeling problem in medical image segmentation, where a significant fraction of the dataset can be pruned without sacrificing much accuracy.
  • methods: Proposing a data pruning method based on the Dynamic Average Dice (DAD) score, which takes into consideration the training dynamics on target regions.
  • results: Showing that the proposed method can effectively identify important samples and reduce the amount of labeled data needed for training, making it a strong yet simple baseline for medical image segmentation with combined data sources.
    Abstract This paper seeks to address the dense labeling problems where a significant fraction of the dataset can be pruned without sacrificing much accuracy. We observe that, on standard medical image segmentation benchmarks, the loss gradient norm-based metrics of individual training examples applied in image classification fail to identify the important samples. To address this issue, we propose a data pruning method by taking into consideration the training dynamics on target regions using Dynamic Average Dice (DAD) score. To the best of our knowledge, we are among the first to address the data importance in dense labeling tasks in the field of medical image analysis, making the following contributions: (1) investigating the underlying causes with rigorous empirical analysis, and (2) determining effective data pruning approach in dense labeling problems. Our solution can be used as a strong yet simple baseline to select important examples for medical image segmentation with combined data sources.
    摘要
  1. Empirical analysis of the underlying causes of the problem.2. Development of an effective data pruning approach for dense labeling tasks in medical image analysis.Our solution can be used as a strong and simple baseline for selecting important examples in medical image segmentation with combined data sources.

Machine Learning-Based Diabetes Detection Using Photoplethysmography Signal Features

  • paper_url: http://arxiv.org/abs/2308.01930
  • repo_url: None
  • paper_authors: Filipe A. C. Oliveira, Felipe M. Dias, Marcelo A. F. Toledo, Diego A. C. Cardenas, Douglas A. Almeida, Estela Ribeiro, Jose E. Krieger, Marco A. Gutierrez
  • for: 这个研究旨在开发一种可靠的、无侵入的血糖察测方法,以帮助预防和控制糖尿病。
  • methods: 该研究使用了光学折射 Plethysmography (PPG) 技术,通过分析 PPG 信号和相关 metadata,采用 Logistic Regression (LR) 和 eXtreme Gradient Boosting (XGBoost) 算法进行分类,以将非糖尿病和糖尿病患者区分开来。
  • results: 研究结果显示,使用 PPG 信号和 metadata 进行训练,可以达到 F1-Score 和 AUC 的值为 $58.8\pm20.0%$ 和 $79.2\pm15.0%$ 以及 $51.7\pm16.5%$ 和 $73.6\pm17.0%$,分别对应 LR 和 XGBoost 算法。此外,特征分析表明,PPG 形态特征含有糖尿病相关信息,同时 metadata 也有一定的作用。这些结果与文献报道的结果相似,表明机器学习方法在开发远程、无侵入、连续测量糖尿病的设备方面具有潜在的抑血糖监测技术。
    Abstract Diabetes is a prevalent chronic condition that compromises the health of millions of people worldwide. Minimally invasive methods are needed to prevent and control diabetes but most devices for measuring glucose levels are invasive and not amenable for continuous monitoring. Here, we present an alternative method to overcome these shortcomings based on non-invasive optical photoplethysmography (PPG) for detecting diabetes. We classify non-Diabetic and Diabetic patients using the PPG signal and metadata for training Logistic Regression (LR) and eXtreme Gradient Boosting (XGBoost) algorithms. We used PPG signals from a publicly available dataset. To prevent overfitting, we divided the data into five folds for cross-validation. By ensuring that patients in the training set are not in the testing set, the model's performance can be evaluated on unseen subjects' data, providing a more accurate assessment of its generalization. Our model achieved an F1-Score and AUC of $58.8\pm20.0\%$ and $79.2\pm15.0\%$ for LR and $51.7\pm16.5\%$ and $73.6\pm17.0\%$ for XGBoost, respectively. Feature analysis suggested that PPG morphological features contains diabetes-related information alongside metadata. Our findings are within the same range reported in the literature, indicating that machine learning methods are promising for developing remote, non-invasive, and continuous measurement devices for detecting and preventing diabetes.
    摘要 DIABETES 是一种常见的慢性疾病,影响全球数百万人的健康。为了预防和控制 DIABETES,需要采用轻侵入的方法,但现有的糖尿病测量设备大多是侵入式,不适合持续监测。在这里,我们提出了一种替代方案,利用非侵入式的光学折射 plethysmography (PPG) 来检测 DIABETES。我们使用 PPG 信号和 metadata 来训练 Logistic Regression (LR) 和 eXtreme Gradient Boosting (XGBoost) 算法,并将数据分成五个批次进行十字验证。这样可以避免过拟合,并且通过将训练集和测试集分开,可以评估模型在未看到的数据上的性能,提供更准确的评估。我们的模型实现了 F1 分数和 AUC 的 $58.8\pm20.0\%$ 和 $79.2\pm15.0\%$,以及 $51.7\pm16.5\%$ 和 $73.6\pm17.0\%$,分别用于 LR 和 XGBoost。特征分析表明,PPG 形态特征含有糖尿病相关信息,同时与 metadata 相关。我们的发现与文献中的报告相符,表明机器学习方法在开发远程、非侵入式、持续测量糖尿病的设备方面具有潜在的承诺。

LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs

  • paper_url: http://arxiv.org/abs/2308.01157
  • repo_url: https://github.com/interpretml/talktoebm
  • paper_authors: Benjamin J. Lengerich, Sebastian Bordt, Harsha Nori, Mark E. Nunnally, Yin Aphinyanaphongs, Manolis Kellis, Rich Caruana
  • for: 这篇论文旨在探讨大语言模型(LLM)如何与可解释模型相结合,以实现数据科学中的一些常见任务自动化。
  • methods: 论文使用了层次逻辑来理解复杂的结果,并采用了大语言模型的广泛背景知识来自动化数据科学中的一些任务,如检测异常点、描述异常原因以及修复异常。
  • results: 论文通过多个医疗Example示cases,展示了这种新的LLM功能的实用性,特别是在使用Generalized Additive Models (GAMs)时。最后,论文还介绍了一个开源的LLM-GAM接口——$\texttt{TalkToEBM}$ package。
    Abstract We show that large language models (LLMs) are remarkably good at working with interpretable models that decompose complex outcomes into univariate graph-represented components. By adopting a hierarchical approach to reasoning, LLMs can provide comprehensive model-level summaries without ever requiring the entire model to fit in context. This approach enables LLMs to apply their extensive background knowledge to automate common tasks in data science such as detecting anomalies that contradict prior knowledge, describing potential reasons for the anomalies, and suggesting repairs that would remove the anomalies. We use multiple examples in healthcare to demonstrate the utility of these new capabilities of LLMs, with particular emphasis on Generalized Additive Models (GAMs). Finally, we present the package $\texttt{TalkToEBM}$ as an open-source LLM-GAM interface.
    摘要 我们显示大型语言模型(LLMs)可以非常好地与可解释性模型(decompose complex outcomes into univariate graph-represented components)合作。通过阶层式思考方式,LLMs可以提供全面的模型水平摘要,无需整个模型满足上下文中的适应。这种方法允许LLMs应用它们广泛的背景知识来自动进行资料科学中常见的任务,例如检测对前知识不符的异常,描述异常的可能原因,并建议修复异常的方法。我们使用了多个医疗保健例子来证明这些新功能的价值,尤其是对于泛化添加模型(GAMs)。最后,我们发布了名为 $\texttt{TalkToEBM}$ 的开源 LLM-GAM 界面。

Arithmetic with Language Models: from Memorization to Computation

  • paper_url: http://arxiv.org/abs/2308.01154
  • repo_url: None
  • paper_authors: Davide Maltoni, Matteo Ferrara
  • for: investigate the emergent computation and problem-solving capabilities of recent large language models
  • methods: trained the language model to predict the next token, and tested its ability to perform binary addition and multiplication
  • results: the language model was able to learn these tasks and exhibited extrapolation capabilities, supporting the hypothesis that the model works as an Encoding-Regression-Decoding machine.
    Abstract A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities making smooth input interpolation ineffective for novel data. We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing. Our findings support the hypotheses that the language model works as an Encoding-Regression-Decoding machine where the computation takes place in the value space once the input token representation is mapped to an appropriate internal representation.
    摘要 需要更深刻的理解 latest large language model 的 emergent computation 和问题解决能力,以便进一步改进它们和扩展其应用范围。这项工作 investigate 如何一个语言模型,受训练来预测下一个token,可以执行扩展到训练数据之外的数学计算。二进制加法和乘法是一个好的测试场景,因为它们需要非常小的词汇表,并且表现出输入/输出离散性,使得smooth输入 interpolate 无效 для novel data。我们成功地培养了一个轻量级语言模型,学习这些任务,并进行了一些实验来研究 extrapolation 能力和内部信息处理。我们的发现支持假设,语言模型是一个编码-回归-解码机器,计算在值空间中发生,只要输入token表示Mapping 到合适的内部表示即可。

A Transformer-based Prediction Method for Depth of Anesthesia During Target-controlled Infusion of Propofol and Remifentanil

  • paper_url: http://arxiv.org/abs/2308.01929
  • repo_url: https://github.com/heeeyk/transformer-doa-prediction
  • paper_authors: Yongkang He, Siyuan Peng, Mingjin Chen, Zhijing Yang, Yuanhui Chen
    for: 预测麻醉效果的准确性是脊梁控制注射系统的关键,传统的PK-PD模型需要手动选择模型参数,这可能是临床设置中困难的。methods: 我们提议使用变换器来预测麻醉深度(DOA),使用批处理和闭合径远网络来提高特征融合的效率,并应用关注机制来发现药物之间的互动。我们还使用标签分布平滑和重新平衡损失来解决数据不均衡。results: 我们的提议方法比传统PK-PD模型和先前的深度学习方法更高效,可以正确预测麻醉深度在快速和深度麻醉 conditons下。
    Abstract Accurately predicting anesthetic effects is essential for target-controlled infusion systems. The traditional (PK-PD) models for Bispectral index (BIS) prediction require manual selection of model parameters, which can be challenging in clinical settings. Recently proposed deep learning methods can only capture general trends and may not predict abrupt changes in BIS. To address these issues, we propose a transformer-based method for predicting the depth of anesthesia (DOA) using drug infusions of propofol and remifentanil. Our method employs long short-term memory (LSTM) and gate residual network (GRN) networks to improve the efficiency of feature fusion and applies an attention mechanism to discover the interactions between the drugs. We also use label distribution smoothing and reweighting losses to address data imbalance. Experimental results show that our proposed method outperforms traditional PK-PD models and previous deep learning methods, effectively predicting anesthetic depth under sudden and deep anesthesia conditions.
    摘要 Accurately predicting anesthetic effects is crucial for target-controlled infusion systems. Traditional PK-PD models for Bispectral index (BIS) prediction require manual selection of model parameters, which can be challenging in clinical settings. Recently proposed deep learning methods can only capture general trends and may not predict abrupt changes in BIS. To address these issues, we propose a transformer-based method for predicting the depth of anesthesia (DOA) using drug infusions of propofol and remifentanil. Our method employs long short-term memory (LSTM) and gate residual network (GRN) networks to improve the efficiency of feature fusion and applies an attention mechanism to discover the interactions between the drugs. We also use label distribution smoothing and reweighting losses to address data imbalance. Experimental results show that our proposed method outperforms traditional PK-PD models and previous deep learning methods, effectively predicting anesthetic depth under sudden and deep anesthesia conditions.Here's the text with some notes on the translation:1. "Accurately predicting anesthetic effects" is translated as "正确预测医疗效果" (zhèng kě bìng jì yì yì qiú yè)2. "target-controlled infusion systems" is translated as "目标控制注射系统" (mou zhì kòng zhì zhù shí tè)3. "Bispectral index" is translated as "复探测指数" (fù guān zhì shù)4. "traditional PK-PD models" is translated as "传统PK-PD模型" (chuán tǒng PK-PD mó yì)5. "deep learning methods" is translated as "深度学习方法" (shēn dào xué xí fāng fǎ)6. "long short-term memory" is translated as "长期忘却记忆" (cháng qī wàng qiū jì yì)7. "gate residual network" is translated as "门阶差异网络" (mén jiē yì zhī wǎng)8. "label distribution smoothing" is translated as "标签分布平滑" (biāo jiāo fān bù píng shuā)9. "reweighting losses" is translated as "重新评估损失" (zhòng xīn píng shí shū shì)Note that the translation is based on the standard Simplified Chinese characters and may vary depending on the specific context and region.

Can We Transfer Noise Patterns? A Multi-environment Spectrum Analysis Model Using Generated Cases

  • paper_url: http://arxiv.org/abs/2308.01138
  • repo_url: https://github.com/magnomic/cnst
  • paper_authors: Haiwen Du, Zheng Ju, Yu An, Honghui Du, Dongjie Zhu, Zhaoshuo Tian, Aonghus Lawlor, Ruihai Dong
  • for: 提高 Deep Learning 模型的性能,增加高质量的数据集
  • methods: 提出一种将噪声特征传递模型,通过学习不同环境中标准水样 спектrum 的差异,实现噪声传递
  • results: 对不同背景噪声进行实验,表明提出的方法可以减少噪声影响,提高 Deep Learning 模型的性能,并且比基eline系统(包括浪涌滤波、深度神经网络和生成模型)更好。
    Abstract Spectrum analysis systems in online water quality testing are designed to detect types and concentrations of pollutants and enable regulatory agencies to respond promptly to pollution incidents. However, spectral data-based testing devices suffer from complex noise patterns when deployed in non-laboratory environments. To make the analysis model applicable to more environments, we propose a noise patterns transferring model, which takes the spectrum of standard water samples in different environments as cases and learns the differences in their noise patterns, thus enabling noise patterns to transfer to unknown samples. Unfortunately, the inevitable sample-level baseline noise makes the model unable to obtain the paired data that only differ in dataset-level environmental noise. To address the problem, we generate a sample-to-sample case-base to exclude the interference of sample-level noise on dataset-level noise learning, enhancing the system's learning performance. Experiments on spectral data with different background noises demonstrate the good noise-transferring ability of the proposed method against baseline systems ranging from wavelet denoising, deep neural networks, and generative models. From this research, we posit that our method can enhance the performance of DL models by generating high-quality cases. The source code is made publicly available online at https://github.com/Magnomic/CNST.
    摘要

A Survey on Popularity Bias in Recommender Systems

  • paper_url: http://arxiv.org/abs/2308.01118
  • repo_url: None
  • paper_authors: Anastasiia Klimashevskaia, Dietmar Jannach, Mehdi Elahi, Christoph Trattner
  • for: 本研究旨在探讨现有推荐系统中偏好强度的问题,以及如何检测、衡量和缓解这种偏好。
  • methods: 本文评论了现有的计算指标和技术方法,以帮助减少偏好强度的影响。
  • results: 研究发现,现有的推荐算法在大多数情况下受到偏好强度的影响,导致推荐结果偏向流行的项目。此外,研究还发现现有的研究基本上仅仅通过计算实验和假设来评估实际效果。
    Abstract Recommender systems help people find relevant content in a personalized way. One main promise of such systems is that they are able to increase the visibility of items in the long tail, i.e., the lesser-known items in a catalogue. Existing research, however, suggests that in many situations today's recommendation algorithms instead exhibit a popularity bias, meaning that they often focus on rather popular items in their recommendations. Such a bias may not only lead to limited value of the recommendations for consumers and providers in the short run, but it may also cause undesired reinforcement effects over time. In this paper, we discuss the potential reasons for popularity bias and we review existing approaches to detect, quantify and mitigate popularity bias in recommender systems. Our survey therefore includes both an overview of the computational metrics used in the literature as well as a review of the main technical approaches to reduce the bias. We furthermore critically discuss today's literature, where we observe that the research is almost entirely based on computational experiments and on certain assumptions regarding the practical effects of including long-tail items in the recommendations.
    摘要

Literal-Aware Knowledge Graph Embedding for Welding Quality Monitoring: A Bosch Case

  • paper_url: http://arxiv.org/abs/2308.01105
  • repo_url: None
  • paper_authors: Zhipeng Tan, Baifan Zhou, Zhuoxun Zheng, Ognjen Savkovic, Ziqi Huang, Irlan-Grangel Gonzalez, Ahmet Soylu, Evgeny Kharlamov
  • for: 本研究探讨了知识图 embedding (KGE) 是否可以应用于重要的工业问题:质量监测在生产过程中的焊接。
  • methods: 本研究使用了流行的 KGE 方法,并考虑了文本 literal。
  • results: 研究发现了 KGE 方法在实际工业数据上的限制和推荐,并解决了两个困难问题:焊接点大小和焊接点所属的车体类别。
    Abstract Recently there has been a series of studies in knowledge graph embedding (KGE), which attempts to learn the embeddings of the entities and relations as numerical vectors and mathematical mappings via machine learning (ML). However, there has been limited research that applies KGE for industrial problems in manufacturing. This paper investigates whether and to what extent KGE can be used for an important problem: quality monitoring for welding in manufacturing industry, which is an impactful process accounting for production of millions of cars annually. The work is in line with Bosch research of data-driven solutions that intends to replace the traditional way of destroying cars, which is extremely costly and produces waste. The paper tackles two very challenging questions simultaneously: how large the welding spot diameter is; and to which car body the welded spot belongs to. The problem setting is difficult for traditional ML because there exist a high number of car bodies that should be assigned as class labels. We formulate the problem as link prediction, and experimented popular KGE methods on real industry data, with consideration of literals. Our results reveal both limitations and promising aspects of adapted KGE methods.
    摘要

  • paper_url: http://arxiv.org/abs/2308.01098
  • repo_url: None
  • paper_authors: Kun-Peng Ning, Ming Pang, Zheng Fang, Xue Jiang, Xi-Wei Zhao, Chang-Ping Peng, Zhan-Gang Lin, Jing-He Hu, Jing-Ping Shao
  • for: 提高在线搜索系统中的用户意图理解效果,尤其是在低延迟下。
  • methods: 使用知识凝固(KC)框架,将在线快速的FastText模型升级到更深和复杂的BERT模型,以提高分类性能。
  • results: 通过在线A/B测试和多个数据集的实验,证明了提议的方法可以提高分类性能,同时保持低延迟。
    Abstract Search query classification, as an effective way to understand user intents, is of great importance in real-world online ads systems. To ensure a lower latency, a shallow model (e.g. FastText) is widely used for efficient online inference. However, the representation ability of the FastText model is insufficient, resulting in poor classification performance, especially on some low-frequency queries and tailed categories. Using a deeper and more complex model (e.g. BERT) is an effective solution, but it will cause a higher online inference latency and more expensive computing costs. Thus, how to juggle both inference efficiency and classification performance is obviously of great practical importance. To overcome this challenge, in this paper, we propose knowledge condensation (KC), a simple yet effective knowledge distillation framework to boost the classification performance of the online FastText model under strict low latency constraints. Specifically, we propose to train an offline BERT model to retrieve more potentially relevant data. Benefiting from its powerful semantic representation, more relevant labels not exposed in the historical data will be added into the training set for better FastText model training. Moreover, a novel distribution-diverse multi-expert learning strategy is proposed to further improve the mining ability of relevant data. By training multiple BERT models from different data distributions, it can respectively perform better at high, middle, and low-frequency search queries. The model ensemble from multi-distribution makes its retrieval ability more powerful. We have deployed two versions of this framework in JD search, and both offline experiments and online A/B testing from multiple datasets have validated the effectiveness of the proposed approach.
    摘要 “搜寻查询分类是在现实世界上线广告系统中非常重要的一种方法,以了解用户的意图。为了降低延迟,通常使用轻量级模型(例如 FastText)进行简单的在线推导。然而,FastText模型的表现能力不足,尤其是在一些低频查询和尾部分类上,导致分类性能不佳。使用更深和复杂的模型(例如 BERT)是一个有效的解决方案,但它将导致更高的在线推导延迟和更贵的计算成本。因此,如何均衡在线推导效率和分类性能是非常重要的实际问题。在这篇论文中,我们提出了知识储存(KC),一个简单而有效的知识传播框架,以提高在线 FastText 模型的分类性能。具体来说,我们提出了在网上训练一个 BERT 模型,以获取更多可能有用的数据。由于它的强大 semantic 表现,更多的标签不在历史数据中 exposed 的将被添加到训练集中,以提高 FastText 模型的训练。此外,我们还提出了一个多元专家学习策略,以进一步提高挖掘有用数据的能力。通过对多个数据分布进行多个 BERT 模型的训练,每个模型可以在不同的查询频率上表现更好。多元模型的集成可以将其挖掘能力提高。我们在 JD 搜寻中部署了两个版本的这个框架,并在多个数据集上进行了离线实验和在线 A/B 测试。结果显示,我们的方法有效地提高了在线 FastText 模型的分类性能。”

Scaling Data Science Solutions with Semantics and Machine Learning: Bosch Case

  • paper_url: http://arxiv.org/abs/2308.01094
  • repo_url: None
  • paper_authors: Baifan Zhou, Nikolay Nikolov, Zhuoxun Zheng, Xianghui Luo, Ognjen Savkovic, Dumitru Roman, Ahmet Soylu, Evgeny Kharlamov
  • For: 本研究旨在解决由于工业4.0和物联网(IoT)技术引入大量数据,导致云计算系统处理大数据的挑战。特别是,随着云计算系统的普及,需要更多的用户(如数据科学家、领域专家)在云计算系统上部署解决方案,但是训练这些用户需要很长时间。* Methods: 本研究提出了一种 semantics-enhanced 云计算系统(SemCloud),它将云计算系统与semantic技术和机器学习相结合。SemCloud 利用域ontologies和映射来实现数据集成,并在分布计算节点上并行进行semantic数据集成和数据分析。此外,SemCloud 采用自适应的 Datalog 规则和机器学习来自动配置资源,使得非云专家可以使用云计算系统。* Results: 本研究在工业用例中测试了 SemCloud,结果表明其在处理大数据时表现出色,并且在千次重复运行和领域用户的帮助下,SemCloud 能够实现自动化资源配置和高效数据处理。
    Abstract Industry 4.0 and Internet of Things (IoT) technologies unlock unprecedented amount of data from factory production, posing big data challenges in volume and variety. In that context, distributed computing solutions such as cloud systems are leveraged to parallelise the data processing and reduce computation time. As the cloud systems become increasingly popular, there is increased demand that more users that were originally not cloud experts (such as data scientists, domain experts) deploy their solutions on the cloud systems. However, it is non-trivial to address both the high demand for cloud system users and the excessive time required to train them. To this end, we propose SemCloud, a semantics-enhanced cloud system, that couples cloud system with semantic technologies and machine learning. SemCloud relies on domain ontologies and mappings for data integration, and parallelises the semantic data integration and data analysis on distributed computing nodes. Furthermore, SemCloud adopts adaptive Datalog rules and machine learning for automated resource configuration, allowing non-cloud experts to use the cloud system. The system has been evaluated in industrial use case with millions of data, thousands of repeated runs, and domain users, showing promising results.
    摘要 产业4.0和物联网(IoT)技术在工厂生产中释放了历史上未曾有的数据量,带来大规模数据挑战,包括数据量和多样性。在这种情况下,分布式计算解决方案如云系统得到了广泛应用,以并行处理数据并降低计算时间。然而,随着云系统的普及,需要更多的用户,包括不是云专家(如数据科学家、领域专家)部署他们的解决方案在云系统上。然而,寻求这些高需求和长时间培训是非常困难的。为此,我们提出了SemCloud,一个具有语义技术和机器学习的云系统。SemCloud通过域ontologies和映射来集成数据,并在分布式计算节点上并行执行语义数据集成和数据分析。此外,SemCloud采用自适应的Datalog规则和机器学习来自动配置资源,使非云专家可以使用云系统。我们在产业用例中测试了SemCloud,结果表现良好,处理了数百万个数据, thousands of repeated runs,和领域用户。

Hand tracking for clinical applications: validation of the Google MediaPipe Hand (GMH) and the depth-enhanced GMH-D frameworks

  • paper_url: http://arxiv.org/abs/2308.01088
  • repo_url: None
  • paper_authors: Gianluca Amprimo, Giulia Masi, Giuseppe Pettiti, Gabriella Olmo, Lorenzo Priano, Claudia Ferraris
  • for: validate the handtracking framework implemented by Google MediaPipe Hand (GMH) and an innovative enhanced version, GMH-D
  • methods: 使用 Google MediaPipe Hand (GMH) 和一种改进版本 GMH-D,利用RGB-深度摄像头的深度估计来实现更加精准的3D手势跟踪
  • results: 比较GMH和GMH-D两种方法的结果,发现GMH-D在空间测量方面具有更高的准确性,特别是对于慢速和快速手势的测量Here’s the translation in English for reference:
  • for: validate the handtracking framework implemented by Google MediaPipe Hand (GMH) and an innovative enhanced version, GMH-D
  • methods: use Google MediaPipe Hand (GMH) and an improved version GMH-D, utilizing the depth estimation of an RGB-Depth camera to achieve more accurate 3D hand gesture tracking
  • results: compare the results of GMH and GMH-D, showing that GMH-D has higher accuracy in spatial measurements, particularly for slow and fast hand gestures.
    Abstract Accurate 3D tracking of hand and fingers movements poses significant challenges in computer vision. The potential applications span across multiple domains, including human-computer interaction, virtual reality, industry, and medicine. While gesture recognition has achieved remarkable accuracy, quantifying fine movements remains a hurdle, particularly in clinical applications where the assessment of hand dysfunctions and rehabilitation training outcomes necessitate precise measurements. Several novel and lightweight frameworks based on Deep Learning have emerged to address this issue; however, their performance in accurately and reliably measuring fingers movements requires validation against well-established gold standard systems. In this paper, the aim is to validate the handtracking framework implemented by Google MediaPipe Hand (GMH) and an innovative enhanced version, GMH-D, that exploits the depth estimation of an RGB-Depth camera to achieve more accurate tracking of 3D movements. Three dynamic exercises commonly administered by clinicians to assess hand dysfunctions, namely Hand Opening-Closing, Single Finger Tapping and Multiple Finger Tapping are considered. Results demonstrate high temporal and spectral consistency of both frameworks with the gold standard. However, the enhanced GMH-D framework exhibits superior accuracy in spatial measurements compared to the baseline GMH, for both slow and fast movements. Overall, our study contributes to the advancement of hand tracking technology, the establishment of a validation procedure as a good-practice to prove efficacy of deep-learning-based hand-tracking, and proves the effectiveness of GMH-D as a reliable framework for assessing 3D hand movements in clinical applications.
    摘要 准确的3D手部运动跟踪在计算机视觉领域 pose significant challenges. 该领域的应用范围广泛,包括人机交互、虚拟现实、工业和医学。虽然手势识别已经达到了很高的准确性,但量化细微的手部运动仍然是一个难点,特别是在医学应用中, где评估手功能缺陷和rehabilitation training outcome需要精准的量化。Recently, several novel and lightweight frameworks based on Deep Learning have emerged to address this issue. However, their performance in accurately and reliably measuring finger movements requires validation against well-established gold standard systems.在这篇论文中,我们的目标是验证Google MediaPipe Hand(GMH)实现的手跟踪框架和一个创新的改进版本GMH-D,该版本利用RGB-深度摄像头的深度估计来实现更高精度的3D手部运动跟踪。我们考虑了临床医生通常用于评估手功能缺陷的三种动作, namely Hand Opening-Closing, Single Finger Tapping和Multiple Finger Tapping。结果表明两个框架具有高度和spectral consistency with the gold standard。然而,GMH-D版本在空间量化方面表现出了更高的准确性,特别是对于slow和fast movement。总的来说,本研究对手跟踪技术的进步、设立了一种验证手 tracking效果的良好做法,以及证明了GMH-D版本在临床应用中是一个可靠的手部运动跟踪框架。

Spatial Intelligence of a Self-driving Car and Rule-Based Decision Making

  • paper_url: http://arxiv.org/abs/2308.01085
  • repo_url: None
  • paper_authors: Stanislav Kikot
  • for: 自动驾驶车辆在复杂交通情况下实现人类Like的行为, combining rule-based decision making with traditional motion planning techniques.
  • methods: 使用规则基于决策和传统的运动规划技术。
  • results: 实现了人类Like的自动驾驶车辆行为,提供了开发机器人空间意识的技术研究方向。
    Abstract In this paper we show how rule-based decision making can be combined with traditional motion planning techniques to achieve human-like behavior of a self-driving vehicle in complex traffic situations. We give and discuss examples of decision rules in autonomous driving. We draw on these examples to illustrate that developing techniques for spatial awareness of robots is an exciting activity which deserves more attention from spatial reasoning community that it had received so far.
    摘要 在这篇论文中,我们展示了如何基于规则的决策可以与传统的动态规划技术相结合,以实现自动驾驶车辆在复杂交通情况下的人类化行为。我们给出了和讨论了自动驾驶决策规则的例子。我们通过这些例子来示例,发展robots空间意识技术是一项有趣的活动,这个领域在空间理解社区中得到了更多的注意力。

Graph Anomaly Detection at Group Level: A Topology Pattern Enhanced Unsupervised Approach

  • paper_url: http://arxiv.org/abs/2308.01063
  • repo_url: None
  • paper_authors: Xing Ai, Jialong Zhou, Yulin Zhu, Gaolei Li, Tomasz P. Michalak, Xiapu Luo, Kai Zhou
  • For: This paper proposes a novel unsupervised framework for Group-level Graph Anomaly Detection (Gr-GAD) to identify and localize anomaly groups within a graph.* Methods: The proposed framework employs a variant of Graph AutoEncoder (GAE) to locate anchor nodes that belong to potential anomaly groups, followed by group sampling and Topology Pattern-based Graph Contrastive Learning (TPGCL) to identify and localize anomaly groups.* Results: The experimental results on both real-world and synthetic datasets demonstrate the superior performance of the proposed framework in identifying and localizing anomaly groups, highlighting its potential for practical applications.Here’s the full text in Simplified Chinese:* For: 本文提出了一种新的无监督框架,用于检测图像中异常群体(Group-level Graph Anomaly Detection,Gr-GAD)。* Methods: 该框架使用变形的图自编码器(Graph AutoEncoder,GAE)来找到潜在异常群体的anchor节点,然后使用组样本和图 Pattern-based Graph Contrastive Learning(TPGCL)来识别和定位异常群体。* Results: 实验结果表明,提出的框架在真实世界和 sintetic 数据集上具有优秀的表现,能够准确地识别和定位异常群体,这 highlights 其在实际应用中的潜在价值。
    Abstract Graph anomaly detection (GAD) has achieved success and has been widely applied in various domains, such as fraud detection, cybersecurity, finance security, and biochemistry. However, existing graph anomaly detection algorithms focus on distinguishing individual entities (nodes or graphs) and overlook the possibility of anomalous groups within the graph. To address this limitation, this paper introduces a novel unsupervised framework for a new task called Group-level Graph Anomaly Detection (Gr-GAD). The proposed framework first employs a variant of Graph AutoEncoder (GAE) to locate anchor nodes that belong to potential anomaly groups by capturing long-range inconsistencies. Subsequently, group sampling is employed to sample candidate groups, which are then fed into the proposed Topology Pattern-based Graph Contrastive Learning (TPGCL) method. TPGCL utilizes the topology patterns of groups as clues to generate embeddings for each candidate group and thus distinct anomaly groups. The experimental results on both real-world and synthetic datasets demonstrate that the proposed framework shows superior performance in identifying and localizing anomaly groups, highlighting it as a promising solution for Gr-GAD. Datasets and codes of the proposed framework are at the github repository https://anonymous.4open.science/r/Topology-Pattern-Enhanced-Unsupervised-Group-level-Graph-Anomaly-Detection.
    摘要 “几何异常检测(GAD)已经取得成功并广泛应用于不同领域,如诈欺检测、网络安全、金融安全和生物化学。但是现有的几何异常检测算法仅专注于分别的元素(节点或几何),忽略了可能的异常群体在几何中。为了解决这个限制,本文提出了一个新的无监督框架,即Group-level Graph Anomaly Detection(Gr-GAD)。”“提案的框架首先使用一种几何自动化器(GAE)的变种来找到可能的异常群体的节点。接着,群体抽样被使用来抽样候选群体,并将其 feed 到提案的几何模式基于的几何对称学习(TPGCL)方法。TPGCL 使用群体的几何模式作为来源,将每个候选群体转换为不同的异常群体。实验结果显示,提案的框架在真实世界和 sintetic 数据集上显示出了优秀的表现,能够实时识别和定位异常群体,因此被认为是一个有前途的解决方案。”“如需取得 datasets 和代码,请参考以下 GitHub 存储库:https://anonymous.4open.science/r/Topology-Pattern-Enhanced-Unsupervised-Group-level-Graph-Anomaly-Detection。”

A Counterfactual Safety Margin Perspective on the Scoring of Autonomous Vehicles’ Riskiness

  • paper_url: http://arxiv.org/abs/2308.01050
  • repo_url: None
  • paper_authors: Alessandro Zanardi, Andrea Censi, Margherita Atzei, Luigi Di Lillo, Emilio Frazzoli
  • for: 这篇论文旨在评估自动驾驶车辆(AVs)的风险,以便更好地评估AVs在不同的操作设计域(ODDs)中的安全性。
  • methods: 这篇论文提出了一种基于对比式模拟的数据驱动方法,用于比较不同AVs的行为在不同ODDs中的风险。该方法基于“违反行为”的counterfactual simulations,以计算AVs的安全空间。
  • results: 实验结果表明,提出的方法可以评估AVs的风险,并且可以评估不同AV提供商的安全性。这种方法可以在AV的行为策略未知时也进行应用,因此可以用于第三方风险评估人员。
    Abstract Autonomous Vehicles (AVs) have the potential to provide numerous societal benefits, such as decreased road accidents and increased overall transportation efficiency. However, quantifying the risk associated with AVs is challenging due to the lack of historical data and the rapidly evolving technology. This paper presents a data-driven framework for comparing the risk of different AVs' behaviors in various operational design domains (ODDs), based on counterfactual simulations of "misbehaving" road users. We introduce the concept of counterfactual safety margin, which represents the minimum deviation from normal behavior that could lead to a collision. This concept helps to find the most critical scenarios but also to assess the frequency and severity of risk of AVs. We show that the proposed methodology is applicable even when the AV's behavioral policy is unknown -- through worst- and best-case analyses -- making the method useful also to external third-party risk assessors. Our experimental results demonstrate the correlation between the safety margin, the driving policy quality, and the ODD shedding light on the relative risk associated with different AV providers. This work contributes to AV safety assessment and aids in addressing legislative and insurance concerns surrounding this emerging technology.
    摘要 We introduce the concept of counterfactual safety margin, which represents the minimum deviation from normal behavior that could lead to a collision. This concept helps to identify the most critical scenarios and assess the frequency and severity of risks associated with AVs. We show that the proposed methodology is applicable even when the AV's behavioral policy is unknown, making it useful for external third-party risk assessors.Our experimental results demonstrate a correlation between the safety margin, driving policy quality, and ODD, shedding light on the relative risk associated with different AV providers. This work contributes to AV safety assessment and addresses legislative and insurance concerns surrounding this emerging technology.Translation notes:* "Autonomous Vehicles" is translated as "自动驾驶车辆" (zì àuto véhículóu)* "operational design domains" is translated as "运营设计领域" (yùn xíng jiè yì)* "counterfactual simulations" is translated as "对比 simulations" (duì bèi simulào)* "safety margin" is translated as "安全余地" (ān qū yú dì)* "driving policy" is translated as "驾驶策略" (jì shǐ mǎ lü)* "ODD" is translated as "运营设计领域" (yùn xíng jiè yì)Note: The translation is in Simplified Chinese, which is the standard writing system used in mainland China.

Chat Translation Error Detection for Assisting Cross-lingual Communications

  • paper_url: http://arxiv.org/abs/2308.01044
  • repo_url: https://github.com/cl-tohoku/bpersona-chat
  • paper_authors: Yunmeng Li, Jun Suzuki, Makoto Morishita, Kaori Abe, Ryoko Tokuhisa, Ana Brassard, Kentaro Inui
  • for: 本研究开发了一个通信支持系统,用于监测机器翻译错误,以促进跨语言通信。
  • methods: 研究人员使用了一个错误检测器作为系统的基础,并建立了一个新的日本英文双语聊天数据库(BPersona-chat),该数据库包含多轮漫谈对话,并具有人工审核的品质评分。
  • results: 错误检测器可以作为更进阶的错误翻译检测系统的基础。
    Abstract In this paper, we describe the development of a communication support system that detects erroneous translations to facilitate crosslingual communications due to the limitations of current machine chat translation methods. We trained an error detector as the baseline of the system and constructed a new Japanese-English bilingual chat corpus, BPersona-chat, which comprises multiturn colloquial chats augmented with crowdsourced quality ratings. The error detector can serve as an encouraging foundation for more advanced erroneous translation detection systems.
    摘要 在这篇论文中,我们描述了一个交流支持系统,用于检测machine翻译错译以促进多语言交流。我们基于基线错误检测器进行了训练,并构建了一个新的日语英语对话资料集,BPersona-chat,这些对话包括多轮口语对话,并且通过人工投票来评估对话质量。错误检测器可以作为更先进的错误翻译检测系统的基础。

Three Factors to Improve Out-of-Distribution Detection

  • paper_url: http://arxiv.org/abs/2308.01030
  • repo_url: None
  • paper_authors: Hyunjun Choi, JaeHo Chung, Hawook Jeong, Jin Young Choi
  • for: 本研究旨在提高异常输入探测(OOD)的性能,通过使用辅助数据作为外围数据进行细化。
  • methods: 本研究使用了三个贡献来解决OOD探测与分类精度之间的负担:(i) incorporating自我智能填充损失可以提高网络的准确性;(ii) 采样 semi-hard 异常数据进行训练可以提高OOD探测性能,而无需影响分类精度;(iii) 我们提出了一种新的超vised Contrastive Learning,可以同时提高OOD探测性能和网络的准确性。
  • results: 我们的方法可以同时提高OOD探测性能和分类精度,解决了之前的负担。我们的方法与之前的方法相比,在两个性能指标上均有提高。
    Abstract In the problem of out-of-distribution (OOD) detection, the usage of auxiliary data as outlier data for fine-tuning has demonstrated encouraging performance. However, previous methods have suffered from a trade-off between classification accuracy (ACC) and OOD detection performance (AUROC, FPR, AUPR). To improve this trade-off, we make three contributions: (i) Incorporating a self-knowledge distillation loss can enhance the accuracy of the network; (ii) Sampling semi-hard outlier data for training can improve OOD detection performance with minimal impact on accuracy; (iii) The introduction of our novel supervised contrastive learning can simultaneously improve OOD detection performance and the accuracy of the network. By incorporating all three factors, our approach enhances both accuracy and OOD detection performance by addressing the trade-off between classification and OOD detection. Our method achieves improvements over previous approaches in both performance metrics.
    摘要 在 OUT-OF-DISTRIBUTION(OOD)检测问题中,启用辅助数据作为精度数据进行微调已经表现出了鼓舞人的效果。然而,先前的方法受到了准确率(ACC)和OOD检测性能(AUROC、FPR、AUPR)的负面交互。为了改善这种交互,我们提出了三种贡献:(i)添加自知ledge distillation损失可以提高网络的准确率;(ii)在训练中采用半硬分配的异常数据采样可以提高OOD检测性能,而无需影响准确率;(iii)我们提出的新的指导contrastive learning可以同时提高OOD检测性能和网络的准确率。通过涵盖所有这些因素,我们的方法可以同时提高准确率和OOD检测性能,解决了准确率和OOD检测性能之间的负面交互。我们的方法在先前的方法中均表现出了改善。

Enhancing Representation Learning for Periodic Time Series with Floss: A Frequency Domain Regularization Approach

  • paper_url: http://arxiv.org/abs/2308.01011
  • repo_url: https://github.com/agustdd/floss
  • paper_authors: Chunwei Yang, Xiaoxu Chen, Lijun Sun, Hongyu Yang, Yuankai Wu
  • for: 这篇论文的目的是提出一种无监督的方法,可以自动调整学习表现中的频率域表现,以提高深度学习模型在时间序列分析领域的表现。
  • methods: 这篇论文提出了一种名为Floss的方法,它可以自动检测时间序列中的主要频率,并使用频率域的periodic shift和spectral density similarity度量来学习有意义的表现。
  • results: 在实验中,Floss方法能够将时间序列分类、预测和异常探测等任务中的表现提高,并且可以与各种深度学习模型整合使用。
    Abstract Time series analysis is a fundamental task in various application domains, and deep learning approaches have demonstrated remarkable performance in this area. However, many real-world time series data exhibit significant periodic or quasi-periodic dynamics that are often not adequately captured by existing deep learning-based solutions. This results in an incomplete representation of the underlying dynamic behaviors of interest. To address this gap, we propose an unsupervised method called Floss that automatically regularizes learned representations in the frequency domain. The Floss method first automatically detects major periodicities from the time series. It then employs periodic shift and spectral density similarity measures to learn meaningful representations with periodic consistency. In addition, Floss can be easily incorporated into both supervised, semi-supervised, and unsupervised learning frameworks. We conduct extensive experiments on common time series classification, forecasting, and anomaly detection tasks to demonstrate the effectiveness of Floss. We incorporate Floss into several representative deep learning solutions to justify our design choices and demonstrate that it is capable of automatically discovering periodic dynamics and improving state-of-the-art deep learning models.
    摘要 时序分析是多个应用领域的基本任务,深度学习方法在这个领域表现出了惊人的表现。然而,许多实际世界时序数据表现出了显著的周期或几乎周期的动态行为,这些动态行为常常不受现有的深度学习基于解决方案完全捕捉。这导致了时序分析中的下降表示,从而影响了对真实的动态行为的理解。为解决这个问题,我们提出了一种不监督的方法called Floss,该方法可以自动规范学习的表示空间中的频率域。Floss方法首先自动检测时序数据中的主要周期性。然后,它使用周期偏移和频率密度相似度度量来学习具有周期一致性的有意义表示。此外,Floss可以轻松地在超级vised、半监督和无监督学习框架中 incorporated。我们对常见的时序分类、预测和异常检测任务进行了广泛的实验,以证明Floss的有效性。我们将Floss incorporated into several representative deep learning solutions to justify our design choices and demonstrate that it is capable of automatically discovering periodic dynamics and improving state-of-the-art deep learning models.

FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving

  • paper_url: http://arxiv.org/abs/2308.01006
  • repo_url: https://github.com/westlake-autolab/fusionad
  • paper_authors: Tengju Ye, Wei Jing, Chunyong Hu, Shikun Huang, Lingping Gao, Fangzhen Li, Jingke Wang, Ke Guo, Wencong Xiao, Weibo Mao, Hang Zheng, Kun Li, Junbo Chen, Kaicheng Yu
  • for: 这 paper 的目的是提出一个整合多种感知器的混合 neural network,以实现自适应驾驶任务中的准确和可靠性。
  • methods: 这 paper 使用了 transformer 基于的多模态混合网络,以生成高质量的混合特征。而不同于 camera-based end-to-end方法 UniAD,这 paper 采用了混合帮助的模态意识预测和状态意识规划模块,以便利用多模态特征。
  • results: 根据 nuScenes 数据集的广泛实验结果,这 paper 的 FusionAD 实现了state-of-the-art 性能,胜过基eline 的平均15% 在感知任务中,10% 在占用率预测精度中,从 ADE 分数下降至 0.389,并将预测错误率从 0.708 降至 0.12%。
    Abstract Building a multi-modality multi-task neural network toward accurate and robust performance is a de-facto standard in perception task of autonomous driving. However, leveraging such data from multiple sensors to jointly optimize the prediction and planning tasks remains largely unexplored. In this paper, we present FusionAD, to the best of our knowledge, the first unified framework that fuse the information from two most critical sensors, camera and LiDAR, goes beyond perception task. Concretely, we first build a transformer based multi-modality fusion network to effectively produce fusion based features. In constrast to camera-based end-to-end method UniAD, we then establish a fusion aided modality-aware prediction and status-aware planning modules, dubbed FMSPnP that take advantages of multi-modality features. We conduct extensive experiments on commonly used benchmark nuScenes dataset, our FusionAD achieves state-of-the-art performance and surpassing baselines on average 15% on perception tasks like detection and tracking, 10% on occupancy prediction accuracy, reducing prediction error from 0.708 to 0.389 in ADE score and reduces the collision rate from 0.31% to only 0.12%.
    摘要 建立多模态多任务神经网络,以实现自驾护航任务的准确和可靠性,是现代自驾护航技术的标准。然而,利用多感器数据进行共同优化预测和规划任务,仍然是未explored领域。本文提出了FusionAD,我们知道的首个整合框架,将Camera和LiDAR两种最重要的感器信息融合到一起,不仅超越了感知任务,还实现了预测和规划任务的协同优化。具体来说,我们首先构建了基于变换器的多模态融合网络,以生成高质量的融合特征。与UniAD相比,我们then建立了融合帮助模块和状态意识规划模块,通过多模态特征来优化预测和规划任务。我们在常用的nuScenes数据集上进行了广泛的实验,并证明了FusionAD可以在感知任务中表现出状元的性能,比如检测和跟踪任务的准确率提高了15%,占用率预测精度提高了10%,从0.708下降到0.389的ADE分数中的预测错误率下降了15%,并将collision rate从0.31%下降到0.12%。

Wasserstein Diversity-Enriched Regularizer for Hierarchical Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.00989
  • repo_url: None
  • paper_authors: Haorui Li, Jiaqi Liang, Linjing Li, Daniel Zeng
  • for: 用于解决复杂任务的 hierarchical reinforcement learning composites subpolicies。
  • methods: 使用自动发现的 subpolicies,并使用 Wasserstein Diversity-Enriched Regularizer(WDER)来提高性能。
  • results: 实验结果表明,WDER 可以提高性能和样本效率,无需修改超参数。
    Abstract Hierarchical reinforcement learning composites subpolicies in different hierarchies to accomplish complex tasks.Automated subpolicies discovery, which does not depend on domain knowledge, is a promising approach to generating subpolicies.However, the degradation problem is a challenge that existing methods can hardly deal with due to the lack of consideration of diversity or the employment of weak regularizers. In this paper, we propose a novel task-agnostic regularizer called the Wasserstein Diversity-Enriched Regularizer (WDER), which enlarges the diversity of subpolicies by maximizing the Wasserstein distances among action distributions. The proposed WDER can be easily incorporated into the loss function of existing methods to boost their performance further.Experimental results demonstrate that our WDER improves performance and sample efficiency in comparison with prior work without modifying hyperparameters, which indicates the applicability and robustness of the WDER.
    摘要 Hierarchical reinforcement learning 组合不同层次的子政策来完成复杂任务。自动发现子政策的方法,不依赖域知识,是现有方法中最佳的approach。然而,劣化问题是现有方法难以处理的一大挑战,这是因为现有方法缺乏多样性或者使用弱regularizers。在这篇论文中,我们提出了一种新的任务无关的正则化器 called Wasserstein Diversity-Enriched Regularizer (WDER),它通过 maximizing Wasserstein distances among action distributions来扩大子政策的多样性。我们的WDER可以轻松地integrated into existing methods的损失函数中,以提高其性能。实验结果表明,我们的WDER可以提高性能和样本效率,而不需要修改 гиперпараметров,这表明了WDER的可用性和稳定性。

Knowledge-aware Collaborative Filtering with Pre-trained Language Model for Personalized Review-based Rating Prediction

  • paper_url: http://arxiv.org/abs/2308.02555
  • repo_url: https://github.com/wqxleo/kcf-plm
  • paper_authors: Quanxiu Wang, Xinlei Cao, Jianyong Wang, Wei Zhang
  • for: 这篇论文的目的是如何利用现有的评论来预测用户对Item的评分?
  • methods: 该论文提出了一种名为知识感知协同过滤(KCF-PLM)的方法,该方法通过模型用户-Item对的交互,并利用预训练的自然语言模型来更好地表示用户和Item的特征。
  • results: 经过实验表明,KCF-PLM可以更好地预测用户对Item的评分,并且在多个公共数据集上达到了比较高的预测精度。
    Abstract Personalized review-based rating prediction aims at leveraging existing reviews to model user interests and item characteristics for rating prediction. Most of the existing studies mainly encounter two issues. First, the rich knowledge contained in the fine-grained aspects of each review and the knowledge graph is rarely considered to complement the pure text for better modeling user-item interactions. Second, the power of pre-trained language models is not carefully studied for personalized review-based rating prediction. To address these issues, we propose an approach named Knowledge-aware Collaborative Filtering with Pre-trained Language Model (KCF-PLM). For the first issue, to utilize rich knowledge, KCF-PLM develops a transformer network to model the interactions of the extracted aspects w.r.t. a user-item pair. For the second issue, to better represent users and items, KCF-PLM takes all the historical reviews of a user or an item as input to pre-trained language models. Moreover, KCF-PLM integrates the transformer network and the pre-trained language models through representation propagation on the knowledge graph and user-item guided attention of the aspect representations. Thus KCF-PLM combines review text, aspect, knowledge graph, and pre-trained language models together for review-based rating prediction. We conduct comprehensive experiments on several public datasets, demonstrating the effectiveness of KCF-PLM.
    摘要 personalized review-based rating prediction aims to leverage existing reviews to model user interests and item characteristics for rating prediction. most of the existing studies mainly encounter two issues. first, the rich knowledge contained in the fine-grained aspects of each review and the knowledge graph is rarely considered to complement the pure text for better modeling user-item interactions. second, the power of pre-trained language models is not carefully studied for personalized review-based rating prediction. to address these issues, we propose an approach named knowledge-aware collaborative filtering with pre-trained language model (KCF-PLM). for the first issue, to utilize rich knowledge, KCF-PLM develops a transformer network to model the interactions of the extracted aspects w.r.t. a user-item pair. for the second issue, to better represent users and items, KCF-PLM takes all the historical reviews of a user or an item as input to pre-trained language models. moreover, KCF-PLM integrates the transformer network and the pre-trained language models through representation propagation on the knowledge graph and user-item guided attention of the aspect representations. thus KCF-PLM combines review text, aspect, knowledge graph, and pre-trained language models together for review-based rating prediction. we conduct comprehensive experiments on several public datasets, demonstrating the effectiveness of KCF-PLM.

Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks

  • paper_url: http://arxiv.org/abs/2308.00958
  • repo_url: https://github.com/dig-beihang/ini-model-stealing-defense
  • paper_authors: Jun Guo, Aishan Liu, Xingyu Zheng, Siyuan Liang, Yisong Xiao, Yichao Wu, Xianglong Liu
    for:这篇论文的目的是提出一个名为“Isolation and Induction”(InI)的新型训练框架,以提高机器学习模型的防护性。methods:这篇论文使用了一种名为“adversarial training”的方法,将敌对的训练gradient( gradient)与预期的gradient分离,从而减少了误差 Computational overhead。此外,它还使用了一种名为“induction”的方法,将敌对的训练gradient与预期的gradient分离,以生成不具有实际价值的输出,以防止敌对者获得有用的信息。results:实验结果显示,InI可以对机器学习模型进行有效的防护,将敌对者的侦测精度降低至48%。此外,InI还可以实现较高的速度(至25.4倍),比其他现有方法更具有实际价值。
    Abstract Despite the broad application of Machine Learning models as a Service (MLaaS), they are vulnerable to model stealing attacks. These attacks can replicate the model functionality by using the black-box query process without any prior knowledge of the target victim model. Existing stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers. However, these defenses are now suffering problems of high inference computational overheads and unfavorable trade-offs between benign accuracy and stealing robustness, which challenges the feasibility of deployed models in practice. To address the problems, this paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses. Instead of deploying auxiliary defense modules that introduce redundant inference time, InI directly trains a defensive model by isolating the adversary's training gradient from the expected gradient, which can effectively reduce the inference computational cost. In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries, which can induce the adversary to extract little useful knowledge from victim models with minimal impact on the benign performance. Extensive experiments on several visual classification datasets (e.g., MNIST and CIFAR10) demonstrate the superior robustness (up to 48% reduction on stealing accuracy) and speed (up to 25.4x faster) of our InI over other state-of-the-art methods. Our codes can be found in https://github.com/DIG-Beihang/InI-Model-Stealing-Defense.
    摘要 尽管机器学习模型作为服务(MLaaS)广泛应用,但它们受到模型盗用攻击的威胁。这些攻击可以通过黑盒查询过程来复制模型功能,无需任何受 target 受害者模型的知识。现有的防御措施添加了对受害者的 posterior 概率中的误导性扰响,但这些防御措施现在面临高于ferred Computational overhead和不利的负荷和盗用鲁棒性的问题,这担困了部署模型的实际应用。为解决这些问题,本文提出了隔离和推论(InI),一种新的和有效的训练框架 для模型盗用防御。而不是在auxiliary defense module中添加 redundancy 的推理时间,InI直接在针对敌方的训练梯度上进行隔离,可以有效减少推理 Computational overhead。与之前添加扰响到模型预测结果的方法不同,我们在模型生成不具有指导意义的输出时进行训练,以便使敌方提取到Model 中的有用信息非常少,对正常性产生最小的影响。经验表明,我们的 InI 在多个视觉分类 datasets(例如 MNIST 和 CIFAR10)上具有较高的鲁棒性(减少盗用精度48%)和速度(减少推理时间25.4倍)优势,代码可以在 找到。

From Sparse to Soft Mixtures of Experts

  • paper_url: http://arxiv.org/abs/2308.00951
  • repo_url: https://github.com/google-research/vmoe
  • paper_authors: Joan Puigcerver, Carlos Riquelme, Basil Mustafa, Neil Houlsby
  • for: 这篇论文旨在提出一种基于 sparse mixture of expert(MoE)的完全可导的Transformer模型,以解决MoE模型在训练稳定性、选择token、不能扩展专家数量等方面存在的挑战。
  • methods: 这篇论文提出了一种名为Soft MoE的新方法,它使用了不同权重的输入token将所有输入token映射到每个专家中,以实现隐藏层的软分配。这种方法可以保持MoE模型的好处,同时解决许多问题。
  • results: 在视觉识别任务上,Soft MoE模型在与标准Transformer(ViT)和受欢迎的MoE变体(Tokens Choice和Experts Choice)进行比较时,具有更高的性能。例如,Soft MoE-Base/16需要10.5倍少的执行成本(5.7倍少的墙 clock时间)与ViT-Huge/14匹配其性能,而且可以轻松扩展。
    Abstract Sparse mixture of expert architectures (MoEs) scale model capacity without large increases in training or inference costs. Despite their success, MoEs suffer from a number of issues: training instability, token dropping, inability to scale the number of experts, or ineffective finetuning. In this work, we proposeSoft MoE, a fully-differentiable sparse Transformer that addresses these challenges, while maintaining the benefits of MoEs. Soft MoE performs an implicit soft assignment by passing different weighted combinations of all input tokens to each expert. As in other MoE works, experts in Soft MoE only process a subset of the (combined) tokens, enabling larger model capacity at lower inference cost. In the context of visual recognition, Soft MoE greatly outperforms standard Transformers (ViTs) and popular MoE variants (Tokens Choice and Experts Choice). For example, Soft MoE-Base/16 requires 10.5x lower inference cost (5.7x lower wall-clock time) than ViT-Huge/14 while matching its performance after similar training. Soft MoE also scales well: Soft MoE Huge/14 with 128 experts in 16 MoE layers has over 40x more parameters than ViT Huge/14, while inference time cost grows by only 2%, and it performs substantially better.
    摘要 《稀疏混合专家架构(MoE)缩放模型容量无需大幅提高训练或推理成本。尽管它们取得了成功,但MoE受到许多问题的困扰:训练不稳定、掉 Token、不能扩展专家数量以及不良的微调。在这项工作中,我们提出了Soft MoE,一种完全可导的稀疏转换器,解决了这些挑战,同时保持了MoE的优点。Soft MoE通过不同权重的 combinations来进行隐式软分配,将所有输入 tokens 传递给每个专家。与其他 MoE 工作相同,Soft MoE 专家只处理部分(组合)输入 tokens,允许更大的模型容量在更低的推理成本下运行。在视识知识领域,Soft MoE 在标准 Transformer 和受欢迎的 MoE 变体(Token Choice 和 Experts Choice)的基础上取得了很大的进步。例如,Soft MoE-Base/16 需要10.5倍lower的推理成本(5.7倍lower的墙 clock time),与 ViT-Huge/14 的性能相同。Soft MoE 也可扩展:Soft MoE Huge/14 WITH 128 专家 IN 16 MoE layers 有40倍更多的参数 чем ViT Huge/14,而推理时间成本只增加了2%,并且表现较好。》

Teaching Smaller Language Models To Generalise To Unseen Compositional Questions

  • paper_url: http://arxiv.org/abs/2308.00946
  • repo_url: https://github.com/timhartill/unseen_questions
  • paper_authors: Tim Hartill, Neset Tan, Michael Witbrock, Patricia J. Riddle
  • for: 本研究旨在将小型语言模型扩展到解答困难的作 compositional questions,不需要训练时见到的问题。
  • methods: 本研究使用多任务超级预训练和紧密搜寻系统,以将多元的推理能力传递给模型。
  • results: 本研究在多个评估数据集(StrategyQA、CommonsenseQA、IIRC、DROP、Musique和ARC-DA)上建立强大的基准值,并证明在对于问题的解答中,可以通过增加对于问题的数据库进行搜寻,以提高模型的性能。
    Abstract We equip a smaller Language Model to generalise to answering challenging compositional questions that have not been seen in training. To do so we propose a combination of multitask supervised pretraining on up to 93 tasks designed to instill diverse reasoning abilities, and a dense retrieval system that aims to retrieve a set of evidential paragraph fragments. Recent progress in question-answering has been achieved either through prompting methods against very large pretrained Language Models in zero or few-shot fashion, or by fine-tuning smaller models, sometimes in conjunction with information retrieval. We focus on the less explored question of the extent to which zero-shot generalisation can be enabled in smaller models with retrieval against a corpus within which sufficient information to answer a particular question may not exist. We establish strong baselines in this setting for diverse evaluation datasets (StrategyQA, CommonsenseQA, IIRC, DROP, Musique and ARC-DA), and show that performance can be significantly improved by adding retrieval-augmented training datasets which are designed to expose our models to a variety of heuristic reasoning strategies such as weighing partial evidence or ignoring an irrelevant context.
    摘要 我们将一个较小的语言模型训练来综合回答具有复杂构成的问题,这些问题在训练过程中没有出现过。我们提出了一种结合多任务超级预训和紧凑搜寻系统的方法,以实现对答案问题的扩展。现今问题回答的进步主要是通过对非常大的预训语言模型进行提示方法,或是精确地训练较小的模型。我们专注在较少探索的问题是,可以使用较小的模型和搜寻系统,对于尚未出现在训练数据中的问题进行零shot扩展。我们在不同的评估数据集(StrategyQA、CommonSenseQA、IIRC、DROP、Musique和ARC-DA)中建立了强大的基准,并证明可以通过增加搜寻增强训练数据集,让我们的模型掌握了复杂的推理策略,例如考虑部分证据或忽略无关的背景。

Feature-aware conditional GAN for category text generation

  • paper_url: http://arxiv.org/abs/2308.00939
  • repo_url: None
  • paper_authors: Xinze Li, Kezhi Mao, Fanfan Lin, Zijian Feng
  • for: 本文提出了一种新的文本生成框架,即特征意识 conditional GAN(FA-GAN),用于控制类别文本生成。
  • methods: FA-GAN使用了一种序列到序列结构的生成器,包括三个Encoder和一个基于 Relational Memory Core 的 decoder,并在 adversarial 训练中添加了多类分类损失函数。
  • results: 对于6种文本分类 datasets,FA-GAN consistent outperform 10 状态之前的文本生成方法,并在实际案例中证明了生成的 sintetic 句子可以匹配所需的类别,同时具有良好的可读性、流畅性和文本Authenticity。
    Abstract Category text generation receives considerable attentions since it is beneficial for various natural language processing tasks. Recently, the generative adversarial network (GAN) has attained promising performance in text generation, attributed to its adversarial training process. However, there are several issues in text GANs, including discreteness, training instability, mode collapse, lack of diversity and controllability etc. To address these issues, this paper proposes a novel GAN framework, the feature-aware conditional GAN (FA-GAN), for controllable category text generation. In FA-GAN, the generator has a sequence-to-sequence structure for improving sentence diversity, which consists of three encoders including a special feature-aware encoder and a category-aware encoder, and one relational-memory-core-based decoder with the Gumbel SoftMax activation function. The discriminator has an additional category classification head. To generate sentences with specified categories, the multi-class classification loss is supplemented in the adversarial training. Comprehensive experiments have been conducted, and the results show that FA-GAN consistently outperforms 10 state-of-the-art text generation approaches on 6 text classification datasets. The case study demonstrates that the synthetic sentences generated by FA-GAN can match the required categories and are aware of the features of conditioned sentences, with good readability, fluency, and text authenticity.
    摘要 文本生成领域在latest yearsreceived considerable attention, as it is beneficial for various自然语言处理任务。Recently, the generative adversarial network (GAN) has shown promising performance in text generation, thanks to its adversarial training process. However, there are several issues in text GANs, including discrete, training instability, mode collapse, lack of diversity and controllability, etc. To address these issues, this paper proposes a novel GAN framework, the feature-aware conditional GAN (FA-GAN), for controllable category text generation. In FA-GAN, the generator has a sequence-to-sequence structure to improve sentence diversity, which consists of three encoders, including a special feature-aware encoder and a category-aware encoder, and one relational-memory-core-based decoder with the Gumbel SoftMax activation function. The discriminator has an additional category classification head. To generate sentences with specified categories, the multi-class classification loss is supplemented in the adversarial training. Comprehensive experiments have been conducted, and the results show that FA-GAN consistently outperforms 10 state-of-the-art text generation approaches on 6 text classification datasets. The case study demonstrates that the synthetic sentences generated by FA-GAN can match the required categories and are aware of the features of conditioned sentences, with good readability, fluency, and text authenticity.

LEMMA: Learning Language-Conditioned Multi-Robot Manipulation

  • paper_url: http://arxiv.org/abs/2308.00937
  • repo_url: None
  • paper_authors: Ran Gong, Xiaofeng Gao, Qiaozi Gao, Suhaila Shakiah, Govind Thattai, Gaurav S. Sukhatme
  • for: 本研究是为了开发未来基于人语言指令的多机器人系统。
  • methods: 本研究使用了模块化层次规划方法作为基础。
  • results: 研究结果表明了LEMMA的潜在用于发展未来多机器人系统。Here’s a more detailed explanation of each point:1. for: The paper is written to develop future language-conditioned multi-robot systems. The authors introduce a new benchmark called LEMMA, which focuses on task allocation and long-horizon object manipulation based on human language instructions in a tabletop setting.2. methods: The authors propose a modular hierarchical planning approach as a baseline for addressing the challenges of task allocation and strong temporal dependencies in each task. This approach is designed to identify each manipulator’s limitations and assign sub-tasks accordingly.3. results: The results of the study highlight the potential of LEMMA for developing future language-conditioned multi-robot systems. The authors provide 800 expert demonstrations and human instructions for training and evaluations, and the results show that the proposed approach is effective in handling complex manipulation tasks.
    Abstract Complex manipulation tasks often require robots with complementary capabilities to collaborate. We introduce a benchmark for LanguagE-Conditioned Multi-robot MAnipulation (LEMMA) focused on task allocation and long-horizon object manipulation based on human language instructions in a tabletop setting. LEMMA features 8 types of procedurally generated tasks with varying degree of complexity, some of which require the robots to use tools and pass tools to each other. For each task, we provide 800 expert demonstrations and human instructions for training and evaluations. LEMMA poses greater challenges compared to existing benchmarks, as it requires the system to identify each manipulator's limitations and assign sub-tasks accordingly while also handling strong temporal dependencies in each task. To address these challenges, we propose a modular hierarchical planning approach as a baseline. Our results highlight the potential of LEMMA for developing future language-conditioned multi-robot systems.
    摘要 多元化任务需要机器人团队协作,我们介绍了一个名为LanguagE-Conditioned Multi-robot MAnipulation(LEMMA)的标准,专注于基于人类语言指令的表格式设置中的任务分配和长期物品搬运。LEMMA包含8种生成过程中的任务,其中一些需要机器人使用工具并将工具传递给彼此。每个任务都有800名专家示范和人类指导用于训练和评估。LEMMA比现有的标准具有更大的挑战,因为它需要系统确定每个搬运者的局限性并将相应的子任务分配给它们,同时也处理每个任务中的强时间依赖关系。为解决这些挑战,我们提议了一种模块化层次规划方法作为基础。我们的结果表明LEMMA有助于未来的语言条件多机器人系统的发展。

Particle swarm optimization with state-based adaptive velocity limit strategy

  • paper_url: http://arxiv.org/abs/2308.00936
  • repo_url: None
  • paper_authors: Xinze Li, Kezhi Mao, Fanfan Lin, Xin Zhang
  • For: The paper proposes a novel particle swarm optimization (PSO) variant with a state-based adaptive velocity limit (SAVL) strategy to improve the performance of PSO in optimizing problems.* Methods: The proposed PSO-SAVL uses an evolutionary state estimation (ESE) to adaptively adjust the velocity limit based on the current searching state of particles. The limit handling strategies have been modified and adopted to improve the capability of avoiding local optima.* Results: The good performance of PSO-SAVL has been experimentally validated on a wide range of benchmark functions with 50 dimensions, and the satisfactory scalability of PSO-SAVL in high-dimension and large-scale problems has been verified. The merits of the strategies in PSO-SAVL have been experimentally demonstrated.Here is the same information in Simplified Chinese text:* For: 本文提出了一种基于状态adaptive速度限制策略(SAVL)的 particle swarm optimization(PSO)变种,以提高PSO在优化问题中的表现。* Methods: PSO-SAVL使用了进化状态估计(ESE)来适应性地调整速度限制,以适应当前粒子的搜索状态。限制处理策略也被修改和采用,以提高避免地点最优化的能力。* Results: PSO-SAVL在50维度的标准函数库上进行了广泛的实验 validate,并在高维度和大规模问题中得到了满意的扩展性。PSO-SAVL的策略优势也在实验中得到了证明。
    Abstract Velocity limit (VL) has been widely adopted in many variants of particle swarm optimization (PSO) to prevent particles from searching outside the solution space. Several adaptive VL strategies have been introduced with which the performance of PSO can be improved. However, the existing adaptive VL strategies simply adjust their VL based on iterations, leading to unsatisfactory optimization results because of the incompatibility between VL and the current searching state of particles. To deal with this problem, a novel PSO variant with state-based adaptive velocity limit strategy (PSO-SAVL) is proposed. In the proposed PSO-SAVL, VL is adaptively adjusted based on the evolutionary state estimation (ESE) in which a high value of VL is set for global searching state and a low value of VL is set for local searching state. Besides that, limit handling strategies have been modified and adopted to improve the capability of avoiding local optima. The good performance of PSO-SAVL has been experimentally validated on a wide range of benchmark functions with 50 dimensions. The satisfactory scalability of PSO-SAVL in high-dimension and large-scale problems is also verified. Besides, the merits of the strategies in PSO-SAVL are verified in experiments. Sensitivity analysis for the relevant hyper-parameters in state-based adaptive VL strategy is conducted, and insights in how to select these hyper-parameters are also discussed.
    摘要 Particle swarm optimization (PSO) 的 velocity limit (VL) 已经广泛应用在许多变种中,以避免粒子寻找外部解空间。但现有的适应VL策略仅根据迭代数量进行调整,导致优化结果不 satisfactory,因为VL 与粒子当前搜索状态不兼容。为解决这个问题,一种基于Evolutionary State Estimation (ESE)的PSO变体(PSO-SAVL)被提出。在PSO-SAVL中,VL 的适应调整基于ESE,其中在全球搜索状态时设置高值的VL,在本地搜索状态时设置低值的VL。此外,限制处理策略也得到了修改和采用,以提高避免本地极点的能力。PSO-SAVL的性能在50维度的 benchmark 函数上得到了实验验证,并且在高维度和大规模问题中也验证了可扩展性。此外,PSO-SAVL中的策略优势也得到了实验验证。在ESE基于的适应VL策略中,对相关的 гипер参数的敏感分析也进行了,并对如何选择这些 гипер参数进行了讨论。

Physics-informed neural networks for blood flow inverse problems

  • paper_url: http://arxiv.org/abs/2308.00927
  • repo_url: https://github.com/yeyemedicen/pinns-wk-mri
  • paper_authors: Jeremias Garay, Jocelyn Dunstan, Sergio Uribe, Francisco Sahli Costabal
  • for: 解决各种不完整的系统信息和血流测量不易获得的逆问题,特别是血液动力学中精度很高的血流场测量困难。
  • methods: 使用物理学习神经网络(PINNs)方法,通过受限的血流场测量来估算系统参数和全 Velocity 场。
  • results: 使用 simulate 数据显示了稳定和准确的参数估算,而 Velocity 重建结果受测量质量和流动模式复杂度的影响。
    Abstract Physics-informed neural networks (PINNs) have emerged as a powerful tool for solving inverse problems, especially in cases where no complete information about the system is known and scatter measurements are available. This is especially useful in hemodynamics since the boundary information is often difficult to model, and high-quality blood flow measurements are generally hard to obtain. In this work, we use the PINNs methodology for estimating reduced-order model parameters and the full velocity field from scatter 2D noisy measurements in the ascending aorta. The results show stable and accurate parameter estimations when using the method with simulated data, while the velocity reconstruction shows dependence on the measurement quality and the flow pattern complexity. The method allows for solving clinical-relevant inverse problems in hemodynamics and complex coupled physical systems.
    摘要 物理学 Informed neural networks (PINNs) 已经成为解决反向问题的有力工具,特别是当系统中的信息不完整或者杂谱测量数据存在时。这对血液动力学 particuilarly useful,因为边界信息往往难以模型,高质量血液流量测量很难实现。在这种情况下,我们使用 PINNs 方法来估算减少的模型参数和全部流速场从杂谱2D 雷达测量数据中。结果显示使用这种方法时,稳定且准确地估算参数,而流速重建受测量质量和流动模式复杂度的影响。这种方法可以解决临床有用的反向问题和复杂相互作用的物理系统。

VLUCI: Variational Learning of Unobserved Confounders for Counterfactual Inference

  • paper_url: http://arxiv.org/abs/2308.00904
  • repo_url: None
  • paper_authors: Yonghe Zhao, Qiang Huang, Siwei Wu, Yun Peng, Huiyan Sun
  • For: The paper focuses on the challenge of de-confounding and counterfactual prediction in observational data, particularly in the presence of unobserved confounders. It proposes a novel variational learning model of unobserved confounders for counterfactual inference (VLUCI).* Methods: VLUCI relaxes the unconfoundedness assumption often made in causal inference methods and disentangles observed and unobserved confounders. It uses a doubly variational inference model to approximate the distribution of unobserved confounders, which are used for inferring more accurate counterfactual outcomes.* Results: Extensive experiments on synthetic and semi-synthetic datasets demonstrate VLUCI’s superior performance in inferring unobserved confounders compared to state-of-the-art counterfactual inference models. VLUCI also provides confidence intervals for counterfactual outcomes, which can aid decision-making in risk-sensitive domains.Here’s the same information in Simplified Chinese:* For: 该论文关注了在观察数据中的权衡和Counterfactual预测问题,尤其是在存在隐藏的干扰因素的情况下。它提出了一种基于变量学习的隐藏干扰因素counterfactual推断模型(VLUCI)。* Methods: VLUCI弃置了常见的隐藏干扰因素假设,并将观察和隐藏干扰因素分离开来。它使用了一种双变量推断模型来估算隐藏干扰因素的 posterior distribution,并将其用于更准确地预测Counterfactual结果。* Results: 对于 synthetic和半 synthetic 数据集,VLUCI的实验结果表明,它在推断隐藏干扰因素方面表现出色,与当前的Counterfactual推断模型相比。VLUCI还提供了Counterfactual结果的置信区间,可以帮助在风险敏感领域做出决策。
    Abstract Causal inference plays a vital role in diverse domains like epidemiology, healthcare, and economics. De-confounding and counterfactual prediction in observational data has emerged as a prominent concern in causal inference research. While existing models tackle observed confounders, the presence of unobserved confounders remains a significant challenge, distorting causal inference and impacting counterfactual outcome accuracy. To address this, we propose a novel variational learning model of unobserved confounders for counterfactual inference (VLUCI), which generates the posterior distribution of unobserved confounders. VLUCI relaxes the unconfoundedness assumption often overlooked by most causal inference methods. By disentangling observed and unobserved confounders, VLUCI constructs a doubly variational inference model to approximate the distribution of unobserved confounders, which are used for inferring more accurate counterfactual outcomes. Extensive experiments on synthetic and semi-synthetic datasets demonstrate VLUCI's superior performance in inferring unobserved confounders. It is compatible with state-of-the-art counterfactual inference models, significantly improving inference accuracy at both group and individual levels. Additionally, VLUCI provides confidence intervals for counterfactual outcomes, aiding decision-making in risk-sensitive domains. We further clarify the considerations when applying VLUCI to cases where unobserved confounders don't strictly conform to our model assumptions using the public IHDP dataset as an example, highlighting the practical advantages of VLUCI.
    摘要 causal inference在多个领域中扮演着重要的角色,如 epidemiology、医疗和经济学。在观察数据中,去掉和预测counterfactual outcome是 causal inference研究中的一项显著挑战。而现有的模型只能处理观察到的干扰因素,不能正确地处理未观察到的干扰因素,这会扭曲 causal inference 和降低 counterfactual outcome 的准确性。为解决这个问题,我们提出了一种新的变量学习模型,即 unobserved confounders variational learning model (VLUCI),它可以生成未观察到的干扰因素的 posterior distribution。VLUCI 采用了一种放弃了 unconfoundedness 假设的方法,从而更好地处理未观察到的干扰因素。通过分离观察到的干扰因素和未观察到的干扰因素,VLUCI 构建了一个 doubly variational inference 模型,用于近似未观察到的干扰因素的分布,并用这些分布来预测更加准确的 counterfactual outcomes。我们在 sintetic 和 semi-syntetic 数据集上进行了广泛的实验,并证明了 VLUCI 在推断未观察到的干扰因素方面的优秀表现。此外,VLUCI 与当前的 counterfactual inference 模型相容,可以在组织和个体水平上提高推断准确性。同时,VLUCI 还提供了对 counterfactual outcomes 的信任 интерVAL,帮助在风险敏感领域做出决策。我们还在公共 IHDP 数据集上进行了实践推断,并通过 illustrate 了在实际应用中的考虑因素,展示了 VLUCI 的实用优势。

Enhancing Machine Learning Performance with Continuous In-Session Ground Truth Scores: Pilot Study on Objective Skeletal Muscle Pain Intensity Prediction

  • paper_url: http://arxiv.org/abs/2308.00886
  • repo_url: None
  • paper_authors: Boluwatife E. Faremi, Jonathon Stavres, Nuno Oliveira, Zhaoxian Zhou, Andrew H. Sung
    for:这个研究的目的是为了开发一种可以实时、连续地评估疼痛Intensity的设备,并使用机器学习模型来对疼痛进行分类。methods:这个研究使用了两种设备来获取实时、连续的疼痛Score,并使用机器学习模型来对疼痛进行分类。这些模型包括多层感知机(MLP)和随机森林(RF)。results:研究发现,使用实时、连续的疼痛Score可以提高机器学习模型对疼痛的分类性能,比使用后期录入的疼痛Score更高。具体来说,使用实时、连续的疼痛Score可以提高模型的平均准确率达75.9%和78.3%,而使用后期录入的疼痛Score只能达70.3%和74.6%。这个研究提供了一种新的方法,可以帮助解决疼痛分类中的问题,例如真实性issue、数据不均衡和高方差。
    Abstract Machine learning (ML) models trained on subjective self-report scores struggle to objectively classify pain accurately due to the significant variance between real-time pain experiences and recorded scores afterwards. This study developed two devices for acquisition of real-time, continuous in-session pain scores and gathering of ANS-modulated endodermal activity (EDA).The experiment recruited N = 24 subjects who underwent a post-exercise circulatory occlusion (PECO) with stretch, inducing discomfort. Subject data were stored in a custom pain platform, facilitating extraction of time-domain EDA features and in-session ground truth scores. Moreover, post-experiment visual analog scale (VAS) scores were collected from each subject. Machine learning models, namely Multi-layer Perceptron (MLP) and Random Forest (RF), were trained using corresponding objective EDA features combined with in-session scores and post-session scores, respectively. Over a 10-fold cross-validation, the macro-averaged geometric mean score revealed MLP and RF models trained with objective EDA features and in-session scores achieved superior performance (75.9% and 78.3%) compared to models trained with post-session scores (70.3% and 74.6%) respectively. This pioneering study demonstrates that using continuous in-session ground truth scores significantly enhances ML performance in pain intensity characterization, overcoming ground truth sparsity-related issues, data imbalance, and high variance. This study informs future objective-based ML pain system training.
    摘要 机器学习(ML)模型在主观自报分数上训练时,很难准确地分类疼痛,因为真实时间疼痛经验和记录后分数之间存在很大的变化。这项研究开发了两种设备,用于实时、连续式疼痛分数的获取和胰脏活动(EDA)的捕获。实验采用N = 24名参与者,进行了后遮挡(PECO)和压缩,引起不适。参与者数据被存储在自定义疼痛平台上,以便提取时间域EDA特征和实时真实分数。此外,每名参与者也提供了后实验Visual Analog Scale(VAS)分数。使用对应的物理层拟合(MLP)和随机森林(RF)机器学习模型,并对它们进行10个拆分验证。结果表明,MLP和RF模型通过与实时EDA特征和实时分数进行组合训练,在10个拆分验证中表现出色(75.9%和78.3%),比使用后实验VAS分数进行训练的模型(70.3%和74.6%)高得多。这项先锋性的研究表明,使用连续实时真实分数可以大幅提高ML在疼痛Intensity Characterization中的表现,超越真实分数稀缺、数据不均衡和高变化问题。这项研究对未来基于Objective的ML疼痛系统训练提供了重要信息。

Beneficent Intelligence: A Capability Approach to Modeling Benefit, Assistance, and Associated Moral Failures through AI Systems

  • paper_url: http://arxiv.org/abs/2308.00868
  • repo_url: None
  • paper_authors: Alex John London, Hoda heidari
  • for: 本文使用尼钦和泽妮的能力方法 formalizes a network of ethical concepts and entitlements necessary for AI systems to confer meaningful benefit or assistance to stakeholders.
  • methods: 本文使用尼钦和泽妮的能力方法 to characterize two necessary conditions for morally permissible interactions between AI systems and those impacted by their functioning, and two sufficient conditions for realizing the ideal of meaningful benefit.
  • results: 本文证明了AI系统与利益受者之间的互动满足了两个必要条件,以及两个足够条件,以实现意义fulfillment的理想。同时,文章还描述了several salient failure modes, such as unjustified paternalism, coercion, deception, exploitation, and domination.
    Abstract The prevailing discourse around AI ethics lacks the language and formalism necessary to capture the diverse ethical concerns that emerge when AI systems interact with individuals. Drawing on Sen and Nussbaum's capability approach, we present a framework formalizing a network of ethical concepts and entitlements necessary for AI systems to confer meaningful benefit or assistance to stakeholders. Such systems enhance stakeholders' ability to advance their life plans and well-being while upholding their fundamental rights. We characterize two necessary conditions for morally permissible interactions between AI systems and those impacted by their functioning, and two sufficient conditions for realizing the ideal of meaningful benefit. We then contrast this ideal with several salient failure modes, namely, forms of social interactions that constitute unjustified paternalism, coercion, deception, exploitation and domination. The proliferation of incidents involving AI in high-stakes domains underscores the gravity of these issues and the imperative to take an ethics-led approach to AI systems from their inception.
    摘要 现存的AI伦理报告缺乏能够捕捉AI系统与个人之间多样化伦理问题的语言和形式主义。基于尼钦和恩素的能力方法,我们提出了一个框架,它将定义AI系统与利益所有者之间的伦理概念和权利网络,以确保AI系统对利益所有者 confer meaningful benefit或帮助。这些系统可以提高利益所有者的生活计划和 благополучия,同时尊重其基本权利。我们确定了AI系统与利益所有者之间的两个必要条件,以及实现意义ful benefit的两个 suficient condition。然后,我们对这一理想与一些常见的失败模式进行了对比,包括不公正的父权主义、压力、骗子、掠夺和支配。随着AI在高风险领域的普及,这些问题的严重性和采取伦理领导的AI系统的必要性变得更加明显。

PeRP: Personalized Residual Policies For Congestion Mitigation Through Co-operative Advisory Systems

  • paper_url: http://arxiv.org/abs/2308.00864
  • repo_url: None
  • paper_authors: Aamir Hasan, Neeloy Chakraborty, Haonan Chen, Jung-Hoon Cho, Cathy Wu, Katherine Driggs-Campbell
  • for: 这篇论文旨在提出一个基于 Piecewise Constant (PC) 政策的合作性建议系统,以减少城市路段的拥堵。
  • methods: 本论文使用了一种基于 variational autoencoder 的不监督学习方法来推断 drivers 的内在特征,然后使用这些特征来构成一个Personalized Residual Policy (PeRP),以提供对 drivers 的个性化建议。
  • results: 本论文的结果显示,这个方法可以成功地减少城市路段的拥堵,并且适应不同的 driver 行为,提高了平均速度的效率,相比基于eline 的方法,提高了4%到22%。
    Abstract Intelligent driving systems can be used to mitigate congestion through simple actions, thus improving many socioeconomic factors such as commute time and gas costs. However, these systems assume precise control over autonomous vehicle fleets, and are hence limited in practice as they fail to account for uncertainty in human behavior. Piecewise Constant (PC) Policies address these issues by structurally modeling the likeness of human driving to reduce traffic congestion in dense scenarios to provide action advice to be followed by human drivers. However, PC policies assume that all drivers behave similarly. To this end, we develop a co-operative advisory system based on PC policies with a novel driver trait conditioned Personalized Residual Policy, PeRP. PeRP advises drivers to behave in ways that mitigate traffic congestion. We first infer the driver's intrinsic traits on how they follow instructions in an unsupervised manner with a variational autoencoder. Then, a policy conditioned on the inferred trait adapts the action of the PC policy to provide the driver with a personalized recommendation. Our system is trained in simulation with novel driver modeling of instruction adherence. We show that our approach successfully mitigates congestion while adapting to different driver behaviors, with 4 to 22% improvement in average speed over baselines.
    摘要 智能驾驶系统可以减轻交通压力,提高多种社会经济因素,如通勤时间和油耗成本。然而,这些系统假设自动驾驶车辆队伍的精准控制,因此在实践中有限制,因为它们不能考虑人类行为的不确定性。 Piecewise Constant(PC)政策可以解决这些问题,通过结构化模型人类驾驶行为,以减少笔记压力,提供行为建议,使人类 drivers 遵循。然而,PC 政策假设所有 drivers 都会类似行为。为此,我们开发了一种合作建议系统,基于 PC 政策,并使用一种 novel 的 Driver Trait 受控 Personalized Residual Policy(PeRP)。PeRP 建议 drivers 采取降低交通压力的行为。我们首先通过不监督的方式,使用变量自动编码器,推断 driver 的内在特征,然后,根据推断的特征,condition 政策,以提供个性化建议。我们的系统在模拟环境中训练,并使用新型的 driver 模型来评估 adherence 指标。我们的方法成功地减轻交通压力,适应不同的 driver 行为,相比基准值,提高了4%-22%的平均速度。

Active Inference in String Diagrams: A Categorical Account of Predictive Processing and Free Energy

  • paper_url: http://arxiv.org/abs/2308.00861
  • repo_url: None
  • paper_authors: Sean Tull, Johannes Kleiner, Toby St Clere Smithe
  • for: 提供一种Category的表述方法,用于表示Predictive Processing和Active Inference的认知框架。
  • methods: 使用字符串 диаграмм表示Generative Model、Bayesian更新、感知、规划、活动推理和自由能量。
  • results: 提出一种Diagrammatic derivation of the formula for active inference via free energy minimization,并证明自由能量可以在任何 Agent 的生成模型中应用。
    Abstract We present a categorical formulation of the cognitive frameworks of Predictive Processing and Active Inference, expressed in terms of string diagrams interpreted in a monoidal category with copying and discarding. This includes diagrammatic accounts of generative models, Bayesian updating, perception, planning, active inference, and free energy. In particular we present a diagrammatic derivation of the formula for active inference via free energy minimisation, and establish a compositionality property for free energy, allowing free energy to be applied at all levels of an agent's generative model. Aside from aiming to provide a helpful graphical language for those familiar with active inference, we conversely hope that this article may provide a concise formulation and introduction to the framework.
    摘要 我们提出了一种分类的框架表述,表示预测处理和活跃推理的 cognitive 框架,用 string diagram 在一个单簇类别中表示,这包括生成模型、贝叶斯更新、感知、规划、活跃推理和自由能。特别是,我们提供了一个 diagrammatic 的更新方法,以及证明了 free energy 的合理性和可composability,允许 free energy 在生成模型中的所有层级上适用。除了将 active inference 表示为一个帮助 grafical 语言,我们希望这篇文章能够提供一个简洁的框架表述,并且对于 familiar 的人来说,提供一个帮助他们更好地理解这个框架。

Understanding Activation Patterns in Artificial Neural Networks by Exploring Stochastic Processes

  • paper_url: http://arxiv.org/abs/2308.00858
  • repo_url: None
  • paper_authors: Stephan Johann Lehmler, Muhammad Saif-ur-Rehman, Tobias Glasmachers, Ioannis Iossifidis
  • for: 这个论文的目的是用数学抽象和模型来 deeper understanding of (deep) artificial neural networks 的行为和学习动力。
  • methods: 这篇论文提出了使用 Stochastic Processes 框架,模型(深度)人工神经网络的活动模式。文中使用 neuroscience 技术来检测和分析神经网络中的活动模式。
  • results: 研究人员通过对不同的人工神经网络图像识别任务数据进行分析,发现了不同的网络架构和训练集的活动模式之间的一致性。通过计算 Mean Firing Rate、Mean Fano Factor 和 Variances,研究人员发现了在学习过程中的记忆化现象,提供了有价值的学习行为的理解。
    Abstract To gain a deeper understanding of the behavior and learning dynamics of (deep) artificial neural networks, it is valuable to employ mathematical abstractions and models. These tools provide a simplified perspective on network performance and facilitate systematic investigations through simulations. In this paper, we propose utilizing the framework of stochastic processes, which has been underutilized thus far. Our approach models activation patterns of thresholded nodes in (deep) artificial neural networks as stochastic processes. We focus solely on activation frequency, leveraging neuroscience techniques used for real neuron spike trains. During a classification task, we extract spiking activity and use an arrival process following the Poisson distribution. We examine observed data from various artificial neural networks in image recognition tasks, fitting the proposed model's assumptions. Through this, we derive parameters describing activation patterns in each network. Our analysis covers randomly initialized, generalizing, and memorizing networks, revealing consistent differences across architectures and training sets. Calculating Mean Firing Rate, Mean Fano Factor, and Variances, we find stable indicators of memorization during learning, providing valuable insights into network behavior. The proposed model shows promise in describing activation patterns and could serve as a general framework for future investigations. It has potential applications in theoretical simulations, pruning, and transfer learning.
    摘要 使用数学抽象和模型来深入理解人工神经网络的行为和学习 dinamics可以提供有价值的信息。在这篇论文中,我们提议使用Stochastic Processes框架,这种框架在人工神经网络领域一直未得到过足够的利用。我们的方法是将人工神经网络中的激活节点的激活模式模型为Stochastic Processes,并且仅ocus on激活频率,基于 neuroscience 技术来处理真正的神经元发射 Train。在进行图像识别任务时,我们提取了激活活动,并使用Poisson分布来描述到达过程。我们对各种人工神经网络的图像识别任务数据进行了分析,并适应了我们的模型假设。我们的分析覆盖了随机初始化、通用和记忆网络,并发现这些网络在不同的架构和训练集上存在一致的差异。通过计算Mean Firing Rate、Mean Fano Factor和Variances,我们发现在学习过程中记忆化的指标,这些指标提供了人工神经网络的行为中的有价值信息。我们的模型表现良好,并有可能在理论 simulations、树脂和传输学习等领域得到应用。

Training on Foveated Images Improves Robustness to Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2308.00854
  • repo_url: None
  • paper_authors: Muhammad A. Shah, Bhiksha Raj
  • for: 本研究旨在探讨人类视觉系统的一种重要特征:在周围视场中常见低精度视觉刺激对视觉模型的Robustness有什么影响。
  • methods: 我们开发了一种名为\RBlur的图像变换,用于模拟周围视场中视觉刺激的失真。这种变换基于给定注意点的距离来减少图像的清晰度和颜色温度。
  • results: 与原始图像进行训练的DNNs比起,使用\RBlur变换训练的DNNs在受到攻击和非攻击干扰后的数据上的准确率提高了15%至25%。
    Abstract Deep neural networks (DNNs) have been shown to be vulnerable to adversarial attacks -- subtle, perceptually indistinguishable perturbations of inputs that change the response of the model. In the context of vision, we hypothesize that an important contributor to the robustness of human visual perception is constant exposure to low-fidelity visual stimuli in our peripheral vision. To investigate this hypothesis, we develop \RBlur, an image transform that simulates the loss in fidelity of peripheral vision by blurring the image and reducing its color saturation based on the distance from a given fixation point. We show that compared to DNNs trained on the original images, DNNs trained on images transformed by \RBlur are substantially more robust to adversarial attacks, as well as other, non-adversarial, corruptions, achieving up to 25\% higher accuracy on perturbed data.
    摘要

Designing a Communication Bridge between Communities: Participatory Design for a Question-Answering AI Agent

  • paper_url: http://arxiv.org/abs/2308.00813
  • repo_url: None
  • paper_authors: Jeonghyun Lee, Vrinda Nandan, Harshvardhan Sikka, Spencer Rugaber, Ashok Goal
  • For: The paper aims to design an AI system that acts as a communication bridge between two user communities with different mental models and vocabularies.* Methods: The authors used a variation of participatory design to elicit requirements for developing a question-answering agent that explains how Skillsync works and acts as a communication bridge between company and college users.* Results: The study found that participatory design was useful in guiding the requirements gathering and eliciting user questions for the development of AskJill, and the two Skillsync user communities perceived glossary assistance as a key feature that AskJill needs to offer.Here are the three points in Simplified Chinese text:* For: 这篇论文的目的是设计一个能够 acted as a communication bridge between two个用户群体的 AI 系统,这两个用户群体具有不同的认知模型和术语。* Methods: 作者使用了一种变体的参与式设计方法来激发 Skillsync 的开发需求,以帮助它成为公司和学院用户之间的沟通桥梁。* Results: 研究发现,参与式设计是有用的,可以引导收集需求和提取用户问题,以便为 AskJill 的开发而设计。此外,两个 Skillsync 用户群体认为术语帮助是 AskJill 必备的功能之一,他们将从这种共同词汇中受益。
    Abstract How do we design an AI system that is intended to act as a communication bridge between two user communities with different mental models and vocabularies? Skillsync is an interactive environment that engages employers (companies) and training providers (colleges) in a sustained dialogue to help them achieve the goal of building a training proposal that successfully meets the needs of the employers and employees. We used a variation of participatory design to elicit requirements for developing AskJill, a question-answering agent that explains how Skillsync works and thus acts as a communication bridge between company and college users. Our study finds that participatory design was useful in guiding the requirements gathering and eliciting user questions for the development of AskJill. Our results also suggest that the two Skillsync user communities perceived glossary assistance as a key feature that AskJill needs to offer, and they would benefit from such a shared vocabulary.
    摘要 如何设计一个人工智能系统,让它作为两个用户社区之间的沟通桥梁?我们使用了一种参与设计的变体,与雇主(公司)和训练提供者(学院)进行持续的对话,以帮助他们建立一个成功地满足雇主和员工需求的训练提案。我们的研究发现,参与设计是有用的,可以引导需求收集和发现用户问题,以便为AskJill的开发而设计。我们的结果还显示,两个Skillsync用户社区认为词汇帮助是应有的功能,并且它们从中获益。

AnyLoc: Towards Universal Visual Place Recognition

  • paper_url: http://arxiv.org/abs/2308.00688
  • repo_url: https://github.com/AnyLoc/AnyLoc
  • paper_authors: Nikhil Keetha, Avneesh Mishra, Jay Karhade, Krishna Murthy Jatavallabhula, Sebastian Scherer, Madhava Krishna, Sourav Garg
  • for: 本研究旨在开发一种可靠的视觉地理位置认知(VPR)方法,能够在各种不同环境中(都市、室内、户外、航空、水下和地底环境)进行高精度的位置定位。
  • methods: 本研究使用了自然语言处理(NLP)和计算机视觉(CV)技术,特别是使用了自适应网络和自动Feature learning来学习通用的特征表示。
  • results: 研究结果显示,使用这些通用的特征表示和无监督特征聚合技术,可以实现4倍的性能提升,并且通过semantic特征分析得到6%的性能提升。
    Abstract Visual Place Recognition (VPR) is vital for robot localization. To date, the most performant VPR approaches are environment- and task-specific: while they exhibit strong performance in structured environments (predominantly urban driving), their performance degrades severely in unstructured environments, rendering most approaches brittle to robust real-world deployment. In this work, we develop a universal solution to VPR -- a technique that works across a broad range of structured and unstructured environments (urban, outdoors, indoors, aerial, underwater, and subterranean environments) without any re-training or fine-tuning. We demonstrate that general-purpose feature representations derived from off-the-shelf self-supervised models with no VPR-specific training are the right substrate upon which to build such a universal VPR solution. Combining these derived features with unsupervised feature aggregation enables our suite of methods, AnyLoc, to achieve up to 4X significantly higher performance than existing approaches. We further obtain a 6% improvement in performance by characterizing the semantic properties of these features, uncovering unique domains which encapsulate datasets from similar environments. Our detailed experiments and analysis lay a foundation for building VPR solutions that may be deployed anywhere, anytime, and across anyview. We encourage the readers to explore our project page and interactive demos: https://anyloc.github.io/.
    摘要 “视觉地点识别(VPR)是机器人地理位置定位的关键。至今为止,最高效的VPR方法都是环境和任务特定的:它们在结构化环境(主要是城市驾驶)中显示出强大的表现,但在无结构化环境中表现很差,导致大多数方法在实际世界中不稳定。在这项工作中,我们开发了一种通用的VPR解决方案——一种能够在各种结构化和无结构化环境中工作(城市、户外、室内、航空、水下和地下环境),无需更新或微调。我们示示了通用特征表示的概念,基于市场上的自我超级vised模型,无需VPR特定的训练,可以构建这种通用VPR解决方案。通过对这些特征进行无监督的feature集成,我们的AnyLoc方法可以实现与现有方法相比4倍更高的性能。此外,我们还发现了这些特征的 semantic properties,揭示了这些特征的唯一领域,从而提高了性能6%。我们的详细实验和分析为建立VPR解决方案提供了基础,让读者可以通过我们的项目页面和互动示例了解更多信息:https://anyloc.github.io/。”

A Knowledge-Oriented Approach to Enhance Integration and Communicability in the Polkadot Ecosystem

  • paper_url: http://arxiv.org/abs/2308.00735
  • repo_url: None
  • paper_authors: Marcio Ferreira Moreno, Rafael Rossi de Mello Brandão
  • for: 本研究旨在提供一个概念框架,以探讨和解决Polkadot生态系统中数据分析和通信问题。
  • methods: 本研究使用了域 ontology(POnto),实现了Polkadot生态系统中概念和关系的结构表示,从而提高了生态系统的 интеграbles和通信能力。
  • results: 本研究通过专家反馈和Polkadot社区的意见, validate了提案的概念框架,并提供了一个基于Controlled Natural Language的查询引擎路线图,实现了生态系统的扩展和推广。
    Abstract The Polkadot ecosystem is a disruptive and highly complex multi-chain architecture that poses challenges in terms of data analysis and communicability. Currently, there is a lack of standardized and holistic approaches to retrieve and analyze data across parachains and applications, making it difficult for general users and developers to access ecosystem data consistently. This paper proposes a conceptual framework that includes a domain ontology called POnto (a Polkadot Ontology) to address these challenges. POnto provides a structured representation of the ecosystem's concepts and relationships, enabling a formal understanding of the platform. The proposed knowledge-oriented approach enhances integration and communicability, enabling a wider range of users to participate in the ecosystem and facilitating the development of AI-based applications. The paper presents a case study methodology to validate the proposed framework, which includes expert feedback and insights from the Polkadot community. The POnto ontology and the roadmap for a query engine based on a Controlled Natural Language using the ontology, provide valuable contributions to the growth and adoption of the Polkadot ecosystem in heterogeneous socio-technical environments.
    摘要 派拉达网络生态系统是一种破坏性和高度复杂的多链架构,它在数据分析和通信方面带来了挑战。目前,有很多不同的渠道和应用程序之间的数据检索和分析几乎没有标准化和整体的方法,这使得普通用户和开发者难以一览ecosystem数据。这篇论文提出了一个概念框架,包括一个叫做POnto(派拉达 Ontology)的领域 ontology,以解决这些挑战。POnto提供了一种结构化的表示方式,帮助建立派拉达平台的正规理解。该提议的知识导向方法提高了集成和通信,使得更广泛的用户参与到ecosystem中,并促进了基于人工智能应用的开发。论文采用了一种实证方法,包括专家反馈和派拉达社区的意见,以验证提议的可行性。POnto ontology和基于控制自然语言的查询引擎路线图,为派拉达生态系统在不同的社会技术环境中的发展和推广做出了重要贡献。

Applicability of scaling laws to vision encoding models

  • paper_url: http://arxiv.org/abs/2308.00678
  • repo_url: https://github.com/suyamat/ScalingVisionEncoder
  • paper_authors: Takuya Matsuyama, Kota S Sasaki, Shinji Nishimoto
  • for: 这个论文的目的是如何建立一个高性能的视觉编码模型,以预测观看图像时的脑活动,作为Algonauts Project 2023 Challenge 的一部分。
  • methods: 这个论文使用了多种视觉模型,其参数大小从86M到4.3B不等,以建立预测模型。研究者主要关注两个方面:(1)如何通过训练集大小的变化来改善预测精度?(2)如何通过视觉模型参数大小的变化来改善预测精度?
  • results: 研究结果表明,随着训练集大小的增加,预测精度随着增加的 scaling law 改善。同时,我们发现,随着视觉模型参数大小的增加,预测精度也随着增加的 scaling law 改善。这些结果表明,增加训练集大小和视觉模型参数大小可能会导致更加准确的视觉模型,并且可能会促进视觉科学的发展。
    Abstract In this paper, we investigated how to build a high-performance vision encoding model to predict brain activity as part of our participation in the Algonauts Project 2023 Challenge. The challenge provided brain activity recorded by functional MRI (fMRI) while participants viewed images. Several vision models with parameter sizes ranging from 86M to 4.3B were used to build predictive models. To build highly accurate models, we focused our analysis on two main aspects: (1) How does the sample size of the fMRI training set change the prediction accuracy? (2) How does the prediction accuracy across the visual cortex vary with the parameter size of the vision models? The results show that as the sample size used during training increases, the prediction accuracy improves according to the scaling law. Similarly, we found that as the parameter size of the vision models increases, the prediction accuracy improves according to the scaling law. These results suggest that increasing the sample size of the fMRI training set and the parameter size of visual models may contribute to more accurate visual models of the brain and lead to a better understanding of visual neuroscience.
    摘要 在这篇论文中,我们调查了如何建立高性能视觉编码模型,以便预测大脑活动,这是我们参加了2023年Algonauts项目挑战的一部分。挑战提供了参与者通过功能成像(fMRI)技术记录的大脑活动。我们使用了多种视觉模型,其参数大小从86M到4.3B不同,建立预测模型。为建立高精度模型,我们对两个主要方面进行了分析:(1)如何随训练集大脑活动样本数量的变化,影响预测精度?(2)随视觉模型参数大小的变化,在视觉区域中预测精度如何变化?结果表明,随着训练集大脑活动样本数量的增加,预测精度按照尺度法则提高。同时,我们发现,随着视觉模型参数大小的增加,预测精度按照尺度法则提高。这些结果表明,增加训练集大脑活动样本数量和视觉模型参数大小可能会导致更高精度的视觉模型,并促进视觉 neuroscience的研究。

Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models

  • paper_url: http://arxiv.org/abs/2308.00675
  • repo_url: None
  • paper_authors: Cheng-Yu Hsieh, Si-An Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister
  • for: 本研究旨在替代示例,提供工具文档来帮助大语言模型学习新工具。
  • methods: 本研究使用了工具文档,而不是示例,来训练大语言模型。
  • results: 研究发现,使用工具文档可以达到与少量示例相同的性能,并且在实际工具使用场景中表现更加出色。
    Abstract Today, large language models (LLMs) are taught to use new tools by providing a few demonstrations of the tool's usage. Unfortunately, demonstrations are hard to acquire, and can result in undesirable biased usage if the wrong demonstration is chosen. Even in the rare scenario that demonstrations are readily available, there is no principled selection protocol to determine how many and which ones to provide. As tasks grow more complex, the selection search grows combinatorially and invariably becomes intractable. Our work provides an alternative to demonstrations: tool documentation. We advocate the use of tool documentation, descriptions for the individual tool usage, over demonstrations. We substantiate our claim through three main empirical findings on 6 tasks across both vision and language modalities. First, on existing benchmarks, zero-shot prompts with only tool documentation are sufficient for eliciting proper tool usage, achieving performance on par with few-shot prompts. Second, on a newly collected realistic tool-use dataset with hundreds of available tool APIs, we show that tool documentation is significantly more valuable than demonstrations, with zero-shot documentation significantly outperforming few-shot without documentation. Third, we highlight the benefits of tool documentations by tackling image generation and video tracking using just-released unseen state-of-the-art models as tools. Finally, we highlight the possibility of using tool documentation to automatically enable new applications: by using nothing more than the documentation of GroundingDino, Stable Diffusion, XMem, and SAM, LLMs can re-invent the functionalities of the just-released Grounded-SAM and Track Anything models.
    摘要 Note: The text has been translated into Simplified Chinese, which is the standard writing system used in mainland China.

cs.CL - 2023-08-02

Careful Whisper – leveraging advances in automatic speech recognition for robust and interpretable aphasia subtype classification

  • paper_url: http://arxiv.org/abs/2308.01327
  • repo_url: None
  • paper_authors: Laurin Wagner, Mario Zusag, Theresa Bloder
  • for: 本研究旨在提供一种自动化方法,用于从语音录音中识别语音异常,以帮助评估语音障碍。
  • methods: 该方法结合了 Connectionist Temporal Classification (CTC) 和 encoder-decoder 型自动语音识别模型,通过生成丰富的语音特征和清晰的转录,并应用了一些自然语言处理技术来提取特征,生成健康语音的原型。
  • results: 该方法可以很准确地分类 recording 中的人群,并且可以准确地分类最常见的语音障碍类型。
    Abstract This paper presents a fully automated approach for identifying speech anomalies from voice recordings to aid in the assessment of speech impairments. By combining Connectionist Temporal Classification (CTC) and encoder-decoder-based automatic speech recognition models, we generate rich acoustic and clean transcripts. We then apply several natural language processing methods to extract features from these transcripts to produce prototypes of healthy speech. Basic distance measures from these prototypes serve as input features for standard machine learning classifiers, yielding human-level accuracy for the distinction between recordings of people with aphasia and a healthy control group. Furthermore, the most frequently occurring aphasia types can be distinguished with 90% accuracy. The pipeline is directly applicable to other diseases and languages, showing promise for robustly extracting diagnostic speech biomarkers.
    摘要

Grounded Image Text Matching with Mismatched Relation Reasoning

  • paper_url: http://arxiv.org/abs/2308.01236
  • repo_url: None
  • paper_authors: Yu Wu, Yana Wei, Haozhe Wang, Yongfei Liu, Sibei Yang, Xuming He
  • for: 本 paper 引入了 Grounded Image Text Matching with Mismatched Relation (GITM-MR),一个新的视觉语言联合任务,用于评估基于 transformer 预训练模型的关系理解能力。
  • methods: GITM-MR 任务需要模型首先确定一个表达是否描述了一幅图片,然后Localize 提及的对象或者将文本中的匹配错误部分与图片相匹配。我们提供了一个评估预训练模型的标准准则,专注于有限数据和非标准句子长度的情况。
  • results: 我们的评估结果显示,预训练模型在有限数据和非标准句子长度情况下缺乏数据效率和长度泛化能力。为此,我们提出了 Relation-sensitive Correspondence Reasoning Network (RCRN),该模型通过irectional 信息传递和语言结构引导的bi-directional message propagation来提高关系意识和推理能力,并能够在长度泛化和数据效率两个领域中具有出色的表现。
    Abstract This paper introduces Grounded Image Text Matching with Mismatched Relation (GITM-MR), a novel visual-linguistic joint task that evaluates the relation understanding capabilities of transformer-based pre-trained models. GITM-MR requires a model to first determine if an expression describes an image, then localize referred objects or ground the mismatched parts of the text. We provide a benchmark for evaluating pre-trained models on this task, with a focus on the challenging settings of limited data and out-of-distribution sentence lengths. Our evaluation demonstrates that pre-trained models lack data efficiency and length generalization ability. To address this, we propose the Relation-sensitive Correspondence Reasoning Network (RCRN), which incorporates relation-aware reasoning via bi-directional message propagation guided by language structure. RCRN can be interpreted as a modular program and delivers strong performance in both length generalization and data efficiency.
    摘要 Simplified Chinese translation:这篇论文介绍了一个新任务,即基于图像和文本的链接匹配任务(GITM-MR),该任务评估基于转换器预训练模型的关系理解能力。该任务需要模型先确定文本描述图像,然后在图像中找到引用的对象或将文本中匹配不正确的部分与图像相匹配。作者们提供了一个评估基准,以评估预训练模型在这个任务中的表现,特别是在有限数据和非常长文本长度下。评估结果显示,预训练模型在数据有限和文本长度不正常的情况下缺乏数据效率和长度泛化能力。为解决这个问题,作者们提出了关系感知相关理解网络(RCRN),该网络通过双向消息传递和语言结构引导的关系感知来提高模型的数据效率和长度泛化能力。RCRN可以被视为一个模块化程序,并在长度泛化和数据效率两个方面达到了优秀的表现。

Global Hierarchical Neural Networks using Hierarchical Softmax

  • paper_url: http://arxiv.org/abs/2308.01210
  • repo_url: https://github.com/jschuurmans/hsoftmax
  • paper_authors: Jetze Schuurmans, Flavius Frasincar
  • for: 这篇论文提出了一个基于垂直软max的全球层次分类框架,这种方法适用于任何分类任务中存在自然的阶层结构。
  • methods: 这篇论文使用了垂直软max创建全球层次分类器,并在四个文本分类 datasets 上进行了实验。
  • results: 在四个 datasets 中,垂直软max 都提高了与平均软max 相比的macro-F1和macro- recall,并在三个 datasets 中 дости得了更高的微精度和macro精度。
    Abstract This paper presents a framework in which hierarchical softmax is used to create a global hierarchical classifier. The approach is applicable for any classification task where there is a natural hierarchy among classes. We show empirical results on four text classification datasets. In all datasets the hierarchical softmax improved on the regular softmax used in a flat classifier in terms of macro-F1 and macro-recall. In three out of four datasets hierarchical softmax achieved a higher micro-accuracy and macro-precision.
    摘要

ADS-Cap: A Framework for Accurate and Diverse Stylized Captioning with Unpaired Stylistic Corpora

  • paper_url: http://arxiv.org/abs/2308.01143
  • repo_url: https://github.com/njucckevin/ads-cap
  • paper_authors: Kanzhi Cheng, Zheng Ma, Shi Zong, Jianbing Zhang, Xinyu Dai, Jiajun Chen
  • for: 这 paper 的目的是提出一种 novel framework 来生成具有准确性和多样性的风格化描述(ADS-Cap)。
  • methods: 这 paper 使用了对匹配学习模块来对图像和文本特征进行对接,并使用了 conditional variational auto-encoder 来自动记忆多样化的风格特征在隐藏空间中。它还设计了一个简单 yet effective 的检查模块来提高风格准确性。
  • results: 对两个广泛使用的风格化图像描述数据集进行实验,ADS-Cap 在保持图像一致性、风格准确性和多样性三者之间的折衔上达到了出色的表现。
    Abstract Generating visually grounded image captions with specific linguistic styles using unpaired stylistic corpora is a challenging task, especially since we expect stylized captions with a wide variety of stylistic patterns. In this paper, we propose a novel framework to generate Accurate and Diverse Stylized Captions (ADS-Cap). Our ADS-Cap first uses a contrastive learning module to align the image and text features, which unifies paired factual and unpaired stylistic corpora during the training process. A conditional variational auto-encoder is then used to automatically memorize diverse stylistic patterns in latent space and enhance diversity through sampling. We also design a simple but effective recheck module to boost style accuracy by filtering style-specific captions. Experimental results on two widely used stylized image captioning datasets show that regarding consistency with the image, style accuracy and diversity, ADS-Cap achieves outstanding performances compared to various baselines. We finally conduct extensive analyses to understand the effectiveness of our method. Our code is available at https://github.com/njucckevin/ADS-Cap.
    摘要 生成具有具体语言风格特征的图像描述文本是一项复杂的任务,尤其是需要具有各种风格特征的描述文本。在这篇论文中,我们提出了一种新的框架,即准确多样化风格描述(ADS-Cap)。我们的 ADS-Cap 首先使用对比学习模块将图像和文本特征进行对应,在训练过程中统一 paired фактических和无对应风格 Corpora。然后,我们使用conditional variational autoencoder来自动记忆在latent space中的多样化风格特征,并通过抽样提高多样性。我们还设计了一个简单 yet effective的重新检查模块,以提高风格准确性。我们的实验结果表明, compared to various baselines, ADS-Cap 在图像和风格准确性方面达到了出色的表现。我们 finally conducted extensive analyses to understand the effectiveness of our method.我们的代码可以在 https://github.com/njucckevin/ADS-Cap 上获取。

Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model

  • paper_url: http://arxiv.org/abs/2308.01126
  • repo_url: https://github.com/njucckevin/knowcap
  • paper_authors: Kanzhi Cheng, Wenpo Song, Zheng Ma, Wenhao Zhu, Zixuan Zhu, Jianbing Zhang
  • for: This paper aims to improve the ability of current captioning approaches to generate descriptions that incorporate real-world knowledge, such as named entities and contextual information.
  • methods: The proposed method, called Knowledge-guided Replay (K-Replay), consists of two parts: a knowledge prediction task on automatically collected replay exemplars to continuously awaken the VLP model’s memory about knowledge, and a knowledge distillation constraint to improve the faithfulness of generated descriptions.
  • results: The approach effectively incorporates knowledge into descriptions, outperforming a strong VLP baseline by 20.9 points (78.7->99.6) in CIDEr score and 20.5 percentage points (34.0%->54.5%) in knowledge recognition accuracy.Here is the information in Simplified Chinese text:
  • for: 本研究旨在提高当前captioning方法可以 incorporate 实际世界知识,如名词和上下文信息。
  • methods: 提议的方法是 Knowledge-guided Replay (K-Replay),包括两部分:一个知识预测任务,使得 VLP 模型对知识的记忆不断被触发,以避免模型填充通用模式; 以及一个知识储存约束,以改善生成的描述的准确性,从而缓解知识幻觉。
  • results: 方法能够有效地将知识 incorporated 到描述中,比对 STRONG VLP 基eline 高20.9分 (78.7->99.6) 的 CIDEr 得分,以及20.5% (34.0%->54.5%) 的知识认知精度。
    Abstract Current captioning approaches tend to generate correct but "generic" descriptions that lack real-world knowledge, e.g., named entities and contextual information. Considering that Vision-Language Pre-Training (VLP) models master massive such knowledge from large-scale web-harvested data, it is promising to utilize the generalizability of VLP models to incorporate knowledge into image descriptions. However, using VLP models faces challenges: zero-shot inference suffers from knowledge hallucination that leads to low-quality descriptions, but the generic bias in downstream task fine-tuning hinders the VLP model from expressing knowledge. To address these concerns, we propose a simple yet effective method called Knowledge-guided Replay (K-Replay), which enables the retention of pre-training knowledge during fine-tuning. Our approach consists of two parts: (1) a knowledge prediction task on automatically collected replay exemplars to continuously awaken the VLP model's memory about knowledge, thus preventing the model from collapsing into the generic pattern; (2) a knowledge distillation constraint to improve the faithfulness of generated descriptions hence alleviating the knowledge hallucination. To evaluate knowledge-enhanced descriptions, we construct a novel captioning benchmark KnowCap, containing knowledge of landmarks, famous brands, special foods and movie characters. Experimental results show that our approach effectively incorporates knowledge into descriptions, outperforming strong VLP baseline by 20.9 points (78.7->99.6) in CIDEr score and 20.5 percentage points (34.0%->54.5%) in knowledge recognition accuracy. Our code and data is available at https://github.com/njucckevin/KnowCap.
    摘要 当前的标题生成方法通常会生成正确的 pero "通用" 的描述,缺乏实际世界知识,例如名词和上下文信息。考虑到视力语言预训练(VLP)模型从大规模的网络采集数据中积累了庞大的知识,因此可以利用 VLP 模型的通用性来插入知识到图像描述中。然而,使用 VLP 模型存在挑战:零 shot 推理会导致知识幻觉,从而导致低质量的描述,而下游任务精通化也会阻碍 VLP 模型表达知识。为了解决这些问题,我们提出了一种简单 yet 有效的方法,即知识引导重播(K-Replay),它可以在精通化过程中保持 VLP 模型的预训练知识。我们的方法包括两个部分:1. 使用自动收集的回退 exemplars 进行知识预测任务,以continuously awaken VLP 模型的记忆,防止模型落入通用模式;2. 使用知识继承约束,以提高生成的描述的准确性,从而缓解知识幻觉。为了评估描述中的知识,我们建立了一个新的描述 benchmark 知Cap,包含了地标、名牌产品、特色美食和电影人物的知识。实验结果显示,我们的方法可以有效地插入知识到描述中,高于强 VLP 基线 by 20.9 个 CIDEr 分数(78.7->99.6)和 20.5 个 percentage points(34.0%->54.5%)的知识认可率。我们的代码和数据可以在 上获取。

MultiEM: Efficient and Effective Unsupervised Multi-Table Entity Matching

  • paper_url: http://arxiv.org/abs/2308.01927
  • repo_url: https://github.com/zju-daily/multiem
  • paper_authors: Xiaocan Zeng, Pengfei Wang, Yuren Mao, Lu Chen, Xiaoze Liu, Yunjun Gao
  • for: 这篇论文主要针对的是实际数据管理系统中实现不监督实体匹配(Unsupervised Entity Matching,UEM)问题。
  • methods: 这篇论文提出了一种新的无监督多表实体匹配方法(Multi-table Entity Matching,MultiEM),它是一个并行可分解的管道,包括加强实体表示、表wise层次合并和浸泡筛选。
  • results: 对六个实际数据集进行了广泛的实验,证明了 MultiEM 在效果和效率两个方面具有优势。
    Abstract Entity Matching (EM), which aims to identify all entity pairs referring to the same real-world entity from relational tables, is one of the most important tasks in real-world data management systems. Due to the labeling process of EM being extremely labor-intensive, unsupervised EM is more applicable than supervised EM in practical scenarios. Traditional unsupervised EM assumes that all entities come from two tables; however, it is more common to match entities from multiple tables in practical applications, that is, multi-table entity matching (multi-table EM). Unfortunately, effective and efficient unsupervised multi-table EM remains under-explored. To fill this gap, this paper formally studies the problem of unsupervised multi-table entity matching and proposes an effective and efficient solution, termed as MultiEM. MultiEM is a parallelable pipeline of enhanced entity representation, table-wise hierarchical merging, and density-based pruning. Extensive experimental results on six real-world benchmark datasets demonstrate the superiority of MultiEM in terms of effectiveness and efficiency.
    摘要 实体匹配(EM),旨在从关系表中找到真实世界实体对应的所有实体对,是现实数据管理系统中最重要的任务之一。由于EM标注过程非常劳动密集,因此在实际场景中更适合使用无监督EM。传统的无监督EM假设所有实体来自两个表,但在实际应用中更常见的是从多个表匹配实体,即多表实体匹配(多表EM)。然而,有效和高效的无监督多表EM还未得到了足够的探索。为了填补这个空白,本文正式研究了无监督多表实体匹配问题,并提出了一种高效和高效的解决方案,名为MultiEM。MultiEM是一个并行的管道,包括增强实体表示、表位层次合并和浮点筛选。在六个真实世界 benchmark 数据集上进行了广泛的实验,结果表明MultiEM在效果和效率两个方面具有优势。

Leveraging Few-Shot Data Augmentation and Waterfall Prompting for Response Generation

  • paper_url: http://arxiv.org/abs/2308.01080
  • repo_url: None
  • paper_authors: Lea Krause, Selene Báez Santamaría, Michiel van der Meer, Urja Khurana
  • for: 本研究探讨了基于主观知识的对话模型设计,尤其是响应生成。
  • methods: 我们的方法采用了大量数据分析,以评估提供的数据集中的关键因素,如响应长度、情感和对话动作。我们还使用了少量学习来扩展数据集,并提出了三种方法来解决DSTC11:(1)任务特定模型探索,(2)将最常见问题 incorporate into all generated responses,和(3)水fall提问技术使用组合GPT-3和ChatGPT。
  • results: 我们的实验结果表明,使用水fall提问技术可以提高对话模型的性能,并且可以在不同的任务上实现高效的对话模型设计。
    Abstract This paper discusses our approaches for task-oriented conversational modelling using subjective knowledge, with a particular emphasis on response generation. Our methodology was shaped by an extensive data analysis that evaluated key factors such as response length, sentiment, and dialogue acts present in the provided dataset. We used few-shot learning to augment the data with newly generated subjective knowledge items and present three approaches for DSTC11: (1) task-specific model exploration, (2) incorporation of the most frequent question into all generated responses, and (3) a waterfall prompting technique using a combination of both GPT-3 and ChatGPT.
    摘要
  1. 任务特定模型探索 (task-specific model exploration)2. 包含最常见问题的所有生成响应 (incorporation of the most frequent question into all generated responses)3. 组合GPT-3和ChatGPT的水质提示技术 (a waterfall prompting technique using a combination of both GPT-3 and ChatGPT)Note:* “DSTC11” refers to the Dialogue System Technology Challenge (DSTC) 2011, which is a benchmarking task for conversational AI systems.* “subjective knowledge” refers to the knowledge that is personal and subjective, and may not be easily quantifiable or observable.* “response generation” refers to the task of generating appropriate responses to user inputs in a conversational setting.

Industrial Memories: Exploring the Findings of Government Inquiries with Neural Word Embedding and Machine Learning

  • paper_url: http://arxiv.org/abs/2308.02556
  • repo_url: None
  • paper_authors: Susan Leavy, Emilie Pine, Mark T Keane
  • for: 支持大量文本检索,探索政府调查发现的结论
  • methods: 使用word embedding、文本分类和可视化技术,创建一个交互式网页平台,帮助探索文本,发现新的历史发现
  • results: 通过转换爱尔兰政府的industrial school inquiry发现,创建了一个可交互的网页平台,帮助探索文本,发现新的历史发现
    Abstract We present a text mining system to support the exploration of large volumes of text detailing the findings of government inquiries. Despite their historical significance and potential societal impact, key findings of inquiries are often hidden within lengthy documents and remain inaccessible to the general public. We transform the findings of the Irish government's inquiry into industrial schools and through the use of word embedding, text classification and visualisation, present an interactive web-based platform that enables the exploration of the text to uncover new historical insights.
    摘要 我们提供一个文本挖掘系统,用于探索大量关于政府调查结果的文本。尽管这些调查结果具有历史意义和社会影响,但它们常常被困在长篇文本中,不可供一般公众查阅。我们使用词嵌入、文本分类和视觉化技术,将爱尔兰政府关于工业学校调查的结果转化为一个交互式网页平台,让用户可以通过探索文本来发现新的历史发现。

SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis

  • paper_url: http://arxiv.org/abs/2308.01018
  • repo_url: None
  • paper_authors: Ramanan Sivaguru, Vasista Sai Lodagala, S Umesh
  • for: 提高 FastSpeech2 synthesized speech 质量
  • methods: 使用 Self-Supervised Learning (SSL) 模型的表示来增强 FastSpeech2 synthesized speech 的质量
  • results: 比基eline FastSpeech2 更高的对象和主观评价指标表示提高了 synthesized speech 的质量
    Abstract While FastSpeech2 aims to integrate aspects of speech such as pitch, energy, and duration as conditional inputs, it still leaves scope for richer representations. As a part of this work, we leverage representations from various Self-Supervised Learning (SSL) models to enhance the quality of the synthesized speech. In particular, we pass the FastSpeech2 encoder's length-regulated outputs through a series of encoder layers with the objective of reconstructing the SSL representations. In the SALTTS-parallel implementation, the representations from this second encoder are used for an auxiliary reconstruction loss with the SSL features. The SALTTS-cascade implementation, however, passes these representations through the decoder in addition to having the reconstruction loss. The richness of speech characteristics from the SSL features reflects in the output speech quality, with the objective and subjective evaluation measures of the proposed approach outperforming the baseline FastSpeech2.
    摘要 While FastSpeech2 aims to integrate aspects of speech such as pitch, energy, and duration as conditional inputs, it still leaves scope for richer representations. As part of this work, we leverage representations from various Self-Supervised Learning (SSL) models to enhance the quality of the synthesized speech. Specifically, we pass the FastSpeech2 encoder's length-regulated outputs through a series of encoder layers with the objective of reconstructing the SSL representations. In the SALTTS-parallel implementation, the representations from this second encoder are used for an auxiliary reconstruction loss with the SSL features. The SALTTS-cascade implementation, however, passes these representations through the decoder in addition to having the reconstruction loss. The richness of speech characteristics from the SSL features is reflected in the output speech quality, with the proposed approach outperforming the baseline FastSpeech2 based on both objective and subjective evaluation measures.

DiactTOD: Learning Generalizable Latent Dialogue Acts for Controllable Task-Oriented Dialogue Systems

  • paper_url: http://arxiv.org/abs/2308.00878
  • repo_url: None
  • paper_authors: Qingyang Wu, James Gung, Raphael Shu, Yi Zhang
  • for: 这篇论文主要是为了提高任务对话系统的回答生成质量,通过使用对话动作标注。
  • methods: 这篇论文使用了一种新的终端对话动作模型(DiactTOD),该模型可以在隐藏空间中表示对话动作,并在零Instance情况下使用这些隐藏表示来生成可控的回答。
  • results: 在多个实验设定下,包括零实例、少数实例和全数据精度调整,该方法达到了状态之最高的性能,并且可以在终端和策略优化配置下实现Zero-shot和几shot的情况。
    Abstract Dialogue act annotations are important to improve response generation quality in task-oriented dialogue systems. However, it can be challenging to use dialogue acts to control response generation in a generalizable way because different datasets and tasks may have incompatible annotations. While alternative methods that utilize latent action spaces or reinforcement learning do not require explicit annotations, they may lack interpretability or face difficulties defining task-specific rewards. In this work, we present a novel end-to-end latent dialogue act model (DiactTOD) that represents dialogue acts in a latent space. DiactTOD, when pre-trained on a large corpus, is able to predict and control dialogue acts to generate controllable responses using these latent representations in a zero-shot fashion. Our approach demonstrates state-of-the-art performance across a wide range of experimental settings on the MultiWOZ dataset, including zero-shot, few-shot, and full data fine-tuning with both end-to-end and policy optimization configurations.
    摘要 对话执行动标注是提高任务对话系统响应质量的关键。然而,使用对话执行动来控制响应生成可能会困难,因为不同的数据集和任务可能存在不兼容的标注。而使用隐藏空间或奖励学习方法可能缺乏可读性或定义任务特定的奖励是困难的。在这项工作中,我们提出了一种新的端到端隐藏对话执行动模型(DiactTOD),该模型可以在隐藏空间中表示对话执行动。当 DiactTOD 在大量文本 corpus 上预训练后,可以预测和控制对话执行动,并使用这些隐藏表示生成可控的响应。我们的方法在多种实验设置下达到了状态级表现,包括零shot、少shot 和全数据精度调整,并在端到端和政策优化配置下达到了最佳性能。

Proceedings Modalities in substructural logics: Applications at the interfaces of logic, language and computation

  • paper_url: http://arxiv.org/abs/2308.03679
  • repo_url: None
  • paper_authors: Michael Moortgat, Mehrnoosh Sadrzadeh
  • for: 本文探讨了类别逻辑中的隐式结构规则,并通过模态逻辑控制逻辑资源的分配。
  • methods: 本文使用了模态逻辑来控制逻辑资源的分配,并应用于自然语言 syntax 和 semantics 等领域。
  • results: 本文在逻辑方面提供了新的应用领域,包括自然语言 syntax 和 semantics 等领域的动态逻辑推理。
    Abstract By calling into question the implicit structural rules that are taken for granted in classical logic, substructural logics have brought to the fore new forms of reasoning with applications in many interdisciplinary areas of interest. Modalities, in the substructural setting, provide the tools to control and finetune the logical resource management. The focus of the workshop is on applications in the areas of interest to the ESSLLI community, in particular logical approaches to natural language syntax and semantics and the dynamics of reasoning. The workshop is held with the support of the Horizon 2020 MSCA-Rise project MOSAIC .
    摘要 经启发了古典逻辑中隐式结构规则的假设,Subview逻辑已经把新的理由形式带到了前线。Modalities在Subview设置下提供了控制和精细化逻辑资源的工具。工作室的焦点是在ESSLLI社区兴趣领域的应用,特别是逻辑对自然语言语法和 semantics的逻辑approaches以及推理的动态。工作室得到了欧盟海绵2020年MSCA-Rise项目MOSAIC的支持。

Aspect based sentimental analysis for travellers’ reviews

  • paper_url: http://arxiv.org/abs/2308.02548
  • repo_url: None
  • paper_authors: Mohammed Saad M Alaydaa, Jun Li, Karl Jinkins
  • for: 本研究旨在为机场管理人员提供更加细化的服务质量评估方法,以便更好地了解旅客的需求和意见。
  • methods: 本研究使用了方面基 sentiment分析方法,以分析来自Google Maps的旅客评论,并提供了更加细化的服务质量评估结果。
  • results: 研究结果表明,使用方面基 sentiment分析方法可以提供更加细化的服务质量评估结果,帮助机场管理人员更好地了解旅客的需求和意见,并提供了更加准确的改进建议。
    Abstract Airport service quality evaluation is commonly found on social media, including Google Maps. This valuable for airport management in order to enhance the quality of services provided. However; prior studies either provide general review for topics discussed by travellers or provide sentimental value to tag the entire review without specifically mentioning the airport service that is behind such value. Accordingly, this work proposes using aspect based sentimental analysis in order to provide more detailed analysis for travellers reviews. This works applied aspect based sentimental analysis on data collected from Google Map about Dubai and Doha airports. The results provide tangible reasons to use aspect based sentimental analysis in order to understand more the travellers and spot airport services that are in need for improvement.
    摘要 空港服务质量评估通常出现在社交媒体上,包括Google Maps。这对空港管理有益,可以提高提供的服务质量。然而,先前的研究 either提供旁ieri travelers discussed的通用评论或对整个评论进行情感值标签,而不是特定的空港服务。因此,这项工作提议使用方面基 sentimental analysis,以提供更加细化的旅客评论分析。这个工作在Google Maps上收集的关于 Dubai 和 Doha 机场的数据上进行了方面基 sentimental analysis。结果提供了具体的原因,以便使用方面基 sentimental analysis,以更好地理解旅客和改进机场服务。

GRDD: A Dataset for Greek Dialectal NLP

  • paper_url: http://arxiv.org/abs/2308.00802
  • repo_url: https://github.com/stergioscha/greek_dialect_corpus
  • paper_authors: Stergios Chatzikyriakidis, Chatrine Qwaider, Ilias Kolokousis, Christina Koula, Dimitris Papadakis, Efthymia Sakellariou
  • for: This paper is written for the purpose of creating a large-scale dataset for the computational study of Modern Greek dialects, and to perform dialect identification using machine learning (ML) algorithms and deep learning (DL) architectures.
  • methods: The paper uses a dataset of raw text data from four Modern Greek dialects (Cretan, Pontic, Northern Greek, and Cypriot Greek) to perform dialect identification. The authors experiment with traditional ML algorithms and simple DL architectures to achieve good performance on the task.
  • results: The results of the paper show very good performance on the task of dialect identification, with the top performing algorithms achieving high accuracy. However, error analysis reveals that insufficient dataset cleaning is a major source of errors.
    Abstract In this paper, we present a dataset for the computational study of a number of Modern Greek dialects. It consists of raw text data from four dialects of Modern Greek, Cretan, Pontic, Northern Greek and Cypriot Greek. The dataset is of considerable size, albeit imbalanced, and presents the first attempt to create large scale dialectal resources of this type for Modern Greek dialects. We then use the dataset to perform dialect idefntification. We experiment with traditional ML algorithms, as well as simple DL architectures. The results show very good performance on the task, potentially revealing that the dialects in question have distinct enough characteristics allowing even simple ML models to perform well on the task. Error analysis is performed for the top performing algorithms showing that in a number of cases the errors are due to insufficient dataset cleaning.
    摘要 在这篇论文中,我们发布了现代希腊方言的计算机研究数据集。数据集包含四种现代希腊方言的原始文本数据,即crete, pontic, northern greek和cypriot greek。数据集虽然庞大但受到干扰,但是这是现代希腊方言的大规模方言资源的首次尝试。我们使用该数据集进行方言定义,并使用传统的机器学习算法以及简单的人工智能建筑。结果表明,这些方言之间有明显的特征,使得简单的机器学习模型可以在这种任务上表现出色。我们进行错误分析,发现在一些情况下,错误是由于数据集不够清洁。

Self-Supervised Contrastive BERT Fine-tuning for Fusion-based Reviewed-Item Retrieval

  • paper_url: http://arxiv.org/abs/2308.00762
  • repo_url: https://github.com/d3mlab/rir_data
  • paper_authors: Mohammad Mahdi Abdollah Pour, Parsa Farinneya, Armin Toroghi, Anton Korikov, Ali Pesaranghader, Touqir Sajed, Manasa Bharadwaj, Borislav Mavrin, Scott Sanner
  • for: 该论文主要针对的是 Reviewed-Item Retrieval (RIR) 任务,即使用神经网络 Retrieval (IR) 方法来对用户提出的复杂自然语言查询进行匹配。
  • methods: 该论文提出了一种基于自我超vised learning的方法,通过对 BERT 表示的对比学习来扩展 Neural IR 方法到 RIR 任务。具体来说,该方法使用了同一个物品的正面评论作为积极样本,选择最不相似的评论作为硬性正样本,并使用不同物品的评论作为硬性负样本。此外,该方法还 explore anchor sub-sampling 和 meta-data 的使用。
  • results: 实验结果表明,使用 Late Fusion 方法进行对比学习的 Neural RIR 方法可以与所有其他对比 IR 配置、神经 IR 和稀 scattered retrieval 基线方法进行比较,并且达到最高效果。这表明了在 Neural RIR 方法中利用两级结构以及在 Early Fusion 和 Late Fusion 方法之间的转换可以提高效果。
    Abstract As natural language interfaces enable users to express increasingly complex natural language queries, there is a parallel explosion of user review content that can allow users to better find items such as restaurants, books, or movies that match these expressive queries. While Neural Information Retrieval (IR) methods have provided state-of-the-art results for matching queries to documents, they have not been extended to the task of Reviewed-Item Retrieval (RIR), where query-review scores must be aggregated (or fused) into item-level scores for ranking. In the absence of labeled RIR datasets, we extend Neural IR methodology to RIR by leveraging self-supervised methods for contrastive learning of BERT embeddings for both queries and reviews. Specifically, contrastive learning requires a choice of positive and negative samples, where the unique two-level structure of our item-review data combined with meta-data affords us a rich structure for the selection of these samples. For contrastive learning in a Late Fusion scenario, we investigate the use of positive review samples from the same item and/or with the same rating, selection of hard positive samples by choosing the least similar reviews from the same anchor item, and selection of hard negative samples by choosing the most similar reviews from different items. We also explore anchor sub-sampling and augmenting with meta-data. For a more end-to-end Early Fusion approach, we introduce contrastive item embedding learning to fuse reviews into single item embeddings. Experimental results show that Late Fusion contrastive learning for Neural RIR outperforms all other contrastive IR configurations, Neural IR, and sparse retrieval baselines, thus demonstrating the power of exploiting the two-level structure in Neural RIR approaches as well as the importance of preserving the nuance of individual review content via Late Fusion methods.
    摘要 随着自然语言界面的发展,用户可以提出更加复杂的自然语言查询,这对搜索笔记、书籍和电影等项目的搜索提供了更多的可能性。然而,神经信息检索(Neural IR)方法已经提供了状态态的结果,但它们没有被扩展到评论项目检索(RIR)任务中,其中需要将查询和评论得分聚合(或融合)为项目级别的分数。由于缺乏标注的 RIR 数据集,我们将神经 IR 方法扩展到 RIR 任务中,通过利用自我指导的方法进行对比学习BERT表示。具体来说,对比学习需要选择正例和负例样本,我们利用Item-review数据集的特殊两级结构,以及元数据,选择正例和负例样本。我们还 investigate了使用相同项目和相同分数的积极正例样本,选择最不相似的 anchor item 中的积极负例样本,以及使用不同项目的最相似负例样本。此外,我们还探索了 anchor 子样本抽取和元数据增强。为了更加端到端,我们引入对比项目嵌入学习,将评论 fusion 到单个项目嵌入中。实验结果表明,使用 Late Fusion 对比学习方法,Neural RIR 的性能高于所有其他对比 IR 配置、神经 IR 和稀肥检索基线,这表明了利用 Neural RIR 方法的两级结构以及在 Late Fusion 方法中保持评论内容的细节的重要性。

The Bias Amplification Paradox in Text-to-Image Generation

  • paper_url: http://arxiv.org/abs/2308.00755
  • repo_url: https://github.com/preethiseshadri518/bias-amplification-paradox
  • paper_authors: Preethi Seshadri, Sameer Singh, Yanai Elazar
  • for: 这个论文研究了模型在文本到图像领域中的偏见增强现象,通过比较训练和生成图像中的性别比例来研究。
  • methods: 这个论文使用了稳定扩散来研究模型的偏见增强现象,并发现了模型在训练数据中存在性别职业偏见的增强。
  • results: 研究发现,模型的偏见增强主要归结于训练和生成文本之间的分布差异,例如训练caption中常包含直接表达性别信息的情况,而生成提示则不包含这种信息,这导致分布差异并影响偏见度量。经过考虑这些分布差异, amplification 降低了许多。这些发现表明了比较模型和训练数据中的偏见是一个挑战,以及模型训练过程中存在干扰因素的影响。
    Abstract Bias amplification is a phenomenon in which models increase imbalances present in the training data. In this paper, we study bias amplification in the text-to-image domain using Stable Diffusion by comparing gender ratios in training vs. generated images. We find that the model appears to amplify gender-occupation biases found in the training data (LAION). However, we discover that amplification can largely be attributed to discrepancies between training captions and model prompts. For example, an inherent difference is that captions from the training data often contain explicit gender information while the prompts we use do not, which leads to a distribution shift and consequently impacts bias measures. Once we account for various distributional differences between texts used for training and generation, we observe that amplification decreases considerably. Our findings illustrate the challenges of comparing biases in models and the data they are trained on, and highlight confounding factors that contribute to bias amplification.
    摘要 <偏见增强是一种现象,在训练数据中存在的偏见会被模型进一步增加。在这篇论文中,我们研究了在文本到图像领域中的偏见增强,使用稳定扩散比较训练和生成图像中的性别比。我们发现,模型似乎会增加在训练数据中存在的性别职业偏见,但是我们发现,增强的主要原因是训练caption和模型提示之间的分布差异。例如,训练数据中的caption通常包含直接表达性别信息,而模型提示则不包含这种信息,这导致分布shift并影响偏见测量。一旦我们考虑了训练和生成文本之间的各种分布差异,我们发现增强明显减少。我们的发现表明了对比模型和它们训练数据中的偏见的困难,以及生成图像中的偏见增强的干扰因素。>

CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code

  • paper_url: http://arxiv.org/abs/2308.00683
  • repo_url: None
  • paper_authors: Nadezhda Chirkova, Sergey Troshin
  • for: 这个研究目的是调查不同子tokization选项对代码预训练模型的影响,并找到最有效和最短的子tokization方法。
  • methods: 该研究使用了大量语言模型预训练,并提出了一种新的子tokization方法,该方法可以减少平均长度17%,而无需下游性能下降。
  • results: 研究发现,选择合适的子tokization方法可以提高质量0.5-2%,可能会增加一些长度。
    Abstract Recent works have widely adopted large language model pretraining for source code, suggested source code-specific pretraining objectives and investigated the applicability of various Transformer-based language model architectures for source code. This work investigates another important aspect of such models, namely the effect of different subtokenization options, and aims at identifying most effective and length-efficient subtokenizations, taking into account code specifics. We propose subtokenziation that reduces average length by 17% without downstream performance drop, and show that a carefully chosen subtokenization may improve quality by 0.5-2%, possibly with some length increase.
    摘要

cs.LG - 2023-08-02

Careful Whisper – leveraging advances in automatic speech recognition for robust and interpretable aphasia subtype classification

  • paper_url: http://arxiv.org/abs/2308.01327
  • repo_url: None
  • paper_authors: Laurin Wagner, Mario Zusag, Theresa Bloder
  • for: 这 paper 是为了 automatization speech anomaly detection 以检测语言障碍的评估。
  • methods: 该 paper 使用了 Connectionist Temporal Classification (CTC) 和 encoder-decoder 自动语音识别模型,生成了丰富的声学特征和清晰的转录。然后,通过应用一些自然语言处理方法,提取了转录中的特征,生成了健康语音的原型。
  • results: 该 paper 的结果表明,使用这些原型可以以人工智能水平准确地分辨人们有语言障碍和健康控制组之间的差异。此外,还可以准确地分辨出最常见的语言障碍类型。该 pipeline 可以直接应用于其他疾病和语言,显示出robustly extracting diagnostic speech biomarkers的承诺。
    Abstract This paper presents a fully automated approach for identifying speech anomalies from voice recordings to aid in the assessment of speech impairments. By combining Connectionist Temporal Classification (CTC) and encoder-decoder-based automatic speech recognition models, we generate rich acoustic and clean transcripts. We then apply several natural language processing methods to extract features from these transcripts to produce prototypes of healthy speech. Basic distance measures from these prototypes serve as input features for standard machine learning classifiers, yielding human-level accuracy for the distinction between recordings of people with aphasia and a healthy control group. Furthermore, the most frequently occurring aphasia types can be distinguished with 90% accuracy. The pipeline is directly applicable to other diseases and languages, showing promise for robustly extracting diagnostic speech biomarkers.
    摘要 这篇论文介绍了一种完全自动化的声音异常识别方法,通过结合连接主义时间分类(CTC)和扩展器-解码器基于自动语音识别模型,生成丰富的声音特征和清晰的译文。然后,通过应用一些自然语言处理方法,提取这些译文中的特征,生成健康语音的原型。基本距离度量从这些原型中提取,作为机器学习分类器的输入特征,可以达到人类水平的准确率,将recordings of people with aphasia和健康控制组分开。此外,最常出现的语言障碍类型可以达到90%的准确率。这个管道可以直接应用于其他疾病和语言,展现了抽取健康语音生物标志物的可靠性。

Do Multilingual Language Models Think Better in English?

  • paper_url: http://arxiv.org/abs/2308.01223
  • repo_url: https://github.com/juletx/self-translate
  • paper_authors: Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe
  • for: 本研究旨在提高多语言语言模型的性能,并提出了一种新的自动翻译方法(self-translate),可以减少外部翻译系统的使用。
  • methods: 本研究使用了多语言语言模型的几个任务来评估自动翻译方法的性能,并通过对输入数据进行翻译并运行推理来评估模型的性能。
  • results: 实验结果显示,使用自动翻译方法可以 Consistently outperform direct inference,这表明多语言语言模型在非英语语言下的表达能力尚未被完全发挥。
    Abstract Translate-test is a popular technique to improve the performance of multilingual language models. This approach works by translating the input into English using an external machine translation system, and running inference over the translated input. However, these improvements can be attributed to the use of a separate translation system, which is typically trained on large amounts of parallel data not seen by the language model. In this work, we introduce a new approach called self-translate, which overcomes the need of an external translation system by leveraging the few-shot translation capabilities of multilingual language models. Experiments over 5 tasks show that self-translate consistently outperforms direct inference, demonstrating that language models are unable to leverage their full multilingual potential when prompted in non-English languages. Our code is available at https://github.com/juletx/self-translate.
    摘要 <> traduction-test 是一种受欢迎的技术,用于改进多语言语音模型的性能。这种方法工作通过将输入翻译成英语使用外部机器翻译系统,然后运行推理 sobre 翻译后的输入。然而,这些改进可以归功于使用分开的翻译系统,这个系统通常是基于大量的并行数据不被语音模型训练的。在这项工作中,我们介绍了一种新的方法called自动翻译(self-translate),它超越了需要外部翻译系统的需求,通过多语言语音模型的几个shot翻译能力来实现。经过5个任务的实验表明,自动翻译在直接推理的情况下一直表现出优于,这说明了语音模型在非英语提问时不能完全发挥其多语言潜力。我们的代码可以在 中找到。

Calibration in Deep Learning: A Survey of the State-of-the-Art

  • paper_url: http://arxiv.org/abs/2308.01222
  • repo_url: None
  • paper_authors: Cheng Wang
  • for: This paper is written for researchers and practitioners who are interested in calibrating deep neural models for safety-critical applications.
  • methods: The paper reviews state-of-the-art calibration methods for deep models, including post-hoc calibration, regularization methods, uncertainty estimation, and composition methods.
  • results: The paper provides an understanding of the principles of model calibration, including the definition of model miscalibration and key metrics for measuring it. It also covers recent advancements in calibrating large models, particularly large language models (LLMs).
    Abstract Calibrating deep neural models plays an important role in building reliable, robust AI systems in safety-critical applications. Recent work has shown that modern neural networks that possess high predictive capability are poorly calibrated and produce unreliable model predictions. Though deep learning models achieve remarkable performance on various benchmarks, the study of model calibration and reliability is relatively underexplored. Ideal deep models should have not only high predictive performance but also be well calibrated. There have been some recent methods proposed to calibrate deep models by using different mechanisms. In this survey, we review the state-of-the-art calibration methods and provide an understanding of their principles for performing model calibration. First, we start with the definition of model calibration and explain the root causes of model miscalibration. Then we introduce the key metrics that can measure this aspect. It is followed by a summary of calibration methods that we roughly classified into four categories: post-hoc calibration, regularization methods, uncertainty estimation, and composition methods. We also covered some recent advancements in calibrating large models, particularly large language models (LLMs). Finally, we discuss some open issues, challenges, and potential directions.
    摘要 <>模型调整在安全应用中建立可靠、Robust AI系统的重要作用。最近的工作表明,现代神经网络具有高预测能力,但是受到误差的影响,其预测结果不可靠。虽然深度学习模型在不同的测试环境中表现出色,但是模型调整和可靠性的研究相对落后。理想的深度模型应该不仅具有高预测性能,还应该具有良好的调整性。在这篇评论中,我们回顾了当前领域的状态对模型调整的方法,并提供了这些方法的原理。首先,我们开始定义模型调整,并解释了模型调整的根本原因。然后,我们介绍了评估模型调整的关键指标。接着,我们概括了调整方法,我们将它们分为四类:后处调整、regularization方法、uncertainty估计和组合方法。我们还讲述了在调整大型模型时的一些最新进展,特别是大语言模型(LLMs)。最后,我们讨论了一些开放的问题、挑战和未来的方向。<>

Using ScrutinAI for Visual Inspection of DNN Performance in a Medical Use Case

  • paper_url: http://arxiv.org/abs/2308.01220
  • repo_url: None
  • paper_authors: Rebekka Görge, Elena Haedecke, Michael Mock
  • for: 该论文旨在探讨模型性能如何受到标签质量的影响,特别是在医疗设置下,生成高质量标签需要深刻的专家知识和是非常昂贵的。
  • methods: 该论文使用了一种名为ScrutinAI的可见分析工具,来分析不同专家对数据集的标签变化对模型性能的影响。
  • results: 该论文通过对一个公共可用的数据集中脑出血的检测和分类的检测进行分析,发现模型性能受到标签质量的影响,并且可以通过ScrutinAI来分析出模型的真正弱点。
    Abstract Our Visual Analytics (VA) tool ScrutinAI supports human analysts to investigate interactively model performanceand data sets. Model performance depends on labeling quality to a large extent. In particular in medical settings, generation of high quality labels requires in depth expert knowledge and is very costly. Often, data sets are labeled by collecting opinions of groups of experts. We use our VA tool to analyse the influence of label variations between different experts on the model performance. ScrutinAI facilitates to perform a root cause analysis that distinguishes weaknesses of deep neural network (DNN) models caused by varying or missing labeling quality from true weaknesses. We scrutinize the overall detection of intracranial hemorrhages and the more subtle differentiation between subtypes in a publicly available data set.
    摘要 我们的视觉分析工具ScrutinAI可以帮助人类分析员在互动式地模型性能和数据集之间进行调查。模型性能受标注质量的影响很大。尤其在医疗设置下,生成高质量标注需要深厚的专家知识并非常昂贵。经常情况下,数据集被由不同专家的意见集成来标注。我们使用ScrutinAI来分析标注之间的差异对模型性能的影响。ScrutinAI可以进行根本原因分析,并将模型因标注质量变化或缺失而导致的弱点与真正的弱点区分开来。我们在一个公共可用的数据集中进行了总脑出血的检测和更加细致的分类 между不同类型的差异分析。

Global Hierarchical Neural Networks using Hierarchical Softmax

  • paper_url: http://arxiv.org/abs/2308.01210
  • repo_url: https://github.com/jschuurmans/hsoftmax
  • paper_authors: Jetze Schuurmans, Flavius Frasincar
  • for: 这篇论文提出了一种基于层次softmax的全局分类器框架,适用于任何具有自然层次结构的分类任务。
  • methods: 该方法使用层次softmax创建全局分类器,并在四个文本分类数据集上进行了实验。在所有数据集中,层次softmax超过了常规softmax在扫描分类器中的 macro-F1 和 macro- recall。在三个数据集中,层次softmax达到了更高的微精度和macro精度。
  • results: 实验结果表明,层次softmax在四个文本分类数据集上均显著提高了分类性能,特别是在 macro-F1 和 macro-recall 上。
    Abstract This paper presents a framework in which hierarchical softmax is used to create a global hierarchical classifier. The approach is applicable for any classification task where there is a natural hierarchy among classes. We show empirical results on four text classification datasets. In all datasets the hierarchical softmax improved on the regular softmax used in a flat classifier in terms of macro-F1 and macro-recall. In three out of four datasets hierarchical softmax achieved a higher micro-accuracy and macro-precision.
    摘要 Here is the text in Simplified Chinese:这篇论文提出了一个基于层次软max的全局层次分类器框架,适用于任何具有自然层次结构的分类任务。我们通过四个文本分类 dataset 的实验结果表明,层次软max 在 macro-F1 和 macro-recall 方面比常软max 使用的平面分类器表现更好。此外,层次软max 在三个 dataset 中达到了更高的微准确率和macro准确率。

Generative Noisy-Label Learning by Implicit Dicriminative Approximation with Partial Label Prior

  • paper_url: http://arxiv.org/abs/2308.01184
  • repo_url: None
  • paper_authors: Fengbei Liu, Yuanhong Chen, Chong Wang, Yuyuan Liu, Gustavo Carneiro
  • for: addresses the problem of learning with noisy labels, proposing a new generative approach that improves the estimation of the label transition matrix and disentangles clean and noisy labels.
  • methods: uses a new model optimization that directly associates data and clean labels, and implicitly estimates the generative model using a discriminative model, eliminating the need for inefficient training of a generative model.
  • results: achieves state-of-the-art results on several noisy-label benchmarks while maintaining a similar computational complexity as discriminative models.
    Abstract The learning with noisy labels has been addressed with both discriminative and generative models. Although discriminative models have dominated the field due to their simpler modeling and more efficient computational training processes, generative models offer a more effective means of disentangling clean and noisy labels and improving the estimation of the label transition matrix. However, generative approaches maximize the joint likelihood of noisy labels and data using a complex formulation that only indirectly optimizes the model of interest associating data and clean labels. Additionally, these approaches rely on generative models that are challenging to train and tend to use uninformative clean label priors. In this paper, we propose a new generative noisy-label learning approach that addresses these three issues. First, we propose a new model optimisation that directly associates data and clean labels. Second, the generative model is implicitly estimated using a discriminative model, eliminating the inefficient training of a generative model. Third, we propose a new informative label prior inspired by partial label learning as supervision signal for noisy label learning. Extensive experiments on several noisy-label benchmarks demonstrate that our generative model provides state-of-the-art results while maintaining a similar computational complexity as discriminative models.
    摘要 学习噪声标签已经通过推论性和生成模型进行解决。虽然推论模型因其更简单的模型化和更高效的计算训练过程而占据了领先地位,但生成模型可以更好地分离噪声标签和数据,并提高标签过渡矩阵的估计。然而,生成方法需要使用复杂的定义,只是间接地优化模型关注的数据和干净标签的关系。此外,这些方法通常需要困难地训练生成模型,并使用不够有用的干净标签估计。在这篇论文中,我们提出了一种新的生成噪声标签学习方法,解决了这三个问题。首先,我们提出了一种直接关联数据和干净标签的模型优化方法。其次,我们使用推论模型来隐式地估计生成模型,从而消除生成模型的不fficient 训练。最后,我们提出了一种基于 partial label learning 的新估计标签超级视觉信号。我们在几个噪声标签 benchmark 上进行了广泛的实验,结果表明,我们的生成模型可以在计算复杂性相同的情况下提供状态机器人的结果。

Direct Gradient Temporal Difference Learning

  • paper_url: http://arxiv.org/abs/2308.01170
  • repo_url: None
  • paper_authors: Xiaochi Qian, Shangtong Zhang
  • for: This paper focuses on addressing the instability issue in off-policy learning with function approximation and bootstrapping, known as the “deadly triad” in reinforcement learning.
  • methods: The proposed method uses two samples in a Markovian data stream with an increasing gap to directly solve the double sampling issue, without the need for extra weights or Fenchel duality.
  • results: The proposed algorithm is computationally efficient and has a convergence rate on par with the canonical on-policy temporal difference learning, as demonstrated through both asymptotic and finite sample analysis. Additionally, the method only requires a logarithmically increasing memory as time progresses.
    Abstract Off-policy learning enables a reinforcement learning (RL) agent to reason counterfactually about policies that are not executed and is one of the most important ideas in RL. It, however, can lead to instability when combined with function approximation and bootstrapping, two arguably indispensable ingredients for large-scale reinforcement learning. This is the notorious deadly triad. Gradient Temporal Difference (GTD) is one powerful tool to solve the deadly triad. Its success results from solving a doubling sampling issue indirectly with weight duplication or Fenchel duality. In this paper, we instead propose a direct method to solve the double sampling issue by simply using two samples in a Markovian data stream with an increasing gap. The resulting algorithm is as computationally efficient as GTD but gets rid of GTD's extra weights. The only price we pay is a logarithmically increasing memory as time progresses. We provide both asymptotic and finite sample analysis, where the convergence rate is on-par with the canonical on-policy temporal difference learning. Key to our analysis is a novel refined discretization of limiting ODEs.
    摘要 <> translates into 掌握了非政策学习可以让人工智能推理counterfactual about不被执行的策略,是现代人工智能中最重要的想法之一。然而,它可能会导致不稳定性,特别是在函数近似和重启执行时。这个问题被称为“恐怖三重套”。梯度时间差(GTD)是一种强大的解决方案,它通过解决重启执行和函数近似问题来解决这个问题。在这篇论文中,我们提出了一种直接解决double sampling问题的方法,通过在Markov链中使用两个采样,并且采样间隔逐渐增加。这种方法的计算效率与GTD相当,但是它不需要额外的Weight。我们付出的价格是在时间上增加logarithmic的内存。我们提供了both asymptotic和finite sample分析,其 convergencerate与标准的在政策学习中相同。关键在于我们的新的精细的分解限制ODEs。

Machine Learning-Based Diabetes Detection Using Photoplethysmography Signal Features

  • paper_url: http://arxiv.org/abs/2308.01930
  • repo_url: None
  • paper_authors: Filipe A. C. Oliveira, Felipe M. Dias, Marcelo A. F. Toledo, Diego A. C. Cardenas, Douglas A. Almeida, Estela Ribeiro, Jose E. Krieger, Marco A. Gutierrez
  • for: 这项研究旨在开发一种基于非侵入式光学 фото折射(PPG)的方法,用于检测糖尿病。
  • methods: 该研究使用了PPG信号和 metadata 进行训练 Logistic Regression(LR)和 eXtreme Gradient Boosting(XGBoost)算法,以分类非糖尿病和糖尿病患者。
  • results: 该模型在5个批处验证中获得了F1分数和AUC值为58.8±20.0%和79.2±15.0%(LR),以及51.7±16.5%和73.6±17.0%(XGBoost)。特征分析表明,PPG形态特征包含了糖尿病相关信息,同时metadata 也对模型的性能产生了影响。
    Abstract Diabetes is a prevalent chronic condition that compromises the health of millions of people worldwide. Minimally invasive methods are needed to prevent and control diabetes but most devices for measuring glucose levels are invasive and not amenable for continuous monitoring. Here, we present an alternative method to overcome these shortcomings based on non-invasive optical photoplethysmography (PPG) for detecting diabetes. We classify non-Diabetic and Diabetic patients using the PPG signal and metadata for training Logistic Regression (LR) and eXtreme Gradient Boosting (XGBoost) algorithms. We used PPG signals from a publicly available dataset. To prevent overfitting, we divided the data into five folds for cross-validation. By ensuring that patients in the training set are not in the testing set, the model's performance can be evaluated on unseen subjects' data, providing a more accurate assessment of its generalization. Our model achieved an F1-Score and AUC of $58.8\pm20.0\%$ and $79.2\pm15.0\%$ for LR and $51.7\pm16.5\%$ and $73.6\pm17.0\%$ for XGBoost, respectively. Feature analysis suggested that PPG morphological features contains diabetes-related information alongside metadata. Our findings are within the same range reported in the literature, indicating that machine learning methods are promising for developing remote, non-invasive, and continuous measurement devices for detecting and preventing diabetes.
    摘要 diabetes 是一种流行的慢性疾病,对全球数百万人的健康产生了影响。为了预防和控制 diabetes,需要使用非侵入性的方法,但现有的血糖水平测量设备多为侵入性,不适合持续监测。在这里,我们提出了一种新的方法,利用非侵入性的光学折射 Plethysmography (PPG) 测测血糖水平。我们使用 PPG 信号和 metadata 进行分类,使用 Logistic Regression (LR) 和 eXtreme Gradient Boosting (XGBoost) 算法进行训练。我们使用公共可用的数据集。为了避免过拟合,我们将数据分成五个拟合集,进行十字验证。由于我们确保在训练集中没有测试集中的病人,因此我们可以评估模型在未见数据上的性能,从而提供更准确的评估。我们的模型达到了 F1 分数和 AUC 的 $58.8\pm20.0\%$ 和 $79.2\pm15.0\%$ для LR 和 $51.7\pm16.5\%$ 和 $73.6\pm17.0\%$ для XGBoost,分别。特征分析表明,PPG 形态特征包含糖尿病相关信息,并且与 metadata 相关。我们的发现与文献中的报告相同,表明机器学习方法在开发远程、非侵入性、持续测量设备方面具有投资前景。

LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs

  • paper_url: http://arxiv.org/abs/2308.01157
  • repo_url: https://github.com/interpretml/talktoebm
  • paper_authors: Benjamin J. Lengerich, Sebastian Bordt, Harsha Nori, Mark E. Nunnally, Yin Aphinyanaphongs, Manolis Kellis, Rich Caruana
  • for: 这篇论文旨在探讨大语言模型(LLMs)如何与可解释模型( interpretable models)结合使用,以提高数据科学中的常见任务自动化。
  • methods: 论文使用了层次推理方法,使得LLMs可以对复杂结果进行可解释性的分解,并且不需要整个模型都适应上下文中。
  • results: 论文通过多个医疗实例展示了LLMs在数据科学中的新能力,特别是在通用加分模型(GAMs)中。此外,论文还提供了一个开源的LLM-GAM接口 package $\texttt{TalkToEBM}$。
    Abstract We show that large language models (LLMs) are remarkably good at working with interpretable models that decompose complex outcomes into univariate graph-represented components. By adopting a hierarchical approach to reasoning, LLMs can provide comprehensive model-level summaries without ever requiring the entire model to fit in context. This approach enables LLMs to apply their extensive background knowledge to automate common tasks in data science such as detecting anomalies that contradict prior knowledge, describing potential reasons for the anomalies, and suggesting repairs that would remove the anomalies. We use multiple examples in healthcare to demonstrate the utility of these new capabilities of LLMs, with particular emphasis on Generalized Additive Models (GAMs). Finally, we present the package $\texttt{TalkToEBM}$ as an open-source LLM-GAM interface.
    摘要 我们显示大型语言模型(LLM)可以非常好地与可解释模型(GAM)结合,将复杂的结果拆分为单一图表表示的分量。通过运用层次推理的方法,LLM可以提供完整的模型级别概要而不需要整个模型适应上下文。这种方法使得LLM可以自动应用它们广泛的背景知识来进行资料科学中常见的任务,例如检测资料过程中的问题,描述问题的可能原因,以及提出修复方案来解决问题。我们使用了多个医疗保健例子来证明这些新的LLM功能的用 utility,特别是适用于GAM。最后,我们提出了一个名为 $\texttt{TalkToEBM}$ 的开源 LLM-GAM 界面。

A Transformer-based Prediction Method for Depth of Anesthesia During Target-controlled Infusion of Propofol and Remifentanil

  • paper_url: http://arxiv.org/abs/2308.01929
  • repo_url: https://github.com/heeeyk/transformer-doa-prediction
  • paper_authors: Yongkang He, Siyuan Peng, Mingjin Chen, Zhijing Yang, Yuanhui Chen
  • for: 预测麻醉效果的准确性是脊梁控制输液系统的关键。传统的PK-PD模型需要手动选择模型参数,在临床设置下可能具有挑战。现代深度学习方法可能只能捕捉总趋势,并不能预测麻醉 depth 的突然变化。
  • methods: 我们提议使用 transformer 网络来预测麻醉 depth,并使用 LSTM 和 GRN 网络来改进特征融合效率。我们还使用注意力机制来发现药物之间的交互关系。此外,我们使用标签分布平滑和重新权重损失来解决数据不均衡问题。
  • results: 我们的提议方法比传统 PK-PD 模型和先前的深度学习方法更高效地预测麻醉 depth,特别在突然的深度麻醉情况下。
    Abstract Accurately predicting anesthetic effects is essential for target-controlled infusion systems. The traditional (PK-PD) models for Bispectral index (BIS) prediction require manual selection of model parameters, which can be challenging in clinical settings. Recently proposed deep learning methods can only capture general trends and may not predict abrupt changes in BIS. To address these issues, we propose a transformer-based method for predicting the depth of anesthesia (DOA) using drug infusions of propofol and remifentanil. Our method employs long short-term memory (LSTM) and gate residual network (GRN) networks to improve the efficiency of feature fusion and applies an attention mechanism to discover the interactions between the drugs. We also use label distribution smoothing and reweighting losses to address data imbalance. Experimental results show that our proposed method outperforms traditional PK-PD models and previous deep learning methods, effectively predicting anesthetic depth under sudden and deep anesthesia conditions.
    摘要 Accurately predicting anesthetic effects is crucial for target-controlled infusion systems. Traditional PK-PD models for Bispectral index (BIS) prediction require manual selection of model parameters, which can be challenging in clinical settings. Recently proposed deep learning methods can only capture general trends and may not predict abrupt changes in BIS. To address these issues, we propose a transformer-based method for predicting the depth of anesthesia (DOA) using drug infusions of propofol and remifentanil. Our method employs long short-term memory (LSTM) and gate residual network (GRN) networks to improve the efficiency of feature fusion and applies an attention mechanism to discover the interactions between the drugs. We also use label distribution smoothing and reweighting losses to address data imbalance. Experimental results show that our proposed method outperforms traditional PK-PD models and previous deep learning methods, effectively predicting anesthetic depth under sudden and deep anesthesia conditions.Here's the text with some additional information about the Simplified Chinese translation:Simplified Chinese is a written version of Chinese that uses simplified characters and is commonly used in mainland China. In this translation, I have used Simplified Chinese characters to represent the text. However, it's worth noting that Traditional Chinese characters are also commonly used in Taiwan and other regions, and may be preferred in some contexts. Additionally, the translation is written in a formal, academic style, which may not be appropriate for all contexts. If you have any specific requests or preferences for the translation, please let me know and I will do my best to accommodate them.

DySTreSS: Dynamically Scaled Temperature in Self-Supervised Contrastive Learning

  • paper_url: http://arxiv.org/abs/2308.01140
  • repo_url: None
  • paper_authors: Siladittya Manna, Soumitri Chattopadhyay, Rakesh Dey, Saumik Bhattacharya, Umapada Pal
  • for: 提高自适应对SSL中InfoNCE损失的性能,研究InfoNCE损失中温度超参的影响。
  • methods: 提出了一种基于cosine相似性的温度扩展函数,并对uniformity和tolerance度量进行了分析,以便更好地优化分布在特征空间中。
  • results: 实验证明,提出的方法可以超过或与对比损失基本相同的SSL算法相当。认为该研究为后续对异性学习的研究提供了基础。
    Abstract In contemporary self-supervised contrastive algorithms like SimCLR, MoCo, etc., the task of balancing attraction between two semantically similar samples and repulsion between two samples from different classes is primarily affected by the presence of hard negative samples. While the InfoNCE loss has been shown to impose penalties based on hardness, the temperature hyper-parameter is the key to regulating the penalties and the trade-off between uniformity and tolerance. In this work, we focus our attention to improve the performance of InfoNCE loss in SSL by studying the effect of temperature hyper-parameter values. We propose a cosine similarity-dependent temperature scaling function to effectively optimize the distribution of the samples in the feature space. We further analyze the uniformity and tolerance metrics to investigate the optimal regions in the cosine similarity space for better optimization. Additionally, we offer a comprehensive examination of the behavior of local and global structures in the feature space throughout the pre-training phase, as the temperature varies. Experimental evidence shows that the proposed framework outperforms or is at par with the contrastive loss-based SSL algorithms. We believe our work (DySTreSS) on temperature scaling in SSL provides a foundation for future research in contrastive learning.
    摘要 现代自我超vised contrastive算法如SimCLR、MoCo等,任务是让两个semantically similar sample之间吸引,而两个不同类型的sample之间冲突。而这个任务主要受到强度负样本的影响。InfoNCE损失已经显示出对强度做出了罚款,但是温度超参数是控制这些罚款和吸引与耐受之间的折衔。在这项工作中,我们专注于改进InfoNCE损失在SSL中的性能,研究温度超参数的效果。我们提议一个cosine similarity-dependent温度缩放函数,可以有效地优化样本在特征空间的分布。我们进一步分析了uniformity和tolerance指标,以查找最佳的cosine similarity空间区域,以便更好地优化。此外,我们还对批处和全局结构在特征空间中的变化,随着温度的变化,进行了详细的分析。实验证明,我们提出的框架(DySTreSS)在SSL中的性能比或与基于contrastive损失的SSL算法相当。我们认为我们的工作在SSL中的温度缩放(DySTreSS)提供了一个基础 для未来的对冲学习的研究。

Dynamic Privacy Allocation for Locally Differentially Private Federated Learning with Composite Objectives

  • paper_url: http://arxiv.org/abs/2308.01139
  • repo_url: None
  • paper_authors: Jiaojiao Zhang, Dominik Fay, Mikael Johansson
  • for: 本研究提出了一种用于强Converter convex但可能不连续问题的本地差分隐私联合学习算法,保护每名工作者的梯度 Against an honest but curious server.
  • methods: 提出的算法在共享信息中添加人工噪声以确保隐私,并动态分配时变 noise variance来最小化优化错误的上限,以满足先前定义的隐私预算限制。
  • results: 数值结果表明,提出的算法在比较 estado-of-the-art方法的基础上具有更高的超越性和隐私保护能力,可以在一个适当的隐私预算下实现更高的优化质量。
    Abstract This paper proposes a locally differentially private federated learning algorithm for strongly convex but possibly nonsmooth problems that protects the gradients of each worker against an honest but curious server. The proposed algorithm adds artificial noise to the shared information to ensure privacy and dynamically allocates the time-varying noise variance to minimize an upper bound of the optimization error subject to a predefined privacy budget constraint. This allows for an arbitrarily large but finite number of iterations to achieve both privacy protection and utility up to a neighborhood of the optimal solution, removing the need for tuning the number of iterations. Numerical results show the superiority of the proposed algorithm over state-of-the-art methods.
    摘要 (本文提出了一种地域分ifferentially private的联合学习算法,用于保护每个工作者的梯度 against an honest but curious server。该算法在共享信息中添加了人工噪声以保障隐私,并在时间变化的噪声方差中动态分配时间变化的噪声方差以最小化优化误差的Upper bound,以达到一定的隐私预算限制。这使得可以在一个finite但countlessiterations中实现隐私保护和实用性,从而消除调整iterations的需求。numerical results show that the proposed algorithm outperforms existing methods。)

Can We Transfer Noise Patterns? A Multi-environment Spectrum Analysis Model Using Generated Cases

  • paper_url: http://arxiv.org/abs/2308.01138
  • repo_url: https://github.com/magnomic/cnst
  • paper_authors: Haiwen Du, Zheng Ju, Yu An, Honghui Du, Dongjie Zhu, Zhaoshuo Tian, Aonghus Lawlor, Ruihai Dong
  • for: 这个研究旨在提高在线水质测试中的 спектル分析系统,以检测污染物的类型和浓度,并帮助管理机关对污染事件作出迅速回应。
  • methods: 本研究提出了一个噪音传播模型,可以将噪音模式传递到不同环境中的标准水样本上,并将这些噪音模式转换到未知标准水样本上,以提高分析模型的可行性。
  • results: 实验结果显示,提案的方法可以对比基准系统(包括波лет残减、深度神经网络和生成模型)进行比较,在不同背景噪音下表现出良好的噪音传播能力。
    Abstract Spectrum analysis systems in online water quality testing are designed to detect types and concentrations of pollutants and enable regulatory agencies to respond promptly to pollution incidents. However, spectral data-based testing devices suffer from complex noise patterns when deployed in non-laboratory environments. To make the analysis model applicable to more environments, we propose a noise patterns transferring model, which takes the spectrum of standard water samples in different environments as cases and learns the differences in their noise patterns, thus enabling noise patterns to transfer to unknown samples. Unfortunately, the inevitable sample-level baseline noise makes the model unable to obtain the paired data that only differ in dataset-level environmental noise. To address the problem, we generate a sample-to-sample case-base to exclude the interference of sample-level noise on dataset-level noise learning, enhancing the system's learning performance. Experiments on spectral data with different background noises demonstrate the good noise-transferring ability of the proposed method against baseline systems ranging from wavelet denoising, deep neural networks, and generative models. From this research, we posit that our method can enhance the performance of DL models by generating high-quality cases. The source code is made publicly available online at https://github.com/Magnomic/CNST.
    摘要 (Simplified Chinese translation) spectral analysis systems in online water quality testing 是为检测污染物种和浓度,并帮助管理机构快速应对污染事件。然而,基于spectrum数据的测试设备在非实验室环境中会出现复杂的噪声模式。为了使分析模型适用于更多环境,我们提议一种噪声模式传递模型,该模型通过学习不同环境中标准水样的spectrum的差异,以便将噪声模式传递到未知样本。然而,不可避免的样本水平噪声使得模型无法获得只有数据aset级噪声的对照数据。为解决问题,我们生成了样本到样本的case基,以排除样本水平噪声对数据aset级噪声学习的干扰。实验结果表明,我们的方法可以减少对比基线方法(包括wavelet denoising、深度神经网络和生成模型)的干扰,提高系统学习性能。从这项研究中,我们认为我们的方法可以提高DL模型的性能,通过生成高质量的case。源代码已经在https://github.com/Magnomic/CNST上公开。

Multi-task learning for classification, segmentation, reconstruction, and detection on chest CT scans

  • paper_url: http://arxiv.org/abs/2308.01137
  • repo_url: None
  • paper_authors: Weronika Hryniewska-Guzik, Maria Kędzierska, Przemysław Biecek
  • for: 这个研究是为了提高肺癌和COVID-19的诊断和预后预测。
  • methods: 这个研究使用了多任务学习方法,把肺癌鉴别、分类、重建和检测作为多个任务,以提高医疗资料的抽象和普遍化。
  • results: 这个研究获得了肺癌鉴别、分类、重建和检测的良好结果,并且是首个在多任务解决方案中添加检测任务的研究。
    Abstract Lung cancer and covid-19 have one of the highest morbidity and mortality rates in the world. For physicians, the identification of lesions is difficult in the early stages of the disease and time-consuming. Therefore, multi-task learning is an approach to extracting important features, such as lesions, from small amounts of medical data because it learns to generalize better. We propose a novel multi-task framework for classification, segmentation, reconstruction, and detection. To the best of our knowledge, we are the first ones who added detection to the multi-task solution. Additionally, we checked the possibility of using two different backbones and different loss functions in the segmentation task.
    摘要 肺癌和 COVID-19 在全球具有非常高的疾病率和死亡率。为医生而言,在疾病的早期阶段标识病变很困难和时间消耗。因此,多任务学习是一种提取重要特征,如病变,从小量医疗数据中提取特征的方法,因为它可以更好地泛化。我们提议一种新的多任务框架,用于分类、 segmentation、重建和检测。根据我们所知,我们是第一个将检测添加到多任务解决方案中的人。此外,我们还检查了使用不同的背景和损失函数在 segmentation 任务中的可能性。

Unlearning Spurious Correlations in Chest X-ray Classification

  • paper_url: http://arxiv.org/abs/2308.01119
  • repo_url: None
  • paper_authors: Misgina Tsighe Hagos, Kathleen M. Curran, Brian Mac Namee
  • for: 这个论文的目的是为了提高医疗图像分类模型的可靠性和透明度,并解决跨数据源集合中的隐藏关系问题。
  • methods: 这个论文使用了一种基于 Covid-19 胸部X射线图像的深度学习模型,并使用了一种基于用户反馈的交互式解释学习(XBL)方法来解决隐藏关系问题。
  • results: 研究发现,通过使用 XBL 方法可以有效地消除隐藏关系,从而提高模型的准确率和透明度。
    Abstract Medical image classification models are frequently trained using training datasets derived from multiple data sources. While leveraging multiple data sources is crucial for achieving model generalization, it is important to acknowledge that the diverse nature of these sources inherently introduces unintended confounders and other challenges that can impact both model accuracy and transparency. A notable confounding factor in medical image classification, particularly in musculoskeletal image classification, is skeletal maturation-induced bone growth observed during adolescence. We train a deep learning model using a Covid-19 chest X-ray dataset and we showcase how this dataset can lead to spurious correlations due to unintended confounding regions. eXplanation Based Learning (XBL) is a deep learning approach that goes beyond interpretability by utilizing model explanations to interactively unlearn spurious correlations. This is achieved by integrating interactive user feedback, specifically feature annotations. In our study, we employed two non-demanding manual feedback mechanisms to implement an XBL-based approach for effectively eliminating these spurious correlations. Our results underscore the promising potential of XBL in constructing robust models even in the presence of confounding factors.
    摘要 医疗图像分类模型经常使用多种数据源进行训练。虽然利用多种数据源对模型泛化有益,但是需要注意这些来源的多样性自然会引入无意义的混合因素和其他挑战,这些挑战可能会影响模型准确性和透明度。在医疗图像分类中,特别是在骨骼成像中,生长induced by skeletal maturation during adolescence是一个明显的混合因素。我们使用COVID-19胸部X射线数据集训练深度学习模型,并显示这些数据集可能会导致不必要的相关性。基于解释的学习(XBL)是一种深度学习方法,它不仅提供了解释,还可以通过交互式的用户反馈来解除不必要的相关性。我们使用了两种不需要高度技术知识的手动反馈机制来实现XBL基于的方法。我们的结果表明XBL在混合因素存在时可以建立可靠的模型。

A Survey on Popularity Bias in Recommender Systems

  • paper_url: http://arxiv.org/abs/2308.01118
  • repo_url: None
  • paper_authors: Anastasiia Klimashevskaia, Dietmar Jannach, Mehdi Elahi, Christoph Trattner
  • for: 本研究旨在探讨推荐系统中偏好媒体文件的问题,以及如何探测、评估和缓解这种偏好。
  • methods: 本文评论了现有的推荐算法是如何导致媒体文件的偏好问题,并提出了一些方法来探测、评估和缓解这种偏好。
  • results: 本文发现现有的推荐算法在很多情况下会导致媒体文件的偏好问题,这可能会导致推荐的价值受到限制,并可能在长期内产生不良的循环效应。
    Abstract Recommender systems help people find relevant content in a personalized way. One main promise of such systems is that they are able to increase the visibility of items in the long tail, i.e., the lesser-known items in a catalogue. Existing research, however, suggests that in many situations today's recommendation algorithms instead exhibit a popularity bias, meaning that they often focus on rather popular items in their recommendations. Such a bias may not only lead to limited value of the recommendations for consumers and providers in the short run, but it may also cause undesired reinforcement effects over time. In this paper, we discuss the potential reasons for popularity bias and we review existing approaches to detect, quantify and mitigate popularity bias in recommender systems. Our survey therefore includes both an overview of the computational metrics used in the literature as well as a review of the main technical approaches to reduce the bias. We furthermore critically discuss today's literature, where we observe that the research is almost entirely based on computational experiments and on certain assumptions regarding the practical effects of including long-tail items in the recommendations.
    摘要

Spatio-Temporal Branching for Motion Prediction using Motion Increments

  • paper_url: http://arxiv.org/abs/2308.01097
  • repo_url: https://github.com/jasonwang959/stpmp
  • paper_authors: Jiexin Wang, Yujie Zhou, Wenwen Qiang, Ying Ba, Bing Su, Ji-Rong Wen
  • for: 人体动作预测 (HMP) 是一个流行的研究领域,但是它仍然是一个具有杂乱和不规则性的任务,尤其是将来的姿势预测。传统方法通常使用手工设计的特征和机器学习技术,这些技术经常难以捕捉人体动作的复杂动态。
  • methods: 我们提出了一种新的空间temporal分支网络方法,使得各个节点之间的时间和空间关系得到了更好的利用。我们通过知识储存技术来实现域知识储存和跨频率学习。
  • results: 我们的方法在标准HMP测试集上进行评估,并与当前最佳方法进行比较。我们发现,我们的方法可以更好地降低噪声干扰,并提供更多的动作特征来Characterize人体动作。
    Abstract Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications, but it remains a challenging task due to the stochastic and aperiodic nature of future poses. Traditional methods rely on hand-crafted features and machine learning techniques, which often struggle to model the complex dynamics of human motion. Recent deep learning-based methods have achieved success by learning spatio-temporal representations of motion, but these models often overlook the reliability of motion data. Additionally, the temporal and spatial dependencies of skeleton nodes are distinct. The temporal relationship captures motion information over time, while the spatial relationship describes body structure and the relationships between different nodes. In this paper, we propose a novel spatio-temporal branching network using incremental information for HMP, which decouples the learning of temporal-domain and spatial-domain features, extracts more motion information, and achieves complementary cross-domain knowledge learning through knowledge distillation. Our approach effectively reduces noise interference and provides more expressive information for characterizing motion by separately extracting temporal and spatial features. We evaluate our approach on standard HMP benchmarks and outperform state-of-the-art methods in terms of prediction accuracy.
    摘要 人体运动预测(HMP)已经成为一个受欢迎的研究主题,因为它在多个应用领域有广泛的应用前景。然而,HMP仍然是一个具有抽象和不规则的未来姿势的挑战。传统的方法通常采用手动设计的特征和机器学习技术,经常难以模型人体运动的复杂动力学。现代深度学习基于的方法在学习人体运动的空间-时间表示方面取得了成功,但这些模型经常忽略人体运动数据的可靠性。此外,人体运动中的时间和空间关系不同。时间关系捕捉人体运动的时间信息,而空间关系描述人体结构和不同节点之间的关系。在这篇论文中,我们提出了一种新的空间-时间分支网络,使用增量信息来进行HMP,这种方法可以分离学习时间Domain和空间Domain的特征,提取更多的运动信息,并通过知识储存来实现补做cross-domain知识学习。我们的方法可以减少噪声干扰和提供更多的表达信息,以便更好地描述运动。我们在标准HMP测试benchmark上评估了我们的方法,并在预测精度方面超过了当前的状态艺术方法。

Multi-variable Hard Physical Constraints for Climate Model Downscaling

  • paper_url: http://arxiv.org/abs/2308.01868
  • repo_url: None
  • paper_authors: Jose González-Abad, Álex Hernández-García, Paula Harder, David Rolnick, José Manuel Gutiérrez
  • for: 实现地方气候变化的影响和演化,Global Climate Models (GCMs) 是主要工具。
  • methods: 使用深度学习统计下降法,将粗糙空间解析的气候变化转换为本地规模的气候场。
  • results: 这种方法可以确保气候变化的本地规模预测的准确性,但是通常只是单一气候变量的独立下降。这种研究探讨了这个问题的范围和解决方案。
    Abstract Global Climate Models (GCMs) are the primary tool to simulate climate evolution and assess the impacts of climate change. However, they often operate at a coarse spatial resolution that limits their accuracy in reproducing local-scale phenomena. Statistical downscaling methods leveraging deep learning offer a solution to this problem by approximating local-scale climate fields from coarse variables, thus enabling regional GCM projections. Typically, climate fields of different variables of interest are downscaled independently, resulting in violations of fundamental physical properties across interconnected variables. This study investigates the scope of this problem and, through an application on temperature, lays the foundation for a framework introducing multi-variable hard constraints that guarantees physical relationships between groups of downscaled climate variables.
    摘要

Homography Estimation in Complex Topological Scenes

  • paper_url: http://arxiv.org/abs/2308.01086
  • repo_url: None
  • paper_authors: Giacomo D’Amicantonio, Egor Bondarau, Peter H. N. De With
  • for: 这篇论文主要用于提出一种自动化摄像头卡利ibration过程,以便更好地处理环境变化和小型摄像头运动对摄像头卡利ibration的影响。
  • methods: 该方法使用了一个自定义的空间变换网络(STN)和一种新的topological损失函数,不需要任何相机设置的先验知识。
  • results: 实验表明,提议的方法可以提高IoU指标(相对于一个状态对照模型),在五个 sintetic dataset和2014年世界杯 dataset上提高IoU指标达12%。
    Abstract Surveillance videos and images are used for a broad set of applications, ranging from traffic analysis to crime detection. Extrinsic camera calibration data is important for most analysis applications. However, security cameras are susceptible to environmental conditions and small camera movements, resulting in a need for an automated re-calibration method that can account for these varying conditions. In this paper, we present an automated camera-calibration process leveraging a dictionary-based approach that does not require prior knowledge on any camera settings. The method consists of a custom implementation of a Spatial Transformer Network (STN) and a novel topological loss function. Experiments reveal that the proposed method improves the IoU metric by up to 12% w.r.t. a state-of-the-art model across five synthetic datasets and the World Cup 2014 dataset.
    摘要 侦查视频和图像可以用于各种应用程序,从交通分析到犯罪检测。外部摄像头准备数据非常重要,但安全摄像头受到环境因素和小型摄像头运动的影响,需要一种自动重新准确方法,能够考虑这些不同的条件。本文提出了一种基于词典方法的自动摄像头准确过程,不需要任何摄像头设置的先验知识。该方法包括一个自定义的空间变换网络(STN)和一个新的topological损失函数。实验表明,提议的方法可以提高IoU指标(相对于状态静态模型),在五个人工数据集和2014年世界杯数据集上提高IoU指标达12%。

Data-Driven Identification of Quadratic Symplectic Representations of Nonlinear Hamiltonian Systems

  • paper_url: http://arxiv.org/abs/2308.01084
  • repo_url: None
  • paper_authors: Süleyman Yildiz, Pawan Goyal, Thomas Bendokat, Peter Benner
  • for: 学习哈密顿系统使用数据
  • methods: 使用生成函数和自动编码器来学习 quadratic 动力学系统,保持哈密顿结构,并使用高阶变量系统来解决高维数据问题
  • results: 提出了一种基于生成函数和自动编码器的方法来学习哈密顿系统,并实现了系统的长期稳定性和低模型复杂性In English:
  • for: Learning Hamiltonian systems using data
  • methods: Using a generating function and a symplectic autoencoder to learn quadratic dynamical systems that preserve the Hamiltonian structure, and using high-order variable systems to solve high-dimensional data problems
  • results: Proposed a method based on generating functions and symplectic autoencoders to learn Hamiltonian systems, and achieved long-term stability and low model complexity of the system.
    Abstract We present a framework for learning Hamiltonian systems using data. This work is based on the lifting hypothesis, which posits that nonlinear Hamiltonian systems can be written as nonlinear systems with cubic Hamiltonians. By leveraging this, we obtain quadratic dynamics that are Hamiltonian in a transformed coordinate system. To that end, for given generalized position and momentum data, we propose a methodology to learn quadratic dynamical systems, enforcing the Hamiltonian structure in combination with a symplectic auto-encoder. The enforced Hamiltonian structure exhibits long-term stability of the system, while the cubic Hamiltonian function provides relatively low model complexity. For low-dimensional data, we determine a higher-order transformed coordinate system, whereas, for high-dimensional data, we find a lower-order coordinate system with the desired properties. We demonstrate the proposed methodology by means of both low-dimensional and high-dimensional nonlinear Hamiltonian systems.
    摘要 我们提出了一种基于数据学习哈密顿系统的框架。这项工作基于升降 гипотезы,即非线性哈密顿系统可以写作非线性系统的立方函数哈密顿。通过这种方式,我们得到了各自协调的quadratic动力学,并且在变换坐标系中强制实施哈密顿结构。为此,我们提议一种基于泛函和自动编码器的方法,用于学习哈密顿系统,并在数据的总体稳定性和立方函数哈密顿函数的模型简单性之间进行权衡。在低维数据时,我们可以找到更高阶的变换坐标系,而在高维数据时,我们可以找到一个较低阶的坐标系,满足需求。我们通过低维和高维非线性哈密顿系统的示例来证明这种方法的有效性。

A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards

  • paper_url: http://arxiv.org/abs/2308.01074
  • repo_url: https://github.com/JBFH-Dev/Keystroke-Datasets
  • paper_authors: Joshua Harrison, Ehsan Toreini, Maryam Mehrnezhad
  • for: 防止键盘攻击(keyboard attacks)
  • methods: 使用深度学习模型和smartphone搭载的 Mikrofon记录键盘输入
  • results: 95%的准确率(最高精度)和93%的准确率(via Zoom视频会议软件)
    Abstract With recent developments in deep learning, the ubiquity of micro-phones and the rise in online services via personal devices, acoustic side channel attacks present a greater threat to keyboards than ever. This paper presents a practical implementation of a state-of-the-art deep learning model in order to classify laptop keystrokes, using a smartphone integrated microphone. When trained on keystrokes recorded by a nearby phone, the classifier achieved an accuracy of 95%, the highest accuracy seen without the use of a language model. When trained on keystrokes recorded using the video-conferencing software Zoom, an accuracy of 93% was achieved, a new best for the medium. Our results prove the practicality of these side channel attacks via off-the-shelf equipment and algorithms. We discuss a series of mitigation methods to protect users against these series of attacks.
    摘要

Automatic Feature Engineering for Time Series Classification: Evaluation and Discussion

  • paper_url: http://arxiv.org/abs/2308.01071
  • repo_url: None
  • paper_authors: Aurélien Renault, Alexis Bondu, Vincent Lemaire, Dominique Gay
  • for: 本研究的目的是评估现有的特征工程工具在时间序列分类问题中的潜在预测性能。
  • methods: 本研究使用了11种特征工程工具,并与9种超参数化分类器结合使用,对112个时间序列数据集进行了10000多个学习实验。
  • results: 结果显示,基于特征的方法与当前状态艺术时间序列分类算法的准确率相当,因此应该在时间序列分类领域进一步考虑。
    Abstract Time Series Classification (TSC) has received much attention in the past two decades and is still a crucial and challenging problem in data science and knowledge engineering. Indeed, along with the increasing availability of time series data, many TSC algorithms have been suggested by the research community in the literature. Besides state-of-the-art methods based on similarity measures, intervals, shapelets, dictionaries, deep learning methods or hybrid ensemble methods, several tools for extracting unsupervised informative summary statistics, aka features, from time series have been designed in the recent years. Originally designed for descriptive analysis and visualization of time series with informative and interpretable features, very few of these feature engineering tools have been benchmarked for TSC problems and compared with state-of-the-art TSC algorithms in terms of predictive performance. In this article, we aim at filling this gap and propose a simple TSC process to evaluate the potential predictive performance of the feature sets obtained with existing feature engineering tools. Thus, we present an empirical study of 11 feature engineering tools branched with 9 supervised classifiers over 112 time series data sets. The analysis of the results of more than 10000 learning experiments indicate that feature-based methods perform as accurately as current state-of-the-art TSC algorithms, and thus should rightfully be considered further in the TSC literature.
    摘要

When Analytic Calculus Cracks AdaBoost Code

  • paper_url: http://arxiv.org/abs/2308.01070
  • repo_url: None
  • paper_authors: Jean-Marc Brossier, Olivier Lafitte, Lenny Réthoré
  • for: 这个论文主要是为了探讨AdaBoost算法的准确性和优化性。
  • methods: 本论文使用了多个弱分类器的组合方法来构建一个更强的分类器。
  • results: 研究发现,AdaBoost算法不是一个真正的优化算法,而是可以通过 truth table 直接计算出最终的分类结果。 compared with scikit-learn中实现的AdaBoost算法,本研究的结果表明了这种方法的准确性和效率。
    Abstract The principle of boosting in supervised learning involves combining multiple weak classifiers to obtain a stronger classifier. AdaBoost has the reputation to be a perfect example of this approach. We have previously shown that AdaBoost is not truly an optimization algorithm. This paper shows that AdaBoost is an algorithm in name only, as the resulting combination of weak classifiers can be explicitly calculated using a truth table. This study is carried out by considering a problem with two classes and is illustrated by the particular case of three binary classifiers and presents results in comparison with those from the implementation of AdaBoost algorithm of the Python library scikit-learn.
    摘要 “boosting”在超级vised学习中的原则是将多个弱分类器组合成一个更强的分类器。阿达Boost是这种方法的典型示例。我们之前已经证明了阿达Boost不是一个优化算法。这篇论文表明,阿达Boost并不是一个真正的算法,因为将弱分类器组合的结果可以由真理表来直接计算。本研究通过考虑两个类别问题,使用三个二进制分类器进行示例,并与scikit-learnPython库中实现的阿达Boost算法相比较。

Graph Anomaly Detection at Group Level: A Topology Pattern Enhanced Unsupervised Approach

  • paper_url: http://arxiv.org/abs/2308.01063
  • repo_url: None
  • paper_authors: Xing Ai, Jialong Zhou, Yulin Zhu, Gaolei Li, Tomasz P. Michalak, Xiapu Luo, Kai Zhou
    for:This paper focuses on the task of Group-level Graph Anomaly Detection (Gr-GAD), which aims to identify and localize anomalous groups within a graph.methods:The proposed framework uses a variant of Graph AutoEncoder (GAE) to locate anchor nodes that belong to potential anomaly groups, and then employs group sampling and Topology Pattern-based Graph Contrastive Learning (TPGCL) to identify and localize anomaly groups.results:The experimental results on both real-world and synthetic datasets demonstrate that the proposed framework shows superior performance in identifying and localizing anomaly groups, highlighting it as a promising solution for Gr-GAD.
    Abstract Graph anomaly detection (GAD) has achieved success and has been widely applied in various domains, such as fraud detection, cybersecurity, finance security, and biochemistry. However, existing graph anomaly detection algorithms focus on distinguishing individual entities (nodes or graphs) and overlook the possibility of anomalous groups within the graph. To address this limitation, this paper introduces a novel unsupervised framework for a new task called Group-level Graph Anomaly Detection (Gr-GAD). The proposed framework first employs a variant of Graph AutoEncoder (GAE) to locate anchor nodes that belong to potential anomaly groups by capturing long-range inconsistencies. Subsequently, group sampling is employed to sample candidate groups, which are then fed into the proposed Topology Pattern-based Graph Contrastive Learning (TPGCL) method. TPGCL utilizes the topology patterns of groups as clues to generate embeddings for each candidate group and thus distinct anomaly groups. The experimental results on both real-world and synthetic datasets demonstrate that the proposed framework shows superior performance in identifying and localizing anomaly groups, highlighting it as a promising solution for Gr-GAD. Datasets and codes of the proposed framework are at the github repository https://anonymous.4open.science/r/Topology-Pattern-Enhanced-Unsupervised-Group-level-Graph-Anomaly-Detection.
    摘要 “图像异常检测(GAD)已经取得成功并广泛应用于不同领域,如诈骗检测、网络安全、金融安全和生物化学。然而,现有的图像异常检测算法偏向异常个体(节点或图),忽略图中异常群体的可能性。为了解决这种限制,本文提出了一种新的无监督框架,称为群体级图像异常检测(Gr-GAD)。提案的框架首先使用变体的图自编码器(GAE)来确定异常群体的担 anchor节点,并capture长距离不一致性。然后,群体采样被使用来采样候选组,并将其feed到提案的图Pattern-based Graph Contrastive Learning(TPGCL)方法。TPGCL利用组 topology patterns作为特征来生成每个候选组的嵌入,从而分辨细节异常群体。实验结果表明,提案的框架在真实世界和 sintetic 数据集上具有优秀的异常组检测和定位能力,这得出了一个有前途的解决方案。数据集和代码可以在 GitHub 仓库 https://anonymous.4open.science/r/Topology-Pattern-Enhanced-Unsupervised-Group-level-Graph-Anomaly-Detection 中找到。”Note: The translation is in Simplified Chinese, which is the most widely used standard for Chinese writing. The translation is based on the official translation of the text into Simplified Chinese, and the word order and grammar may be different from the original text in Traditional Chinese.

Simulation-based inference using surjective sequential neural likelihood estimation

  • paper_url: http://arxiv.org/abs/2308.01054
  • repo_url: https://github.com/dirmeier/ssnl
  • paper_authors: Simon Dirmeier, Carlo Albert, Fernando Perez-Cruz
  • for: 该论文主要用于 simulation-based inference 领域,特别是在模型评估不可靠,只有一个可生成假数据的 simulator 存在的情况下。
  • methods: 该方法使用Surjective Sequential Neural Likelihood(SSNL)来实现 simulation-based inference,包括一个维度减少的射影正常分布模型,用于代替可评估的几何函数。可以使用 Markov chain Monte Carlo 方法或变量插入法进行 Bayesian 推断。
  • results: 作者在多种实验中证明了 SSNL 比现有的likelihood-based方法在高维数据集上表现更好,特别是在一个具有astrophysics 应用的实际例子中,模拟太阳magnetic field strength 的 solar dynamo 模型。
    Abstract We present Surjective Sequential Neural Likelihood (SSNL) estimation, a novel method for simulation-based inference in models where the evaluation of the likelihood function is not tractable and only a simulator that can generate synthetic data is available. SSNL fits a dimensionality-reducing surjective normalizing flow model and uses it as a surrogate likelihood function which allows for conventional Bayesian inference using either Markov chain Monte Carlo methods or variational inference. By embedding the data in a low-dimensional space, SSNL solves several issues previous likelihood-based methods had when applied to high-dimensional data sets that, for instance, contain non-informative data dimensions or lie along a lower-dimensional manifold. We evaluate SSNL on a wide variety of experiments and show that it generally outperforms contemporary methods used in simulation-based inference, for instance, on a challenging real-world example from astrophysics which models the magnetic field strength of the sun using a solar dynamo model.
    摘要 我们介绍了一种新的Sequential Neural Likelihood(SSNL)估计方法,用于基于模拟的推断,其中只有一个可以生成假数据的 simulate器,但是evaluate the likelihood function的计算不可 tractable。SSNL采用了一种维度减少的射影正则化模型,并使其作为媒介假概率函数,以便使用Conventional Bayesian inference方法,如Markov chain Monte Carlo方法或variational inference。通过嵌入数据到低维度空间中,SSNL解决了之前基于概率函数的方法在高维度数据集中遇到的许多问题,例如非指导性数据维度或在低维度抽象上的扩展。我们在多种实验中评估了SSNL,并发现它通常比当前的基于模拟的推断方法高效,例如在一个实际的astrophysics例子中,模拟太阳magnetic field strength的solar dynamo模型。

A Counterfactual Safety Margin Perspective on the Scoring of Autonomous Vehicles’ Riskiness

  • paper_url: http://arxiv.org/abs/2308.01050
  • repo_url: None
  • paper_authors: Alessandro Zanardi, Andrea Censi, Margherita Atzei, Luigi Di Lillo, Emilio Frazzoli
  • For: This paper aims to provide a data-driven framework for comparing the risk of different autonomous vehicles (AVs) in various operational design domains (ODDs).* Methods: The paper uses counterfactual simulations of “misbehaving” road users to assess the risk of AVs. The concept of counterfactual safety margin is introduced, which represents the minimum deviation from normal behavior that could lead to a collision. The methodology is applicable even when the AV’s behavioral policy is unknown.* Results: The experimental results demonstrate the correlation between the safety margin, the driving policy quality, and the ODD, shedding light on the relative risk associated with different AV providers. The work contributes to AV safety assessment and addresses legislative and insurance concerns surrounding this emerging technology.Here is the same information in Simplified Chinese text:* For: 这篇论文旨在提供一种数据驱动的自动驾驶车(AV)在各种操作设计域(ODD)中的风险比较框架。* Methods: 论文使用“不良”道路用户的 counterfactual 模拟来评估 AV 的风险。该概念表示最小偏离正常行为的行为可能导致事故的最小差异。这种方法可以在 AV 行为策略不明确时进行应用。* Results: 实验结果显示了安全准备度、驾驶策略质量和 ODD 之间的相关性,揭示了不同 AV 提供商的相对风险水平。这项工作对自动驾驶车安全评估做出了贡献,并解决了立法和保险方面对这种新技术的关注。
    Abstract Autonomous Vehicles (AVs) have the potential to provide numerous societal benefits, such as decreased road accidents and increased overall transportation efficiency. However, quantifying the risk associated with AVs is challenging due to the lack of historical data and the rapidly evolving technology. This paper presents a data-driven framework for comparing the risk of different AVs' behaviors in various operational design domains (ODDs), based on counterfactual simulations of "misbehaving" road users. We introduce the concept of counterfactual safety margin, which represents the minimum deviation from normal behavior that could lead to a collision. This concept helps to find the most critical scenarios but also to assess the frequency and severity of risk of AVs. We show that the proposed methodology is applicable even when the AV's behavioral policy is unknown -- through worst- and best-case analyses -- making the method useful also to external third-party risk assessors. Our experimental results demonstrate the correlation between the safety margin, the driving policy quality, and the ODD shedding light on the relative risk associated with different AV providers. This work contributes to AV safety assessment and aids in addressing legislative and insurance concerns surrounding this emerging technology.
    摘要 Note: The translation is in Simplified Chinese, which is the standard writing system used in mainland China. The translation is based on the original text and may not capture all the nuances of the original language.

Are Easy Data Easy (for K-Means)

  • paper_url: http://arxiv.org/abs/2308.01926
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Mieczysław A. Kłopotek
  • for: 这个论文 investigate $k$-means 算法是否可以正确地恢复良好分割的群集。
  • methods: 本论文使用了直接从通用定义cluster中得到的分割性定义,并 derivated conditions for a special case of well-separated clusters such that the global minimum of $k$-means cost function coincides with the well-separatedness。
  • results: 实验表明,$k$-means 算法不能correctly recover well-separated clusters。一种新的算法是 $k$-means++ via repeated {sub}sampling when choosing a seed,该算法在这个任务上表现更好。
    Abstract This paper investigates the capability of correctly recovering well-separated clusters by various brands of the $k$-means algorithm. The concept of well-separatedness used here is derived directly from the common definition of clusters, which imposes an interplay between the requirements of within-cluster-homogenicity and between-clusters-diversity. Conditions are derived for a special case of well-separated clusters such that the global minimum of $k$-means cost function coincides with the well-separatedness. An experimental investigation is performed to find out whether or no various brands of $k$-means are actually capable of discovering well separated clusters. It turns out that they are not. A new algorithm is proposed that is a variation of $k$-means++ via repeated {sub}sampling when choosing a seed. The new algorithm outperforms four other algorithms from $k$-means family on the task.
    摘要

Evaluation of network-guided random forest for disease gene discovery

  • paper_url: http://arxiv.org/abs/2308.01323
  • repo_url: None
  • paper_authors: Jianchang Hu, Silke Szymczak
  • for: 本研究旨在探讨Random Forest(RF)算法在基因表达数据分析中是否可以利用基因网络信息提高疾病预测性能。
  • methods: 研究者使用了一种基于网络信息的采样概率方法,将网络信息纳入RF构建中。
  • results: 研究结果表明,基于网络信息的RF不能提高疾病预测性能,但是在疾病基因发现方面,如果疾病基因组成模块, THEN 基于网络信息的RF可以更准确地预测疾病基因。此外,当疾病状况与基因在给定网络中无关时,使用网络信息时可能会产生干扰选择结果,特别是对于Hub基因。
    Abstract Gene network information is believed to be beneficial for disease module and pathway identification, but has not been explicitly utilized in the standard random forest (RF) algorithm for gene expression data analysis. We investigate the performance of a network-guided RF where the network information is summarized into a sampling probability of predictor variables which is further used in the construction of the RF. Our results suggest that network-guided RF does not provide better disease prediction than the standard RF. In terms of disease gene discovery, if disease genes form module(s), network-guided RF identifies them more accurately. In addition, when disease status is independent from genes in the given network, spurious gene selection results can occur when using network information, especially on hub genes. Our empirical analysis on two balanced microarray and RNA-Seq breast cancer datasets from The Cancer Genome Atlas (TCGA) for classification of progesterone receptor (PR) status also demonstrates that network-guided RF can identify genes from PGR-related pathways, which leads to a better connected module of identified genes.
    摘要 生物网络信息被认为对疾病模块和代谢通路的识别有利,但在标准随机森林(RF)算法中没有直接使用生物网络信息进行基因表达数据分析。我们研究了基于网络信息的随机森林(Network-guided RF),其中网络信息被概括为预测变量的抽样概率,并在随机森林的构建中使用。我们的结果表明,与标准RF相比,网络指导RF不提供更好的疾病预测。在疾病基因发现方面,如果疾病基因组成模块, тогда网络指导RF能够更准确地识别这些模块。此外,当疾病状况与生物网络中的基因独立时,使用网络信息可能会导致假阳性基因选择结果,特别是对于中心基因。我们对TCGA breast cancer数据集进行了empirical分析,并证明了网络指导RF可以识别PGR相关的基因路径,从而得到更好地连接的模块。

Computing the Distance between unbalanced Distributions – The flat Metric

  • paper_url: http://arxiv.org/abs/2308.01039
  • repo_url: https://github.com/hs42/flat_metric
  • paper_authors: Henri Schmidt, Christian Düll
  • for: Computes the flat metric in any dimension for unbalanced optimal transport tasks and data analysis.
  • methods: Uses a neural network to determine an optimal test function for computing the distance between two given measures.
  • results: Achieves comparability of pairwise computed distances from independently trained networks and shows high quality output in experiments and simulations.
    Abstract We provide an implementation to compute the flat metric in any dimension. The flat metric, also called dual bounded Lipschitz distance, generalizes the well-known Wasserstein distance W1 to the case that the distributions are of unequal total mass. This is of particular interest for unbalanced optimal transport tasks and for the analysis of data distributions where the sample size is important or normalization is not possible. The core of the method is based on a neural network to determine on optimal test function realizing the distance between two given measures. Special focus was put on achieving comparability of pairwise computed distances from independently trained networks. We tested the quality of the output in several experiments where ground truth was available as well as with simulated data.
    摘要 我们提供了一个实现方式来计算任意维度的扁平度量。扁平度量,也称为双对称LIPschitz距离,将 Wasserstein距离W1扩展到分布是不均匀的情况下。这对不均匀优化运输和资料分布分析中具有特别的 interess。我们的方法靠在一个神经网络来决定两个给出的度量之间的距离。我们特别强调了独立训练的神经网络之间的比较可靠性。我们在一些实验中使用了实际的测试数据和伪实验数据进行测试。Note that Simplified Chinese is a romanization of Chinese, and the actual Chinese characters may be different.

Three Factors to Improve Out-of-Distribution Detection

  • paper_url: http://arxiv.org/abs/2308.01030
  • repo_url: None
  • paper_authors: Hyunjun Choi, JaeHo Chung, Hawook Jeong, Jin Young Choi
  • for: 提高Out-of-distribution(OOD)检测和分类精度之间的质量负担。
  • methods: 使用辅助数据作为异常数据进行细化,并具有自知识整合、半硬样本选择和新的监督对比学习等三大贡献。
  • results: 三大贡献的结合,同时提高了OOD检测性能和分类精度,并且与之前的方法相比,提高了OOD检测性能和分类精度。
    Abstract In the problem of out-of-distribution (OOD) detection, the usage of auxiliary data as outlier data for fine-tuning has demonstrated encouraging performance. However, previous methods have suffered from a trade-off between classification accuracy (ACC) and OOD detection performance (AUROC, FPR, AUPR). To improve this trade-off, we make three contributions: (i) Incorporating a self-knowledge distillation loss can enhance the accuracy of the network; (ii) Sampling semi-hard outlier data for training can improve OOD detection performance with minimal impact on accuracy; (iii) The introduction of our novel supervised contrastive learning can simultaneously improve OOD detection performance and the accuracy of the network. By incorporating all three factors, our approach enhances both accuracy and OOD detection performance by addressing the trade-off between classification and OOD detection. Our method achieves improvements over previous approaches in both performance metrics.
    摘要 在 OUT-OF-DISTRIBUTION(OOD)探测问题中,使用辅助数据作为精度数据进行练习显示了激励人的性能。然而,先前的方法受到了准确率(ACC)和OOD探测性能(AUROC、FPR、AUPR)的负面影响。为了改进这种负面影响,我们提出了三个贡献:(i) incorporating self-knowledge distillation loss可以提高网络的准确率;(ii) 使用 semi-hard outlier 数据进行训练可以提高 OOD 探测性能,而不会影响准确率;(iii) 我们提出的新的超级vised contrastive learning可以同时提高 OOD 探测性能和网络的准确率。通过结合这三个因素,我们的方法可以同时改进准确率和 OOD 探测性能,解决了准确率和 OOD 探测性能之间的负面影响。我们的方法在两个性能指标上都取得了改进。

Maximizing Success Rate of Payment Routing using Non-stationary Bandits

  • paper_url: http://arxiv.org/abs/2308.01028
  • repo_url: None
  • paper_authors: Aayush Chaudhary, Abhinav Rai, Abhishek Gupta
  • for: 该论文是为了设计和部署非站立式多臂瑞伯投注策略来确定近似优化的支付路由策略,以优化支付系统性能和安全性。
  • methods: 该论文使用了一种新型的射线基实现来优化非站立式多臂瑞伯投注策略,以实现支付系统的扩展和缩放。具体来说,该论文使用了一种基于射线的 Routing Service 架构,并对多种非站立式多臂瑞伯投注策略进行了评估和比较。
  • results: live 实验结果显示,非站立式多臂瑞伯投注策略可以在一个月内提高支付交易的成功率,相比传统规则基于的方法,提高了0.92%。
    Abstract This paper discusses the system architecture design and deployment of non-stationary multi-armed bandit approaches to determine a near-optimal payment routing policy based on the recent history of transactions. We propose a Routing Service architecture using a novel Ray-based implementation for optimally scaling bandit-based payment routing to over 10000 transactions per second, adhering to the system design requirements and ecosystem constraints with Payment Card Industry Data Security Standard (PCI DSS). We first evaluate the effectiveness of multiple bandit-based payment routing algorithms on a custom simulator to benchmark multiple non-stationary bandit approaches and identify the best hyperparameters. We then conducted live experiments on the payment transaction system on a fantasy sports platform Dream11. In the live experiments, we demonstrated that our non-stationary bandit-based algorithm consistently improves the success rate of transactions by 0.92\% compared to the traditional rule-based methods over one month.
    摘要 First, we evaluated the effectiveness of multiple bandit-based payment routing algorithms on a custom simulator to benchmark different non-stationary bandit approaches and identify the best hyperparameters. We then conducted live experiments on a real-world payment transaction system on a fantasy sports platform Dream11, demonstrating that our non-stationary bandit-based algorithm consistently improves the success rate of transactions by 0.92% compared to traditional rule-based methods over a one-month period.

Enhancing Representation Learning for Periodic Time Series with Floss: A Frequency Domain Regularization Approach

  • paper_url: http://arxiv.org/abs/2308.01011
  • repo_url: https://github.com/agustdd/floss
  • paper_authors: Chunwei Yang, Xiaoxu Chen, Lijun Sun, Hongyu Yang, Yuankai Wu
  • for: 这篇论文旨在提出一种无监督的方法,以帮助深度学习模型更好地处理具有周期性或假周期性特征的时间序列数据。
  • methods: 这篇论文提出了一种名为Floss的方法,它可以自动在频域中调整学习的表现。Floss方法首先自动检测时间序列中的主要周期性,然后使用周期性移动和频谱浓度相似度度量来学习有意义的表现。
  • results: 在实验中,Floss方法能够自动发现时间序列中的周期性,并且与其他深度学习模型相比,提高了时间序列分类、预测和偏常检测等任务的表现。
    Abstract Time series analysis is a fundamental task in various application domains, and deep learning approaches have demonstrated remarkable performance in this area. However, many real-world time series data exhibit significant periodic or quasi-periodic dynamics that are often not adequately captured by existing deep learning-based solutions. This results in an incomplete representation of the underlying dynamic behaviors of interest. To address this gap, we propose an unsupervised method called Floss that automatically regularizes learned representations in the frequency domain. The Floss method first automatically detects major periodicities from the time series. It then employs periodic shift and spectral density similarity measures to learn meaningful representations with periodic consistency. In addition, Floss can be easily incorporated into both supervised, semi-supervised, and unsupervised learning frameworks. We conduct extensive experiments on common time series classification, forecasting, and anomaly detection tasks to demonstrate the effectiveness of Floss. We incorporate Floss into several representative deep learning solutions to justify our design choices and demonstrate that it is capable of automatically discovering periodic dynamics and improving state-of-the-art deep learning models.
    摘要 时序分析是多种应用领域的基础任务,深度学习方法在这个领域表现了惊人的表现。然而,许多实际世界时序数据具有重要的周期或准周期动态特征,这些特征经常不被现有的深度学习基础方法完全捕捉。这导致了时序动态行为的下面表示不够完整。为解决这个差距,我们提出了一种无监督的方法 called Floss,该方法可以自动在频率域中规范学习的表示。Floss方法首先自动检测时序数据中的主要周期性。然后,它使用周期偏移和频率分布相似度度量来学习具有周期一致性的有意义表示。此外,Floss可以轻松地integrated到supervised、semi-supervised和无监督学习框架中。我们在常见的时序分类、预测和异常检测任务中进行了广泛的实验,以证明Floss的有效性。我们将Floss incorporated into 多种代表性的深度学习解决方案,以证明我们的设计选择是合理的,并证明Floss可以自动发现周期动态和提高当前最佳深度学习模型。

MDT3D: Multi-Dataset Training for LiDAR 3D Object Detection Generalization

  • paper_url: http://arxiv.org/abs/2308.01000
  • repo_url: None
  • paper_authors: Louis Soum-Fontez, Jean-Emmanuel Deschaud, François Goulette
    for:这个论文的目标是提高3D物体检测模型在新环境和不同感知器配置下的稳定性。methods:该方法使用多个注释源数据集进行共同训练,并使用新的标签映射技术来填充标签空白。此外,该方法还提出了一种混合数据集的训练方法和跨数据集对象插入 augmetnation 技术。results:研究表明,该方法可以提高不同类型的3D物体检测模型在新环境下的性能。 Code and additional results will be publicly available on GitHub for further reference.
    Abstract Supervised 3D Object Detection models have been displaying increasingly better performance in single-domain cases where the training data comes from the same environment and sensor as the testing data. However, in real-world scenarios data from the target domain may not be available for finetuning or for domain adaptation methods. Indeed, 3D object detection models trained on a source dataset with a specific point distribution have shown difficulties in generalizing to unseen datasets. Therefore, we decided to leverage the information available from several annotated source datasets with our Multi-Dataset Training for 3D Object Detection (MDT3D) method to increase the robustness of 3D object detection models when tested in a new environment with a different sensor configuration. To tackle the labelling gap between datasets, we used a new label mapping based on coarse labels. Furthermore, we show how we managed the mix of datasets during training and finally introduce a new cross-dataset augmentation method: cross-dataset object injection. We demonstrate that this training paradigm shows improvements for different types of 3D object detection models. The source code and additional results for this research project will be publicly available on GitHub for interested parties to access and utilize: https://github.com/LouisSF/MDT3D
    摘要 受监督3D物体检测模型在单个频道情况下的性能有所提高,但在实际应用场景中,测试数据的频道可能不同于训练数据的频道。实际上,通过特定点分布训练的3D物体检测模型在未看过的数据集上generalization能力很差。因此,我们使用多个注释源数据集的多数据集训练方法(MDT3D)来增强3D物体检测模型在新环境中的Robustness。为了解决不同数据集之间的标签差距,我们使用了新的标签映射基于粗略标签。此外,我们详细介绍了在训练过程中如何处理多个数据集的混合,以及一种新的跨数据集增强方法:跨数据集物体注入。我们展示了这种训练方法在不同类型的3D物体检测模型中的改进。许多相关结果和代码将在GitHub上公开,以便有兴趣的人可以访问和利用:https://github.com/LouisSF/MDT3D。

Exploiting Synthetic Data for Data Imbalance Problems: Baselines from a Data Perspective

  • paper_url: http://arxiv.org/abs/2308.00994
  • repo_url: None
  • paper_authors: Moon Ye-Bin, Nam Hyeon-Woo, Wonseok Choi, Nayeong Kim, Suha Kwak, Tae-Hyun Oh
  • for: Addressing data imbalance problems in deep neural networks to prevent biased predictions and potential ethical and social consequences.
  • methods: Utilizes synthetic data as a preliminary step before employing task-specific algorithms to address data imbalance problems.
  • results: Surpasses the performance of existing task-specific methods on several datasets, including CIFAR100-LT, ImageNet100-LT, UTKFace, and Waterbird.
    Abstract We live in a vast ocean of data, and deep neural networks are no exception to this. However, this data exhibits an inherent phenomenon of imbalance. This imbalance poses a risk of deep neural networks producing biased predictions, leading to potentially severe ethical and social consequences. To address these challenges, we believe that the use of generative models is a promising approach for comprehending tasks, given the remarkable advancements demonstrated by recent diffusion models in generating high-quality images. In this work, we propose a simple yet effective baseline, SYNAuG, that utilizes synthetic data as a preliminary step before employing task-specific algorithms to address data imbalance problems. This straightforward approach yields impressive performance on datasets such as CIFAR100-LT, ImageNet100-LT, UTKFace, and Waterbird, surpassing the performance of existing task-specific methods. While we do not claim that our approach serves as a complete solution to the problem of data imbalance, we argue that supplementing the existing data with synthetic data proves to be an effective and crucial preliminary step in addressing data imbalance concerns.
    摘要 我们生活在一个庞大的数据海洋中,而深度神经网络也不例外。然而,这些数据具有内生的不均衡现象,这可能导致深度神经网络预测结果受到偏见,从而导致严重的伦理和社会后果。为了解决这些挑战,我们认为使用生成模型是一个有前途的方法,因为最近的扩散模型在生成高质量图像方面已经展现出了卓越的成果。在这个工作中,我们提出了一个简单 yet有效的基eline,即SYNAuG,它利用生成的数据作为先决步骤,然后使用任务特定的算法来解决数据不均衡问题。这种简单的方法在CIFAR100-LT、ImageNet100-LT、UTKFace和Waterbird等数据集上达到了比较出色的性能,超过了现有的任务特定方法的性能。虽然我们不assert我们的方法是数据不均衡问题的完整解决方案,但我们 argue that在使用现有数据之前,通过生成数据来增加数据量是一个有效和关键的预liminary步骤。

Wasserstein Diversity-Enriched Regularizer for Hierarchical Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.00989
  • repo_url: None
  • paper_authors: Haorui Li, Jiaqi Liang, Linjing Li, Daniel Zeng
  • for: 这个论文旨在解决复杂任务时 Composite reinforcement learning 中的强化学习问题。
  • methods: 论文提出了一种名为 Wasserstein Diversity-Enriched Regularizer (WDER) 的新任务不受限制的正则化方法,可以轻松地与现有方法结合使用,以提高性能。
  • results: 实验结果表明,我们的 WDER 可以提高性能和样本效率,与优化参数无关,这表明了我们的方法的可应用性和稳定性。
    Abstract Hierarchical reinforcement learning composites subpolicies in different hierarchies to accomplish complex tasks.Automated subpolicies discovery, which does not depend on domain knowledge, is a promising approach to generating subpolicies.However, the degradation problem is a challenge that existing methods can hardly deal with due to the lack of consideration of diversity or the employment of weak regularizers. In this paper, we propose a novel task-agnostic regularizer called the Wasserstein Diversity-Enriched Regularizer (WDER), which enlarges the diversity of subpolicies by maximizing the Wasserstein distances among action distributions. The proposed WDER can be easily incorporated into the loss function of existing methods to boost their performance further.Experimental results demonstrate that our WDER improves performance and sample efficiency in comparison with prior work without modifying hyperparameters, which indicates the applicability and robustness of the WDER.
    摘要 In this paper, we propose a new task-agnostic regularizer called the Wasserstein Diversity-Enriched Regularizer (WDER), which increases the diversity of subpolicies by maximizing the Wasserstein distances among action distributions. The WDER can be easily incorporated into the loss function of existing methods to improve their performance further.Experimental results show that our WDER improves performance and sample efficiency compared to prior work, without modifying hyperparameters. This indicates the applicability and robustness of the WDER.

Learning Regionalization within a Differentiable High-Resolution Hydrological Model using Accurate Spatial Cost Gradients

  • paper_url: http://arxiv.org/abs/2308.02040
  • repo_url: None
  • paper_authors: Ngo Nghi Truyen Huynh, Pierre-André Garambois, François Colleoni, Benjamin Renard, Hélène Roux, Julie Demargne, Pierre Javelle
  • for: 这个论文主要是用来解决无数据的流域中 hydrological parameter 的估计问题,并在各个流域中寻找一个转换函数来将物理描述符与概念模型参数进行量化关系。
  • methods: 这篇论文提出了一种 Hybrid Data Assimilation and Parameter Regionalization (HDA-PR) 方法,它将 learnable regionalization mappings integrate 到了一个可微的ydrological model 中,以便在各个流域中使用不同类型的数据进行数据协同填充。
  • results: 在南法两个暴雨灾害地区进行了高分辨率、小时间和千米级别的地理模型运算,并得到了很好的 regionalization 性能,Nash-Sutcliffe 效率 (NSE) 分布在0.52-0.78之间,相比基线模型 calibrated WITH lumped parameters 提高了0.57的 NSE 分布。
    Abstract Estimating spatially distributed hydrological parameters in ungauged catchments poses a challenging regionalization problem and requires imposing spatial constraints given the sparsity of discharge data. A possible approach is to search for a transfer function that quantitatively relates physical descriptors to conceptual model parameters. This paper introduces a Hybrid Data Assimilation and Parameter Regionalization (HDA-PR) approach incorporating learnable regionalization mappings, based on either multivariate regressions or neural networks, into a differentiable hydrological model. It enables the exploitation of heterogeneous datasets across extensive spatio-temporal computational domains within a high-dimensional regionalization context, using accurate adjoint-based gradients. The inverse problem is tackled with a multi-gauge calibration cost function accounting for information from multiple observation sites. HDA-PR was tested on high-resolution, hourly and kilometric regional modeling of two flash-flood-prone areas located in the South of France. In both study areas, the median Nash-Sutcliffe efficiency (NSE) scores ranged from 0.52 to 0.78 at pseudo-ungauged sites over calibration and validation periods. These results highlight a strong regionalization performance of HDA-PR, improving NSE by up to 0.57 compared to the baseline model calibrated with lumped parameters, and achieving a performance comparable to the reference solution obtained with local uniform calibration (median NSE from 0.59 to 0.79). Multiple evaluation metrics based on flood-oriented hydrological signatures are also employed to assess the accuracy and robustness of the approach. The regionalization method is amenable to state-parameter correction from multi-source data over a range of time scales needed for operational data assimilation, and it is adaptable to other differentiable geophysical models.
    摘要 估计分布式ydrological参数在无测站catchments中存在一个挑战性的区域化问题,需要在缺乏流量数据的情况下强制 spatial constraints。一种可能的方法是寻找一个转移函数,该函数可以量化物理描述符和概念模型参数之间的关系。这篇文章介绍了一种Hybrid Data Assimilation and Parameter Regionalization(HDA-PR)方法,该方法通过将多变量回归或神经网络作为学习可 Regionalization mappings incorporated into a differentiable hydrological model。这种方法可以在广泛的空间-时间计算Domain中利用高精度的后向梯度,并在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration。在多个观测站信息的基础上进行多观测站calibration

Certified Multi-Fidelity Zeroth-Order Optimization

  • paper_url: http://arxiv.org/abs/2308.00978
  • repo_url: None
  • paper_authors: Étienne de Montbrun, Sébastien Gerchinovitz
  • for: 本文研究了多级别预测优化问题,特别是如何使用不同级别的预测方法来优化一个函数 $f$,以优化函数 $f$ 的评估成本。
  • methods: 本文提出了一种证明过的算法,称为证明加法多级别预测优化算法(MFDOO)。该算法在评估环境中进行了一系列的游戏性评估,以确定最佳的预测级别和评估方法。
  • results: 本文提出了一种基于证明的多级别预测优化算法,并提供了一个基于 Lipschitz 函数 $f$ 的成本复杂度上下文。此外,文章还证明了一个 $f$-dependent 下界,表明该算法在任何 Lipschitz 函数 $f$ 下都具有近似优化成本复杂度。最后,文章还Addresses 了随机评估的特殊情况作为直接例子。
    Abstract We consider the problem of multi-fidelity zeroth-order optimization, where one can evaluate a function $f$ at various approximation levels (of varying costs), and the goal is to optimize $f$ with the cheapest evaluations possible. In this paper, we study \emph{certified} algorithms, which are additionally required to output a data-driven upper bound on the optimization error. We first formalize the problem in terms of a min-max game between an algorithm and an evaluation environment. We then propose a certified variant of the MFDOO algorithm and derive a bound on its cost complexity for any Lipschitz function $f$. We also prove an $f$-dependent lower bound showing that this algorithm has a near-optimal cost complexity. We close the paper by addressing the special case of noisy (stochastic) evaluations as a direct example.
    摘要 我们考虑多项误差零项优化问题,其中可以评估函数 $f$ 在不同的推导水平(具有不同的成本)上,并且目标是将 $f$ 优化到最低成本下。在这篇文章中,我们研究了认证算法,这些算法还需要输出一个基于数据的Upper bound 估计错误。我们首先将问题正式化为一个测验环境和算法之间的最小最大游戏。然后,我们提出了认证版本的 MFDOO 算法,并且derive了这个算法的成本复杂度的上限,这上限是适用于任何 Lipschitz 函数 $f$。我们还证明了 $f$ 相依的下界,证明这个算法在任何情况下都具有近乎最佳的成本复杂度。最后,我们处理了随机(测量)评估的特例,作为直接的例子。

A new approach for evaluating internal cluster validation indices

  • paper_url: http://arxiv.org/abs/2308.03894
  • repo_url: None
  • paper_authors: Zoltán Botta-Dukát
  • for: 本研究旨在透过内部验证指标选择最佳表现的算法和参数设置,而不使用任何外部信息。
  • methods: 本研究提出了多种内部验证指标,并评估了它们在不同数据集上的表现。
  • results: 本研究提出了一种新的验证方法,并评估了其优劣点。
    Abstract A vast number of different methods are available for unsupervised classification. Since no algorithm and parameter setting performs best in all types of data, there is a need for cluster validation to select the actually best-performing algorithm. Several indices were proposed for this purpose without using any additional (external) information. These internal validation indices can be evaluated by applying them to classifications of datasets with a known cluster structure. Evaluation approaches differ in how they use the information on the ground-truth classification. This paper reviews these approaches, considering their advantages and disadvantages, and then suggests a new approach.
    摘要 “有很多不同的方法可以用于无监督分类。由于不同的算法和参数设置不一定适合所有类型的数据,因此需要使用集群验证来选择最佳performing的算法。多种内部验证指标已经被提议用于此目的,但这些指标不使用任何外部信息。这些验证方法可以通过应用于知道的分类结构的数据来评估。本文将评论这些方法,包括其优点和缺点,然后提出一种新的方法。”Note: Please keep in mind that the translation is in Simplified Chinese, which is used in mainland China and Singapore, while Traditional Chinese is used in Taiwan, Hong Kong, and other countries.

Effects of Daily News Sentiment on Stock Price Forecasting

  • paper_url: http://arxiv.org/abs/2308.08549
  • repo_url: None
  • paper_authors: S. Srinivas, R. Gadela, R. Sabu, A. Das, G. Nath, V. Datla
  • for: This paper aims to improve the accuracy of stock price forecasts by incorporating investor sentiment from news articles into the prediction model.
  • methods: The authors use a robust data collection and preprocessing framework to create a news database and time series data for NITY50 stocks. They use sentiment libraries to calculate sentiment scores from different sections of the articles and fit LSTM models to forecast stock prices, both with and without sentiment features.
  • results: The authors compare the performance of the LSTM models with and without sentiment features and find that incorporating sentiment scores improves the accuracy of stock price forecasts.
    Abstract Predicting future prices of a stock is an arduous task to perform. However, incorporating additional elements can significantly improve our predictions, rather than relying solely on a stock's historical price data to forecast its future price. Studies have demonstrated that investor sentiment, which is impacted by daily news about the company, can have a significant impact on stock price swings. There are numerous sources from which we can get this information, but they are cluttered with a lot of noise, making it difficult to accurately extract the sentiments from them. Hence the focus of our research is to design an efficient system to capture the sentiments from the news about the NITY50 stocks and investigate how much the financial news sentiment of these stocks are affecting their prices over a period of time. This paper presents a robust data collection and preprocessing framework to create a news database for a timeline of around 3.7 years, consisting of almost half a million news articles. We also capture the stock price information for this timeline and create multiple time series data, that include the sentiment scores from various sections of the article, calculated using different sentiment libraries. Based on this, we fit several LSTM models to forecast the stock prices, with and without using the sentiment scores as features and compare their performances.
    摘要 We present a robust data collection and preprocessing framework to create a news database spanning 3.7 years, consisting of nearly half a million news articles. We also collect stock price information for this timeline and create multiple time series data, including sentiment scores from various sections of the article calculated using different sentiment libraries. We then fit several LSTM models to forecast stock prices, with and without using sentiment scores as features, and compare their performances.

Integrating Homomorphic Encryption and Trusted Execution Technology for Autonomous and Confidential Model Refining in Cloud

  • paper_url: http://arxiv.org/abs/2308.00963
  • repo_url: None
  • paper_authors: Pinglan Liu, Wensheng Zhang
  • for: 本研究旨在设计一种在云端实现自动化和保密的模型优化方案,以满足长期连续进行机器学习的需求和数据和模型的隐私保护。
  • methods: 本研究使用了同时保证机密性和可信度的同时保证机密性和可信度的同时使用了Homomorphic加密和可信任执行环境技术,并通过实现和实验证明了该方案的可行性。
  • results: 实验结果表明,通过使用我们提议的方案,云服务器可以自动地对加密模型进行修改,以提高其精度。虽然效率仍然远低于基准方案,但我们预期通过更好地利用高级并行和云服务器GPU的计算能力,可以进一步提高效率。
    Abstract With the popularity of cloud computing and machine learning, it has been a trend to outsource machine learning processes (including model training and model-based inference) to cloud. By the outsourcing, other than utilizing the extensive and scalable resource offered by the cloud service provider, it will also be attractive to users if the cloud servers can manage the machine learning processes autonomously on behalf of the users. Such a feature will be especially salient when the machine learning is expected to be a long-term continuous process and the users are not always available to participate. Due to security and privacy concerns, it is also desired that the autonomous learning preserves the confidentiality of users' data and models involved. Hence, in this paper, we aim to design a scheme that enables autonomous and confidential model refining in cloud. Homomorphic encryption and trusted execution environment technology can protect confidentiality for autonomous computation, but each of them has their limitations respectively and they are complementary to each other. Therefore, we further propose to integrate these two techniques in the design of the model refining scheme. Through implementation and experiments, we evaluate the feasibility of our proposed scheme. The results indicate that, with our proposed scheme the cloud server can autonomously refine an encrypted model with newly provided encrypted training data to continuously improve its accuracy. Though the efficiency is still significantly lower than the baseline scheme that refines plaintext-model with plaintext-data, we expect that it can be improved by fully utilizing the higher level of parallelism and the computational power of GPU at the cloud server.
    摘要 With the rise of cloud computing and machine learning, it has become a trend to outsource machine learning processes (including model training and model-based inference) to the cloud. By outsourcing, users can not only take advantage of the extensive and scalable resources offered by cloud service providers but also enjoy the convenience of having the cloud servers manage the machine learning processes autonomously on their behalf. This feature is particularly desirable when machine learning is expected to be a long-term continuous process and users are not always available to participate. However, due to security and privacy concerns, it is essential that the autonomous learning preserves the confidentiality of users' data and models involved. Therefore, in this paper, we aim to design a scheme that enables autonomous and confidential model refining in the cloud.Homomorphic encryption and trusted execution environment technology can protect confidentiality for autonomous computation, but each of them has its limitations, respectively. Therefore, we propose to integrate these two techniques in the design of the model refining scheme. Through implementation and experiments, we evaluate the feasibility of our proposed scheme. The results show that the cloud server can autonomously refine an encrypted model with newly provided encrypted training data to continuously improve its accuracy. Although the efficiency is still significantly lower than the baseline scheme that refines plaintext-model with plaintext-data, we expect that it can be improved by fully utilizing the higher level of parallelism and the computational power of GPU at the cloud server.

Causal Inference with Differentially Private (Clustered) Outcomes

  • paper_url: http://arxiv.org/abs/2308.00957
  • repo_url: None
  • paper_authors: Adel Javanmard, Vahab Mirrokni, Jean Pouget-Abadie
  • For: The paper aims to provide a new differential privacy mechanism, “Cluster-DP”, to improve the estimation of causal effects from randomized experiments while maintaining strong privacy guarantees.* Methods: The paper proposes a clustering-based differential privacy mechanism that leverages the cluster structure of the data to reduce the variance loss and maintain privacy guarantees.* Results: The paper shows that the proposed “Cluster-DP” algorithm can improve the variance loss compared to its unclustered version and a more extreme uniform-prior version, while maintaining the same privacy guarantees.
    Abstract Estimating causal effects from randomized experiments is only feasible if participants agree to reveal their potentially sensitive responses. Of the many ways of ensuring privacy, label differential privacy is a widely used measure of an algorithm's privacy guarantee, which might encourage participants to share responses without running the risk of de-anonymization. Many differentially private mechanisms inject noise into the original data-set to achieve this privacy guarantee, which increases the variance of most statistical estimators and makes the precise measurement of causal effects difficult: there exists a fundamental privacy-variance trade-off to performing causal analyses from differentially private data. With the aim of achieving lower variance for stronger privacy guarantees, we suggest a new differential privacy mechanism, "Cluster-DP", which leverages any given cluster structure of the data while still allowing for the estimation of causal effects. We show that, depending on an intuitive measure of cluster quality, we can improve the variance loss while maintaining our privacy guarantees. We compare its performance, theoretically and empirically, to that of its unclustered version and a more extreme uniform-prior version which does not use any of the original response distribution, both of which are special cases of the "Cluster-DP" algorithm.
    摘要 估计 causal effect from randomized experiments 只能成功实现 participants 同意披露 potentially sensitive 回答。保护隐私的多种方法中,标签分布隐私是一种广泛使用的隐私保证度量,可能会鼓励 participants 分享回答不会风险 de-anonymization。许多异常隐私机制会在原始数据集中插入噪声来实现这一隐私保证度量,这会增加统计估计器的方差,从而使得确定 causal effect 变得更加困难:存在一个基本隐私-准备曲线质量负担。为了实现更低的准备质量和更强的隐私保证度量,我们建议一种新的隐私机制,即 "Cluster-DP",它利用给定的数据集中的群集结构,同时仍然允许确定 causal effect。我们表明,根据某种直观的群集质量度量,我们可以提高准备质量的变化,而不会随着隐私保证度量的增加。我们对其性能进行了理论和实验性比较,与其不使用原始回答分布的特殊情况(即 "Cluster-DP" 算法的不分支情况)和更激进的均匀先验情况(即不使用原始回答分布的情况)进行比较。

Curriculum Guided Domain Adaptation in the Dark

  • paper_url: http://arxiv.org/abs/2308.00956
  • repo_url: None
  • paper_authors: Chowdhury Sadman Jahan, Andreas Savakis
  • for: 本研究目的是 Addressing the rising concerns of privacy and security, domain adaptation in the dark aims to adapt a black-box source trained model to an unlabeled target domain without access to any source data or source model parameters.
  • methods: 本研究使用了 Curriculum Adaptation for Black-Box (CABB) 方法,它是一种curriculum guided adaptation approach,首先在目标数据集上使用高置信度(clean)标签进行训练,然后在目标数据集上使用噪音标签进行训练。 CABB 方法使用了 Jensen-Shannon 分布差作为 cleaner-noisy sample separation 的优化目标函数,而不是传统的 cross entropy loss 函数。
  • results: 实验结果表明,CABB 方法在标准领域适应 datasets 上比现有的黑框 DA 模型表现更好,并且与白框领域适应模型相当。
    Abstract Addressing the rising concerns of privacy and security, domain adaptation in the dark aims to adapt a black-box source trained model to an unlabeled target domain without access to any source data or source model parameters. The need for domain adaptation of black-box predictors becomes even more pronounced to protect intellectual property as deep learning based solutions are becoming increasingly commercialized. Current methods distill noisy predictions on the target data obtained from the source model to the target model, and/or separate clean/noisy target samples before adapting using traditional noisy label learning algorithms. However, these methods do not utilize the easy-to-hard learning nature of the clean/noisy data splits. Also, none of the existing methods are end-to-end, and require a separate fine-tuning stage and an initial warmup stage. In this work, we present Curriculum Adaptation for Black-Box (CABB) which provides a curriculum guided adaptation approach to gradually train the target model, first on target data with high confidence (clean) labels, and later on target data with noisy labels. CABB utilizes Jensen-Shannon divergence as a better criterion for clean-noisy sample separation, compared to the traditional criterion of cross entropy loss. Our method utilizes co-training of a dual-branch network to suppress error accumulation resulting from confirmation bias. The proposed approach is end-to-end trainable and does not require any extra finetuning stage, unlike existing methods. Empirical results on standard domain adaptation datasets show that CABB outperforms existing state-of-the-art black-box DA models and is comparable to white-box domain adaptation models.
    摘要 Addressing the rising concerns of privacy and security, 黑盒子领域适应(Domain Adaptation in the Dark)旨在不经过任何源数据或源模型参数的情况下,将黑盒子训练模型适应到无标注目标领域。随着深度学习解决方案的商业化,黑盒子领域适应的需求更加突出。现有的方法将源模型预测的误差转移到目标模型,并/或使用传统的噪声标签学习算法进行适应。但这些方法未使用容易从容易到困难的标签分配特性。此外,现有的方法都不是端到端的,需要额外的练习阶段和暖身阶段。在这个研究中,我们提出了“CURRICULUM ADAPTATION FOR BLACK-BOX”(CABB),它提供了一个课程导向的适应方法,首先在目标数据上将高信任度(清洁)标签训练目标模型,然后在目标数据上将噪声标签训练目标模型。CABB使用Jensen-Shannon散度作为更好的清洁噪声标签分配剂量,相比于传统的混合损失函数。我们的方法使用两条分支网络进行合作训练,以抑制因确认偏调所导致的错误累累。提案的方法是端到端训练的,不需要额外的练习阶段,与现有的方法不同。实验结果显示,CABB在标准领域适应 datasets 上表现更好,并且与白盒子领域适应模型相比几乎相同。

From Sparse to Soft Mixtures of Experts

  • paper_url: http://arxiv.org/abs/2308.00951
  • repo_url: https://github.com/google-research/vmoe
  • paper_authors: Joan Puigcerver, Carlos Riquelme, Basil Mustafa, Neil Houlsby
  • for: This paper aims to address the challenges of training and inference costs in Mixture of Expert (MoE) architectures, specifically in the context of visual recognition tasks.
  • methods: The proposed Soft MoE method uses a fully-differentiable sparse Transformer that performs implicit soft assignments of input tokens to experts, allowing for larger model capacity at lower inference cost.
  • results: Soft MoE outperforms standard Transformers (ViTs) and popular MoE variants (Tokens Choice and Experts Choice) in visual recognition tasks, while scaling well with increasing numbers of experts and layers. For example, Soft MoE-Base/16 requires 10.5x lower inference cost than ViT-Huge/14 while matching its performance after similar training, and Soft MoE Huge/14 with 128 experts in 16 MoE layers has over 40x more parameters than ViT Huge/14 with only a 2% increase in inference time cost.
    Abstract Sparse mixture of expert architectures (MoEs) scale model capacity without large increases in training or inference costs. Despite their success, MoEs suffer from a number of issues: training instability, token dropping, inability to scale the number of experts, or ineffective finetuning. In this work, we proposeSoft MoE, a fully-differentiable sparse Transformer that addresses these challenges, while maintaining the benefits of MoEs. Soft MoE performs an implicit soft assignment by passing different weighted combinations of all input tokens to each expert. As in other MoE works, experts in Soft MoE only process a subset of the (combined) tokens, enabling larger model capacity at lower inference cost. In the context of visual recognition, Soft MoE greatly outperforms standard Transformers (ViTs) and popular MoE variants (Tokens Choice and Experts Choice). For example, Soft MoE-Base/16 requires 10.5x lower inference cost (5.7x lower wall-clock time) than ViT-Huge/14 while matching its performance after similar training. Soft MoE also scales well: Soft MoE Huge/14 with 128 experts in 16 MoE layers has over 40x more parameters than ViT Huge/14, while inference time cost grows by only 2%, and it performs substantially better.
    摘要 稀疏混合专家架构(MoE)可以增加模型容量而无需大幅提高训练或执行成本。 despite their success, MoE 受到一些问题的困扰,包括训练不稳定、吐token、无法扩展专家数量以及无效的微调。 在这项工作中,我们提出了软MoE,一种完全可微分的稀疏转换器,解决了这些挑战,同时保留了 MoE 的优点。 软MoE 通过不同权重的Weighted combinations of all input tokens 来进行隐式软分配。 与其他 MoE 工作一样,专家在 Soft MoE 中只处理一部分(合并)的输入字符,使得模型容量得到了更大的提升,而执行成本则得到了更低的降低。 在视识ognition中,软MoE 超过了标准Transformers(ViTs)和受欢迎的 MoE 变体(Tokens Choice和Experts Choice)。例如,Soft MoE-Base/16 需要10.5倍低的执行成本(5.7倍低的墙 clock time),而与其性能相似。 Soft MoE 也可以扩展: Soft MoE Huge/14 WITH 128 experts 在 16 MoE layers 中有40倍以上的参数量,而执行成本增加了只有2%,并且表现出了明显的提升。

Decomposing and Coupling Saliency Map for Lesion Segmentation in Ultrasound Images

  • paper_url: http://arxiv.org/abs/2308.00947
  • repo_url: None
  • paper_authors: Zhenyuan Ning, Yixiao Mao, Qianjin Feng, Shengzhou Zhong, Yu Zhang
  • for: 这篇论文旨在提高聚合体内部的肿瘤分类精度,应对静脉影像中肿瘤区域和周围组织(背景)的同等或更高的颜色和текстура对比。
  • methods: 这篇论文提出了一个名为DC-Net的分解且联系网络,通过在复杂的静脉影像中分解原始图像,将肿瘤区域和背景分类为不同的类别,以提高肿瘤分类精度。DC-Net包括分解和联系子网络,其中前者先将原始图像分解为肿瘤和背景的类别对应图像,然后后者进一步处理这些图像,以确保精度高的肿瘤分类。
  • results: 这篇论文的实验结果显示,DC-Net可以在两个静脉肿瘤分类任务中提高精度,比较现有的方法更好。
    Abstract Complex scenario of ultrasound image, in which adjacent tissues (i.e., background) share similar intensity with and even contain richer texture patterns than lesion region (i.e., foreground), brings a unique challenge for accurate lesion segmentation. This work presents a decomposition-coupling network, called DC-Net, to deal with this challenge in a (foreground-background) saliency map disentanglement-fusion manner. The DC-Net consists of decomposition and coupling subnets, and the former preliminarily disentangles original image into foreground and background saliency maps, followed by the latter for accurate segmentation under the assistance of saliency prior fusion. The coupling subnet involves three aspects of fusion strategies, including: 1) regional feature aggregation (via differentiable context pooling operator in the encoder) to adaptively preserve local contextual details with the larger receptive field during dimension reduction; 2) relation-aware representation fusion (via cross-correlation fusion module in the decoder) to efficiently fuse low-level visual characteristics and high-level semantic features during resolution restoration; 3) dependency-aware prior incorporation (via coupler) to reinforce foreground-salient representation with the complementary information derived from background representation. Furthermore, a harmonic loss function is introduced to encourage the network to focus more attention on low-confidence and hard samples. The proposed method is evaluated on two ultrasound lesion segmentation tasks, which demonstrates the remarkable performance improvement over existing state-of-the-art methods.
    摘要 复杂的超声图像场景下,邻近组织(即背景)与病变区域(即前景)的像素强度几乎相同,甚至具有更复杂的文本排序模式,对准确病变分割带来了独特挑战。本文提出了一种 decomposition-coupling 网络(DC-Net),通过在 (前景-背景) 敏感地图离散-融合方式下进行精准分割。DC-Net 包括 decomposition 和 coupling 子网络,前者先将原始图像粗略地分解成前景和背景敏感地图,然后后者在帮助于敏感优化下进行精准分割。coupling 子网络包括三个方面的融合策略:1)地域特征聚合(通过可导式上下文搅拌运算器在编码器中),以适应大小上下文的更大范围,在维度减少时保留地方上下文特征;2)关系意识表示融合(通过交叉相关融合模块在解码器中),以高效地融合低级视觉特征和高级 semantic 特征;3)依赖关系评估(通过优化器),以强制前景敏感表示具有补偿信息的背景表示。此外,文本还引入了一种和谐损失函数,以鼓励网络更加关注低信心和困难样本。提出的方法在两个超声病变分割任务上进行评估,表现出了明显的性能提升。

On the use of deep learning for phase recovery

  • paper_url: http://arxiv.org/abs/2308.00942
  • repo_url: None
  • paper_authors: Kaiqiang Wang, Li Song, Chutian Wang, Zhenbo Ren, Guangyuan Zhao, Jiazhen Dou, Jianglei Di, George Barbastathis, Renjie Zhou, Jianlin Zhao, Edmund Y. Lam
  • for: This paper is written for those interested in phase recovery (PR) and its applications in computational imaging.
  • methods: The paper reviews conventional methods for PR, as well as how deep learning (DL) can be used to support PR from pre-processing, in-processing, and post-processing stages.
  • results: The paper summarizes the work in DL for PR and provides a live-updating resource for readers to learn more about PR.Here is the same information in Simplified Chinese text:
  • for: 这篇论文是为了介绍phaserecovery(PR)和其应用于计算成像而写的。
  • methods: 论文回顾了传统的PR方法,以及如何通过深度学习(DL)在pre-processing、in-processing和post-processing三个阶段支持PR。
  • results: 论文总结了DL在PR方面的工作,并提供了一个live-updating资源,让读者更深入了解PR。
    Abstract Phase recovery (PR) refers to calculating the phase of the light field from its intensity measurements. As exemplified from quantitative phase imaging and coherent diffraction imaging to adaptive optics, PR is essential for reconstructing the refractive index distribution or topography of an object and correcting the aberration of an imaging system. In recent years, deep learning (DL), often implemented through deep neural networks, has provided unprecedented support for computational imaging, leading to more efficient solutions for various PR problems. In this review, we first briefly introduce conventional methods for PR. Then, we review how DL provides support for PR from the following three stages, namely, pre-processing, in-processing, and post-processing. We also review how DL is used in phase image processing. Finally, we summarize the work in DL for PR and outlook on how to better use DL to improve the reliability and efficiency in PR. Furthermore, we present a live-updating resource (https://github.com/kqwang/phase-recovery) for readers to learn more about PR.
    摘要 <>转换文本到简化中文。<>phas recovery (PR)指的是从光场强度测量中计算光场的阶段。例如从量化光场图像和相干散射图像到调整镜optics,PR是重构 объек的反射指数分布或地图的关键 step。在过去几年,深度学习(DL),通常通过深度神经网络实现,为计算成像提供了无前例的支持,导致了许多PR问题的更有效的解决方案。在这篇文章中,我们首先简要介绍了传统的PR方法。然后,我们回顾了DL在PR中的三个阶段,即先processing、processing和后processing。我们还回顾了DL在相位图像处理中的应用。最后,我们总结了DL在PR中的工作,并对如何更好地使用DL提高PR的可靠性和效率。此外,我们提供了一个live-updating资源(https://github.com/kqwang/phase-recovery),以便读者了解更多关于PR。

QUANT: A Minimalist Interval Method for Time Series Classification

  • paper_url: http://arxiv.org/abs/2308.00928
  • repo_url: https://github.com/angus924/quant
  • paper_authors: Angus Dempster, Daniel F. Schmidt, Geoffrey I. Webb
  • for: 这篇论文是为了提出一种基于间隔的时间序列分类方法。
  • methods: 该方法使用单一的特征(Quantiles)、固定的间隔和一个市场上可用的分类器。
  • results: 该方法可以在一组标准的测试数据集上实现同样的准确率,与现有最准确的间隔方法相比。这种快速和准确的方法可以在142个UCR数据集上实现状态机器学习的最佳性能,总计算时间仅为单CPU核心下的少于15分钟。
    Abstract We show that it is possible to achieve the same accuracy, on average, as the most accurate existing interval methods for time series classification on a standard set of benchmark datasets using a single type of feature (quantiles), fixed intervals, and an 'off the shelf' classifier. This distillation of interval-based approaches represents a fast and accurate method for time series classification, achieving state-of-the-art accuracy on the expanded set of 142 datasets in the UCR archive with a total compute time (training and inference) of less than 15 minutes using a single CPU core.
    摘要 我们证明了可以通过使用单一的特征(分位数)、固定间隔和对应的资料集(UCR档案),在一个单一CPU核心上,实现时间序列分类的同等精度,与现有的间隔方法相比。这种简化的间隔方法可以实现快速和准确的时间序列分类,在扩展的142个测试集上达到了状况之优的精度,训练和推导时间总计少于15分钟。

Continual Domain Adaptation on Aerial Images under Gradually Degrading Weather

  • paper_url: http://arxiv.org/abs/2308.00924
  • repo_url: https://github.com/sadman-jahan/aid-ucm-degradingweather
  • paper_authors: Chowdhury Sadman Jahan, Andreas Savakis
  • for: 这篇研究旨在探讨深度学习模型在天空平台上的适应Domain Adaptation(DA)问题,以及在这种情况下的测试时 Adaptation( continual DA)表现。
  • methods: 研究使用了两种逐渐恶化的天气情况,将Real Image Dataset中的两个 dataset合成而成四个 benchmark dataset。然后,评估了三种 DA 模型,包括标准 DA 模型和两种 continual DA 模型,并比较了两种不同的架构(卷积和transformer)。
  • results: 研究发现了在 continual DA Setting 中,exist buffer-fed continual DA 方法会出现稳定性问题,并提出了一个简单的Gradient Normalization方法来缓解这个问题。
    Abstract Domain adaptation (DA) strives to mitigate the domain gap between the source domain where a model is trained, and the target domain where the model is deployed. When a deep learning model is deployed on an aerial platform, it may face gradually degrading weather conditions during operation, leading to widening domain gaps between the training data and the encountered evaluation data. We synthesize two such gradually worsening weather conditions on real images from two existing aerial imagery datasets, generating a total of four benchmark datasets. Under the continual, or test-time adaptation setting, we evaluate three DA models on our datasets: a baseline standard DA model and two continual DA models. In such setting, the models can access only one small portion, or one batch of the target data at a time, and adaptation takes place continually, and over only one epoch of the data. The combination of the constraints of continual adaptation, and gradually deteriorating weather conditions provide the practical DA scenario for aerial deployment. Among the evaluated models, we consider both convolutional and transformer architectures for comparison. We discover stability issues during adaptation for existing buffer-fed continual DA methods, and offer gradient normalization as a simple solution to curb training instability.
    摘要 域 adaptation (DA) 的目的是减少源域和目标域之间的域 gap,以便在不同域上使用模型。当深度学习模型在飞行平台上部署时,可能会遇到逐渐恶化的天气条件,导致模型在训练数据和评估数据之间的域 gap 进一步扩大。我们将两种逐渐恶化的天气条件synthesize在实际图像上,生成了四个benchmark dataset。在 continual 或 test-time adaptation Setting 中,我们评估了三个 DA 模型:基eline 标准 DA 模型和两个 continual DA 模型。在这种设置下,模型可以只访问一小部分,或一批target data 中的一个批次,并且adaptation 发生在一个epoch 内。将 continual adaptation 的约束和逐渐恶化的天气条件相结合,我们实际上构建了飞行部署中的实用 DA enario。我们考虑了 convolutional 和 transformer 架构进行比较。我们发现了 continual DA 方法中的稳定问题,并提供了一种简单的 Gradient Normalization 解决方案来缓解训练不稳定。

Survey on Computer Vision Techniques for Internet-of-Things Devices

  • paper_url: http://arxiv.org/abs/2308.02553
  • repo_url: None
  • paper_authors: Ishmeet Kaur, Adwaita Janardhan Jadhav
  • for: 本研究旨在探讨最新的低功耗和能效的神经网络实现方法,以提高神经网络的部署性,而不是减少准确率。
  • methods: 本文涵盖了三类主要方法:神经网络压缩、网络架构搜索和设计、编译器和图优化。
  • results: 本文总结了低功耗和能效的神经网络实现方法的优劣点和未来研究问题。
    Abstract Deep neural networks (DNNs) are state-of-the-art techniques for solving most computer vision problems. DNNs require billions of parameters and operations to achieve state-of-the-art results. This requirement makes DNNs extremely compute, memory, and energy-hungry, and consequently difficult to deploy on small battery-powered Internet-of-Things (IoT) devices with limited computing resources. Deployment of DNNs on Internet-of-Things devices, such as traffic cameras, can improve public safety by enabling applications such as automatic accident detection and emergency response.Through this paper, we survey the recent advances in low-power and energy-efficient DNN implementations that improve the deployability of DNNs without significantly sacrificing accuracy. In general, these techniques either reduce the memory requirements, the number of arithmetic operations, or both. The techniques can be divided into three major categories: neural network compression, network architecture search and design, and compiler and graph optimizations. In this paper, we survey both low-power techniques for both convolutional and transformer DNNs, and summarize the advantages, disadvantages, and open research problems.
    摘要 深度神经网络(DNN)是现代计算机视觉问题的state-of-the-art技术。DNN需要数十亿参数和操作来实现state-of-the-art结果,这使得DNN变得极其计算、内存和能源贪吃,因此Difficult to deploy on small battery-powered Internet of Things(IoT)设备 with limited computing resources。 deploying DNNs on IoT devices, such as traffic cameras, can improve public safety by enabling applications such as automatic accident detection and emergency response. Through this paper, we survey the recent advances in low-power and energy-efficient DNN implementations that improve the deployability of DNNs without significantly sacrificing accuracy. In general, these techniques either reduce the memory requirements, the number of arithmetic operations, or both. The techniques can be divided into three major categories: neural network compression, network architecture search and design, and compiler and graph optimizations. In this paper, we survey both low-power techniques for both convolutional and transformer DNNs, and summarize the advantages, disadvantages, and open research problems.

Virtual histological staining of unlabeled autopsy tissue

  • paper_url: http://arxiv.org/abs/2308.00920
  • repo_url: None
  • paper_authors: Yuzhu Li, Nir Pillar, Jingxi Li, Tairan Liu, Di Wu, Songyu Sun, Guangdong Ma, Kevin de Haan, Luzhe Huang, Sepehr Hamidi, Anatoly Urisman, Tal Keidar Haran, William Dean Wallace, Jonathan E. Zuckerman, Aydogan Ozcan
  • for: 这个研究旨在解决传统压涂方法在检验尸体样本时的挑战,包括临床死亡后的样本自释导致的差异化、高成本和时间consuming的化学压涂过程。
  • methods: 这篇文章报道了一种虚拟染色技术,使用一个训练好的神经网络将染料自动替换为普通的染料染色,从而消除自释导致的严重染色扭曲。
  • results: 研究发现,虚拟染色技术可以快速地生成高质量的染料染色图像,并且可以减少劳动力、成本和基础设施需求。此外,这种技术还可以扩展到肿瘤组织和衰竭组织,并且可以在全球卫生危机期间提供更快速、更便宜的染色服务。
    Abstract Histological examination is a crucial step in an autopsy; however, the traditional histochemical staining of post-mortem samples faces multiple challenges, including the inferior staining quality due to autolysis caused by delayed fixation of cadaver tissue, as well as the resource-intensive nature of chemical staining procedures covering large tissue areas, which demand substantial labor, cost, and time. These challenges can become more pronounced during global health crises when the availability of histopathology services is limited, resulting in further delays in tissue fixation and more severe staining artifacts. Here, we report the first demonstration of virtual staining of autopsy tissue and show that a trained neural network can rapidly transform autofluorescence images of label-free autopsy tissue sections into brightfield equivalent images that match hematoxylin and eosin (H&E) stained versions of the same samples, eliminating autolysis-induced severe staining artifacts inherent in traditional histochemical staining of autopsied tissue. Our virtual H&E model was trained using >0.7 TB of image data and a data-efficient collaboration scheme that integrates the virtual staining network with an image registration network. The trained model effectively accentuated nuclear, cytoplasmic and extracellular features in new autopsy tissue samples that experienced severe autolysis, such as COVID-19 samples never seen before, where the traditional histochemical staining failed to provide consistent staining quality. This virtual autopsy staining technique can also be extended to necrotic tissue, and can rapidly and cost-effectively generate artifact-free H&E stains despite severe autolysis and cell death, also reducing labor, cost and infrastructure requirements associated with the standard histochemical staining.
    摘要 histological examination是Autopsy中的关键步骤,但传统的 histochemical staining方法面临多种挑战,包括由尸体泥炭引起的自体解剖引起的低质量着色,以及覆盖大面积组织的化学着色程序需要巨大的劳动力、成本和时间。在全球卫生危机期间, histopathology服务的可用性受限,导致组织着色的延迟和更严重的着色 artifacts。在这种情况下,我们提出了虚拟着色技术,使用训练过的神经网络将染色不含染料的 autopsy 组织切片转换成和染色后的 Hematoxylin and Eosin(H&E)染色版本匹配的明亮场景图像,消除自体解剖引起的严重着色 artifacts。我们的虚拟 H&E 模型通过 >0.7 TB 的图像数据和数据效率协作方案来训练,该方案将虚拟染色网络与图像 регистрация网络结合。训练后,模型能够有效强调组织中的核、细胞质和 extracellular 特征,包括 COVID-19 样本,这些样本在传统的 histochemical staining 中未能得到一致的染色质量。此虚拟染色技术还可以扩展到肿瘤组织,可以快速、成本低地生成 artifact-free H&E 染色,即使严重的自体解剖和细胞死亡。

VLUCI: Variational Learning of Unobserved Confounders for Counterfactual Inference

  • paper_url: http://arxiv.org/abs/2308.00904
  • repo_url: None
  • paper_authors: Yonghe Zhao, Qiang Huang, Siwei Wu, Yun Peng, Huiyan Sun
  • for: 本研究旨在提出一种新的变量学习模型,用于对 observational 数据中的不观测变量进行推断,以提高 causal inference 的准确性。
  • methods: 该模型基于变量学习的思想,使用 doubly 变量推断模型来approximate 不观测变量的 posterior distribution,并且可以与现有的 counterfactual inference 模型相结合使用。
  • results: 实验表明,该模型可以准确地推断不观测变量,并且可以与现有的模型相结合使用,以提高 counterfactual inference 的准确性。 Plus, the model provides confidence intervals for counterfactual outcomes, which is useful in risk-sensitive domains.
    Abstract Causal inference plays a vital role in diverse domains like epidemiology, healthcare, and economics. De-confounding and counterfactual prediction in observational data has emerged as a prominent concern in causal inference research. While existing models tackle observed confounders, the presence of unobserved confounders remains a significant challenge, distorting causal inference and impacting counterfactual outcome accuracy. To address this, we propose a novel variational learning model of unobserved confounders for counterfactual inference (VLUCI), which generates the posterior distribution of unobserved confounders. VLUCI relaxes the unconfoundedness assumption often overlooked by most causal inference methods. By disentangling observed and unobserved confounders, VLUCI constructs a doubly variational inference model to approximate the distribution of unobserved confounders, which are used for inferring more accurate counterfactual outcomes. Extensive experiments on synthetic and semi-synthetic datasets demonstrate VLUCI's superior performance in inferring unobserved confounders. It is compatible with state-of-the-art counterfactual inference models, significantly improving inference accuracy at both group and individual levels. Additionally, VLUCI provides confidence intervals for counterfactual outcomes, aiding decision-making in risk-sensitive domains. We further clarify the considerations when applying VLUCI to cases where unobserved confounders don't strictly conform to our model assumptions using the public IHDP dataset as an example, highlighting the practical advantages of VLUCI.
    摘要 causal inference在多个领域中扮演着重要的角色,如epidemiology、医疗和经济等。在观察数据中,解除干扰和预测Counterfactual outcome成为了causal inference研究中的一个显著挑战。现有的模型可以处理观察到的干扰因素,但未观察到的干扰因素的存在仍然是一个主要的挑战,对 causal inference 和 counterfactual outcome的准确性产生干扰。为解决这个问题,我们提出了一种新的变分学习模型(VLUCI),可以生成未观察到的干扰因素的 posterior 分布。VLUCI 释放了对观察到的干扰因素的假设,从而更好地解除干扰。通过分离观察到和未观察到的干扰因素,VLUCI 构建了一个双变分推理模型,用于估计未观察到的干扰因素,从而更准确地预测 counterfactual outcome。我们在 synthetic 和半 synthetic 数据集上进行了广泛的实验,显示 VLUCI 在推理未观察到的干扰因素方面表现出色。它可以与当前的 counterfactual inference 模型相容,在组织和个体水平上提高推理准确性。此外,VLUCI 还提供了对 counterfactual outcome 的信息interval,帮助在风险敏感领域做出决策。我们还在使用公共 IHDP 数据集为例,详细介绍了在实际应用中考虑 VLUCI 的一般考虑事项。

User-Controllable Recommendation via Counterfactual Retrospective and Prospective Explanations

  • paper_url: http://arxiv.org/abs/2308.00894
  • repo_url: https://github.com/chrisjtan/ucr
  • paper_authors: Juntao Tan, Yingqiang Ge, Yan Zhu, Yinglong Xia, Jiebo Luo, Jianchao Ji, Yongfeng Zhang
  • for: 提高用户满意度和信任度,提供用户可控制的个性化推荐
  • methods: combinatorial explainable recommender systems,counterfactual reasoning,user control options
  • results: 在MovieLens和Yelp数据集上实验 validate 提议的效果,并且发现在提供用户控制选项时,可能会提高未来推荐的准确率
    Abstract Modern recommender systems utilize users' historical behaviors to generate personalized recommendations. However, these systems often lack user controllability, leading to diminished user satisfaction and trust in the systems. Acknowledging the recent advancements in explainable recommender systems that enhance users' understanding of recommendation mechanisms, we propose leveraging these advancements to improve user controllability. In this paper, we present a user-controllable recommender system that seamlessly integrates explainability and controllability within a unified framework. By providing both retrospective and prospective explanations through counterfactual reasoning, users can customize their control over the system by interacting with these explanations. Furthermore, we introduce and assess two attributes of controllability in recommendation systems: the complexity of controllability and the accuracy of controllability. Experimental evaluations on MovieLens and Yelp datasets substantiate the effectiveness of our proposed framework. Additionally, our experiments demonstrate that offering users control options can potentially enhance recommendation accuracy in the future. Source code and data are available at \url{https://github.com/chrisjtan/ucr}.
    摘要 现代推荐系统通常使用用户历史行为生成个性化推荐,但这些系统经常缺乏用户可控性,导致用户满意度和信任度减退。鉴于近年来的可解释推荐系统的进步,我们提议利用这些进步来提高用户可控性。在本文中,我们提出了一种可控性推荐系统,该系统内置了解释和可控性的统一框架。通过对解释和可控性进行反思,用户可以自定义他们对系统的控制。此外,我们引入了两个推荐系统可控性的特性:复杂性可控性和准确性可控性。我们在 MovieLens 和 Yelp 数据集上进行了实验评估,并证明了我们提出的框架的有效性。此外,我们的实验还表明,向用户提供控制选项可能会提高未来推荐的准确性。源代码和数据可以在 上获取。

Tango: rethinking quantization for graph neural network training on GPUs

  • paper_url: http://arxiv.org/abs/2308.00890
  • repo_url: None
  • paper_authors: Shiyang Chen, Da Zheng, Caiwen Ding, Chengying Huan, Yuede Ji, Hang Liu
  • for: 这篇论文主要旨在提高图 нейрон网络训练的效率,使用量化来加速计算。
  • methods: 该论文提出了三大贡献:首先,提供了精炼的规则来保持量化训练中的准确性。其次,设计了量化感知的基本 primitives 和间接优化,以加速 GNN 训练。最后,与 популяр的 Deep Graph Library (DGL) 系统集成,并在多种 GNN 模型和数据集上达到了状态arc的性能。
  • results: 该论文通过 Tango 系统,在多种 GNN 模型和数据集上达到了更高的训练效率,比如 state-of-the-art 方法快。
    Abstract Graph Neural Networks (GNNs) are becoming increasingly popular due to their superior performance in critical graph-related tasks. While quantization is widely used to accelerate GNN computation, quantized training faces unprecedented challenges. Current quantized GNN training systems often have longer training times than their full-precision counterparts for two reasons: (i) addressing the accuracy challenge leads to excessive overhead, and (ii) the optimization potential exposed by quantization is not adequately leveraged. This paper introduces Tango which re-thinks quantization challenges and opportunities for graph neural network training on GPUs with three contributions: Firstly, we introduce efficient rules to maintain accuracy during quantized GNN training. Secondly, we design and implement quantization-aware primitives and inter-primitive optimizations that can speed up GNN training. Finally, we integrate Tango with the popular Deep Graph Library (DGL) system and demonstrate its superior performance over state-of-the-art approaches on various GNN models and datasets.
    摘要 graph neural networks (GNNs) 在 kritical graph-related tasks 中的表现越来越出色,但量化训练面临无 precedent 的挑战。现有的量化 GNN 训练系统经常比其整数精度 counterparts longer training time due to two reasons: (i) addressing the accuracy challenge leads to excessive overhead, and (ii) the optimization potential exposed by quantization is not adequately leveraged. This paper introduces Tango, which re-thinks quantization challenges and opportunities for graph neural network training on GPUs with three contributions: Firstly, we introduce efficient rules to maintain accuracy during quantized GNN training. Secondly, we design and implement quantization-aware primitives and inter-primitive optimizations that can speed up GNN training. Finally, we integrate Tango with the popular Deep Graph Library (DGL) system and demonstrate its superior performance over state-of-the-art approaches on various GNN models and datasets.Note: Please note that the translation is in Simplified Chinese, which is one of the two standard Chinese dialects. If you prefer Traditional Chinese, please let me know and I can provide the translation in that dialect as well.

Factor Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.00887
  • repo_url: https://github.com/molyswu/hand_detection
  • paper_authors: Zhen Zhang, Mohammed Haroon Dupty, Fan Wu, Javen Qinfeng Shi, Wee Sun Lee
  • for: 本 paper 的目的是提出一种能够有效地捕捉高阶关系的图神经网络(FGNN),以便进行推理和学习。
  • methods: 本 paper 使用的方法包括提出一种高效的准确推理算法,以及将这种算法 Neil 化为一种带有更加复杂的消息更新规则的图神经网络模块。
  • results: 本 paper 的实验结果表明,提出的 FGNN 模型可以在 sintetic 数据集和实际数据集上达到出色的性能,并且可以代表 Max-Product 和 Sum-Product 循环信念传播。
    Abstract In recent years, we have witnessed a surge of Graph Neural Networks (GNNs), most of which can learn powerful representations in an end-to-end fashion with great success in many real-world applications. They have resemblance to Probabilistic Graphical Models (PGMs), but break free from some limitations of PGMs. By aiming to provide expressive methods for representation learning instead of computing marginals or most likely configurations, GNNs provide flexibility in the choice of information flowing rules while maintaining good performance. Despite their success and inspirations, they lack efficient ways to represent and learn higher-order relations among variables/nodes. More expressive higher-order GNNs which operate on k-tuples of nodes need increased computational resources in order to process higher-order tensors. We propose Factor Graph Neural Networks (FGNNs) to effectively capture higher-order relations for inference and learning. To do so, we first derive an efficient approximate Sum-Product loopy belief propagation inference algorithm for discrete higher-order PGMs. We then neuralize the novel message passing scheme into a Factor Graph Neural Network (FGNN) module by allowing richer representations of the message update rules; this facilitates both efficient inference and powerful end-to-end learning. We further show that with a suitable choice of message aggregation operators, our FGNN is also able to represent Max-Product belief propagation, providing a single family of architecture that can represent both Max and Sum-Product loopy belief propagation. Our extensive experimental evaluation on synthetic as well as real datasets demonstrates the potential of the proposed model.
    摘要 近年来,我们目睹了一场Graph Neural Networks(GNN)的浪涌,大多数可以在终端式的方式学习出强大的表示,在许多实际应用中取得了很大的成功。它们与 probabilistic Graphical Models(PGMs)有相似之处,但是超越了一些PGMs的限制。通过targeting表示学习而不是计算margin或最有可能的配置,GNNs提供了信息流动规则的灵活性,同时保持良好的性能。尽管它们的成功和灵感,但它们缺乏高阶关系的表示和学习方法。高阶GNNs需要更多的计算资源来处理高阶tensor。我们提议使用Factor Graph Neural Networks(FGNNs)来有效地捕捉高阶关系 для推理和学习。我们首先 derivate了一种高效的approximate Sum-Product loopy belief propagation推理算法 для离散高阶PGMs。然后,我们将这种新的message passing scheme neuralize到Factor Graph Neural Network(FGNN)模块中,允许更加丰富的message update规则表示,从而实现了有效的推理和强大的终端式学习。此外,我们还证明了在适当的message汇聚操作下,我们的FGNN可以表示Max-Product belief propagation,从而提供了一个单一的家族结构,可以表示Max和Sum-Product loopy belief propagation。我们对synthetic以及实际数据进行了广泛的实验评估,demonstrating the potential of the proposed model.

Enhancing Machine Learning Performance with Continuous In-Session Ground Truth Scores: Pilot Study on Objective Skeletal Muscle Pain Intensity Prediction

  • paper_url: http://arxiv.org/abs/2308.00886
  • repo_url: None
  • paper_authors: Boluwatife E. Faremi, Jonathon Stavres, Nuno Oliveira, Zhaoxian Zhou, Andrew H. Sung
    for: This study aimed to develop a novel approach for objective pain intensity characterization using machine learning (ML) models and real-time, continuous in-session pain scores.methods: The study used two devices to acquire real-time pain scores and ANS-modulated endodermal activity (EDA) data. The authors used a custom pain platform to store and extract time-domain EDA features and in-session ground truth scores. They trained ML models, including Multi-layer Perceptron (MLP) and Random Forest (RF), using objective EDA features and in-session scores.results: The study found that using continuous in-session ground truth scores significantly enhanced the performance of ML models in pain intensity characterization, with macro-averaged geometric mean scores of 75.9% and 78.3% for MLP and RF models, respectively, compared to scores of 70.3% and 74.6% for models trained with post-session scores. The study demonstrates the potential of using real-time, continuous pain scores to improve the accuracy of ML pain systems.
    Abstract Machine learning (ML) models trained on subjective self-report scores struggle to objectively classify pain accurately due to the significant variance between real-time pain experiences and recorded scores afterwards. This study developed two devices for acquisition of real-time, continuous in-session pain scores and gathering of ANS-modulated endodermal activity (EDA).The experiment recruited N = 24 subjects who underwent a post-exercise circulatory occlusion (PECO) with stretch, inducing discomfort. Subject data were stored in a custom pain platform, facilitating extraction of time-domain EDA features and in-session ground truth scores. Moreover, post-experiment visual analog scale (VAS) scores were collected from each subject. Machine learning models, namely Multi-layer Perceptron (MLP) and Random Forest (RF), were trained using corresponding objective EDA features combined with in-session scores and post-session scores, respectively. Over a 10-fold cross-validation, the macro-averaged geometric mean score revealed MLP and RF models trained with objective EDA features and in-session scores achieved superior performance (75.9% and 78.3%) compared to models trained with post-session scores (70.3% and 74.6%) respectively. This pioneering study demonstrates that using continuous in-session ground truth scores significantly enhances ML performance in pain intensity characterization, overcoming ground truth sparsity-related issues, data imbalance, and high variance. This study informs future objective-based ML pain system training.
    摘要 机器学习(ML)模型在主观自报分数上训练时,很难准确地分类疼痛,因为主观自报分数和实际时间内疼痛经历之间存在很大的差异。这个研究开发了两种设备用于实时、连续的疼痛分数获取和胃内活动评估(EDA)的收集。研究采用N = 24名参与者,经历了后期静脉填充(PECO)的伸展,导致不适。参与者的数据被存储在自定义的疼痛平台上,以便提取时间域EDA特征和实际时间内的真实分数。此外,每名参与者还提供了后实验的Visual Analog Scale(VAS)分数。机器学习模型,即多层感知器(MLP)和随机森林(RF),被训练使用对应的对象EDA特征和实际时间内分数 combinated with post-session scores。在10次横跨验证中,macro平均幂数分数表明MLP和RF模型使用对象EDA特征和实际时间内分数训练时的性能(75.9%和78.3%)高于使用post-session scores训练时的性能(70.3%和74.6%)。这项先驱研究表明,使用连续实际时间内的真实分数可以大大提高ML的疼痛强度特征化性能,超越真实分数稀缺、数据不均衡和高差异问题。这项研究为未来基于对象的ML疼痛系统训练提供了指导。

Revolutionizing Wireless Networks with Federated Learning: A Comprehensive Review

  • paper_url: http://arxiv.org/abs/2308.04404
  • repo_url: None
  • paper_authors: Sajjad Emdadi Mahdimahalleh
  • for: 本文探讨了在无线通信中机器学习的重要性,以及 federated learning(FL)在未来移动网络中的潜在作用。
  • methods: 本文使用了 federated learning(FL),它在无线边缘网络中分离了数据收集和计算,与传统的中央化学习不同。
  • results: 本文指出,由于无线通信资源有限和不可预测,FL 可以更好地适应这些环境,并且可以提高无线通信系统的效率和可靠性。
    Abstract These days with the rising computational capabilities of wireless user equipment such as smart phones, tablets, and vehicles, along with growing concerns about sharing private data, a novel machine learning model called federated learning (FL) has emerged. FL enables the separation of data acquisition and computation at the central unit, which is different from centralized learning that occurs in a data center. FL is typically used in a wireless edge network where communication resources are limited and unreliable. Bandwidth constraints necessitate scheduling only a subset of UEs for updates in each iteration, and because the wireless medium is shared, transmissions are susceptible to interference and are not assured. The article discusses the significance of Machine Learning in wireless communication and highlights Federated Learning (FL) as a novel approach that could play a vital role in future mobile networks, particularly 6G and beyond.
    摘要 现在,由于无线用户设备的计算能力的提高,如智能手机、平板电脑和车辆,以及对共享私人数据的关注,一种新的机器学习模型叫做联邦学习(FL)已经出现。FL使得数据获取和计算在中央单元分离开,与中央集中学习在数据中心不同。FL通常在无线边缘网络中使用,因为通信资源有限和不可预测。由于带宽有限,每次迭代只能将一 subset of UE进行更新,因为无线媒体共享,传输受到干扰和不能保证。文章介绍了无线通信中机器学习的重要性,并将联邦学习(FL)作为未来无线网络中的重要角色提出。

PeRP: Personalized Residual Policies For Congestion Mitigation Through Co-operative Advisory Systems

  • paper_url: http://arxiv.org/abs/2308.00864
  • repo_url: None
  • paper_authors: Aamir Hasan, Neeloy Chakraborty, Haonan Chen, Jung-Hoon Cho, Cathy Wu, Katherine Driggs-Campbell
  • for: 本研究旨在提高自动驾驶系统的可靠性和效率,以提高社会经济因素 such as 通勤时间和燃油费用。
  • methods: 本研究使用 Piecewise Constant(PC)策略和个性化剩余策略(PeRP),以模型人类驾驶行为,提供个性化的行为建议。
  • results: 我们的方法在模拟环境中训练完成,与基线比较,显示我们的方法可以成功地减轻交通堵塞,适应不同的 Driver 行为,提高平均速度4-22%。
    Abstract Intelligent driving systems can be used to mitigate congestion through simple actions, thus improving many socioeconomic factors such as commute time and gas costs. However, these systems assume precise control over autonomous vehicle fleets, and are hence limited in practice as they fail to account for uncertainty in human behavior. Piecewise Constant (PC) Policies address these issues by structurally modeling the likeness of human driving to reduce traffic congestion in dense scenarios to provide action advice to be followed by human drivers. However, PC policies assume that all drivers behave similarly. To this end, we develop a co-operative advisory system based on PC policies with a novel driver trait conditioned Personalized Residual Policy, PeRP. PeRP advises drivers to behave in ways that mitigate traffic congestion. We first infer the driver's intrinsic traits on how they follow instructions in an unsupervised manner with a variational autoencoder. Then, a policy conditioned on the inferred trait adapts the action of the PC policy to provide the driver with a personalized recommendation. Our system is trained in simulation with novel driver modeling of instruction adherence. We show that our approach successfully mitigates congestion while adapting to different driver behaviors, with 4 to 22% improvement in average speed over baselines.
    摘要 智能驾驶系统可以减轻交通拥堵,提高了许多社会经济指标,如通勤时间和油费成本。然而,这些系统假设自动车辆队伍具有精确的控制权,因此在实践中受到限制,因为它们无法考虑人类行为的不确定性。 Piecewise Constant(PC)策略可以解决这些问题,通过结构化模型人类驾驶行为,以减少繁忙场景中的交通拥堵,并提供行动建议,以便人类驾驶员遵循。然而,PC策略假设所有 drivers 都 behave similarly。为此,我们开发了一种合作性建议系统,基于 PC 策略和一种新的 Driver Trait Conditioned Personalized Residual Policy(PeRP)。PeRP 建议 drivers 采取 mitigating 交通拥堵的行动。我们首先使用变量自动编码器在无监督的方式推断 driver 的内在特征,然后根据推断到的特征 conditioned 策略,提供个性化的建议。我们的系统在 simulate 中受到新的 driver 模型的 instrucion adherence 训练。我们表明,我们的方法可以成功地减少交通拥堵,同时适应不同 driver 行为,与基准相比,提高了4%-22%的平均速度。

Understanding Activation Patterns in Artificial Neural Networks by Exploring Stochastic Processes

  • paper_url: http://arxiv.org/abs/2308.00858
  • repo_url: None
  • paper_authors: Stephan Johann Lehmler, Muhammad Saif-ur-Rehman, Tobias Glasmachers, Ioannis Iossifidis
  • for: 本研究想要更深入地理解人工神经网络(deep artificial neural network)的行为和学习动力学。
  • methods: 本研究使用杂素过程框架,它在人工神经网络性能方面提供了一种简化的视角,并且可以通过仿真来进行系统性的调查。
  • results: 研究人员使用杂素过程模型对不同的人工神经网络进行分析,发现这些网络在学习过程中的活动模式具有稳定的特征,并且可以通过 Mean Firing Rate、Mean Fano Factor 和 Variances 等指标来评估这些活动模式。这些结果可能有助于理解人工神经网络的行为和学习机制。
    Abstract To gain a deeper understanding of the behavior and learning dynamics of (deep) artificial neural networks, it is valuable to employ mathematical abstractions and models. These tools provide a simplified perspective on network performance and facilitate systematic investigations through simulations. In this paper, we propose utilizing the framework of stochastic processes, which has been underutilized thus far. Our approach models activation patterns of thresholded nodes in (deep) artificial neural networks as stochastic processes. We focus solely on activation frequency, leveraging neuroscience techniques used for real neuron spike trains. During a classification task, we extract spiking activity and use an arrival process following the Poisson distribution. We examine observed data from various artificial neural networks in image recognition tasks, fitting the proposed model's assumptions. Through this, we derive parameters describing activation patterns in each network. Our analysis covers randomly initialized, generalizing, and memorizing networks, revealing consistent differences across architectures and training sets. Calculating Mean Firing Rate, Mean Fano Factor, and Variances, we find stable indicators of memorization during learning, providing valuable insights into network behavior. The proposed model shows promise in describing activation patterns and could serve as a general framework for future investigations. It has potential applications in theoretical simulations, pruning, and transfer learning.
    摘要 Simplified Chinese translation:为了更深刻理解深度人工神经网络的行为和学习动态,使用数学抽象和模型是非常有价值的。这些工具可以简化网络性能的视图,并且通过仿真来进行系统性的调查。在这篇论文中,我们提议使用 Stochastic Processes 框架,这种框架在过去并未得到充分利用。 我们的方法是将 thresholded nodes 的活动模式模型为 Stochastic Processes。我们仅准确采用 activation frequency,并且参考了实际神经元的发射 Train 技术。在一个分类任务中,我们从不同的人工神经网络中提取了冲击活动,并使用 Poisson 分布来描述到达过程。 我们对不同的人工神经网络在图像识别任务中的观察数据进行了适应,并从中 derive 了每个网络的活动模式参数。我们的分析覆盖了随机初始化、泛化和记忆化网络,并发现这些网络在不同的架构和训练集之间存在一致的差异。 计算 Mean Firing Rate、Mean Fano Factor 和 Variances,我们发现在学习过程中,记忆化存在稳定的指标,这些指标为我们对网络行为提供了有价值的洞察。我们的模型表示了 activation patterns 的描述,并且有可能在未来的研究中扮演一个普遍的框架。它还可以在理论仿真、剪辑和转移学习等方面应用。

Differential Privacy for Adaptive Weight Aggregation in Federated Tumor Segmentation

  • paper_url: http://arxiv.org/abs/2308.00856
  • repo_url: None
  • paper_authors: Muhammad Irfan Khan, Esa Alhoniemi, Elina Kontio, Suleiman A. Khan, Mojtaba Jafaritadi
  • For: 这个研究旨在提供一个具有数据隐私保护的联邦学习架构,以保护医疗影像数据的隐私和数据完整性。* Methods: 这个研究使用了一个叫做DP-SimAgg的分子类似隐私复杂数据联邦学习架构,具有提高模型分类能力和隐私保护的两个优点。* Results: 研究结果显示,DP-SimAgg可以实现精确且Robust的脑膜肿瘤分类,并对于通信成本的降低做出了重要贡献。
    Abstract Federated Learning (FL) is a distributed machine learning approach that safeguards privacy by creating an impartial global model while respecting the privacy of individual client data. However, the conventional FL method can introduce security risks when dealing with diverse client data, potentially compromising privacy and data integrity. To address these challenges, we present a differential privacy (DP) federated deep learning framework in medical image segmentation. In this paper, we extend our similarity weight aggregation (SimAgg) method to DP-SimAgg algorithm, a differentially private similarity-weighted aggregation algorithm for brain tumor segmentation in multi-modal magnetic resonance imaging (MRI). Our DP-SimAgg method not only enhances model segmentation capabilities but also provides an additional layer of privacy preservation. Extensive benchmarking and evaluation of our framework, with computational performance as a key consideration, demonstrate that DP-SimAgg enables accurate and robust brain tumor segmentation while minimizing communication costs during model training. This advancement is crucial for preserving the privacy of medical image data and safeguarding sensitive information. In conclusion, adding a differential privacy layer in the global weight aggregation phase of the federated brain tumor segmentation provides a promising solution to privacy concerns without compromising segmentation model efficacy. By leveraging DP, we ensure the protection of client data against adversarial attacks and malicious participants.
    摘要 federated learning (FL) 是一种分布式机器学习方法,保护隐私 by creating an impartial global model while respecting the privacy of individual client data。然而,传统的 FL 方法可能会引入安全风险,特别是处理多样化的客户端数据,可能会威胁隐私和数据完整性。为了解决这些挑战,我们在医疗图像分割中提出了一种含有权限的 federated deep learning 框架。在这篇论文中,我们将我们的相似性Weight集成 (SimAgg) 方法扩展到 differentially private 的 SimAgg 算法,用于在多Modal 磁共振成像 (MRI) 中的脑肿瘤分割。我们的 DP-SimAgg 方法不仅提高了模型分割能力,还提供了一层额外的隐私保护。我们对我们的框架进行了广泛的测试和评估,以计算性能为关键考虑因素,并证明了 DP-SimAgg 可以准确地 segment brain tumor while minimizing communication costs during model training。这一进步对于保护医疗图像数据的隐私和敏感信息的安全是关键。因此,我们认为在 global weight aggregation 阶段添加 differential privacy 层是一种有 Promise的解决方案,不会COMPROMISE segmentation model efficacy。通过运用 DP,我们能够保护客户端数据免受敌对攻击和恶意参与者的威胁。

A Comprehensive Study of Groundbreaking Machine Learning Research: Analyzing Highly Cited and Impactful Publications across Six Decades

  • paper_url: http://arxiv.org/abs/2308.00855
  • repo_url: None
  • paper_authors: Absalom E. Ezugwu, Japie Greeff, Yuh-Shan Ho
  • for: 本研究目的是为了了解机器学习(ML)领域最高引用的论文,以便了解该领域的主要趋势、影响人员和贡献。
  • methods: 本研究使用了各种 bibliometric 技术进行分析,包括引用分析、合作关系分析、关键词分析和出版趋势分析。
  • results: 研究发现了机器学习社区中最具影响力的论文、高引用的作者和合作网络,以及最受欢迎的研究主题和升起的新趋势。 Additionally, the study found that certain countries have a dominant position in ML research.
    Abstract Machine learning (ML) has emerged as a prominent field of research in computer science and other related fields, thereby driving advancements in other domains of interest. As the field continues to evolve, it is crucial to understand the landscape of highly cited publications to identify key trends, influential authors, and significant contributions made thus far. In this paper, we present a comprehensive bibliometric analysis of highly cited ML publications. We collected a dataset consisting of the top-cited papers from reputable ML conferences and journals, covering a period of several years from 1959 to 2022. We employed various bibliometric techniques to analyze the data, including citation analysis, co-authorship analysis, keyword analysis, and publication trends. Our findings reveal the most influential papers, highly cited authors, and collaborative networks within the machine learning community. We identify popular research themes and uncover emerging topics that have recently gained significant attention. Furthermore, we examine the geographical distribution of highly cited publications, highlighting the dominance of certain countries in ML research. By shedding light on the landscape of highly cited ML publications, our study provides valuable insights for researchers, policymakers, and practitioners seeking to understand the key developments and trends in this rapidly evolving field.
    摘要

CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters

  • paper_url: http://arxiv.org/abs/2308.00852
  • repo_url: None
  • paper_authors: Sudarsanan Rajasekaran, Manya Ghobadi, Aditya Akella
  • for: 提高机器学习(ML)集群中任务的完成时间和流量控制
  • methods: 使用网络卷积图来考虑不同任务之间的通信模式,并通过调整时间偏移值来调整这些任务的通信阶段
  • results: 在24个服务器测试环境中,与状态艺术ML调度器相比,CASSINI可以提高任务的平均和尾部完成时间 by up to 1.6x和2.5x,同时还可以减少集群中ECN标记包的数量 by up to 33x。
    Abstract We present CASSINI, a network-aware job scheduler for machine learning (ML) clusters. CASSINI introduces a novel geometric abstraction to consider the communication pattern of different jobs while placing them on network links. To do so, CASSINI uses an affinity graph that finds a series of time-shift values to adjust the communication phases of a subset of jobs, such that the communication patterns of jobs sharing the same network link are interleaved with each other. Experiments with 13 common ML models on a 24-server testbed demonstrate that compared to the state-of-the-art ML schedulers, CASSINI improves the average and tail completion time of jobs by up to 1.6x and 2.5x, respectively. Moreover, we show that CASSINI reduces the number of ECN marked packets in the cluster by up to 33x.
    摘要 我们介绍CASSINI,一个对Machine Learning(ML)集群有对网络耦合的任务安排器。CASSINI引入了一个新的几何抽象,考虑不同任务之间的通信模式,并在网络链接上分配任务。为此,CASSINI使用一个相互作用graph,找到一系列时间延迟值,以调整具有共同网络链接的任务之间的通信阶段。实验结果显示,相比于现有的ML安排器,CASSINI可以提高任务的平均和尾部完成时间 by up to 1.6倍和2.5倍,分别。此外,我们显示CASSINI可以在集群中对ECN标识的封包数量减少到33倍。

An Exact Kernel Equivalence for Finite Classification Models

  • paper_url: http://arxiv.org/abs/2308.00824
  • repo_url: None
  • paper_authors: Brian Bell, Michael Geyer, David Glickenstein, Amanda Fernandez, Juston Moore
  • for: 这个论文是为了探讨神经网络和kernel方法之间的等价关系,并提出了首个精确表示任何 finite-size parametric classification model 的方法。
  • methods: 该论文使用了Gradient Descent来训练神经网络,并 derivated了一个精确的kernel机器。
  • results: 实验表明,该kernel可以在实际网络上计算到Machine Precision级别,并且可以提供有用的泛化理解。
    Abstract We explore the equivalence between neural networks and kernel methods by deriving the first exact representation of any finite-size parametric classification model trained with gradient descent as a kernel machine. We compare our exact representation to the well-known Neural Tangent Kernel (NTK) and discuss approximation error relative to the NTK and other non-exact path kernel formulations. We experimentally demonstrate that the kernel can be computed for realistic networks up to machine precision. We use this exact kernel to show that our theoretical contribution can provide useful insights into the predictions made by neural networks, particularly the way in which they generalize.
    摘要 我们研究神经网络和核方法之间的等价关系,通过将任何具有Gradient Descent训练的finite-size Parametric类别模型转换为核机制。我们与Well-known Neural Tangent Kernel(NTK)进行比较,并讨论非正确的路径核方法的误差。我们还证明了可以实际地Compute这个核函数 для现实的神经网络,至机器精度。我们使用这个精确的核函数,以示我们的理论贡献可以提供有用的预测性关于神经网络的预测。

An Introduction to Bi-level Optimization: Foundations and Applications in Signal Processing and Machine Learning

  • paper_url: http://arxiv.org/abs/2308.00788
  • repo_url: None
  • paper_authors: Yihua Zhang, Prashant Khanduri, Ioannis Tsaknakis, Yuguang Yao, Mingyi Hong, Sijia Liu
  • For: This paper is focused on developing an overview of bi-level optimization (BLO) problems in the context of signal processing (SP) and machine learning (ML) applications.* Methods: The paper provides an overview of basic concepts, such as optimality conditions, standard algorithms, and practical implementations of BLO problems in SP and ML applications.* Results: The paper discusses recent advances in BLO theory, its implications for applications, and points out some limitations of the state-of-the-art that require significant future research efforts.Here’s the same information in Simplified Chinese text:
  • for: 这篇论文主要关注在信号处理(SP)和机器学习(ML)应用中的双层优化(BLO)问题。
  • methods: 论文提供了BLO问题的基本概念,包括优化条件、标准算法和实践应用。
  • results: 论文讨论了最新的BLO理论发展,其应用效果和未来研究的限制。
    Abstract Recently, bi-level optimization (BLO) has taken center stage in some very exciting developments in the area of signal processing (SP) and machine learning (ML). Roughly speaking, BLO is a classical optimization problem that involves two levels of hierarchy (i.e., upper and lower levels), wherein obtaining the solution to the upper-level problem requires solving the lower-level one. BLO has become popular largely because it is powerful in modeling problems in SP and ML, among others, that involve optimizing nested objective functions. Prominent applications of BLO range from resource allocation for wireless systems to adversarial machine learning. In this work, we focus on a class of tractable BLO problems that often appear in SP and ML applications. We provide an overview of some basic concepts of this class of BLO problems, such as their optimality conditions, standard algorithms (including their optimization principles and practical implementations), as well as how they can be leveraged to obtain state-of-the-art results for a number of key SP and ML applications. Further, we discuss some recent advances in BLO theory, its implications for applications, and point out some limitations of the state-of-the-art that require significant future research efforts. Overall, we hope that this article can serve to accelerate the adoption of BLO as a generic tool to model, analyze, and innovate on a wide array of emerging SP and ML applications.
    摘要 近期,双层优化(BLO)在信号处理(SP)和机器学习(ML)领域备受关注。简言之,BLO是一个经典的优化问题,其包含两层层次结构(上下两层),其中解决上层问题需要解决下层问题。BLO受欢迎的原因在于它能够模型嵌入式目标函数的问题,如SP和ML应用中的资源分配和对抗学习等。在这篇文章中,我们将关注一类可解决的BLO问题,包括其优化条件、标准算法(包括优化原理和实践)以及如何使其在多个关键SP和ML应用中实现state-of-the-art结果。此外,我们还讨论了BLO理论的最新进展、其应用领域的影响和未来研究的限制。总之,我们希望通过这篇文章,加速BLO的采用,作为模型、分析和创新多种emerging SP和ML应用的一种通用工具。

Evaluating Spiking Neural Network On Neuromorphic Platform For Human Activity Recognition

  • paper_url: http://arxiv.org/abs/2308.00787
  • repo_url: None
  • paper_authors: Sizhen Bian, Michele Magno
  • for: 这个研究的目的是评估使用神经元网络处理器进行人体活动识别,以满足智能手表的能源效率和延迟时间的要求。
  • methods: 这个研究使用了多reshold delta modulation方法来将输入感应器资料转换为射频,然后将射频输入到神经元网络中进行训练。
  • results: 研究结果显示,使用射频识别系统可以与使用传统神经网络相比,实现二倍的能源延迟产品(0.66 \si{\micro\joule\second} vs. 1.32 \si{\micro\joule\second),并且实现了87.5%的准确率。
    Abstract Energy efficiency and low latency are crucial requirements for designing wearable AI-empowered human activity recognition systems, due to the hard constraints of battery operations and closed-loop feedback. While neural network models have been extensively compressed to match the stringent edge requirements, spiking neural networks and event-based sensing are recently emerging as promising solutions to further improve performance due to their inherent energy efficiency and capacity to process spatiotemporal data in very low latency. This work aims to evaluate the effectiveness of spiking neural networks on neuromorphic processors in human activity recognition for wearable applications. The case of workout recognition with wrist-worn wearable motion sensors is used as a study. A multi-threshold delta modulation approach is utilized for encoding the input sensor data into spike trains to move the pipeline into the event-based approach. The spikes trains are then fed to a spiking neural network with direct-event training, and the trained model is deployed on the research neuromorphic platform from Intel, Loihi, to evaluate energy and latency efficiency. Test results show that the spike-based workouts recognition system can achieve a comparable accuracy (87.5\%) comparable to the popular milliwatt RISC-V bases multi-core processor GAP8 with a traditional neural network ( 88.1\%) while achieving two times better energy-delay product (0.66 \si{\micro\joule\second} vs. 1.32 \si{\micro\joule\second}).
    摘要 “能源效率和延迟时间是设计智能穿戴式人体活动识别系统的关键要求,因为电池运作的硬性限制和关闭反馈loop。尽管神经网络模型已经广泛压缩以适应边缘的 Stringent requirements,脉冲神经网络和事件感知是最近几年出现的有前途的解决方案,因为它们的内生能效和能够在很低的延迟时间处理空间时间数据。本工作旨在评估使用神经元处理器在人体活动识别中的脉冲神经网络效果。使用腕上穿戴式运动传感器进行运动识别作为研究。使用多reshold delta 模ulation方法编码输入传感器数据为脉冲 trains,然后将脉冲 trains feed到一个直接事件培训的脉冲神经网络中。经过训练后,模型被部署到英特尔的 Loihi 研究神经元平台上进行评估能源和延迟效率。测试结果表明,使用脉冲工作识别系统可以达到相同的准确率(87.5%),与流行的 milliwatt RISC-V 基于多核心处理器 GAP8 的传统神经网络(88.1%)相比,而且可以两倍提高能源延迟产品(0.66 微\si{\joule\second} vs. 1.32 微\si{\joule\second)。”

DYMOND: DYnamic MOtif-NoDes Network Generative Model

  • paper_url: http://arxiv.org/abs/2308.00770
  • repo_url: https://github.com/zeno129/dymond
  • paper_authors: Giselle Zeno, Timothy La Fond, Jennifer Neville
  • for: 这个论文主要是为了提出一种基于动态模式的图structuredynamics Generative模型,以便更好地模型动态图的结构和节点行为。
  • methods: 该模型使用动态模式活动来捕捉图的变化,同时考虑每个节点在模式中所扮演的角色。
  • results: 与基于边扩展的基elines相比,该模型在真实的网络上更好地生成图结构和节点行为。此外,该paper还提出了一种新的方法来适应图结构度量来评估网络的时间方面。
    Abstract Motifs, which have been established as building blocks for network structure, move beyond pair-wise connections to capture longer-range correlations in connections and activity. In spite of this, there are few generative graph models that consider higher-order network structures and even fewer that focus on using motifs in models of dynamic graphs. Most existing generative models for temporal graphs strictly grow the networks via edge addition, and the models are evaluated using static graph structure metrics -- which do not adequately capture the temporal behavior of the network. To address these issues, in this work we propose DYnamic MOtif-NoDes (DYMOND) -- a generative model that considers (i) the dynamic changes in overall graph structure using temporal motif activity and (ii) the roles nodes play in motifs (e.g., one node plays the hub role in a wedge, while the remaining two act as spokes). We compare DYMOND to three dynamic graph generative model baselines on real-world networks and show that DYMOND performs better at generating graph structure and node behavior similar to the observed network. We also propose a new methodology to adapt graph structure metrics to better evaluate the temporal aspect of the network. These metrics take into account the changes in overall graph structure and the individual nodes' behavior over time.
    摘要 <>使用简化中文表示文本。<>网络结构中的模式,作为网络结构的基本构件,已经被证明可以捕捉更长距离的相关性。然而,有很少的生成图模型考虑高阶网络结构,而且这些模型几乎都是通过边添加来生成网络。这些模型通常被评估使用静止图结构指标,这些指标不能准确捕捉网络的时间性行为。为解决这些问题,在这项工作中,我们提出了动态模式无核(DYMOND)生成模型。DYMOND模型考虑了(i)动态变化的总图结构使用时间模式活动,以及(ii)节点在模式中所扮演的角色(例如,一个节点在wedgel中扮演核心角色,剩下两个节点扮演螺旋的角色)。我们比较了DYMOND模型与三种动态图生成模型基线在真实网络上的性能,并显示DYMOND模型在生成图结构和节点行为方面表现出色,能够更好地生成与观察网络相似的图结构和节点行为。我们还提出了一种新的方法来适应图结构指标的改进,这些指标考虑了网络结构的变化和每个节点在时间上的行为。

Self-Supervised Contrastive BERT Fine-tuning for Fusion-based Reviewed-Item Retrieval

  • paper_url: http://arxiv.org/abs/2308.00762
  • repo_url: https://github.com/d3mlab/rir_data
  • paper_authors: Mohammad Mahdi Abdollah Pour, Parsa Farinneya, Armin Toroghi, Anton Korikov, Ali Pesaranghader, Touqir Sajed, Manasa Bharadwaj, Borislav Mavrin, Scott Sanner
  • for: 本研究旨在提高Neural Information Retrieval(IR)方法对 Reviewed-Item Retrieval(RIR)任务的表现,包括使用自我超vised方法进行对律学习BERT表示的扩展。
  • methods: 本研究使用了自我超vised方法进行对律学习BERT表示,包括选择积极和消极样本,以及使用锚点抽样和元数据进行增强。
  • results: 实验结果显示,使用Late Fusion方法进行对律学习BERT表示的Neural RIR方法,在对律学习BERT表示的Neural IR和稀谱基eline上进行比较,具有最高的表现。
    Abstract As natural language interfaces enable users to express increasingly complex natural language queries, there is a parallel explosion of user review content that can allow users to better find items such as restaurants, books, or movies that match these expressive queries. While Neural Information Retrieval (IR) methods have provided state-of-the-art results for matching queries to documents, they have not been extended to the task of Reviewed-Item Retrieval (RIR), where query-review scores must be aggregated (or fused) into item-level scores for ranking. In the absence of labeled RIR datasets, we extend Neural IR methodology to RIR by leveraging self-supervised methods for contrastive learning of BERT embeddings for both queries and reviews. Specifically, contrastive learning requires a choice of positive and negative samples, where the unique two-level structure of our item-review data combined with meta-data affords us a rich structure for the selection of these samples. For contrastive learning in a Late Fusion scenario, we investigate the use of positive review samples from the same item and/or with the same rating, selection of hard positive samples by choosing the least similar reviews from the same anchor item, and selection of hard negative samples by choosing the most similar reviews from different items. We also explore anchor sub-sampling and augmenting with meta-data. For a more end-to-end Early Fusion approach, we introduce contrastive item embedding learning to fuse reviews into single item embeddings. Experimental results show that Late Fusion contrastive learning for Neural RIR outperforms all other contrastive IR configurations, Neural IR, and sparse retrieval baselines, thus demonstrating the power of exploiting the two-level structure in Neural RIR approaches as well as the importance of preserving the nuance of individual review content via Late Fusion methods.
    摘要 随着自然语言界面的发展,用户可以提出越来越复杂的自然语言查询,这导致了用户评论内容的激增,从而帮助用户更好地找到如饭店、书籍或电影等匹配查询。而神经信息检索(Neural IR)方法已经提供了状态的检索结果,但它们没有扩展到评论检索(RIR)任务中,在这个任务中,查询评论得分需要被聚合(或融合)到项目级别上。由于没有标注的 RIR 数据集,我们扩展了神经 IR 方法到 RIR 任务中,利用自动生成的 BERT 表示来进行自我超vised学习。具体来说,我们使用了对比学习来学习 BERT 表示。对比学习需要选择正例和负例样本,我们利用 item-review 数据集的两层结构,以及元数据,选择了有利的样本。我们 investigate了在晚期融合 scenario 中使用同一个 Item 的正例评论、同一个分数的正例评论、最不相似的 anchor Item 的负例评论和最相似的 anchor Item 的负例评论等方法来选择正例和负例样本。我们还 explore了 anchor 子采样和元数据增强。此外,我们还引入了对比项embedding学习来融合评论。实验结果表明,使用晚期融合对比学习的神经 RIR 比其他对比 IR 配置、神经 IR 和缺省检索基eline都高效,这表明了在神经 RIR 方法中利用两层结构的优势以及在融合评论内容时保持评论细节的重要性。

The Bias Amplification Paradox in Text-to-Image Generation

  • paper_url: http://arxiv.org/abs/2308.00755
  • repo_url: https://github.com/preethiseshadri518/bias-amplification-paradox
  • paper_authors: Preethi Seshadri, Sameer Singh, Yanai Elazar
  • for: 这 paper studies bias amplification in the text-to-image domain, specifically looking at gender-occupation biases in the training data (LAION).
  • methods: The authors use Stable Diffusion to compare gender ratios in the training data and the generated images, and they find that the model amplifies gender biases present in the training data. However, they also identify several confounding factors that contribute to this amplification.
  • results: The authors discover that the model appears to amplify gender biases, but this amplification can largely be attributed to discrepancies between training captions and model prompts. Once these distributional differences are accounted for, the amplification decreases considerably. The findings highlight the challenges of comparing biases in models and the data they are trained on.
    Abstract Bias amplification is a phenomenon in which models increase imbalances present in the training data. In this paper, we study bias amplification in the text-to-image domain using Stable Diffusion by comparing gender ratios in training vs. generated images. We find that the model appears to amplify gender-occupation biases found in the training data (LAION). However, we discover that amplification can largely be attributed to discrepancies between training captions and model prompts. For example, an inherent difference is that captions from the training data often contain explicit gender information while the prompts we use do not, which leads to a distribution shift and consequently impacts bias measures. Once we account for various distributional differences between texts used for training and generation, we observe that amplification decreases considerably. Our findings illustrate the challenges of comparing biases in models and the data they are trained on, and highlight confounding factors that contribute to bias amplification.
    摘要 “偏调增强”是一种现象,模型在训练数据中的偏调会增加。在这篇研究中,我们研究了在文本到图像领域中的偏调增强,使用稳定扩散比较训练和生成图像中的性别比。我们发现,模型似乎将训练数据中的性别职业偏调增强。但是,我们发现这些增强可以主要归因于训练描述和模型说明之间的分布差异。例如,训练数据中的描述通常包含直接的性别信息,而模型说明则不包含这些信息,这会导致分布差异和影响偏调测量。一旦我们考虑到不同的分布差异,我们发现增强的减少了许多。我们的发现显示了比较模型和训练数据中的偏调的问题,以及对偏调增强的混淆因素。

Learning from Hypervectors: A Survey on Hypervector Encoding

  • paper_url: http://arxiv.org/abs/2308.00685
  • repo_url: None
  • paper_authors: Sercan Aygun, Mehran Shoushtari Moghadam, M. Hassan Najafi, Mohsen Imani
  • For: 本研究 zeros in on HDC 系统输入和生成 гипер向量过程,直接影响 гипер向量编码过程。* Methods: 本研究将从不同研究中收集various methods for гипер向量生成,探讨它们的局限性、挑战和可能的利好。* Results: 通过全面探讨这些encoding type的各种应用,读者将获得深刻的理解 hybervector generation的多种类型和各种应用场景中的encoding过程。
    Abstract Hyperdimensional computing (HDC) is an emerging computing paradigm that imitates the brain's structure to offer a powerful and efficient processing and learning model. In HDC, the data are encoded with long vectors, called hypervectors, typically with a length of 1K to 10K. The literature provides several encoding techniques to generate orthogonal or correlated hypervectors, depending on the intended application. The existing surveys in the literature often focus on the overall aspects of HDC systems, including system inputs, primary computations, and final outputs. However, this study takes a more specific approach. It zeroes in on the HDC system input and the generation of hypervectors, directly influencing the hypervector encoding process. This survey brings together various methods for hypervector generation from different studies and explores the limitations, challenges, and potential benefits they entail. Through a comprehensive exploration of this survey, readers will acquire a profound understanding of various encoding types in HDC and gain insights into the intricate process of hypervector generation for diverse applications.
    摘要 高维ensional计算(HDC)是一种emerging计算模式,它模仿大脑的结构,提供了一种强大和高效的处理和学习模型。在HDC中,数据被编码为长向量,称为超vector,通常长度在1K到10K。文献中提供了多种编码技术,以生成正交或相关的超vector,具体取决于应用场景。现有的文献综述通常专注于HDC系统的总体方面,包括输入、基本计算和最终输出。但本研究采取了更加细化的方法。它关注HDC系统的输入和超vector编码过程,直接影响超vector生成过程。本调查集结了不同研究中的各种超vector生成方法,探讨它们的局限性、挑战和应用场景中的优势。通过全面探讨本调查,读者将获得各种编码类型在HDC中的深刻理解,并对超vector生成过程中的细节有深入的了解。

CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code

  • paper_url: http://arxiv.org/abs/2308.00683
  • repo_url: None
  • paper_authors: Nadezhda Chirkova, Sergey Troshin
  • for: investigate the effect of different subtokenization options for source code
  • methods: propose subtokenization that reduces average length by 17% without downstream performance drop, and show that a carefully chosen subtokenization may improve quality by 0.5-2%, possibly with some length increase.
  • results: identify most effective and length-efficient subtokenizations, taking into account code specifics.
    Abstract Recent works have widely adopted large language model pretraining for source code, suggested source code-specific pretraining objectives and investigated the applicability of various Transformer-based language model architectures for source code. This work investigates another important aspect of such models, namely the effect of different subtokenization options, and aims at identifying most effective and length-efficient subtokenizations, taking into account code specifics. We propose subtokenziation that reduces average length by 17% without downstream performance drop, and show that a carefully chosen subtokenization may improve quality by 0.5-2%, possibly with some length increase.
    摘要 近期研究广泛采用大型自然语言模型预训练 для源代码,建议源代码特有的预训练目标和Investigate了多种Transformer基于语言模型架构在源代码中的可行性。这个工作另一方面 investigate了这些模型中的另一个重要方面,即不同的子字符串选择方法的影响,并企图确定最有效和最短的子字符串选择方法,考虑代码特点。我们提议一种减少平均长度17%的子字符串选择方法,并显示一个合适的子字符串选择方法可能提高质量0.5-2%,可能具有一定的长度增加。

Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models

  • paper_url: http://arxiv.org/abs/2308.00675
  • repo_url: None
  • paper_authors: Cheng-Yu Hsieh, Si-An Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister
  • for: 提供一种新的工具使用方法,而不是通过示例来教导语言模型(LLM)使用新工具。
  • methods: 使用工具文档来替代示例,提供工具的使用描述以便LLM学习。
  • results: 研究发现,使用工具文档可以帮助LLM在不需要示例的情况下也能够正确使用工具,并且在实际场景中表现更好。
    Abstract Today, large language models (LLMs) are taught to use new tools by providing a few demonstrations of the tool's usage. Unfortunately, demonstrations are hard to acquire, and can result in undesirable biased usage if the wrong demonstration is chosen. Even in the rare scenario that demonstrations are readily available, there is no principled selection protocol to determine how many and which ones to provide. As tasks grow more complex, the selection search grows combinatorially and invariably becomes intractable. Our work provides an alternative to demonstrations: tool documentation. We advocate the use of tool documentation, descriptions for the individual tool usage, over demonstrations. We substantiate our claim through three main empirical findings on 6 tasks across both vision and language modalities. First, on existing benchmarks, zero-shot prompts with only tool documentation are sufficient for eliciting proper tool usage, achieving performance on par with few-shot prompts. Second, on a newly collected realistic tool-use dataset with hundreds of available tool APIs, we show that tool documentation is significantly more valuable than demonstrations, with zero-shot documentation significantly outperforming few-shot without documentation. Third, we highlight the benefits of tool documentations by tackling image generation and video tracking using just-released unseen state-of-the-art models as tools. Finally, we highlight the possibility of using tool documentation to automatically enable new applications: by using nothing more than the documentation of GroundingDino, Stable Diffusion, XMem, and SAM, LLMs can re-invent the functionalities of the just-released Grounded-SAM and Track Anything models.
    摘要 Translated into Simplified Chinese:今天,大型语言模型(LLM)通常通过提供一些示例来教育它们使用新工具。然而,示例很难获得,而且如果选择错误的示例,可能会导致不正确的使用。即使示例 readily available,也没有原则性的选择协议来确定多少和哪些提供。随着任务的复杂度增加,选择搜索会 combinatorially intractable 化。我们的工作强调使用工具文档,而不是示例。我们通过六个任务 across 视觉和语言模式来证明我们的主张。首先,在现有的benchmark上,只有工具文档的Zero-shot prompt是 sufficient для获得正确的工具使用,并且与几个shot prompt的性能相当。其次,我们收集了一个实际的工具使用数据集,包含了数百个可用的工具API,并示出了工具文档的优越性。第三,我们高亮了工具文档的优势,通过使用最新的领先技术模型作为工具来解决图像生成和视频跟踪等任务。最后,我们强调了使用工具文档来自动启用新应用程序:通过使用 GroundingDino、Stable Diffusion、XMem 和 SAM 的文档,LLMs 可以重新实现最新的 Grounded-SAM 和 Track Anything 模型的功能。

  • paper_url: http://arxiv.org/abs/2308.00733
  • repo_url: None
  • paper_authors: Mohammed Almutairi, Ozioma Collins Oguine
  • for: The paper aims to identify trending research areas in the field of Computer Science (CS) and investigate the factors contributing to their emergence.
  • methods: The authors use a comprehensive dataset comprising papers, citations, and funding information, and employ advanced machine learning techniques, including Decision Tree and Logistic Regression models, to predict trending research areas.
  • results: The analysis reveals that reference counts play a pivotal role in determining trending research areas, and the Logistic Regression model outperforms the Decision Tree model in predicting trends, with higher accuracy, precision, recall, and F1 score. The results provide valuable insights into the trending research areas and offer a data-driven foundation for decision-making and future research direction.Here are the three information points in Simplified Chinese text:
  • for: 这篇论文探讨了计算机科学(CS)领域当前热点研究领域,并研究这些热点研究领域的起源因素。
  • methods: 作者使用了一个包括论文、引用、资金信息的完整数据集,并使用高级机器学习技术,包括决策树和逻辑回归模型,预测热点研究领域。
  • results: 分析发现,引用计数(Reference Count)在确定热点研究领域中发挥了关键作用,并且 NSF 资金和专利在热点话题的影响逐渐增加。逻辑回归模型在预测热点领域方面表现出色,比决策树模型更高精度、精确性、回归率和 F1 分数。通过超过随机尝试基准点,我们的数据驱动方法表明更高的准确性和效率,可以为研究人员和机构提供数据驱动的基础 для决策和未来研究方向。
    Abstract This paper explores the current trending research areas in the field of Computer Science (CS) and investigates the factors contributing to their emergence. Leveraging a comprehensive dataset comprising papers, citations, and funding information, we employ advanced machine learning techniques, including Decision Tree and Logistic Regression models, to predict trending research areas. Our analysis reveals that the number of references cited in research papers (Reference Count) plays a pivotal role in determining trending research areas making reference counts the most relevant factor that drives trend in the CS field. Additionally, the influence of NSF grants and patents on trending topics has increased over time. The Logistic Regression model outperforms the Decision Tree model in predicting trends, exhibiting higher accuracy, precision, recall, and F1 score. By surpassing a random guess baseline, our data-driven approach demonstrates higher accuracy and efficacy in identifying trending research areas. The results offer valuable insights into the trending research areas, providing researchers and institutions with a data-driven foundation for decision-making and future research direction.
    摘要

eess.IV - 2023-08-02

CMUNeXt: An Efficient Medical Image Segmentation Network based on Large Kernel and Skip Fusion

  • paper_url: http://arxiv.org/abs/2308.01239
  • repo_url: https://github.com/FengheTan9/Medical-Image-Segmentation-Benchmarks
  • paper_authors: Fenghe Tang, Jianrui Ding, Lingtao Wang, Chunping Ning, S. Kevin Zhou
    for:CMUNeXt is designed for medical image segmentation, specifically for fast and accurate auxiliary diagnosis in real scene scenarios.methods:CMUNeXt uses a U-shaped architecture with large kernel and inverted bottleneck design, as well as the Skip-Fusion block to efficiently extract global context information and ensure ample feature fusion.results:CMUNeXt outperforms existing heavyweight and lightweight medical image segmentation networks in terms of segmentation performance, with faster inference speed, lighter weights, and reduced computational cost.Here’s the Chinese version:for:CMUNeXt 是为医疗图像分割设计,特指在真实场景中进行快速准确的辅助诊断。methods:CMUNeXt 使用 U 型架构,大kernel 和反向瓶颈设计,以及 Skip-Fusion 块来高效地提取全局上下文信息并确保较充分的特征融合。results:CMUNeXt 在多个医疗图像数据集上实现了比其他重量级和轻量级医疗图像分割网络更高的分割性能,同时具有更快的推理速度、较轻的权重和降低的计算成本。
    Abstract The U-shaped architecture has emerged as a crucial paradigm in the design of medical image segmentation networks. However, due to the inherent local limitations of convolution, a fully convolutional segmentation network with U-shaped architecture struggles to effectively extract global context information, which is vital for the precise localization of lesions. While hybrid architectures combining CNNs and Transformers can address these issues, their application in real medical scenarios is limited due to the computational resource constraints imposed by the environment and edge devices. In addition, the convolutional inductive bias in lightweight networks adeptly fits the scarce medical data, which is lacking in the Transformer based network. In order to extract global context information while taking advantage of the inductive bias, we propose CMUNeXt, an efficient fully convolutional lightweight medical image segmentation network, which enables fast and accurate auxiliary diagnosis in real scene scenarios. CMUNeXt leverages large kernel and inverted bottleneck design to thoroughly mix distant spatial and location information, efficiently extracting global context information. We also introduce the Skip-Fusion block, designed to enable smooth skip-connections and ensure ample feature fusion. Experimental results on multiple medical image datasets demonstrate that CMUNeXt outperforms existing heavyweight and lightweight medical image segmentation networks in terms of segmentation performance, while offering a faster inference speed, lighter weights, and a reduced computational cost. The code is available at https://github.com/FengheTan9/CMUNeXt.
    摘要 “U字型架构在医疗影像分类网络设计中扮演了关键角色。然而,由于卷积的本质性限制,具有U字型架构的完全卷积分类网络对于精确地Localization of lesions提供了有限的能力。而把CNN和Transformers混合在一起的混合架构,尽管可以解决这些问题,但在实际医疗情况下的Computational resource constraints和Edge devices上的应用受限。此外,卷积的预设假设适合医疗数据的缺乏,Transformer基于的网络将无法得到优化。为了提取全局背景信息而且利用卷积的预设,我们提出了CMUNeXt,一个高效的卷积类型医疗影像分类网络。CMUNeXt使用大kernel和倒置瓶颈设计,具有丰富的全局背景信息混合能力。我们还引入了Skip-Fusion层,以便实现稳定的skip-connection和丰富的特征融合。实验结果显示,CMUNeXt在多个医疗影像数据集上的分类性能高于现有的重量级和轻量级医疗影像分类网络,同时具有更快的推断速度、较轻的条件和reduced computational cost。代码可以在https://github.com/FengheTan9/CMUNeXt中找到。”

High-efficient deep learning-based DTI reconstruction with flexible diffusion gradient encoding scheme

  • paper_url: http://arxiv.org/abs/2308.01173
  • repo_url: None
  • paper_authors: Zejun Wu, Jiechao Wang, Zunquan Chen, Qinqin Yang, Shuhui Cai, Zhong Chen, Congbo Cai
  • for: 用于实现高效的Diffusion Tensor Reconstruction(DTI)方法,以及evaluate this method的效果。
  • methods: 采用动态核函数来嵌入扩散梯度方向信息到相应的扩散信号特征图中,并实现了扩散梯度方向的通用化。
  • results: 相比其他方法,FlexDTI可以成功实现高质量的扩散tensor-derived变量,即使扩散梯度数量和方向是可变的。它提高了平均扩散率(PSNR)约10dB,相对于state-of-the-art的深度学习方法。
    Abstract Purpose: To develop and evaluate a novel dynamic-convolution-based method called FlexDTI for high-efficient diffusion tensor reconstruction with flexible diffusion encoding gradient schemes. Methods: FlexDTI was developed to achieve high-quality DTI parametric mapping with flexible number and directions of diffusion encoding gradients. The proposed method used dynamic convolution kernels to embed diffusion gradient direction information into feature maps of the corresponding diffusion signal. Besides, our method realized the generalization of a flexible number of diffusion gradient directions by setting the maximum number of input channels of the network. The network was trained and tested using data sets from the Human Connectome Project and a local hospital. Results from FlexDTI and other advanced tensor parameter estimation methods were compared. Results: Compared to other methods, FlexDTI successfully achieves high-quality diffusion tensor-derived variables even if the number and directions of diffusion encoding gradients are variable. It increases peak signal-to-noise ratio (PSNR) by about 10 dB on Fractional Anisotropy (FA) and Mean Diffusivity (MD), compared with the state-of-the-art deep learning method with flexible diffusion encoding gradient schemes. Conclusion: FlexDTI can well learn diffusion gradient direction information to achieve generalized DTI reconstruction with flexible diffusion gradient schemes. Both flexibility and reconstruction quality can be taken into account in this network.
    摘要 目的:开发和评估一种基于动态核函数的新方法,称为FlexDTI,用于高效精度Diffusion Tensor Imaging(DTI)重建。方法:FlexDTI通过嵌入扩散梯度方向信息到扩散信号特征图中来实现高质量DTI参数映射。此外,我们的方法实现了扩散梯度方向的可变性,通过设置网络的最大输入通道数来实现。网络通过使用数据集从人类连接组计划和本地医院进行训练和测试。与其他高级tensor参数估计方法相比,FlexDTI成功实现了变量扩散encoding梯度的高质量Diffusion Tensor-derived变量。它提高了平均扩散率(PSNR)约10dB,相比之前的深度学习方法。结论:FlexDTI可以很好地学习扩散梯度方向信息,实现了通用DTI重建。这个网络能够同时考虑灵活性和重建质量。

Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation

  • paper_url: http://arxiv.org/abs/2308.01147
  • repo_url: https://github.com/zgj77/fsacdm
  • paper_authors: Guojin Zhong, Jin Yuan, Pan Wang, Kailun Yang, Weili Guan, Zhiyong Li
  • for: 这篇论文是为了提高markup-to-image生成的性能而写的。
  • methods: 该论文提出了一种名为“增强对比扩散模型”(FSA-CDM)的新模型,该模型在markup-to-image生成中引入了对比性正例和负例,以提高性能。技术上,该模型采用了细致的交叉模式对接模块,以便更好地挖掘两种模式之间的序列相似性,从而学习Robust的特征表示。
  • results: 经验表明,提出的组件在四个不同领域的标准数据集上得到了显著的改进,相对于状态空间的表现提高约2%-12% DTW。
    Abstract The recently rising markup-to-image generation poses greater challenges as compared to natural image generation, due to its low tolerance for errors as well as the complex sequence and context correlations between markup and rendered image. This paper proposes a novel model named "Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment" (FSA-CDM), which introduces contrastive positive/negative samples into the diffusion model to boost performance for markup-to-image generation. Technically, we design a fine-grained cross-modal alignment module to well explore the sequence similarity between the two modalities for learning robust feature representations. To improve the generalization ability, we propose a contrast-augmented diffusion model to explicitly explore positive and negative samples by maximizing a novel contrastive variational objective, which is mathematically inferred to provide a tighter bound for the model's optimization. Moreover, the context-aware cross attention module is developed to capture the contextual information within markup language during the denoising process, yielding better noise prediction results. Extensive experiments are conducted on four benchmark datasets from different domains, and the experimental results demonstrate the effectiveness of the proposed components in FSA-CDM, significantly exceeding state-of-the-art performance by about 2%-12% DTW improvements. The code will be released at https://github.com/zgj77/FSACDM.
    摘要 Recently, markup-to-image生成技术面临更大的挑战,主要是由于 markup 和生成图像之间的错误忍容度低,同时序列和上下文相互关系复杂。这篇论文提出了一种新的模型名为“对比增强扩散模型 with 细腻模式对齐”(FSA-CDM),该模型在 markup-to-image 生成中提高表现力。从技术角度来说,我们设计了一种细腻的交叉模式对齐模块,以便充分探索两种模式之间的序列相似性,从而学习强健的特征表示。此外,我们还提出了一种对比增强扩散模型,以直接探索正面和负面样本,以提高模型的优化目标。此外,我们还开发了一种上下文意识听力模块,以捕捉 markup 语言中的上下文信息,从而实现更好的噪声预测结果。我们在四个不同领域的四个标准数据集上进行了广泛的实验,实验结果表明,我们提出的组件在 FSA-CDM 中具有显著的效果,与当前最佳状态报告相比,提高 DTW 值约 2%-12%。代码将在 GitHub 上发布,链接为

UCDFormer: Unsupervised Change Detection Using a Transformer-driven Image Translation

  • paper_url: http://arxiv.org/abs/2308.01146
  • repo_url: https://github.com/zhu-xlab/ucdformer
  • paper_authors: Qingsong Xu, Yilei Shi, Jianhua Guo, Chaojun Ouyang, Xiao Xiang Zhu
  • For: 本文提出了一种基于域shift的无监督变化检测方法,以解决 remote sensing 图像中的季节和风格差异问题。* Methods: 本文提出了一种基于 transformer 的变换图像翻译模型,以及一种新的可靠像素提取模块。* Results: 实验结果表明,对不同的无监督变化任务,UCDFormer 比其他相关方法提高了更多于 12% 的κ乘数性能。此外,UCDFormer 在考虑大规模应用时对地震Triggered 山崩检测表现出色。代码可以在 \url{https://github.com/zhu-xlab/UCDFormer} 上获取。
    Abstract Change detection (CD) by comparing two bi-temporal images is a crucial task in remote sensing. With the advantages of requiring no cumbersome labeled change information, unsupervised CD has attracted extensive attention in the community. However, existing unsupervised CD approaches rarely consider the seasonal and style differences incurred by the illumination and atmospheric conditions in multi-temporal images. To this end, we propose a change detection with domain shift setting for remote sensing images. Furthermore, we present a novel unsupervised CD method using a light-weight transformer, called UCDFormer. Specifically, a transformer-driven image translation composed of a light-weight transformer and a domain-specific affinity weight is first proposed to mitigate domain shift between two images with real-time efficiency. After image translation, we can generate the difference map between the translated before-event image and the original after-event image. Then, a novel reliable pixel extraction module is proposed to select significantly changed/unchanged pixel positions by fusing the pseudo change maps of fuzzy c-means clustering and adaptive threshold. Finally, a binary change map is obtained based on these selected pixel pairs and a binary classifier. Experimental results on different unsupervised CD tasks with seasonal and style changes demonstrate the effectiveness of the proposed UCDFormer. For example, compared with several other related methods, UCDFormer improves performance on the Kappa coefficient by more than 12\%. In addition, UCDFormer achieves excellent performance for earthquake-induced landslide detection when considering large-scale applications. The code is available at \url{https://github.com/zhu-xlab/UCDFormer}
    摘要 Change detection (CD) by comparing two bi-temporal images is a crucial task in remote sensing. With the advantages of not requiring cumbersome labeled change information, unsupervised CD has attracted extensive attention in the community. However, existing unsupervised CD approaches rarely consider the seasonal and style differences incurred by the illumination and atmospheric conditions in multi-temporal images. To address this challenge, we propose a change detection with domain shift setting for remote sensing images. Furthermore, we present a novel unsupervised CD method using a light-weight transformer, called UCDFormer.Specifically, we propose a transformer-driven image translation composed of a light-weight transformer and a domain-specific affinity weight to mitigate domain shift between two images with real-time efficiency. After image translation, we can generate the difference map between the translated before-event image and the original after-event image. Then, we propose a novel reliable pixel extraction module to select significantly changed/unchanged pixel positions by fusing the pseudo change maps of fuzzy c-means clustering and adaptive threshold. Finally, we obtain a binary change map based on these selected pixel pairs and a binary classifier.Experimental results on different unsupervised CD tasks with seasonal and style changes demonstrate the effectiveness of the proposed UCDFormer. For example, compared with several other related methods, UCDFormer improves performance on the Kappa coefficient by more than 12%. In addition, UCDFormer achieves excellent performance for earthquake-induced landslide detection when considering large-scale applications. The code is available at \url{https://github.com/zhu-xlab/UCDFormer}.Here's the translation in Traditional Chinese:改变检测(CD)通过比较两个 би时间图像是远程感知中的关键任务。不需要繁琐的标注改变信息,无监督CD已经吸引了社区广泛的关注。然而,现有的无监督CD方法 rarely 考虑了多图像中的季节和风格变化。为解决这个挑战,我们提出了基于域Shift的改变检测方法 для远程感知图像。此外,我们还提出了一种基于轻量级 transformer 的无监督CD方法,called UCDFormer。具体来说,我们提出了一种基于 transformer 的图像翻译方法,包括一个轻量级 transformer 和域特定的相互作用权重。通过这种方法,我们可以在实时效率下 Mitigate 多图像之间的域Shift。图像翻译后,我们可以生成原始事件图像和转换后事件图像之间的差异图。然后,我们提出了一种基于 pseudo change map 的可靠像素提取模块,通过融合杂化 c-means 分 clustering 和 adaptive threshold 来选择significantly 改变/不改变的像素位置。最后,我们通过这些选择的像素对生成一个二进制改变地图,并使用一个二进制分类器。实验结果表明,提出的 UCDFormer 在不同的无监督CD任务中具有优秀表现。比如,相比其他相关方法,UCDFormer 在卡普朗公式上提高了性能超过 12%。此外,UCDFormer 在考虑大规模应用时对地震引起的山崩检测也具有出色的表现。代码可以在 \url{https://github.com/zhu-xlab/UCDFormer} 上获取。

Learning Fourier-Constrained Diffusion Bridges for MRI Reconstruction

  • paper_url: http://arxiv.org/abs/2308.01096
  • repo_url: https://github.com/icon-lab/fdb
  • paper_authors: Muhammad U. Mirza, Onat Dalmaz, Hasan A. Bedel, Gokberk Elmas, Yilmaz Korkmaz, Alper Gungor, Salman UH Dar, Tolga Çukur
  • for: 加速MRI重建
  • methods: 含有扩散推论的扩散桥
  • results: 超过状态艺法重建方法表现
    Abstract Recent years have witnessed a surge in deep generative models for accelerated MRI reconstruction. Diffusion priors in particular have gained traction with their superior representational fidelity and diversity. Instead of the target transformation from undersampled to fully-sampled data, common diffusion priors are trained to learn a multi-step transformation from Gaussian noise onto fully-sampled data. During inference, data-fidelity projections are injected in between reverse diffusion steps to reach a compromise solution within the span of both the diffusion prior and the imaging operator. Unfortunately, suboptimal solutions can arise as the normality assumption of the diffusion prior causes divergence between learned and target transformations. To address this limitation, here we introduce the first diffusion bridge for accelerated MRI reconstruction. The proposed Fourier-constrained diffusion bridge (FDB) leverages a generalized process to transform between undersampled and fully-sampled data via random noise addition and random frequency removal as degradation operators. Unlike common diffusion priors that use an asymptotic endpoint based on Gaussian noise, FDB captures a transformation between finite endpoints where the initial endpoint is based on moderate degradation of fully-sampled data. Demonstrations on brain MRI indicate that FDB outperforms state-of-the-art reconstruction methods including conventional diffusion priors.
    摘要 近年来,深度生成模型在加速MRI重建方面得到了广泛应用。尤其是diffusion prior在其代表质量和多样性方面表现出色,因此得到了广泛应用。而不是目标变换从下采样到全样本数据,通用的diffusion prior通常是在从 Gaussian 噪声到全样本数据的多步变换上学习。在推理过程中,通过在反 diffusion 步骤中注入数据准确性投影来实现一个妥协解决方案,以达到在 diffusion prior 和成像运算符之间的妥协。然而,由于噪声假设导致的异常情况,这些解决方案可能不是最佳的。为此,我们在这里引入了首个加速MRI重建的扩展噪声桥(FDB)。提案的FDB利用一种扩展过程来将下采样数据转换成全样本数据,通过随机噪声添加和随机频率移除来实现噪声 Bridge 效果。与常见的扩展假设不同,FDB捕捉了一种将限定端点转换为有限端点,其初始端点基于部分降低全样本数据的噪声。Brain MRI 示例表明,FDB可以比州时间-平衡方法和常见重建方法更好地重建图像。

Push the Boundary of SAM: A Pseudo-label Correction Framework for Medical Segmentation

  • paper_url: http://arxiv.org/abs/2308.00883
  • repo_url: None
  • paper_authors: Ziyi Huang, Hongshan Liu, Haofeng Zhang, Fuyong Xing, Andrew Laine, Elsa Angelini, Christine Hendon, Yu Gan
  • for: 本研究旨在提高零样本学习 segmentation 的性能,特别是在医学影像 segmentation 领域, где 注意力点和专业知识要求较高。
  • methods: 本研究使用 Segment anything model (SAM),并提出了一种新的标签损害方法来提高 SAM 基于的 segmentation 性能。该方法通过检测标签的噪声来分 distinguish between clean labels and noisy labels,然后使用一种自适应 correction 模块来更正噪声标签,最终使用更新后的标签进行网络重新训练。
  • results: 研究结果表明,提出的方法可以在 X-ray 和肺 CT 数据集上提高 segmentation 精度,并超过基eline 方法在标签更正方面。
    Abstract Segment anything model (SAM) has emerged as the leading approach for zero-shot learning in segmentation, offering the advantage of avoiding pixel-wise annotation. It is particularly appealing in medical image segmentation where annotation is laborious and expertise-demanding. However, the direct application of SAM often yields inferior results compared to conventional fully supervised segmentation networks. While using SAM generated pseudo label could also benefit the training of fully supervised segmentation, the performance is limited by the quality of pseudo labels. In this paper, we propose a novel label corruption to push the boundary of SAM-based segmentation. Our model utilizes a novel noise detection module to distinguish between noisy labels from clean labels. This enables us to correct the noisy labels using an uncertainty-based self-correction module, thereby enriching the clean training set. Finally, we retrain the network with updated labels to optimize its weights for future predictions. One key advantage of our model is its ability to train deep networks using SAM-generated pseudo labels without relying on a subset of expert-level annotations. We demonstrate the effectiveness of our proposed model on both X-ray and lung CT datasets, indicating its ability to improve segmentation accuracy and outperform baseline methods in label correction.
    摘要 对于零条件学习分类 зада目标(SAM)已经成为领先的方法,它可以避免像Pixel-wise的标签。尤其在医疗影像分类中,标签是劳动密集且需要专业知识。然而,直接应用SAM经常会导致比于传统完全supervised分类网络较差的结果。使用SAM生成的伪标签也可以帮助完全supervised分类网络的训练,但是结果受到伪标签质量的限制。在这篇文章中,我们提出了一个新的标签损坏措施,我们的模型使用了一个新的噪音探测模组来分辨噪音标签和清洁标签。这使得我们可以通过uncertainty-based自我更正模组来更正噪音标签,从而丰富清洁训练集。最后,我们重新训练网络使用更新的标签,以便在未来预测中优化网络的 Parameters。我们的模型的一个关键优势是它可以透过SAM生成的伪标签进行深度网络的训练,不需要专业水平的标签。我们在X-ray和肺CT数据集上显示了我们的提案的效果,证明了它可以提高分类精度和超越基eline方法。

Decomposition Ascribed Synergistic Learning for Unified Image Restoration

  • paper_url: http://arxiv.org/abs/2308.00759
  • repo_url: None
  • paper_authors: Jinghao Zhang, Jie Huang, Man Zhou, Chongyi Li, Feng Zhao
  • for: 这篇论文旨在学习多种图像异常情况的纠正,以便在实际应用中更有效地处理图像。
  • methods: 该论文基于Singular Value Decomposition (SVD)的分析,通过将不同类型的图像异常情况分解成两类:singular vector dominated和singular value dominated,从而更好地利用不同类型的异常情况之间的关系,进而提高图像纠正的效果。
  • results: 实验结果表明,该方法在混合五种图像纠正任务中表现出色,包括雨晕图像纠正、霜点图像纠正、噪点图像纠正、抖擦图像纠正和低光照图像提高。
    Abstract Learning to restore multiple image degradations within a single model is quite beneficial for real-world applications. Nevertheless, existing works typically concentrate on regarding each degradation independently, while their relationship has been less exploited to ensure the synergistic learning. To this end, we revisit the diverse degradations through the lens of singular value decomposition, with the observation that the decomposed singular vectors and singular values naturally undertake the different types of degradation information, dividing various restoration tasks into two groups,\ie, singular vector dominated and singular value dominated. The above analysis renders a more unified perspective to ascribe the diverse degradations, compared to previous task-level independent learning. The dedicated optimization of degraded singular vectors and singular values inherently utilizes the potential relationship among diverse restoration tasks, attributing to the Decomposition Ascribed Synergistic Learning (DASL). Specifically, DASL comprises two effective operators, namely, Singular VEctor Operator (SVEO) and Singular VAlue Operator (SVAO), to favor the decomposed optimization, which can be lightly integrated into existing convolutional image restoration backbone. Moreover, the congruous decomposition loss has been devised for auxiliary. Extensive experiments on blended five image restoration tasks demonstrate the effectiveness of our method, including image deraining, image dehazing, image denoising, image deblurring, and low-light image enhancement.
    摘要 Translation notes:* "degradations" in the original text is translated as "质量下降" (quality downgrade) in Simplified Chinese, which is a more common term used in image processing tasks.* "singular value decomposition" is translated as "特征值分解" (feature value decomposition) in Simplified Chinese, which is a more direct translation of the original term.* "decomposed singular vectors" is translated as "分解的特征向量" (decomposed feature vectors) in Simplified Chinese, and "decomposed singular values" is translated as "分解的特征值" (decomposed feature values).* "task-level independent learning" is translated as "独立学习" (independent learning) in Simplified Chinese, which is a more direct translation of the original term.* "decomposition ascribed synergistic learning" is translated as "分解归一化学习" (decomposition unified learning) in Simplified Chinese, which is a more direct translation of the original term.* "singular vector operator" and "singular value operator" are translated as "特征向量操作器" (feature vector operator) and "特征值操作器" (feature value operator) in Simplified Chinese, respectively.* "congruous decomposition loss" is translated as "相同分解损失" (same decomposition loss) in Simplified Chinese, which is a more direct translation of the original term.

Phase Diverse Phase Retrieval for Microscopy: Comparison of Gaussian and Poisson Approaches

  • paper_url: http://arxiv.org/abs/2308.00734
  • repo_url: https://github.com/nikolajreiser/poissonphasediversity
  • paper_authors: Nikolaj Reiser, Min Guo, Hari Shroff, Patrick J. La Riviere
  • for: 这项研究旨在提高微镜像系统中的宽场瑕疵补做,并比较 Gaussian 和 Poisson 两种模型在微镜中的性能。
  • methods: 这项研究使用多张图像来估计微镜系统的 pupil 平面 phase 瑕疵,并解决优化问题来实现瑕疵补做。
  • results: 研究发现,Poisson 模型在各种情况下与 Gaussian 模型匹配或超越它,并且在低光强情况下表现更好。Poisson 算法也更鲁棒于空间不变瑕疵和相位噪声的影响。最后,研究比较了使用瑕疵补做和使用瑕疵扩散函数的扩散来实现图像更加清晰。
    Abstract Phase diversity is a widefield aberration correction method that uses multiple images to estimate the phase aberration at the pupil plane of an imaging system by solving an optimization problem. This estimated aberration can then be used to deconvolve the aberrated image or to reacquire it with aberration corrections applied to a deformable mirror. The optimization problem for aberration estimation has been formulated for both Gaussian and Poisson noise models but the Poisson model has never been studied in microscopy nor compared with the Gaussian model. Here, the Gaussian- and Poisson-based estimation algorithms are implemented and compared for widefield microscopy in simulation. The Poisson algorithm is found to match or outperform the Gaussian algorithm in a variety of situations, and converges in a similar or decreased amount of time. The Gaussian algorithm does perform better in low-light regimes when image noise is dominated by additive Gaussian noise. The Poisson algorithm is also found to be more robust to the effects of spatially variant aberration and phase noise. Finally, the relative advantages of re-acquisition with aberration correction and deconvolution with aberrated point spread functions are compared.
    摘要 “phas diversity是一种广场修正方法,使用多张图像来估算光学系统的相位偏移在 pupil 平面,通过解决优化问题。这个估算的偏移可以用来恢复 corrected 图像或者应用到可变 curvature 镜中。对于 Gaussian 和 Poisson 噪声模型,优化问题的解决方法已经被研究,但是 Poisson 模型从未在 Mikroskop 中被研究过,也从未与 Gaussian 模型进行比较。在这里,我们实现了 Gaussian- 和 Poisson-based 估算算法,并对宽场 Mikroskop 进行了模拟比较。Poisson 算法在多种情况下匹配或超越 Gaussian 算法,并在相同或更短的时间内 converges。Gaussian 算法在低光度条件下,当图像噪声主要是 additive Gaussian 噪声时,表现较好。Poisson 算法也较为具有修饰抗性和相位噪声的效果。最后,对于恢复 corrected 图像和使用偏移扩展点阵的扩展 PSF 进行了比较。”

cs.SD - 2023-08-01

Choir Transformer: Generating Polyphonic Music with Relative Attention on Transformer

  • paper_url: http://arxiv.org/abs/2308.02531
  • repo_url: None
  • paper_authors: Jiuyang Zhou, Hong Zhu, Xingping Wang
  • for: 本研究旨在提出一种用于多VOICE音乐生成的神经网络模型,以便更好地模型音乐结构。
  • methods: 我们提出了一种名为Choir Transformer的多VOICE音乐生成神经网络模型,使用相对位置注意来更好地建立音乐中距离较长的音符之间的关系。我们还提出了适合多VOICE音乐生成的音乐表示方式。
  • results: 实验结果表明,Choir Transformer的性能超过了之前最佳性能4.06%。我们还测试了多VOICE音乐中的和声指标,实验结果与巴赫的音乐几乎相同。在实际应用中,生成的旋律和节奏可以根据输入的指定进行调整,并且可以根据不同的音乐风格进行不同的推敲。
    Abstract Polyphonic music generation is still a challenge direction due to its correct between generating melody and harmony. Most of the previous studies used RNN-based models. However, the RNN-based models are hard to establish the relationship between long-distance notes. In this paper, we propose a polyphonic music generation neural network named Choir Transformer[ https://github.com/Zjy0401/choir-transformer], with relative positional attention to better model the structure of music. We also proposed a music representation suitable for polyphonic music generation. The performance of Choir Transformer surpasses the previous state-of-the-art accuracy of 4.06%. We also measures the harmony metrics of polyphonic music. Experiments show that the harmony metrics are close to the music of Bach. In practical application, the generated melody and rhythm can be adjusted according to the specified input, with different styles of music like folk music or pop music and so on.
    摘要 <>translate_language Simplified Chinese<>polyphonic music generation still challenging direction, due to correct between generating melody and harmony. previous studies mostly use RNN-based models, but hard to establish long-distance notes relationship. in this paper, we propose polyphonic music generation neural network named Choir Transformer[https://github.com/Zjy0401/choir-transformer], with relative positional attention better model music structure. we also propose suitable music representation for polyphonic music generation. Choir Transformer performance surpasses previous state-of-the-art accuracy 4.06%. we also measure polyphonic music harmony metrics, close to Bach's music. in practical application, generated melody and rhythm can be adjusted according to specified input, with different styles of music like folk music or pop music and so on.

Multi-goal Audio-visual Navigation using Sound Direction Map

  • paper_url: http://arxiv.org/abs/2308.00219
  • repo_url: None
  • paper_authors: Haru Kondoh, Asako Kanezaki
  • for: 这 paper 旨在提出一种新的多目标音频视觉导航任务框架,并 investigate 这任务 在不同情况下的难度。
  • methods: 该 paper 使用了一种名为“sound direction map” (SDM) 方法,可以在学习基础上动态地Localize 多个声音源,并使用过去记忆来减少难度。
  • results: 实验结果表明,SDM 方法可以帮助多种基eline 方法 obtina 更高的性能,无论目标数量如何。
    Abstract Over the past few years, there has been a great deal of research on navigation tasks in indoor environments using deep reinforcement learning agents. Most of these tasks use only visual information in the form of first-person images to navigate to a single goal. More recently, tasks that simultaneously use visual and auditory information to navigate to the sound source and even navigation tasks with multiple goals instead of one have been proposed. However, there has been no proposal for a generalized navigation task combining these two types of tasks and using both visual and auditory information in a situation where multiple sound sources are goals. In this paper, we propose a new framework for this generalized task: multi-goal audio-visual navigation. We first define the task in detail, and then we investigate the difficulty of the multi-goal audio-visual navigation task relative to the current navigation tasks by conducting experiments in various situations. The research shows that multi-goal audio-visual navigation has the difficulty of the implicit need to separate the sources of sound. Next, to mitigate the difficulties in this new task, we propose a method named sound direction map (SDM), which dynamically localizes multiple sound sources in a learning-based manner while making use of past memories. Experimental results show that the use of SDM significantly improves the performance of multiple baseline methods, regardless of the number of goals.
    摘要 在过去几年,deep reinforcement learning代理人在室内环境中完成导航任务得到了大量研究。大多数这些任务只使用视觉信息,即首人图像,导航到单个目标。然而,在最近,使用视觉和听觉信息导航到声源并导航多个目标的任务被提议。然而,没有任何提案将这两种任务总结并使用两种类型的信息在多个声源目标下进行导航。在这篇论文中,我们提出了一个新的框架:多目标听觉视觉导航。我们首先定义了这个任务,然后通过在不同情况下进行实验来评估多目标听觉视觉导航任务的难度。研究结果表明,多目标听觉视觉导航任务具有分离声音来源的隐式需求。接着,我们提出了一种名为声音方向地图(SDM)的方法,该方法在学习基础上动态地Localizes多个声源,同时利用过去的记忆。实验结果表明,使用SDM可以大幅提高多个基eline方法的性能,无论目标数量如何。

DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.00122
  • repo_url: None
  • paper_authors: Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu
  • for: 提出了一种基于扩散模型的音视频分离框架,用于解决音视频 зву频源分离问题。
  • methods: 使用生成模型和分离U-Net synergize创建一个新的分离环境,从普通的普通分布开始,通过conditioning both the audio mixture和视频特征来生成分离的音频。
  • results: 与现有的状态对照方法相比,DAVIS在MUSIC dataset和AVE dataset上表现出色, separation质量更高,demonstrating the advantages of our framework for tackling the audio-visual source separation task。
    Abstract We propose DAVIS, a Diffusion model-based Audio-VIusal Separation framework that solves the audio-visual sound source separation task through a generative manner. While existing discriminative methods that perform mask regression have made remarkable progress in this field, they face limitations in capturing the complex data distribution required for high-quality separation of sounds from diverse categories. In contrast, DAVIS leverages a generative diffusion model and a Separation U-Net to synthesize separated magnitudes starting from Gaussian noises, conditioned on both the audio mixture and the visual footage. With its generative objective, DAVIS is better suited to achieving the goal of high-quality sound separation across diverse categories. We compare DAVIS to existing state-of-the-art discriminative audio-visual separation methods on the domain-specific MUSIC dataset and the open-domain AVE dataset, and results show that DAVIS outperforms other methods in separation quality, demonstrating the advantages of our framework for tackling the audio-visual source separation task.
    摘要 我们提出了DAVIS,一种基于扩散模型的音视频分离框架,用于通过生成方式解决音视频声音分离问题。而现有的探测方法在这个领域取得了很大的进步,但它们在处理多种类型声音分离时存在限制,不能够完美地捕捉声音分离的复杂数据分布。相比之下,DAVIS利用了一个生成扩散模型和一个分离U-Net,将始终为 Gaussian 噪声 synthesize 分离的大小,conditioned 于音乐混合和视频采集。由于其生成目标,DAVIS更适合实现高质量的声音分离 across 多种类型。我们与现有的状态对比了DAVIS 与其他 Audio-Visual 分离方法,并在域特定的 MUSIC dataset 和开放的 AVE dataset 上进行了比较,结果显示DAVIS 在分离质量方面与其他方法进行了比较,demonstrating 了我们的框架在声音分离任务中的优势。

eess.AS - 2023-08-01

Generative adversarial networks with physical sound field priors

  • paper_url: http://arxiv.org/abs/2308.00426
  • repo_url: https://github.com/xefonon/soundfieldgan
  • paper_authors: Xenofon Karakonstantis, Efren Fernandez-Grande
  • for: 这个论文提出了一种基于深度学习的声场重建方法,使用生成对抗网络(GANs)来捕捉室内声场的统计分布。
  • methods: 该方法使用平面波基础,利用室内声场的统计分布来准确重建声场从有限多个测量点的数据。
  • results: 试验结果表明,该方法可以在高频范围内具有更高的准确率和能量保留率,特别是在测量区域之外进行推断。此外,该方法可以适应不同的测量点和配置而不会影响性能。
    Abstract This paper presents a deep learning-based approach for the spatio-temporal reconstruction of sound fields using Generative Adversarial Networks (GANs). The method utilises a plane wave basis and learns the underlying statistical distributions of pressure in rooms to accurately reconstruct sound fields from a limited number of measurements. The performance of the method is evaluated using two established datasets and compared to state-of-the-art methods. The results show that the model is able to achieve an improved reconstruction performance in terms of accuracy and energy retention, particularly in the high-frequency range and when extrapolating beyond the measurement region. Furthermore, the proposed method can handle a varying number of measurement positions and configurations without sacrificing performance. The results suggest that this approach provides a promising approach to sound field reconstruction using generative models that allow for a physically informed prior to acoustics problems.
    摘要

Circumvent spherical Bessel function nulls for open sphere microphone arrays with physics informed neural network

  • paper_url: http://arxiv.org/abs/2308.00242
  • repo_url: None
  • paper_authors: Fei Ma, Thushara D. Abhayapala, Prasanga N. Samarasinghe
  • for: 提高开放球形微型麦克风数组(OSMA)的声场分析能力
  • methods: 使用物理学 informed neural network(PINN)模型OSMA测量和预测另一个圆柱体上的声场
  • results: 通过利用圆柱体半径不同导致圆柱体函散点变化,从预测中直接获得困难直接从OSMA测量中获得的声场系数
    Abstract Open sphere microphone arrays (OSMAs) are simple to design and do not introduce scattering fields, and thus can be advantageous than other arrays for implementing spatial acoustic algorithms under spherical model decomposition. However, an OSMA suffers from spherical Bessel function nulls which make it hard to obtain some sound field coefficients at certain frequencies. This paper proposes to assist an OSMA for sound field analysis with physics informed neural network (PINN). A PINN models the measurement of an OSMA and predicts the sound field on another sphere whose radius is different from that of the OSMA. Thanks to the fact that spherical Bessel function nulls vary with radius, the sound field coefficients which are hard to obtain based on the OSMA measurement directly can be obtained based on the prediction. Simulations confirm the effectiveness of this approach and compare it with the rigid sphere approach.
    摘要 Open sphere microphone arrays (OSMAs) 是容易设计的,不会产生散射场,因此可能比其他阵列更适合实现圆形声学模型的分解。然而,OSMA 受到圆形贝塞尔函数null的影响,使得在某些频率下获取声场各个分量的困难。这篇论文提议使用物理学 Informed Neural Network (PINN) 来帮助OSMA 进行声场分析。PINN 模型了 OSMA 的测量和另一个圆形壳声场的预测,并且通过利用圆形贝塞尔函数null 的变化,直接从预测中获取了difficult to obtain的声场各个分量。 simulated results confirm the effectiveness of this approach and compare it with the rigid sphere approach.

The role of vowel and consonant onsets in neural tracking of natural speech

  • paper_url: http://arxiv.org/abs/2308.00161
  • repo_url: None
  • paper_authors: Mohammad Jalilpour Monesi, Jonas Vanthornhout, Hugo Van hamme, Tom Francart
  • for: investigate how the auditory system processes natural speech
  • methods: used recorded EEG signals from 105 subjects while they listened to fairy tale stories, and related EEG to speech representations using forward modeling and match-mismatch tasks
  • results: vowel-consonant onsets outperform onsets of any phone in both tasks, suggesting that neural tracking of vowel vs. consonant exists in the EEG to some degree, and vowel (syllable nucleus) onsets are better related to EEG compared to syllable onsets.
    Abstract To investigate how the auditory system processes natural speech, models have been created to relate the electroencephalography (EEG) signal of a person listening to speech to various representations of the speech. Mainly the speech envelope has been used, but also phonetic representations. We investigated to which degree of granularity phonetic representations can be related to the EEG signal. We used recorded EEG signals from 105 subjects while they listened to fairy tale stories. We utilized speech representations, including onset of any phone, vowel-consonant onsets, broad phonetic class (BPC) onsets, and narrow phonetic class (NPC) onsets, and related them to EEG using forward modeling and match-mismatch tasks. In forward modeling, we used a linear model to predict EEG from speech representations. In the match-mismatch task, we trained a long short term memory (LSTM) based model to determine which of two candidate speech segments matches with a given EEG segment. Our results show that vowel-consonant onsets outperform onsets of any phone in both tasks, which suggests that neural tracking of the vowel vs. consonant exists in the EEG to some degree. We also observed that vowel (syllable nucleus) onsets are better related to EEG compared to syllable onsets. Finally, our findings suggest that neural tracking previously thought to be associated with broad phonetic classes might actually originate from vowel-consonant onsets rather than the differentiation between different phonetic classes.
    摘要 为了研究人类听说语言系统如何处理自然语言,我们创建了模型来关系听者在听说语言时的电生物学信号(EEG)与不同类型的语音表示。主要是使用语音封顶信号,还有 fonetic 表示。我们研究了这些表示与EEG信号之间的相对度。我们使用了105名参与者的记录的EEG信号,他们在听《童话》故事。我们使用了语音表示,包括任何 phone 的开始、元音-初始音和窄 fonetic 类(NPC) 开始,并将它们与EEG信号相关。在前向模型中,我们使用了一个线性模型预测EEG信号。在匹配-异样任务中,我们训练了一个基于长期短 памя真型(LSTM) 模型,以确定两个候选语音段中哪一个与给定的EEG段匹配。我们的结果显示,元音-初始音在两个任务中都高于任何 phone 开始,这表明在EEG信号中存在一定的元音-初始音跟踪。我们还发现,元音( syllable nucleus )开始在 EEG 信号中更好地相关,而 syllable 开始则不如那么好。最后,我们的发现表明,先前被认为与 Broad phonetic classes 相关的神经跟踪,实际上可能来自元音-初始音开始而不是不同的 fonetic 类。

An enhanced system for the detection and active cancellation of snoring signals

  • paper_url: http://arxiv.org/abs/2307.16809
  • repo_url: None
  • paper_authors: Valeria Bruschi, Michela Cantarini, Luca Serafini, Stefano Nobili, Stefania Cecchi, Stefano Squartini
  • for: 防止呼吸暂停疾病的影响,提高人们的社交和婚姻生活质量
  • methods: 使用卷积回归神经网络检测呼吸活动,采用延迟零带割分解法实现活动呼吸消除
  • results: 通过实验使用真实的呼吸信号,研究发现当呼吸活动检测阶段打开时,活动呼吸消除系统的性能更好,这说明了预先检测呼吸活动的效用性。
    Abstract Snoring is a common disorder that affects people's social and marital lives. The annoyance caused by snoring can be partially solved with active noise control systems. In this context, the present work aims at introducing an enhanced system based on the use of a convolutional recurrent neural network for snoring activity detection and a delayless subband approach for active snoring cancellation. Thanks to several experiments conducted using real snoring signals, this work shows that the active snoring cancellation system achieves better performance when the snoring activity detection stage is turned on, demonstrating the beneficial effect of a preliminary snoring detection stage in the perspective of snoring cancellation.
    摘要 呼吸困难是一种常见的呼吸疾病,影响人们的社交和 conjugal 生活。呼吸困难所引起的厌恶感可以通过活动噪声控制系统得到一定的缓解。在这个背景下,现在的工作旨在推出一种基于 convolutional recurrent neural network 的呼吸活动检测系统和无延迟子带方法。经过多次使用真实的呼吸信号进行实验,这个工作表明了该活动呼吸抑制系统在启用呼吸活动检测阶段时表现更好,从而证明了预先检测呼吸活动的效果对呼吸抑制有益。

cs.CV - 2023-08-01

ELFNet: Evidential Local-global Fusion for Stereo Matching

  • paper_url: http://arxiv.org/abs/2308.00728
  • repo_url: https://github.com/jimmy19991222/elfnet
  • paper_authors: Jieming Lou, Weide Liu, Zhuo Chen, Fayao Liu, Jun Cheng
  • for: 这篇论文目的是提出一种基于证据的本地含沟融合(ELF)框架,以便在 стерео匹配中实现不确定性估计和信任度感知。
  • methods: 该模型不仅预测了分辨率图,还预测了一个基于证据的分辨率图,以考虑 Aleatoric 和 Epistemic 不确定性。具体来说,使用ormal inverse-Gamma分布作为桥梁,实现了多级预测的内部证据融合和多视图信任度感知融合。
  • results: 实验结果表明,提出的框架可以有效利用多视图信息,并达到了当前最佳的总性能和跨领域泛化性。代码可以在 https://github.com/jimmy19991222/ELFNet 上下载。
    Abstract Although existing stereo matching models have achieved continuous improvement, they often face issues related to trustworthiness due to the absence of uncertainty estimation. Additionally, effectively leveraging multi-scale and multi-view knowledge of stereo pairs remains unexplored. In this paper, we introduce the \textbf{E}vidential \textbf{L}ocal-global \textbf{F}usion (ELF) framework for stereo matching, which endows both uncertainty estimation and confidence-aware fusion with trustworthy heads. Instead of predicting the disparity map alone, our model estimates an evidential-based disparity considering both aleatoric and epistemic uncertainties. With the normal inverse-Gamma distribution as a bridge, the proposed framework realizes intra evidential fusion of multi-level predictions and inter evidential fusion between cost-volume-based and transformer-based stereo matching. Extensive experimental results show that the proposed framework exploits multi-view information effectively and achieves state-of-the-art overall performance both on accuracy and cross-domain generalization. The codes are available at https://github.com/jimmy19991222/ELFNet.
    摘要 existing 3D matching models have achieved continuous improvement, but they often lack uncertainty estimation, which can lead to trust issues. In addition, effectively leveraging multi-scale and multi-view knowledge of stereo pairs remains unexplored. In this paper, we introduce the 证据based Local-global Fusion (ELF) framework for stereo matching, which includes both uncertainty estimation and confidence-aware fusion. Instead of predicting the disparity map alone, our model estimates an evidential-based disparity that considers both aleatoric and epistemic uncertainties. With the normal inverse-Gamma distribution as a bridge, the proposed framework realizes intra evidential fusion of multi-level predictions and inter evidential fusion between cost-volume-based and transformer-based stereo matching. Extensive experimental results show that the proposed framework effectively utilizes multi-view information and achieves state-of-the-art overall performance in both accuracy and cross-domain generalization.Here's the breakdown of the translation:1. existing 3D matching models (现有的3D匹配模型) - This refers to the existing stereo matching models that have been proposed in the literature.2. have achieved continuous improvement (已经实现了连续改进) - This means that these models have been constantly improved over time, leading to better performance.3. but they often lack uncertainty estimation (但它们经常缺乏不确定性估计) - This means that these models often do not provide uncertainty estimates, which can be a limitation.4. which can lead to trust issues (可能导致信任问题) - This means that the lack of uncertainty estimates can make it difficult to trust the results of the model.5. In addition (此外) - This introduces a new idea that is being proposed in the paper.6. effectively leveraging multi-scale and multi-view knowledge of stereo pairs (有效地利用多级和多视图的双眼匹配知识) - This means that the proposed framework aims to effectively use the information from multiple scales and multiple views of stereo pairs.7. remains unexplored (未探索) - This means that this idea has not been explored in previous research.8. In this paper (在这篇论文中) - This introduces the paper being presented.9. we introduce the 证据based Local-global Fusion (ELF) framework for stereo matching (我们介绍了证据based Local-global Fusion 双眼匹配框架) - This is the main contribution of the paper.10. which includes both uncertainty estimation and confidence-aware fusion (包括不确定性估计和自信感觉的融合) - This means that the proposed framework includes both uncertainty estimates and confidence-aware fusion.11. Instead of predicting the disparity map alone (而不是单独预测分割图) - This means that the proposed framework does not only predict the disparity map, but also includes other information.12. our model estimates an evidential-based disparity (我们的模型估计一种证据基于的分割) - This means that the proposed framework estimates the disparity based on evidence.13. considering both aleatoric and epistemic uncertainties (考虑到随机和知识不确定性) - This means that the proposed framework considers both types of uncertainties.14. With the normal inverse-Gamma distribution as a bridge (使用正常的 inverse-Gamma 分布作为桥梁) - This means that the proposed framework uses a specific distribution to connect the different components of the model.15. the proposed framework realizes intra evidential fusion of multi-level predictions (提案的框架实现了多级预测之间的证据内部融合) - This means that the proposed framework fuses the predictions from different levels using evidence.16. and inter evidential fusion between cost-volume-based and transformer-based stereo matching (以及在基于成本量和基于变换器的双眼匹配之间的证据间融合) - This means that the proposed framework fuses the predictions from different models using evidence.17. Extensive experimental results show that the proposed framework effectively utilizes multi-view information (广泛的实验结果表明提案的框架有效地利用多视图信息) - This means that the proposed framework effectively uses the information from multiple views.18. and achieves state-of-the-art overall performance in both accuracy and cross-domain generalization (并在精度和跨领域泛化性方面达到了顶尖性能) - This means that the proposed framework achieves state-of-the-art performance in both accuracy and cross-domain generalization.19. The codes are available at https://github.com/jimmy19991222/ELFNet (代码可以在https://github.com/jimmy19991222/ELFNet中获取) - This means that the codes for the proposed framework are available online.

NeRT: Implicit Neural Representations for General Unsupervised Turbulence Mitigation

  • paper_url: http://arxiv.org/abs/2308.00622
  • repo_url: None
  • paper_authors: Weiyun Jiang, Vivek Boominathan, Ashok Veeraraghavan
  • for: 提高大气和水层抗抖抖能力
  • methods: 利用偏函数神经网络和物理正确的倾斜后噪声模型重建不受抖抖影响的清晰图像,只需几十个扭曲输入图像
  • results: 比state-of-the-art高效,能够消除实际环境中的不控制抖抖,并在连续捕捉视频序列中实现48倍加速
    Abstract The atmospheric and water turbulence mitigation problems have emerged as challenging inverse problems in computer vision and optics communities over the years. However, current methods either rely heavily on the quality of the training dataset or fail to generalize over various scenarios, such as static scenes, dynamic scenes, and text reconstructions. We propose a general implicit neural representation for unsupervised atmospheric and water turbulence mitigation (NeRT). NeRT leverages the implicit neural representations and the physically correct tilt-then-blur turbulence model to reconstruct the clean, undistorted image, given only dozens of distorted input images. Moreover, we show that NeRT outperforms the state-of-the-art through various qualitative and quantitative evaluations of atmospheric and water turbulence datasets. Furthermore, we demonstrate the ability of NeRT to eliminate uncontrolled turbulence from real-world environments. Lastly, we incorporate NeRT into continuously captured video sequences and demonstrate $48 \times$ speedup.
    摘要 “大气和水层湍流问题在计算机视觉和光学社区中出现了许多年来,但当前的方法 Either rely heavily on the quality of the training dataset or fail to generalize over various scenarios, such as static scenes, dynamic scenes, and text reconstructions. We propose a general implicit neural representation for unsupervised atmospheric and water turbulence mitigation (NeRT). NeRT leverages the implicit neural representations and the physically correct tilt-then-blur turbulence model to reconstruct the clean, undistorted image, given only dozens of distorted input images. Moreover, we show that NeRT outperforms the state-of-the-art through various qualitative and quantitative evaluations of atmospheric and water turbulence datasets. Furthermore, we demonstrate the ability of NeRT to eliminate uncontrolled turbulence from real-world environments. Lastly, we incorporate NeRT into continuously captured video sequences and demonstrate $48 \times$ speedup.”Note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you prefer Traditional Chinese, please let me know and I can provide the translation in that form instead.

Adaptive Semantic Consistency for Cross-domain Few-shot Classification

  • paper_url: http://arxiv.org/abs/2308.00727
  • repo_url: None
  • paper_authors: Hengchu Lu, Yuanjie Shao, Xiang Wang, Changxin Gao
  • for: 这篇论文的目的是解决跨领域几少数分类(CD-FSC)中的挑战,即在几少数目标类上实现新的目标类别识别。
  • methods: 本文提出了一个简单的插件和测试框架ASC(适应Semantic Consistency),通过在调整阶段 reuse source图像,设计适应性负担分配策略,强调目标领域相似的标本,从source领域获取有用的知识,避免静止转移。
  • results: 实验结果显示,提出的ASC方法能够有效地提高跨领域Robustness,并且在多个benchmark上获得了相对的改善。
    Abstract Cross-domain few-shot classification (CD-FSC) aims to identify novel target classes with a few samples, assuming that there exists a domain shift between source and target domains. Existing state-of-the-art practices typically pre-train on source domain and then finetune on the few-shot target data to yield task-adaptive representations. Despite promising progress, these methods are prone to overfitting the limited target distribution since data-scarcity and ignore the transferable knowledge learned in the source domain. To alleviate this problem, we propose a simple plug-and-play Adaptive Semantic Consistency (ASC) framework, which improves cross-domain robustness by preserving source transfer capability during the finetuning stage. Concretely, we reuse the source images in the pretraining phase and design an adaptive weight assignment strategy to highlight the samples similar to target domain, aiming to aggregate informative target-related knowledge from source domain. Subsequently, a semantic consistency regularization is applied to constrain the consistency between the semantic features of the source images output by the source model and target model. In this way, the proposed ASC enables explicit transfer of source domain knowledge to prevent the model from overfitting the target domain. Extensive experiments on multiple benchmarks demonstrate the effectiveness of the proposed ASC, and ASC provides consistent improvements over the baselines. The source code will be released.
    摘要 cross-domain few-shot classification (CD-FSC) targets novel classes with few samples, assuming a domain shift between source and target domains. Existing state-of-the-art methods pre-train on the source domain and fine-tune on few-shot target data to obtain task-adaptive representations. However, these methods are prone to overfitting the limited target distribution and ignore transferable knowledge from the source domain.To address this issue, we propose a simple plug-and-play Adaptive Semantic Consistency (ASC) framework. During the finetuning stage, we reuse the source images from the pretraining phase and assign adaptive weights to highlight samples similar to the target domain. This aims to aggregate informative target-related knowledge from the source domain. Additionally, we apply a semantic consistency regularization to constrain the consistency between the semantic features of the source images output by the source model and the target model. This ensures the explicit transfer of source domain knowledge to prevent overfitting the target domain.Our extensive experiments on multiple benchmarks demonstrate the effectiveness of the proposed ASC, and it consistently outperforms the baselines. The source code will be released.

Explainable Cost-Sensitive Deep Neural Networks for Brain Tumor Detection from Brain MRI Images considering Data Imbalance

  • paper_url: http://arxiv.org/abs/2308.00608
  • repo_url: https://github.com/shahariar-shibli/explainable-cost-sensitive-deep-neural-networks-for-brain-tumor-detection-from-brain-mri-images
  • paper_authors: Md Tanvir Rouf Shawon, G. M. Shahariar Shibli, Farzad Ahmed, Sajib Kumar Saha Joy
  • for: 这个研究旨在使用卷积神经网络(CNN)、ResNet50、InceptionV3、EfficientNetB0和NASNetMobile模型提高脑肿诊断的效率,以减少手动审查报告的时间和创建一个自动化脑肿分类系统。
  • methods: 该研究提出了一个自动化管道,包括五种模型:CNN、ResNet50、InceptionV3、EfficientNetB0和NASNetMobile。研究人员对这些模型进行了精心的调整和训练,以便在均衡数据集上评估其性能。
  • results: 研究人员发现,在均衡数据集上,精心调整的InceptionV3模型可以达到99.33%的准确率。此外,Explainable AI方法也被包含在模型中,以可视化模型的隐藏行为,以便更好地理解其黑obox行为。在均衡数据集上,使用成本敏感神经网络(CS-InceptionV3和CS-CNN)可以达到92.31%的准确率和1.00的回归值。
    Abstract This paper presents a research study on the use of Convolutional Neural Network (CNN), ResNet50, InceptionV3, EfficientNetB0 and NASNetMobile models to efficiently detect brain tumors in order to reduce the time required for manual review of the report and create an automated system for classifying brain tumors. An automated pipeline is proposed, which encompasses five models: CNN, ResNet50, InceptionV3, EfficientNetB0 and NASNetMobile. The performance of the proposed architecture is evaluated on a balanced dataset and found to yield an accuracy of 99.33% for fine-tuned InceptionV3 model. Furthermore, Explainable AI approaches are incorporated to visualize the model's latent behavior in order to understand its black box behavior. To further optimize the training process, a cost-sensitive neural network approach has been proposed in order to work with imbalanced datasets which has achieved almost 4% more accuracy than the conventional models used in our experiments. The cost-sensitive InceptionV3 (CS-InceptionV3) and CNN (CS-CNN) show a promising accuracy of 92.31% and a recall value of 1.00 respectively on an imbalanced dataset. The proposed models have shown great potential in improving tumor detection accuracy and must be further developed for application in practical solutions. We have provided the datasets and made our implementations publicly available at - https://github.com/shahariar-shibli/Explainable-Cost-Sensitive-Deep-Neural-Networks-for-Brain-Tumor-Detection-from-Brain-MRI-Images
    摘要 Note: The text has been translated into Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore.

MonoNext: A 3D Monocular Object Detection with ConvNext

  • paper_url: http://arxiv.org/abs/2308.00596
  • repo_url: None
  • paper_authors: Marcelo Eduardo Pederiva, José Mario De Martino, Alessandro Zimmer
  • for: This paper aims to improve the accuracy and efficiency of Monocular 3D Object Detection models for autonomous driving perception tasks.
  • methods: The proposed method, called MonoNext, uses a spatial grid to map objects in the scene and employs the ConvNext network. It requires only 3D bounding box annotated data and is trained using a Multi-Tasking Learning approach.
  • results: In experiments with the KITTI dataset, MonoNext achieved high precision and competitive performance comparable with state-of-the-art approaches. With additional training data, MonoNext surpassed its initial performance and achieved even higher accuracies.Here’s the Chinese translation of the three points:
  • for: 这篇论文目标是提高自动驾驶视觉任务中的单目3D对象检测模型的准确率和效率。
  • methods: 提议的方法是MonoNext,它使用空间网格将场景中的对象映射,并使用ConvNext网络。它只需要3D包 bounds 注释数据,并通过多任务学习方法进行训练。
  • results: 在使用KITTI数据集进行实验时,MonoNext实现了高精度和与状态前方的性能,并且通过添加更多训练数据,MonoNext的性能超过了它的初始性能。
    Abstract Autonomous driving perception tasks rely heavily on cameras as the primary sensor for Object Detection, Semantic Segmentation, Instance Segmentation, and Object Tracking. However, RGB images captured by cameras lack depth information, which poses a significant challenge in 3D detection tasks. To supplement this missing data, mapping sensors such as LIDAR and RADAR are used for accurate 3D Object Detection. Despite their significant accuracy, the multi-sensor models are expensive and require a high computational demand. In contrast, Monocular 3D Object Detection models are becoming increasingly popular, offering a faster, cheaper, and easier-to-implement solution for 3D detections. This paper introduces a different Multi-Tasking Learning approach called MonoNext that utilizes a spatial grid to map objects in the scene. MonoNext employs a straightforward approach based on the ConvNext network and requires only 3D bounding box annotated data. In our experiments with the KITTI dataset, MonoNext achieved high precision and competitive performance comparable with state-of-the-art approaches. Furthermore, by adding more training data, MonoNext surpassed itself and achieved higher accuracies.
    摘要 自适应驾驶感知任务大量依赖于摄像头作为主要感知器,包括对象检测、semantic segmentation、实例 segmentation和对象跟踪。然而,RGB图像由摄像头捕获的缺乏深度信息,对3D检测任务带来了重要挑战。为了补充缺失的数据,映射感知器如LIDAR和RADAR被使用于高精度的3D对象检测。尽管它们具有显著的准确性,但是多感知模型具有高计算成本和高成本。相比之下,单目3D对象检测模型在不断增长,它们具有更快、更便宜、更容易实现的解决方案。本文介绍了一种不同的多任务学习方法called MonoNext,它使用空间网格将场景中的对象映射。MonoNext采用一种简单的approach基于ConvNext网络,只需要3D bounding box注释数据。在我们对KITTI数据集进行实验中,MonoNext实现了高精度和与状态的表现相当。此外,通过添加更多的训练数据,MonoNext超越了自己并实现了更高的准确性。

Latent-Shift: Gradient of Entropy Helps Neural Codecs

  • paper_url: http://arxiv.org/abs/2308.00725
  • repo_url: None
  • paper_authors: Muhammet Balcilar, Bharath Bhushan Damodaran, Karam Naser, Franck Galpin, Pierre Hellier
  • for: 这篇论文主要是为了提高图像/视频压缩技术的效率和质量。
  • methods: 这篇论文使用了可训练的编码器,利用了人工智能的学习能力来自适应性能和特定领域的高性能。
  • results: 该论文经验性地表明,在使用压缩器时可以利用解码器侧的梯度 entropy 来提高压缩率,从而实现 $1-2%$ 的压缩率下降。这种方法是独立于其他改进方法,并且不会影响图像/视频的质量。
    Abstract End-to-end image/video codecs are getting competitive compared to traditional compression techniques that have been developed through decades of manual engineering efforts. These trainable codecs have many advantages over traditional techniques such as easy adaptation on perceptual distortion metrics and high performance on specific domains thanks to their learning ability. However, state of the art neural codecs does not take advantage of the existence of gradient of entropy in decoding device. In this paper, we theoretically show that gradient of entropy (available at decoder side) is correlated with the gradient of the reconstruction error (which is not available at decoder side). We then demonstrate experimentally that this gradient can be used on various compression methods, leading to a $1-2\%$ rate savings for the same quality. Our method is orthogonal to other improvements and brings independent rate savings.
    摘要 通过端到端图像/视频编码器来比较传统压缩技术,这些可编程编码器具有许多优势,如容易适应人为设计的质量指标和高性能在特定领域,归功于其学习能力。然而,当前领先的神经网络编码器没有利用解码器端的梯度Entropy gradient。在这篇论文中,我们理论上验证了解码器端梯度Entropy gradient与重建错误梯度之间的相关性。然后,我们通过实验表明,这个梯度可以在不同的压缩方法上使用,导致1-2%的比较率节省,同时保持相同的质量。我们的方法与其他改进方法独立,具有独立的率节省。

Visibility Enhancement for Low-light Hazy Scenarios

  • paper_url: http://arxiv.org/abs/2308.00591
  • repo_url: None
  • paper_authors: Chaoqun Zhuang, Yunfei Liu, Sijia Wen, Feng Lu
  • for: 增强低光照晦涂图像的可见度
  • methods: 提议两种关键技术:一种是跨任务一致性推准权重框架,另一种是基于物理学习低光照晦涂数据集的物理学习模型
  • results: 对多个指标(包括SSIM(9.19%)和PSNR(5.03%))进行了广泛的实验,并且通过用户研究表明了人类视觉效果的必要性和有效性。
    Abstract Low-light hazy scenes commonly appear at dusk and early morning. The visual enhancement for low-light hazy images is an ill-posed problem. Even though numerous methods have been proposed for image dehazing and low-light enhancement respectively, simply integrating them cannot deliver pleasing results for this particular task. In this paper, we present a novel method to enhance visibility for low-light hazy scenarios. To handle this challenging task, we propose two key techniques, namely cross-consistency dehazing-enhancement framework and physically based simulation for low-light hazy dataset. Specifically, the framework is designed for enhancing visibility of the input image via fully utilizing the clues from different sub-tasks. The simulation is designed for generating the dataset with ground-truths by the proposed low-light hazy imaging model. The extensive experimental results show that the proposed method outperforms the SOTA solutions on different metrics including SSIM (9.19%) and PSNR(5.03%). In addition, we conduct a user study on real images to demonstrate the effectiveness and necessity of the proposed method by human visual perception.
    摘要 低光照朦胧场景通常出现在晚上和早上。这种视觉提升低光照朦胧图像是一个不定系统的问题。虽然已有许多方法提出了用于图像抑霾和低光照提升,但直接组合这些方法并不能提供满意的结果。在这篇论文中,我们提出了一种新的方法,用于提高低光照朦胧场景的可见度。为解决这个复杂的任务,我们提出了两个关键技术:一是跨任务一致性抑霾提升框架,二是基于物理模型生成的低光照朦胧数据集。具体来说,框架是用于通过完全利用不同任务的各种各样的准确信息来提高输入图像的可见度。数据集是通过我们提出的低光照朦胧摄像机模型来生成的。我们的广泛的实验结果表明,我们的方法在不同的指标上(包括SSIM(9.19%)和PSNR(5.03%))都超过了现有的标准方法。此外,我们还进行了基于真实图像的用户研究,以证明我们的方法的有效性和必要性。

Relation-Aware Distribution Representation Network for Person Clustering with Multiple Modalities

  • paper_url: http://arxiv.org/abs/2308.00588
  • repo_url: None
  • paper_authors: Kaijian Liu, Shixiang Tang, Ziyue Li, Zhishuai Li, Lei Bai, Feng Zhu, Rui Zhao
  • for: 本文旨在提出一种基于关系意识的分布表示网络(RAD-Net),用于Movie parsing和identity-based Movie editing中的人群化。
  • methods: 本文提出一种基于图构建分布表示和周期更新策略来生成一个模态快意识的分布表示。
  • results: 对于Video Person-Clustering Dataset(VPCD)和VoxCeleb2多视图 clustering dataset,本文的方法实现了+6%和+8.2%的提升(F-score)。
    Abstract Person clustering with multi-modal clues, including faces, bodies, and voices, is critical for various tasks, such as movie parsing and identity-based movie editing. Related methods such as multi-view clustering mainly project multi-modal features into a joint feature space. However, multi-modal clue features are usually rather weakly correlated due to the semantic gap from the modality-specific uniqueness. As a result, these methods are not suitable for person clustering. In this paper, we propose a Relation-Aware Distribution representation Network (RAD-Net) to generate a distribution representation for multi-modal clues. The distribution representation of a clue is a vector consisting of the relation between this clue and all other clues from all modalities, thus being modality agnostic and good for person clustering. Accordingly, we introduce a graph-based method to construct distribution representation and employ a cyclic update policy to refine distribution representation progressively. Our method achieves substantial improvements of +6% and +8.2% in F-score on the Video Person-Clustering Dataset (VPCD) and VoxCeleb2 multi-view clustering dataset, respectively. Codes will be released publicly upon acceptance.
    摘要 人群划分使用多modal的迹象,包括脸、身体和嗓音,是许多任务的关键,如电影分析和基于身份的电影编辑。相关的方法,如多视图划分,通常将多modal的特征项目到共同的特征空间。然而,多modal的迹象特征通常强度不高,因为模式特有的唯一性导致这些方法不适用于人群划分。在这篇论文中,我们提出了关注关系的分布表示网络(RAD-Net),用于生成多modal迹象的分布表示。每个迹象的分布表示为所有modalities的迹象和这个迹象之间的关系,因此是无关modal的和适合人群划分的。我们还提出了基于图的方法来构建分布表示,并使用循环更新策略来进行分布表示的细化。我们的方法在VPCD和VoxCeleb2多视图划分集合上实现了+6%和+8.2%的提升。代码将在接受后公开发布。

PVG: Progressive Vision Graph for Vision Recognition

  • paper_url: http://arxiv.org/abs/2308.00574
  • repo_url: None
  • paper_authors: Jiafu Wu, Jian Li, Jiangning Zhang, Boshen Zhang, Mingmin Chi, Yabiao Wang, Chengjie Wang
  • for: 这篇论文目的是提出一种Progressive Vision Graph(PVG)架构,用于解决图像识别任务中的不规则对象捕捉问题。
  • methods: 该架构包括三个主要组件:1)分层分离图构建(PSGC),用于逐渐增加全局图支持的通道数和减少本地支持的通道数,以获得更好的二级相似性信息;2)使用Max pooling和数学期望(MaxE)来归并丰富的邻居信息;3)图错Linear Unit(GraphLU),用于增强低值信息,以降低图像细节信息压缩,避免过度整合。
  • results: 对主流 benchmark 进行了广泛的实验,显示了PVG在比对 estado-of-the-art 方法时的优势。例如,我们的PVG-S在ImageNet-1K上得到了83.0%的Top-1准确率,比GNN-based ViG-S高出+0.9,同时参数减少18.5%。而最大的PVG-B准确率达到84.2%,高于ViG-B的+0.5。此外,PVG-S在COCO dataset上得到了+1.3 box AP和+0.4 mask AP的提升。
    Abstract Convolution-based and Transformer-based vision backbone networks process images into the grid or sequence structures, respectively, which are inflexible for capturing irregular objects. Though Vision GNN (ViG) adopts graph-level features for complex images, it has some issues, such as inaccurate neighbor node selection, expensive node information aggregation calculation, and over-smoothing in the deep layers. To address the above problems, we propose a Progressive Vision Graph (PVG) architecture for vision recognition task. Compared with previous works, PVG contains three main components: 1) Progressively Separated Graph Construction (PSGC) to introduce second-order similarity by gradually increasing the channel of the global graph branch and decreasing the channel of local branch as the layer deepens; 2) Neighbor nodes information aggregation and update module by using Max pooling and mathematical Expectation (MaxE) to aggregate rich neighbor information; 3) Graph error Linear Unit (GraphLU) to enhance low-value information in a relaxed form to reduce the compression of image detail information for alleviating the over-smoothing. Extensive experiments on mainstream benchmarks demonstrate the superiority of PVG over state-of-the-art methods, e.g., our PVG-S obtains 83.0% Top-1 accuracy on ImageNet-1K that surpasses GNN-based ViG-S by +0.9 with the parameters reduced by 18.5%, while the largest PVG-B obtains 84.2% that has +0.5 improvement than ViG-B. Furthermore, our PVG-S obtains +1.3 box AP and +0.4 mask AP gains than ViG-S on COCO dataset.
    摘要 “几何学基础的卷积网络和转换器基础的视觉后缘网络会将图像转化为格子或序列结构,这些结构不适合捕捉不规则的物体。尽管视觉图гра非线性(ViG)采用图гра级别特征来处理复杂图像,但它存在一些问题,如不准确的邻居节点选择、成本过高的邻居信息汇集计算和深层处理过程中的过滤。为解决以上问题,我们提出了进化的视觉图гра(PVG)架构 для视觉识别任务。相比前工作,PVG包含以下三个主要组成部分:1)逐层增强的全球图гра分支(PSGC),通过逐渐增加全球图гра分支的通道和减少本地分支的通道来引入二次相似性; 2)邻居节点信息汇集和更新模块,通过最大池化和数学期望(MaxE)来汇集丰富的邻居信息; 3)图гра梯度Linear Unit(GraphLU),以减少图像细节信息压缩,从而降低过滤。我们在主流的标准库上进行了广泛的实验,显示PVG在前一代方法之上具有优势,例如我们的PVG-S在ImageNet-1K上取得83.0%的Top-1准确率,比GNN基于ViG-S的82.1%高出0.9%,同时参数量减少18.5%。此外,我们的PVG-S在COCO dataset上对应的box AP和mask AP分别提高了+1.3和+0.4。”

Detecting Cloud Presence in Satellite Images Using the RGB-based CLIP Vision-Language Model

  • paper_url: http://arxiv.org/abs/2308.00541
  • repo_url: None
  • paper_authors: Mikolaj Czerkawski, Robert Atkinson, Christos Tachtatzis
  • for: 本研究用CLIP抽象语言模型来检测卫星图像中的云彩。
  • methods: 研究使用CLIP模型进行云存在检测的多种方法,包括文本提示的纯零shot操作以及多种微调方法。
  • results: CLIP模型可以在不同数据集和感知器类型(Sentinel-2和Landsat-8)上实现非rivial的云存在检测性能,并且可以泛化到不同的感知Modalities和感知频谱。
    Abstract This work explores capabilities of the pre-trained CLIP vision-language model to identify satellite images affected by clouds. Several approaches to using the model to perform cloud presence detection are proposed and evaluated, including a purely zero-shot operation with text prompts and several fine-tuning approaches. Furthermore, the transferability of the methods across different datasets and sensor types (Sentinel-2 and Landsat-8) is tested. The results that CLIP can achieve non-trivial performance on the cloud presence detection task with apparent capability to generalise across sensing modalities and sensing bands. It is also found that a low-cost fine-tuning stage leads to a strong increase in true negative rate. The results demonstrate that the representations learned by the CLIP model can be useful for satellite image processing tasks involving clouds.
    摘要 这项研究探讨了预训练的CLIP视觉语言模型在找到遮盖云图像时的能力。研究提出了使用文本提示进行零shot操作以及多种精度调整方法来实现云存在检测。此外,研究还测试了这些方法在不同的数据集和探测器类型(Sentinel-2和Landsat-8)之间的传输性。结果显示CLIP可以在云存在检测任务中实现非负性表现,并且表现出对探测Modalities和探测频谱的通用性。另外,一种低成本的精度调整阶段可以带来明显增加真正的零分数率。结果表明CLIP模型学习的表征可以对卫星图像处理任务中的云有用。

Visual attention information can be traced on cortical response but not on the retina: evidence from electrophysiological mouse data using natural images as stimuli

  • paper_url: http://arxiv.org/abs/2308.00526
  • repo_url: None
  • paper_authors: Nikos Melanitis, Konstantina Nikita
  • for: 这个研究探讨了视觉注意力的生物基础,以计算方式研究视觉注意力的生物基础。
  • methods: 研究使用了眼球和大脑电生物学数据来分析视觉注意力的生物基础。视觉刺激是自然图像,展示了真实世界场景。
  • results: 研究发现,在初级视觉层(V1)中,约10%的神经元响应不同于突出性视觉区域。视觉注意力信息不存在于眼球响应中,似乎眼球停留不了视觉注意力信息。
    Abstract Visual attention forms the basis of understanding the visual world. In this work we follow a computational approach to investigate the biological basis of visual attention. We analyze retinal and cortical electrophysiological data from mouse. Visual Stimuli are Natural Images depicting real world scenes. Our results show that in primary visual cortex (V1), a subset of around $10\%$ of the neurons responds differently to salient versus non-salient visual regions. Visual attention information was not traced in retinal response. It appears that the retina remains naive concerning visual attention; cortical response gets modulated to interpret visual attention information. Experimental animal studies may be designed to further explore the biological basis of visual attention we traced in this study. In applied and translational science, our study contributes to the design of improved visual prostheses systems -- systems that create artificial visual percepts to visually impaired individuals by electronic implants placed on either the retina or the cortex.
    摘要 视觉注意力是视觉世界理解的基础。在这项工作中,我们采用计算机方法研究生物基础的视觉注意力。我们分析了鼠脑和脊椎电physiological数据。视觉刺激是自然图像,描绘现实生活场景。我们的结果表明,在主视觉层(V1)中,约10%的神经元响应不同于突出 versus 非突出视觉区域。视觉注意力信息没有踪迹在Retina响应中。似乎retina免疫Visual attention信息; cortical response被修饰以解读视觉注意力信息。在生物基础研究中,我们可以通过进一步的实验动物研究来探索我们在这项研究中跟踪的生物基础。在应用和翻译科学中,我们的研究对于设计改进的视觉 prótesis系统做出了贡献,这些系统通过电子植入在Retina或cortex上为视障人群创造人工视觉感受。

NormKD: Normalized Logits for Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2308.00520
  • repo_url: https://github.com/gizi1/NormKD
  • paper_authors: Zhihao Chi, Tu Zheng, Hengjia Li, Zheng Yang, Boxi Wu, Binbin Lin, Deng Cai
  • for: 提高logit基于知识传递的性能,尤其是在温度参数的调整方面。
  • methods: 提出Normalized Knowledge Distillation(NormKD),通过自适应调整每个样本的温度来更好地传递样本特有的知识。
  • results: NormKD在CIRAR-100和ImageNet上的图像分类 task 中比vanilla KD更好,而且可以轻松地应用于其他logit基于方法,达到类似或更好的性能。
    Abstract Logit based knowledge distillation gets less attention in recent years since feature based methods perform better in most cases. Nevertheless, we find it still has untapped potential when we re-investigate the temperature, which is a crucial hyper-parameter to soften the logit outputs. For most of the previous works, it was set as a fixed value for the entire distillation procedure. However, as the logits from different samples are distributed quite variously, it is not feasible to soften all of them to an equal degree by just a single temperature, which may make the previous work transfer the knowledge of each sample inadequately. In this paper, we restudy the hyper-parameter temperature and figure out its incapability to distill the knowledge from each sample sufficiently when it is a single value. To address this issue, we propose Normalized Knowledge Distillation (NormKD), with the purpose of customizing the temperature for each sample according to the characteristic of the sample's logit distribution. Compared to the vanilla KD, NormKD barely has extra computation or storage cost but performs significantly better on CIRAR-100 and ImageNet for image classification. Furthermore, NormKD can be easily applied to the other logit based methods and achieve better performance which can be closer to or even better than the feature based method.
    摘要 这些年来,逻辑基本的知识传递几乎没有收到关注,因为基于特征的方法在大多数情况下表现更好。然而,我们发现这些逻辑基本方法仍然具有尚未发掘的潜力,尤其是在温度参数的重新调整方面。在前一些作品中,温度通常是 Fix 的整个知识传递程序中的一个固定值。然而,这些逻辑出力的不同样本之间的分布相当多元,因此将所有逻辑出力软化到相同的温度可能无法传递每个样本的知识充分。在这篇论文中,我们重新研究温度参数,发现它在传递每个样本的知识方面存在不足的缺陷。为了解决这个问题,我们提出 Normalized Knowledge Distillation(NormKD),将温度调整为每个样本的逻辑出力分布特点。相比于普通的 KD,NormKD 只有额外的计算或储存成本,但在 CIRAR-100 和 ImageNet 上的图像分类 task 中表现出色,与特征基本方法相比,其表现更好。此外,NormKD 可以轻松地应用到其他逻辑基本方法,并在这些方法上表现更好,甚至可以与特征基本方法相比。

Markerless human pose estimation for biomedical applications: a survey

  • paper_url: http://arxiv.org/abs/2308.00519
  • repo_url: None
  • paper_authors: Andrea Avogaro, Federico Cunico, Bodo Rosenhahn, Francesco Setti
  • for: 本研究旨在为医疗领域提供 markerless 人体姿态估计(HPE)的概述,并评估其在生物医学应用中的可能性。
  • methods: 本研究使用了多种 HPE 方法,包括深度学习、核积分析、模糊学习等,以及对这些方法的评估和比较。
  • results: 研究发现,HPE 技术在评估 двига功能、 neuromuscular rehabilitation 和步态分析等领域具有潜在的应用前景,并且可能成为远程医疗的一种重要工具。
    Abstract Markerless Human Pose Estimation (HPE) proved its potential to support decision making and assessment in many fields of application. HPE is often preferred to traditional marker-based Motion Capture systems due to the ease of setup, portability, and affordable cost of the technology. However, the exploitation of HPE in biomedical applications is still under investigation. This review aims to provide an overview of current biomedical applications of HPE. In this paper, we examine the main features of HPE approaches and discuss whether or not those features are of interest to biomedical applications. We also identify those areas where HPE is already in use and present peculiarities and trends followed by researchers and practitioners. We include here 25 approaches to HPE and more than 40 studies of HPE applied to motor development assessment, neuromuscolar rehabilitation, and gait & posture analysis. We conclude that markerless HPE offers great potential for extending diagnosis and rehabilitation outside hospitals and clinics, toward the paradigm of remote medical care.
    摘要 无标记人 pose 估计(HPE)在许多应用领域已经证明了其潜力,如运动科学、医学、心理学等。由于HPE技术的设置容易、可移植和成本低廉,因此经常被选择于标记型动作捕捉系统。然而,HPE在生物医学应用中的利用仍然在调查阶段。本文提供了生物医学应用中HPE的现状和趋势。我们分析了HPE方法的主要特点,并评估了这些特点是否有利于生物医学应用。我们还提出了HPE在 motor 发展评估、 neuromuscular rehabilitation 和步态分析等领域的应用。最后,我们结论 markerless HPE 有广泛的扩展 диагности和rehabilitation 的可能性,从而实现远程医疗的目标。

Relational Contrastive Learning for Scene Text Recognition

  • paper_url: http://arxiv.org/abs/2308.00508
  • repo_url: https://github.com/thundervvv/rclstr
  • paper_authors: Jinglei Zhang, Tiancheng Lin, Yi Xu, Kai Chen, Rui Zhang
  • for: 本研究旨在提高场景文本识别的自动化方法,通过 incorporating semantic priors from words 来提供有效的自我超vised标签,以便进行表征学学习。
  • methods: 本研究提出了一种名为 RCLSTR(关系对比学习)的框架,通过 rearrange、层次结构和交互来扩展文本关系,从而提高表征学学习的效果。
  • results: 实验表明,RCLSTR 方法可以在场景文本识别中提高表征质量,并且超越了当前的自动化 STR 方法。I hope this helps! Let me know if you have any further questions.
    Abstract Context-aware methods achieved great success in supervised scene text recognition via incorporating semantic priors from words. We argue that such prior contextual information can be interpreted as the relations of textual primitives due to the heterogeneous text and background, which can provide effective self-supervised labels for representation learning. However, textual relations are restricted to the finite size of dataset due to lexical dependencies, which causes the problem of over-fitting and compromises representation robustness. To this end, we propose to enrich the textual relations via rearrangement, hierarchy and interaction, and design a unified framework called RCLSTR: Relational Contrastive Learning for Scene Text Recognition. Based on causality, we theoretically explain that three modules suppress the bias caused by the contextual prior and thus guarantee representation robustness. Experiments on representation quality show that our method outperforms state-of-the-art self-supervised STR methods. Code is available at https://github.com/ThunderVVV/RCLSTR.
    摘要 Context-aware methods have achieved great success in supervised scene text recognition by incorporating semantic priors from words. We argue that such prior contextual information can be interpreted as the relations of textual primitives due to the heterogeneous text and background, which can provide effective self-supervised labels for representation learning. However, textual relations are restricted to the finite size of dataset due to lexical dependencies, which causes the problem of over-fitting and compromises representation robustness. To this end, we propose to enrich the textual relations via rearrangement, hierarchy, and interaction, and design a unified framework called RCLSTR: Relational Contrastive Learning for Scene Text Recognition. Based on causality, we theoretically explain that three modules suppress the bias caused by the contextual prior and thus guarantee representation robustness. Experiments on representation quality show that our method outperforms state-of-the-art self-supervised STR methods. Code is available at .

Improved Prognostic Prediction of Pancreatic Cancer Using Multi-Phase CT by Integrating Neural Distance and Texture-Aware Transformer

  • paper_url: http://arxiv.org/abs/2308.00507
  • repo_url: None
  • paper_authors: Hexin Dong, Jiawen Yao, Yuxing Tang, Mingze Yuan, Yingda Xia, Jian Zhou, Hong Lu, Jingren Zhou, Bin Dong, Le Lu, Li Zhang, Zaiyi Liu, Yu Shi, Ling Zhang
    for:这个论文主要是为了提高阈癌动脉细胞癌(PDAC)患者的预后预测。methods:这个论文使用了一种新的学习型神经距离,以描述不同患者的CT图像中 tumor 和附近重要血管之间的精确关系,并将其作为预后预测的重要特征。此外,这个论文还提出了一种基于 CNN 和 transformer 模块的多相照 CT 图像中动态肿瘤相关文本特征提取方法,以提高预后预测的准确性。results:对于 multi-center(n=4)数据集中的 1,070 名PDAC患者,经过广泛评估和比较,研究人员发现了提出的方法在外测集中(包括三个中心)的临床效果是 statistically significant(p<0.001),并且发现这个方法可以准确地预测PDAC患者的全身生存率。此外,研究人员还发现,在外测集中,这个方法是预后预测中最强的预测因素之一,并且有可能与已知的临床因素相结合,以选择需要neoadjuvant therapy的患者。
    Abstract Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer in which the tumor-vascular involvement greatly affects the resectability and, thus, overall survival of patients. However, current prognostic prediction methods fail to explicitly and accurately investigate relationships between the tumor and nearby important vessels. This paper proposes a novel learnable neural distance that describes the precise relationship between the tumor and vessels in CT images of different patients, adopting it as a major feature for prognosis prediction. Besides, different from existing models that used CNNs or LSTMs to exploit tumor enhancement patterns on dynamic contrast-enhanced CT imaging, we improved the extraction of dynamic tumor-related texture features in multi-phase contrast-enhanced CT by fusing local and global features using CNN and transformer modules, further enhancing the features extracted across multi-phase CT images. We extensively evaluated and compared the proposed method with existing methods in the multi-center (n=4) dataset with 1,070 patients with PDAC, and statistical analysis confirmed its clinical effectiveness in the external test set consisting of three centers. The developed risk marker was the strongest predictor of overall survival among preoperative factors and it has the potential to be combined with established clinical factors to select patients at higher risk who might benefit from neoadjuvant therapy.
    摘要 胆囊ductal adenocarcinoma (PDAC) 是一种高度致命的癌症,肿块-血管交叠对病人的手术可能性和全身存活率产生很大影响。然而,现有的预测方法不能准确地和肿块之间的关系进行详细调查。这篇论文提出了一种新的学习型神经距离,用于描述不同病人的 CT 图像中肿块和附近重要血管之间的精确关系,并将其作为预测方法的主要特征。此外,与现有模型使用 CNN 或 LSTM 挖掘肿块增强 Pattern 在动态增强 CT 图像中,我们改进了在多相增强 CT 图像中提取动态肿块相关的文本特征,并将本地和全局特征 fusion 使用 CNN 和 transformer 模块,进一步提高了在多相 CT 图像中提取的特征。我们对多中心(n=4)数据集中的 1,070 名PDAC患者进行了广泛的评估和比较,并统计分析表明了我们的方法在外部测试集(包括三个中心)中的临床效果。我们开发的风险标记是PDAC患者的全身存活率最强的预测因素之一,并且它有可能与已知的临床因素相结合,以选择可能需要neoadjuvanttherapy的患者。

An L2-Normalized Spatial Attention Network For Accurate And Fast Classification Of Brain Tumors In 2D T1-Weighted CE-MRI Images

  • paper_url: http://arxiv.org/abs/2308.00491
  • repo_url: https://github.com/juliadietlmeier/mri_image_classification
  • paper_authors: Grace Billingsley, Julia Dietlmeier, Vivek Narayanaswamy, Andreas Spanias, Noel E. OConnor
  • for: 这个论文是为了开发一种高精度和快速的脑肿图像分类网络,用于识别MRI图像中的脑肿。
  • methods: 该论文使用了一种l2-normalized spatial attention机制,用于防止过拟合 During 训练。
  • results: 对于一个复杂的2D T1-weighted CE-MRI数据集,该模型可以与所有轻量级方法相比,在准确性方面表现出色。 ensemble with the pretrained VGG16可以更高的准确率,但是需要增加执行速度。
    Abstract We propose an accurate and fast classification network for classification of brain tumors in MRI images that outperforms all lightweight methods investigated in terms of accuracy. We test our model on a challenging 2D T1-weighted CE-MRI dataset containing three types of brain tumors: Meningioma, Glioma and Pituitary. We introduce an l2-normalized spatial attention mechanism that acts as a regularizer against overfitting during training. We compare our results against the state-of-the-art on this dataset and show that by integrating l2-normalized spatial attention into a baseline network we achieve a performance gain of 1.79 percentage points. Even better accuracy can be attained by combining our model in an ensemble with the pretrained VGG16 at the expense of execution speed. Our code is publicly available at https://github.com/juliadietlmeier/MRI_image_classification
    摘要 我们提出了一种精度快速的分类网络,用于分类MRI图像中的脑肿瘤,超越了所有轻量级方法的精度。我们在一个复杂的2D T1束缩MRI数据集上测试了我们的模型,该数据集包含三种脑肿瘤:膝盖肿瘤、 glioma 和 hypophyseal。我们引入了L2正则化的空间注意力机制,以防止过拟合 durante el entrenamiento。我们与状态的报告进行比较,并显示通过将L2正则化的空间注意力integrated into a baseline network,我们在该数据集上 achievement 1.79 percentage points的性能提升。可以通过将我们的模型与预训练的VGG16 ensemble来实现更高的准确率,但是这将导致执行速度的降低。我们的代码可以在https://github.com/juliadietlmeier/MRI_image_classification中下载。

DINO-CXR: A self supervised method based on vision transformer for chest X-ray classification

  • paper_url: http://arxiv.org/abs/2308.00475
  • repo_url: None
  • paper_authors: Mohammadreza Shakouri, Fatemeh Iranmanesh, Mahdi Eftekhari
  • for: 这个研究是为了开发一种基于自我超vised学习的肺X射线分类方法,以解决医疗图像分析领域中数据limited的问题。
  • methods: 该方法基于一种自我超vised方法DINO,使用了一种视力 трансформер来进行肺X射线分类。
  • results: 研究表明,提议的方法可以在肺炎和COVID-19检测中达到比较好的效果,并且在需要更少标注数据的情况下表现出了比 state-of-the-art 方法更高的准确率。
    Abstract The limited availability of labeled chest X-ray datasets is a significant bottleneck in the development of medical imaging methods. Self-supervised learning (SSL) can mitigate this problem by training models on unlabeled data. Furthermore, self-supervised pretraining has yielded promising results in visual recognition of natural images but has not been given much consideration in medical image analysis. In this work, we propose a self-supervised method, DINO-CXR, which is a novel adaptation of a self-supervised method, DINO, based on a vision transformer for chest X-ray classification. A comparative analysis is performed to show the effectiveness of the proposed method for both pneumonia and COVID-19 detection. Through a quantitative analysis, it is also shown that the proposed method outperforms state-of-the-art methods in terms of accuracy and achieves comparable results in terms of AUC and F-1 score while requiring significantly less labeled data.
    摘要 限量的胸部X射线数据的可用性是医学影像分析领域的主要瓶颈。自我超级学习(SSL)可以减轻这个问题,通过训练无标注数据上的模型。然而,在医学图像分析领域中,自我超级学习还没有受到很多关注。在这项工作中,我们提出了一种自我超级学习方法,名为DINO-CXR,这是基于视力变换器的胸部X射线分类方法的一种新的适应。我们进行了比较分析,以示方法的效果在肺炎和COVID-19检测中。通过量化分析,我们也证明了提议方法在精度和AUC和F-1分数方面的表现比前任方法更佳,同时需要训练数据量更少。

Is Last Layer Re-Training Truly Sufficient for Robustness to Spurious Correlations?

  • paper_url: http://arxiv.org/abs/2308.00473
  • repo_url: None
  • paper_authors: Phuong Quynh Le, Jörg Schlötterer, Christin Seifert
  • for: 提高预测模型在具有相关特征的样本组中的准确率。
  • methods: 使用 Deep Feature Reweighting (DFR) 方法,重新训练分类模型的最后一层,使用小型、归一化的数据集来适应实际数据。
  • results: DFR 方法可以提高预测模型在具有相关特征的样本组中的准确率,但是它仍然可能受到偶极相关性的影响。
    Abstract Models trained with empirical risk minimization (ERM) are known to learn to rely on spurious features, i.e., their prediction is based on undesired auxiliary features which are strongly correlated with class labels but lack causal reasoning. This behavior particularly degrades accuracy in groups of samples of the correlated class that are missing the spurious feature or samples of the opposite class but with the spurious feature present. The recently proposed Deep Feature Reweighting (DFR) method improves accuracy of these worst groups. Based on the main argument that ERM mods can learn core features sufficiently well, DFR only needs to retrain the last layer of the classification model with a small group-balanced data set. In this work, we examine the applicability of DFR to realistic data in the medical domain. Furthermore, we investigate the reasoning behind the effectiveness of last-layer retraining and show that even though DFR has the potential to improve the accuracy of the worst group, it remains susceptible to spurious correlations.
    摘要 Empirical risk minimization (ERM) 模型受训练后可能会学习干扰特征,即其预测基于 auxiliary 特征和类别标签之间强相关性,但缺乏 causal 理解。这种行为尤其是在类别标签相关的样本组中具有干扰特征的欠落或相反类别标签的样本具有干扰特征时,预测精度会下降。 reciently proposed Deep Feature Reweighting (DFR) 方法可以提高这些最差的组的准确率。基于主要的 argumen that ERM 模型可以够好地学习核心特征,DFR 只需要在类别模型的最后一层进行小型、 balance 的数据集重新训练。在这项工作中,我们检查了 DFR 在医疗领域的实际数据上的适用性。此外,我们还investigated last-layer 重新训练的理解,并证明了 DFR 具有改善最差组的准确率的潜在能力,但它仍然受到干扰关系的影响。

A Deep Learning Approach for Virtual Contrast Enhancement in Contrast Enhanced Spectral Mammography

  • paper_url: http://arxiv.org/abs/2308.00471
  • repo_url: None
  • paper_authors: Aurora Rofena, Valerio Guarrasi, Marina Sarli, Claudia Lucia Piccolo, Matteo Sammarra, Bruno Beomonte Zobel, Paolo Soda
  • for: 这个研究是为了提高胸部摄像的检测精确度和安全性,使用深度生成模型来实现虚拟增强对CESM的应用。
  • methods: 这个研究使用了深度生成模型,包括自适应网络和两个生成对抗网络,将低能量图像转换为虚拟增强的混合图像。
  • results: 研究结果显示,使用CycleGAN生成模型可以生成高质量的虚拟混合图像,并且与临床专业人员的评价相符,显示这种方法具有潜在的应用前景。
    Abstract Contrast Enhanced Spectral Mammography (CESM) is a dual-energy mammographic imaging technique that first needs intravenously administration of an iodinated contrast medium; then, it collects both a low-energy image, comparable to standard mammography, and a high-energy image. The two scans are then combined to get a recombined image showing contrast enhancement. Despite CESM diagnostic advantages for breast cancer diagnosis, the use of contrast medium can cause side effects, and CESM also beams patients with a higher radiation dose compared to standard mammography. To address these limitations this work proposes to use deep generative models for virtual contrast enhancement on CESM, aiming to make the CESM contrast-free as well as to reduce the radiation dose. Our deep networks, consisting of an autoencoder and two Generative Adversarial Networks, the Pix2Pix, and the CycleGAN, generate synthetic recombined images solely from low-energy images. We perform an extensive quantitative and qualitative analysis of the model's performance, also exploiting radiologists' assessments, on a novel CESM dataset that includes 1138 images that, as a further contribution of this work, we make publicly available. The results show that CycleGAN is the most promising deep network to generate synthetic recombined images, highlighting the potential of artificial intelligence techniques for virtual contrast enhancement in this field.
    摘要 增强画像肿瘤成像(CESM)是一种双能量肿瘤成像技术,需要静脉注射iodinated contrast媒体后,先收集低能量图像,与标准肿瘤成像相同,然后与高能量图像组合得到重组图像,显示增强效果。 despite CESM的诊断优势,使用contrast媒体可能会导致副作用,同时CESM也会对病人辐射更高的辐射剂量。为解决这些限制,这项工作提出使用深度生成模型进行虚拟增强,以使CESM成为无contrast和减少辐射剂量的。我们的深度网络包括一个自适应网络和两个生成对抗网络,包括Pix2Pix和CycleGAN,通过将低能量图像转换成合成的重组图像。我们对新的CESM数据集进行了广泛的量化和质量分析,同时采用了放射学家的评估,包括1138张图像,并将其公开发布。结果表明CycleGAN是最有前途的深度网络,生成合成重组图像,强调人工智能技术在这个领域的潜力。

Center Contrastive Loss for Metric Learning

  • paper_url: http://arxiv.org/abs/2308.00458
  • repo_url: None
  • paper_authors: Bolun Cai, Pengfei Xiong, Shangxuan Tian
  • for: 提高图像embedding的推理能力和准确率
  • methods: 使用Center Contrastive Loss函数,维护类别中心银行,并将类别中心与查询数据点进行对比,以减少内类差异和提高对类差异
  • results: 使用标准网络(ResNet50)和提出的损失函数,实现图像embedding的状态 Footnote 1 和更快的收敛速度,见图1
    Abstract Contrastive learning is a major studied topic in metric learning. However, sampling effective contrastive pairs remains a challenge due to factors such as limited batch size, imbalanced data distribution, and the risk of overfitting. In this paper, we propose a novel metric learning function called Center Contrastive Loss, which maintains a class-wise center bank and compares the category centers with the query data points using a contrastive loss. The center bank is updated in real-time to boost model convergence without the need for well-designed sample mining. The category centers are well-optimized classification proxies to re-balance the supervisory signal of each class. Furthermore, the proposed loss combines the advantages of both contrastive and classification methods by reducing intra-class variations and enhancing inter-class differences to improve the discriminative power of embeddings. Our experimental results, as shown in Figure 1, demonstrate that a standard network (ResNet50) trained with our loss achieves state-of-the-art performance and faster convergence.
    摘要 <>将文本翻译成简化中文。<>研究主题:对比学习是 metric learning 中的一大研究领域。然而,选取有效对比对 remainschallenge 因为有限的批处理大小、数据分布不均衡和避免过拟合。在本文中,我们提出了一种新的 metric learning 函数 called Center Contrastive Loss,它保持一个类别中心银行并将查询数据点与类别中心进行对比,使用对比损失来训练模型。类别中心银行在实时更新,以促进模型的收敛,无需特别的样本挖掘。类别中心 acts as a well-optimized classification proxy,帮助平衡每个类的监督信号。此外,我们的损失函数结合了对比和分类方法的优点,减少了内类差异和提高了间类差异,以提高嵌入的推理力。我们的实验结果,如图1所示,显示了一个标准网络(ResNet50)通过我们的损失函数实现了状态略领先的性能和更快的收敛。

ViT2EEG: Leveraging Hybrid Pretrained Vision Transformers for EEG Data

  • paper_url: http://arxiv.org/abs/2308.00454
  • repo_url: https://github.com/ruiqirichard/eegeyenet-vit
  • paper_authors: Ruiqi Yang, Eric Modesitt
  • for: 这个研究用于应用混合式视觉transformer(ViT)模型,已经预训在ImageNet上,来解决电enzephalogram(EEG)预测任务。
  • methods: 这个模型使用了一个混合式ViT模型,在ImageNet上预训,然后为EEG数据进行精致调整。
  • results: 这个方法在EEG预测任务中表现出了明显的提升,比其他模型,包括一个相同架构的ViT模型,而且这些模型没有ImageNet的预训 weights。这个发现挑战了传统的模型通用性理论,表明transformer模型在不同的任务上可以提供有用的假设。
    Abstract In this study, we demonstrate the application of a hybrid Vision Transformer (ViT) model, pretrained on ImageNet, on an electroencephalogram (EEG) regression task. Despite being originally trained for image classification tasks, when fine-tuned on EEG data, this model shows a notable increase in performance compared to other models, including an identical architecture ViT trained without the ImageNet weights. This discovery challenges the traditional understanding of model generalization, suggesting that Transformer models pretrained on seemingly unrelated image data can provide valuable priors for EEG regression tasks with an appropriate fine-tuning pipeline. The success of this approach suggests that the features extracted by ViT models in the context of visual tasks can be readily transformed for the purpose of EEG predictive modeling. We recommend utilizing this methodology not only in neuroscience and related fields, but generally for any task where data collection is limited by practical, financial, or ethical constraints. Our results illuminate the potential of pretrained models on tasks that are clearly distinct from their original purpose.
    摘要 在这项研究中,我们展示了使用混合型视力变换器(ViT)模型,先前在ImageNet上预训练,在电энцеfalogram(EEG)预测任务中应用。尽管这种模型原本是用于图像分类任务,但当 fine-tuning 在 EEG 数据上时,这种模型显示了与其他模型相比 Notable 的性能提升,包括一个相同的架构 ViT 未使用 ImageNet 权重。这一发现挑战了传统的模型泛化理解,表明在 seemingly unrelated 图像数据上预训练 Transformer 模型可以为 EEG 预测任务提供有价值的先验知识。这种方法的成功表明了 ViT 模型在视觉任务上抽取的特征可以轻松地转换为 EEG 预测模型的目的。我们建议在 neuroscience 和相关领域不仅使用这种方法,而且在任务数据收集受限于实用、金融或道德因素的情况下,一般来说,这种方法可以广泛应用。我们的结果探讨了预训练模型在任务上的潜在可能性,并证明了这种方法在任务上的泛化能力。

A Majority Invariant Approach to Patch Robustness Certification for Deep Learning Models

  • paper_url: http://arxiv.org/abs/2308.00452
  • repo_url: https://github.com/kio-cs/majorcert
  • paper_authors: Qilin Zhou, Zhengyuan Wei, Haipeng Wang, W. K. Chan
  • for: 验证深度学习模型是否可以由特定的补丁修改预测标签。
  • methods: 使用 MajorCert 算法,首先找到同一个样本上同一个补丁区域可以操作的所有可能的标签集,然后对这些组合进行元素综合,最后检查元素稳定性是否保持不变以确认样本的合法性。
  • results: MajorCert 可以快速和高效地验证深度学习模型的patch robustness,并且可以验证样本是否可以被修改为预测不同的标签。
    Abstract Patch robustness certification ensures no patch within a given bound on a sample can manipulate a deep learning model to predict a different label. However, existing techniques cannot certify samples that cannot meet their strict bars at the classifier or patch region levels. This paper proposes MajorCert. MajorCert firstly finds all possible label sets manipulatable by the same patch region on the same sample across the underlying classifiers, then enumerates their combinations element-wise, and finally checks whether the majority invariant of all these combinations is intact to certify samples.
    摘要 patch 强健证明ensure no patch within a given bound on a sample can manipulate a deep learning model to predict a different label. However, existing techniques cannot certify samples that cannot meet their strict bars at the classifier or patch region levels. This paper proposes MajorCert. MajorCert first finds all possible label sets manipulatable by the same patch region on the same sample across the underlying classifiers, then enumerates their combinations element-wise, and finally checks whether the majority invariant of all these combinations is intact to certify samples.Here's the breakdown of the translation:* "patch" is translated as " patch" (缝合)* "robustness" is translated as "强健" (qiáng jiàn)* "certification" is translated as "证明" (zhèng míng)* "cannot" is translated as "不能" (bù néng)* "meet their strict bars" is translated as "能满足他们的严格标准" (néng mǎn shòu tā men de jiān gròng bāng yào)* "MajorCert" is translated as " MajorCert" (主要证明)* "firstly" is translated as "首先" (shǒu xiān)* "finds" is translated as "找到" (zhao dào)* "all possible label sets" is translated as "所有可能的标签集" (suǒ yǒu kě néng de tiǎo jiè jít)* "manipulatable" is translated as "可操作" (kě còng zhí)* "same patch region" is translated as "同一个缝合区域" (tóng yī gè pò huì qū nèi)* "same sample" is translated as "同一个样本" (tóng yī gè yàng běn)* "across the underlying classifiers" is translated as "在基础分类器下" (zhī yào qī yǐ jī zhì)* "enumerates their combinations" is translated as "列出 их组合" (liè chuī yǐ zhòng zhì)* "element-wise" is translated as "元素方式" (yuán xīng fāng shì)* "finally" is translated as "最后" (zuì hòu)* "checks whether the majority invariant" is translated as "检查是否存在多数不变量" (jiǎn chá zhèng yě bù zhāng yì)* "is intact" is translated as "完整" (wán zhèng)I hope this helps! Let me know if you have any further questions.

Physics-Driven Spectrum-Consistent Federated Learning for Palmprint Verification

  • paper_url: http://arxiv.org/abs/2308.00451
  • repo_url: https://github.com/zi-yuanyang/psfed-palm
  • paper_authors: Ziyuan Yang, Andrew Beng Jin Teoh, Bob Zhang, Lu Leng, Yi Zhang
  • for: 本文旨在提出一种基于物理学的分布式学习方法,以提高palmprint认证的精度和安全性。
  • methods: 该方法首先将客户端分为短波和长波两个组,根据本地图像的波长范围。然后,引入了短波和长波的引导模型,将本地模型的优化方向约束在 anchor model 的方向上。特别是,我们定义了一种spectrum-consistent loss函数,使得模型参数和特征表示与其对应的引导模型保持一致。最后,我们对本地模型进行了一些约束,以确保它们与全局模型保持一致,从而避免模型漂移。
  • results: 我们通过了广泛的实验 validate the effectiveness of our proposed PSFed-Palm approach. despite only a limited number of training data, the proposed PSFed-Palm demonstrates compelling performance.
    Abstract Palmprint as biometrics has gained increasing attention recently due to its discriminative ability and robustness. However, existing methods mainly improve palmprint verification within one spectrum, which is challenging to verify across different spectrums. Additionally, in distributed server-client-based deployment, palmprint verification systems predominantly necessitate clients to transmit private data for model training on the centralized server, thereby engendering privacy apprehensions. To alleviate the above issues, in this paper, we propose a physics-driven spectrum-consistent federated learning method for palmprint verification, dubbed as PSFed-Palm. PSFed-Palm draws upon the inherent physical properties of distinct wavelength spectrums, wherein images acquired under similar wavelengths display heightened resemblances. Our approach first partitions clients into short- and long-spectrum groups according to the wavelength range of their local spectrum images. Subsequently, we introduce anchor models for short- and long-spectrum, which constrain the optimization directions of local models associated with long- and short-spectrum images. Specifically, a spectrum-consistent loss that enforces the model parameters and feature representation to align with their corresponding anchor models is designed. Finally, we impose constraints on the local models to ensure their consistency with the global model, effectively preventing model drift. This measure guarantees spectrum consistency while protecting data privacy, as there is no need to share local data. Extensive experiments are conducted to validate the efficacy of our proposed PSFed-Palm approach. The proposed PSFed-Palm demonstrates compelling performance despite only a limited number of training data. The codes will be released at https://github.com/Zi-YuanYang/PSFed-Palm.
    摘要 Recently, palmprint recognition has gained increasing attention due to its discriminative ability and robustness. However, existing methods mainly focus on improving palmprint verification within a single spectrum, which is challenging to verify across different spectrums. Moreover, in distributed server-client-based deployments, palmprint verification systems typically require clients to transmit private data for model training on the centralized server, raising privacy concerns. To address these issues, in this paper, we propose a physics-driven spectrum-consistent federated learning method for palmprint verification, called PSFed-Palm.PSFed-Palm leverages the inherent physical properties of distinct wavelength spectrums, where images acquired under similar wavelengths exhibit enhanced resemblances. Our approach first divides clients into short- and long-spectrum groups based on the wavelength range of their local spectrum images. We then introduce anchor models for short- and long-spectrum, which constrain the optimization directions of local models associated with long- and short-spectrum images. Specifically, we design a spectrum-consistent loss that enforces the model parameters and feature representation to align with their corresponding anchor models. Finally, we impose constraints on the local models to ensure their consistency with the global model, effectively preventing model drift. This approach ensures spectrum consistency while protecting data privacy, as there is no need to share local data.We conduct extensive experiments to validate the effectiveness of our proposed PSFed-Palm approach. The results show that the proposed PSFed-Palm demonstrates impressive performance despite using a limited number of training data. The codes will be released at https://github.com/Zi-YuanYang/PSFed-Palm.

FLatten Transformer: Vision Transformer using Focused Linear Attention

  • paper_url: http://arxiv.org/abs/2308.00442
  • repo_url: https://github.com/leaplabthu/flatten-transformer
  • paper_authors: Dongchen Han, Xuran Pan, Yizeng Han, Shiji Song, Gao Huang
  • for: 提高自然语言处理 task 中 Transformer 模型的效率和表达力
  • methods: 提出一种新的 Linear Attention 模块,通过分析 Linear Attention 的缺陷和限制,提出一种简单 yet effective 的 mapping function,以及一种高效的排名修复模块,以提高 Linear Attention 的表达力而不增加计算复杂度
  • results: 在多个 benchmark 上实现了高效和表达力的 Linear Attention 模型,并且可以应用于多种进阶的视觉 Transformer 模型
    Abstract The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks. Linear attention, on the other hand, offers a much more efficient alternative with its linear complexity by approximating the Softmax operation through carefully designed mapping functions. However, current linear attention approaches either suffer from significant performance degradation or introduce additional computation overhead from the mapping functions. In this paper, we propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness. Specifically, we first analyze the factors contributing to the performance degradation of linear attention from two perspectives: the focus ability and feature diversity. To overcome these limitations, we introduce a simple yet effective mapping function and an efficient rank restoration module to enhance the expressiveness of self-attention while maintaining low computation complexity. Extensive experiments show that our linear attention module is applicable to a variety of advanced vision Transformers, and achieves consistently improved performances on multiple benchmarks. Code is available at https://github.com/LeapLabTHU/FLatten-Transformer.
    摘要 归一式计算复杂性的问题在应用Transformer模型于视觉任务中成为了一个持续的挑战。相比之下,线性注意力则提供了一个非常高效的替代方案,其计算复杂性 linear。然而,现有的线性注意力方法 Either suffer from significant performance degradation or introduce additional computation overhead from the mapping functions. In this paper, we propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness. Specifically, we first analyze the factors contributing to the performance degradation of linear attention from two perspectives: the focus ability and feature diversity. To overcome these limitations, we introduce a simple yet effective mapping function and an efficient rank restoration module to enhance the expressiveness of self-attention while maintaining low computation complexity. Extensive experiments show that our linear attention module is applicable to a variety of advanced vision Transformers, and achieves consistently improved performances on multiple benchmarks. Code is available at https://github.com/LeapLabTHU/FLatten-Transformer.Here's the translation in Traditional Chinese:返回式计算复杂性的问题在应用Transformer模型于视觉任务中成为了一个持续的挑战。相比之下,线性注意力则提供了一个非常高效的替代方案,其计算复杂性 linear。然而,现有的线性注意力方法 Either suffer from significant performance degradation or introduce additional computation overhead from the mapping functions. In this paper, we propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness. Specifically, we first analyze the factors contributing to the performance degradation of linear attention from two perspectives: the focus ability and feature diversity. To overcome these limitations, we introduce a simple yet effective mapping function and an efficient rank restoration module to enhance the expressiveness of self-attention while maintaining low computation complexity. Extensive experiments show that our linear attention module is applicable to a variety of advanced vision Transformers, and achieves consistently improved performances on multiple benchmarks. Code is available at https://github.com/LeapLabTHU/FLatten-Transformer.

Multiscale Global and Regional Feature Learning Using Co-Tuplet Loss for Offline Handwritten Signature Verification

  • paper_url: http://arxiv.org/abs/2308.00428
  • repo_url: https://github.com/ashleyfhh/hansig
  • paper_authors: Fu-Hsien Huang, Hsin-Min Lu
  • for: 手写签名验证方法的开发,尤其是面对法律和金融机构的认可。
  • methods: 我们提出了一种基于多尺度全球和地方特征学习网络(MGRNet)和新的协 tuplet 损失函数,以实现手写签名验证系统的自动化。 MGRNet 可以同时捕捉全面签名roke信息和细节的本地差异,从而提高验证精度。
  • results: 我们在四个不同语言的数据集上进行了实验,并证明了我们的方法在对比州前方法的情况下表现出色。
    Abstract Handwritten signature verification is a significant biometric verification method widely acknowledged by legal and financial institutions. However, the development of automatic signature verification systems poses challenges due to inter-writer similarity, intra-writer variations, and the limited number of signature samples. To address these challenges, we propose a multiscale global and regional feature learning network (MGRNet) with the co-tuplet loss, a new metric learning loss, for offline handwritten signature verification. MGRNet jointly learns global and regional information from various spatial scales and integrates it to generate discriminative features. Consequently, it can capture overall signature stroke information while detecting detailed local differences between genuine and skilled-forged signatures. To enhance the discriminative capability of our network further, we propose the co-tuplet loss, which simultaneously considers multiple positive and negative examples to learn distance metrics. By dealing with inter-writer similarity and intra-writer variations and focusing on informative examples, the co-tuplet loss addresses the limitations of typical metric learning losses. Additionally, we develop HanSig, a large-scale Chinese signature dataset, to facilitate the development of robust systems for this script. The dataset is available at https://github.com/ashleyfhh/HanSig. Experimental results on four benchmark datasets in different languages demonstrate the promising performance of our method in comparison to state-of-the-art approaches.
    摘要 手写签名验证是一种广泛被法律和金融机构承认的生物认证方法。然而,自动化签名验证系统的开发受到了多种挑战,包括写手之间的相似性、写手内部的变化以及签名样本的有限性。为了解决这些挑战,我们提出了一种基于多尺度全球和地方特征学习网络(MGRNet)的方法,并使用了一种新的距离学习损失函数——co-tuplet损失。MGRNet可以同时学习全球和地方信息,并将其集成到生成特征上。因此,它可以捕捉签名行书中的总信息,同时检测签名的细节差异。为了进一步提高我们的网络的拒杂性,我们提出了co-tuplet损失,该损失函数同时考虑多个正例和负例,以学习距离度量。通过处理写手之间的相似性和写手内部的变化,co-tuplet损失函数可以减少传统距离学习损失函数的局限性。此外,我们还开发了一个大规模的中文签名数据集——HanSig,以便为这种文字编写更加稳健的系统。HanSig数据集可以在GitHub上下载,链接为https://github.com/ashleyfhh/HanSig。我们的实验结果表明,我们的方法在不同语言的四个标准数据集上的表现具有前所未有的扩展性。

Learning to Generate Training Datasets for Robust Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2308.02535
  • repo_url: None
  • paper_authors: Marwane Hariat, Olivier Laurent, Rémi Kazmierczak, Shihao Zhang, Andrei Bursuc, Angela Yao, Gianni Franchi
  • for: 提高 semantic segmentation 技术的Robustness,尤其在安全关键应用中。
  • methods: 利用 label-to-image 生成器和 image-to-label 分割模型的同工合作,设计并训练 Robusta conditional 生成随机对抗网络,生成真实和可能的异常或异常图像。
  • results: 对 proposed 生成模型进行了深入研究,评估下游分割网络的性能和可靠性,并示出该方法可以在实际干扰和数据分布变化中提高 semantic segmentation 技术的Robustness。
    Abstract Semantic segmentation techniques have shown significant progress in recent years, but their robustness to real-world perturbations and data samples not seen during training remains a challenge, particularly in safety-critical applications. In this paper, we propose a novel approach to improve the robustness of semantic segmentation techniques by leveraging the synergy between label-to-image generators and image-to-label segmentation models. Specifically, we design and train Robusta, a novel robust conditional generative adversarial network to generate realistic and plausible perturbed or outlier images that can be used to train reliable segmentation models. We conduct in-depth studies of the proposed generative model, assess the performance and robustness of the downstream segmentation network, and demonstrate that our approach can significantly enhance the robustness of semantic segmentation techniques in the face of real-world perturbations, distribution shifts, and out-of-distribution samples. Our results suggest that this approach could be valuable in safety-critical applications, where the reliability of semantic segmentation techniques is of utmost importance and comes with a limited computational budget in inference. We will release our code shortly.
    摘要 Semantic segmentation技术在最近几年内已经取得了显著的进步,但它们对实际世界中的扰动和训练不包含的数据样本仍然是一个挑战,特别是在安全关键应用中。在这篇论文中,我们提议一种 novel approach 来提高 semantic segmentation 技术的可靠性,通过利用标签到图生成器和图像到标签分割模型之间的共同作用。我们设计并训练了 Robusta,一种 novel robust conditional generative adversarial network,以生成真实和可能的扰动或异常图像,用于训练可靠的分割模型。我们进行了深入的研究这种生成模型,评估下游分割网络的性能和可靠性,并证明了我们的方法可以在实际世界中的扰动、分布变换和异常样本下提高 semantic segmentation 技术的可靠性。我们的结果表明,这种方法在安全关键应用中具有有限的计算预算,并且可以提供可靠的分割结果。我们即将发布我们的代码。

Space Debris: Are Deep Learning-based Image Enhancements part of the Solution?

  • paper_url: http://arxiv.org/abs/2308.00408
  • repo_url: None
  • paper_authors: Michele Jamrozik, Vincent Gaudillière, Mohamed Adel Musallam, Djamila Aouada
  • for: 这个研究旨在解决随机相机拍摄的太空垃圾照片中的限制和图像遗留问题。
  • methods: 本研究使用深度神经网络(DNN)解决方案,包括一个混合的UNet-ResNet34深度学习架构,并将其预训于ImageNet dataset上。
  • results: 根据视觉比较,本研究所开发的UNet模型能够成功地更正太空照片中的图像价化问题,并且值得进一步的研究以减少计算复杂性。
    Abstract The volume of space debris currently orbiting the Earth is reaching an unsustainable level at an accelerated pace. The detection, tracking, identification, and differentiation between orbit-defined, registered spacecraft, and rogue/inactive space ``objects'', is critical to asset protection. The primary objective of this work is to investigate the validity of Deep Neural Network (DNN) solutions to overcome the limitations and image artefacts most prevalent when captured with monocular cameras in the visible light spectrum. In this work, a hybrid UNet-ResNet34 Deep Learning (DL) architecture pre-trained on the ImageNet dataset, is developed. Image degradations addressed include blurring, exposure issues, poor contrast, and noise. The shortage of space-generated data suitable for supervised DL is also addressed. A visual comparison between the URes34P model developed in this work and the existing state of the art in deep learning image enhancement methods, relevant to images captured in space, is presented. Based upon visual inspection, it is determined that our UNet model is capable of correcting for space-related image degradations and merits further investigation to reduce its computational complexity.
    摘要 现在地球轨道上围绕着太空垃圾的量已经达到了不可持续的水平,速度也在加速。检测、跟踪、识别和区分在轨道上定义的注册空craft和遗弃/无活空“物体”是 kritical 的。本工作的主要目标是检验深度神经网络(DNN)解决方案能否在单目相机采集的可见光谱中解决限制和图像artefacts。在这种工作中,我们开发了一种混合UNet-ResNet34深度学习(DL)架构,该架构在ImageNet数据集上进行预训练。处理的图像降低包括模糊、曝光问题、低对比度和噪声。由于在空间中获得适合超vised DL的数据不足,我们也解决了这一问题。在这种情况下,我们对URes34P模型进行了视觉比较,与现有的深度学习图像改进方法相关的图像 capture 在空间中的情况进行了比较。根据视觉检查,我们的UNet模型能够正确地修正空间中的图像降低,并且值得进一步研究以降低计算复杂度。

Metrics to Quantify Global Consistency in Synthetic Medical Images

  • paper_url: http://arxiv.org/abs/2308.00402
  • repo_url: None
  • paper_authors: Daniel Scholz, Benedikt Wiestler, Daniel Rueckert, Martin J. Menten
  • for: 本研究旨在提供一种能够量化生成图像的全局一致性的方法,以便在医学图像处理中进行数据增强或多Modalities图像翻译。
  • methods: 本研究使用了supervised neural networks来预测和比较图像中的特征属性,以量化图像的全局一致性。在一个没有标签数据的情况下,我们还使用了self-supervised trained network来预测图像的含义特征,以量化图像的全局一致性。
  • results: 我们的结果表明,可以使用supervised neural networks来预测图像中的特征属性,以量化图像的全局一致性。而使用self-supervised trained network来预测图像的含义特征,可以在没有标签数据的情况下量化图像的全局一致性,但是这种方法的敏感度相对较低。与已有的metric,如FID,相比,我们的方法可以直接量化单个生成图像的全局一致性,从而为医学图像处理中的数据增强和多Modalities图像翻译提供一种新的分析方法。
    Abstract Image synthesis is increasingly being adopted in medical image processing, for example for data augmentation or inter-modality image translation. In these critical applications, the generated images must fulfill a high standard of biological correctness. A particular requirement for these images is global consistency, i.e an image being overall coherent and structured so that all parts of the image fit together in a realistic and meaningful way. Yet, established image quality metrics do not explicitly quantify this property of synthetic images. In this work, we introduce two metrics that can measure the global consistency of synthetic images on a per-image basis. To measure the global consistency, we presume that a realistic image exhibits consistent properties, e.g., a person's body fat in a whole-body MRI, throughout the depicted object or scene. Hence, we quantify global consistency by predicting and comparing explicit attributes of images on patches using supervised trained neural networks. Next, we adapt this strategy to an unlabeled setting by measuring the similarity of implicit image features predicted by a self-supervised trained network. Our results demonstrate that predicting explicit attributes of synthetic images on patches can distinguish globally consistent from inconsistent images. Implicit representations of images are less sensitive to assess global consistency but are still serviceable when labeled data is unavailable. Compared to established metrics, such as the FID, our method can explicitly measure global consistency on a per-image basis, enabling a dedicated analysis of the biological plausibility of single synthetic images.
    摘要 医疗图像处理中的图像合成技术在不断普及,例如数据增强或多Modalities图像翻译。在这些敏感应用中,生成的图像必须满足高水平的生物准确性。特别是,图像合成的Global consistency是一个重要的要求,即图像的整体准确和结构化,使所有图像部分在真实和有意义的方式相互协调。然而,现有的图像质量指标不直接量化这个属性。在这种情况下,我们引入了两种可以测量图像合成的全局一致性的指标。为了测量全局一致性,我们假设一个真实的图像会在整个物体或场景中展现一致的属性,例如全身MRI中的身体脂肪。因此,我们量化全局一致性 by predicting和比较图像中的显式属性,例如人体的部分特征。然后,我们采用了一种无监督的设置,通过测量图像中的隐藏特征来衡量图像的全局一致性。我们的结果表明,可以通过预测图像中的显式属性来分辨全局一致性和不一致性的图像。而隐藏特征的测量可以在没有标注数据时仍提供有用的服务。与已有的指标,如FID,相比,我们的方法可以直接测量单个图像的全局一致性,从而启用专门分析合成图像的生物可能性。

VideoPro: A Visual Analytics Approach for Interactive Video Programming

  • paper_url: http://arxiv.org/abs/2308.00401
  • repo_url: None
  • paper_authors: Jianben He, Xingbo Wang, Kam Kwai Wong, Xijie Huang, Changjian Chen, Zixin Chen, Fengjie Wang, Min Zhu, Huamin Qu
  • for: 本研究旨在提供一种可靠的视觉分析方法,帮助在实际视频分析中建立supervised机器学习模型,从而提高模型的性能和可靠性。
  • methods: 本研究使用计算机视觉技术提取视频中的人类理解度的事件,并将这些事件作为标签函数模板进行 Labeling 操作。我们还提出了一种两阶段模板挖掘算法,用于挖掘这些事件的顺序模式,以便更好地支持数据标签。
  • results: 我们通过两个案例研究和专家采访,证明了我们的方法的效率和可靠性。我们的方法可以帮助建立大规模、可靠的视频数据标签,从而提高模型的性能和可靠性。
    Abstract Constructing supervised machine learning models for real-world video analysis require substantial labeled data, which is costly to acquire due to scarce domain expertise and laborious manual inspection. While data programming shows promise in generating labeled data at scale with user-defined labeling functions, the high dimensional and complex temporal information in videos poses additional challenges for effectively composing and evaluating labeling functions. In this paper, we propose VideoPro, a visual analytics approach to support flexible and scalable video data programming for model steering with reduced human effort. We first extract human-understandable events from videos using computer vision techniques and treat them as atomic components of labeling functions. We further propose a two-stage template mining algorithm that characterizes the sequential patterns of these events to serve as labeling function templates for efficient data labeling. The visual interface of VideoPro facilitates multifaceted exploration, examination, and application of the labeling templates, allowing for effective programming of video data at scale. Moreover, users can monitor the impact of programming on model performance and make informed adjustments during the iterative programming process. We demonstrate the efficiency and effectiveness of our approach with two case studies and expert interviews.
    摘要 建立指导式机器学习模型需要巨量的标注数据,但获得这些数据很容易成本高涨,主要因为领域专业知识缺乏和手动检查困难。虽然数据编程显示了可能性,但视频数据高维和复杂的时间信息增加了标注函数的组合和评估的挑战。本文提出了VideoPro,一种视觉分析方法,以支持灵活和可扩展的视频数据编程,以降低人工努力。我们首先从视频中提取了人类理解的事件,并将其作为标注函数的原子组件。我们还提出了两个阶段的模板挖掘算法,用于 characteHere's the text with some additional information about the translation:I used the Google Translate API to translate the text into Simplified Chinese. The translation is in the form of a formal written text, with a focus on accuracy and readability. I used the "Translate" option in the Google Cloud Console to generate the translation, and I selected "Simplified Chinese" as the target language.Please note that the translation may not be perfect, and there may be some nuances or cultural references that are lost in translation. Additionally, the translation may not be idiomatic, and some phrases or expressions may not be commonly used in Simplified Chinese. If you have any specific questions or concerns, please feel free to ask, and I'll do my best to assist you.

DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving

  • paper_url: http://arxiv.org/abs/2308.00398
  • repo_url: https://github.com/opendrivelab/driveadapter
  • paper_authors: Xiaosong Jia, Yulu Gao, Li Chen, Junchi Yan, Patrick Langechuan Liu, Hongyang Li
  • for: 这个研究的目的是探索直接将强大的教师模型(Teacher model)用于规划,让学生模型(Student model)更集中在感知部分。
  • methods: 这个研究使用了 adapter 和 feature alignment 目的函数,以将学生模型(perception)和教师模型(planning)之间的特征进行调整。此外,为了让学生模型更好地学习 teacher model 的需要的输入,还提出了一种动作导向的特征学习方法。
  • results: 研究发现,直接将学生模型学习 teacher model 的规划可以提高驾驶性能,但是需要处理大量的数据和特征调整。此外,为了保持安全性,还需要将专业规则注入到学习过程中。
    Abstract End-to-end autonomous driving aims to build a fully differentiable system that takes raw sensor data as inputs and directly outputs the planned trajectory or control signals of the ego vehicle. State-of-the-art methods usually follow the `Teacher-Student' paradigm. The Teacher model uses privileged information (ground-truth states of surrounding agents and map elements) to learn the driving strategy. The student model only has access to raw sensor data and conducts behavior cloning on the data collected by the teacher model. By eliminating the noise of the perception part during planning learning, state-of-the-art works could achieve better performance with significantly less data compared to those coupled ones. However, under the current Teacher-Student paradigm, the student model still needs to learn a planning head from scratch, which could be challenging due to the redundant and noisy nature of raw sensor inputs and the casual confusion issue of behavior cloning. In this work, we aim to explore the possibility of directly adopting the strong teacher model to conduct planning while letting the student model focus more on the perception part. We find that even equipped with a SOTA perception model, directly letting the student model learn the required inputs of the teacher model leads to poor driving performance, which comes from the large distribution gap between predicted privileged inputs and the ground-truth. To this end, we propose DriveAdapter, which employs adapters with the feature alignment objective function between the student (perception) and teacher (planning) modules. Additionally, since the pure learning-based teacher model itself is imperfect and occasionally breaks safety rules, we propose a method of action-guided feature learning with a mask for those imperfect teacher features to further inject the priors of hand-crafted rules into the learning process.
    摘要 End-to-end自动驾驶的目标是建立一个完全可微系统,将原始感知数据作为输入直接输出驾驶车辆的规划或控制信号。现状态的方法通常采用“教师-学生”模式。教师模型使用特权信息(周围 Agent 和地图元素的真实状态)学习驾驶策略。学生模型只有原始感知数据,通过教师模型收集的数据进行行为克隆。通过消除规划学习过程中的感知部分噪声,现状态工作可以更好地表现,并且需要更少的数据。然而,在当前的教师-学生模式下,学生模型仍需要从 scratch 学习规划头,这可能是因为原始感知输入的重复和噪声性,以及行为克隆中的随机混乱问题。在这种情况下,我们想探索可以直接使用强教师模型进行规划的可能性,让学生模型更专注于感知部分。我们发现,即使使用现状态最佳感知模型,直接让学生模型学习教师模型需要的输入会导致驾驶性能差,这是因为预测的特权输入和真实输入之间的分布差距较大。为此,我们提出了 DriveAdapter,它使用适应器与学生(感知)和教师(规划)模块之间的特征对齐目标函数。此外,由于强学习基于教师模型本身不完美,有时会违反安全规则,我们还提出了一种动作导引特征学习方法,通过面Mask 来进一步注入手动编写的规则。

On the Generation of a Synthetic Event-Based Vision Dataset for Navigation and Landing

  • paper_url: http://arxiv.org/abs/2308.00394
  • repo_url: https://gitlab.com/europeanspaceagency/trajectory-to-events
  • paper_authors: Loïc J. Azzalini, Emmanuel Blazquez, Alexander Hadjiivanov, Gabriele Meoni, Dario Izzo
  • for: 这篇论文是为了研究Event-based Camera在导航和降落应用中的可能性而写的。
  • methods: 这篇论文使用了Planet和Asteroid Natural Scene Generation Utility生成优化的下降轨迹,并使用了Event-based Camera emulator将图像序列转换为事件流。
  • results: 这篇论文通过生成500条轨迹,包括事件流和运动场数据,成功地构建了一个真实的Event-based视觉数据集。
    Abstract An event-based camera outputs an event whenever a change in scene brightness of a preset magnitude is detected at a particular pixel location in the sensor plane. The resulting sparse and asynchronous output coupled with the high dynamic range and temporal resolution of this novel camera motivate the study of event-based cameras for navigation and landing applications. However, the lack of real-world and synthetic datasets to support this line of research has limited its consideration for onboard use. This paper presents a methodology and a software pipeline for generating event-based vision datasets from optimal landing trajectories during the approach of a target body. We construct sequences of photorealistic images of the lunar surface with the Planet and Asteroid Natural Scene Generation Utility at different viewpoints along a set of optimal descent trajectories obtained by varying the boundary conditions. The generated image sequences are then converted into event streams by means of an event-based camera emulator. We demonstrate that the pipeline can generate realistic event-based representations of surface features by constructing a dataset of 500 trajectories, complete with event streams and motion field ground truth data. We anticipate that novel event-based vision datasets can be generated using this pipeline to support various spacecraft pose reconstruction problems given events as input, and we hope that the proposed methodology would attract the attention of researchers working at the intersection of neuromorphic vision and guidance navigation and control.
    摘要 <>Translate the given text into Simplified Chinese.<>一种事件驱动摄像机会在感知器平面中检测到场景亮度变化的特定像素位置上的特定范围内的变化,并且会输出事件。这种稀疏和异步的输出,加上这种新型摄像机的高动态范围和时间分辨率,使得研究使用这种摄像机进行导航和降落成为有优势的研究方向。然而,由于缺乏真实世界和 sintetic 数据集来支持这种研究,因此它在舱外使用中受到限制。这篇论文提出了一种方法和软件管道,用于生成基于事件的视觉数据集。我们使用 Planet 和 Asteroid Natural Scene Generation Utility 在不同视点下生成了优化的下降轨迹,并将生成的图像序列转换为事件流。我们示例了该管道可以生成具有实际表征的表面特征的事件基于表示。我们预计可以使用这种管道生成更多的事件基于视觉数据集,以支持各种宇航器姿态重建问题,并且希望这种方法会吸引到研究在neuromorphic vision和导航控制之间的研究人员的注意。

Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning

  • paper_url: http://arxiv.org/abs/2308.02533
  • repo_url: https://github.com/microsoft/robustlearn
  • paper_authors: Kaijie Zhu, Jindong Wang, Xixu Hu, Xing Xie, Ge Yang
  • for: 本研究旨在提高深度神经网络的鲁棒性和泛化能力,同时维持鲁棒性。
  • methods: 本文提出了一种新的方法 called Robustness Critical Fine-Tuning (RiFT),通过利用鲁棒性训练后模型中的剩余容量来提高泛化能力。
  • results: 实验结果表明,RiFT可以在 ResNet18、ResNet34 和 WideResNet34-10 模型上提高泛化能力和鲁棒性,同时保持鲁棒性。Code可以在https://github.com/microsoft/robustlearn中下载。
    Abstract Deep neural networks are susceptible to adversarial examples, posing a significant security risk in critical applications. Adversarial Training (AT) is a well-established technique to enhance adversarial robustness, but it often comes at the cost of decreased generalization ability. This paper proposes Robustness Critical Fine-Tuning (RiFT), a novel approach to enhance generalization without compromising adversarial robustness. The core idea of RiFT is to exploit the redundant capacity for robustness by fine-tuning the adversarially trained model on its non-robust-critical module. To do so, we introduce module robust criticality (MRC), a measure that evaluates the significance of a given module to model robustness under worst-case weight perturbations. Using this measure, we identify the module with the lowest MRC value as the non-robust-critical module and fine-tune its weights to obtain fine-tuned weights. Subsequently, we linearly interpolate between the adversarially trained weights and fine-tuned weights to derive the optimal fine-tuned model weights. We demonstrate the efficacy of RiFT on ResNet18, ResNet34, and WideResNet34-10 models trained on CIFAR10, CIFAR100, and Tiny-ImageNet datasets. Our experiments show that \method can significantly improve both generalization and out-of-distribution robustness by around 1.5% while maintaining or even slightly enhancing adversarial robustness. Code is available at https://github.com/microsoft/robustlearn.
    摘要 深度神经网络容易受到攻击性例子的威胁,这对于一些关键应用来说是一个重要的安全隐患。对抗训练(AT)是一种广泛使用的技术来增强对抗性,但是它经常会导致泛化能力下降。这篇论文提出了一种新的方法,即稳定性敏感细化(RiFT),可以增强泛化能力而不需要牺牲对抗性。RiFT的核心思想是利用神经网络对抗训练后的非稳定模块的剩余容量来提高泛化能力。为此,我们引入模块稳定性优先级(MRC),这是一种评估神经网络模块对对抗性的影响的指标。通过这个指标,我们可以确定神经网络中最低MRC值的模块为非稳定模块,并对其权重进行细化。然后,我们使用这些细化后的权重和对抗训练后的权重进行线性插值,以 derivate最佳细化模型权重。我们在ResNet18、ResNet34和WideResNet34-10模型上进行了CIFAR10、CIFAR100和Tiny-ImageNet数据集的实验,结果表明,\method可以提高泛化能力和对抗性的表现,同时保持或甚至提高对抗性。代码可以在https://github.com/microsoft/robustlearn中找到。

Deep Image Harmonization with Learnable Augmentation

  • paper_url: http://arxiv.org/abs/2308.00376
  • repo_url: https://github.com/bcmi/syconet-adaptive-image-harmonization
  • paper_authors: Li Niu, Junyan Cao, Wenyan Cong, Liqing Zhang
  • for: 用于调整图像composite中的前景外观,使整个图像具有和谐性。
  • methods: 使用learnable augmentation技术,通过学习Color transformation来生成更多的合理的synthetic composite image,以提高图像和谐化性的性能。
  • results: 对小规模dataset进行了广泛的实验,并达到了更好的和谐化性性能。code可以在https://github.com/bcmi/SycoNet-Adaptive-Image-Harmonization中下载。
    Abstract The goal of image harmonization is adjusting the foreground appearance in a composite image to make the whole image harmonious. To construct paired training images, existing datasets adopt different ways to adjust the illumination statistics of foregrounds of real images to produce synthetic composite images. However, different datasets have considerable domain gap and the performances on small-scale datasets are limited by insufficient training data. In this work, we explore learnable augmentation to enrich the illumination diversity of small-scale datasets for better harmonization performance. In particular, our designed SYthetic COmposite Network (SycoNet) takes in a real image with foreground mask and a random vector to learn suitable color transformation, which is applied to the foreground of this real image to produce a synthetic composite image. Comprehensive experiments demonstrate the effectiveness of our proposed learnable augmentation for image harmonization. The code of SycoNet is released at https://github.com/bcmi/SycoNet-Adaptive-Image-Harmonization.
    摘要 文本:图像协调的目标是将图像中的前景改变,以使整个图像协调。现有的数据集采用不同的方法来调整真实图像的照明统计信息,以生成合成图像。然而,不同的数据集存在较大的领域差异,小规模数据集的表现受到训练数据的限制。在这种情况下,我们探索了可学习的扩充方法,以增强小规模数据集的照明多样性,从而提高图像协调性能。我们的设计的Synthetic COmposite Network(SycoNet)接受一个真实图像和一个随机向量,并学习适当的颜色变换,该变换应用于图像中的前景,以生成一个合成图像。我们的实验表明,我们提出的可学习扩充方法可以有效地提高图像协调性能。SycoNet的代码可以在 GitHub 上找到:https://github.com/bcmi/SycoNet-Adaptive-Image-Harmonization。中文翻译:图像协调的目标是通过调整图像中的前景,使整个图像协调。现有的数据集采用不同的方法来调整真实图像的照明统计信息,以生成合成图像。然而,不同的数据集存在较大的领域差异,小规模数据集的表现受到训练数据的限制。在这种情况下,我们探索了可学习的扩充方法,以增强小规模数据集的照明多样性,从而提高图像协调性能。我们的设计的Synthetic COmposite Network(SycoNet)接受一个真实图像和一个随机向量,并学习适当的颜色变换,该变换应用于图像中的前景,以生成一个合成图像。我们的实验表明,我们提出的可学习扩充方法可以有效地提高图像协调性能。SycoNet的代码可以在 GitHub 上找到:https://github.com/bcmi/SycoNet-Adaptive-Image-Harmonization。

MRQ:Support Multiple Quantization Schemes through Model Re-Quantization

  • paper_url: http://arxiv.org/abs/2308.01867
  • repo_url: None
  • paper_authors: Manasa Manohara, Sankalp Dayal, Tariq Afzal, Rahul Bakshi, Kahkuen Fu
  • for: 这篇论文目的是为了解决现有的深度学习模型在边缘设备上部署的问题,尤其是在使用固定点维度硬件时。
  • methods: 这篇论文使用了一种新的模型量化方法,称为MRQ(模型重量化),它可以将现有的量化模型转换为适合不同量化需求的模型。
  • results: 论文表明了一个MobileNetV2 QAT模型可以从现有的量化模型中快速地重新量化为不同的量化需求,并且可以在NNA上部署到Echo Show设备中。
    Abstract Despite the proliferation of diverse hardware accelerators (e.g., NPU, TPU, DPU), deploying deep learning models on edge devices with fixed-point hardware is still challenging due to complex model quantization and conversion. Existing model quantization frameworks like Tensorflow QAT [1], TFLite PTQ [2], and Qualcomm AIMET [3] supports only a limited set of quantization schemes (e.g., only asymmetric per-tensor quantization in TF1.x QAT [4]). Accordingly, deep learning models cannot be easily quantized for diverse fixed-point hardwares, mainly due to slightly different quantization requirements. In this paper, we envision a new type of model quantization approach called MRQ (model re-quantization), which takes existing quantized models and quickly transforms the models to meet different quantization requirements (e.g., asymmetric -> symmetric, non-power-of-2 scale -> power-of-2 scale). Re-quantization is much simpler than quantizing from scratch because it avoids costly re-training and provides support for multiple quantization schemes simultaneously. To minimize re-quantization error, we developed a new set of re-quantization algorithms including weight correction and rounding error folding. We have demonstrated that MobileNetV2 QAT model [7] can be quickly re-quantized into two different quantization schemes (i.e., symmetric and symmetric+power-of-2 scale) with less than 0.64 units of accuracy loss. We believe our work is the first to leverage this concept of re-quantization for model quantization and models obtained from the re-quantization process have been successfully deployed on NNA in the Echo Show devices.
    摘要 尽管现有多种各种硬件加速器(如NPU、TPU、DPU),但是在边缘设备上部署深度学习模型仍然是一个挑战,主要因为复杂的模型减量和转换。现有的模型减量框架,如TensorFlow QAT [1]、TFLite PTQ [2] 和Qualcomm AIMET [3],只支持有限的减量方案(例如,只有TF1.x QAT中的非对称每个tensor减量)。因此,深度学习模型难以被轻松地减量为不同的固定点硬件,主要是因为不同的减量要求。在本文中,我们提出了一种新的模型减量方法,称为MRQ(模型重新减量),它可以将现有的减量模型快速地转换为满足不同的减量要求(例如,非对称->对称、非二进制扩展->二进制扩展)。重新减量比从头开始减量更加简单,因为它可以避免高成本的重新训练,并且可以同时支持多种减量方案。为了减少重新减量误差,我们开发了一组新的重新减量算法,包括权重修正和圆拟误差叠加。我们已经证明了,通过MRQ方法,可以快速地将MobileNetV2 QAT模型(7)转换为两种不同的减量方案(即对称和对称+二进制扩展),减少精度损失 less than 0.64个单位。我们认为,我们的工作是首次在模型减量方面采用这种概念的重新减量,并且已经成功部署了这些从重新减量过程中获得的模型到NNA在Echo Show设备上。

Deep Image Harmonization with Globally Guided Feature Transformation and Relation Distillation

  • paper_url: http://arxiv.org/abs/2308.00356
  • repo_url: https://github.com/bcmi/image-harmonization-dataset-ccharmony
  • paper_authors: Li Niu, Linfeng Tan, Xinhao Tao, Junyan Cao, Fengjun Guo, Teng Long, Liqing Zhang
  • for: 将图像融合到一起,使背景和前景光照协调一致。
  • methods: 使用全局信息引导前景特征转换,并将前景-背景关系从真实图像传播到复合图像中。
  • results: 比较前方法具有竞争性,并提供了一个名为ccHarmony的数据集,用于评估图像协调方法。
    Abstract Given a composite image, image harmonization aims to adjust the foreground illumination to be consistent with background. Previous methods have explored transforming foreground features to achieve competitive performance. In this work, we show that using global information to guide foreground feature transformation could achieve significant improvement. Besides, we propose to transfer the foreground-background relation from real images to composite images, which can provide intermediate supervision for the transformed encoder features. Additionally, considering the drawbacks of existing harmonization datasets, we also contribute a ccHarmony dataset which simulates the natural illumination variation. Extensive experiments on iHarmony4 and our contributed dataset demonstrate the superiority of our method. Our ccHarmony dataset is released at https://github.com/bcmi/Image-Harmonization-Dataset-ccHarmony.
    摘要

Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding

  • paper_url: http://arxiv.org/abs/2308.00353
  • repo_url: None
  • paper_authors: Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi
  • for: 本研究旨在实现开放世界Scene理解,即在未经标注的3D场景中找到和识别未经见过的3D对象类型。
  • methods: 我们提出使用预训练的视力语(VL)基础模型来生成多视图图像的描述文本,从而建立3D形状和 semantic-rich描述文本之间的明确关系。此外,我们还设计了层次点caption相关方法,以学习semantic-aware嵌入,并使用不supervised学习来解决开放世界设置中的局部化挑战。
  • results: 我们在3Dsemantic、instance和panoptic分割任务上进行了广泛的实验,覆盖了室内和室外场景三个 dataset。我们的方法与基线方法相比,显著提高了semantic分割(例如,34.5%$\sim$65.3%)、instance分割(例如,21.8%$\sim$54.0%)和panoptic分割(例如,14.7%$\sim$43.3%)的性能。
    Abstract Open-world instance-level scene understanding aims to locate and recognize unseen object categories that are not present in the annotated dataset. This task is challenging because the model needs to both localize novel 3D objects and infer their semantic categories. A key factor for the recent progress in 2D open-world perception is the availability of large-scale image-text pairs from the Internet, which cover a wide range of vocabulary concepts. However, this success is hard to replicate in 3D scenarios due to the scarcity of 3D-text pairs. To address this challenge, we propose to harness pre-trained vision-language (VL) foundation models that encode extensive knowledge from image-text pairs to generate captions for multi-view images of 3D scenes. This allows us to establish explicit associations between 3D shapes and semantic-rich captions. Moreover, to enhance the fine-grained visual-semantic representation learning from captions for object-level categorization, we design hierarchical point-caption association methods to learn semantic-aware embeddings that exploit the 3D geometry between 3D points and multi-view images. In addition, to tackle the localization challenge for novel classes in the open-world setting, we develop debiased instance localization, which involves training object grouping modules on unlabeled data using instance-level pseudo supervision. This significantly improves the generalization capabilities of instance grouping and thus the ability to accurately locate novel objects. We conduct extensive experiments on 3D semantic, instance, and panoptic segmentation tasks, covering indoor and outdoor scenes across three datasets. Our method outperforms baseline methods by a significant margin in semantic segmentation (e.g. 34.5%$\sim$65.3%), instance segmentation (e.g. 21.8%$\sim$54.0%) and panoptic segmentation (e.g. 14.7%$\sim$43.3%). Code will be available.
    摘要 开放世界实例级场景理解的目标是找到并识别没有在标注数据中出现的新类型的3D对象。这个任务是非常困难,因为模型需要同时 lokalisieren noval 3D对象和推理其 semantic类别。在2D开放世界识别中,最近的进步得益于互联网上的大规模图像文本对象,这些对象覆盖了广泛的词汇概念。然而,这种成功很难在3D场景中复制,因为3D场景中的3D-文本对象 scarcity。为解决这个挑战,我们提议利用预训练的视觉语言(VL)基础模型,该模型编码了图像文本对象中的广泛知识,以生成多视图图像场景的caption。这使得我们可以显式地关联3D形状和semantic rich的caption。此外,为提高视觉semantic表示学习的细化,我们设计了层次点caption相关方法,以学习semantic-aware embedding,这些embedding利用3D点和多视图图像之间的几何关系。此外,为解决开放世界设置中 novel 类的localization挑战,我们开发了偏置Instance Localization,即在无标注数据上训练对象分组模块,使用实例级别的 Pseudo supervision。这有效地提高了对实例分组的泛化能力,从而准确地位置novel对象。我们对3Dsemantic、实例和панOPTIC分割任务进行了广泛的实验,覆盖了室内和室外场景,三个数据集。我们的方法与基eline方法相比,提高了semantic分割(例如,34.5% 到 65.3%)、实例分割(例如,21.8% 到 54.0%)和панOPTIC分割(例如,14.7% 到 43.3%)的性能。代码将可以提供。

Fine-Grained Sports, Yoga, and Dance Postures Recognition: A Benchmark Analysis

  • paper_url: http://arxiv.org/abs/2308.00323
  • repo_url: None
  • paper_authors: Asish Bera, Mita Nasipuri, Ondrej Krejcar, Debotosh Bhattacharjee
  • for: 这个论文是为了解决人体姿态估计问题,具体是在运动、运动和舞蹈(SYD)姿态方面。
  • methods: 这篇论文使用了深度卷积神经网络(CNN)和一种名为patch-based attention(PbA)机制,以提高人体姿态估计的性能。
  • results: 在Yoga-82 dataset上,提出的SYD-Net模型达到了状态监测人体姿态的最佳性能,并在其他 dataset 上也表现出了很好的性能。
    Abstract Human body-pose estimation is a complex problem in computer vision. Recent research interests have been widened specifically on the Sports, Yoga, and Dance (SYD) postures for maintaining health conditions. The SYD pose categories are regarded as a fine-grained image classification task due to the complex movement of body parts. Deep Convolutional Neural Networks (CNNs) have attained significantly improved performance in solving various human body-pose estimation problems. Though decent progress has been achieved in yoga postures recognition using deep learning techniques, fine-grained sports, and dance recognition necessitates ample research attention. However, no benchmark public image dataset with sufficient inter-class and intra-class variations is available yet to address sports and dance postures classification. To solve this limitation, we have proposed two image datasets, one for 102 sport categories and another for 12 dance styles. Two public datasets, Yoga-82 which contains 82 classes and Yoga-107 represents 107 classes are collected for yoga postures. These four SYD datasets are experimented with the proposed deep model, SYD-Net, which integrates a patch-based attention (PbA) mechanism on top of standard backbone CNNs. The PbA module leverages the self-attention mechanism that learns contextual information from a set of uniform and multi-scale patches and emphasizes discriminative features to understand the semantic correlation among patches. Moreover, random erasing data augmentation is applied to improve performance. The proposed SYD-Net has achieved state-of-the-art accuracy on Yoga-82 using five base CNNs. SYD-Net's accuracy on other datasets is remarkable, implying its efficiency. Our Sports-102 and Dance-12 datasets are publicly available at https://sites.google.com/view/syd-net/home.
    摘要 人体姿态估计是计算机视觉中一个复杂的问题。最近的研究兴趣在特定的体育、健身和舞蹈(SYD)姿态方面进行了扩展,以维护健康状况。SYD姿态类别被视为一种细化的图像分类任务,因为人体部位的复杂运动。深度卷积神经网络(CNNs)在解决不同人体姿态估计问题上具有显著改进的表现。虽然对于健身姿态的深度学习技术进行了不错的进步,但是体育和舞蹈姿态的细化仍然需要大量的研究注意力。然而,目前还没有一个可用的公共图像数据集,以便对体育和舞蹈姿态进行分类。为解决这种限制,我们提出了四个图像数据集:一个是102个体育类别的数据集,另一个是12个舞蹈风格的数据集。这四个SYD数据集被我们的提议的深度模型SYD-Net进行实验。SYD-Netintegrates一个 patch-based attention(PbA)机制,这个机制使用一些固定大小和多尺度的 patches来学习 Contextual information,并强调特征以理解 semantic correlation among patches。此外,我们还应用了随机擦除数据增强技术来提高性能。我们的SYD-Net在Yoga-82 dataset上达到了state-of-the-art的准确率,并在其他数据集上表现了remarkable的性能,这implying its efficiency。我们的体育-102和舞蹈-12数据集现在公共可用,可以在https://sites.google.com/view/syd-net/home中下载。

Zero-Shot Learning by Harnessing Adversarial Samples

  • paper_url: http://arxiv.org/abs/2308.00313
  • repo_url: https://github.com/uqzhichen/haszsl
  • paper_authors: Zhi Chen, Pengfei Zhang, Jingjing Li, Sen Wang, Zi Huang
  • for: 这个研究旨在解决零基础学习(Zero-Shot Learning,ZSL)中的 semantic distortion 问题,提高模型的通用能力。
  • methods: 我们提出了一种基于对抗样本(Adversarial Samples)的 ZSL 方法,通过对抗训练来提高模型的对抗性和可靠性。
  • results: 我们通过了三个知名的零基础学习评估数据集的实验,证明了我们的对抗样本方法在 ZSL 和 Generalized Zero-Shot Learning(GZSL) scenario 中的效果。
    Abstract Zero-Shot Learning (ZSL) aims to recognize unseen classes by generalizing the knowledge, i.e., visual and semantic relationships, obtained from seen classes, where image augmentation techniques are commonly applied to improve the generalization ability of a model. However, this approach can also cause adverse effects on ZSL since the conventional augmentation techniques that solely depend on single-label supervision is not able to maintain semantic information and result in the semantic distortion issue consequently. In other words, image argumentation may falsify the semantic (e.g., attribute) information of an image. To take the advantage of image augmentations while mitigating the semantic distortion issue, we propose a novel ZSL approach by Harnessing Adversarial Samples (HAS). HAS advances ZSL through adversarial training which takes into account three crucial aspects: (1) robust generation by enforcing augmentations to be similar to negative classes, while maintaining correct labels, (2) reliable generation by introducing a latent space constraint to avert significant deviations from the original data manifold, and (3) diverse generation by incorporating attribute-based perturbation by adjusting images according to each semantic attribute's localization. Through comprehensive experiments on three prominent zero-shot benchmark datasets, we demonstrate the effectiveness of our adversarial samples approach in both ZSL and Generalized Zero-Shot Learning (GZSL) scenarios. Our source code is available at https://github.com/uqzhichen/HASZSL.
    摘要 HAS considers three crucial aspects:1. Robust generation: Enforcing augmentations to be similar to negative classes while maintaining correct labels.2. Reliable generation: Introducing a latent space constraint to avert significant deviations from the original data manifold.3. Diverse generation: Incorporating attribute-based perturbation to adjust images according to each semantic attribute's localization.We demonstrate the effectiveness of our approach in both ZSL and Generalized Zero-Shot Learning (GZSL) scenarios through comprehensive experiments on three prominent zero-shot benchmark datasets. Our source code is available at https://github.com/uqzhichen/HASZSL.Translation notes:* Zero-Shot Learning (ZSL) is translated as "无seen类识别" (wú miàn zhì bǐ)* Harnessing Adversarial Samples (HAS) is translated as "利用对抗采样" (lì yòng duì kòng qiè sān)* adversarial training is translated as "对抗训练" (duì kòng xiǎng tào)* semantic distortion issue is translated as "semantic扭曲问题" (semantic fāng zhì wèn tí)* attribute-based perturbation is translated as "基于 attribute 的扰动" (jī yú attribute de ràng dòng)* Generalized Zero-Shot Learning (GZSL) is translated as "普通的无seen类识别" (pǔ tōng de wú miàn zhì bǐ)

GradOrth: A Simple yet Efficient Out-of-Distribution Detection with Orthogonal Projection of Gradients

  • paper_url: http://arxiv.org/abs/2308.00310
  • repo_url: None
  • paper_authors: Sima Behpour, Thang Doan, Xin Li, Wenbin He, Liang Gou, Liu Ren
  • For: 这个研究旨在提高机器学习模型在实际应用中的安全部署,通过检测机器学习模型中的外部数据(Out-of-distribution,OOD)。* Methods: 这篇研究提出了一个名为GradOrth的新方法,基于实际数据中重要的特征向量进行OOD检测。具体来说,这个方法 computed the norm of gradient projection on the subspaces considered important for the in-distribution data,以检测数据是否为OOD。* Results: 这个方法可以实现高效的OOD检测,比起目前的现有方法,可以降低false positive rate(FPR)的平均值,具体而言,可以降低FPR95的值高达8%。
    Abstract Detecting out-of-distribution (OOD) data is crucial for ensuring the safe deployment of machine learning models in real-world applications. However, existing OOD detection approaches primarily rely on the feature maps or the full gradient space information to derive OOD scores neglecting the role of most important parameters of the pre-trained network over in-distribution (ID) data. In this study, we propose a novel approach called GradOrth to facilitate OOD detection based on one intriguing observation that the important features to identify OOD data lie in the lower-rank subspace of in-distribution (ID) data. In particular, we identify OOD data by computing the norm of gradient projection on the subspaces considered important for the in-distribution data. A large orthogonal projection value (i.e. a small projection value) indicates the sample as OOD as it captures a weak correlation of the ID data. This simple yet effective method exhibits outstanding performance, showcasing a notable reduction in the average false positive rate at a 95% true positive rate (FPR95) of up to 8% when compared to the current state-of-the-art methods.
    摘要 检测非常量(OOD)数据是机器学习模型在实际应用中的安全部署的关键。然而,现有的OOD检测方法主要基于特征图或整个梯度空间信息来生成OOD分数,忽略了预训练网络中最重要的参数的作用。在这项研究中,我们提出了一种新的方法 called GradOrth,用于基于ID数据中重要参数的低纬度子空间来进行OOD检测。具体来说,我们通过计算ID数据中考虑重要的参数所生成的梯度投影的评估值来识别OOD数据。如果梯度投影的评估值很小(即 proyect value 很大),则表示该样本为OOD,因为它 capture ID数据的弱相关性。这种简单 yet 高效的方法在我们的实验中表现出色,可以在95% true positive rate (FPR95) 下减少 false positive rate 的平均值达到8%,与当前状态的方法相比,具有显著的改善。

Domain Adaptation based on Human Feedback for Enhancing Generative Model Denoising Abilities

  • paper_url: http://arxiv.org/abs/2308.00307
  • repo_url: None
  • paper_authors: Hyun-Cheol Park, Sung Ho Kang
  • for: 这个论文的目的是如何使用人类反馈来提高生成模型的质量。
  • methods: 这个论文使用了人类反馈来修正生成模型在不同频谱上的表现。
  • results: 研究人员通过使用人类反馈来修正生成模型,提高了模型在不同频谱上的表现。In more detail, the paper proposes a method for fine-tuning a generator trained on one domain using human feedback from another domain, in order to enhance the denoising capabilities of the generator in different domains. The method involves training a reward model to predict human feedback, and then using the reward model to fine-tune the generator on the different domain. The approach is shown to be effective in improving the quality of the denoised images.
    Abstract How can we apply human feedback into generative model? As answer of this question, in this paper, we show the method applied on denoising problem and domain adaptation using human feedback. Deep generative models have demonstrated impressive results in image denoising. However, current image denoising models often produce inappropriate results when applied to domains different from the ones they were trained on. If there are `Good' and `Bad' result for unseen data, how to raise up quality of `Bad' result. Most methods use an approach based on generalization of model. However, these methods require target image for training or adapting unseen domain. In this paper, to adapting domain, we deal with non-target image for unseen domain, and improve specific failed image. To address this, we propose a method for fine-tuning inappropriate results generated in a different domain by utilizing human feedback. First, we train a generator to denoise images using only the noisy MNIST digit '0' images. The denoising generator trained on the source domain leads to unintended results when applied to target domain images. To achieve domain adaptation, we construct a noise-image denoising generated image data set and train a reward model predict human feedback. Finally, we fine-tune the generator on the different domain using the reward model with auxiliary loss function, aiming to transfer denoising capabilities to target domain. Our approach demonstrates the potential to efficiently fine-tune a generator trained on one domain using human feedback from another domain, thereby enhancing denoising abilities in different domains.
    摘要 如何将人类反馈应用到生成模型中?在这篇论文中,我们提出了基于人类反馈的方法,用于解决陌生频率问题和频率适应问题。深度生成模型在图像噪声除除预测中表现出色,但是当应用于不同的频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频率频�

Diffusion Model for Camouflaged Object Detection

  • paper_url: http://arxiv.org/abs/2308.00303
  • repo_url: None
  • paper_authors: Zhennan Chen, Rongrong Gao, Tian-Zhu Xiang, Fan Lin
  • for: 这个论文的目的是提出一个基于扩散的隐形物检测方法(diffCOD),用于检测高度相似背景的物品。
  • methods: 这个方法使用扩散泛化模型的强大噪声除法能力,将隐形物检测任务视为一个扩散泛化过程,从陌生分布传播到物品标识。具体来说,物品标识从真实标识推广到陌生分布,而设计的模型从中学习恢复这个泛化过程。
  • results: 在四个广泛使用的隐形物检测 benchmark 数据集上进行了广泛的实验,结果显示,提出的方法与现有的 11 种方法相比,尤其在隐形物标识中的细节 texture 检测方面表现出色。
    Abstract Camouflaged object detection is a challenging task that aims to identify objects that are highly similar to their background. Due to the powerful noise-to-image denoising capability of denoising diffusion models, in this paper, we propose a diffusion-based framework for camouflaged object detection, termed diffCOD, a new framework that considers the camouflaged object segmentation task as a denoising diffusion process from noisy masks to object masks. Specifically, the object mask diffuses from the ground-truth masks to a random distribution, and the designed model learns to reverse this noising process. To strengthen the denoising learning, the input image prior is encoded and integrated into the denoising diffusion model to guide the diffusion process. Furthermore, we design an injection attention module (IAM) to interact conditional semantic features extracted from the image with the diffusion noise embedding via the cross-attention mechanism to enhance denoising learning. Extensive experiments on four widely used COD benchmark datasets demonstrate that the proposed method achieves favorable performance compared to the existing 11 state-of-the-art methods, especially in the detailed texture segmentation of camouflaged objects. Our code will be made publicly available at: https://github.com/ZNan-Chen/diffCOD.
    摘要 幻化物体检测是一个复杂的任务,旨在标识背景上的物体,这些物体与背景几乎完全相同。由于杂波模型具有强大的噪声去除能力,我们在这篇论文中提出了一种基于杂波的检测方法,称为diffCOD。这种方法将物体检测任务视为一种噪声去除过程,从杂波masks到object masks。具体来说,物体mask从真实的masks diffuses到一个随机分布,而设计的模型学习恢复这个噪声过程。为强化噪声学习,输入图像先验是编码并 integrate到杂波噪声模型中,以导引杂波过程。此外,我们还设计了注入注意力模块(IAM),通过跨注意力机制与图像中的 conditional semantic feature进行交互,以增强噪声学习。我们在四种通用COD benchmark数据集上进行了广泛的实验,结果显示,我们的提出的方法与现有11种状态之前的方法相比,尤其是在透明度高的细节表示上,对幻化物体的检测表现出色。我们的代码将在:https://github.com/ZNan-Chen/diffCOD 中公开。

Online Prototype Learning for Online Continual Learning

  • paper_url: http://arxiv.org/abs/2308.00301
  • repo_url: https://github.com/weilllllls/onpro
  • paper_authors: Yujie Wei, Jiaxin Ye, Zhizhong Huang, Junping Zhang, Hongming Shan
  • for: 本研究探讨了在单过滤流量中不断学习的问题,以适应新数据并减轻悬崖式忘却。
  • methods: 本研究使用了存储一小部分旧数据的方法,并提出了一种新的代码反馈机制,以解决在线学习模型对新任务的欠拟合问题。
  • results: 实验结果表明, comparing with现状态 искусственный智能方法,本研究的方法在广泛使用的标准 benchmark 数据集上达到了更高的性能。
    Abstract Online continual learning (CL) studies the problem of learning continuously from a single-pass data stream while adapting to new data and mitigating catastrophic forgetting. Recently, by storing a small subset of old data, replay-based methods have shown promising performance. Unlike previous methods that focus on sample storage or knowledge distillation against catastrophic forgetting, this paper aims to understand why the online learning models fail to generalize well from a new perspective of shortcut learning. We identify shortcut learning as the key limiting factor for online CL, where the learned features may be biased, not generalizable to new tasks, and may have an adverse impact on knowledge distillation. To tackle this issue, we present the online prototype learning (OnPro) framework for online CL. First, we propose online prototype equilibrium to learn representative features against shortcut learning and discriminative features to avoid class confusion, ultimately achieving an equilibrium status that separates all seen classes well while learning new classes. Second, with the feedback of online prototypes, we devise a novel adaptive prototypical feedback mechanism to sense the classes that are easily misclassified and then enhance their boundaries. Extensive experimental results on widely-used benchmark datasets demonstrate the superior performance of OnPro over the state-of-the-art baseline methods. Source code is available at https://github.com/weilllllls/OnPro.
    摘要 在线持续学习(CL)研究了从单个数据流中不断学习的问题,同时适应新数据并避免恶化学习。最近,通过存储一小部分的老数据来实现,重温方法表现了良好的性能。与前方法集中注意点在样本存储或知识储存防止恶化学习,这篇论文强调从新的角度研究在线学习模型何以不好地泛化。我们识别快捷学习为在线CL的限制因素,因为学习的特征可能偏导向、不易泛化到新任务、并可能对知识储存产生负面影响。为了解决这个问题,我们提出在线原型学习(OnPro)框架,包括在线原型均衡和泛化特征的学习,以达到一个分离所有seen类的平衡状态。其次,通过在线原型反馈机制,我们提出一种新的适应式原型反馈机制,以感知容易混淆的类并强制其分类边界。实验结果表明,与状态实验方法相比,OnPro显著超越了基eline方法的性能。代码可以在获取。

Fundus-Enhanced Disease-Aware Distillation Model for Retinal Disease Classification from OCT Images

  • paper_url: http://arxiv.org/abs/2308.00291
  • repo_url: https://github.com/xmed-lab/fddm
  • paper_authors: Lehan Wang, Weihang Dai, Mei Jin, Chubin Ou, Xiaomeng Li
  • for: 本研究旨在提出一种基于多模态学习的眼病识别方法,以提高现有的单模态学习方法的效果。
  • methods: 我们提出了一种基于眼病模型的分类器,通过在训练过程中使用不匹配的背景图像来增强OCToct模型的表现。我们还提出了一种类prototype匹配和类相似性对齐方法,以便在不同模态之间传递疾病相关信息。
  • results: 我们的方法在实验中表现出优于单模态、多模态和现有的气化方法,并且可以在不同的眼病诊断中获得更高的准确率。
    Abstract Optical Coherence Tomography (OCT) is a novel and effective screening tool for ophthalmic examination. Since collecting OCT images is relatively more expensive than fundus photographs, existing methods use multi-modal learning to complement limited OCT data with additional context from fundus images. However, the multi-modal framework requires eye-paired datasets of both modalities, which is impractical for clinical use. To address this problem, we propose a novel fundus-enhanced disease-aware distillation model (FDDM), for retinal disease classification from OCT images. Our framework enhances the OCT model during training by utilizing unpaired fundus images and does not require the use of fundus images during testing, which greatly improves the practicality and efficiency of our method for clinical use. Specifically, we propose a novel class prototype matching to distill disease-related information from the fundus model to the OCT model and a novel class similarity alignment to enforce consistency between disease distribution of both modalities. Experimental results show that our proposed approach outperforms single-modal, multi-modal, and state-of-the-art distillation methods for retinal disease classification. Code is available at https://github.com/xmed-lab/FDDM.
    摘要 optical coherence tomography (OCT) 是一种新的和有效的屏检工具 для眼科诊断。由于收集 OCT 图像相对较为昂贵于fundus 图像,现有的方法使用多模态学习来补充 OCT 数据中的有限信息。然而,多模态框架需要临床使用的眼球对应的数据集,这是不现实的。为解决这个问题,我们提出了一种新的眼球增强疾病意识模型(FDDM),用于从 OCT 图像中分类眼球疾病。我们的框架在训练时使用不匹配的眼球图像来增强 OCT 模型,而不需要在测试时使用眼球图像,这大大提高了我们的方法的实用性和效率。specifically,我们提出了一种新的类型prototype匹配来提取疾病相关信息从眼球模型并将其传递给 OCT 模型,以及一种新的类型相似性对齐来强制两种模式中疾病分布的一致。实验结果表明,我们的提出的方法在眼球疾病分类中超过单模、多模和状态 искусственный气化方法。代码可以在 中找到。

A Study of Unsupervised Evaluation Metrics for Practical and Automatic Domain Adaptation

  • paper_url: http://arxiv.org/abs/2308.00287
  • repo_url: None
  • paper_authors: Minghao Chen, Zepeng Gao, Shuai Zhao, Qibo Qiu, Wenxiao Wang, Binbin Lin, Xiaofei He
  • for: 本研究旨在无需目标验证数据的情况下,发展一个无监督领域适应(Unsupervised Domain Adaptation,UDA)评估指标,以评估将模型转移到目标领域时的品质。
  • methods: 本研究使用的方法包括基于对应信息的评估指标,以及一个新的多层感知(MLP)分类器,并与数据增强技术相结合,实现一个名为增强对应稳定度(Augmentation Consistency Metric,ACM)的新的UDA评估指标。
  • results: 本研究透过实验证明了先前的实验设定存在缺陷,并通过大规模实验验证了我们提出的评估指标的有效性。此外,我们还使用了我们的评估指标自动搜寻最佳参数集,在四个常用的参数集上实现了超越手动参数集的性能。
    Abstract Unsupervised domain adaptation (UDA) methods facilitate the transfer of models to target domains without labels. However, these methods necessitate a labeled target validation set for hyper-parameter tuning and model selection. In this paper, we aim to find an evaluation metric capable of assessing the quality of a transferred model without access to target validation labels. We begin with the metric based on mutual information of the model prediction. Through empirical analysis, we identify three prevalent issues with this metric: 1) It does not account for the source structure. 2) It can be easily attacked. 3) It fails to detect negative transfer caused by the over-alignment of source and target features. To address the first two issues, we incorporate source accuracy into the metric and employ a new MLP classifier that is held out during training, significantly improving the result. To tackle the final issue, we integrate this enhanced metric with data augmentation, resulting in a novel unsupervised UDA metric called the Augmentation Consistency Metric (ACM). Additionally, we empirically demonstrate the shortcomings of previous experiment settings and conduct large-scale experiments to validate the effectiveness of our proposed metric. Furthermore, we employ our metric to automatically search for the optimal hyper-parameter set, achieving superior performance compared to manually tuned sets across four common benchmarks. Codes will be available soon.
    摘要 <> translate "Unsupervised domain adaptation (UDA) methods facilitate the transfer of models to target domains without labels. However, these methods necessitate a labeled target validation set for hyper-parameter tuning and model selection. In this paper, we aim to find an evaluation metric capable of assessing the quality of a transferred model without access to target validation labels. We begin with the metric based on mutual information of the model prediction. Through empirical analysis, we identify three prevalent issues with this metric: 1) It does not account for the source structure. 2) It can be easily attacked. 3) It fails to detect negative transfer caused by the over-alignment of source and target features. To address the first two issues, we incorporate source accuracy into the metric and employ a new MLP classifier that is held out during training, significantly improving the result. To tackle the final issue, we integrate this enhanced metric with data augmentation, resulting in a novel unsupervised UDA metric called the Augmentation Consistency Metric (ACM). Additionally, we empirically demonstrate the shortcomings of previous experiment settings and conduct large-scale experiments to validate the effectiveness of our proposed metric. Furthermore, we employ our metric to automatically search for the optimal hyper-parameter set, achieving superior performance compared to manually tuned sets across four common benchmarks. Codes will be available soon." into Simplified Chinese.Here's the translation:<>无监督领域适应(UDA)方法可以将模型转移到目标领域无需标签。然而,这些方法通常需要一个标注的目标验证集来进行参数调整和模型选择。在这篇论文中,我们目标是找到一个无需目标验证标签的评价指标,以评估转移模型的质量。我们开始于基于模型预测的共同信息度metric。通过实验分析,我们发现了三个常见的问题:1)它不考虑源结构。2)它可以轻松攻击。3)它无法探测源和目标特征的过对齐导致的负面转移。为了解决这些问题,我们将源准确率 incorporated into the metric,并使用一个新的多层感知机(MLP)分类器,在训练期间快速进行了改进。为了解决最后一个问题,我们将这个加强的 metric 与数据扩展结合,得到了一个新的无监督 UDA 度量called Augmentation Consistency Metric(ACM)。此外,我们也进行了实验证明先前的实验设置的缺陷,并进行了大规模的实验来验证我们的提议的度量的有效性。此外,我们还使用我们的度量自动搜索最佳参数集,在四个常见的 benchmark 上达到了超过手动调整的参数集的性能。代码将在未来 soon 可用。

Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction

  • paper_url: http://arxiv.org/abs/2308.00279
  • repo_url: https://github.com/woriazzc/robust-pu
  • paper_authors: Zhangchi Zhu, Lu Wang, Pu Zhao, Chao Du, Wei Zhang, Hang Dong, Bo Qiao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
  • For: The paper focuses on developing a robust positive-unlabeled (PU) learning method to improve the accuracy and stability of learning with positive and unlabeled data.* Methods: The proposed method utilizes a novel “hardness” measure to distinguish unlabeled samples with a high chance of being negative from unlabeled samples with large label noise. An iterative training strategy is then implemented to fine-tune the selection of negative samples during the training process in an iterative manner to include more “easy” samples in the early stage of training.* Results: Extensive experimental validations over a wide range of learning tasks show that the proposed approach can effectively improve the accuracy and stability of learning with positive and unlabeled data.Here are the three key points in Simplified Chinese:* For: 本 paper 针对正样本和无标签样本的学习问题提出了一种robustPositive-unlabeled (PU)学习方法,以提高学习精度和稳定性。* Methods: 提议的方法利用了一种新的”困难度”度量来分辨无标签样本中高概率为负样本的样本和具有大量标签噪声的样本。然后,通过一种迭代训练策略来在训练过程中进行迭代调整负样本的选择,以包括更多”容易”的样本在早期训练阶段。* Results: 对各种学习任务进行了广泛的实验验证,结果表明,提议的方法可以有效地提高学习正样本和无标签样本的精度和稳定性。
    Abstract Learning from positive and unlabeled data is known as positive-unlabeled (PU) learning in literature and has attracted much attention in recent years. One common approach in PU learning is to sample a set of pseudo-negatives from the unlabeled data using ad-hoc thresholds so that conventional supervised methods can be applied with both positive and negative samples. Owing to the label uncertainty among the unlabeled data, errors of misclassifying unlabeled positive samples as negative samples inevitably appear and may even accumulate during the training processes. Those errors often lead to performance degradation and model instability. To mitigate the impact of label uncertainty and improve the robustness of learning with positive and unlabeled data, we propose a new robust PU learning method with a training strategy motivated by the nature of human learning: easy cases should be learned first. Similar intuition has been utilized in curriculum learning to only use easier cases in the early stage of training before introducing more complex cases. Specifically, we utilize a novel ``hardness'' measure to distinguish unlabeled samples with a high chance of being negative from unlabeled samples with large label noise. An iterative training strategy is then implemented to fine-tune the selection of negative samples during the training process in an iterative manner to include more ``easy'' samples in the early stage of training. Extensive experimental validations over a wide range of learning tasks show that this approach can effectively improve the accuracy and stability of learning with positive and unlabeled data. Our code is available at https://github.com/woriazzc/Robust-PU
    摘要 学习正方向和未标注数据的技术被称为正方向-未标注(PU)学习,在最近几年内吸引了很多关注。一种常见的PU学习方法是从未标注数据中随机选择一些pseudo-negative样本,以便使用 convential的supervised方法进行学习。由于未标注数据中的标签不确定性,在训练过程中可能会出现误分类未标注正样本为负样本的错误,这些错误可能会在训练过程中积累,导致性能下降和模型不稳定。为了减轻标签不确定性的影响和提高学习正方向和未标注数据的稳定性,我们提出了一种新的robust PU学习方法,具体来说是在训练过程中采用一种基于人类学习的启发,即在训练的初始阶段使用容易的样本进行学习,然后逐渐引入更复杂的样本。我们使用一种新的“困难度”度量来 отличи未标注样本中的高概率负样本和大量标签噪声。然后,我们实现了一种迭代训练策略,在训练过程中不断细化选择负样本的过程,以包括更多的容易样本在训练的初始阶段。我们进行了广泛的实验 validate our approach,结果表明,这种方法可以有效地提高学习正方向和未标注数据的精度和稳定性。我们的代码可以在 找到。

Benchmarking Ultra-High-Definition Image Reflection Removal

  • paper_url: http://arxiv.org/abs/2308.00265
  • repo_url: https://github.com/liar-zzy/benchmarking-ultra-high-definition-single-image-reflection-removal
  • paper_authors: Zhenyuan Zhang, Zhenbo Song, Kaihao Zhang, Wenhan Luo, Zhaoxin Fan, Jianfeng Lu
    for: 本研究旨在解决高清单张图像反射去除(SIRR)问题,特别是对于超高清单张图像(UHD)。methods: 本研究使用了六种现状顶尖SIRR方法进行评估,并对这些方法在UHD图像上的应用进行了详细的分析。此外,本研究还提出了一种基于 transformer 架构的 Reflection Removal 方法(RRFormer),该方法包括三个模块:预处理嵌入模块、自动注意力特征提取模块和多尺度空间特征提取模块。results: 经过实验证明,RRFormer 在非 UHD 数据集和我们提出的 UHDRR 数据集上达到了领先的表现。此外,本研究还提供了一个可公开下载的代码和数据集,以便进一步探索 UHD SIRR 领域。
    Abstract Deep learning based methods have achieved significant success in the task of single image reflection removal (SIRR). However, the majority of these methods are focused on High-Definition/Standard-Definition (HD/SD) images, while ignoring higher resolution images such as Ultra-High-Definition (UHD) images. With the increasing prevalence of UHD images captured by modern devices, in this paper, we aim to address the problem of UHD SIRR. Specifically, we first synthesize two large-scale UHD datasets, UHDRR4K and UHDRR8K. The UHDRR4K dataset consists of $2,999$ and $168$ quadruplets of images for training and testing respectively, and the UHDRR8K dataset contains $1,014$ and $105$ quadruplets. To the best of our knowledge, these two datasets are the first largest-scale UHD datasets for SIRR. Then, we conduct a comprehensive evaluation of six state-of-the-art SIRR methods using the proposed datasets. Based on the results, we provide detailed discussions regarding the strengths and limitations of these methods when applied to UHD images. Finally, we present a transformer-based architecture named RRFormer for reflection removal. RRFormer comprises three modules, namely the Prepossessing Embedding Module, Self-attention Feature Extraction Module, and Multi-scale Spatial Feature Extraction Module. These modules extract hypercolumn features, global and partial attention features, and multi-scale spatial features, respectively. To ensure effective training, we utilize three terms in our loss function: pixel loss, feature loss, and adversarial loss. We demonstrate through experimental results that RRFormer achieves state-of-the-art performance on both the non-UHD dataset and our proposed UHDRR datasets. The code and datasets are publicly available at https://github.com/Liar-zzy/Benchmarking-Ultra-High-Definition-Single-Image-Reflection-Removal.
    摘要 深度学习基于方法在单个图像反射去除(SIRR)任务中已经取得了显著的成功。然而,大多数这些方法都是关注高清晰度/标准清晰度(HD/SD)图像,而忽略更高的分辨率图像,如超高清晰度(UHD)图像。随着现代设备拍摄的UHD图像的流行,在这篇论文中,我们想要解决UHD SIRR问题。specifically,我们首先合成了两个大规模UHD datasets,即UHDRR4K和UHDRR8K。UHDRR4K dataset包含2999个和168个图像对用于训练和测试,而UHDRR8K dataset包含1014个和105个图像对。我们知道,这两个dataset是现有最大规模的UHD datasets for SIRR。然后,我们进行了六种state-of-the-art SIRR方法的全面评估,使用我们提出的dataset。基于结果,我们提供了详细的讨论,探讨这些方法在UHD图像上的优缺点。最后,我们提出了一种基于转换器的架构,名为RRFormer,用于反射去除。RRFormer包括三个模块:预处理嵌入模块、自注意特征提取模块和多尺度空间特征提取模块。这些模块分别提取了嵌入特征、全局和部分注意特征以及多尺度空间特征。为确保有效训练,我们使用了三个损失函数:像素损失、特征损失和对抗损失。我们通过实验结果表明,RRFormer在我们提出的UHDRR datasets以及非UHD dataset上达到了状态之最的性能。代码和数据集可以在https://github.com/Liar-zzy/Benchmarking-Ultra-High-Definition-Single-Image-Reflection-Removal上获取。

The Algonauts Project 2023 Challenge: UARK-UAlbany Team Solution

  • paper_url: http://arxiv.org/abs/2308.00262
  • repo_url: https://github.com/uark-cviu/algonauts2023
  • paper_authors: Xuan-Bac Nguyen, Xudong Liu, Xin Li, Khoa Luu
  • for: 这个研究是为了参加Algonauts Project 2023 Challenge,目标是使用计算模型预测参与者在观看复杂自然视觉场景时的脑响应。
  • methods: 这个研究使用了一种两步训练方法来构建一个图像基于的脑编码器,包括在所有参与者的数据上进行首先训练,然后对每个参与者进行细化训练,使用不同的损失函数和目标来引入多样性。
  • results: 这个研究的结果是一个由多个独特的编码器组成的ensemble,可以准确预测脑响应。代码可以在https://github.com/uark-cviu/Algonauts2023上获取。
    Abstract This work presents our solutions to the Algonauts Project 2023 Challenge. The primary objective of the challenge revolves around employing computational models to anticipate brain responses captured during participants' observation of intricate natural visual scenes. The goal is to predict brain responses across the entire visual brain, as it is the region where the most reliable responses to images have been observed. We constructed an image-based brain encoder through a two-step training process to tackle this challenge. Initially, we created a pretrained encoder using data from all subjects. Next, we proceeded to fine-tune individual subjects. Each step employed different training strategies, such as different loss functions and objectives, to introduce diversity. Ultimately, our solution constitutes an ensemble of multiple unique encoders. The code is available at https://github.com/uark-cviu/Algonauts2023
    摘要 这个工作介绍了我们对Algonauts Project 2023 Challenge的解决方案。挑战的主要目标是通过计算模型预测参与者观看复杂自然视觉场景时的脑响应。目标是预测整个视觉脑区的脑响应,因为这里是最可靠的图像响应的地方。我们使用了两步训练过程构建了基于图像的脑编码器。首先,我们创建了所有参与者的预训练编码器。然后,我们进行了个性化训练,每个步骤使用了不同的loss函数和目标,以引入多样性。最终,我们的解决方案是一个ensemble的多个唯一编码器。代码可以在https://github.com/uark-cviu/Algonauts2023中找到。

Improving Pixel-based MIM by Reducing Wasted Modeling Capability

  • paper_url: http://arxiv.org/abs/2308.00261
  • repo_url: https://github.com/open-mmlab/mmpretrain
  • paper_authors: Yuan Liu, Songyang Zhang, Jiacheng Chen, Zhaohui Yu, Kai Chen, Dahua Lin
  • for: This paper aims to improve the performance of Masked Image Modeling (MIM) methods, which are used for tasks such as fine-tuning, linear probing, and semantic segmentation.
  • methods: The proposed method utilizes low-level features from shallow layers to aid pixel reconstruction, and incorporates multi-level feature fusion for isotropic architectures like the standard Vision Transformer (ViT).
  • results: The proposed method achieves non-trivial improvements across various downstream tasks, including 1.2% improvement in fine-tuning, 2.8% improvement in linear probing, and 2.6% improvement in semantic segmentation, when applied to a smaller model (e.g., ViT-S).Here is the text in Simplified Chinese:
  • for: 这篇论文目标是提高Masked Image Modeling(MIM)方法的性能,这些方法用于Tasks如 fine-tuning、linear probing 和 semantic segmentation。
  • methods: 提议的方法利用 shallow layers 的低级别特征来帮助像素重建,并利用 isotropic 架构 like standard Vision Transformer(ViT)的多级特征融合。
  • results: 提议的方法在不同的下游任务上实现了非致命的改进,包括 fine-tuning 上的1.2%提升、linear probing 上的2.8%提升和 semantic segmentation 上的2.6%提升,当应用于 smaller model(如 ViT-S)时。
    Abstract There has been significant progress in Masked Image Modeling (MIM). Existing MIM methods can be broadly categorized into two groups based on the reconstruction target: pixel-based and tokenizer-based approaches. The former offers a simpler pipeline and lower computational cost, but it is known to be biased toward high-frequency details. In this paper, we provide a set of empirical studies to confirm this limitation of pixel-based MIM and propose a new method that explicitly utilizes low-level features from shallow layers to aid pixel reconstruction. By incorporating this design into our base method, MAE, we reduce the wasted modeling capability of pixel-based MIM, improving its convergence and achieving non-trivial improvements across various downstream tasks. To the best of our knowledge, we are the first to systematically investigate multi-level feature fusion for isotropic architectures like the standard Vision Transformer (ViT). Notably, when applied to a smaller model (e.g., ViT-S), our method yields significant performance gains, such as 1.2\% on fine-tuning, 2.8\% on linear probing, and 2.6\% on semantic segmentation. Code and models are available at https://github.com/open-mmlab/mmpretrain.
    摘要 “ máscara image modeling (MIM) 方面已经取得了 significativo progress. 现有的 MIM 方法可以分为两个类别,根据重建目标来分:像素基于的方法和 tokenizer 基于的方法。前者具有简单的管道和较低的计算成本,但已知偏好高频率细节。在这篇论文中,我们提供了一系列实验来证明这一点,并提出一种新的方法,利用低层特征来帮助像素重建。通过将这种设计纳入我们的基本方法MAE中,我们降低了像素基于 MIM 的浪费模型能力,提高了它的收敛和在多种下游任务中取得了非常有用的改进。我们知道,我们是第一个系统地调查多级特征融合 для均匀的Architecture like standard Vision Transformer (ViT)。当应用于较小的模型(例如 ViT-S)时,我们的方法可以获得显著的性能提升,如1.2%的 fine-tuning,2.8%的线性探测和2.6%的语义分割。代码和模型可以在https://github.com/open-mmlab/mmpretrain中找到。”

Unleashing the Power of Self-Supervised Image Denoising: A Comprehensive Review

  • paper_url: http://arxiv.org/abs/2308.00247
  • repo_url: None
  • paper_authors: Dan Zhang, Fangfang Zhou, Yuanzhou Wei, Xiao Yang, Yuan Gu
  • for: 本文旨在提供一份准确、净化的自适应图像干扰除法综述,帮助研究者和实践者更好地了解这个领域的最新发展。
  • methods: 本文分类了自适应图像干扰除方法为三类:通用方法、BSN基于方法和Transformer基于方法,并提供了每种方法的 theoretically 分析和实践应用。
  • results: 本文通过对多个数据集进行量化和质量性实验,证明了这些方法的有效性,并提供了对等的对比分析。
    Abstract The advent of deep learning has brought a revolutionary transformation to image denoising techniques. However, the persistent challenge of acquiring noise-clean pairs for supervised methods in real-world scenarios remains formidable, necessitating the exploration of more practical self-supervised image denoising. This paper focuses on self-supervised image denoising methods that offer effective solutions to address this challenge. Our comprehensive review thoroughly analyzes the latest advancements in self-supervised image denoising approaches, categorizing them into three distinct classes: General methods, Blind Spot Network (BSN)-based methods, and Transformer-based methods. For each class, we provide a concise theoretical analysis along with their practical applications. To assess the effectiveness of these methods, we present both quantitative and qualitative experimental results on various datasets, utilizing classical algorithms as benchmarks. Additionally, we critically discuss the current limitations of these methods and propose promising directions for future research. By offering a detailed overview of recent developments in self-supervised image denoising, this review serves as an invaluable resource for researchers and practitioners in the field, facilitating a deeper understanding of this emerging domain and inspiring further advancements.
    摘要 深度学习的出现对图像干扰技术带来了革命性的变革。然而,在实际场景中获得干扰级别对的训练数据仍然是一大挑战,这使得自我监督的图像干扰技术成为了一项有优势的选择。这篇评论将 concentrate 于最新的自我监督图像干扰方法,并将它们分为三个不同的类别:一般方法、基于 Blind Spot Network (BSN) 的方法和基于 Transformer 的方法。对于每个类别,我们将提供一个简洁的理论分析,并详细介绍它们在实践中的应用。为评估这些方法的效果,我们将提供量化和质量上的实验结果,使用经典算法作为参考。此外,我们还会 kritically 讨论这些方法的当前的局限性,并提出未来研究的可能性。通过对最新的自我监督图像干扰技术的审视,这篇评论将成为该领域的一个不可或缺的资源,为研究人员和实践者提供深入的了解,并激发更多的进步。

Partitioned Saliency Ranking with Dense Pyramid Transformers

  • paper_url: http://arxiv.org/abs/2308.00236
  • repo_url: https://github.com/ssecv/psr
  • paper_authors: Chengxiao Sun, Yan Xu, Jialun Pei, Haopeng Fang, He Tang
  • for: 本研究旨在解决saliency ranking中的主观性问题,提出了ranking by partition paradigm方法,可以减少rank scores的归一化问题。
  • methods: 本文提出了Dense Pyramid Transformer (DPT)模型,用于实现global cross-scale interactions,并且使用ranking by partition paradigm方法来解决rank scores的归一化问题。
  • results: 实验结果表明,我们的方法可以在多个 benchmark dataset 上出perform all existing methods,并且可以减少计算成本。代码可以在 \url{https://github.com/ssecv/PSR} 上获取。
    Abstract In recent years, saliency ranking has emerged as a challenging task focusing on assessing the degree of saliency at instance-level. Being subjective, even humans struggle to identify the precise order of all salient instances. Previous approaches undertake the saliency ranking by directly sorting the rank scores of salient instances, which have not explicitly resolved the inherent ambiguities. To overcome this limitation, we propose the ranking by partition paradigm, which segments unordered salient instances into partitions and then ranks them based on the correlations among these partitions. The ranking by partition paradigm alleviates ranking ambiguities in a general sense, as it consistently improves the performance of other saliency ranking models. Additionally, we introduce the Dense Pyramid Transformer (DPT) to enable global cross-scale interactions, which significantly enhances feature interactions with reduced computational burden. Extensive experiments demonstrate that our approach outperforms all existing methods. The code for our method is available at \url{https://github.com/ssecv/PSR}.
    摘要 近年来,焦点排序成为一项具有挑战性的任务,旨在评估每个实例的焦点程度。由于是主观的,也就人类很难准确地确定所有焦点实例的准确顺序。先前的方法通过直接排序焦点实例的排名分数来实现焦点排序,这并没有解决内在的抽象性。为了解决这个限制,我们提议使用分区排序思想,将不同焦点实例分成不同的分区,然后根据这些分区之间的相关性进行排名。这种分区排序方法可以减少焦点排序的抽象性,并且广泛提高其他焦点排序模型的性能。此外,我们还引入了笔直射Transformer(DPT),以启用全球跨级交互,从而显著提高了特征交互的能力,同时减少计算负担。广泛的实验表明,我们的方法可以全面超越所有现有的方法。代码可以在 \url{https://github.com/ssecv/PSR} 上获取。

Using Scene and Semantic Features for Multi-modal Emotion Recognition

  • paper_url: http://arxiv.org/abs/2308.00228
  • repo_url: None
  • paper_authors: Zhifeng Wang, Ramesh Sankaranarayana
    for: 这个论文的目的是提出一种基于场景和 semantics 特征的多模态情绪识别方法,以提高情绪识别的准确性和稳定性。methods: 该方法使用了修改后的 EmbraceNet 来提取图像中的特征,并将身体特征和姿态特征同时学习。另外,该方法还使用了场景特征和 semantics 特征来支持情绪识别。results: 在 EMOTIC 数据集上进行测试,该方法实现了平均准确率为 40.39%,比之前的方法提高了5%。
    Abstract Automatic emotion recognition is a hot topic with a wide range of applications. Much work has been done in the area of automatic emotion recognition in recent years. The focus has been mainly on using the characteristics of a person such as speech, facial expression and pose for this purpose. However, the processing of scene and semantic features for emotion recognition has had limited exploration. In this paper, we propose to use combined scene and semantic features, along with personal features, for multi-modal emotion recognition. Scene features will describe the environment or context in which the target person is operating. The semantic feature can include objects that are present in the environment, as well as their attributes and relationships with the target person. In addition, we use a modified EmbraceNet to extract features from the images, which is trained to learn both the body and pose features simultaneously. By fusing both body and pose features, the EmbraceNet can improve the accuracy and robustness of the model, particularly when dealing with partially missing data. This is because having both body and pose features provides a more complete representation of the subject in the images, which can help the model to make more accurate predictions even when some parts of body are missing. We demonstrate the efficiency of our method on the benchmark EMOTIC dataset. We report an average precision of 40.39\% across the 26 emotion categories, which is a 5\% improvement over previous approaches.
    摘要 自动情感认识是一个热门的话题,具有广泛的应用领域。在过去的几年中,关于自动情感认识的研究得到了广泛的关注,主要是利用人类特征,如语音、脸部表达和姿势来实现。然而,Scene和semantic特征的处理对情感认识的研究尚未得到了充分的探索。在本文中,我们提议使用组合场景和semantic特征, along with个人特征, для多modal情感认识。场景特征将描述目标人在运行的环境或情况,semantic特征包括环境中的物品、Attribute和目标人之间的关系。此外,我们使用修改后的EmbraceNet来提取图像中的特征,该模型同时学习体部和姿势特征。通过融合体部和姿势特征,EmbraceNet可以提高模型的准确性和可靠性,特别是处理部分数据时。这是因为具有体部和姿势特征的描述可以帮助模型更好地预测,即使部分身体部分缺失。我们在EMOTIC数据集上进行了效果示例,并Report了26种情绪类别的平均准确率为40.39%,相比之前的方法提高5%。

Boundary Difference Over Union Loss For Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2308.00220
  • repo_url: https://github.com/sunfan-bvb/boundarydouloss
  • paper_authors: Fan Sun, Zhiming Luo, Shaozi Li
  • for: 针对医疗图像分割中的边界区域分割,提出了一种简单有效的损失函数Boundary DoU Loss。
  • methods: 该损失函数基于区域计算,不需要其他损失函数,训练稳定且易于实现。此外,还使用目标大小进行适应性调整对边界区域应用注意力。
  • results: 在ACDC和Synapse两个 dataset上,使用UNet、TransUNet和Swin-UNet进行实验,表明我们提出的损失函数有效地提高了边界区域分割的准确率。
    Abstract Medical image segmentation is crucial for clinical diagnosis. However, current losses for medical image segmentation mainly focus on overall segmentation results, with fewer losses proposed to guide boundary segmentation. Those that do exist often need to be used in combination with other losses and produce ineffective results. To address this issue, we have developed a simple and effective loss called the Boundary Difference over Union Loss (Boundary DoU Loss) to guide boundary region segmentation. It is obtained by calculating the ratio of the difference set of prediction and ground truth to the union of the difference set and the partial intersection set. Our loss only relies on region calculation, making it easy to implement and training stable without needing any additional losses. Additionally, we use the target size to adaptively adjust attention applied to the boundary regions. Experimental results using UNet, TransUNet, and Swin-UNet on two datasets (ACDC and Synapse) demonstrate the effectiveness of our proposed loss function. Code is available at https://github.com/sunfan-bvb/BoundaryDoULoss.
    摘要 医疗图像分割是诊断的关键。然而,目前的医疗图像分割损失主要关注总分割结果,有 fewer 的损失用于指导边界分割。这些损失经常需要与其他损失结合使用,并且生成不具有效果的结果。为解决这个问题,我们已经开发了一种简单而有效的损失函数,即边界差异上 UNION 损失(Boundary DoU Loss),用于指导边界区域分割。它是通过计算预测和实际值差集的差集与 UNION 集之间的比率来获得的。我们的损失函数只需要进行区域计算,因此容易实现和训练,不需要任何其他损失函数。此外,我们使用目标大小调整边界区域的注意力。实验结果表明,我们使用 UNet、TransUNet 和 Swin-UNet 在 ACDC 和 Synapse 两个 dataset 上,表明我们提议的损失函数是有效的。代码可以在 https://github.com/sunfan-bvb/BoundaryDoULoss 上找到。

Multi-goal Audio-visual Navigation using Sound Direction Map

  • paper_url: http://arxiv.org/abs/2308.00219
  • repo_url: None
  • paper_authors: Haru Kondoh, Asako Kanezaki
  • For: 这 paper 是关于多目标听视导航任务的研究,它是在已有的视觉听音导航任务基础上增加多个目标的情况。* Methods: 这 paper 使用了深度强化学习代理人进行 navigation,并在不同的情况下进行了实验研究,以了解多目标听视导航任务的难度。另外,paper 还提出了一种名为声音方向地图(SDM)的方法,可以在学习型 manner 中动态地Localize 多个声音源。* Results: 实验结果表明,使用 SDM 方法可以显著提高多个基eline 方法的性能,不管目标数量多少。
    Abstract Over the past few years, there has been a great deal of research on navigation tasks in indoor environments using deep reinforcement learning agents. Most of these tasks use only visual information in the form of first-person images to navigate to a single goal. More recently, tasks that simultaneously use visual and auditory information to navigate to the sound source and even navigation tasks with multiple goals instead of one have been proposed. However, there has been no proposal for a generalized navigation task combining these two types of tasks and using both visual and auditory information in a situation where multiple sound sources are goals. In this paper, we propose a new framework for this generalized task: multi-goal audio-visual navigation. We first define the task in detail, and then we investigate the difficulty of the multi-goal audio-visual navigation task relative to the current navigation tasks by conducting experiments in various situations. The research shows that multi-goal audio-visual navigation has the difficulty of the implicit need to separate the sources of sound. Next, to mitigate the difficulties in this new task, we propose a method named sound direction map (SDM), which dynamically localizes multiple sound sources in a learning-based manner while making use of past memories. Experimental results show that the use of SDM significantly improves the performance of multiple baseline methods, regardless of the number of goals.
    摘要 在过去几年,深度强化学习代理人在室内环境中完成导航任务得到了很多研究。大多数这些任务只使用视觉信息,即首人图像,导航到单个目标。然而,最近提出了同时使用视觉和听音信息导航到声源的任务,以及多个目标导航任务。然而,没有任何提案可以将这两种任务结合起来,并使用两种类型的信息在多个声源目标下进行导航。在这篇论文中,我们提出了一个新的框架:多目标音频视觉导航。我们首先定义了这个任务,然后通过在不同情况下进行实验来调查这个任务的困难程度。实验结果表明,多目标音频视觉导航任务存在隐式地分离声音来源的需求,这使得任务变得更加困难。然后,我们提出了一种名为声音方向地图(SDM)的方法,可以在学习基础上动态地Localize多个声音来源,并且利用过去的记忆。实验结果表明,使用SDM可以significantly improve多个基eline方法的表现,不管声音来源的数量如何。

Robust Single-view Cone-beam X-ray Pose Estimation with Neural Tuned Tomography (NeTT) and Masked Neural Radiance Fields (mNeRF)

  • paper_url: http://arxiv.org/abs/2308.00214
  • repo_url: None
  • paper_authors: Chaochao Zhou, Syed Hasib Akhter Faruqui, Abhinav Patel, Ramez N. Abdalla, Michael C. Hurley, Ali Shaibani, Matthew B. Potts, Babak S. Jahromi, Leon Cho, Sameer A. Ansari, Donald R. Cantrell
  • for: 这 paper 是为了解决基于 X-ray 投影的医学小剖手术中的姿态估计问题。
  • methods: 这 paper 使用了新的 pose estimation 方法,包括 DiffDRR、NeTT 和 mNeRF。这些方法都利用了 TensorFlow 中的自动微分,并且使用了高精度的 DRR synthesis 来提高姿态估计的精度。
  • results: 这 paper 的结果表明,NeTT 和 mNeRF 都能够有效地进行姿态估计,并且两者的成功率都高于 93%。然而,NeTT 的计算成本远低于 mNeRF,而且 NeTT 可以在训练和姿态估计阶段都具有更好的性能。此外,paper 还表明了 NeTT 可以在不同的人体标本上进行高精度的 DRR synthesis 和姿态估计。因此, authors 建议使用 NeTT 来实现 robust 的姿态估计。
    Abstract Many tasks performed in image-guided, mini-invasive, medical procedures can be cast as pose estimation problems, where an X-ray projection is utilized to reach a target in 3D space. Expanding on recent advances in the differentiable rendering of optically reflective materials, we introduce new methods for pose estimation of radiolucent objects using X-ray projections, and we demonstrate the critical role of optimal view synthesis in performing this task. We first develop an algorithm (DiffDRR) that efficiently computes Digitally Reconstructed Radiographs (DRRs) and leverages automatic differentiation within TensorFlow. Pose estimation is performed by iterative gradient descent using a loss function that quantifies the similarity of the DRR synthesized from a randomly initialized pose and the true fluoroscopic image at the target pose. We propose two novel methods for high-fidelity view synthesis, Neural Tuned Tomography (NeTT) and masked Neural Radiance Fields (mNeRF). Both methods rely on classic Cone-Beam Computerized Tomography (CBCT); NeTT directly optimizes the CBCT densities, while the non-zero values of mNeRF are constrained by a 3D mask of the anatomic region segmented from CBCT. We demonstrate that both NeTT and mNeRF distinctly improve pose estimation within our framework. By defining a successful pose estimate to be a 3D angle error of less than 3 deg, we find that NeTT and mNeRF can achieve similar results, both with overall success rates more than 93%. However, the computational cost of NeTT is significantly lower than mNeRF in both training and pose estimation. Furthermore, we show that a NeTT trained for a single subject can generalize to synthesize high-fidelity DRRs and ensure robust pose estimations for all other subjects. Therefore, we suggest that NeTT is an attractive option for robust pose estimation using fluoroscopic projections.
    摘要 许多在图像导航、微创手术中进行的任务可以被看作为位置估计问题,其中使用X射线投影来达到3D空间中的目标。在不断提高数据渠道 Rendering 技术的基础之上,我们介绍了一种新的方法,即基于 X射线投影的对象位置估计。我们首先开发了一种名为 DiffDRR 的算法,它可以高效计算 Digitally Reconstructed Radiographs (DRRs),并在 TensorFlow 中使用自动导数。在 pose 估计中,我们使用一个损失函数,该函数衡量 DRR 从 randomly initialized pose 中生成的synthesized 和真实 fluoroscopic 图像在目标姿势下的相似性。我们提出了两种高精度视图合成方法,即 Neural Tuned Tomography (NeTT) 和 masked Neural Radiance Fields (mNeRF)。两种方法都基于 Cone-Beam Computerized Tomography (CBCT),NeTT 直接优化 CBCT 密度,而 mNeRF 的非零值被限制为 segmented 从 CBCT 中的3Dmask。我们发现 NeTT 和 mNeRF 都可以提高 pose 估计的准确性,其中 NeTT 的计算成本较低,并且可以在训练和 pose 估计中进行高效的计算。此外,我们发现 NeTT 可以在不同主体之间进行交互学习,并且可以在单个主体训练后对所有主体进行高精度的 DRR 生成和 pose 估计。因此,我们认为 NeTT 是一种可靠的选择 для基于 fluoroscopic 投影的 pose 估计。

Scene Separation & Data Selection: Temporal Segmentation Algorithm for Real-Time Video Stream Analysis

  • paper_url: http://arxiv.org/abs/2308.00210
  • repo_url: None
  • paper_authors: Yuelin Xin, Zihan Zhou, Yuxuan Xia
  • for: 视频流理解的实时解读
  • methods: 使用图像差分比较 Temporal segmentation算法
  • results: 实验结果达到90%以上的总准确率
    Abstract We present 2SDS (Scene Separation and Data Selection algorithm), a temporal segmentation algorithm used in real-time video stream interpretation. It complements CNN-based models to make use of temporal information in videos. 2SDS can detect the change between scenes in a video stream by com-paring the image difference between two frames. It separates a video into segments (scenes), and by combining itself with a CNN model, 2SDS can select the optimal result for each scene. In this paper, we will be discussing some basic methods and concepts behind 2SDS, as well as presenting some preliminary experiment results regarding 2SDS. During these experiments, 2SDS has achieved an overall accuracy of over 90%.
    摘要 我们现在介绍2SDS(Scene Separation and Data Selection算法),一种用于实时视频流理解的时间段分算法。它与基于CNN(卷积神经网络)模型结合使用,以利用视频中的时间信息。2SDS可以在视频流中检测图像差异,并将视频流分解成场景(scene)。通过与CNN模型结合使用,2SDS可以选择每个场景的优化结果。在这篇论文中,我们将讨论2SDS的一些基本方法和概念,以及2SDS的一些初步实验结果。在这些实验中,2SDS达到了超过90%的总准确率。

CBCL-PR: A Cognitively Inspired Model for Class-Incremental Learning in Robotics

  • paper_url: http://arxiv.org/abs/2308.00199
  • repo_url: https://github.com/aliayub7/cbcl-pr
  • paper_authors: Ali Ayub, Alan R. Wagner
  • for: 该论文解决了基于少量数据的自适应学习和增量学习问题,即AI机器人需要在有限数据情况下不断学习和适应环境中。
  • methods: 该论文提出了一种基于 hippocampus 和 neocortex 理论的新框架,用于解决 Few-Shot class Incremental Learning(FSIL)问题。该框架表示物体类划分成多个集合,并将其存储在内存中。重复播放过去类划分中生成的数据,以避免卷积学习新类时忘记过去类。
  • results: 该论文在两个物体分类 dataset 上进行了评估,并取得了当今最佳性能(SOTA)。此外,在一个 robot 上也进行了评估,并证明了机器人可以在有限人工协助下不断学习分类大量家用物品。
    Abstract For most real-world applications, robots need to adapt and learn continually with limited data in their environments. In this paper, we consider the problem of Few-Shot class Incremental Learning (FSIL), in which an AI agent is required to learn incrementally from a few data samples without forgetting the data it has previously learned. To solve this problem, we present a novel framework inspired by theories of concept learning in the hippocampus and the neocortex. Our framework represents object classes in the form of sets of clusters and stores them in memory. The framework replays data generated by the clusters of the old classes, to avoid forgetting when learning new classes. Our approach is evaluated on two object classification datasets resulting in state-of-the-art (SOTA) performance for class-incremental learning and FSIL. We also evaluate our framework for FSIL on a robot demonstrating that the robot can continually learn to classify a large set of household objects with limited human assistance.
    摘要 For most real-world applications, robots need to adapt and learn continually with limited data in their environments. In this paper, we consider the problem of Few-Shot class Incremental Learning (FSIL), in which an AI agent is required to learn incrementally from a few data samples without forgetting the data it has previously learned. To solve this problem, we present a novel framework inspired by theories of concept learning in the hippocampus and the neocortex. Our framework represents object classes in the form of sets of clusters and stores them in memory. The framework replays data generated by the clusters of the old classes, to avoid forgetting when learning new classes. Our approach is evaluated on two object classification datasets, resulting in state-of-the-art (SOTA) performance for class-incremental learning and FSIL. We also evaluate our framework for FSIL on a robot, demonstrating that the robot can continually learn to classify a large set of household objects with limited human assistance.Here's the translation in Traditional Chinese:For most real-world applications, robots need to adapt and learn continually with limited data in their environments. In this paper, we consider the problem of Few-Shot class Incremental Learning (FSIL), in which an AI agent is required to learn incrementally from a few data samples without forgetting the data it has previously learned. To solve this problem, we present a novel framework inspired by theories of concept learning in the hippocampus and the neocortex. Our framework represents object classes in the form of sets of clusters and stores them in memory. The framework replays data generated by the clusters of the old classes, to avoid forgetting when learning new classes. Our approach is evaluated on two object classification datasets, resulting in state-of-the-art (SOTA) performance for class-incremental learning and FSIL. We also evaluate our framework for FSIL on a robot, demonstrating that the robot can continually learn to classify a large set of household objects with limited human assistance.

C-DARL: Contrastive diffusion adversarial representation learning for label-free blood vessel segmentation

  • paper_url: http://arxiv.org/abs/2308.00193
  • repo_url: None
  • paper_authors: Boah Kim, Yujin Oh, Bradford J. Wood, Ronald M. Summers, Jong Chul Ye
  • For: The paper is written for the purpose of developing a self-supervised vessel segmentation method for medical imaging, which can help improve the accuracy and efficiency of vascular disease diagnosis and interventional planning.* Methods: The paper proposes a novel method called C-DARL, which combines a diffusion module and a generation module to learn the distribution of multi-domain blood vessel data. The model uses contrastive learning through a mask-based contrastive loss to generate more realistic vessel representations.* Results: The experimental results show that C-DARL achieves performance improvement over baseline methods with noise robustness, indicating the effectiveness of the proposed method for vessel segmentation in medical imaging.
    Abstract Blood vessel segmentation in medical imaging is one of the essential steps for vascular disease diagnosis and interventional planning in a broad spectrum of clinical scenarios in image-based medicine and interventional medicine. Unfortunately, manual annotation of the vessel masks is challenging and resource-intensive due to subtle branches and complex structures. To overcome this issue, this paper presents a self-supervised vessel segmentation method, dubbed the contrastive diffusion adversarial representation learning (C-DARL) model. Our model is composed of a diffusion module and a generation module that learns the distribution of multi-domain blood vessel data by generating synthetic vessel images from diffusion latent. Moreover, we employ contrastive learning through a mask-based contrastive loss so that the model can learn more realistic vessel representations. To validate the efficacy, C-DARL is trained using various vessel datasets, including coronary angiograms, abdominal digital subtraction angiograms, and retinal imaging. Experimental results confirm that our model achieves performance improvement over baseline methods with noise robustness, suggesting the effectiveness of C-DARL for vessel segmentation.
    摘要 医学影像中血管分割是诊断血管疾病和 intervención规划的关键步骤,在各种临床场景中都非常重要。然而,手动标注血管Masks是困难和资源占用的,这是因为血管结构细节和分支非常细。为了解决这个问题,本文提出了一种自动学习的血管分割方法,名为对比扩散对抗表示学习(C-DARL)模型。我们的模型包括扩散模块和生成模块,通过生成多个频率血管数据的分布来学习血管图像的分布。此外,我们采用对比学习,通过一个基于Mask的对比损失函数,使模型学习更加真实的血管表示。为验证效果,C-DARL被训练了多种血管数据集,包括心血管扫描、腹部数字扫描和视网膜成像。实验结果表明,我们的模型在噪声环境下表现出性能提高,这表明C-DARL是有效的血管分割方法。

Detecting the Anomalies in LiDAR Pointcloud

  • paper_url: http://arxiv.org/abs/2308.00187
  • repo_url: None
  • paper_authors: Chiyu Zhang, Ji Han, Yao Zou, Kexin Dong, Yujia Li, Junchun Ding, Xiaoling Han
  • for: 本研究旨在检测LiDAR数据中异常点云的存在,以提高自动驾驶系统的安全性。
  • methods: 本文提出了一种基于点云特征分析的异常点云检测方法,包括点云质量指标的开发,以评估点云中噪声水平。这种方法不需要标注或训练,因此可以快速执行和扩展。
  • results: 经过实验表明,该方法能够有效地检测LiDAR数据中异常点云,并可以适应不同的扫描机制和激光谱。
    Abstract LiDAR sensors play an important role in the perception stack of modern autonomous driving systems. Adverse weather conditions such as rain, fog and dust, as well as some (occasional) LiDAR hardware fault may cause the LiDAR to produce pointcloud with abnormal patterns such as scattered noise points and uncommon intensity values. In this paper, we propose a novel approach to detect whether a LiDAR is generating anomalous pointcloud by analyzing the pointcloud characteristics. Specifically, we develop a pointcloud quality metric based on the LiDAR points' spatial and intensity distribution to characterize the noise level of the pointcloud, which relies on pure mathematical analysis and does not require any labeling or training as learning-based methods do. Therefore, the method is scalable and can be quickly deployed either online to improve the autonomy safety by monitoring anomalies in the LiDAR data or offline to perform in-depth study of the LiDAR behavior over large amount of data. The proposed approach is studied with extensive real public road data collected by LiDARs with different scanning mechanisms and laser spectrums, and is proven to be able to effectively handle various known and unknown sources of pointcloud anomaly.
    摘要 利达 laser 传感器在现代自动驾驶系统中扮演着重要的角色。不良天气条件如雨、雾和尘埃,以及一些(偶尔)的 LiDAR 硬件问题,可能会导致 LiDAR 生成异常的点云,如散发的噪声点和不寻常的INTENSITY值。在这篇论文中,我们提出了一种新的方法来检测 LiDAR 生成的点云是否异常。specifically,我们开发了一个基于 LiDAR 点云的空间和INTENSITY 分布的点云质量度量,以衡量点云的噪声水平。这种方法不需要标注或训练,因此可以快速执行并可以在线监测 LiDAR 数据中的异常,以提高自动驾驶的安全性。我们对实际的公共路数据进行了广泛的研究,并证明了该方法可以有效地处理不同的 LiDAR 扫描机制和激光谱。

Towards Imbalanced Large Scale Multi-label Classification with Partially Annotated Labels

  • paper_url: http://arxiv.org/abs/2308.00166
  • repo_url: None
  • paper_authors: XIn Zhang, Yuqi Song, Fei Zuo, Xiaofeng Wang
  • For: 多个标签分类问题在日常生活中广泛存在,其中一个实例可以与多个类相关。这是一种指导学习方法,需要大量的标注数据。但是,标注数据可能是时间consuming且可能无法实现巨大的标注空间。另外,标签偏好可能限制多个标签分类器的性能,特别是缺失某些标签。因此,研究如何使用偏好的标签来训练神经网络是有意义的。* Methods: 我们引入 Pseudo-labeling 技术,允许常见的神经网络在部分标注设置下运行,无需额外复杂的结构。然后,我们提出了一种新的损失函数,利用现有数据集的统计信息来有效地缓解标签偏好问题。另外,我们设计了一种动态训练方案,以减少标注空间的维度,进一步缓解偏好。* Results: 我们在 COCO、NUS-WIDE、CUB 和 Open Images 等多个公共可用的多个标签数据集上进行了广泛的实验。结果表明,我们的方法比一些现状顶尖方法高效,而且在一些部分标注设置下,我们的方法甚至超过了使用全标注的方法。
    Abstract Multi-label classification is a widely encountered problem in daily life, where an instance can be associated with multiple classes. In theory, this is a supervised learning method that requires a large amount of labeling. However, annotating data is time-consuming and may be infeasible for huge labeling spaces. In addition, label imbalance can limit the performance of multi-label classifiers, especially when some labels are missing. Therefore, it is meaningful to study how to train neural networks using partial labels. In this work, we address the issue of label imbalance and investigate how to train classifiers using partial labels in large labeling spaces. First, we introduce the pseudo-labeling technique, which allows commonly adopted networks to be applied in partially labeled settings without the need for additional complex structures. Then, we propose a novel loss function that leverages statistical information from existing datasets to effectively alleviate the label imbalance problem. In addition, we design a dynamic training scheme to reduce the dimension of the labeling space and further mitigate the imbalance. Finally, we conduct extensive experiments on some publicly available multi-label datasets such as COCO, NUS-WIDE, CUB, and Open Images to demonstrate the effectiveness of the proposed approach. The results show that our approach outperforms several state-of-the-art methods, and surprisingly, in some partial labeling settings, our approach even exceeds the methods trained with full labels.
    摘要 多个标签分类是日常生活中广泛存在的问题,其中一个实例可以与多个类相关。理论上来说,这是一种超级vised学习方法,需要大量标注数据。然而,标注数据可能是时间消耗和不可能实现的庞大标注空间。此外,标签偏好可能限制多个标签分类器的性能,特别是缺失某些标签时。因此,研究如何使用偏好标签进行神经网络训练是有意义的。在这种工作中,我们解决标签偏好问题,并 investigate如何使用偏好标签进行神经网络训练在大型标注空间中。首先,我们介绍了假标签技术,允许常用的网络在部分标注 setting中使用,无需额外复杂结构。然后,我们提出了一种新的损失函数,可以从现有数据集中获取有用的统计信息,有效地解决标签偏好问题。另外,我们设计了一种动态训练方案,以降低标注空间的维度,进一步减轻标签偏好。最后,我们在一些公共可用的多个标签数据集上进行了广泛的实验,如COCO、NUS-WIDE、CUB和Open Images等。结果显示,我们的方法可以与一些状态机制的方法相比,甚至在某些部分标注设置下,我们的方法可以超过全标注的方法。

Multispectral Image Segmentation in Agriculture: A Comprehensive Study on Fusion Approaches

  • paper_url: http://arxiv.org/abs/2308.00159
  • repo_url: https://github.com/cybonic/misagriculture
  • paper_authors: Nuno Cunha, Tiago Barros, Mário Reis, Tiago Marta, Cristiano Premebida, Urbano J. Nunes
  • for: 本研究旨在探讨多spectral imaging在农业应用中的可支持,包括图像分割、农业监测、场 robotics 和含量估计等。
  • methods: 本研究使用融合方法提高图像分割过程中的精度,并比较了不同的融合方法,包括RGB和NDVI作为输入,用于检测农作物行进检测。
  • results: 实验表明,传统方法在certain specialized agricultural applications中仍然保持效果,而融合策略中的late fusion方法在不同的分割场景中表现出了最高的鲁棒性和效果。
    Abstract Multispectral imagery is frequently incorporated into agricultural tasks, providing valuable support for applications such as image segmentation, crop monitoring, field robotics, and yield estimation. From an image segmentation perspective, multispectral cameras can provide rich spectral information, helping with noise reduction and feature extraction. As such, this paper concentrates on the use of fusion approaches to enhance the segmentation process in agricultural applications. More specifically, in this work, we compare different fusion approaches by combining RGB and NDVI as inputs for crop row detection, which can be useful in autonomous robots operating in the field. The inputs are used individually as well as combined at different times of the process (early and late fusion) to perform classical and DL-based semantic segmentation. In this study, two agriculture-related datasets are subjected to analysis using both deep learning (DL)-based and classical segmentation methodologies. The experiments reveal that classical segmentation methods, utilizing techniques such as edge detection and thresholding, can effectively compete with DL-based algorithms, particularly in tasks requiring precise foreground-background separation. This suggests that traditional methods retain their efficacy in certain specialized applications within the agricultural domain. Moreover, among the fusion strategies examined, late fusion emerges as the most robust approach, demonstrating superiority in adaptability and effectiveness across varying segmentation scenarios. The dataset and code is available at https://github.com/Cybonic/MISAgriculture.git.
    摘要 多spectral影像在农业任务中广泛应用,提供了价值的支持,包括图像分割、农业监测、场地 робо扮演和产量估计。从图像分割角度来看,多spectral相机可以提供丰富的spectral信息,帮助降低噪声和提取特征。因此,本文集中关注在农业应用中使用融合方法提高分割过程的问题。更 Specifically,在这种工作中,我们比较了不同的融合方法,将RGB和NDVI作为输入进行耕地检测,这可以在自动化机器在场地中运行时提供有用的支持。这些输入分别使用以及在不同时间点(早期和晚期)进行融合,以执行经典和深度学习(DL)基于的semantic分割。在这项研究中,我们使用了两个农业相关的数据集进行分析,并使用经典和DL基于的方法进行分割方法。实验表明,经典分割方法,使用edge检测和阈值分割等技术,可以有效竞争DL基于的算法,特别是需要精确的前景-背景分离任务。这表明,传统方法在农业领域中特定应用场景中仍保留其效果。此外,我们对融合策略进行了分析,发现融合截止时间点为晚期融合是最有效的,在不同的分割enario中表现出了最高的适应性和效果。数据集和代码可以在https://github.com/Cybonic/MISAgriculture.git中下载。

Hierarchical Semi-Supervised Learning Framework for Surgical Gesture Segmentation and Recognition Based on Multi-Modality Data

  • paper_url: http://arxiv.org/abs/2308.02529
  • repo_url: None
  • paper_authors: Zhili Yuan, Jialin Lin, Dandan Zhang
  • for: 这份研究的目的是为了分析遗传外科手术的工作流程,特别是用于自动化遗传外科手术、评估外科技术等。
  • methods: 这篇研究使用了一个层次 semi-supervised learning 框架,使用多种数据(运动和视觉数据)进行手术动作排序。
  • results: 研究结果显示,使用这个方法可以在 JIGSAWS 数据库中得到平均 F1 分数为 0.623 的排序结果,以及识别率为 0.856。
    Abstract Segmenting and recognizing surgical operation trajectories into distinct, meaningful gestures is a critical preliminary step in surgical workflow analysis for robot-assisted surgery. This step is necessary for facilitating learning from demonstrations for autonomous robotic surgery, evaluating surgical skills, and so on. In this work, we develop a hierarchical semi-supervised learning framework for surgical gesture segmentation using multi-modality data (i.e. kinematics and vision data). More specifically, surgical tasks are initially segmented based on distance characteristics-based profiles and variance characteristics-based profiles constructed using kinematics data. Subsequently, a Transformer-based network with a pre-trained `ResNet-18' backbone is used to extract visual features from the surgical operation videos. By combining the potential segmentation points obtained from both modalities, we can determine the final segmentation points. Furthermore, gesture recognition can be implemented based on supervised learning. The proposed approach has been evaluated using data from the publicly available JIGSAWS database, including Suturing, Needle Passing, and Knot Tying tasks. The results reveal an average F1 score of 0.623 for segmentation and an accuracy of 0.856 for recognition.
    摘要 划分和识别手术操作轨迹为独特和有意义的姿势是机器人助手手术中的关键前期步骤。这个步骤是为了促进从示例学习到自主机器人手术、评估手术技巧等。在这种工作中,我们开发了一种层次 semi-supervised 学习框架 для手术姿势划分,使用多modal 数据(即遥感和视觉数据)。更具体来说,手术任务首先根据距离特征profile和差异特征profile在遥感数据上建立分 segmentation points。然后,使用预训练的 `ResNet-18` 框架,我们使用视觉特征从手术操作视频中提取特征。通过将两种模式的可能性划分点结合,我们可以确定最终划分点。此外,可以基于有监督学习来实现姿势识别。我们的方法在公共可用的 JIGSAWS 数据库中进行了评估,包括缝钉、针刺和缝结任务。结果表明,划分得到的 F1 分数为 0.623,并且识别率为 0.856。

Federated Learning for Data and Model Heterogeneity in Medical Imaging

  • paper_url: http://arxiv.org/abs/2308.00155
  • repo_url: None
  • paper_authors: Hussain Ahmad Madni, Rao Muhammad Umer, Gian Luca Foresti
  • for: 该论文旨在解决 Federated Learning (FL) 中的数据和模型不一致问题,以提高 FL 的效率。
  • methods: 该论文提出了一种方法 named MDH-FL (Exploiting Model and Data Heterogeneity in FL),通过知识传承和对称损失来解决数据和模型不一致问题,以提高模型性能。
  • results: 实验结果表明,该方法在医疗数据集上比其他方法更有优势,可以更好地处理医疗机构中的数据和模型不一致问题。
    Abstract Federated Learning (FL) is an evolving machine learning method in which multiple clients participate in collaborative learning without sharing their data with each other and the central server. In real-world applications such as hospitals and industries, FL counters the challenges of data heterogeneity and model heterogeneity as an inevitable part of the collaborative training. More specifically, different organizations, such as hospitals, have their own private data and customized models for local training. To the best of our knowledge, the existing methods do not effectively address both problems of model heterogeneity and data heterogeneity in FL. In this paper, we exploit the data and model heterogeneity simultaneously, and propose a method, MDH-FL (Exploiting Model and Data Heterogeneity in FL) to solve such problems to enhance the efficiency of the global model in FL. We use knowledge distillation and a symmetric loss to minimize the heterogeneity and its impact on the model performance. Knowledge distillation is used to solve the problem of model heterogeneity, and symmetric loss tackles with the data and label heterogeneity. We evaluate our method on the medical datasets to conform the real-world scenario of hospitals, and compare with the existing methods. The experimental results demonstrate the superiority of the proposed approach over the other existing methods.
    摘要 federated 学习(FL)是一种在多个客户端合作学习而不是分享数据之间的演化机器学习方法。在现实世界中,如医院和产业中,FL 可以避免数据和模型不同性的挑战,作为合作训练的不可避免的一部分。更specifically,不同的组织,如医院,拥有自己的私有数据和自定义的本地模型进行本地训练。据我们所知,现有的方法不能有效地解决FL中的数据和模型不同性问题。在这篇论文中,我们利用数据和模型不同性同时,并提出了一种方法,MDH-FL(利用数据和模型不同性的FL),以解决这些问题,并提高FL的全球模型效率。我们使用知识塑造和对称损失来减少不同性的影响。知识塑造用于解决模型不同性问题,而对称损失用于解决数据和标签不同性问题。我们在医疗数据集上进行了实验,以验证这种方法在现实世界中的可行性,并与现有方法进行比较。实验结果表明,我们的方法在FL中表现出了superiority。

Controlling Geometric Abstraction and Texture for Artistic Images

  • paper_url: http://arxiv.org/abs/2308.00148
  • repo_url: https://github.com/MartinBuessemeyer/Artistic-Texture-Control
  • paper_authors: Martin Büßemeyer, Max Reimann, Benito Buchheim, Amir Semmo, Jürgen Döllner, Matthias Trapp
  • for: 这个论文是为了提出一种新的方法,用于对艺术图像中的几何抽象和文本URE的交互控制。
  • methods: 该方法使用了一种干预式的方法,将输入图像分解成形状和高频环境的参数表示,从而实现独立控制颜色和文本URE。这个表示中的每个参数控制了一系列可 diferenciable 的样式化过滤器的笔画属性。
  • results: 该方法可以实现多种艺术风格的编辑,包括全局和地方的形状和笔画属性的交互修改。此外,它还可以通过参考图像和文本提示来进行优化的文本风格传输,以及在参数空间中单独预测单个和任意样式参数的网络训练。
    Abstract We present a novel method for the interactive control of geometric abstraction and texture in artistic images. Previous example-based stylization methods often entangle shape, texture, and color, while generative methods for image synthesis generally either make assumptions about the input image, such as only allowing faces or do not offer precise editing controls. By contrast, our holistic approach spatially decomposes the input into shapes and a parametric representation of high-frequency details comprising the image's texture, thus enabling independent control of color and texture. Each parameter in this representation controls painterly attributes of a pipeline of differentiable stylization filters. The proposed decoupling of shape and texture enables various options for stylistic editing, including interactive global and local adjustments of shape, stroke, and painterly attributes such as surface relief and contours. Additionally, we demonstrate optimization-based texture style-transfer in the parametric space using reference images and text prompts, as well as the training of single- and arbitrary style parameter prediction networks for real-time texture decomposition.
    摘要 我们提出了一种新的方法用于艺术图像的交互式控制 geometric abstraction 和 texture。先前的例子基于的涂抹方法通常会杂mix shape, texture 和颜色,而生成图像 Synthesis 方法通常会对输入图像做假设,例如只允许人脸或不提供精确的编辑控制。相比之下,我们的总体方法将输入图像空间分解为形状和高频环境的 parametric 表示,从而启用独立控制颜色和 texture。每个参数在这个表示中控制着涂抹过程中的笔触属性和涂抹材质。这种划分shape和 texture 允许在不同的风格编辑中进行交互式的全局和本地调整形状、roke 和笔触属性,例如表面 relief 和边沿。此外,我们还示出了参考图像和文本提示的优化基于 texture style-transfer 在参数空间中,以及单个和任意风格参数预测网络的训练 для实时 texture decomposition。

Ensemble Learning with Residual Transformer for Brain Tumor Segmentation

  • paper_url: http://arxiv.org/abs/2308.00128
  • repo_url: None
  • paper_authors: Lanhong Yao, Zheyuan Zhang, Ulas Bagci
  • for: 这个论文旨在提高脑癌分类的精度,因为现有的 U-Net 架构具有复杂的形状和 texture 的问题,以及 Labeling 的问题。
  • methods: 这个论文提出了一个新的网络架构,将 Transformers integrate 到了自适应的 U-Net 中,以获取3D 维度的 Volume 上下文,并且添加了一个 residual 连接,以避免资讯流动的干扰。
  • results: 在 BraTS 2021 数据集上(3D),我们的模型获得了87.6% 的 mean Dice 分数,比前一代方法高, demonstrating the potential of combining multiple architectures to optimize brain tumor segmentation.
    Abstract Brain tumor segmentation is an active research area due to the difficulty in delineating highly complex shaped and textured tumors as well as the failure of the commonly used U-Net architectures. The combination of different neural architectures is among the mainstream research recently, particularly the combination of U-Net with Transformers because of their innate attention mechanism and pixel-wise labeling. Different from previous efforts, this paper proposes a novel network architecture that integrates Transformers into a self-adaptive U-Net to draw out 3D volumetric contexts with reasonable computational costs. We further add a residual connection to prevent degradation in information flow and explore ensemble methods, as the evaluated models have edges on different cases and sub-regions. On the BraTS 2021 dataset (3D), our model achieves 87.6% mean Dice score and outperforms the state-of-the-art methods, demonstrating the potential for combining multiple architectures to optimize brain tumor segmentation.
    摘要 神经瘤分割是一个活跃的研究领域,因为高度复杂的形态和文本化 tumor 难以分割,以及常用的 U-Net 架构的失败。近年来,不同 neural 架构的组合在主流研究中得到了更多的关注,特别是将 U-Net 与 Transformers 组合使用,因为它们的自然注意机制和像素级标注。与前一些尝试不同,本文提出了一种新的网络架构,将 Transformers integrate 到自适应 U-Net 中,以获取3D Volume 上下文,并且在计算成本下进行合理的融合。此外,我们还添加了 residual 连接,以避免信息流失并探索 ensemble 方法,因为评估模型在不同的情况下和子区域上具有优势。在 BraTS 2021 数据集(3D)上,我们的模型实现了87.6%的 mean Dice 分数,超过了当前的状态ola 方法,这表明可以通过组合不同的架构来优化神经瘤分割。

DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.00122
  • repo_url: None
  • paper_authors: Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu
  • for: solves the audio-visual sound source separation task through a generative manner
  • methods: leverages a generative diffusion model and a Separation U-Net to synthesize separated magnitudes starting from Gaussian noises, conditioned on both the audio mixture and the visual footage
  • results: outperforms other methods in separation quality, demonstrating the advantages of our framework for tackling the audio-visual source separation task.
    Abstract We propose DAVIS, a Diffusion model-based Audio-VIusal Separation framework that solves the audio-visual sound source separation task through a generative manner. While existing discriminative methods that perform mask regression have made remarkable progress in this field, they face limitations in capturing the complex data distribution required for high-quality separation of sounds from diverse categories. In contrast, DAVIS leverages a generative diffusion model and a Separation U-Net to synthesize separated magnitudes starting from Gaussian noises, conditioned on both the audio mixture and the visual footage. With its generative objective, DAVIS is better suited to achieving the goal of high-quality sound separation across diverse categories. We compare DAVIS to existing state-of-the-art discriminative audio-visual separation methods on the domain-specific MUSIC dataset and the open-domain AVE dataset, and results show that DAVIS outperforms other methods in separation quality, demonstrating the advantages of our framework for tackling the audio-visual source separation task.
    摘要 我们提出了DAVIS,一种基于扩散模型的音视频分离框架,用于通过生成方式解决音视频声源分离问题。而现有的探测方法,尽管在这个领域做出了很大的进步,但是它们在处理多种类别的声音分离时存在限制,因为它们只能通过压缩探测来捕捉复杂的数据分布。相比之下,DAVIS利用生成扩散模型和分离U-Net来生成来自高斯噪声的分离后的声音大小, conditioned on both the audio mixture和视频采集。它的生成目标使得DAVIS更适合在多种类别中实现高质量的声音分离。我们将DAVIS与现有的领先的探测音视频分离方法进行比较,并在领域专门的MUSIC dataset和开放的AVE dataset上进行测试,结果显示DAVIS在分离质量方面超过了其他方法,证明了我们的框架在解决音视频源分离问题中的优势。

Convolutional Occupancy Models for Dense Packing of Complex, Novel Objects

  • paper_url: http://arxiv.org/abs/2308.00091
  • repo_url: https://github.com/nikhilmishra000/fcon
  • paper_authors: Nikhil Mishra, Pieter Abbeel, Xi Chen, Maximilian Sieb
  • for: 这个论文主要针对了仓储和物流应用中的紧凑堆叠问题,既然现实中的堆叠性能受到3D物体几何学的观察困难所限制。
  • methods: 该论文提出了一种全数据预训练的深度学习模型F-CON,可以与现有的规划方法结合使用,以实现现实世界中的紧凑堆叠。同时,该论文还发布了一个可用于培训形态完成模型的实验数据集COB-3D-v2。
  • results: 该论文的实验结果表明,F-CON模型比其他状态前的形态完成方法更高效,可以在实际世界中实现紧凑堆叠复杂的未经见过的物体。此外,该论文还 equip了一个真实世界的捕捉和置物系统,以确认F-CON模型在实际世界中的表现。
    Abstract Dense packing in pick-and-place systems is an important feature in many warehouse and logistics applications. Prior work in this space has largely focused on planning algorithms in simulation, but real-world packing performance is often bottlenecked by the difficulty of perceiving 3D object geometry in highly occluded, partially observed scenes. In this work, we present a fully-convolutional shape completion model, F-CON, which can be easily combined with off-the-shelf planning methods for dense packing in the real world. We also release a simulated dataset, COB-3D-v2, that can be used to train shape completion models for real-word robotics applications, and use it to demonstrate that F-CON outperforms other state-of-the-art shape completion methods. Finally, we equip a real-world pick-and-place system with F-CON, and demonstrate dense packing of complex, unseen objects in cluttered scenes. Across multiple planning methods, F-CON enables substantially better dense packing than other shape completion methods.
    摘要 dense packing在选择和放置系统中是一项重要的特性,在多家仓储和物流应用中广泛使用。先前的工作主要集中在仿真中计划算法上,但实际的填充性常被受到3D物体几何体的识别困难所限制。在这种情况下,我们提出了一种全 convolutional 形态完成模型,F-CON,可以轻松地与现有的规划方法结合使用,以实现实际世界中的高密度填充。我们还发布了一个可用于实际 роботику应用的模拟数据集,COB-3D-v2,并使用其来证明F-CON在实际世界中的表现比其他状态最佳的形态完成方法更好。最后,我们将实际世界中的选择和放置系统 équip with F-CON,并在拥挤的场景中实现了复杂、未见的物体的高密度填充。不同于其他形态完成方法,F-CON在多种规划方法下实现了显著更好的高密度填充。

Visual Geo-localization with Self-supervised Representation Learning

  • paper_url: http://arxiv.org/abs/2308.00090
  • repo_url: None
  • paper_authors: Jiuhong Xiao, Gao Zhu, Giuseppe Loianno
  • for: 提高Visual Geo-localization(VG)的性能和训练效率,使用Self-Supervised Learning(SSL)方法。
  • methods: integrate多种SSL方法(SimCLR、MoCov2、BYOL、SimSiam、Barlow Twins、VICReg),系统地分析不同训练策略和参数设置的影响。
  • results: 无需困扰的硬解释挖掘(HNM),方法可以与基准方法相比或甚至超越VG性能。
    Abstract Visual Geo-localization (VG) has emerged as a significant research area, aiming to identify geolocation based on visual features. Most VG approaches use learnable feature extractors for representation learning. Recently, Self-Supervised Learning (SSL) methods have also demonstrated comparable performance to supervised methods by using numerous unlabeled images for representation learning. In this work, we present a novel unified VG-SSL framework with the goal to enhance performance and training efficiency on a large VG dataset by SSL methods. Our work incorporates multiple SSL methods tailored for VG: SimCLR, MoCov2, BYOL, SimSiam, Barlow Twins, and VICReg. We systematically analyze the performance of different training strategies and study the optimal parameter settings for the adaptation of SSL methods for the VG task. The results demonstrate that our method, without the significant computation and memory usage associated with Hard Negative Mining (HNM), can match or even surpass the VG performance of the baseline that employs HNM. The code is available at https://github.com/arplaboratory/VG_SSL.
    摘要 Visual Geo-localization (VG) 已经成为一个突出的研究领域,旨在通过视觉特征进行地理位置标识。大多数 VG 方法使用学习式特征提取器进行表征学习。现在,自动标注学习(SSL)方法也在 VG 中展示了相当于指导学习方法的性能,通过大量无标注图像进行表征学习。在这种工作中,我们提出了一个新的 VG-SSL 框架,目标是在大型 VG 数据集上提高性能和训练效率。我们在 VG 中采用多种 SSL 方法,包括 SimCLR、MoCov2、BYOL、SimSiam、Barlow Twins 和 VICReg。我们系统地分析不同训练策略的性能,并研究 SSL 方法在 VG 任务上进行适应时的优化参数设置。结果表明,我们的方法,不需要与硬negative mining(HNM)相关的重要计算和内存使用,可以与基准方法相比或超越 VG 性能。代码可以在 上下载。

T-Fusion Net: A Novel Deep Neural Network Augmented with Multiple Localizations based Spatial Attention Mechanisms for Covid-19 Detection

  • paper_url: http://arxiv.org/abs/2308.00053
  • repo_url: None
  • paper_authors: Susmita Ghosh, Abhiroop Chatterjee
  • for: 提高图像分类任务的性能
  • methods: 提出了一种新的深度神经网络(名为 T-Fusion Net),该网络在多个本地化的基础上实现了多个尺度的自适应注意力。
  • results: 实验结果表明,提出的 T-Fusion Net 和其 ensemble 模型在 Covid-19 (SARS-CoV-2 CT 扫描)数据集上表现更好,与其他状态对照方法相比,达到了97.59% 和 98.4% 的准确率。
    Abstract In recent years, deep neural networks are yielding better performance in image classification tasks. However, the increasing complexity of datasets and the demand for improved performance necessitate the exploration of innovative techniques. The present work proposes a new deep neural network (called as, T-Fusion Net) that augments multiple localizations based spatial attention. This attention mechanism allows the network to focus on relevant image regions, improving its discriminative power. A homogeneous ensemble of the said network is further used to enhance image classification accuracy. For ensembling, the proposed approach considers multiple instances of individual T-Fusion Net. The model incorporates fuzzy max fusion to merge the outputs of individual nets. The fusion process is optimized through a carefully chosen parameter to strike a balance on the contributions of the individual models. Experimental evaluations on benchmark Covid-19 (SARS-CoV-2 CT scan) dataset demonstrate the effectiveness of the proposed T-Fusion Net as well as its ensemble. The proposed T-Fusion Net and the homogeneous ensemble model exhibit better performance, as compared to other state-of-the-art methods, achieving accuracy of 97.59% and 98.4%, respectively.
    摘要 近年来,深度神经网络在图像分类任务中表现越来越好。然而,数据集的复杂度和表现需求的提高导致了探索新技术的需要。本工作提出了一种新的深度神经网络(称为T-Fusion Net),该网络通过多个本地化基于空间注意力机制来增强其分类力。这种注意力机制使得网络能够关注相关的图像区域,从而提高其分类精度。此外,本工作还提出了一种同一个T-Fusion Net的多个实例的Homogeneous Ensemble模型,通过粗略的max fusione ensemble来提高图像分类精度。在折衔参数的优化下,这种ensemble模型可以充分利用各个模型的贡献,达到最佳的分类精度。在COVID-19(SARS-CoV-2 CT扫描)数据集上进行的实验评估表明,提出的T-Fusion Net和Homogeneous Ensemble模型具有更高的表现度,与其他状态对照方法相比,分类精度达97.59%和98.4%。

Cross-Dataset Adaptation for Instrument Classification in Cataract Surgery Videos

  • paper_url: http://arxiv.org/abs/2308.04035
  • repo_url: https://github.com/jayparanjape/barlow-adaptor
  • paper_authors: Jay N. Paranjape, Shameema Sikder, Vishal M. Patel, S. Swaroop Vedula
  • for: 本研究旨在解决骨刃手术数据中存在的频繁域变化问题,提高不同数据集之间的性能。
  • methods: 本文提出了一种基于无监督领域适应(Unsupervised Domain Adaptation, UDA)的新方法,称为Barlow Adaptor,可以 Addressing the problem of distribution shift without requiring any labels from another domain. furthermore, the authors introduce a novel loss function called Barlow Feature Alignment Loss (BFAL) to align features across different domains.
  • results: 经验表明,提出的方法在两个骨刃手术数据集上进行了广泛的实验,与现有的UDA方法相比,提高了6%的性能。
    Abstract Surgical tool presence detection is an important part of the intra-operative and post-operative analysis of a surgery. State-of-the-art models, which perform this task well on a particular dataset, however, perform poorly when tested on another dataset. This occurs due to a significant domain shift between the datasets resulting from the use of different tools, sensors, data resolution etc. In this paper, we highlight this domain shift in the commonly performed cataract surgery and propose a novel end-to-end Unsupervised Domain Adaptation (UDA) method called the Barlow Adaptor that addresses the problem of distribution shift without requiring any labels from another domain. In addition, we introduce a novel loss called the Barlow Feature Alignment Loss (BFAL) which aligns features across different domains while reducing redundancy and the need for higher batch sizes, thus improving cross-dataset performance. The use of BFAL is a novel approach to address the challenge of domain shift in cataract surgery data. Extensive experiments are conducted on two cataract surgery datasets and it is shown that the proposed method outperforms the state-of-the-art UDA methods by 6%. The code can be found at https://github.com/JayParanjape/Barlow-Adaptor
    摘要 《针对手术工具存在检测是一项重要的手术和后期分析中的一部分。现有的模型在特定的数据集上表现良好,但在另一个数据集上表现差。这是因为不同数据集之间存在很大的领域变换,这些变换包括不同的工具、传感器、数据分辨率等。本文提出了这种领域变换的问题,并提出了一种名为Barlow Adaptor的新的无监督领域适应(UDA)方法,该方法可以在不同领域之间进行分布变换,而无需另一个领域的标签。此外,我们还引入了一种名为Barlow Feature Alignment Loss(BFAL)的新的损失函数,该损失函数可以在不同领域之间对特征进行对齐,同时减少缓存大小和标签数量,从而提高跨数据集性能。这种BFAL的使用是一种新的途径来解决手术领域中的领域变换问题。我们在两个手术数据集上进行了广泛的实验,并证明了我们的方法可以比现有的UDA方法提高6%。代码可以在https://github.com/JayParanjape/Barlow-Adaptor上找到。》Note: The translation is in Simplified Chinese, which is the standard writing system used in mainland China. If you need the translation in Traditional Chinese, please let me know.

Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training

  • paper_url: http://arxiv.org/abs/2307.16896
  • repo_url: None
  • paper_authors: Jeya Maria Jose Valanarasu, Yucheng Tang, Dong Yang, Ziyue Xu, Can Zhao, Wenqi Li, Vishal M. Patel, Bennett Landman, Daguang Xu, Yufan He, Vishwesh Nath
  • for: 这篇研究旨在设计一个有效的预训架构,以帮助学习3D医学影像的特有特征。
  • methods: 本研究提出了一个新的masking策略,即在通道嵌入中进行masking,以提高本地特征表现学习。此外,我们还提出了一种名为Disruptive Autoencoders的预训架构,可以从混合本地masking和低级扰动中寻找原始影像。此外,我们还提出了一种跨模式对称差异损失(CMCL),以便在单一架构中预训多 modalities。
  • results: 我们在多个下游任务上试用了提出的预训架构,并取得了州先的性能。特别是,我们的提出的方法在BTCV多器官分类挑战中公开试题中排名第一。
    Abstract Harnessing the power of pre-training on large-scale datasets like ImageNet forms a fundamental building block for the progress of representation learning-driven solutions in computer vision. Medical images are inherently different from natural images as they are acquired in the form of many modalities (CT, MR, PET, Ultrasound etc.) and contain granulated information like tissue, lesion, organs etc. These characteristics of medical images require special attention towards learning features representative of local context. In this work, we focus on designing an effective pre-training framework for 3D radiology images. First, we propose a new masking strategy called local masking where the masking is performed across channel embeddings instead of tokens to improve the learning of local feature representations. We combine this with classical low-level perturbations like adding noise and downsampling to further enable low-level representation learning. To this end, we introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations. Additionally, we also devise a cross-modal contrastive loss (CMCL) to accommodate the pre-training of multiple modalities in a single framework. We curate a large-scale dataset to enable pre-training of 3D medical radiology images (MRI and CT). The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance. Notably, our proposed method tops the public test leaderboard of BTCV multi-organ segmentation challenge.
    摘要 使用大规模数据集如ImageNet来预训练是计算机视觉领域的基础建筑块之一。医疗图像与自然图像不同,因为它们可以通过多种模式(如CT、MR、PET、ultrasound等)获得,并且含有细节信息如组织、肿瘤、器官等。这些医疗图像特点需要对学习本地上下文特征进行特别注意。在这种工作中,我们关注于设计有效的预训练框架 для3D医疗图像。我们提议一种新的 маSKing策略,称为本地 маSKing,在通道嵌入中进行 маSKing,以提高本地特征表示学习。我们还结合了经典的低级扰动,如噪声和下采样,以进一步促进低级表示学习。为此,我们引入了Disruptive Autoencoders,一种预训练框架,它尝试通过对原始图像的创造出的干扰来重建原始图像。此外,我们还开发了一种跨Modal Contrastive Loss(CMCL),以便在单个框架中预训练多个模式。我们筹建了一个大规模数据集,以便预训练3D医疗图像(MRI和CT)。我们的提议预训练框架在多个下游任务上进行测试,并达到了状态机的性能。尤其是,我们的提议方法在BTCV多器官分割挑战中公共测试领先榜上名列前茅。

Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy

  • paper_url: http://arxiv.org/abs/2307.16867
  • repo_url: https://github.com/jieshibo/petl-vit
  • paper_authors: Shibo Jie, Haoqing Wang, Zhi-Hong Deng
  • for: 这篇论文的目的是提出一种实现优化小型适材料的方法,以减少储存和传输过程中的过大负载。
  • methods: 这篇论文使用了Adapter-based Parameter-Efficient Tuning(PET)方法,将轻量级的扩展器插入到预训练完成的大型vision模型中,以实现任务特定的微调。
  • results: 经过广泛的实验, authors发现了1比特适材料可以实现最小的性能损失,并且在VTAB-1K标准库和几个shot FGVC任务上表现更好。
    Abstract Current state-of-the-art results in computer vision depend in part on fine-tuning large pre-trained vision models. However, with the exponential growth of model sizes, the conventional full fine-tuning, which needs to store a individual network copy for each tasks, leads to increasingly huge storage and transmission overhead. Adapter-based Parameter-Efficient Tuning (PET) methods address this challenge by tuning lightweight adapters inserted into the frozen pre-trained models. In this paper, we investigate how to make adapters even more efficient, reaching a new minimum size required to store a task-specific fine-tuned network. Inspired by the observation that the parameters of adapters converge at flat local minima, we find that adapters are resistant to noise in parameter space, which means they are also resistant to low numerical precision. To train low-precision adapters, we propose a computational-efficient quantization method which minimizes the quantization error. Through extensive experiments, we find that low-precision adapters exhibit minimal performance degradation, and even 1-bit precision is sufficient for adapters. The experimental results demonstrate that 1-bit adapters outperform all other PET methods on both the VTAB-1K benchmark and few-shot FGVC tasks, while requiring the smallest storage size. Our findings show, for the first time, the significant potential of quantization techniques in PET, providing a general solution to enhance the parameter efficiency of adapter-based PET methods. Code: https://github.com/JieShibo/PETL-ViT
    摘要 现代计算机视觉技术的研究 partly rely on fine-tuning large pre-trained vision models. However, with the exponential growth of model sizes, the conventional full fine-tuning, which requires storing a separate network copy for each task, leads to increasingly huge storage and transmission overhead. Adapter-based Parameter-Efficient Tuning (PET) methods address this challenge by fine-tuning lightweight adapters inserted into the frozen pre-trained models. In this paper, we investigate how to make adapters even more efficient, reaching a new minimum size required to store a task-specific fine-tuned network. 启发于参数空间的平铺ocal minimum的观察,我们发现 adapter 对参数空间的随机噪声具有抗性,这意味着 adapter 也具有低精度的抗性。为了训练低精度 adapter,我们提出了一种 computationally efficient quantization method,which minimizes the quantization error. Through extensive experiments, we find that low-precision adapters exhibit minimal performance degradation, and even 1-bit precision is sufficient for adapters. The experimental results demonstrate that 1-bit adapters outperform all other PET methods on both the VTAB-1K benchmark and few-shot FGVC tasks, while requiring the smallest storage size. Our findings show, for the first time, the significant potential of quantization techniques in PET, providing a general solution to enhance the parameter efficiency of adapter-based PET methods.Code: https://github.com/JieShibo/PETL-ViT

Universal Adversarial Defense in Remote Sensing Based on Pre-trained Denoising Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.16865
  • repo_url: https://github.com/EricYu97/UAD-RS
  • paper_authors: Weikang Yu, Yonghao Xu, Pedram Ghamisi
  • for: 这个研究的目的是为了提出一个通用的适应性攻击防护方法(UAD-RS),以对RS数据中常见的多种不明适攻击进行防护。
  • methods: 这个方法使用预训 diffusion 模型来防护常见的DNN攻击,包括使用预训 diffusion 模型来学习不同RS数据集之间的通用表示,然后使用这些表示来对攻击样本进行净化。
  • results: 实验结果显示,UAD-RS 可以对四个不同的RS数据集进行通用防护,并且与现有的防护方法相比,具有更高的效果和更低的训练成本。
    Abstract Deep neural networks (DNNs) have achieved tremendous success in many remote sensing (RS) applications, in which DNNs are vulnerable to adversarial perturbations. Unfortunately, current adversarial defense approaches in RS studies usually suffer from performance fluctuation and unnecessary re-training costs due to the need for prior knowledge of the adversarial perturbations among RS data. To circumvent these challenges, we propose a universal adversarial defense approach in RS imagery (UAD-RS) using pre-trained diffusion models to defend the common DNNs against multiple unknown adversarial attacks. Specifically, the generative diffusion models are first pre-trained on different RS datasets to learn generalized representations in various data domains. After that, a universal adversarial purification framework is developed using the forward and reverse process of the pre-trained diffusion models to purify the perturbations from adversarial samples. Furthermore, an adaptive noise level selection (ANLS) mechanism is built to capture the optimal noise level of the diffusion model that can achieve the best purification results closest to the clean samples according to their Frechet Inception Distance (FID) in deep feature space. As a result, only a single pre-trained diffusion model is needed for the universal purification of adversarial samples on each dataset, which significantly alleviates the re-training efforts and maintains high performance without prior knowledge of the adversarial perturbations. Experiments on four heterogeneous RS datasets regarding scene classification and semantic segmentation verify that UAD-RS outperforms state-of-the-art adversarial purification approaches with a universal defense against seven commonly existing adversarial perturbations. Codes and the pre-trained models are available online (https://github.com/EricYu97/UAD-RS).
    摘要 深度神经网络(DNNs)在远程感知应用中已经取得了很大的成功,但是DNNs受到了恶作剂扰动的威胁。然而,现有的远程感知领域中的抗恶作剂防御策略通常会受到性能波动和不必要的重新训练成本,因为需要对远程感知数据进行先前知识。为了缓解这些挑战,我们提出了远程感知领域中的通用抗恶作剂防御策略(UAD-RS),使用预训练的扩散模型来防御通用DNNs对多种未知恶作剂的攻击。具体来说,首先预训练了不同的远程感知 dataset 上的扩散模型,以学习不同数据领域中的通用表示。然后,我们开发了一种通用抗恶作剂纯化框架,使用预训练的扩散模型的前向和反向过程来纯化恶作剂攻击后的样本。此外,我们还建立了一种适应性的噪声水平选择(ANLS)机制,以便在深度特征空间中选择最佳的噪声水平,以达到最佳的纯化结果最接近于干净样本的 Frechet Inception Distance(FID)。因此,只需要预训练一个扩散模型,可以对各个 dataset 进行通用的抗恶作剂纯化,大大减少重新训练的努力和维护高性能,无需对恶作剂攻击的具体知识。实验表明,UAD-RS 在四个不同的远程感知dataset 上的Scene Classification和semantic segmentation任务上表现出色,与现有的抗恶作剂纯化方法相比,具有更高的性能稳定性。代码和预训练模型可以在线获取(https://github.com/EricYu97/UAD-RS)。

MetaCAM: Ensemble-Based Class Activation Map

  • paper_url: http://arxiv.org/abs/2307.16863
  • repo_url: None
  • paper_authors: Emily Kaczmarek, Olivier X. Miguel, Alexa C. Bowie, Robin Ducharme, Alysha L. J. Dingwall-Harvey, Steven Hawken, Christine M. Armour, Mark C. Walker, Kevin Dick
  • for: 这篇论文旨在提供一个ensemble-based方法,将多个现有的Class Activation Maps(CAMs)方法 ensemble,以提高深度学习模型的预测解释可靠性。
  • methods: 方法包括MetaCAM、Cumulative Residual Effect(CRE)和adaptive thresholding等。
  • results: 结果显示,MetaCAM比单一CAMs表现更好,并可以更好地检测和修复模型预测时的误差。具体来说,在一个实验中,MetaCAM提高了ROAD表现从0.393比11个单一CAMs的值域(-0.101-0.172),显示了结合CAMs的ensemble方法和适应阈值的重要性。
    Abstract The need for clear, trustworthy explanations of deep learning model predictions is essential for high-criticality fields, such as medicine and biometric identification. Class Activation Maps (CAMs) are an increasingly popular category of visual explanation methods for Convolutional Neural Networks (CNNs). However, the performance of individual CAMs depends largely on experimental parameters such as the selected image, target class, and model. Here, we propose MetaCAM, an ensemble-based method for combining multiple existing CAM methods based on the consensus of the top-k% most highly activated pixels across component CAMs. We perform experiments to quantifiably determine the optimal combination of 11 CAMs for a given MetaCAM experiment. A new method denoted Cumulative Residual Effect (CRE) is proposed to summarize large-scale ensemble-based experiments. We also present adaptive thresholding and demonstrate how it can be applied to individual CAMs to improve their performance, measured using pixel perturbation method Remove and Debias (ROAD). Lastly, we show that MetaCAM outperforms existing CAMs and refines the most salient regions of images used for model predictions. In a specific example, MetaCAM improved ROAD performance to 0.393 compared to 11 individual CAMs with ranges from -0.101-0.172, demonstrating the importance of combining CAMs through an ensembling method and adaptive thresholding.
    摘要 需要清晰、可靠的深度学习模型预测解释是高 kriticality 领域,如医学和生物认知识别。图像活动地图 (CAMs) 是深度学习模型中的一种增加 Popular 的视觉解释方法。然而,各个 CAMs 的性能受到实验参数的影响,如选择的图像、目标类和模型。我们提出了 MetaCAM,一种基于 Ensemble 的方法,将多个现有 CAMs 的投票结果组合成一个高效的解释方法。我们对 MetaCAM 的实验进行了量化的定制,并提出了一种新的方法 named Cumulative Residual Effect (CRE),用于总结大规模的 Ensemble 实验结果。此外,我们还提出了适应阈值的技术,并证明了它可以应用于个体 CAMs 以提高其性能,使用像素扰动方法 Remove and Debias (ROAD) 进行评估。最后,我们表明 MetaCAM 超越了现有 CAMs,并把模型预测中使用的图像进行了更加精细的定制。例如,MetaCAM 提高了 ROAD 性能至 0.393,比11个个体 CAMs 的范围从 -0.101-0.172 更高,这说明了将 CAMs 组合成 Ensemble 方法和适应阈值技术的重要性。

Automated COVID-19 CT Image Classification using Multi-head Channel Attention in Deep CNN

  • paper_url: http://arxiv.org/abs/2308.00715
  • repo_url: None
  • paper_authors: Susmita Ghosh, Abhiroop Chatterjee
  • for: 本研究旨在提出一种基于深度学习的自动化COVID-19 CT扫描分类方法,以提高检测精度。
  • methods: 该方法使用修改过的Xception模型,增加了新的通道注意力机制和权重global average pooling来提高特征提取。通道注意力模块可以选择每个通道中有用信息,使模型学习COVID-19检测的特征。
  • results: 在一个广泛使用的COVID-19 CT扫描数据集上进行实验,该方法达到了96.99%的准确率,与其他当前领先技术相比显著优于。这些研究可以贡献到使用人工智能对当前和未来的流行病应对的努力,并提供可靠的医疗图像分析任务解决方案。
    Abstract The rapid spread of COVID-19 has necessitated efficient and accurate diagnostic methods. Computed Tomography (CT) scan images have emerged as a valuable tool for detecting the disease. In this article, we present a novel deep learning approach for automated COVID-19 CT scan classification where a modified Xception model is proposed which incorporates a newly designed channel attention mechanism and weighted global average pooling to enhance feature extraction thereby improving classification accuracy. The channel attention module selectively focuses on informative regions within each channel, enabling the model to learn discriminative features for COVID-19 detection. Experiments on a widely used COVID-19 CT scan dataset demonstrate a very good accuracy of 96.99% and show its superiority to other state-of-the-art techniques. This research can contribute to the ongoing efforts in using artificial intelligence to combat current and future pandemics and can offer promising and timely solutions for efficient medical image analysis tasks.
    摘要 due to the rapid spread of COVID-19, efficient and accurate diagnostic methods are urgently needed. Computed Tomography (CT) scan images have emerged as a valuable tool for detecting the disease. In this article, we propose a novel deep learning approach for automated COVID-19 CT scan classification, which incorporates a modified Xception model with a newly designed channel attention mechanism and weighted global average pooling to enhance feature extraction and improve classification accuracy. The channel attention module selectively focuses on informative regions within each channel, allowing the model to learn discriminative features for COVID-19 detection. Experiments on a widely used COVID-19 CT scan dataset demonstrate an accuracy of 96.99%, outperforming other state-of-the-art techniques. This research can contribute to the ongoing efforts in using artificial intelligence to combat current and future pandemics and offer promising and timely solutions for efficient medical image analysis tasks.Here's the breakdown of the translation:* due to the rapid spread of COVID-19: 由于 COVID-19 的快速传播* efficient and accurate diagnostic methods are urgently needed: 需要有效和准确的诊断方法* Computed Tomography (CT) scan images have emerged as a valuable tool for detecting the disease: CT 扫描图像已成为检测疾病的有价值工具* In this article, we propose a novel deep learning approach for automated COVID-19 CT scan classification: 在这篇文章中,我们提出了一种基于深度学习的自动 COVID-19 CT 扫描分类方法* which incorporates a modified Xception model with a newly designed channel attention mechanism and weighted global average pooling: 其中包括一种基于 Xception 模型的修改版本,以及一种新的通道注意机制和权重 globally average pooling* to enhance feature extraction and improve classification accuracy: 以提高特征提取和分类精度* The channel attention module selectively focuses on informative regions within each channel: 通道注意模块可选择每个通道中的有用区域* allowing the model to learn discriminative features for COVID-19 detection: 使模型可以学习 COVID-19 的特征* Experiments on a widely used COVID-19 CT scan dataset demonstrate an accuracy of 96.99%: 在一个广泛使用的 COVID-19 CT 扫描数据集上,实验表明模型的准确率为 96.99%* and show its superiority to other state-of-the-art techniques: 并表明其在其他现有技术上的优越性* This research can contribute to the ongoing efforts in using artificial intelligence to combat current and future pandemics: 这些研究可以贡献到使用人工智能对当前和未来的潜在疫情作战* and offer promising and timely solutions for efficient medical image analysis tasks: 并提供有前途和时间性的医疗图像分析任务解决方案

Random Sub-Samples Generation for Self-Supervised Real Image Denoising

  • paper_url: http://arxiv.org/abs/2307.16825
  • repo_url: https://github.com/p1y2z3/sdap
  • paper_authors: Yizhong Pan, Xiao Liu, Xiangyu Liao, Yuanzhouhan Cao, Chao Ren
  • for: This paper is written for improving the performance of self-supervised image denoising methods, specifically the blind spot network (BSN), by introducing a novel framework called Sampling Difference As Perturbation (SDAP) that uses random sub-samples generation (RSG) with a cyclic sample difference loss.
  • methods: The paper proposes a new self-supervised real image denoising framework named SDAP, which is based on RSG with a cyclic sample difference loss. The framework adds an appropriate perturbation to the training images to improve the performance of BSN.
  • results: The paper shows that the proposed SDAP framework significantly outperforms other state-of-the-art self-supervised denoising methods on real-world datasets.Here’s the answer in Simplified Chinese:
  • for: 这篇论文是为了提高自主监督的图像干净方法性能,特别是对盲点网络(BSN)的改进,提出了一种新的框架called Sampling Difference As Perturbation(SDAP),它基于随机子样本生成(RSG)和循环样本差损失。
  • methods: 论文提出了一种新的自主监督实际图像干净框架called SDAP,它基于RSG和循环样本差损失。框架通过添加适当的扰动来提高BSN的性能。
  • results: 论文显示,提出的SDAP框架在实际数据集上显著超越了其他自主监督干净方法。I hope this helps!
    Abstract With sufficient paired training samples, the supervised deep learning methods have attracted much attention in image denoising because of their superior performance. However, it is still very challenging to widely utilize the supervised methods in real cases due to the lack of paired noisy-clean images. Meanwhile, most self-supervised denoising methods are ineffective as well when applied to the real-world denoising tasks because of their strict assumptions in applications. For example, as a typical method for self-supervised denoising, the original blind spot network (BSN) assumes that the noise is pixel-wise independent, which is much different from the real cases. To solve this problem, we propose a novel self-supervised real image denoising framework named Sampling Difference As Perturbation (SDAP) based on Random Sub-samples Generation (RSG) with a cyclic sample difference loss. Specifically, we dig deeper into the properties of BSN to make it more suitable for real noise. Surprisingly, we find that adding an appropriate perturbation to the training images can effectively improve the performance of BSN. Further, we propose that the sampling difference can be considered as perturbation to achieve better results. Finally we propose a new BSN framework in combination with our RSG strategy. The results show that it significantly outperforms other state-of-the-art self-supervised denoising methods on real-world datasets. The code is available at https://github.com/p1y2z3/SDAP.
    摘要 随着深度学习方法的提出,无监督的深度学习方法在图像减噪中吸引了非常多的关注,因为它们的性能远胜监督方法。然而,在实际应用中,广泛使用无监督方法仍然非常困难,主要因为缺乏对应的噪音清洁图像的对应样本。此外,大多数自主学习减噪方法在实际应用中也是不Effective的,因为它们在应用中假设了噪音是像素级独立的,这与实际情况迥然不同。为解决这个问题,我们提出了一种基于随机子样本生成(RSG)的新的自主学习实际图像减噪框架,即采样差分作为扰动(SDAP)。我们在BSN中进一步挖掘了Properties,使其更适合实际噪音。我们发现,在训练图像上添加合适的扰动可以有效提高BSN的性能。此外,我们认为采样差分可以被视为扰动,以达到更好的效果。最后,我们提出了一种基于RSG和BSN的新框架,并对实际图像减噪任务进行评估。结果表明,它在实际图像减噪任务上明显超过了其他现有的自主学习减噪方法。代码可以在https://github.com/p1y2z3/SDAP上获取。

Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment

  • paper_url: http://arxiv.org/abs/2307.16813
  • repo_url: None
  • paper_authors: Kun Yuan, Zishang Kong, Chuanchuan Zheng, Ming Sun, Xing Wen
  • for: 这个论文是用来预测视频质量的,随着流媒体技术的快速发展,如Facebook、TikTok、Kwai等。
  • methods: 这个论文提出了一种新的视觉质量Transformer(VQT),用于更有效地提取质量相关的稀疏特征。方法上,提出了一种稀疏时间注意力(STA),通过分析帧之间的时间相关性,从$O(T^2)$降低到$O(T \log T)$的计算复杂度。结构上,使用多路径时间网络(MPTN),通过多个STA模块并行计算,捕捉视频中同时存在的多种损害。
  • results: 实验表明,VQT比许多当前状态的方法在三个公共无参照VQA数据集中表现出色,并且在四个全参照VQA数据集中比广泛采用的工业算法(如VMAF和AVQT)表现更好。
    Abstract Video Quality Assessment (VQA), which aims to predict the perceptual quality of a video, has attracted raising attention with the rapid development of streaming media technology, such as Facebook, TikTok, Kwai, and so on. Compared with other sequence-based visual tasks (\textit{e.g.,} action recognition), VQA faces two under-estimated challenges unresolved in User Generated Content (UGC) videos. \textit{First}, it is not rare that several frames containing serious distortions (\textit{e.g.,}blocking, blurriness), can determine the perceptual quality of the whole video, while other sequence-based tasks require more frames of equal importance for representations. \textit{Second}, the perceptual quality of a video exhibits a multi-distortion distribution, due to the differences in the duration and probability of occurrence for various distortions. In order to solve the above challenges, we propose \textit{Visual Quality Transformer (VQT)} to extract quality-related sparse features more efficiently. Methodologically, a Sparse Temporal Attention (STA) is proposed to sample keyframes by analyzing the temporal correlation between frames, which reduces the computational complexity from $O(T^2)$ to $O(T \log T)$. Structurally, a Multi-Pathway Temporal Network (MPTN) utilizes multiple STA modules with different degrees of sparsity in parallel, capturing co-existing distortions in a video. Experimentally, VQT demonstrates superior performance than many \textit{state-of-the-art} methods in three public no-reference VQA datasets. Furthermore, VQT shows better performance in four full-reference VQA datasets against widely-adopted industrial algorithms (\textit{i.e.,} VMAF and AVQT).
    摘要 视频质量评估(VQA),旨在预测视频的感知质量,随着流媒体技术的快速发展(如Facebook、TikTok、Kwai等),引起了越来越多的关注。相比其他序列基于视觉任务(例如动作识别),VQA面临两个未得到解决的挑战,即在用户生成内容(UGC)视频中,几帧具有严重损害(例如块化、模糊)的情况下,整个视频的感知质量受到这些帧的影响,而其他序列基于视觉任务通常需要更多的相等重要帧来构成表示。第二,视频的感知质量具有多种损害分布,这是因为不同的损害在视频的时间长度和概率发生的情况下具有不同的概率和持续时间。为解决以上挑战,我们提出了视觉质量变换器(VQT),可以更有效地提取相关的质量特征。方法上,我们提出了简洁时间注意力(STA),通过分析帧之间的时间相关性,从 $O(T^2)$ 降低到 $O(T \log T)$ 的计算复杂度。结构上,我们采用多路径时间网络(MPTN),通过多个 STA 模块并行运行,捕捉视频中共存的损害。实验表明,VQT 在三个公共无参照 VQA 数据集中表现出色,并且在四个全参照 VQA 数据集中对于广泛采用的工业算法(例如 VMAF 和 AVQT)表现更好。

A comprehensive review of deep learning in lung cancer

  • paper_url: http://arxiv.org/abs/2308.02528
  • repo_url: None
  • paper_authors: Farzane Tajidini
  • for: 本文提供了关于癌症诊断方法的历史背景,包括癌症诊断的过程和临床医生使用的标准分类方法。
  • methods: 当前的癌症诊断方法被评估为不够有效,需要新的更智能的方法。
  • results: 本文提出了一种新的癌症诊断方法,以帮助解决当前的问题。
    Abstract To provide the reader with a historical perspective on cancer classification approaches, we first discuss the fundamentals of the area of cancer diagnosis in this article, including the processes of cancer diagnosis and the standard classification methods employed by clinicians. Current methods for cancer diagnosis are deemed ineffective, calling for new and more intelligent approaches.
    摘要 为了为读者提供历史背景,我们首先讲述了肿瘤诊断方法的基础知识,包括肿瘤诊断过程和临床医生使用的标准分类方法。现有的肿瘤诊断方法被认为是不充分有效,需要新的更智能的方法。Here's a breakdown of the translation:* 肿瘤 (ózhòu) - cancer* 诊断 (jiànxiǎng) - diagnosis* 过程 (guòchéng) - process* 标准 (biāozhǔ) - standard* 分类 (fēngróng) - classification* 方法 (fāngédé) - method* 不充分有效 (bù zhòng fēn yǒu xiǎng) - not effective enough* 新 (xīn) - new* 更 (gè) - more* 智能 (zhìnéng) - intelligent

DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation

  • paper_url: http://arxiv.org/abs/2307.16803
  • repo_url: None
  • paper_authors: Yue Zhang, Hehe Fan, Yi Yang, Mohan Kankanhalli
  • for: 这项研究是为了解决人机交互4D(HOI4D)数据集上的 egocentric action segmentation 任务。
  • methods: 该方法使用了点云视频方法和传统视频理解方法的 ensemble,以提高4D动作分割的准确率。
  • results: 该方法名为DPMix,在HOI4D Challenge 2023中的4D Action Segmentation Track中获得了第一名。
    Abstract In this technical report, we present our findings from the research conducted on the Human-Object Interaction 4D (HOI4D) dataset for egocentric action segmentation task. As a relatively novel research area, point cloud video methods might not be good at temporal modeling, especially for long point cloud videos (\eg, 150 frames). In contrast, traditional video understanding methods have been well developed. Their effectiveness on temporal modeling has been widely verified on many large scale video datasets. Therefore, we convert point cloud videos into depth videos and employ traditional video modeling methods to improve 4D action segmentation. By ensembling depth and point cloud video methods, the accuracy is significantly improved. The proposed method, named Mixture of Depth and Point cloud video experts (DPMix), achieved the first place in the 4D Action Segmentation Track of the HOI4D Challenge 2023.
    摘要 在这份技术报告中,我们展示了对人机交互4D(HOI4D)数据集上的 egocentric action segmentation 任务的研究成果。作为一个相对新的研究领域,点云视频方法可能不够好地处理时间模型,特别是长点云视频(例如150帧)。相比之下,传统视频理解方法已经广泛发展,其在大规模视频数据集上的效果得到了广泛验证。因此,我们将点云视频转换为深度视频,并使用传统视频模型来提高4D动作分割精度。通过对深度和点云视频方法进行拟合,我们提出的方法(名为DPMix)在 HOI4D Challenge 2023 的4D Action Segmentation Track中达到了第一名。

Framing image registration as a landmark detection problem for better representation of clinical relevance

  • paper_url: http://arxiv.org/abs/2308.01318
  • repo_url: None
  • paper_authors: Diana Waldmannstetter, Benedikt Wiestler, Julian Schwarting, Ivan Ezhov, Marie Metz, Spyridon Bakas, Bhakti Baheti, Satrajit Chakrabarty, Jan S. Kirschke, Rolf A. Heckemann, Marie Piraud, Florian Kofler, Bjoern H. Menze
  • for: 提高图像注册评价的临床 relevance,通过将图像注册视为标记检测问题来重新评价图像注册方法。
  • methods: 提议基于一个子样本间评价分析来计算特征点检测阈值,使用 median + delta * median absolute deviation 公式来计算阈值。
  • results: 方法可以 diferenciate 之前无法区分的注册算法,并且可以评价图像注册方法的临床意义。
    Abstract Nowadays, registration methods are typically evaluated based on sub-resolution tracking error differences. In an effort to reinfuse this evaluation process with clinical relevance, we propose to reframe image registration as a landmark detection problem. Ideally, landmark-specific detection thresholds are derived from an inter-rater analysis. To approximate this costly process, we propose to compute hit rate curves based on the distribution of errors of a sub-sample inter-rater analysis. Therefore, we suggest deriving thresholds from the error distribution using the formula: median + delta * median absolute deviation. The method promises differentiation of previously indistinguishable registration algorithms and further enables assessing the clinical significance in algorithm development.
    摘要 现在,注册方法通常会被评估基于半解像跟踪错误差异。为了重新把注册评估过程恢复到临床 relevance,我们提议将注册视为一个标记检测问题。理想情况下,标记特定的检测阈值将来自多个评估人员之间的交叉分析。为了估算这个贵重的过程,我们提议基于一个子样本交叉分析的错误分布计算hit率曲线。因此,我们建议使用错误分布中的 median + δ * 中值绝对差异来 derivethreshold。这种方法可以区分之前无法分辨的注册算法,并且可以评估算法发展中的临床重要性。

cs.AI - 2023-08-01

Hessian-Aware Bayesian Optimization for Decision Making Systems

  • paper_url: http://arxiv.org/abs/2308.00629
  • repo_url: None
  • paper_authors: Mohit Rajpal, Lac Gia Tran, Yehong Zhang, Bryan Kian Hsiang Low
  • For: 优化决策系统,尤其是在缺乏反馈信息的情况下。* Methods: 使用 derivative-free 方法,如 bayesian 优化,以减少对反馈质量的依赖。* Results: 在资源有限和异常反馈情况下,实验结果表明我们的方法(HA-GP-UCB)能够有效地优化决策系统。
    Abstract Many approaches for optimizing decision making systems rely on gradient based methods requiring informative feedback from the environment. However, in the case where such feedback is sparse or uninformative, such approaches may result in poor performance. Derivative-free approaches such as Bayesian Optimization mitigate the dependency on the quality of gradient feedback, but are known to scale poorly in the high-dimension setting of complex decision making systems. This problem is exacerbated if the system requires interactions between several actors cooperating to accomplish a shared goal. To address the dimensionality challenge, we propose a compact multi-layered architecture modeling the dynamics of actor interactions through the concept of role. Additionally, we introduce Hessian-aware Bayesian Optimization to efficiently optimize the multi-layered architecture parameterized by a large number of parameters. Experimental results demonstrate that our method (HA-GP-UCB) works effectively on several benchmarks under resource constraints and malformed feedback settings.
    摘要 很多决策系统优化方法基于梯度计算,但在环境反馈缺乏信息时,这些方法可能表现不佳。不含梯度的方法如泊利抽象优化可以减少基于梯度反馈的依赖性,但在复杂决策系统中,这些方法可能 scalability 问题。特别是当决策系统需要多个演员合作完成共同目标时,这问题变得更加严重。为解决维度挑战,我们提议使用嵌入式多层架构,模型演员之间的动态关系,并通过角色概念来减少参数的数量。此外,我们还引入了希尔伯恩对 Bayesian 优化的知识,以高效地优化多层架构中的参数。实验结果表明,我们的方法(HA-GP-UCB)在资源限制和缺乏反馈情况下能够有效地工作。

Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes

  • paper_url: http://arxiv.org/abs/2308.00628
  • repo_url: https://github.com/soullessrobot/human-m3-dataset
  • paper_authors: Bohao Fan, Siqi Wang, Wenxuan Guo, Wenzhao Zheng, Jianjiang Feng, Jie Zhou
  • for: 这篇论文旨在提供一个多Modal多视图多人3D人姿数据库,以便进一步推动多Modal多视图3D人姿估计领域的研究。
  • methods: 该论文提出了一种基于多Modal数据输入的人 pose估计算法,使得可以更准确地估计人姿。此外,该论文还提出了一种基于多Modal数据输入的人 pose估计算法,以验证多Modal数据输入的优势。
  • results: 该论文的实验结果表明,该数据库是一个具有挑战性和多样性的 dataset,适用于未来的研究。此外,该论文的实验结果还表明,基于多Modal数据输入的人 pose估计算法具有明显的优势。
    Abstract 3D human pose estimation in outdoor environments has garnered increasing attention recently. However, prevalent 3D human pose datasets pertaining to outdoor scenes lack diversity, as they predominantly utilize only one type of modality (RGB image or pointcloud), and often feature only one individual within each scene. This limited scope of dataset infrastructure considerably hinders the variability of available data. In this article, we propose Human-M3, an outdoor multi-modal multi-view multi-person human pose database which includes not only multi-view RGB videos of outdoor scenes but also corresponding pointclouds. In order to obtain accurate human poses, we propose an algorithm based on multi-modal data input to generate ground truth annotation. This benefits from robust pointcloud detection and tracking, which solves the problem of inaccurate human localization and matching ambiguity that may exist in previous multi-view RGB videos in outdoor multi-person scenes, and generates reliable ground truth annotations. Evaluation of multiple different modalities algorithms has shown that this database is challenging and suitable for future research. Furthermore, we propose a 3D human pose estimation algorithm based on multi-modal data input, which demonstrates the advantages of multi-modal data input for 3D human pose estimation. Code and data will be released on https://github.com/soullessrobot/Human-M3-Dataset.
    摘要 Recently, 3D human pose estimation in outdoor environments has gained increasing attention. However, existing 3D human pose datasets for outdoor scenes are limited in terms of diversity, as they primarily use only one type of modality (RGB image or pointcloud), and often feature only one individual per scene. This limited scope of dataset infrastructure significantly hinders the variability of available data.In this article, we propose Human-M3, an outdoor multi-modal multi-view multi-person human pose database that includes not only multi-view RGB videos of outdoor scenes but also corresponding pointclouds. To obtain accurate human poses, we propose an algorithm based on multi-modal data input to generate ground truth annotation. This approach leverages robust pointcloud detection and tracking, which resolves the problems of inaccurate human localization and matching ambiguity that may exist in previous multi-view RGB videos of outdoor multi-person scenes, and generates reliable ground truth annotations.Evaluation of multiple different modalities algorithms has shown that this database is challenging and suitable for future research. Furthermore, we propose a 3D human pose estimation algorithm based on multi-modal data input, which demonstrates the advantages of multi-modal data input for 3D human pose estimation. The database and code will be released on GitHub at .

JIANG: Chinese Open Foundation Language Model

  • paper_url: http://arxiv.org/abs/2308.00624
  • repo_url: None
  • paper_authors: Qinhua Duan, Wenchao Gu, Yujia Chen, Wenxin Mao, Zewen Tian, Hui Cao
  • for: 这个研究是为了开发一个特别设计 для中文的大语言模型,以便在中文领域中表现出更高水准的表达能力。
  • methods: 我们使用了大量的中文资料来训练我们的模型,并且对模型结构进行优化。
  • results: 实验结果显示了我们的模型在中文领域的表现非常出色,表现比较有力。
    Abstract With the advancements in large language model technology, it has showcased capabilities that come close to those of human beings across various tasks. This achievement has garnered significant interest from companies and scientific research institutions, leading to substantial investments in the research and development of these models. While numerous large models have emerged during this period, the majority of them have been trained primarily on English data. Although they exhibit decent performance in other languages, such as Chinese, their potential remains limited due to factors like vocabulary design and training corpus. Consequently, their ability to fully express their capabilities in Chinese falls short. To address this issue, we introduce the model named JIANG (Chinese pinyin of ginger) specifically designed for the Chinese language. We have gathered a substantial amount of Chinese corpus to train the model and have also optimized its structure. The extensive experimental results demonstrate the excellent performance of our model.
    摘要 随着大语言模型技术的发展,它们在不同任务上展示了人类水平的能力,吸引了企业和科研机构的广泛投资。然而,大多数这些模型都是以英语训练为主,尽管它们在其他语言上表现不错,但其潜力尚未得到完全发挥。这是因为语言设计和训练数据的因素所致。为了解决这个问题,我们介绍了专门为中文设计的模型——江(中文拼音的芳香)。我们收集了大量的中文训练数据,并优化了模型的结构。我们的广泛实验结果表明,我们的模型表现出了极佳的能力。

Beyond One-Hot-Encoding: Injecting Semantics to Drive Image Classifiers

  • paper_url: http://arxiv.org/abs/2308.00607
  • repo_url: https://github.com/s1m0n38/semantic-encodings
  • paper_authors: Alan Perotti, Simone Bertolotto, Eliana Pastor, André Panisson
  • for: The paper aims to improve the interpretability and trustworthiness of machine learning models for image classification by integrating semantic information into the training process.
  • methods: The authors propose a generic approach to derive an additional loss term starting from any kind of semantic information about the classification label, and demonstrate its application to ontologies and word embeddings.
  • results: The authors train image classifiers with the semantically enriched loss and analyze the trade-offs between accuracy, mistake severity, and learned internal representations. They also discuss the potential of this approach for improving explainability and adversarial robustness.
    Abstract Images are loaded with semantic information that pertains to real-world ontologies: dog breeds share mammalian similarities, food pictures are often depicted in domestic environments, and so on. However, when training machine learning models for image classification, the relative similarities amongst object classes are commonly paired with one-hot-encoded labels. According to this logic, if an image is labelled as 'spoon', then 'tea-spoon' and 'shark' are equally wrong in terms of training loss. To overcome this limitation, we explore the integration of additional goals that reflect ontological and semantic knowledge, improving model interpretability and trustworthiness. We suggest a generic approach that allows to derive an additional loss term starting from any kind of semantic information about the classification label. First, we show how to apply our approach to ontologies and word embeddings, and discuss how the resulting information can drive a supervised learning process. Second, we use our semantically enriched loss to train image classifiers, and analyse the trade-offs between accuracy, mistake severity, and learned internal representations. Finally, we discuss how this approach can be further exploited in terms of explainability and adversarial robustness. Code repository: https://github.com/S1M0N38/semantic-encodings
    摘要 图像充满 semantics 信息:狗种类共享哺乳动物类似性,食物图像经常在家庭环境中描绘,等等。然而,在机器学习模型图像分类训练中,对象类之间的相似性通常通过一键编码标签进行表示。根据这种逻辑,如果一张图像被标记为 " Spoon ",那么 " Tea-spoon " 和 " 鲨鱼 " 在训练损失方面都是等错的。为了超越这些限制,我们探讨了 Semantic 和 Ontology 知识的集成,以提高模型解释性和可靠性。我们提出了一个通用的方法,可以从任何类型的 semantics 信息开始,生成一个额外的损失项。首先,我们介绍了如何应用我们的方法到 Ontologies 和 Word Embeddings 中,并讨论了如何使得这些信息驱动一个监督学习过程。其次,我们使用我们具有Semantically 增强的损失函数来训练图像分类器,并分析了准确率、错误严重性和学习的内部表示之间的贸易。最后,我们讨论了如何进一步利用这种方法,以提高解释性和对抗攻击性。代码库:https://github.com/S1M0N38/semantic-encodings

Collaborative filtering to capture AI user’s preferences as norms

  • paper_url: http://arxiv.org/abs/2308.02542
  • repo_url: None
  • paper_authors: Marc Serramia, Natalia Criado, Michael Luck
  • for: 本研究旨在提高人工智能技术的个性化设置,以更好地满足用户的需求。
  • methods: 本研究使用了协同推荐算法,通过分析大量用户对整体系统的偏好信息,自动地捕捉用户的偏好。
  • results: 研究发现,通过协同推荐算法可以准确地捕捉用户的偏好,并且可以避免用户过度参与设置过程,从而提高人工智能技术的使用体验。
    Abstract Customising AI technologies to each user's preferences is fundamental to them functioning well. Unfortunately, current methods require too much user involvement and fail to capture their true preferences. In fact, to avoid the nuisance of manually setting preferences, users usually accept the default settings even if these do not conform to their true preferences. Norms can be useful to regulate behaviour and ensure it adheres to user preferences but, while the literature has thoroughly studied norms, most proposals take a formal perspective. Indeed, while there has been some research on constructing norms to capture a user's privacy preferences, these methods rely on domain knowledge which, in the case of AI technologies, is difficult to obtain and maintain. We argue that a new perspective is required when constructing norms, which is to exploit the large amount of preference information readily available from whole systems of users. Inspired by recommender systems, we believe that collaborative filtering can offer a suitable approach to identifying a user's norm preferences without excessive user involvement.
    摘要 We argue that a new perspective is needed when constructing norms, one that leverages the abundance of preference information available from large systems of users. Inspired by recommender systems, we believe that collaborative filtering can be a suitable approach to identifying a user's norm preferences without excessive user involvement. By analyzing the preferences of similar users, we can create a normative framework that is more accurately tailored to each individual's needs and preferences. This approach has the potential to improve the effectiveness of AI technologies and enhance user experience.

Towards More Human-like AI Communication: A Review of Emergent Communication Research

  • paper_url: http://arxiv.org/abs/2308.02541
  • repo_url: None
  • paper_authors: Nicolo’ Brandizzi
  • for: 本研究旨在探讨人类语言使用的规律和人工智能机器的沟通方式,以帮助机器更好地使用自然语言进行人机交互。
  • methods: 本研究使用了 emergent communication(Emecom)的方法,即通过人工智能机器学习自然语言的使用方式,以便更好地沟通和学习新的概念。
  • results: 本研究通过分析了各种相关研究的共同特征,并将其分为两个子类别,以便更好地了解人类语言使用的规律和人工智能机器的沟通方式。
    Abstract In the recent shift towards human-centric AI, the need for machines to accurately use natural language has become increasingly important. While a common approach to achieve this is to train large language models, this method presents a form of learning misalignment where the model may not capture the underlying structure and reasoning humans employ in using natural language, potentially leading to unexpected or unreliable behavior. Emergent communication (Emecom) is a field of research that has seen a growing number of publications in recent years, aiming to develop artificial agents capable of using natural language in a way that goes beyond simple discriminative tasks and can effectively communicate and learn new concepts. In this review, we present Emecom under two aspects. Firstly, we delineate all the common proprieties we find across the literature and how they relate to human interactions. Secondly, we identify two subcategories and highlight their characteristics and open challenges. We encourage researchers to work together by demonstrating that different methods can be viewed as diverse solutions to a common problem and emphasize the importance of including diverse perspectives and expertise in the field. We believe a deeper understanding of human communication is crucial to developing machines that can accurately use natural language in human-machine interactions.
    摘要 In this review, we examine Emecom from two perspectives. First, we identify common properties found across the literature and how they relate to human interactions. Second, we categorize Emecom into two subcategories and highlight their characteristics and open challenges. We emphasize the importance of including diverse perspectives and expertise in the field, as we believe a deeper understanding of human communication is crucial to developing machines that can accurately use natural language in human-machine interactions.

Reinforcement Learning-based Non-Autoregressive Solver for Traveling Salesman Problems

  • paper_url: http://arxiv.org/abs/2308.00560
  • repo_url: None
  • paper_authors: Yubin Xiao, Di Wang, Huanhuan Chen, Boyang Li, Wei Pang, Xuan Wu, Hao Li, Dong Xu, Yanchun Liang, You Zhou
  • for: 提出了一种基于 Graph Neural Network (GNN) 和 reinforcement learning (RL) 的 Traveling Salesman Problem (TSP) 解决方案,以提高解决速度和解决质量。
  • methods: 使用了一种特制的 GNN 来实现非推理 (NAR) decoding,并使用了一种提高后RL策略来消除依赖于高成本的标签来训练传统的超级学习型 NAR 模型。
  • results: 在 synthetic 和实际世界 TSP 实例上进行了实验,并证明了 NAR4TSP 在解决质量、推理速度和泛化能力等方面都比四种现有模型更好。同时,还提供了 NAR4TSP 的解码过程和总路径规划的视觉化图表,以示其可行性和效果。
    Abstract The Traveling Salesman Problem (TSP) is a well-known problem in combinatorial optimization with applications in various domains. However, existing TSP solvers face challenges in producing high-quality solutions with low latency. To address this issue, we propose NAR4TSP, which produces TSP solutions in a Non-Autoregressive (NAR) manner using a specially designed Graph Neural Network (GNN), achieving faster inference speed. Moreover, NAR4TSP is trained using an enhanced Reinforcement Learning (RL) strategy, eliminating the dependency on costly labels used to train conventional supervised learning-based NAR models. To the best of our knowledge, NAR4TSP is the first TSP solver that successfully combines RL and NAR decoding. The experimental results on both synthetic and real-world TSP instances demonstrate that NAR4TSP outperforms four state-of-the-art models in terms of solution quality, inference latency, and generalization ability. Lastly, we present visualizations of NAR4TSP's decoding process and its overall path planning to showcase the feasibility of implementing NAR4TSP in an end-to-end manner and its effectiveness, respectively.
    摘要 旅途卖士问题(TSP)是一个广泛应用的 combinatorial 优化问题。然而,现有的 TSP 解决方案面临着生成高质量解决方案的延迟问题。为解决这个问题,我们提出了 NAR4TSP,它使用特制的图神经网络(GNN)来生成非自适应(NAR)的 TSP 解决方案,实现更快的推理速度。此外,NAR4TSP 通过改进的强化学习(RL)策略进行训练,从而消除了对传统的超级vised学习基于 NAR 模型的贵重标签的依赖。根据我们所知,NAR4TSP 是第一个成功地将 RL 和 NAR 推理结合的 TSP 解决方案。实验结果表明,NAR4TSP 在 synthetic 和实际世界 TSP 实例上比四种现状的模型高于 solution 质量、推理延迟和泛化能力。最后,我们提供了 NAR4TSP 的推理过程和总路径规划的视觉化来展示 NAR4TSP 的端到端实现可行性和效iveness。

Copula for Instance-wise Feature Selection and Ranking

  • paper_url: http://arxiv.org/abs/2308.00549
  • repo_url: None
  • paper_authors: Hanyu Peng, Guanhua Fang, Ping Li
  • for: 提高神经网络中特征选择和排序的精度,增强模型的性能和可读性。
  • methods: integrate Gaussian copula into current feature selection framework,无需更改现有的方法。
  • results: 在 sintetic 和实际数据上,对比现有方法,我们的方法能够更好地捕捉各特征之间的相互关系,提高模型的性能和可读性。
    Abstract Instance-wise feature selection and ranking methods can achieve a good selection of task-friendly features for each sample in the context of neural networks. However, existing approaches that assume feature subsets to be independent are imperfect when considering the dependency between features. To address this limitation, we propose to incorporate the Gaussian copula, a powerful mathematical technique for capturing correlations between variables, into the current feature selection framework with no additional changes needed. Experimental results on both synthetic and real datasets, in terms of performance comparison and interpretability, demonstrate that our method is capable of capturing meaningful correlations.
    摘要 <>Instance-wise 特征选择和排名方法可以实现每个样本中任务友好的特征选择。然而,现有的方法假设特征子集为独立的时,存在相互关系的限制。为解决这 limitation,我们提议在当前特征选择框架中 incorporate Gaussian copula,一种强大的数学技术,用于捕捉特征之间的相关性。实验结果表明,我们的方法可以捕捉有意义的相关性。Note:* "Instance-wise" is translated as "每个样本中" (each sample)* "特征选择" is translated as "特征选择" (feature selection)* "排名" is translated as "排名" (ranking)* "Gaussian copula" is translated as " Gaussian copula" (Gaussian copula)* "相关性" is translated as "相关性" (correlation)

Predicting Early Dropouts of an Active and Healthy Ageing App

  • paper_url: http://arxiv.org/abs/2308.00539
  • repo_url: None
  • paper_authors: Vasileios Perifanis, Ioanna Michailidi, Giorgos Stamatelatos, George Drosatos, Pavlos S. Efraimidis
  • for: The paper is written for predicting early dropouts of an active and healthy ageing app.
  • methods: The paper uses machine learning algorithms, specifically classification models constructed using pre-processing techniques and dynamic/static features. The authors also employed oversampling methods like SMOTE and ADASYN to improve performance.
  • results: The paper achieved high-quality adherence predictions, with dynamic features positively influencing the model’s performance. The oversampling approaches led to a remarkable improvement of 10%. The authors won first place in the IFMBE Scientific Challenge 2022.Here’s the simplified Chinese text for the three points:
  • for: 这篇论文是为预测活健年龄应用中早期退出的研究。
  • methods: 这篇论文使用机器学习算法,具体来说是使用预处理技术构建的分类模型,并使用动态/静止特征进行预测。作者还使用了SMOTE和ADASYN等扩大samples方法来提高分类性能。
  • results: 这篇论文实现了高质量遵从预测,动态特征对模型性能有积极影响。使用扩大samples方法导致了10%的显著提高。作者在IFMBE科学挑战赛2022中获得了第一名。
    Abstract In this work, we present a machine learning approach for predicting early dropouts of an active and healthy ageing app. The presented algorithms have been submitted to the IFMBE Scientific Challenge 2022, part of IUPESM WC 2022. We have processed the given database and generated seven datasets. We used pre-processing techniques to construct classification models that predict the adherence of users using dynamic and static features. We submitted 11 official runs and our results show that machine learning algorithms can provide high-quality adherence predictions. Based on the results, the dynamic features positively influence a model's classification performance. Due to the imbalanced nature of the dataset, we employed oversampling methods such as SMOTE and ADASYN to improve the classification performance. The oversampling approaches led to a remarkable improvement of 10\%. Our methods won first place in the IFMBE Scientific Challenge 2022.
    摘要 在这项工作中,我们提出了一种机器学习方法来预测活动和健康年龄应用程序中早期退出的问题。我们在IFMBE科学挑战2022中提交了这些算法,该活动是IUPESM WC 2022的一部分。我们对给定的数据库进行了处理,生成了七个数据集。我们使用了预处理技术来构建分类模型,以预测用户的执行情况。我们提交了11个官方运行,结果表明机器学习算法可以提供高质量的执行预测。据结果显示,动态特征对模型的分类性能产生了积极的影响。由于数据集具有偏斜性,我们使用了扩大样本的方法,如SMOTE和ADASYN,以提高分类性能。这些扩大方法导致了10%的明显改善。我们的方法在IFMBE科学挑战2022中获得了第一名。

PressureTransferNet: Human Attribute Guided Dynamic Ground Pressure Profile Transfer using 3D simulated Pressure Maps

  • paper_url: http://arxiv.org/abs/2308.00538
  • repo_url: None
  • paper_authors: Lala Shakti Swarup Ray, Vitor Fortes Rey, Bo Zhou, Sungho Suh, Paul Lukowicz
  • for: 人体活动识别(HAR)系统的研究和开发
  • methods: 利用现有的压力数据和编码-解码模型,生成具有特定活动特征的体具压力 profiless
  • results: 在不同场景下,准确地将人体特征传递到地面压力Profile中,并通过物理学基金amentals的深度学习模型进行验证。
    Abstract We propose PressureTransferNet, a novel method for Human Activity Recognition (HAR) using ground pressure information. Our approach generates body-specific dynamic ground pressure profiles for specific activities by leveraging existing pressure data from different individuals. PressureTransferNet is an encoder-decoder model taking a source pressure map and a target human attribute vector as inputs, producing a new pressure map reflecting the target attribute. To train the model, we use a sensor simulation to create a diverse dataset with various human attributes and pressure profiles. Evaluation on a real-world dataset shows its effectiveness in accurately transferring human attributes to ground pressure profiles across different scenarios. We visually confirm the fidelity of the synthesized pressure shapes using a physics-based deep learning model and achieve a binary R-square value of 0.79 on areas with ground contact. Validation through classification with F1 score (0.911$\pm$0.015) on physical pressure mat data demonstrates the correctness of the synthesized pressure maps, making our method valuable for data augmentation, denoising, sensor simulation, and anomaly detection. Applications span sports science, rehabilitation, and bio-mechanics, contributing to the development of HAR systems.
    摘要 我们提出了PressureTransferNet,一种新的人动作认识(HAR)方法,使用地面压力信息。我们的方法生成了特定活动的体部特有的动态地面压力profile,通过利用不同个体的压力数据。PressureTransferNet是一个Encoder-Decoder模型,接受一个源压力地图和一个目标人类特征向量作为输入,生成一个新的压力地图,表示目标特征。我们使用感知模拟生成了多种人类特征和压力profile的多样化数据集,用于训练模型。我们通过对实际数据进行评估,发现PressureTransferNet能够准确地将人类特征传递到地面压力profile中,并在不同场景下保持高度的准确率。我们通过physics-based深度学习模型进行视觉验证,并达到了0.79的二元R-平方值在地面接触区域上,这表明我们生成的压力地图具有高度的准确性。我们通过对物理压力检测数据进行分类,获得了0.911±0.015的F1分数,这表明我们的方法可以准确地生成压力地图,从而在数据增强、噪声除除、感知模拟和异常检测等方面具有价值。这些应用包括运动科学、rehabilitation和生物机器学,这将为人动作认识系统的发展提供重要支持。

Graph Embedding Dynamic Feature-based Supervised Contrastive Learning of Transient Stability for Changing Power Grid Topologies

  • paper_url: http://arxiv.org/abs/2308.00537
  • repo_url: None
  • paper_authors: Zijian Lv, Xin Chen, Zijian Feng
  • for: 精确的在线稳定性预测对于电力系统稳定性是关键,特别是在面临干扰时。传统稳定性分析使用时间域模拟不能快速适应电力网络结构变化。
  • methods: 以graph embedding dynamic feature(GEDF)为基础,提出了基于超级对比学习的稳定性GEDF-SCL模型,可以预测稳定性,考虑到电力网络结构信息。
  • results: 对于不同的电力网络结构,通过对N-1和N-m-1干扰情况的模拟,测试结果表明,GEDF-SCL模型可以达到高精度的稳定性预测,并适应电力网络结构变化。
    Abstract Accurate online transient stability prediction is critical for ensuring power system stability when facing disturbances. While traditional transient stablity analysis replies on the time domain simulations can not be quickly adapted to the power grid toplogy change. In order to vectorize high-dimensional power grid topological structure information into low-dimensional node-based graph embedding streaming data, graph embedding dynamic feature (GEDF) has been proposed. The transient stability GEDF-based supervised contrastive learning (GEDF-SCL) model uses supervised contrastive learning to predict transient stability with GEDFs, considering power grid topology information. To evaluate the performance of the proposed GEDF-SCL model, power grids of varying topologies were generated based on the IEEE 39-bus system model. Transient operational data was obtained by simulating N-1 and N-$\bm{m}$-1 contingencies on these generated power system topologies. Test result demonstrated that the GEDF-SCL model can achieve high accuracy in transient stability prediction and adapt well to changing power grid topologies.
    摘要 Traditional transient stability analysis 使用时域模拟,不能快速适应发电系统结构变化。为了将高维电力网结构信息归一化成低维节点基本图卷积数据,提出了图嵌入动态特征(GEDF)。在基于 GEDF 的超级vised contrastive learning(GEDF-SCL)模型中,通过超级vised contrastive learning来预测稳定性,考虑发电系统拓扑信息。为评估提议的 GEDF-SCL 模型表现,对 IEEE 39-bus 系统模型中生成的不同拓扑的发电系统进行了 simulate N-1 和 N-m-1 的稳定操作数据。测试结果表明, GEDF-SCL 模型可以高度准确地预测稳定性,并适应发电系统拓扑变化。Note: "Simplified Chinese" is also known as "Mandarin" or "Standard Chinese".

Transfer-Ensemble Learning based Deep Convolutional Neural Networks for Diabetic Retinopathy Classification

  • paper_url: http://arxiv.org/abs/2308.00525
  • repo_url: None
  • paper_authors: Susmita Ghosh, Abhiroop Chatterjee
    for: 这篇论文的目的是用一个ensemble方法来分类糖尿病性视网膜病(DR)为五个不同的类别。methods: 这个模型使用了两个流行的预训练 convolutional neural network:VGG16和Inception V3。 ensemble模型架构中将这两个预训练模型的部分冻结以利用它们已经学习的表示。 全球均值层 pooling层被添加以将输入图像的特征地图转换为固定长度的 вектор。results: 实验结果显示,这个ensemble模型可以高度有效地分类糖尿病性视网膜病,其准确率为96.4%。
    Abstract This article aims to classify diabetic retinopathy (DR) disease into five different classes using an ensemble approach based on two popular pre-trained convolutional neural networks: VGG16 and Inception V3. The proposed model aims to leverage the strengths of the two individual nets to enhance the classification performance for diabetic retinopathy. The ensemble model architecture involves freezing a portion of the layers in each pre-trained model to utilize their learned representations effectively. Global average pooling layers are added to transform the output feature maps into fixed-length vectors. These vectors are then concatenated to form a consolidated representation of the input image. The ensemble model is trained using a dataset of diabetic retinopathy images (APTOS), divided into training and validation sets. During the training process, the model learns to classify the retinal images into the corresponding diabetic retinopathy classes. Experimental results on the test set demonstrate the efficacy of the proposed ensemble model for DR classification achieving an accuracy of 96.4%.
    摘要 Translated into Simplified Chinese:这篇文章旨在使用ensemble方法将糖尿病肠病(DR)分为五个不同的类别,使用两个流行的预训练 convolutional neural networks:VGG16和Inception V3。提议的模型旨在利用这两个个体网络的优势,以提高糖尿病肠病的分类性能。模型的架构包括冻结一部分的层数在每个预训练模型中,以利用它们已经学习的表示。全局平均 pooling层被添加,以将输出特征图 transformed into fixed-length vectors。这些 векторы被 concatenated 以形成输入图像的总合表示。模型通过使用 APTOS 数据集(训练和验证集)进行训练,在测试集上达到了96.4%的准确率。

SurveyLM: A platform to explore emerging value perspectives in augmented language models’ behaviors

  • paper_url: http://arxiv.org/abs/2308.00521
  • repo_url: None
  • paper_authors: Steve J. Bickley, Ho Fai Chan, Bang Dao, Benno Torgler, Son Tran
  • for: 这白皮assailed our work on SurveyLM,一个用于分析人工智能语言模型(ALM)在复杂社会场景中的自适应对行为的平台。
  • methods: 我们使用了survey和实验方法, traditionally used in studying social behaviors,来系统地评估ALMs,从而提供了尚未有的对ALMs的Alignment和emergent behaviors的深入理解。
  • results: 通过SurveyLM平台,我们发现了一些因素影响ALMs的emergent behaviors,并可以通过调整survey和实验设计来推动ALMs的Alignment with human intentions and expectations。这些结果有助于负责任地开发和部署高级社会AI系统。
    Abstract This white paper presents our work on SurveyLM, a platform for analyzing augmented language models' (ALMs) emergent alignment behaviors through their dynamically evolving attitude and value perspectives in complex social contexts. Social Artificial Intelligence (AI) systems, like ALMs, often function within nuanced social scenarios where there is no singular correct response, or where an answer is heavily dependent on contextual factors, thus necessitating an in-depth understanding of their alignment dynamics. To address this, we apply survey and experimental methodologies, traditionally used in studying social behaviors, to evaluate ALMs systematically, thus providing unprecedented insights into their alignment and emergent behaviors. Moreover, the SurveyLM platform leverages the ALMs' own feedback to enhance survey and experiment designs, exploiting an underutilized aspect of ALMs, which accelerates the development and testing of high-quality survey frameworks while conserving resources. Through SurveyLM, we aim to shed light on factors influencing ALMs' emergent behaviors, facilitate their alignment with human intentions and expectations, and thereby contributed to the responsible development and deployment of advanced social AI systems. This white paper underscores the platform's potential to deliver robust results, highlighting its significance to alignment research and its implications for future social AI systems.
    摘要

Explainable Graph Spectral Clustering of Text Documents

  • paper_url: http://arxiv.org/abs/2308.00504
  • repo_url: None
  • paper_authors: Bartłomiej Starosta, Mieczysław A. Kłopotek, Sławomir T. Wierzchoń
  • for: 本研究旨在提供spectral clustering结果的解释方法,以便用户更好地理解和理解文档 clustering结果。
  • methods: 本文提出了一种基于combinatorial Laplacian的图spectral clustering解释方法,包括approximate equivalence of combinatorial Laplacian embedding, $K$-embedding和term vector space embedding。
  • results: 经过实验研究,$K$-embedding可以准确地 aproximate Laplacian embedding,并且在某些情况下,approximation是够好的。
    Abstract Spectral clustering methods are known for their ability to represent clusters of diverse shapes, densities etc. However, results of such algorithms, when applied e.g. to text documents, are hard to explain to the user, especially due to embedding in the spectral space which has no obvious relation to document contents. Therefore there is an urgent need to elaborate methods for explaining the outcome of the clustering. This paper presents a contribution towards this goal. We present a proposal of explanation of results of combinatorial Laplacian based graph spectral clustering. It is based on showing (approximate) equivalence of combinatorial Laplacian embedding, $K$-embedding (proposed in this paper) and term vector space embedding. Hence a bridge is constructed between the textual contents and the clustering results. We provide theoretical background for this approach. We performed experimental study showing that $K$-embedding approximates well Laplacian embedding under favourable block matrix conditions and show that approximation is good enough under other conditions.
    摘要 spectral clustering 方法 known for its ability to represent clusters of diverse shapes, densities, etc. However, the results of such algorithms, when applied to text documents, are hard to explain to the user, especially due to the embedding in the spectral space, which has no obvious relation to the document contents. Therefore, there is an urgent need to elaborate methods for explaining the outcome of the clustering. This paper presents a contribution towards this goal. We present a proposal of explanation of the results of combinatorial Laplacian-based graph spectral clustering. It is based on showing (approximate) equivalence of combinatorial Laplacian embedding, $K$-embedding (proposed in this paper), and term vector space embedding. Therefore, a bridge is constructed between the textual contents and the clustering results. We provide theoretical background for this approach. We performed experimental studies showing that $K$-embedding approximates well Laplacian embedding under favourable block matrix conditions, and show that the approximation is good enough under other conditions.Note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you prefer Traditional Chinese, I can provide that as well.

Retrieval Augmented Generation and Representative Vector Summarization for large unstructured textual data in Medical Education

  • paper_url: http://arxiv.org/abs/2308.00479
  • repo_url: https://github.com/ssm123ssm/docgpt-pharm
  • paper_authors: S. S. Manathunga, Y. A. Illangasekara
  • for: 这篇论文针对对医疗教育领域中大型自然语言模型的应用进行探讨,旨在降低对特定任务的推理错误和生成危险答案。
  • methods: 论文提出了一种叫做Retrieval Augmented Generation(RAG)的方法,可以轻松地将非 Parametric 知识库附加到大型自然语言模型中,并且可以对这些模型进行摘要和抽象Summary的生成。
  • results: 论文发现RAG方法可以帮助大型自然语言模型在医疗教育领域中提供更好的答案,并且可以对模型的推理错误和生成危险答案进行降低。
    Abstract Large Language Models are increasingly being used for various tasks including content generation and as chatbots. Despite their impressive performances in general tasks, LLMs need to be aligned when applying for domain specific tasks to mitigate the problems of hallucination and producing harmful answers. Retrieval Augmented Generation (RAG) allows to easily attach and manipulate a non-parametric knowledgebases to LLMs. Applications of RAG in the field of medical education are discussed in this paper. A combined extractive and abstractive summarization method for large unstructured textual data using representative vectors is proposed.
    摘要

A Satellite Imagery Dataset for Long-Term Sustainable Development in United States Cities

  • paper_url: http://arxiv.org/abs/2308.00465
  • repo_url: https://github.com/axin1301/satellite-imagery-dataset
  • paper_authors: Yanxin Xi, Yu Liu, Tong Li, Jintao Ding, Yunke Zhang, Sasu Tarkoma, Yong Li, Pan Hui
  • for: 这份研究是为了支持美国城市的可持续开发目标(SDGs)研究,尤其是使用卫星影像来研究城市可持续发展。
  • methods: 研究使用了深度学习模型,收集了卫星影像和其他数据,包括人口、夜间照明、调查和城市建筑数据,以描述城市的可持续开发指标。
  • results: 研究创建了一个覆盖100个最大城市和相应的人口普查区域的卫星影像数据集,可以帮助城市规划师和研究人员进一步推进SDGs相关的研究,特别是使用卫星影像来监控城市长期和多个构度的可持续开发。
    Abstract Cities play an important role in achieving sustainable development goals (SDGs) to promote economic growth and meet social needs. Especially satellite imagery is a potential data source for studying sustainable urban development. However, a comprehensive dataset in the United States (U.S.) covering multiple cities, multiple years, multiple scales, and multiple indicators for SDG monitoring is lacking. To support the research on SDGs in U.S. cities, we develop a satellite imagery dataset using deep learning models for five SDGs containing 25 sustainable development indicators. The proposed dataset covers the 100 most populated U.S. cities and corresponding Census Block Groups from 2014 to 2023. Specifically, we collect satellite imagery and identify objects with state-of-the-art object detection and semantic segmentation models to observe cities' bird's-eye view. We further gather population, nighttime light, survey, and built environment data to depict SDGs regarding poverty, health, education, inequality, and living environment. We anticipate the dataset to help urban policymakers and researchers to advance SDGs-related studies, especially applying satellite imagery to monitor long-term and multi-scale SDGs in cities.
    摘要 Translated into Simplified Chinese:城市发挥重要作用于实现可持续发展目标(SDGs),推动经济增长并满足社会需求。尤其是卫星成像是可能的数据源,用于研究可持续城市发展。然而,美国(U.S.)覆盖多个城市、多年、多级、多指标的全面数据集缺乏。为支持美国城市的SDGs研究,我们开发了使用深度学习模型的卫星成像数据集,包括5个SDGs和25个可持续发展指标。该数据集覆盖了美国100个最大人口城市以及相应的人口普查小区,从2014年到2023年。我们通过使用当前的物体检测和semantic segmentation模型,从卫星成像中识别城市的 Bird's-eye view。此外,我们还收集了人口、夜光亮、调查和建筑环境数据,以描绘SDGs关于贫困、健康、教育、不平等和生活环境等方面。我们预计该数据集将帮助城市规划者和研究人员,通过使用卫星成像,对多个城市进行长期和多级SDGs监测。

DMFC-GraspNet: Differentiable Multi-Fingered Robotic Grasp Generation in Cluttered Scenes

  • paper_url: http://arxiv.org/abs/2308.00456
  • repo_url: None
  • paper_authors: Philipp Blättner, Johannes Brand, Gerhard Neumann, Ngo Anh Vien
  • for: 提高多指机器人抓取技能的计算效率和多样性
  • methods: 提出了一种差分 grasp generation 网络(DMFC-GraspNet),包括三大贡献:一种新的神经网络抓取规划算法、一种场景创建和标签映射方法,以及一种综合损失函数和 generalized Q 1 抓取评价指标来训练 DMFC-GraspNet。
  • results: 对于使用 Shadow Dexterous Hand 在 MuJoCo simulator 进行测试,提出的方法能够提高多指机器人抓取技能的计算效率和多样性,并在多指机器人抓取领域取得显著进步。
    Abstract Robotic grasping is a fundamental skill required for object manipulation in robotics. Multi-fingered robotic hands, which mimic the structure of the human hand, can potentially perform complex object manipulation. Nevertheless, current techniques for multi-fingered robotic grasping frequently predict only a single grasp for each inference time, limiting computational efficiency and their versatility, i.e. unimodal grasp distribution. This paper proposes a differentiable multi-fingered grasp generation network (DMFC-GraspNet) with three main contributions to address this challenge. Firstly, a novel neural grasp planner is proposed, which predicts a new grasp representation to enable versatile and dense grasp predictions. Secondly, a scene creation and label mapping method is developed for dense labeling of multi-fingered robotic hands, which allows a dense association of ground truth grasps. Thirdly, we propose to train DMFC-GraspNet end-to-end using using a forward-backward automatic differentiation approach with both a supervised loss and a differentiable collision loss and a generalized Q 1 grasp metric loss. The proposed approach is evaluated using the Shadow Dexterous Hand on Mujoco simulation and ablated by different choices of loss functions. The results demonstrate the effectiveness of the proposed approach in predicting versatile and dense grasps, and in advancing the field of multi-fingered robotic grasping.
    摘要 瑞博机器人抓取是机器人控制领域的基本技能之一,可以帮助机器人抓取和操作物体。多指机器人手臂,它们模仿人类手臂的结构,可以执行复杂的物体抓取。然而,当前的多指机器人抓取技术 frequently predicts only a single grasp for each inference time,这限制了计算效率和其多样性,即单模态抓取分布。这篇论文提出了一种可微分的多指机器人抓取生成网络(DMFC-GraspNet),具有以下三个贡献:首先,一种新的神经网络抓取规划器被提出,可以预测多种抓取方式,以提高抓取多样性和密度。第二,一种场景创建和标签映射方法被开发出来,用于密集标注多指机器人手臂。这allow us to associate dense ground truth grasps with the robotic hands.第三,我们提议使用一种综合损失函数和自动梯度推导法来训练DMFC-GraspNet,包括监督损失和不可 differentiable损失。我们还使用一种通用Q1抓取度量loss。我们在Mujoco simulate中使用瑞博dex手臂进行训练和磨练,并对不同损失函数进行ablation。结果表明,我们的方法可以预测多样和密集的抓取方式,并在多指机器人抓取领域提高了前iers。

MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers

  • paper_url: http://arxiv.org/abs/2308.03741
  • repo_url: None
  • paper_authors: Muhammad Bilal Shaikh, Douglas Chai, Syed Mohammed Shamsul Islam, Naveed Akhtar
  • for: 提高多模态人体动作识别(MHAR)的效果
  • methods: 利用音频模式和图像模式的结合,通过将音频模式转化到图像模式中,形成一个统一的表示。
  • results: 与现有的状态 искусственный智能策略相比,MAiVAR-T表现出色,实验结果证明了模型在人体动作识别任务中的优异表现。
    Abstract In line with the human capacity to perceive the world by simultaneously processing and integrating high-dimensional inputs from multiple modalities like vision and audio, we propose a novel model, MAiVAR-T (Multimodal Audio-Image to Video Action Recognition Transformer). This model employs an intuitive approach for the combination of audio-image and video modalities, with a primary aim to escalate the effectiveness of multimodal human action recognition (MHAR). At the core of MAiVAR-T lies the significance of distilling substantial representations from the audio modality and transmuting these into the image domain. Subsequently, this audio-image depiction is fused with the video modality to formulate a unified representation. This concerted approach strives to exploit the contextual richness inherent in both audio and video modalities, thereby promoting action recognition. In contrast to existing state-of-the-art strategies that focus solely on audio or video modalities, MAiVAR-T demonstrates superior performance. Our extensive empirical evaluations conducted on a benchmark action recognition dataset corroborate the model's remarkable performance. This underscores the potential enhancements derived from integrating audio and video modalities for action recognition purposes.
    摘要 根据人类能同时处理和 integrate 多个模式的高维输入,我们提出一种新的模型,MAiVAR-T(多模态音频图像到视频动作识别变换器)。这个模型采用一种直观的方法将 audio-image 和 video 模式结合,以提高多模态人体动作识别(MHAR)的效果。MAiVAR-T 的核心在于将 audio 模式中的重要表示转化到图像频谱中,然后将这些图像表示与 video 模式结合,形成一个统一的表示。这种结合方法利用了 audio 和 video 模式中的内在背景,提高动作识别的性能。与现有的state-of-the-art策略不同,MAiVAR-T 不仅仅关注 audio 或 video 模式,而是通过结合这两种模式来提高动作识别的效果。我们的广泛的实验证明,MAiVAR-T 在一个标准的动作识别数据集上表现出优秀的成绩,这证明了将 audio 和 video 模式结合起来可以提高动作识别的性能。

Structural Embeddings of Tools for Large Language Models

  • paper_url: http://arxiv.org/abs/2308.00447
  • repo_url: None
  • paper_authors: Eren Unlu
  • for: 本研究的中心目标是强调在未来,大语言模型(LLM)与外部工具之间的图形基本方法的重要性。
  • methods: 该研究提出了一种例子框架,用于导航 LLM 与 exponentially 增长的外部工具之间的互动。该框架使用图形编码对象ives和功能性,以便在不同任务下进行可控的组合。
  • results: 该研究认为,图形基本方法可以为 LLM 在不同任务下的扩展和应用带来新的可能性,包括文本段的链式思维(CoT)等。
    Abstract It is evident that the current state of Large Language Models (LLMs) necessitates the incorporation of external tools. The lack of straightforward algebraic and logical reasoning is well documented and prompted researchers to develop frameworks which allow LLMs to operate via external tools. The ontological nature of tool utilization for a specific task can be well formulated with a Directed Acyclic Graph (DAG). The central aim of the paper is to highlight the importance of graph based approaches to LLM-tool interaction in near future. We propose an exemplary framework to guide the orchestration of exponentially increasing numbers of external tools with LLMs,where objectives and functionalities of tools are graph encoded hierarchically. Assuming that textual segments of a Chain-of-Thought (CoT) can be imagined as a tool as defined here, the graph based framework can pave new avenues in that particular direction as well.
    摘要 现在的大语言模型(LLM)需要外部工具的整合。lack of 直觉的代数逻辑已经被文献所证明,促使研究人员发展出用外部工具进行 LLM 操作的框架。 ontological 性的工具使用方式可以通过指向不同的导航图(DAG)来形式化。本文的主要目的是强调在未来中graph基本方法将在 LLM 与外部工具之间扮演重要的角色。我们提出了一个示范性的框架,以引导 exponentially 增加的外部工具与 LLM 之间的协调,其中工具的目标和功能将在层次结构中被图解编码。假设文本段落可以被想象为一个链接思维(CoT)中的工具,那么图基的框架将可以开启新的可能性。

ALE: A Simulation-Based Active Learning Evaluation Framework for the Parameter-Driven Comparison of Query Strategies for NLP

  • paper_url: http://arxiv.org/abs/2308.02537
  • repo_url: https://github.com/philipp-kohl/active-learning-evaluation-framework
  • paper_authors: Philipp Kohl, Nils Freyer, Yoka Krämer, Henri Werth, Steffen Wolf, Bodo Kraft, Matthias Meinecke, Albert Zündorf
    for: This paper aims to provide an empirical basis for choosing between different active learning (AL) strategies in natural language processing (NLP) tasks.methods: The paper introduces a reproducible active learning evaluation (ALE) framework for comparing AL strategies in NLP. The framework allows for the implementation of AL strategies with low effort and a fair data-driven comparison, and it tracks experiment parameters such as initial dataset size, number of data points per query step, and budget.results: The paper presents a case study to illustrate how to use the ALE framework, and it provides a basis for practitioners to make more informed decisions and for researchers to focus on developing new, effective AL strategies and deriving best practices for specific use cases.
    Abstract Supervised machine learning and deep learning require a large amount of labeled data, which data scientists obtain in a manual, and time-consuming annotation process. To mitigate this challenge, Active Learning (AL) proposes promising data points to annotators they annotate next instead of a subsequent or random sample. This method is supposed to save annotation effort while maintaining model performance. However, practitioners face many AL strategies for different tasks and need an empirical basis to choose between them. Surveys categorize AL strategies into taxonomies without performance indications. Presentations of novel AL strategies compare the performance to a small subset of strategies. Our contribution addresses the empirical basis by introducing a reproducible active learning evaluation (ALE) framework for the comparative evaluation of AL strategies in NLP. The framework allows the implementation of AL strategies with low effort and a fair data-driven comparison through defining and tracking experiment parameters (e.g., initial dataset size, number of data points per query step, and the budget). ALE helps practitioners to make more informed decisions, and researchers can focus on developing new, effective AL strategies and deriving best practices for specific use cases. With best practices, practitioners can lower their annotation costs. We present a case study to illustrate how to use the framework.
    摘要 超vised机器学习和深度学习需要大量标注数据,数据科学家通过手动、时间consuming的标注过程获取。为了解决这个挑战,活动学习(AL)提出了有前提的数据点,而不是随机或后续的样本。这种方法可以降低标注努力的时间和成本,同时保持模型性能。然而,实践者面临着许多AL策略,需要一个经验基础来选择。现有的survey categorizes AL策略,但没有表现指标。文章提出了一个可重复的活动学习评价(ALE)框架,用于比较AL策略的相对评价。该框架可以实现AL策略的实现,并且通过定义和跟踪实验参数(例如,初始数据集大小,每次查询步骤中的数据点数和预算)来进行公平的比较。ALE帮助实践者更加了解决,研究者可以更专注于开发新的、有效的AL策略和特定用例的最佳实践。通过最佳实践,实践者可以降低标注成本。我们在case study中示例如如何使用该框架。

SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning

  • paper_url: http://arxiv.org/abs/2308.00436
  • repo_url: https://github.com/ningmiao/selfcheck
  • paper_authors: Ning Miao, Yee Whye Teh, Tom Rainforth
  • for: 本研究旨在检验大语言模型(LLM)是否可以自动认错,而不需要外部资源。
  • methods: 我们提出了一种零 shot 验证方案,用于识别具有多步骤 reasoning 的错误。然后,我们使用这种验证方案来提高问答性能,通过对不同生成的答案进行权重投票。
  • results: 我们在三个数学 dataset(GSM8K、MathQA 和 MATH)上测试了这种方法,发现它可以成功识别错误,并在最终预测性能中提高表现。
    Abstract The recent progress in large language models (LLMs), especially the invention of chain-of-thoughts (CoT) prompting, makes it possible to solve reasoning problems. However, even the strongest LLMs are still struggling with more complicated problems that require non-linear thinking and multi-step reasoning. In this work, we explore whether LLMs have the ability to recognize their own errors, without resorting to external resources. In particular, we investigate whether they can be used to identify individual errors within a step-by-step reasoning. To this end, we propose a zero-shot verification scheme to recognize such errors. We then use this verification scheme to improve question-answering performance, by using it to perform weighted voting on different generated answers. We test the method on three math datasets-GSM8K, MathQA, and MATH-and find that it successfully recognizes errors and, in turn, increases final predictive performance.
    摘要 最近的大语言模型(LLM)的进步,尤其是创造思维(CoT)提问的发明,使得解释问题变得可能。然而,即使最强的LLM也在更复杂的问题上尚未能具备非线性思维和多步逻辑。在这种情况下,我们询问LLM是否有能力自动发现错误,不需要外部资源。具体来说,我们研究LLM是否可以识别每个步逻辑中的错误。为此,我们提出了零shot验证方案,用于识别错误。然后,我们使用这种验证方案来提高问答性能,通过对不同生成的答案进行权重投票。我们在三个数学 dataset(GSM8K、MathQA和MATH)上测试了该方法,并发现它可以成功识别错误,并在最终预测性能中提高表现。

Patch-wise Auto-Encoder for Visual Anomaly Detection

  • paper_url: http://arxiv.org/abs/2308.00429
  • repo_url: None
  • paper_authors: Yajie Cui, Zhaoxiang Liu, Shiguo Lian
  • for: 提高无supervision anomaly detection的能力
  • methods: 使用patch-wise auto-encoder(Patch AE)框架,通过对每个图像 patch 的重建,提高模型对异常图像的重建能力
  • results: 在Mvtec AD benchmark上达到了新的州 Of-the-art性能,表明方法的效果。有很大的实际应用前景。
    Abstract Anomaly detection without priors of the anomalies is challenging. In the field of unsupervised anomaly detection, traditional auto-encoder (AE) tends to fail based on the assumption that by training only on normal images, the model will not be able to reconstruct abnormal images correctly. On the contrary, we propose a novel patch-wise auto-encoder (Patch AE) framework, which aims at enhancing the reconstruction ability of AE to anomalies instead of weakening it. Each patch of image is reconstructed by corresponding spatially distributed feature vector of the learned feature representation, i.e., patch-wise reconstruction, which ensures anomaly-sensitivity of AE. Our method is simple and efficient. It advances the state-of-the-art performances on Mvtec AD benchmark, which proves the effectiveness of our model. It shows great potential in practical industrial application scenarios.
    摘要 寻找无先知 anomaly 是一项挑战。在无监督 anomaly detection 领域,传统的 auto-encoder (AE) 往往会因为只在正常图像上训练,导致模型无法正确地重建异常图像。相反,我们提出了一种 novel patch-wise auto-encoder (Patch AE) 框架,旨在增强 AE 对异常图像的重建能力,而不是弱化它。每个图像的每个patch 都由对应的空间分布的特征向量来重建, guarantees 异常敏感性。我们的方法简单、高效,可以提高 state-of-the-art 性能,证明了我们的模型的效果。它在实际工业应用场景中表现出了很大的潜力。

Generative adversarial networks with physical sound field priors

  • paper_url: http://arxiv.org/abs/2308.00426
  • repo_url: https://github.com/xefonon/soundfieldgan
  • paper_authors: Xenofon Karakonstantis, Efren Fernandez-Grande
  • for: 这种方法用于重构听场,使用生成敌对网络(GANs)。
  • methods: 该方法使用平面波基础,学习室内压力的下来分布,以准确地重构听场从有限多个测量点。
  • results: 研究表明,该模型可以在高频范围内提高重构精度和能量保持率,特别是在测量区域之外推算。此外,该方法可以适应不同测量点和配置的变化,无需牺牲性能。
    Abstract This paper presents a deep learning-based approach for the spatio-temporal reconstruction of sound fields using Generative Adversarial Networks (GANs). The method utilises a plane wave basis and learns the underlying statistical distributions of pressure in rooms to accurately reconstruct sound fields from a limited number of measurements. The performance of the method is evaluated using two established datasets and compared to state-of-the-art methods. The results show that the model is able to achieve an improved reconstruction performance in terms of accuracy and energy retention, particularly in the high-frequency range and when extrapolating beyond the measurement region. Furthermore, the proposed method can handle a varying number of measurement positions and configurations without sacrificing performance. The results suggest that this approach provides a promising approach to sound field reconstruction using generative models that allow for a physically informed prior to acoustics problems.
    摘要

Discourse-Aware Text Simplification: From Complex Sentences to Linked Propositions

  • paper_url: http://arxiv.org/abs/2308.00425
  • repo_url: None
  • paper_authors: Christina Niklaus, Matthias Cetto, André Freitas, Siegfried Handschuh
  • for: 这个论文的目的是提高自然语言处理应用程序的预测质量,通过修改句子的结构和长度,使其更容易分析。
  • methods: 这个论文使用了一种基于语言知识的文本简化方法,包括句子拆分和重新排序,以提高句子的简单性和可读性。
  • results: 这个论文的实验结果表明,这种基于语言知识的文本简化方法可以有效地提高自然语言处理应用程序的预测质量,并且可以保持句子的意义完整性。
    Abstract Sentences that present a complex syntax act as a major stumbling block for downstream Natural Language Processing applications whose predictive quality deteriorates with sentence length and complexity. The task of Text Simplification (TS) may remedy this situation. It aims to modify sentences in order to make them easier to process, using a set of rewriting operations, such as reordering, deletion, or splitting. State-of-the-art syntactic TS approaches suffer from two major drawbacks: first, they follow a very conservative approach in that they tend to retain the input rather than transforming it, and second, they ignore the cohesive nature of texts, where context spread across clauses or sentences is needed to infer the true meaning of a statement. To address these problems, we present a discourse-aware TS approach that splits and rephrases complex English sentences within the semantic context in which they occur. Based on a linguistically grounded transformation stage that uses clausal and phrasal disembedding mechanisms, complex sentences are transformed into shorter utterances with a simple canonical structure that can be easily analyzed by downstream applications. With sentence splitting, we thus address a TS task that has hardly been explored so far. Moreover, we introduce the notion of minimality in this context, as we aim to decompose source sentences into a set of self-contained minimal semantic units. To avoid breaking down the input into a disjointed sequence of statements that is difficult to interpret because important contextual information is missing, we incorporate the semantic context between the split propositions in the form of hierarchical structures and semantic relationships. In that way, we generate a semantic hierarchy of minimal propositions that leads to a novel representation of complex assertions that puts a semantic layer on top of the simplified sentences.
    摘要 сложные предложения acted as a major stumbling block for downstream Natural Language Processing applications, whose predictive quality decreased with sentence length and complexity. The task of Text Simplification (TS) may remedy this situation. It aims to modify sentences to make them easier to process, using a set of rewriting operations such as reordering, deletion, or splitting. However, state-of-the-art syntactic TS approaches have two major drawbacks: they tend to retain the input rather than transforming it, and they ignore the cohesive nature of texts, where context spread across clauses or sentences is needed to infer the true meaning of a statement. To address these problems, we present a discourse-aware TS approach that splits and rephrases complex English sentences within the semantic context in which they occur. Based on a linguistically grounded transformation stage that uses clausal and phrasal disembedding mechanisms, complex sentences are transformed into shorter utterances with a simple canonical structure that can be easily analyzed by downstream applications. With sentence splitting, we thus address a TS task that has hardly been explored so far. Moreover, we introduce the notion of minimality in this context, as we aim to decompose source sentences into a set of self-contained minimal semantic units. To avoid breaking down the input into a disjointed sequence of statements that is difficult to interpret because important contextual information is missing, we incorporate the semantic context between the split propositions in the form of hierarchical structures and semantic relationships. In this way, we generate a semantic hierarchy of minimal propositions that leads to a novel representation of complex assertions that puts a semantic layer on top of the simplified sentences.

Exploring the Role of Explainability in AI-Assisted Embryo Selection

  • paper_url: http://arxiv.org/abs/2308.02534
  • repo_url: None
  • paper_authors: Lucia Urcelay, Daniel Hinjos, Pablo A. Martin-Torres, Marta Gonzalez, Marta Mendez, Salva Cívico, Sergio Álvarez-Napagao, Dario Garcia-Gasulla
  • for: 这篇研究旨在探讨人工受精(In Vitro Fertilization)中选择和评估胚胎的方法,以及如何将人工智能(AI)技术应用于胚胎分析中,以提高评估过程的精度和可靠性。
  • methods: 本研究使用了深度学习技术,并分析了现有的AI-助け胚胎分析模型的解释性。
  • results: 研究发现现有的AI-助け胚胎分析模型的解释性有限,并提出了将这些模型整合到临床实践中的建议,以满足诊断师和病人的需求。
    Abstract In Vitro Fertilization is among the most widespread treatments for infertility. One of its main challenges is the evaluation and selection of embryo for implantation, a process with large inter- and intra-clinician variability. Deep learning based methods are gaining attention, but their opaque nature compromises their acceptance in the clinical context, where transparency in the decision making is key. In this paper we analyze the current work in the explainability of AI-assisted embryo analysis models, identifying the limitations. We also discuss how these models could be integrated in the clinical context as decision support systems, considering the needs of clinicians and patients. Finally, we propose guidelines for the sake of increasing interpretability and trustworthiness, pushing this technology forward towards established clinical practice.
    摘要 幂化诊断是妊娠不孕的一种最广泛的治疗方法。其中一个主要挑战是评估和选择受试 embryo 进行嵌入,这个过程具有大量的内部和外部医生差异性。深度学习基于的方法在引起了关注,但它们的透明性问题限制了它们在临床上的接受度。本文分析了现有的 AI-assisted embryo 分析模型解释性的工作,并识别了其局限性。我们还讨论了如何将这些模型integrated into the clinical context as decision support systems, considering the needs of clinicians and patients。最后,我们提出了增加解释性和可信度的指南,推动这种技术向确定的临床实践前进。Note: Please note that the translation is in Simplified Chinese, which is the standard writing system used in mainland China. If you prefer Traditional Chinese, please let me know and I can provide the translation in that format as well.

BiERL: A Meta Evolutionary Reinforcement Learning Framework via Bilevel Optimization

  • paper_url: http://arxiv.org/abs/2308.01207
  • repo_url: https://github.com/chriswang98sz/bierl
  • paper_authors: Junyi Wang, Yuanyang Zhu, Zhi Wang, Yan Zheng, Jianye Hao, Chunlin Chen
  • for: 提高复杂RL问题的解决能力,适应不同RL算法的应用
  • methods: 提出了一种总体meta-RL框架,通过级联优化来同时更新内部RL模型和meta参数,不需要先知道预料或优化过程
  • results: 通过在MuJoCo和Box2D任务中进行了广泛的实验,证明了BiERL在多种ERL算法中表现出色,可以持续提高RL模型的学习效率
    Abstract Evolutionary reinforcement learning (ERL) algorithms recently raise attention in tackling complex reinforcement learning (RL) problems due to high parallelism, while they are prone to insufficient exploration or model collapse without carefully tuning hyperparameters (aka meta-parameters). In the paper, we propose a general meta ERL framework via bilevel optimization (BiERL) to jointly update hyperparameters in parallel to training the ERL model within a single agent, which relieves the need for prior domain knowledge or costly optimization procedure before model deployment. We design an elegant meta-level architecture that embeds the inner-level's evolving experience into an informative population representation and introduce a simple and feasible evaluation of the meta-level fitness function to facilitate learning efficiency. We perform extensive experiments in MuJoCo and Box2D tasks to verify that as a general framework, BiERL outperforms various baselines and consistently improves the learning performance for a diversity of ERL algorithms.
    摘要 生化演进学习(ERL)算法在复杂的演进学习(RL)问题中受到关注,因为它们具有高并行性。然而,它们可能会因为不足的探索或模型崩溃而需要精心调整超参数(meta-parameters)。在这篇论文中,我们提出了一种通用的meta-ERL框架,通过级联优化(BiERL)来同时更新超参数和ERL模型,从而避免需要先进行模型部署前的优化或培ippi���hn Domain知识。我们设计了一种美化的meta- уров层建 architecture,将inner-level的演进经验嵌入到一个有用的人口表示中,并引入一种简单可行的meta- уров度评价函数,以便提高学习效率。我们在MuJoCo和Box2D任务中进行了广泛的实验,并证明了BiERL框架在多种ERL算法中具有优秀的总体性能。

Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis

  • paper_url: http://arxiv.org/abs/2308.00404
  • repo_url: https://github.com/sisinflab/graph-rss-reproducibility
  • paper_authors: Vito Walter Anelli, Daniele Malitesta, Claudio Pomo, Alejandro Bellogín, Tommaso Di Noia, Eugenio Di Sciascio
  • for: 本研究旨在提高图解析器(Graph Neural Network,GNN)在推荐系统中的可重现性,以便更好地理解不同图解析器在具体的配置下的表现。
  • methods: 本研究使用了六种流行的图解析器(NGCF、DGCF、LightGCN、SGL、UltraGCN、GFCF),在三个常用的数据集(Gowalla、Yelp 2018、Amazon Book)上进行了实验。此外,研究者还与传统的共同推荐模型进行了比较,以评估图解析器在不同数据集上的表现。
  • results: 研究发现,在三个常用的数据集上,图解析器的表现有所不同,而且与传统的共同推荐模型相比,图解析器在一些数据集上表现较差。此外,研究者还发现了数据集的特点对于推荐准确性的影响。
    Abstract The success of graph neural network-based models (GNNs) has significantly advanced recommender systems by effectively modeling users and items as a bipartite, undirected graph. However, many original graph-based works often adopt results from baseline papers without verifying their validity for the specific configuration under analysis. Our work addresses this issue by focusing on the replicability of results. We present a code that successfully replicates results from six popular and recent graph recommendation models (NGCF, DGCF, LightGCN, SGL, UltraGCN, and GFCF) on three common benchmark datasets (Gowalla, Yelp 2018, and Amazon Book). Additionally, we compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations. Furthermore, we extend our study to two new datasets (Allrecipes and BookCrossing) that lack established setups in existing literature. As the performance on these datasets differs from the previous benchmarks, we analyze the impact of specific dataset characteristics on recommendation accuracy. By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure. The code to reproduce our experiments is available at: https://github.com/sisinflab/Graph-RSs-Reproducibility.
    摘要 GRAPH Neural Network-based models (GNNs) 的成功有效地提高了推荐系统,通过模型用户和物品为两个bidirectional, undirected 图。然而,许多原始图基 works often adopt 基线纸上的结果 без 验证它们是否适用于特定配置。我们的工作解决这个问题,我们专注于复制性。我们提供了一个代码,可以成功复制六种流行的最近的图推荐模型(NGCF、DGCF、LightGCN、SGL、UltraGCN 和 GFCF)的结果在三个常见的benchmark datasets(Gowalla、Yelp 2018 和 Amazon Book)。此外,我们与传统的合作 filtering 模型进行比较,这些模型在过去的线上评估中表现良好。此外,我们将研究 extends to two new datasets(Allrecipes 和 BookCrossing),这些 dataset 在现有 литературе中没有 Established setup。由于这些dataset的性能与之前的benchmark datasets不同,我们分析数据集特点对于推荐准确性的影响。我们通过研究用户邻居的信息流来Identify which models are influenced by intrinsic features in the dataset structure。我们的代码可以在:https://github.com/sisinflab/Graph-RSs-Reproducibility中复制。

Counterfactual Graph Transformer for Traffic Flow Prediction

  • paper_url: http://arxiv.org/abs/2308.00391
  • repo_url: None
  • paper_authors: Ying Yang, Kai Du, Xingyuan Dai, Jianwu Fang
  • for: 提高流量预测的可解释性和可靠性
  • methods: 提出了Counterfactual Graph Transformer(CGT)模型,并实现了对输入感知特征和图结构的干扰掩码生成器,以获取空间和时间 counterfactual 解释
  • results: 对三个实际世界公共数据集进行了广泛的实验,并证明了 CGT 可以生成可靠的解释和提高流量预测的可靠性。
    Abstract Traffic flow prediction (TFP) is a fundamental problem of the Intelligent Transportation System (ITS), as it models the latent spatial-temporal dependency of traffic flow for potential congestion prediction. Recent graph-based models with multiple kinds of attention mechanisms have achieved promising performance. However, existing methods for traffic flow prediction tend to inherit the bias pattern from the dataset and lack interpretability. To this end, we propose a Counterfactual Graph Transformer (CGT) model with an instance-level explainer (e.g., finding the important subgraphs) specifically designed for TFP. We design a perturbation mask generator over input sensor features at the time dimension and the graph structure on the graph transformer module to obtain spatial and temporal counterfactual explanations. By searching the optimal perturbation masks on the input data feature and graph structures, we can obtain the concise and dominant data or graph edge links for the subsequent TFP task. After re-training the utilized graph transformer model after counterfactual perturbation, we can obtain improved and interpretable traffic flow prediction. Extensive results on three real-world public datasets show that CGT can produce reliable explanations and is promising for traffic flow prediction.
    摘要 traffic 流量预测(TFP)是智能交通系统(ITS)的基本问题,它模型了交通流量的隐藏空间时间相互关系,以预测潮湍。 current 图形基于模型已经实现了显著的表现。 however, exiting 方法 для traffic 流量预测通常会继承数据集的偏见模式和lack of interpretability。 To address this, we propose a Counterfactual Graph Transformer(CGT)模型,具有实例级别的解释器(例如,找到重要的子图),specifically designed for TFP. We design a 扰动mask生成器 over input sensor features at the time dimension and the graph structure on the graph transformer module to obtain spatial and temporal counterfactual explanations. By searching the optimal perturbation masks on the input data feature and graph structures, we can obtain the concise and dominant data or graph edge links for the subsequent TFP task. After re-training the utilized graph transformer model after counterfactual perturbation, we can obtain improved and interpretable traffic flow prediction. Extensive results on three real-world public datasets show that CGT can produce reliable explanations and is promising for traffic flow prediction.Note: Some words and phrases have been modified to conform to Simplified Chinese grammar and vocabulary.

Artificial-Intelligence-Based Triple Phase Shift Modulation for Dual Active Bridge Converter with Minimized Current Stress

  • paper_url: http://arxiv.org/abs/2308.00382
  • repo_url: None
  • paper_authors: Xinze Li, Xin Zhang, Fanfan Lin, Changjiang Sun, Kezhi Mao
  • for: 本研究旨在提出一种基于人工智能的三相扩展(TPS)调制策略,以最小化DAB转换器的电流压力。
  • methods: 本研究使用神经网络(NN)和软件推理系统(FIS)来解决TPS调制中的三个调制变量对current stress的影响,并提出了一种基于AI的TPS调制策略。
  • results: 实验结果表明,提出的AI-TPSM策略可以减少DAB转换器的电流压力,并且可以提高TPS调制的精度和可靠性。
    Abstract The dual active bridge (DAB) converter has been popular in many applications for its outstanding power density and bidirectional power transfer capacity. Up to now, triple phase shift (TPS) can be considered as one of the most advanced modulation techniques for DAB converter. It can widen zero voltage switching range and improve power efficiency significantly. Currently, current stress of the DAB converter has been an important performance indicator when TPS modulation is applied for smaller size and higher efficiency. However, to minimize the current stress when the DAB converter is under TPS modulation, two difficulties exist in analysis process and realization process, respectively. Firstly, three degrees of modulation variables in TPS modulation bring challenges to the analysis of current stress in different operating modes. This analysis and deduction process leads to heavy computational burden and also suffers from low accuracy. Secondly, to realize TPS modulation, if a lookup table is adopted after the optimization of modulation variables, modulation performance will be unsatisfactory because of the discrete nature of lookup table. Therefore, an AI-based TPS modulation (AI-TPSM) strategy is proposed in this paper. Neural network (NN) and fuzzy inference system (FIS) are utilized to deal with the two difficulties mentioned above. With the proposed AI-TPSM, the optimization of TPS modulation for minimized current stress will enjoy high degree of automation which can relieve engineers' working burden and improve accuracy. In the end of this paper, the effectiveness of the proposed AI-TPSM has been experimentally verified with a 1 kW prototype.
    摘要 双活桥(DAB)Converter在许多应用中具有出色的电力密度和对向电力传输能力。至今为止,三相滑动(TPS)可以被视为DABConverter中最高等级的调制技术。它可以宽化零电压调制范围和提高电力效率很大。然而,当TPS调制被应用时,DABConverter的电流负载成为了重要的性能指标。但是,为了最小化DABConverter在TPS调制下的电流负载,存在两种难点:一是TPS调制中的三个调制变数带来了不同运行模式下的电流压力分析和推导过程中的重要问题,这个过程具有复杂的计算负载和低准确性。二是,为了实现TPS调制,如果使用 lookup 表,则调制性能将会不满足。因此,本文提出了一个基于人工智能(AI)的TPS调制策略(AI-TPSM)。使用神经网络(NN)和决策系统(FIS)来解决这两个问题。具有提案的AI-TPSM,将促进TPS调制的最佳化,从而实现较高的自动化程度,减轻工程师的工作负担,并提高准确性。本文的实验结果显示,提案的AI-TPSM在1kW试验产品中得到了认可。

Artificial-Intelligence-Based Hybrid Extended Phase Shift Modulation for the Dual Active Bridge Converter with Full ZVS Range and Optimal Efficiency

  • paper_url: http://arxiv.org/abs/2308.00381
  • repo_url: None
  • paper_authors: Xinze Li, Xin Zhang, Fanfan Lin, Changjiang Sun, Kezhi Mao
  • For: The paper aims to propose an artificial-intelligence-based hybrid extended phase shift (HEPS) modulation for dual active bridge (DAB) converters to achieve optimal efficiency with full zero-voltage switching (ZVS) operation over the entire operating range.* Methods: The HEPS modulation is developed using an automated fashion, which alleviates the cumbersome model building process while maintaining high model accuracy. The paper uses extreme gradient boosting (XGBoost) to build data-driven models of ZVS and efficiency performance, and particle swarm optimization with state-based adaptive velocity limit (PSO-SAVL) to select the best EPS strategy and optimize modulation parameters.* Results: The paper verifies the feasibility of HEPS with 1 kW hardware experiments, achieving optimal efficiency of up to 97.1% and full-range ZVS operation.
    Abstract Dual active bridge (DAB) converter is the key enabler in many popular applications such as wireless charging, electric vehicle and renewable energy. ZVS range and efficiency are two significant performance indicators for DAB converter. To obtain the desired ZVS and efficiency performance, modulation should be carefully designed. Hybrid modulation considers several single modulation strategies to achieve good comprehensive performance. Conventionally, to design a hybrid modulation, harmonic approach or piecewise approach is used, but they suffer from time-consuming model building process and inaccuracy. Therefore, an artificial-intelligence-based hybrid extended phase shift (HEPS) modulation is proposed. Generally, the HEPS modulation is developed in an automated fashion, which alleviates cumbersome model building process while keeping high model accuracy. In HEPS modulation, two EPS strategies are considered to realize optimal efficiency with full ZVS operation over entire operating ranges. Specifically, to build data-driven models of ZVS and efficiency performance, extreme gradient boosting (XGBoost), which is a state-of-the-art ensemble learning algorithm, is adopted. Afterwards, particle swarm optimization with state-based adaptive velocity limit (PSO-SAVL) is utilized to select the best EPS strategy and optimize modulation parameters. With 1 kW hardware experiments, the feasibility of HEPS has been verified, achieving optimal efficiency with maximum of 97.1% and full-range ZVS operation.
    摘要 双活桥(DAB)转换器是许多受欢迎应用程序中的关键启用器,如无线充电、电动车和可再生能源。ZVS范围和效率是DAB转换器的两个重要性能指标。为了实现所需的ZVS和效率性能,模拟应该仔细设计。在本文中,我们提出了人工智能基于的扩展phas shift(HEPS)模ulation,以解决传统的异步模ulation和分割模ulation的缺点。HEPS模ulation通过自动化模型建立过程来缓解繁琐的模型建立过程,同时保持高度准确。在HEPS模ulation中,两种EPS策略被考虑以实现最佳的效率和ZVS操作。具体来说,通过使用极限梯度搜索(XGBoost)算法来建立数据驱动模型,以实现ZVS和效率性能的优化。然后,使用粒子群搜索与状态基于适应限速(PSO-SAVL)算法来选择最佳EPS策略和优化模ulation参数。在1千瓦硬件实验中,我们证明了HEPS的可行性,实现了最佳效率(97.1%)和全范围ZVS操作。

Shape Completion with Prediction of Uncertain Regions

  • paper_url: http://arxiv.org/abs/2308.00377
  • repo_url: https://github.com/dlr-rm/shape-completion
  • paper_authors: Matthias Humt, Dominik Winkelbauer, Ulrich Hillenbrand
  • for: 这个研究是为了解决Shape completion问题,即从partial observation中推算出物体的完整几何构造。
  • methods: 本研究提出了两种新的方法来预测 uncertain regions,一种是通过处理occupancy scores的后处理,另一种是直接预测不确定指标。
  • results: 比较两种新方法和两种已知的方法,新方法在Shape completion和uncertain region prediction中都表现出来了较高的精度,并且可以避免预测的uncertain regions以提高grasps的质量。
    Abstract Shape completion, i.e., predicting the complete geometry of an object from a partial observation, is highly relevant for several downstream tasks, most notably robotic manipulation. When basing planning or prediction of real grasps on object shape reconstruction, an indication of severe geometric uncertainty is indispensable. In particular, there can be an irreducible uncertainty in extended regions about the presence of entire object parts when given ambiguous object views. To treat this important case, we propose two novel methods for predicting such uncertain regions as straightforward extensions of any method for predicting local spatial occupancy, one through postprocessing occupancy scores, the other through direct prediction of an uncertainty indicator. We compare these methods together with two known approaches to probabilistic shape completion. Moreover, we generate a dataset, derived from ShapeNet, of realistically rendered depth images of object views with ground-truth annotations for the uncertain regions. We train on this dataset and test each method in shape completion and prediction of uncertain regions for known and novel object instances and on synthetic and real data. While direct uncertainty prediction is by far the most accurate in the segmentation of uncertain regions, both novel methods outperform the two baselines in shape completion and uncertain region prediction, and avoiding the predicted uncertain regions increases the quality of grasps for all tested methods. Web: https://github.com/DLR-RM/shape-completion
    摘要 Shape completion, 即从部分观察获取物体完整的几何结构,在机器人抓取任务中非常有 relevance。当基于物体形状重建的计划或预测真正的抓取动作时,确保geometry uncertainty的存在是非常重要的。尤其是在给出杂乱的物体视图时,可能存在扩展区域中对整个物体部分的存在的不确定性。为处理这种重要的情况,我们提出了两种新的方法来预测这些不确定的区域,一种通过质量分配的后处理,另一种通过直接预测不确定指标。我们将这些方法与两种已知的概率形状完成方法进行比较。此外,我们还生成了基于ShapeNet的数据集,包含真实渲染的深度图像,以及对这些图像的 annotations for uncertain regions。我们在这个数据集上训练和测试每种方法的 shape completion和不确定区域预测能力,并发现两种新方法在shape completion和不确定区域预测方面都高于两个基elines,并且避免预测的不确定区域可以提高所有测试方法的抓取质量。详细信息请参考

Fountain – an intelligent contextual assistant combining knowledge representation and language models for manufacturing risk identification

  • paper_url: http://arxiv.org/abs/2308.00364
  • repo_url: None
  • paper_authors: Saurabh Kumar, Daniel Fuchs, Klaus Spindler
  • for: 本研究旨在提供一种基于语言模型和知识图的启用中间件,以帮助工程师在 deviations 管理中预测和避免因产品设计和生产过程变化而导致的风险。
  • methods: 本研究使用语言模型finetuned для域pecific semantic similarity,以及基于bill of materials、Failure Modes and Effect Analysis (FMEA)和客户前面 Failure 的知识表示。
  • results: 研究人员通过实验和案例研究表明,可以通过采用本研究提出的方法,实时地预测和避免因 deviations 而导致的风险,并且可以在现有的计算机机器基础设施上进行模型更新和推理。
    Abstract Deviations from the approved design or processes during mass production can lead to unforeseen risks. However, these changes are sometimes necessary due to changes in the product design characteristics or an adaptation in the manufacturing process. A major challenge is to identify these risks early in the workflow so that failures leading to warranty claims can be avoided. We developed Fountain as a contextual assistant integrated in the deviation management workflow that helps in identifying the risks based on the description of the existing design and process criteria and the proposed deviation. In the manufacturing context, it is important that the assistant provides recommendations that are explainable and consistent. We achieve this through a combination of the following two components 1) language models finetuned for domain specific semantic similarity and, 2) knowledge representation in the form of a property graph derived from the bill of materials, Failure Modes and Effect Analysis (FMEA) and prior failures reported by customers. Here, we present the nuances of selecting and adapting pretrained language models for an engineering domain, continuous model updates based on user interaction with the contextual assistant and creating the causal chain for explainable recommendations based on the knowledge representation. Additionally, we demonstrate that the model adaptation is feasible using moderate computational infrastructure already available to most engineering teams in manufacturing organizations and inference can be performed on standard CPU only instances for integration with existing applications making these methods easily deployable.
    摘要 不同于批量生产中所批准的设计或过程的变化可能会导致未预见的风险。然而,这些变化有时是必要的,因为产品设计特点或生产过程中的变化。一个主要挑战是在工作流程中早期发现这些风险,以避免因杂 deviation 导致的售后服务请求。我们开发了一个名为“喷泉”的上下文助手,它在异常处理工作流程中帮助Identify 风险,基于现有设计和过程标准的描述和提案的异常。在制造上下文中,助手需要提供可解释的建议。我们通过以下两个组件实现了这一点:1)预处理语言模型,适应域pecific Semantic Similarity,和2)基于生产物料清单、失效模式分析(FMEA)和客户前置报告的知识表示。我们还解释了如何选择和适应预训练语言模型,连续更新模型基于用户与上下文助手的交互,以及如何创建 causal chain для可解释的建议。此外,我们还证明了模型适配可以在现有的制造组织中进行,并且推理可以在标准CPU上进行,以便与现有应用程序集成。

MetaGPT: Meta Programming for Multi-Agent Collaborative Framework

  • paper_url: http://arxiv.org/abs/2308.00352
  • repo_url: https://github.com/geekan/metagpt
  • paper_authors: Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu
  • for: 这个论文的目的是探讨如何使用大型自然语言模型(LLM)驱动多代理人工作,以解决复杂的多代理人问题。
  • methods: 这个论文使用了一个创新的框架,名为MetaGPT,它将人类工作流程组合入大型自然语言模型中,以提高多代理人协作的有效性。
  • results: 实验结果显示,MetaGPT可以生成更一致和正确的解决方案,比较先进的对话式多代理人系统。这显示了将人类专业知识 integrate 到多代理人系统中的潜在可能性,并创造了复杂的现实世界挑战的新机会。
    Abstract Recently, remarkable progress has been made in automated task-solving through the use of multi-agent driven by large language models (LLMs). However, existing LLM-based multi-agent works primarily focus on solving simple dialogue tasks, and complex tasks are rarely studied, mainly due to the LLM hallucination problem. This type of hallucination becomes cascading when naively chaining multiple intelligent agents, resulting in a failure to effectively address complex problems. Therefore, we introduce MetaGPT, an innovative framework that incorporates efficient human workflows as a meta programming approach into LLM-based multi-agent collaboration. Specifically, MetaGPT encodes Standardized Operating Procedures (SOPs) into prompts to enhance structured coordination. Subsequently, it mandates modular outputs, empowering agents with domain expertise comparable to human professionals, to validate outputs and minimize compounded errors. In this way, MetaGPT leverages the assembly line paradigm to assign diverse roles to various agents, thereby establishing a framework that can effectively and cohesively deconstruct complex multi-agent collaborative problems. Our experiments on collaborative software engineering benchmarks demonstrate that MetaGPT generates more coherent and correct solutions compared to existing chat-based multi-agent systems. This highlights the potential of integrating human domain knowledge into multi-agent systems, thereby creating new opportunities to tackle complex real-world challenges. The GitHub repository of this project is publicly available on:https://github.com/geekan/MetaGPT.
    摘要 最近,在多代理驱动大语言模型(LLM)的情况下,自动任务解决得到了非常出色的进步。然而,现有的LLM基于多代理工作主要集中于解决简单对话任务,复杂任务则rarely studied,主要是因为LLM幻觉问题。这种幻觉会在不经过严格的验证和约束的情况下,继续层层传递,导致复杂问题的解决不具有效果。因此,我们提出了MetaGPT框架,它将人工流程作为多代理协作的meta编程方法。具体来说,MetaGPT将标准化的操作 процедуures(SOPs)编码成提示,以提高结构化协作。然后,它要求机器人输出具有域专业知识的Module输出,以验证输出并减少相加的错误。通过这种方式,MetaGPT利用了生产线模式,将不同的代理分配给不同的机器人,以建立一个可以有效和凝合地解决复杂多代理协作问题的框架。我们对协同软件工程标准准测试数据进行了实验,并证明MetaGPT可以生成更凝合和正确的解决方案,比现有的协同多代理系统更好。这表明了将人类域知识integrated into多代理系统的潜在价值,以创造新的机会来解决复杂的现实世界挑战。MetaGPT项目的GitHub存储库可以在以下链接中找到:https://github.com/geekan/MetaGPT。

Learning Green’s Function Efficiently Using Low-Rank Approximations

  • paper_url: http://arxiv.org/abs/2308.00350
  • repo_url: https://github.com/kishanwn/decgreennet
  • paper_authors: Kishan Wimalawarne, Taiji Suzuki, Sophie Langer
  • for: 用深度学习模型解决不同类型的partial differential equations
  • methods: 使用低级别分解学习绿函数,避免重复 computationally expensive Monte-Carlo integral approximations
  • results: 提高计算时间,与PINNs和MOD-Net相当准确,但减少计算量
    Abstract Learning the Green's function using deep learning models enables to solve different classes of partial differential equations. A practical limitation of using deep learning for the Green's function is the repeated computationally expensive Monte-Carlo integral approximations. We propose to learn the Green's function by low-rank decomposition, which results in a novel architecture to remove redundant computations by separate learning with domain data for evaluation and Monte-Carlo samples for integral approximation. Using experiments we show that the proposed method improves computational time compared to MOD-Net while achieving comparable accuracy compared to both PINNs and MOD-Net.
    摘要 使用深度学习模型学习绿函数可以解决不同类型的partial differential equations。但是使用深度学习 для绿函数受到重复计算成本高的Monte-Carlo积分approximation的限制。我们提议通过低级别分解来学习绿函数,这导致了一种新的架构,可以通过预测和Monte-Carlo样本进行独立学习和积分近似。通过实验,我们发现提议的方法可以比MOD-Net减少计算时间,同时与PINNs和MOD-Net的准确率相似。

Dynamic ensemble selection based on Deep Neural Network Uncertainty Estimation for Adversarial Robustness

  • paper_url: http://arxiv.org/abs/2308.00346
  • repo_url: None
  • paper_authors: Ruoxi Qin, Linyuan Wang, Xuehui Du, Xingyuan Chen, Bin Yan
  • for: 提高模型对抗性和稳定性
  • methods: 动态ensemble选择技术, Dirichlet分布和多模型多态性
  • results: 无需损害准确率,提高了模型对抗性和稳定性
    Abstract The deep neural network has attained significant efficiency in image recognition. However, it has vulnerable recognition robustness under extensive data uncertainty in practical applications. The uncertainty is attributed to the inevitable ambient noise and, more importantly, the possible adversarial attack. Dynamic methods can effectively improve the defense initiative in the arms race of attack and defense of adversarial examples. Different from the previous dynamic method depend on input or decision, this work explore the dynamic attributes in model level through dynamic ensemble selection technology to further protect the model from white-box attacks and improve the robustness. Specifically, in training phase the Dirichlet distribution is apply as prior of sub-models' predictive distribution, and the diversity constraint in parameter space is introduced under the lightweight sub-models to construct alternative ensembel model spaces. In test phase, the certain sub-models are dynamically selected based on their rank of uncertainty value for the final prediction to ensure the majority accurate principle in ensemble robustness and accuracy. Compared with the previous dynamic method and staic adversarial traning model, the presented approach can achieve significant robustness results without damaging accuracy by combining dynamics and diversity property.
    摘要 深度神经网络在图像识别方面已经达到了显著的效率。然而,它在实际应用中面临着广泛的数据不确定性,这种不确定性来自于不可避免的环境噪音以及可能的敌意攻击。动态方法可以有效地提高模型的防御机制,在攻击者和防御者之间的战争中,动态方法可以帮助模型更好地应对敌意攻击。在这个工作中,我们explore了模型层次的动态特性,通过动态ensemble选择技术来进一步保护模型免受白盒攻击,提高模型的 Robustness。在训练阶段,我们使用Dirichlet分布作为子模型预测分布的先验,并在轻量级子模型中引入多样性约束来构建多个可 ensemble模型空间。在测试阶段,我们 dynamically选择一些最高的uncertainty值来确定最终预测的子模型,以保证ensemble robustness和准确性的多数原则。与过去的动态方法和静态敌意训练模型相比,我们的方法可以无需损害准确性的情况下获得显著的Robustness。通过结合动态和多样性特性,我们的方法可以提供更好的防御机制,使得模型在实际应用中更加可靠。

Kidnapping Deep Learning-based Multirotors using Optimized Flying Adversarial Patches

  • paper_url: http://arxiv.org/abs/2308.00344
  • repo_url: https://github.com/imrclab/flying_adversarial_patch
  • paper_authors: Pia Hanfeld, Khaled Wahba, Marina M. -C. Höhne, Michael Bussmann, Wolfgang Hönig
  • for: 这 paper 是关于 autonomous flying robots 的 deep learning 模型如何受到攻击的研究。
  • methods: 这 paper 使用了多种攻击方法,包括 computed adversarial patches 和 novel attack policy。
  • results: 这 paper 的结果表明,使用这些攻击方法可以 manipulate autonomous flying robots 的 neural network 预测,并且可以在Physical flights 中实现 robot 的 kidnapping。Here’s the full translation in Simplified Chinese:
  • for: 这 paper 是关于 autonomous flying robots 的 deep learning 模型如何受到攻击的研究。
  • methods: 这 paper 使用了多种攻击方法,包括 computed adversarial patches 和 novel attack policy。
  • results: 这 paper 的结果表明,使用这些攻击方法可以 manipulate autonomous flying robots 的 neural network 预测,并且可以在Physical flights 中实现 robot 的 kidnapping。I hope that helps! Let me know if you have any further questions.
    Abstract Autonomous flying robots, such as multirotors, often rely on deep learning models that makes predictions based on a camera image, e.g. for pose estimation. These models can predict surprising results if applied to input images outside the training domain. This fault can be exploited by adversarial attacks, for example, by computing small images, so-called adversarial patches, that can be placed in the environment to manipulate the neural network's prediction. We introduce flying adversarial patches, where multiple images are mounted on at least one other flying robot and therefore can be placed anywhere in the field of view of a victim multirotor. By introducing the attacker robots, the system is extended to an adversarial multi-robot system. For an effective attack, we compare three methods that simultaneously optimize multiple adversarial patches and their position in the input image. We show that our methods scale well with the number of adversarial patches. Moreover, we demonstrate physical flights with two robots, where we employ a novel attack policy that uses the computed adversarial patches to kidnap a robot that was supposed to follow a human.
    摘要 自主飞行机器人,如多Rotor,常用深度学习模型,以图像为输入,例如pose estimation。这些模型可能会出现意外的结果,如果应用于输入图像外的训练领域。这种错误可以被恶意攻击,例如计算小图像,称为恶意补丁,并将其置入环境中,以操纵神经网络的预测。我们引入飞行恶意补丁,其中至少有一个飞行机器人携带多个图像,因此可以在犯罪者多rotor的视场中随意地放置。通过引入攻击机器人,系统变成了一个敌对多机器人系统。为实现攻击,我们比较了三种方法,同时优化多个恶意补丁和其位置在输入图像中。我们发现我们的方法可以很好地扩展到多个恶意补丁。此外,我们还实现了实际的飞行试验,使用计算出来的恶意补丁,将一个预定要跟随人类的机器人绑架。

Monitoring Algorithmic Fairness under Partial Observations

  • paper_url: http://arxiv.org/abs/2308.00341
  • repo_url: None
  • paper_authors: Thomas A. Henzinger, Konstantin Kueffner, Kaushik Mallik
  • for: 这篇论文目的是监控 deployed 系统的algorithmic fairness,以 guaranteee that AI和机器学习软件在做出决策时保持公正和无偏见。
  • methods: 这篇论文使用 runtime verification techniques,可以在 deployed 系统上监控algorithmic fairness。这些监控技术假设系统的全 observability,并且只能监控已经指定的公正性特性,即 arithmetic expressions over the probabilities of different events。
  • results: 这篇论文延伸了 fairness monitoring 到 partially observed Markov chains (POMC) 和 specifications containing arithmetic expressions over the expected values of numerical functions on event sequences。这篇论文可以在这些系统上监控algorithmic fairness,并且可以在一个执行的系统上监控整个分布中的公正性。
    Abstract As AI and machine-learned software are used increasingly for making decisions that affect humans, it is imperative that they remain fair and unbiased in their decisions. To complement design-time bias mitigation measures, runtime verification techniques have been introduced recently to monitor the algorithmic fairness of deployed systems. Previous monitoring techniques assume full observability of the states of the (unknown) monitored system. Moreover, they can monitor only fairness properties that are specified as arithmetic expressions over the probabilities of different events. In this work, we extend fairness monitoring to systems modeled as partially observed Markov chains (POMC), and to specifications containing arithmetic expressions over the expected values of numerical functions on event sequences. The only assumptions we make are that the underlying POMC is aperiodic and starts in the stationary distribution, with a bound on its mixing time being known. These assumptions enable us to estimate a given property for the entire distribution of possible executions of the monitored POMC, by observing only a single execution. Our monitors observe a long run of the system and, after each new observation, output updated PAC-estimates of how fair or biased the system is. The monitors are computationally lightweight and, using a prototype implementation, we demonstrate their effectiveness on several real-world examples.
    摘要

Target Search and Navigation in Heterogeneous Robot Systems with Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.00331
  • repo_url: None
  • paper_authors: Yun Chen, Jiaping Xiao
  • for: 这篇论文是为了探索和搜寻任务中的合作多元机器人系统,以提高效率。
  • methods: 这篇论文使用深度强化学习算法来教育策略,并导入多阶段强化学习框架和curiosity模组,以增强代理人对无 visited 环境的探索。
  • results: 在模拟环境中的实验显示,我们的框架可以将多元机器人系统训练到 unknown 目标位置的搜寻和NAVIGATION,而exist的基准不能实现,且加速训练速度。
    Abstract Collaborative heterogeneous robot systems can greatly improve the efficiency of target search and navigation tasks. In this paper, we design a heterogeneous robot system consisting of a UAV and a UGV for search and rescue missions in unknown environments. The system is able to search for targets and navigate to them in a maze-like mine environment with the policies learned through deep reinforcement learning algorithms. During the training process, if two robots are trained simultaneously, the rewards related to their collaboration may not be properly obtained. Hence, we introduce a multi-stage reinforcement learning framework and a curiosity module to encourage agents to explore unvisited environments. Experiments in simulation environments show that our framework can train the heterogeneous robot system to achieve the search and navigation with unknown target locations while existing baselines may not, and accelerate the training speed.
    摘要 将文本翻译成简化中文。<>合作多种机器人系统可以大幅提高目标搜寻和导航任务的效率。在这篇论文中,我们设计了一个多种机器人系统,包括一架UAV和一架UGV,用于搜寻和救援任务在未知环境中。系统可以在封闭的矿山环境中搜寻目标并导航到它们,通过深度强化学习算法学习的策略。在训练过程中,如果两个机器人同时训练,关于他们之间的合作的奖励可能不会正确获得。因此,我们提出了多stage强化学习框架和curiosity模块,以促使代理人探索未曾访问的环境。在模拟环境中的实验表明,我们的框架可以训练多种机器人系统,在目标位置未知的情况下实现搜寻和导航,而现有的基线可能无法达成,并且加速训练速度。

Threshold-aware Learning to Generate Feasible Solutions for Mixed Integer Programs

  • paper_url: http://arxiv.org/abs/2308.00327
  • repo_url: None
  • paper_authors: Taehyun Yoon, Jinwon Choi, Hyokun Yun, Sungbin Lim
  • for: 寻找一个高质量可行的解决方案,以应对 combinatorial optimization(CO)问题,并在有限时间内完成。
  • methods: 使用 Neural Diving(ND)方法,一种基于机器学习的方法,以生成混合整数程式中的部分零値变量分配。
  • results: 透过优化覆盖范围,实现将 machine learning 和整数程式的目标更加接近,并且透过内置学习来实现更好的性能。实验结果显示,使用深度神经网估算覆盖可以实现最佳性能,并且在 NeurIPS ML4CO 数据集上表现出色,特别是在工作负载分配数据集上,实现了最佳性能,优化差值为 0.45%,与 SCIP 相比,提高了十倍。
    Abstract Finding a high-quality feasible solution to a combinatorial optimization (CO) problem in a limited time is challenging due to its discrete nature. Recently, there has been an increasing number of machine learning (ML) methods for addressing CO problems. Neural diving (ND) is one of the learning-based approaches to generating partial discrete variable assignments in Mixed Integer Programs (MIP), a framework for modeling CO problems. However, a major drawback of ND is a large discrepancy between the ML and MIP objectives, i.e., variable value classification accuracy over primal bound. Our study investigates that a specific range of variable assignment rates (coverage) yields high-quality feasible solutions, where we suggest optimizing the coverage bridges the gap between the learning and MIP objectives. Consequently, we introduce a post-hoc method and a learning-based approach for optimizing the coverage. A key idea of our approach is to jointly learn to restrict the coverage search space and to predict the coverage in the learned search space. Experimental results demonstrate that learning a deep neural network to estimate the coverage for finding high-quality feasible solutions achieves state-of-the-art performance in NeurIPS ML4CO datasets. In particular, our method shows outstanding performance in the workload apportionment dataset, achieving the optimality gap of 0.45%, a ten-fold improvement over SCIP within the one-minute time limit.
    摘要 寻找一个高质量可行的解决方案 для combinatorial optimization(CO)问题在有限时间内是挑战的,因为它的特点是离散的。在最近,机器学习(ML)方法在Addressing CO problems的问题上增加了。Neural diving(ND)是一种学习基于方法,用于在混合整数程序(MIP)中生成部分离散变量分配。然而,ND的一个主要缺点是ML和MIP目标之间的差距,即变量值分类准确率 над primal bound。我们的研究发现,在特定的变量分配率(coverage)范围内,可以获得高质量可行的解决方案,我们建议优化coverage来跨越这两个目标之间的差距。因此,我们引入了一种后续方法和一种学习基于的方法来优化coverage。我们的方法的关键思想是同时学习限制coverage搜索空间和预测coverage在学习的搜索空间中。实验结果表明,通过学习深度神经网络来估计coverage以找到高质量可行的解决方案,可以在NeurIPS ML4CO数据集中实现状态计算机科学技术最佳性能。尤其是在分配工作量数据集中,我们的方法实现了0.45%的优化率,胜过SCIP在一分钟时限内的十倍提高。

Choir Transformer: Generating Polyphonic Music with Relative Attention on Transformer

  • paper_url: http://arxiv.org/abs/2308.02531
  • repo_url: None
  • paper_authors: Jiuyang Zhou, Hong Zhu, Xingping Wang
  • for: 这个论文的目的是提出一种基于Transformer的多VOICE音乐生成模型,以便更好地模型音乐的结构和关系。
  • methods: 这个模型使用了相对位置注意力来更好地模型长距离音符之间的关系,并提出了一种适合多VOICE音乐生成的音乐表示方式。
  • results: 实验表明,这个模型的性能超过了之前的最佳值4.06%,并且可以根据输入进行不同的风格化音乐生成。
    Abstract Polyphonic music generation is still a challenge direction due to its correct between generating melody and harmony. Most of the previous studies used RNN-based models. However, the RNN-based models are hard to establish the relationship between long-distance notes. In this paper, we propose a polyphonic music generation neural network named Choir Transformer[ https://github.com/Zjy0401/choir-transformer], with relative positional attention to better model the structure of music. We also proposed a music representation suitable for polyphonic music generation. The performance of Choir Transformer surpasses the previous state-of-the-art accuracy of 4.06%. We also measures the harmony metrics of polyphonic music. Experiments show that the harmony metrics are close to the music of Bach. In practical application, the generated melody and rhythm can be adjusted according to the specified input, with different styles of music like folk music or pop music and so on.
    摘要 <> traduced the text into Simplified Chinese.<>复音乐生成仍然是一个挑战方向,因为它们需要正确地生成旋律和和谐。大多数前一代的研究使用RNN型模型。然而,RNN型模型对于距离较长的当地谱进行建立关系困难。在这篇论文中,我们提出了一个复音乐生成神经网络名为Choir Transformer[https://github.com/Zjy0401/choir-transformer],具有相对位置注意力以更好地模型音乐结构。我们还提出了适合复音乐生成的音乐表示。Choir Transformer的表现超过了过去的州立准确率4.06%。我们还测量了复音乐中的和谐指标,实验结果显示和谐指标与巴赫的音乐很相似。在实际应用中,生成的旋律和节奏可以根据输入的特定要求进行调整,包括不同的音乐风格,如民族音乐或流行音乐等。

Pixel to policy: DQN Encoders for within & cross-game reinforcement learning

  • paper_url: http://arxiv.org/abs/2308.00318
  • repo_url: None
  • paper_authors: Ashrya Agrawal, Priyanshi Shah, Sourabh Prakash
  • for: 本研究探讨了基于增强学习的游戏撸抓机器人的开发,以及如何通过知识转移来提高RL性能。
  • methods: 本研究使用了基于深度强化学习的DQN模型,并在不同的游戏环境中进行了训练。同时,研究还 comparing了从头开始训练和知识转移的RL模型,以及使用多个游戏环境训练一个通用游戏撸抓机器人的方法。
  • results: 研究结果显示,使用知识转移的DQN模型可以在不同的游戏环境中达到优秀的性能,其中 mean episode reward 为 46.16,even 超过了人类水平的表现。此外,在 Assault 和 Space Invader 环境中,模型的 achieved mean rewards 分别为 533.42 和 402.17,表现非常出色。
    Abstract Reinforcement Learning can be applied to various tasks, and environments. Many of these environments have a similar shared structure, which can be exploited to improve RL performance on other tasks. Transfer learning can be used to take advantage of this shared structure, by learning policies that are transferable across different tasks and environments and can lead to more efficient learning as well as improved performance on a wide range of tasks. This work explores as well as compares the performance between RL models being trained from the scratch and on different approaches of transfer learning. Additionally, the study explores the performance of a model trained on multiple game environments, with the goal of developing a universal game-playing agent as well as transfer learning a pre-trained encoder using DQN, and training it on the same game or a different game. Our DQN model achieves a mean episode reward of 46.16 which even beats the human-level performance with merely 20k episodes which is significantly lower than deepmind's 1M episodes. The achieved mean rewards of 533.42 and 402.17 on the Assault and Space Invader environments respectively, represent noteworthy performance on these challenging environments.
    摘要 � Reinforcement Learning 可以应用于多种任务和环境中。许多这些环境具有相似的共享结构,可以利用这些共享结构来提高RL性能。传输学习可以利用这些共享结构,学习可以在不同任务和环境中传输的策略,从而导致更高效的学习以及多种任务的改进表现。本研究探讨了RL模型从零开始学习和不同的传输学习方法的比较性能,同时还研究了使用多个游戏环境训练的模型,以达到开发通用游戏玩家代理以及传输学习预训练的DQN模型的目的。我们的DQN模型在46.16 episodes中获得了平均回合奖励,这even exceeds human-level performance,只需20k episodes,与深入的1M episodes相比,这是显著的提高。在Assault和Space Invader环境中,我们获得的平均奖励为533.42和402.17,这表示在这些复杂的环境中的出色表现。

Revolutionizing TCAD Simulations with Universal Device Encoding and Graph Attention Networks

  • paper_url: http://arxiv.org/abs/2308.11624
  • repo_url: None
  • paper_authors: Guangxi Fan, Kain Lu Low
  • for: 这篇论文的目的是提出一种基于人工智能(AI)和图表表示法的TCAD设备模拟中的设备编码方法。
  • methods: 该方法使用图表基本编码方案,考虑材料层和设备层嵌入,同时还引入了一种新的空间关系嵌入, inspirited by interpolate操作通常用于finite element分 meshing。
  • results: 该方法可以实现全面的数据驱动模拟,包括surrogate Poisson伪 simulate和current-voltage(IV)预测基于漂移扩散模型,使用一种新的图注意力网络,称为RelGAT。
    Abstract An innovative methodology that leverages artificial intelligence (AI) and graph representation for semiconductor device encoding in TCAD device simulation is proposed. A graph-based universal encoding scheme is presented that not only considers material-level and device-level embeddings, but also introduces a novel spatial relationship embedding inspired by interpolation operations typically used in finite element meshing. Universal physical laws from device simulations are leveraged for comprehensive data-driven modeling, which encompasses surrogate Poisson emulation and current-voltage (IV) prediction based on drift-diffusion model. Both are achieved using a novel graph attention network, referred to as RelGAT. Comprehensive technical details based on the device simulator Sentaurus TCAD are presented, empowering researchers to adopt the proposed AI-driven Electronic Design Automation (EDA) solution at the device level.
    摘要 一种创新的方法ология,利用人工智能(AI)和图表示法来实现半导体设备编码在TCAD设备仿真中,被提议。这种图基本 універсаル编码方案不仅考虑材料层和设备层嵌入,还引入了一种新的空间关系嵌入,取得自 interpolate 操作通常用于finite element meshing。这种新的空间关系嵌入可以帮助更好地捕捉设备的物理特性。此外,通过利用设备仿真中的物理法律,实现了全面的数据驱动模拟,包括surrogate Poisson 伪 simulate和current-voltage(IV)预测,基于漫步-扩散模型。这些技术性细节基于设备仿真器Sentaurus TCAD,以便研究人员可以在设备层采用这种AI驱动的电子设计自动化(EDA)解决方案。

Adapt and Decompose: Efficient Generalization of Text-to-SQL via Domain Adapted Least-To-Most Prompting

  • paper_url: http://arxiv.org/abs/2308.02582
  • repo_url: None
  • paper_authors: Aseem Arora, Shabbirhussain Bhaisaheb, Harshit Nigam, Manasi Patwardhan, Lovekesh Vig, Gautam Shroff
  • For: This paper focuses on improving the cross-domain and cross-compositional generalization of Text-to-SQL semantic parsing using a novel prompt-based approach.* Methods: The proposed method involves offline sampling of a minimal set of few-shots from the training data to synthesize a fixed Generic Prompt (GP) with complete coverage of SQL clauses, operators, and functions, and maximal domain coverage within the allowed token length. The GP is then adapted to the target database domain (DA-GP) to handle cross-domain generalization, and a decomposed Least-To-Most-Prompting (LTMP-DA-GP) is used to handle cross-compositional generalization.* Results: The proposed approach demonstrates superior performance on the KaggleDBQA dataset, designed to evaluate generalizability for the Text-to-SQL task, and shows consistent performance improvement over the baseline GP across LLMs and databases of KaggleDBQA, highlighting the efficacy and model agnostic benefits of the prompt-based adapt and decompose approach.Here is the same information in Simplified Chinese:* For: 这篇论文关注提高cross-domain和cross-compositional generalization的文本到SQL semantics解析。* Methods: 提议方法包括在训练数据集中Offline采样一个最小的几个shot,以生成一个包含全部SQL语句、运算符和函数的固定Generic Prompt (GP),并在目标数据库领域内具有最大的领域覆盖。此外,还使用分解的Least-To-Most-Prompting (LTMP-DA-GP)来处理cross-compositional generalization。* Results: 提议方法在KaggleDBQA dataset上显示出了superior的性能,这个dataset是用来评估文本到SQL任务的通用性。此外,在不同的LLMs和数据库上,LTMP-DA-GP也显示了与GP相比的性能提高,这highlights the efficacy和model agnostic benefits of the prompt-based adapt和decomposeapproach。
    Abstract Cross-domain and cross-compositional generalization of Text-to-SQL semantic parsing is a challenging task. Existing Large Language Model (LLM) based solutions rely on inference-time retrieval of few-shot exemplars from the training set to synthesize a run-time prompt for each Natural Language (NL) test query. In contrast, we devise an algorithm which performs offline sampling of a minimal set-of few-shots from the training data, with complete coverage of SQL clauses, operators and functions, and maximal domain coverage within the allowed token length. This allows for synthesis of a fixed Generic Prompt (GP), with a diverse set-of exemplars common across NL test queries, avoiding expensive test time exemplar retrieval. We further auto-adapt the GP to the target database domain (DA-GP), to better handle cross-domain generalization; followed by a decomposed Least-To-Most-Prompting (LTMP-DA-GP) to handle cross-compositional generalization. The synthesis of LTMP-DA-GP is an offline task, to be performed one-time per new database with minimal human intervention. Our approach demonstrates superior performance on the KaggleDBQA dataset, designed to evaluate generalizability for the Text-to-SQL task. We further showcase consistent performance improvement of LTMP-DA-GP over GP, across LLMs and databases of KaggleDBQA, highlighting the efficacy and model agnostic benefits of our prompt based adapt and decompose approach.
    摘要 横跨域和构成层次的文本到SQL semantics解析是一个具有挑战性的任务。现有的大型自然语言模型(LLM)基本解决方案采用在执行时从训练集中检索一些几个示例来生成每个自然语言(NL)测试查询的时间。相比之下,我们提出了一个算法,它在线上采样了训练数据中的最小集合,涵盖了SQL子句、运算符和函数,并且在允许的字符串长度内实现了最大的域覆盖。这允许我们生成一个固定的生成器提示(GP),其中包含一个多样化的示例集,通用于NL测试查询,以避免高昂的测试时间示例检索。我们进一步自动适应了GP到目标数据库域(DA-GP),以更好地处理跨域泛化;并使用分解的最小到最多提示(LTMP-DA-GP)来处理跨组件泛化。这个synthesis任务是一个离线任务,需要在新的数据库添加时进行一次性的少量人工干预。我们的方法在KaggleDBQA数据集上表现出色,并且在LLMs和数据库之间 consistently 表现出 GP 和 LTMP-DA-GP 之间的性能改进,这 highlights 了我们的提示基于适应和分解方法的效果和模型免疫性。

Making the V in Text-VQA Matter

  • paper_url: http://arxiv.org/abs/2308.00295
  • repo_url: None
  • paper_authors: Shamanthak Hegde, Soumya Jahagirdar, Shankar Gangisetty
  • for: 提高文本检查问题(Text-based VQA)的答案质量,解决图像文本关系理解不足问题。
  • methods: 利用VQA数据集作为外部知识,学习图像特征和文本特征之间的关系,提高文本检查问题的答案质量。
  • results: 通过组合TextVQA数据集和VQA数据集,提高文本检查问题的答案质量,并在不同数据集上进行质量评估和比较。
    Abstract Text-based VQA aims at answering questions by reading the text present in the images. It requires a large amount of scene-text relationship understanding compared to the VQA task. Recent studies have shown that the question-answer pairs in the dataset are more focused on the text present in the image but less importance is given to visual features and some questions do not require understanding the image. The models trained on this dataset predict biased answers due to the lack of understanding of visual context. For example, in questions like "What is written on the signboard?", the answer predicted by the model is always "STOP" which makes the model to ignore the image. To address these issues, we propose a method to learn visual features (making V matter in TextVQA) along with the OCR features and question features using VQA dataset as external knowledge for Text-based VQA. Specifically, we combine the TextVQA dataset and VQA dataset and train the model on this combined dataset. Such a simple, yet effective approach increases the understanding and correlation between the image features and text present in the image, which helps in the better answering of questions. We further test the model on different datasets and compare their qualitative and quantitative results.
    摘要 文本基于VQA目标解答问题,通过阅读图像中的文本来回答问题。与普通的VQA任务相比,这种任务需要更多的场景文本关系理解。现有研究表明,数据集中的问题对应答案更加注重图像中的文本,而视觉特征相对较少,一些问题甚至不需要理解图像。因此,模型在这些数据集上训练时会预测偏向的答案,如果只凭文本内容,则忽略图像。为了解决这些问题,我们提议一种方法,通过与VQA数据集的 externos知识来学习图像特征(使V在TextVQA中变得重要),同时学习文本和问题特征。具体来说,我们将TextVQA数据集和VQA数据集组合在一起,并将模型在这个组合数据集上训练。这种简单而有效的方法可以增加图像特征和文本中的关系,从而帮助模型更好地回答问题。我们还对不同的数据集进行测试,并比较其质量和量化结果。

Gated Driver Attention Predictor

  • paper_url: http://arxiv.org/abs/2308.02530
  • repo_url: https://github.com/jwfangit/gate-dap
  • paper_authors: Tianci Zhao, Xue Bai, Jianwu Fang, Jianru Xue
  • for: 预测司机注意力,以提高驾驶任务理解和安全预测。
  • methods: 使用网络连接闭合机制,学习不同空间、时间和信息类型在驾驶场景中的重要性。
  • results: 在DADA-2000和BDDA数据集上证明提出方法的优越性,与现有方法进行比较。
    Abstract Driver attention prediction implies the intention understanding of where the driver intends to go and what object the driver concerned about, which commonly provides a driving task-guided traffic scene understanding. Some recent works explore driver attention prediction in critical or accident scenarios and find a positive role in helping accident prediction, while the promotion ability is constrained by the prediction accuracy of driver attention maps. In this work, we explore the network connection gating mechanism for driver attention prediction (Gate-DAP). Gate-DAP aims to learn the importance of different spatial, temporal, and modality information in driving scenarios with various road types, occasions, and light and weather conditions. The network connection gating in Gate-DAP consists of a spatial encoding network gating, long-short-term memory network gating, and information type gating modules. Each connection gating operation is plug-and-play and can be flexibly assembled, which makes the architecture of Gate-DAP transparent for evaluating different spatial, temporal, and information types for driver attention prediction. Evaluations on DADA-2000 and BDDA datasets verify the superiority of the proposed method with the comparison with state-of-the-art approaches. The code is available on https://github.com/JWFangit/Gate-DAP.
    摘要 driver attention prediction 表示司机的意图理解,包括他们想要去哪里和关注哪些对象,通常提供了驾驶任务指导的交通场景理解。一些最近的工作研究了在关键或事故情况下的司机注意力预测,并发现它可以帮助预测事故,但是预测精度的限制了推广能力。在这种情况下,我们研究了网络连接锁定机制 для司机注意力预测(Gate-DAP)。Gate-DAP的目标是在不同的道路类型、场合、照明和天气条件下学习不同的空间、时间和类型信息在驾驶场景中的重要性。Gate-DAP网络连接锁定包括空间编码网络锁定、长短时间记忆网络锁定和信息类型锁定模块。每个连接锁定操作都是可拔取的,可以灵活组合,这使得Gate-DAP的架构可以透明地评估不同的空间、时间和信息类型在司机注意力预测中的重要性。评估于DADA-2000和BDDA数据集 verify了我们提出的方法的优越性,与现有方法进行比较。代码可以在https://github.com/JWFangit/Gate-DAP中找到。

Multi-Modality Multi-Loss Fusion Network

  • paper_url: http://arxiv.org/abs/2308.00264
  • repo_url: None
  • paper_authors: Zehui Wu, Ziwei Gong, Jaywon Koo, Julia Hirschberg
  • for: 本研究探讨了多模态特征选择和融合的优化方法,以提高情感识别。
  • methods: 研究使用了不同的融合方法,并对多模态融合网络中的多产业训练进行研究,发现了有用的发现 relate to subnet performance。
  • results: 我们的最佳模型在三个 dataset(CMU-MOSI、CMU-MOSEI 和 CH-SIMS)上 achieve state-of-the-art performance,并在大多数指标上超过其他方法。我们发现,在多模态特征上进行训练可以提高单模态测试,并基于数据Annotation schema设计融合方法可以提高模型性能。这些结果表明了一种优化特征选择和融合方法的路线图,以提高情感识别在神经网络中的性能。
    Abstract In this work we investigate the optimal selection and fusion of features across multiple modalities and combine these in a neural network to improve emotion detection. We compare different fusion methods and examine the impact of multi-loss training within the multi-modality fusion network, identifying useful findings relating to subnet performance. Our best model achieves state-of-the-art performance for three datasets (CMU-MOSI, CMU-MOSEI and CH-SIMS), and outperforms the other methods in most metrics. We have found that training on multimodal features improves single modality testing and designing fusion methods based on dataset annotation schema enhances model performance. These results suggest a roadmap towards an optimized feature selection and fusion approach for enhancing emotion detection in neural networks.
    摘要 在这项研究中,我们研究了多modalities之间的最佳选择和融合,并将其 integrate into a neural network to improve emotion detection。我们比较了不同的融合方法,并研究在多模态融合网络中的多loss训练的影响,发现了有用的发现关于子网络性能。我们的最佳模型在三个 datasets(CMU-MOSI、CMU-MOSEI 和 CH-SIMS)上达到了状态部署性能,并在大多数指标上超过了其他方法。我们发现,训练在多模态特征上提高了单模态测试,而基于数据集注释 schema 设计的融合方法可以提高模型性能。这些结果表明了一种优化特征选择和融合方法的道路,以提高神经网络中的情感检测。

LGViT: Dynamic Early Exiting for Accelerating Vision Transformer

  • paper_url: http://arxiv.org/abs/2308.00255
  • repo_url: None
  • paper_authors: Guanyu Xu, Jiawei Hao, Li Shen, Han Hu, Yong Luo, Hui Lin, Jialie Shen
  • for: 这个论文的目的是提高资源有限的边缘设备上的视Transformers(ViTs)的部署和加速,以提供多媒体服务。
  • methods: 这篇论文使用了早期离开方法来加速推理,但是这些方法通常只适用于卷积神经网络(CNNs)和自然语言处理(NLP)领域中的模型。作者们系统地研究了在ViTs中使用早期离开方法的有效性,并发现了这些方法的缺点,例如内部分类器的不充分表示和深度内部分类器的限制。
  • results: 作者们提出了一个名为LGViT的早期离开框架,该框架包括多种类型的离开头,以实现效率和准确性之间的负荷。作者们还提出了一种两阶段训练方案,包括终端到终端训练和自我顾问,以便将全局和本地信息拼接在一起。实验结果表明,LGViT可以与约1.8倍的速度实现竞争性的性能。
    Abstract Recently, the efficient deployment and acceleration of powerful vision transformers (ViTs) on resource-limited edge devices for providing multimedia services have become attractive tasks. Although early exiting is a feasible solution for accelerating inference, most works focus on convolutional neural networks (CNNs) and transformer models in natural language processing (NLP).Moreover, the direct application of early exiting methods to ViTs may result in substantial performance degradation. To tackle this challenge, we systematically investigate the efficacy of early exiting in ViTs and point out that the insufficient feature representations in shallow internal classifiers and the limited ability to capture target semantic information in deep internal classifiers restrict the performance of these methods. We then propose an early exiting framework for general ViTs termed LGViT, which incorporates heterogeneous exiting heads, namely, local perception head and global aggregation head, to achieve an efficiency-accuracy trade-off. In particular, we develop a novel two-stage training scheme, including end-to-end training and self-distillation with the backbone frozen to generate early exiting ViTs, which facilitates the fusion of global and local information extracted by the two types of heads. We conduct extensive experiments using three popular ViT backbones on three vision datasets. Results demonstrate that our LGViT can achieve competitive performance with approximately 1.8 $\times$ speed-up.
    摘要 最近,用于提供多媒体服务的资源有限边缘设备上快速部署和加速强大视transformer(ViT)的任务已经变得非常吸引人。虽然早期离开是一种可行的解决方案,但大多数工作都集中在卷积神经网络(CNN)和自然语言处理(NLP)中。此外,直接将早期离开方法应用于ViT可能会导致显著性能下降。为解决这个挑战,我们系统地研究了ViT中早期离开的效果,并发现了内部分类器的不充分的特征表示和深度内部分类器的限制了这些方法的性能。我们 затем提出了一个通用ViT的早期离开框架,称为LGViT,该框架包括了多种exit head,包括本地感知头和全局聚合头,以实现效率准确之间的负面关系。具体来说,我们开发了一种新的两阶段训练方案,包括端到端训练和自我折衔练习,使得早期离开ViT可以同时捕捉全局和本地信息。我们在三个流行的ViT背景上进行了三个视觉数据集的广泛实验。结果表明,我们的LGViT可以实现与约1.8倍的速度提升。

EEG-based Cognitive Load Classification using Feature Masked Autoencoding and Emotion Transfer Learning

  • paper_url: http://arxiv.org/abs/2308.00246
  • repo_url: None
  • paper_authors: Dustin Pulver, Prithila Angkan, Paul Hungler, Ali Etemad
  • for: 本研究的目的是开发一个基于电enzephalogram(EEG)的认知负载分类方法。
  • methods: 本研究使用了 transformer 架构,通过将情感和认知负载的训练数据整合,以进行认知负载分类。我们首先使用了自我监督隐藏数据的泛化学习,然后使用固定重量和精致调整的转移学习来进行下游认知负载分类。
  • results: 我们的实验结果显示,我们的提议方法可以实现优异的结果,并比传统单阶充分监督学习更好。此外,我们还进行了细部拆分和敏感度研究,以评估不同方面的影响。本研究对于情感 computing 领域的成长做出了贡献,并开启了跨领域转移学习的新领域。
    Abstract Cognitive load, the amount of mental effort required for task completion, plays an important role in performance and decision-making outcomes, making its classification and analysis essential in various sensitive domains. In this paper, we present a new solution for the classification of cognitive load using electroencephalogram (EEG). Our model uses a transformer architecture employing transfer learning between emotions and cognitive load. We pre-train our model using self-supervised masked autoencoding on emotion-related EEG datasets and use transfer learning with both frozen weights and fine-tuning to perform downstream cognitive load classification. To evaluate our method, we carry out a series of experiments utilizing two publicly available EEG-based emotion datasets, namely SEED and SEED-IV, for pre-training, while we use the CL-Drive dataset for downstream cognitive load classification. The results of our experiments show that our proposed approach achieves strong results and outperforms conventional single-stage fully supervised learning. Moreover, we perform detailed ablation and sensitivity studies to evaluate the impact of different aspects of our proposed solution. This research contributes to the growing body of literature in affective computing with a focus on cognitive load, and opens up new avenues for future research in the field of cross-domain transfer learning using self-supervised pre-training.
    摘要 它们的诠释:词语翻译:认知负担(Cognitive Load)在完成任务和决策过程中扮演着重要的角色,因此其分类和分析变得非常重要。在这篇论文中,我们提出了一种基于电энцефалографи(EEG)的认知负担分类方法。我们的模型采用了变换器架构,并使用了移植学习来连接情感和认知负担。我们在情感相关的EEG数据集上进行自主学习的掩码自动编码预训练,然后使用冻结权重和精度调整来执行下游认知负担分类。为了评估我们的方法,我们在两个公共可用的EEG基于情感数据集上进行预训练,而用CL-Drive数据集进行下游认知负担分类。实验结果表明,我们的提议方法实现了出色的结果,并超过了传统的单阶段完全监督学习。此外,我们进行了详细的减少和敏感性研究,以评估不同方面的影响。这些研究对情感计算领域的发展做出了贡献,并开启了跨领域自动学习使用自主预训练的新途径。

The Hitchhiker’s Guide to Program Analysis: A Journey with Large Language Models

  • paper_url: http://arxiv.org/abs/2308.00245
  • repo_url: None
  • paper_authors: Haonan Li, Yu Hao, Yizhuo Zhai, Zhiyun Qian
  • for: This paper is written to explore the use of Large Language Models (LLMs) in assisting static analysis for identifying bugs in software systems.
  • methods: The paper proposes a fully automated agent called LLift, which interfaces with a static analysis tool and an LLM to overcome challenges in using LLMs for bug discovery.
  • results: The paper demonstrates the effectiveness of LLift in identifying potential use-before-initialization (UBI) bugs in a real-world scenario, with a high precision (50%) and recall rate (100%). Additionally, LLift identified 13 previously unknown UBI bugs in the Linux kernel.
    Abstract Static analysis is a widely used technique in software engineering for identifying and mitigating bugs. However, a significant hurdle lies in achieving a delicate balance between precision and scalability. Large Language Models (LLMs) offer a promising alternative, as recent advances demonstrate remarkable capabilities in comprehending, generating, and even debugging code. Yet, the logic of bugs can be complex and require sophisticated reasoning and a large analysis scope spanning multiple functions. Therefore, at this point, LLMs are better used in an assistive role to complement static analysis. In this paper, we take a deep dive into the open space of LLM-assisted static analysis, using use-before-initialization (UBI) bugs as a case study. To this end, we develop LLift, a fully automated agent that interfaces with both a static analysis tool and an LLM. By carefully designing the agent and the prompts, we are able to overcome a number of challenges, including bug-specific modeling, the large problem scope, the non-deterministic nature of LLMs, etc. Tested in a real-world scenario analyzing nearly a thousand potential UBI bugs produced by static analysis, LLift demonstrates an extremely potent capability, showcasing a high precision (50%) and recall rate (100%). It even identified 13 previously unknown UBI bugs in the Linux kernel. This research paves the way for new opportunities and methodologies in the use of LLMs for bug discovery in extensive, real-world datasets.
    摘要 Static 分析是软件工程中广泛使用的技术,用于发现和解决错误。然而,实现精准和可扩展性之间存在一定的挑战。大自然语言模型(LLM)提供了一个有前途的选择,因为最新的进步表明其在理解、生成和甚至调试代码方面具有惊人的能力。然而,错误的逻辑可能很复杂,需要卓越的推理和广泛的分析范围,因此在这点上,LLMs 更适合作为辅助工具来补充静态分析。在这篇论文中,我们深入探讨了使用 LLM 辅助静态分析的开放空间,使用 use-before-initialization(UBI)错误作为例子。为此,我们开发了 LLift,一个完全自动的代理人,它与静态分析工具和 LLM 集成。通过细心设计代理人和提示,我们成功解决了一些挑战,包括错误特定模型、大问题范围、非束定的 LLM 等。在实际场景中,LLift 对 nearly thousand 个 UBI 错误进行了分析,显示了非常高的精度(50%)和回归率(100%)。甚至找到了 13 个前不知道的 UBI 错误在 Linux 内核中。这项研究开创了新的机遇和方法在使用 LLM 进行错误发现的大规模、真实世界数据中。

ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks

  • paper_url: http://arxiv.org/abs/2308.01423
  • repo_url: https://github.com/yeonghun1675/chatmof
  • paper_authors: Yeonghun Kang, Jihan Kim
  • for: 这个论文是为了探讨和开发一个基于自然语言处理的金属有机框架预测和生成系统(ChatMOF)。
  • methods: 该系统使用了一个大规模语言模型(gpt-3.5-turbo),从文本输入中提取关键信息并提供相应的回答,从而消除了僵化的结构化查询的需求。系统由三个核心组件(代理、工具包和评估器)组成,实现了许多任务,包括数据检索、性能预测和结构生成。
  • results: 研究发现,使用大规模语言模型AI系统在材料科学领域可以提供优秀的预测和生成功能,并且具有可观的可Transformative potential。
    Abstract ChatMOF is an autonomous Artificial Intelligence (AI) system that is built to predict and generate of metal-organic frameworks (MOFs). By leveraging a large-scale language model (gpt-3.5-turbo), ChatMOF extracts key details from textual inputs and delivers appropriate responses, thus eliminating the necessity for rigid structured queries. The system is comprised of three core components (i.e. an agent, a toolkit, and an evaluator) and it forms a robust pipeline that manages a variety of tasks, including data retrieval, property prediction, and structure generation. The study further explores the merits and constraints of using large language models (LLMs) AI system in material sciences using and showcases its transformative potential for future advancements.
    摘要 chatMOF是一个自主的人工智能系统,用于预测和生成金刚烷金刚烷框架(MOFs)。通过利用大规模语言模型(gpt-3.5-turbo),chatMOF从文本输入中提取关键信息并提供相应的回答,因此消除了僵化的结构化查询的需求。系统由三个核心组件(即代理、工具包和评估器)组成,形成一个完整的管道,可以处理多种任务,包括数据 retrieve、属性预测和结构生成。研究还探讨了使用大语言模型(LLMs)AI系统在材料科学中的优劣和限制,并展示了其在未来发展中的Transformative潜力。

Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2308.00231
  • repo_url: None
  • paper_authors: Sadhana Lolla, Iaroslav Elistratov, Alejandro Perez, Elaheh Ahmadi, Daniela Rus, Alexander Amini
  • for: 本研究旨在提供一个扩展模型的框架,以提高深度神经网络(NN)的风险意识。
  • methods: 本研究使用的方法包括多种风险量化方法,包括 aleatoric uncertainty、epistemic uncertainty 和 bias estimation。这些方法可以独立或相互组合使用,以提供全面的风险意识。
  • results: 研究表明,使用 capsa 框架可以轻松地组合不同的风险量化方法,并在复杂的感知 datasets 上进行测试。 results 显示,capsa 框架可以提供全面的风险意识,并且可以轻松地扩展到不同的应用场景。
    Abstract The modern pervasiveness of large-scale deep neural networks (NNs) is driven by their extraordinary performance on complex problems but is also plagued by their sudden, unexpected, and often catastrophic failures, particularly on challenging scenarios. Existing algorithms that provide risk-awareness to NNs are complex and ad-hoc. Specifically, these methods require significant engineering changes, are often developed only for particular settings, and are not easily composable. Here we present capsa, a framework for extending models with risk-awareness. Capsa provides a methodology for quantifying multiple forms of risk and composing different algorithms together to quantify different risk metrics in parallel. We validate capsa by implementing state-of-the-art uncertainty estimation algorithms within the capsa framework and benchmarking them on complex perception datasets. We demonstrate capsa's ability to easily compose aleatoric uncertainty, epistemic uncertainty, and bias estimation together in a single procedure, and show how this approach provides a comprehensive awareness of NN risk.
    摘要 现代大规模深度神经网络(NN)的普遍性受到了它在复杂问题上的极高性能的推动,然而它也面临着突然、不可预期和有时候可致命的失败问题,尤其是在复杂的场景下。现有的风险意识提供方法是复杂且尝试性的,它们需要大量的工程改进,通常只适用于特定的设置,并且不易组合。我们现在提出了 capsa 框架,用于延展模型中的风险意识。capsa 提供了评估多种风险形式的方法和组合不同风险度量计算的能力。我们通过在 capsa 框架中实现现状的不确定性估计算算法,并将其分类 onto 复杂的感知数据集进行了验证。我们示出了 capsa 可以轻松地组合 aleatoric 不确定性、epistemic 不确定性和偏见估计 together 在单个过程中,并证明了这种方法可以提供全面的 NN 风险意识。

Experiments on Generative AI-Powered Parametric Modeling and BIM for Architectural Design

  • paper_url: http://arxiv.org/abs/2308.00227
  • repo_url: None
  • paper_authors: Jaechang Ko, John Ajibefun, Wei Yan
  • for: 该论文提出了一种新的建筑设计框架,利用生成AI工具包括ChatGPT和Veras,并结合参数化模型和建筑信息模型(BIM),以提高设计过程。
  • methods: 该研究利用了ChatGPT和生成AI在3D建筑设计中的潜在力,超出了其在文本和2D图像生成中的使用范围。
  • results: 提出的框架可以增强建筑师和AI之间的合作,快速探索设计想法,生成具有上下文敏感性和创新性的设计生成。通过将ChatGPT用于脚本编写和Veras用于生成设计想法,并与广泛使用的参数化模型和BIM工具集成,该框架为建筑师提供了直观和强大的方法来传达设计意图,从而提高设计效率、创新性和合作性。
    Abstract This paper introduces a new architectural design framework that utilizes generative AI tools including ChatGPT and Veras with parametric modeling and Building Information Modeling (BIM) to enhance the design process. The study experiments with the potential of ChatGPT and generative AI in 3D architectural design, extending beyond its use in text and 2D image generation. The proposed framework promotes collaboration between architects and AI, facilitating a quick exploration of design ideas and producing context-sensitive, creative design generation. By integrating ChatGPT for scripting and Veras for generating design ideas with widely used parametric modeling and BIM tools, the framework provides architects with an intuitive and powerful method to convey design intent, leading to more efficient, creative, and collaborative design processes.
    摘要 这篇论文提出了一种新的建筑设计框架,利用生成AI工具,包括ChatGPT和Veras,并与参数化建筑模型和建筑信息模型(BIM)结合,以提高设计过程。该研究对生成AI在3D建筑设计中的潜力进行了实验,超出了其在文本和2D图像生成领域的使用范围。提出的框架旨在促进建筑师和AI之间的合作,快速探索设计想法,生成受Contextsensitive的、创新的设计生成。通过将ChatGPT用于脚本和Veras用于生成设计想法,并与常用的参数化建筑模型和BIM工具结合,该框架为建筑师提供了直观和强大的方法,以达到更高效、更创新、更合作的设计过程。

Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias

  • paper_url: http://arxiv.org/abs/2308.00225
  • repo_url: None
  • paper_authors: Itay Itzhak, Gabriel Stanovsky, Nir Rosenfeld, Yonatan Belinkov
  • for: 本研究旨在探讨大语言模型(LM)在受到人类反馈的指导下进行调整后,是否会受到更多的偏见影响。
  • methods: 本研究使用了多种方法来检测大语言模型中的偏见,包括权重调整、排名预测和人类反馈等。
  • results: 研究发现,经过受到人类反馈的调整后,大语言模型中的偏见呈现更加明显,尤其是在三种常见的偏见中,即套利效应、信心效应和信仰偏见中。这些偏见在不同的模型中显得更加明显,特别是在受到人类反馈的模型中,如Flant-T5、GPT3.5和GPT4等。这些发现提供了对instruction-tuned LMs中偏见的更深入的理解,这将对于开发更加可靠和无偏的语言模型是非常重要。
    Abstract Recent studies show that instruction tuning and learning from human feedback improve the abilities of large language models (LMs) dramatically. While these tuning methods can make models generate high-quality text, we conjecture that more implicit cognitive biases may arise in these fine-tuned models. Our work provides evidence that these fine-tuned models exhibit biases that were absent or less pronounced in their pretrained predecessors. We examine the extent of this phenomenon in three cognitive biases - the decoy effect, the certainty effect, and the belief bias - all of which are known to influence human decision-making and reasoning. Our findings highlight the presence of these biases in various models, especially those that have undergone instruction tuning, such as Flan-T5, GPT3.5, and GPT4. This research constitutes a step toward comprehending cognitive biases in instruction-tuned LMs, which is crucial for the development of more reliable and unbiased language models.
    摘要 Recent studies show that 教程调整和学习人类反馈可以大幅提高大语言模型(LM)的能力。然而,我们 conjecture 这些调整方法可能会使模型生成高质量的文本中潜藏更多的隐性认知偏见。我们的研究发现,经过调整后的模型具有先前缺 absent 或较弱的偏见。我们对三种认知偏见——杂谱效应、确定性效应和信念偏见——进行了调查,这些偏见都是影响人类决策和思维的known。我们发现,特别是经过 instrucion tuning 的模型,如 Flan-T5、GPT3.5 和 GPT4,具有这些偏见。这项研究为了理解 instruction-tuned LM 中的认知偏见,是对更可靠和不偏的语言模型发展的重要一步。

Advancing Beyond Identification: Multi-bit Watermark for Language Models

  • paper_url: http://arxiv.org/abs/2308.00221
  • repo_url: None
  • paper_authors: KiYoon Yoo, Wonhyuk Ahn, Nojun Kwak
  • for: 这项研究旨在预防大语言模型的滥用,不仅是机器生成文本的识别。
  • methods: 该研究提出了一种名为“多位色列标”(COLOR)的技术,在语言模型生成过程中嵌入可追溯的多位信息。
  • results: 验证性实验表明,COLOR可以在500个符号左右的中等长度文本中成功嵌入32位消息,准确率达91.9%。这项工作为抗language模型滥用提供了新的策略。
    Abstract This study aims to proactively tackle misuse of large language models beyond identification of machine-generated text. While existing methods focus on detection, some malicious misuses demand tracing the adversary user for counteracting them. To address this, we propose "Multi-bit Watermark through Color-listing" (COLOR), embedding traceable multi-bit information during language model generation. Leveraging the benefits of zero-bit watermarking (Kirchenbauer et al., 2023a), COLOR enables extraction without model access, on-the-fly embedding, and maintains text quality, while allowing zero-bit detection all at the same time. Preliminary experiments demonstrates successful embedding of 32-bit messages with 91.9% accuracy in moderate-length texts ($\sim$500 tokens). This work advances strategies to counter language model misuse effectively.
    摘要

Deep Reinforcement Learning-Based Battery Conditioning Hierarchical V2G Coordination for Multi-Stakeholder Benefits

  • paper_url: http://arxiv.org/abs/2308.00218
  • repo_url: None
  • paper_authors: Yubao Zhang, Xin Chen, Yi Gu, Zhicheng Li, Wu Kai
  • for: 提高可再生能源利用率和电网稳定性,适用于大规模电动车充电 scheduling 策略。
  • methods: 基于深度强化学习(DRL)和证明权益算法,实现多方参与者协调。
  • results: 相比四种基准方案,提高可再生能源消耗率、缓和负荷波动、满足电动车充电需求,降低充电成本和电池衰竭。
    Abstract With the growing prevalence of electric vehicles (EVs) and advancements in EV electronics, vehicle-to-grid (V2G) techniques and large-scale scheduling strategies have emerged to promote renewable energy utilization and power grid stability. This study proposes a multi-stakeholder hierarchical V2G coordination based on deep reinforcement learning (DRL) and the Proof of Stake algorithm. Furthermore, the multi-stakeholders include the power grid, EV aggregators (EVAs), and users, and the proposed strategy can achieve multi-stakeholder benefits. On the grid side, load fluctuations and renewable energy consumption are considered, while on the EVA side, energy constraints and charging costs are considered. The three critical battery conditioning parameters of battery SOX are considered on the user side, including state of charge, state of power, and state of health. Compared with four typical baselines, the multi-stakeholder hierarchical coordination strategy can enhance renewable energy consumption, mitigate load fluctuations, meet the energy demands of EVA, and reduce charging costs and battery degradation under realistic operating conditions.
    摘要 On the grid side, the approach considers load fluctuations and renewable energy consumption, while on the EVA side, it considers energy constraints and charging costs. For users, the approach takes into account three critical battery conditioning parameters: state of charge, state of power, and state of health.Compared to four typical baselines, the multi-stakeholder hierarchical coordination strategy can increase renewable energy consumption, mitigate load fluctuations, meet the energy demands of EVAs, and reduce charging costs and battery degradation under realistic operating conditions.

Performance Evaluation of Swin Vision Transformer Model using Gradient Accumulation Optimization Technique

  • paper_url: http://arxiv.org/abs/2308.00197
  • repo_url: None
  • paper_authors: Sanad Aburass, Osama Dorgham
  • for: 评估 Swin Transformers 模型使用 gradient accumulation optimization(GAO)技术的性能和训练时间影响。
  • methods: 使用 GAO 技术对 Swin ViT 模型进行优化。
  • results: GAO 技术对 Swin ViT 模型的精度有显著下降,训练时间增加显著。这些结果表明在 Swin ViT 模型中使用 GAO 技术可能不太适用,需要谨慎使用。
    Abstract Vision Transformers (ViTs) have emerged as a promising approach for visual recognition tasks, revolutionizing the field by leveraging the power of transformer-based architectures. Among the various ViT models, Swin Transformers have gained considerable attention due to their hierarchical design and ability to capture both local and global visual features effectively. This paper evaluates the performance of Swin ViT model using gradient accumulation optimization (GAO) technique. We investigate the impact of gradient accumulation optimization technique on the model's accuracy and training time. Our experiments show that applying the GAO technique leads to a significant decrease in the accuracy of the Swin ViT model, compared to the standard Swin Transformer model. Moreover, we detect a significant increase in the training time of the Swin ViT model when GAO model is applied. These findings suggest that applying the GAO technique may not be suitable for the Swin ViT model, and concern should be undertaken when using GAO technique for other transformer-based models.
    摘要 视力变换器(ViT)技术在视觉识别任务中表现出色,革命化了该领域,通过利用变换器结构的力量。 Among the various ViT models, Swin Transformers have gained considerable attention due to their hierarchical design and ability to capture both local and global visual features effectively. This paper evaluates the performance of Swin ViT model using gradient accumulation optimization (GAO) technique. We investigate the impact of gradient accumulation optimization technique on the model's accuracy and training time. Our experiments show that applying the GAO technique leads to a significant decrease in the accuracy of the Swin ViT model, compared to the standard Swin Transformer model. Moreover, we detect a significant increase in the training time of the Swin ViT model when GAO model is applied. These findings suggest that applying the GAO technique may not be suitable for the Swin ViT model, and concern should be undertaken when using GAO technique for other transformer-based models.Note: Please note that the translation is in Simplified Chinese, which is one of the two standard Chinese languages used in mainland China. If you need the translation in Traditional Chinese, please let me know.

Analytical Techniques to Support Hospital Case Mix Planning

  • paper_url: http://arxiv.org/abs/2308.07323
  • repo_url: None
  • paper_authors: Robert L Burdett, Paul corry, David Cook, Prasad Yarlagadda
  • for: 本文旨在提供价值评估和决策支持工具,以支持医院资源评估和案例混合规划(CMP)方法。
  • methods: 本文提出了一种优化模型,用于分析修改现有案例混合的影响。该模型可以根据医院资源可用性水平的变化,对其他患者类型进行修改。此外,本文还提出了多目标决策技术,用于比较和评估竞争性案例混合解决方案。
  • results: 本文的提出的技术可以帮助医院管理者更好地了解医院资源的使用情况,并提供更多的情况情况。这些技术还可以帮助医院管理者根据医院资源的限制,选择最佳的案例混合方案。
    Abstract This article introduces analytical techniques and a decision support tool to support capacity assessment and case mix planning (CMP) approaches previously created for hospitals. First, an optimization model is proposed to analyse the impact of making a change to an existing case mix. This model identifies how other patient types should be altered proportionately to the changing levels of hospital resource availability. Then we propose multi-objective decision-making techniques to compare and critique competing case mix solutions obtained. The proposed techniques are embedded seamlessly within an Excel Visual Basic for Applications (VBA) personal decision support tool (PDST), for performing informative quantitative assessments of hospital capacity. The PDST reports informative metrics of difference and reports the impact of case mix modifications on the other types of patient present. The techniques developed in this article provide a bridge between theory and practice that is currently missing and provides further situational awareness around hospital capacity.
    摘要

Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?

  • paper_url: http://arxiv.org/abs/2308.00189
  • repo_url: None
  • paper_authors: Ari Holtzman, Peter West, Luke Zettlemoyer
  • for: 本研究旨在解释语言模型完成多个任务的行为,以帮助未来的研究人员更好地理解和预测语言模型的行为。
  • methods: 本研究使用了系统性的分析方法,将语言模型的行为 decomposed into 多个类别,以便更好地理解cross-task表现。
  • results: 研究发现,语言模型的行为可以分为多个类别,包括表示、生成和逻辑等类别,这些类别之间存在着较强的相互关系。这些结果可以帮助未来的研究人员更好地理解和预测语言模型的行为,以便更好地应用语言模型。
    Abstract Coaxing out desired behavior from pretrained models, while avoiding undesirable ones, has redefined NLP and is reshaping how we interact with computers. What was once a scientific engineering discipline-in which building blocks are stacked one on top of the other-is arguably already a complex systems science, in which emergent behaviors are sought out to support previously unimagined use cases. Despite the ever increasing number of benchmarks that measure task performance, we lack explanations of what behaviors language models exhibit that allow them to complete these tasks in the first place. We argue for a systematic effort to decompose language model behavior into categories that explain cross-task performance, to guide mechanistic explanations and help future-proof analytic research.
    摘要 使已经训练过的模型展现所需的行为,而不是不良的行为,已经重定义了自然语言处理(NLP)领域,并在我们与计算机之间的交互方式中发生了重大变革。以前,建构模型的工程学是一个科学工程领域,在其中建构块一个接一个排列在上面。然而,现在可能已经是一种复杂系统科学,在寻找emergent行为以支持前所未想到的用 caso。尽管任务性能的测试benchmark数量在不断增加,但我们仍然缺乏 Completing these tasks in the first place. We argue for a systematic effort to decompose language model behavior into categories that explain cross-task performance, to guide mechanistic explanations and help future-proof analytic research.

Multicriteria Optimization Techniques for Understanding the Case Mix Landscape of a Hospital

  • paper_url: http://arxiv.org/abs/2308.07322
  • repo_url: None
  • paper_authors: Robert L Burdett, Paul Corry, Prasad Yarlagadda, David Cook, Sean Birgan
  • for: 本文针对医院内不同病例混合(case mix)的影响进行研究,以提高医院的质量和效率。
  • methods: 本文提出了一个改进了多 критери维优化(MCO)方法,并使用了平行减少条件法(ECM)和KD-Trees来生成更多的非主对称(Pareto优)病例混合。
  • results: 本文获得了一个更好的非主对称病例混合archive,并提出了一个适合决策支持工具(DST)来生成、检视、导航和查询这个archive。
    Abstract Various medical and surgical units operate in a typical hospital and to treat their patients these units compete for infrastructure like operating rooms (OR) and ward beds. How that competition is regulated affects the capacity and output of a hospital. This article considers the impact of treating different patient case mix (PCM) in a hospital. As each case mix has an economic consequence and a unique profile of hospital resource usage, this consideration is important. To better understand the case mix landscape and to identify those which are optimal from a capacity utilisation perspective, an improved multicriteria optimization (MCO) approach is proposed. As there are many patient types in a typical hospital, the task of generating an archive of non-dominated (i.e., Pareto optimal) case mix is computationally challenging. To generate a better archive, an improved parallelised epsilon constraint method (ECM) is introduced. Our parallel random corrective approach is significantly faster than prior methods and is not restricted to evaluating points on a structured uniform mesh. As such we can generate more solutions. The application of KD-Trees is another new contribution. We use them to perform proximity testing and to store the high dimensional Pareto frontier (PF). For generating, viewing, navigating, and querying an archive, the development of a suitable decision support tool (DST) is proposed and demonstrated.
    摘要 医院内的不同医疗和手术单位需要共享设备,如操作室(OR)和病房床位,这些设备的竞争会影响医院的负荷和产量。这篇文章考虑了医院内不同患者案例混合(PCM)的影响。每个案例混合都有经济影响和医院资源的唯一性使用特征,因此这种考虑是重要的。为更好地了解案例混合风景,并确定最佳的负荷利用情况,本文提出了改进的多 criterion 优化(MCO)方法。由于医院内通常有很多种类的患者,生成完整的非束定(i.e., Pareto优)案例混合的任务是计算复杂的。为了生成更好的案例混合,本文引入了改进的并行epsilon constraint方法(ECM)。我们的并行随机修正方法比之前的方法更快速,并不 Restricted to评估点structured uniform mesh。因此,我们可以生成更多的解。另外,我们使用KD-Trees来进行 proximity testing和存储高维度Pareto前沿(PF)。为生成、查看、导航和查询案例混合 archive,我们提出了一种适用的决策支持工具(DST),并进行了示例。

The Efficacy of Utility Functions for Multicriteria Hospital Case-Mix Planning

  • paper_url: http://arxiv.org/abs/2308.07321
  • repo_url: None
  • paper_authors: Robert L Burdett, Paul Corry, Prasad Yarlagadda, David Cook, Sean Birgan
  • for: 本研究的目的是发展一种新的医院案例混合规划方法,以满足不同决策者的 preference和需求。
  • methods: 本研究使用了 utility functions(UF)来表达不同决策者对输出的偏好和看法。 UF 被 scalarization 以生成一个量化的评价方法,以分配医院资源到不同的运营单元,并提供更好的容量分配和案例混合。
  • results: 研究发现,UF 方法可以评价医院不同决策者的评价标准,并且可以捕捉医院不同目标和对输出的不同需求。 此外,UF 方法还可以提供对输出的敏感分析,帮助医院管理者更好地理解和评价不同案例的需求和优先级。
    Abstract A new approach to perform hospital case-mix planning (CMP) is introduced in this article. Our multi-criteria approach utilises utility functions (UF) to articulate the preferences and standpoint of independent decision makers regarding outputs. The primary aim of this article is to test whether a utility functions method (UFM) based upon the scalarization of aforesaid UF is an appropriate quantitative technique to, i) distribute hospital resources to different operating units, and ii) provide a better capacity allocation and case mix. Our approach is motivated by the need to provide a method able to evaluate the trade-off between different stakeholders and objectives of hospitals. To the best of our knowledge, no such approach has been considered before in the literature. As we will later show, this idea addresses various technical limitations, weaknesses, and flaws in current CMP. The efficacy of the aforesaid approach is tested on a case study of a large tertiary hospital. Currently UF are not used by hospital managers, and real functions are unavailable, hence, 14 rational options are tested. Our exploratory analysis has provided important guidelines for the application of these UF. It indicates that these UF provide a valuable starting point for planners, managers, and executives of hospitals to impose their goals and aspirations. In conclusion, our approach may be better at identifying case mix that users want to treat and seems more capable of modelling the varying importance of different levels of output. Apart from finding desirable case mixes to consider, the approach can provide important insights via a sensitivity analysis of the parameters of each UF.
    摘要 本文介绍了一种新的医院案例混合规划(CMP)方法。我们的多 criterion 方法利用了用户函数(UF)来表达各独立决策者对输出的偏好和观点。本文的主要目标是测试 Whether a utility functions method(UFM)基于上述 UF 的权值函数是一种适当的量化技术,以分配医院资源到不同的运营部门,并提供更好的容量分配和案例混合。我们的方法是由医院的不同参与者和目标所驱动的,以解决现有 CMP 中的技术限制、弱点和缺陷。根据我们的研究,这种想法在现有 CMP 中未有所考虑。我们的案例研究表明,这种方法可以更好地评估医院的资源分配和案例混合,并提供关键的参与者和目标的指导。在本文中,我们使用了14种理性选项来测试 UF。我们的探索分析表明,这些 UF 提供了一个有价值的起点,可以帮助医院规划者、管理者和执行者实现他们的目标和 aspirations。结论是,我们的方法可能更好地认知用户想要治疗的案例混合,并且更能模拟不同输出水平的重要性。除了找到愿意考虑的案例混合外,我们的方法还可以提供重要的参与者和目标的指导,以及敏感分析每个 UF 的参数。

Attribution-Scores in Data Management and Explainable Machine Learning

  • paper_url: http://arxiv.org/abs/2308.00184
  • repo_url: None
  • paper_authors: Leopoldo Bertossi
  • for: 本研究探讨了使用实际 causality 定义责任分数的最新研究,用于解释数据库中的查询结果,以及机器学习中的分类模型的结果。
  • methods: 本研究使用了实际 causality 定义责任分数,并与数据库修复相连接,以获得数据库的一致性量化度量。在机器学习中,责任分数得到了正确扩展和解释。此外,本研究还研究了 Shap-score 的有效计算方法。
  • results: 本研究的结果表明,使用实际 causality 定义责任分数可以帮助解释数据库中的查询结果和机器学习中的分类结果。此外,本研究还提供了一种有效的 Shap-score 计算方法。
    Abstract We describe recent research on the use of actual causality in the definition of responsibility scores as explanations for query answers in databases, and for outcomes from classification models in machine learning. In the case of databases, useful connections with database repairs are illustrated and exploited. Repairs are also used to give a quantitative measure of the consistency of a database. For classification models, the responsibility score is properly extended and illustrated. The efficient computation of Shap-score is also analyzed and discussed. The emphasis is placed on work done by the author and collaborators.
    摘要 我们描述了最近的研究,把实际 causality 用于责任分数的定义,以解释查询结果存储在数据库中,以及机器学习模型中的结果。在数据库方面,我们 illustrate 了有用的连接,并利用了数据库修复。修复还可以给出数据库的一个量化度量。在机器学习模型方面,责任分数得到了正确的扩展和图示。我们还分析了计算 Shap-score 的高效方法。我们强调了作者和合作者的工作。

Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity

  • paper_url: http://arxiv.org/abs/2308.00177
  • repo_url: None
  • paper_authors: Charlie Hou, Kiran Koshy Thekumparampil, Michael Shavlovsky, Giulia Fanti, Yesh Dattatreya, Sujay Sanghavi
  • for: 本研究旨在检验是否可以通过无监督预训练提高学习排序(LTR)问题的性能,并与基本监督学习模型(GBDT)和其他非预训练模型进行比较。
  • methods: 本研究使用了一些简单的设计选择,包括 SimCLR-Rank,我们对 SimCLR(一种无监督预训练方法)进行了修改,以生成预训练的深度学习模型。
  • results: 研究结果表明,预训练模型可以将 GBDT 和其他非预训练模型Soundly 超越,特别是在 labels 数量相对较多的情况下。此外,预训练模型还可以在排序异常数据时表现更好的Robustness 性能。
    Abstract While deep learning (DL) models are state-of-the-art in text and image domains, they have not yet consistently outperformed Gradient Boosted Decision Trees (GBDTs) on tabular Learning-To-Rank (LTR) problems. Most of the recent performance gains attained by DL models in text and image tasks have used unsupervised pretraining, which exploits orders of magnitude more unlabeled data than labeled data. To the best of our knowledge, unsupervised pretraining has not been applied to the LTR problem, which often produces vast amounts of unlabeled data. In this work, we study whether unsupervised pretraining can improve LTR performance over GBDTs and other non-pretrained models. Using simple design choices--including SimCLR-Rank, our ranking-specific modification of SimCLR (an unsupervised pretraining method for images)--we produce pretrained deep learning models that soundly outperform GBDTs (and other non-pretrained models) in the case where labeled data is vastly outnumbered by unlabeled data. We also show that pretrained models also often achieve significantly better robustness than non-pretrained models (GBDTs or DL models) in ranking outlier data.
    摘要 深度学习(DL)模型在文本和图像领域是当前最佳性的,但它们没有一直Consistently exceeded Gradient Boosted Decision Trees(GBDTs)在表格学习到排名(LTR)问题上。大多数最近的DL模型在文本和图像任务中的性能提升都是通过预训练来实现,这些预训练使用了几个数量级的无标签数据。据我们所知,预训练没有被应用于LTR问题,这个问题通常会生成巨量的无标签数据。在这个工作中,我们研究了是否可以通过预训练提高LTR性能,并与GBDTs和其他非预训练模型进行比较。使用简单的设计选择(包括我们的SimCLR-Rank修改),我们生成了在标签数据被很多的情况下,深度学习模型与GBDTs和其他非预训练模型相比,具有显著的性能优势。我们还发现预训练模型通常在排名异常数据时也具有更好的robustness性。

  • paper_url: http://arxiv.org/abs/2308.00165
  • repo_url: None
  • paper_authors: Rohit Raj, V Susheela Devi
  • for: 法律判断预测系统的研究,旨在预测法律案件的结果基于案件的事实描述。
  • methods: 这些研究使用自然语言处理(NLP)技术预测法律判断结果基于事实描述。
  • results: 我们的方法在三个法律数据集上进行了广泛的实验,并显示了对抗攻击的显著改进,比前一代法律判断预测系统更加可靠。
    Abstract Legal judgment prediction is the task of predicting the outcome of court cases on a given text description of facts of cases. These tasks apply Natural Language Processing (NLP) techniques to predict legal judgment results based on facts. Recently, large-scale public datasets and NLP models have increased research in areas related to legal judgment prediction systems. For such systems to be practically helpful, they should be robust from adversarial attacks. Previous works mainly focus on making a neural legal judgement system; however, significantly less or no attention has been given to creating a robust Legal Judgement Prediction(LJP) system. We implemented adversarial attacks on early existing LJP systems and found that none of them could handle attacks. In this work, we proposed an approach for making robust LJP systems. Extensive experiments on three legal datasets show significant improvements in our approach over the state-of-the-art LJP system in handling adversarial attacks. To the best of our knowledge, we are the first to increase the robustness of early-existing LJP systems.
    摘要 法律判断预测是指根据案例事实文本预测法律判断结果。这些任务应用自然语言处理(NLP)技术来预测法律判断结果基于事实。最近,大规模公共数据集和NLP模型的提高了有关法律判断预测系统的研究。为了使这些系统在实际中有用,它们应该是Robust against adversarial attacks。以前的工作主要集中在建立神经法律判断系统;然而,Significantly less or no attention has been given to create a robust Legal Judgement Prediction(LJP) system。我们对早期的LJP系统进行了抗击攻击,并发现其无法抵抗攻击。在这种情况下,我们提出了一种方法来提高LJP系统的Robustness。广泛的实验表明,我们的方法在三个法律数据集上显著提高了对抗攻击的能力,与现有的LJP系统相比。到目前为止,我们是第一个提高早期LJP系统的Robustness。

Predicting Perfect Quality Segments in MT Output with Fine-Tuned OpenAI LLM: Is it possible to capture editing distance patterns from historical data?

  • paper_url: http://arxiv.org/abs/2308.00158
  • repo_url: None
  • paper_authors: Serge Gladkoff, Gleb Erofeev, Lifeng Han, Goran Nenadic
  • for: 本研究旨在检验现有大语言模型(LLM)是否可以用于精度评估翻译(TQE)任务,以及其能力。
  • methods: 我们使用了ChatGPT作为示例,将TQE作为二分类任务进行 fine-tuning。使用了英语到意大利语、德语、法语、日语、荷语、葡萄牙语、土耳其语和中文等八种语言对training corpora进行训练。
  • results: 我们的实验结果显示,通过API进行 fine-tuning的ChatGPT可以在预测翻译质量方面达到相对高的分数,例如英语-意大利语和英语-德语的分数分别是82.42%和83.69%。但是,模型准确率还有很大空间提高。
    Abstract Translation Quality Estimation (TQE) is an essential step before deploying the output translation into usage. TQE is also critical in assessing machine translation (MT) and human translation (HT) quality without seeing the reference translations. This work examines whether the state-of-the-art large language models (LLMs) can be fine-tuned for the TQE task and their capability. We take ChatGPT as one example and approach TQE as a binary classification task. Using \textbf{eight language pairs} including English to Italian, German, French, Japanese, Dutch, Portuguese, Turkish, and Chinese training corpora, our experimental results show that fine-tuned ChatGPT via its API can achieve a relatively high score on predicting translation quality, i.e. \textit{if the translation needs to be edited}. However, there is definitely much space to improve the model accuracy, e.g. they are 82.42\% and 83.69\% for English-Italian and English-German respectively using our experimental settings. English-Italiano bilingual Abstract is available in the paper.
    摘要 tranlation Quality Estimation (TQE) 是一个非常重要的步骤在部署输出翻译之前。 TQE 也是评估机器翻译 (MT) 和人工翻译 (HT) 质量的关键步骤,无需看到参考翻译。这项工作检验了现代大语言模型 (LLM) 是否可以用于 TQE 任务,以及其能力。我们使用 ChatGPT 作为一个例子,将 TQE 视为二分类问题。使用包括英语到意大利语、德语、法语、日语、荷兰语、葡萄牙语、土耳其语和中文的训练集,我们的实验结果显示,经过 ChatGPT 的 API 微调,可以达到一定的高分数,即确定翻译是否需要编辑。然而,模型准确率仍然有很大的提升空间,例如英语-意大利语和英语-德语的准确率分别为 82.42% 和 83.69%。英语-意大利语双语摘要在论文中可用。

Formally Explaining Neural Networks within Reactive Systems

  • paper_url: http://arxiv.org/abs/2308.00143
  • repo_url: None
  • paper_authors: Shahaf Bassan, Guy Amir, Davide Corsi, Idan Refaeli, Guy Katz
  • for: 这个论文的目的是解释深度神经网络(DNN)控制的反应系统中的行为,以便提高系统的透明度和可信度。
  • methods: 这篇论文提出了一种基于验证的可解释AI技术,可以帮助找到DNN控制系统中的输入特征,并且提供了有效的计算方法来减少搜索空间。
  • results: 这篇论文在两个popular benchmark上进行了评估,并证明了其方法可以效率地计算出最小和最小的解释,超过了现有技术的性能。此外,论文还表明了其方法生成的正式解释比非验证基于AI技术更可靠。
    Abstract Deep neural networks (DNNs) are increasingly being used as controllers in reactive systems. However, DNNs are highly opaque, which renders it difficult to explain and justify their actions. To mitigate this issue, there has been a surge of interest in explainable AI (XAI) techniques, capable of pinpointing the input features that caused the DNN to act as it did. Existing XAI techniques typically face two limitations: (i) they are heuristic, and do not provide formal guarantees that the explanations are correct; and (ii) they often apply to ``one-shot'' systems, where the DNN is invoked independently of past invocations, as opposed to reactive systems. Here, we begin bridging this gap, and propose a formal DNN-verification-based XAI technique for reasoning about multi-step, reactive systems. We suggest methods for efficiently calculating succinct explanations, by exploiting the system's transition constraints in order to curtail the search space explored by the underlying verifier. We evaluate our approach on two popular benchmarks from the domain of automated navigation; and observe that our methods allow the efficient computation of minimal and minimum explanations, significantly outperforming the state of the art. We also demonstrate that our methods produce formal explanations that are more reliable than competing, non-verification-based XAI techniques.
    摘要 Here, we aim to bridge this gap and propose a formal DNN-verification-based XAI technique for reasoning about multi-step, reactive systems. Our approach leverages the system's transition constraints to efficiently calculate succinct explanations. We evaluate our method on two popular benchmarks from the domain of automated navigation and observe that our approach can efficiently compute minimal and minimum explanations, significantly outperforming the state of the art. We also demonstrate that our methods produce formal explanations that are more reliable than competing, non-verification-based XAI techniques.我们使用的 DNN 是响应系统中的控制器,但 DNN 却具有高度透明性,使得解释和证明其行为具有困难。为解决这个问题,有一种强大的兴趣在解释 AI 技术(XAI)中,可以 pinpoint DNN 的输入特征,并且提供正式的 garanties。现有的 XAI 技术有两种限制:(i)它们是规则的,无法提供正式的 garanties,和(ii)它们通常适用于独立的一个执行,而不是响应系统。在这里,我们希望bridging这个差距,并提出一种基于 DNN 验证的 XAI 技术,用于理解多步、响应系统。我们利用系统的转移约束,以便高效地计算简短的解释。我们在两个popular的自动导航 benchmark 上评估了我们的方法,并观察到我们的方法可以高效地计算最小和最小的解释,与当前最佳性能相比较高。我们还证明了我们的方法生成的正式解释,比非验证基于 XAI 技术更可靠。

A Suite of Fairness Datasets for Tabular Classification

  • paper_url: http://arxiv.org/abs/2308.00133
  • repo_url: None
  • paper_authors: Martin Hirzel, Michael Feffer
  • for: 提高机器学习分类器的公平性
  • methods: 引入了一组函数来获取20个公平性数据集和相关的公平性metadata,以促进未来的公平性意识机器学习研究的更加严格的实验评估。
  • results: 未提出实验结果,主要是提供数据集和metadata供后续研究使用。
    Abstract There have been many papers with algorithms for improving fairness of machine-learning classifiers for tabular data. Unfortunately, most use only very few datasets for their experimental evaluation. We introduce a suite of functions for fetching 20 fairness datasets and providing associated fairness metadata. Hopefully, these will lead to more rigorous experimental evaluations in future fairness-aware machine learning research.
    摘要 有很多文献提出了机器学习分类器公平性的算法优化方法,但大多数只使用了很少的数据集进行实验评估。我们介绍了一个函数集,用于抓取20个公平性数据集和相关的公平性元数据。希望这些函数能够促进未来的公平性意识机器学习研究的更加严格的实验评估。Here's a breakdown of the translation:* "There have been many papers" becomes "有很多文献"* "with algorithms for improving fairness" becomes "提出了机器学习分类器公平性的算法优化方法"* "for tabular data" becomes " для标量数据"* "Unfortunately, most use only very few datasets" becomes "但大多数只使用了很少的数据集"* "for their experimental evaluation" becomes "进行实验评估"* "We introduce a suite of functions" becomes "我们介绍了一个函数集"* "for fetching 20 fairness datasets" becomes "用于抓取20个公平性数据集"* "and providing associated fairness metadata" becomes "并提供相关的公平性元数据"* "Hopefully, these will lead to more rigorous experimental evaluations" becomes "希望这些函数能够促进未来的公平性意识机器学习研究的更加严格的实验评估"

A Modular Ontology for MODS – Metadata Object Description Schema

  • paper_url: http://arxiv.org/abs/2308.00116
  • repo_url: None
  • paper_authors: Rushrukh Rayan, Cogan Shimizu, Heidi Sieverding, Pascal Hitzler
  • for: 该论文主要描述了Metadata Object Description Schema (MODS)的开发和改进。
  • methods: 该论文使用了Modular Ontology Design Methodology (MOMo)来设计了Modular MODS Ontology (MMODS-O),包含了所有MODS XML schema中的元素和属性。
  • results: 该论文通过对MODS XML schema进行修改和扩展,实现了更好的知识图数据模型化。
    Abstract The Metadata Object Description Schema (MODS) was developed to describe bibliographic concepts and metadata and is maintained by the Library of Congress. Its authoritative version is given as an XML schema based on an XML mindset which means that it has significant limitations for use in a knowledge graphs context. We have therefore developed the Modular MODS Ontology (MMODS-O) which incorporates all elements and attributes of the MODS XML schema. In designing the ontology, we adopt the recent Modular Ontology Design Methodology (MOMo) with the intention to strike a balance between modularity and quality ontology design on the one hand, and conservative backward compatibility with MODS on the other.
    摘要 MetadataObjectDescriptionSchema(MODS)是由美国国会图书馆开发的,用于描述文献元素和元数据。它的官方版本是基于XMLschema的XML思想,这意味着在知识图谱上使用它有一定的限制。我们因此开发了Modular MODS Ontology(MMODS-O),它包含了MODS XML schema中的所有元素和属性。在设计 ontology 时,我们采用了最近的Module Ontology Design Methodology(MOMo),以达到平衡模块性和质量ontology设计的目的,同时保持与MODS的保守兼容性。

Can A Single Human Supervise A Swarm of 100 Heterogeneous Robots?

  • paper_url: http://arxiv.org/abs/2308.00102
  • repo_url: None
  • paper_authors: Julie A. Adams, Joshua Hamell, Phillip Walker
  • for: 这个论文是为了检验人类单个人是否能够监督真正的多种机器人完成现实世界环境中的任务而写的。
  • methods: 这个论文使用了多种机器人群体的 Command and Control of Aggregate Swarm Tactics integrator 系统,并在美国陆军城市训练场地进行了实践。
  • results: 研究发现,人类单个人可以成功地监督100种不同机器人完成现实世界任务,但工作负担会经常超过额度。
    Abstract An open research question has been whether a single human can supervise a true heterogeneous swarm of robots completing tasks in real world environments. A general concern is whether or not the human's workload will be taxed to the breaking point. The Defense Advanced Research Projects Agency's OFFsensive Swarm-Enabled Tactics program's field exercises that occurred at U.S. Army urban training sites provided the opportunity to understand the impact of achieving such swarm deployments. The Command and Control of Aggregate Swarm Tactics integrator team's swarm commander users the heterogeneous robot swarm to conduct relevant missions. During the final OFFSET program field exercise, the team collected objective and subjective metrics related to teh swarm commander's human performance. A multi-dimensional workload algorithm that estimates overall workload based on five components of workload was used to analyze the results. While the swarm commander's workload estimate did cross the overload threshold frequently, the swarm commander was able to successfully complete the missions, often under challenging operational conditions. The presented results demonstrate that a single human can deploy a swarm of 100 heterogeneous robots to conduct real-world missions.
    摘要 一个打开的研究问题是可以让一个人监督真正多样化的机器人完成实际环境中的任务。一个普遍的问题是人类工作负担是否会被推到极限。美国国防高等研究计划署(DARPA)的OFFsensive Swarm-Enabled Tactics(OFFSET)项目的场地演练在美国陆军城市训练场地上进行了。 Command and Control of Aggregate Swarm Tactics(C2AST)团队的群组指挥官使用多样化机器人群组进行了相关的任务。在最后一次 OFFSET 项目场地演练中,团队收集了对群组指挥官的人类性能数据。使用多维度工作负担算法,分析结果表明,虽然群组指挥官的工作负担估计频繁超过了过载点,但群组指挥官仍然成功完成了任务,经常在具有挑战性的作战情况下。这些结果表明,一个人可以使用100个多样化机器人完成实际任务。

Towards Semantically Enriched Embeddings for Knowledge Graph Completion

  • paper_url: http://arxiv.org/abs/2308.00081
  • repo_url: None
  • paper_authors: Mehwish Alam, Frank van Harmelen, Maribel Acosta
  • for: 本文主要用于介绍基于嵌入的知识Graph(KG)完成算法的现状和未来发展方向。
  • methods: 本文详细介绍了基于KG的逻辑预测算法,包括推uctive和普适链预测算法,以及利用KG、语言模型(LLM)和描述逻辑axioms中的类型信息。
  • results: 本文结合了现有的KG完成算法和LLM技术,并对各种逻辑预测算法进行了分析和评价。结果表明,通过capturing semantics of description logic axioms, 可以提高KG的完成精度和效率。
    Abstract Embedding based Knowledge Graph (KG) Completion has gained much attention over the past few years. Most of the current algorithms consider a KG as a multidirectional labeled graph and lack the ability to capture the semantics underlying the schematic information. In a separate development, a vast amount of information has been captured within the Large Language Models (LLMs) which has revolutionized the field of Artificial Intelligence. KGs could benefit from these LLMs and vice versa. This vision paper discusses the existing algorithms for KG completion based on the variations for generating KG embeddings. It starts with discussing various KG completion algorithms such as transductive and inductive link prediction and entity type prediction algorithms. It then moves on to the algorithms utilizing type information within the KGs, LLMs, and finally to algorithms capturing the semantics represented in different description logic axioms. We conclude the paper with a critical reflection on the current state of work in the community and give recommendations for future directions.
    摘要 <>TRANSLATE_TEXT多年来,基于嵌入的知识图(KG)完成技术已经受到了广泛关注。现有大多数算法视知识图为多向标签图,而无法捕捉知识图中下发的 semantics。在另一方面,大量信息已经被 capture在大语言模型(LLM)中,这些模型已经革命化了人工智能领域。KG可以受益于这些 LLM,并且 LLM 也可以受益于 KG。本视点论文将讨论现有的 KG 完成算法,包括推uctive 和 inductive 链接预测算法,entity type 预测算法,以及利用类型信息在 KG、LLM 和描述逻辑axioms中的算法。我们将结束这篇论文 WITH 一个批判性的反思,并提出未来研究的方向。TRANSLATE_TEXT_END

A Novel Deep Learning based Model to Defend Network Intrusion Detection System against Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2308.00077
  • repo_url: None
  • paper_authors: Khushnaseeb Roshan, Aasim Zafar, Shiekh Burhan Ul Haque
  • for: 这项研究的目的是研究基于深度学习的网络入侵检测系统(NIDS)中的强大敌意攻击算法以及防御策略。
  • methods: 本研究使用了四种强大敌意攻击算法,即快速梯度签名方法(FGSM)、Jacobian Saliency Map Attack(JSMA)、Projected Gradient Descent(PGD)以及Carlini & Wagner(C&W)。为了增强NIDS模型的可您性,本研究还使用了对抗训练作为防御策略。
  • results: 研究结果分为三个阶段,即在攻击前(前期)、攻击后(后期)和防御后(后防)。使用加拿大安全计划2017(CICIDS-2017)数据集进行评估,并使用了各种性能指标如f1-score、准确率等来评估NIDS模型的性能。
    Abstract Network Intrusion Detection System (NIDS) is an essential tool in securing cyberspace from a variety of security risks and unknown cyberattacks. A number of solutions have been implemented for Machine Learning (ML), and Deep Learning (DL) based NIDS. However, all these solutions are vulnerable to adversarial attacks, in which the malicious actor tries to evade or fool the model by injecting adversarial perturbed examples into the system. The main aim of this research work is to study powerful adversarial attack algorithms and their defence method on DL-based NIDS. Fast Gradient Sign Method (FGSM), Jacobian Saliency Map Attack (JSMA), Projected Gradient Descent (PGD) and Carlini & Wagner (C&W) are four powerful adversarial attack methods implemented against the NIDS. As a defence method, Adversarial Training is used to increase the robustness of the NIDS model. The results are summarized in three phases, i.e., 1) before the adversarial attack, 2) after the adversarial attack, and 3) after the adversarial defence. The Canadian Institute for Cybersecurity Intrusion Detection System 2017 (CICIDS-2017) dataset is used for evaluation purposes with various performance measurements like f1-score, accuracy etc.
    摘要 网络侵入检测系统(NIDS)是保护网络安全的重要工具,它可以检测到许多安全风险和未知的网络攻击。为了提高NIDS的检测精度,许多解决方案已经实施了机器学习(ML)和深度学习(DL)技术。然而,这些解决方案都受到敌意攻击的威胁,敌意攻击者可以通过投入敌意扰动的示例来诱导模型出错。本研究的主要目标是研究敌意攻击 Algorithm 和对 DL-based NIDS 的防御策略。本研究使用了 Fast Gradient Sign Method(FGSM)、Jacobian Saliency Map Attack(JSMA)、Projected Gradient Descent(PGD)和Carlini & Wagner(C&W)四种强大的敌意攻击方法,以及对 NIDS 模型的 Adversarial Training 防御策略。研究结果分为三个阶段:前 adversarial 攻击、后 adversarial 攻击和后 adversarial 防御。使用了 Canadian Institute for Cybersecurity Intrusion Detection System 2017(CICIDS-2017)数据集进行评估,并使用了各种性能指标,如 f1-score、准确率等。

Crowd Safety Manager: Towards Data-Driven Active Decision Support for Planning and Control of Crowd Events

  • paper_url: http://arxiv.org/abs/2308.00076
  • repo_url: None
  • paper_authors: Panchamy Krishnakumari, Sascha Hoogendoorn-Lanser, Jeroen Steenbakkers, Serge Hoogendoorn
  • for: 这个论文目的是提出一种新的技术和方法,用于提高群体管理的规划和运作阶段。这种方法包括创新的数据收集技术、数据 интеграción和可视化使用3D数字双胞虫,以及基于人工智能工具的风险识别。
  • methods: 这个论文使用了Bowtie模型,这是一种全面的风险评估和预测模型,可以评估和预测不同因素的影响,如交通流量和人群密度、天气条件、情绪和访客目的等。此外,这个模型还使用了大量实时数据源,如Resono,以获取访客数量和运动幅度的信息。
  • results: 这个论文的结果表明,使用XGBoost框架可以获得最准确的预测结果,但是certain locations may benefit from additional input data to further enhance prediction quality。不withstanding these limitations, this work contributes to a more effective crowd management system and opens avenues for further advancements in this critical field.
    Abstract This paper presents novel technology and methodology aimed at enhancing crowd management in both the planning and operational phases. The approach encompasses innovative data collection techniques, data integration, and visualization using a 3D Digital Twin, along with the incorporation of artificial intelligence (AI) tools for risk identification. The paper introduces the Bowtie model, a comprehensive framework designed to assess and predict risk levels. The model combines objective estimations and predictions, such as traffic flow operations and crowdedness levels, with various aggravating factors like weather conditions, sentiments, and the purpose of visitors, to evaluate the expected risk of incidents. The proposed framework is applied to the Crowd Safety Manager project in Scheveningen, where the DigiTwin is developed based on a wealth of real-time data sources. One noteworthy data source is Resono, offering insights into the number of visitors and their movements, leveraging a mobile phone panel of over 2 million users in the Netherlands. Particular attention is given to the left-hand side of the Bowtie, which includes state estimation, prediction, and forecasting. Notably, the focus is on generating multi-day ahead forecasts for event-planning purposes using Resono data. Advanced machine learning techniques, including the XGBoost framework, are compared, with XGBoost demonstrating the most accurate forecasts. The results indicate that the predictions are adequately accurate. However, certain locations may benefit from additional input data to further enhance prediction quality. Despite these limitations, this work contributes to a more effective crowd management system and opens avenues for further advancements in this critical field.
    摘要 Note: The translation is written in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you need the translation in Traditional Chinese, please let me know.

Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges

  • paper_url: http://arxiv.org/abs/2308.00031
  • repo_url: None
  • paper_authors: Giorgio Franceschelli, Mirco Musolesi
  • for: 这篇论文探讨了应用强化学习(RL)到生成人工智能(AI)的现状、机遇和未解决问题。
  • methods: 论文使用RL作为生成AI的一种代替方法,同时强调RL可以同时最大化一个目标函数并生成输出。
  • results: 论文结束时未提供具体的结果,但认为RL在生成AI中具有广泛的应用前景和挑战。
    Abstract Generative Artificial Intelligence (AI) is one of the most exciting developments in Computer Science of the last decade. At the same time, Reinforcement Learning (RL) has emerged as a very successful paradigm for a variety of machine learning tasks. In this survey, we discuss the state of the art, opportunities and open research questions in applying RL to generative AI. In particular, we will discuss three types of applications, namely, RL as an alternative way for generation without specified objectives; as a way for generating outputs while concurrently maximizing an objective function; and, finally, as a way of embedding desired characteristics, which cannot be easily captured by means of an objective function, into the generative process. We conclude the survey with an in-depth discussion of the opportunities and challenges in this fascinating emerging area.
    摘要 优化人工智能(AI)是计算机科学领域最有前途的发展之一。同时,奖励学习(RL)已成为许多机器学习任务的非常成功的方法。在这篇评论中,我们讨论了RL应用于生成AI的当前状况、机会和开放研究问题。具体来说,我们会讨论以下三种应用:RL作为生成无需定义目标的备用方法;RL可以同时最大化目标函数并生成输出;RL可以嵌入不容易通过目标函数捕捉的欲求特征到生成过程中。我们在评论中结束,总结了这个有趣的新领域的机会和挑战。

DiVA-360: The Dynamic Visuo-Audio Dataset for Immersive Neural Fields

  • paper_url: http://arxiv.org/abs/2307.16897
  • repo_url: None
  • paper_authors: Cheng-You Lu, Peisen Zhou, Angela Xing, Chandradeep Pokhariya, Arnab Dey, Ishaan Shah, Rugved Mavidipalli, Dylan Hu, Andrew Comport, Kefan Chen, Srinath Sridhar
  • for: 这篇论文主要是为了提高神经场景的捕捉精度和可靠性。
  • methods: 这篇论文使用了新的硬件系统和数据采集技术,包括53个RGB摄像头和6个麦克风,以获得高速度和高分辨率的视觉和声音数据。
  • results: 这篇论文提供了一个大规模的实际场景数据集,包括46个动态场景、30个静态场景和95个静态对象,以及对这些场景的详细文本描述、前景后景分割mask和对象的3D姿态对应。
    Abstract Advances in neural fields are enabling high-fidelity capture of the shape and appearance of static and dynamic scenes. However, their capabilities lag behind those offered by representations such as pixels or meshes due to algorithmic challenges and the lack of large-scale real-world datasets. We address the dataset limitation with DiVA-360, a real-world 360 dynamic visual-audio dataset with synchronized multimodal visual, audio, and textual information about table-scale scenes. It contains 46 dynamic scenes, 30 static scenes, and 95 static objects spanning 11 categories captured using a new hardware system using 53 RGB cameras at 120 FPS and 6 microphones for a total of 8.6M image frames and 1360 s of dynamic data. We provide detailed text descriptions for all scenes, foreground-background segmentation masks, category-specific 3D pose alignment for static objects, as well as metrics for comparison. Our data, hardware and software, and code are available at https://diva360.github.io/.
    摘要

Predicting masked tokens in stochastic locations improves masked image modeling

  • paper_url: http://arxiv.org/abs/2308.00566
  • repo_url: None
  • paper_authors: Amir Bar, Florian Bordes, Assaf Shocher, Mahmoud Assran, Pascal Vincent, Nicolas Ballas, Trevor Darrell, Amir Globerson, Yann LeCun
  • for: 提高自然语言处理中的隐藏学习性能,特别是处理图像中的Semantic segmentation任务。
  • methods: 提出了一种名为FlexPredict的随机掩码位置Conditional模型,通过引入位置不确定性来帮助模型学习更加稳定的特征表示。
  • results: 在多种任务上提高了下游性能,比如与基eline相比,FlexPredict在ImageNet线性探测任务上提高了1.6%的性能,而在半supervised видеSegmentation任务上提高了2.5%的性能。
    Abstract Self-supervised learning is a promising paradigm in deep learning that enables learning from unlabeled data by constructing pretext tasks that require learning useful representations. In natural language processing, the dominant pretext task has been masked language modeling (MLM), while in computer vision there exists an equivalent called Masked Image Modeling (MIM). However, MIM is challenging because it requires predicting semantic content in accurate locations. E.g, given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determine its exact location. In this work, we propose FlexPredict, a stochastic model that addresses this challenge by incorporating location uncertainty into the model. Specifically, we condition the model on stochastic masked token positions to guide the model toward learning features that are more robust to location uncertainties. Our approach improves downstream performance on a range of tasks, e.g, compared to MIM baselines, FlexPredict boosts ImageNet linear probing by 1.6% with ViT-B and by 2.5% for semi-supervised video segmentation using ViT-L.
    摘要 自我指导学习是深度学习中一种有前途的方法,允许学习无标签数据中的有用表示。在自然语言处理中,主导性隐藏任务是覆盖语言模型(MLM),而在计算机视觉中则存在相应的equivalent,即隐藏图像模型(MIM)。然而,MIM具有很大的挑战,因为它需要准确预测 semantic content的位置。例如,给定一个含有缺失的狗图像,我们可以预测有尾巴,但是无法准确地确定其位置。在这种情况下,我们提出了FlexPredict,一种随机模型,以便在模型中包含位置不确定性。具体来说,我们将模型 Conditioned on stochastic masked token positions,以便导引模型学习更加鲁棒的特征。我们的方法可以提高下游任务的性能,例如,相比MIM基eline,FlexPredict在ImageNet直线探测中提高了ViT-B上的1.6%,并在使用ViT-L的半监督视频分割任务中提高了2.5%。

Foundational Models for Fault Diagnosis of Electrical Motors

  • paper_url: http://arxiv.org/abs/2307.16891
  • repo_url: None
  • paper_authors: Sriram Anbalagan, Deepesh Agarwal, Balasubramaniam Natarajan, Babji Srinivasan
  • for: 这个研究旨在提出一个基础模型,用于电动机故障诊断。
  • methods: 这个方法利用自动学习来建立一个神经网络底部,并将其精致化以达到特定目标。
  • results: 实验结果显示,这个方法可以在不同的故障情况和操作条件下,从少量训练数据中获得高于90%的分类精度。
    Abstract A majority of recent advancements related to the fault diagnosis of electrical motors are based on the assumption that training and testing data are drawn from the same distribution. However, the data distribution can vary across different operating conditions during real-world operating scenarios of electrical motors. Consequently, this assumption limits the practical implementation of existing studies for fault diagnosis, as they rely on fully labelled training data spanning all operating conditions and assume a consistent distribution. This is because obtaining a large number of labelled samples for several machines across different fault cases and operating scenarios may be unfeasible. In order to overcome the aforementioned limitations, this work proposes a framework to develop a foundational model for fault diagnosis of electrical motors. It involves building a neural network-based backbone to learn high-level features using self-supervised learning, and then fine-tuning the backbone to achieve specific objectives. The primary advantage of such an approach is that the backbone can be fine-tuned to achieve a wide variety of target tasks using very less amount of training data as compared to traditional supervised learning methodologies. The empirical evaluation demonstrates the effectiveness of the proposed approach by obtaining more than 90\% classification accuracy by fine-tuning the backbone not only across different types of fault scenarios or operating conditions, but also across different machines. This illustrates the promising potential of the proposed approach for cross-machine fault diagnosis tasks in real-world applications.
    摘要 多数最近的电机故障诊断研究假设训练和测试数据来自同一个分布。然而,实际应用中的数据分布可能会在不同的运行条件下发生变化。因此,这种假设限制了现有的研究实施,因为它们依赖于完全标注的训练数据,涵盖所有运行条件和假设一致。这可能是因为获得许多标注的样本是不可能的。为了突破以上限制,本研究提出了一个框架,用于开发电机故障诊断的基础模型。它包括使用神经网络为背景学习高级特征,然后精度地调整背景以实现特定目标。这种方法的优点是,可以通过非常少的训练数据来进行精度地调整背景,相比于传统的直接学习方法。实验证明,提出的方法可以在不同的故障enario和运行条件下达到90%以上的分类精度,并且可以在不同的机器上进行跨机器的故障诊断任务。这表明了提出的方法在实际应用中的扎实潜力。

Learning to Model the World with Language

  • paper_url: http://arxiv.org/abs/2308.01399
  • repo_url: https://github.com/microsoft/OpenKP
  • paper_authors: Jessy Lin, Yuqing Du, Olivia Watkins, Danijar Hafner, Pieter Abbeel, Dan Klein, Anca Dragan
  • for: 本研究旨在建立一种可以理解多种语言、关联语言和视觉世界、基于语言的Future Prediction的自然语言处理 Agent。
  • methods: 该Agent使用了多种语言理解方法,包括语言模型、视觉模型和Future Prediction模型,以便从语言中提取信息、理解语言的含义和预测未来的情况。
  • results: 研究表明,通过使用多种语言理解方法和Future Prediction模型,该Agent可以在不同的环境中提高任务性能,例如在文本、图像和视频等多种语言中提高环境描述、游戏规则和指令等。
    Abstract To interact with humans in the world, agents need to understand the diverse types of language that people use, relate them to the visual world, and act based on them. While current agents learn to execute simple language instructions from task rewards, we aim to build agents that leverage diverse language that conveys general knowledge, describes the state of the world, provides interactive feedback, and more. Our key idea is that language helps agents predict the future: what will be observed, how the world will behave, and which situations will be rewarded. This perspective unifies language understanding with future prediction as a powerful self-supervised learning objective. We present Dynalang, an agent that learns a multimodal world model that predicts future text and image representations and learns to act from imagined model rollouts. Unlike traditional agents that use language only to predict actions, Dynalang acquires rich language understanding by using past language also to predict future language, video, and rewards. In addition to learning from online interaction in an environment, Dynalang can be pretrained on datasets of text, video, or both without actions or rewards. From using language hints in grid worlds to navigating photorealistic scans of homes, Dynalang utilizes diverse types of language to improve task performance, including environment descriptions, game rules, and instructions.
    摘要 (Current agents只能执行简单的语言指令,但我们想建立能够应用多种语言的agent。我们的关键思想是语言可以帮助agent预测未来:将会见到什么,世界会如何变化,哪些情况会得到奖励。这种观点将语言理解与未来预测作为一个强大的自我监督学习目标联系起来。我们介绍了Dynalang,一个可以预测未来文本和图像表示的多模态世界模型,并从想象的模型演示中学习行为。不同于传统的语言只用于预测行动的agent,Dynalang通过过去的语言来预测未来语言、视频和奖励,从而获得丰富的语言理解。除了在环境中学习外,Dynalang还可以在没有行动或奖励的情况下在文本、视频或两者之间进行预训练。从使用语言提示在格子世界中到探索高级扫描图像的家庭,Dynalang利用多种语言来提高任务性能,包括环境描述、游戏规则和指令。)

Discovering Adaptable Symbolic Algorithms from Scratch

  • paper_url: http://arxiv.org/abs/2307.16890
  • repo_url: None
  • paper_authors: Stephen Kelly, Daniel S. Park, Xingyou Song, Mitchell McIntire, Pranav Nashikkar, Ritam Guha, Wolfgang Banzhaf, Kalyanmoy Deb, Vishnu Naresh Boddeti, Jie Tan, Esteban Real
  • for: 这篇论文是为了开发一种能够快速适应环境变化的自主机器人控制策略而写的。
  • methods: 这篇论文提出了一种基于AutoML-Zero的方法,称为AutoRobotics-Zero(ARZ),可以从scratch找到适应环境变化的零基础策略。与传统的神经网络适应策略不同,ARZ可以构建一个完整的线性注册机器的控制算法,并且可以在运行中调整模型参数和推理算法来适应突然的环境变化。
  • results: 在一个真实的四肢机器人 simulate 环境中,这种方法可以生成安全的控制策略,以避免机器人当某个肢体突然失效时坠落。与两种常见的神经网络基础模型不同,ARZ可以在突然的环境变化下表现出更高的鲁棒性和可靠性。此外,作者还进行了一种 novel 和复杂的非站台控制任务的分析,结果证明了ARZ的优势。
    Abstract Autonomous robots deployed in the real world will need control policies that rapidly adapt to environmental changes. To this end, we propose AutoRobotics-Zero (ARZ), a method based on AutoML-Zero that discovers zero-shot adaptable policies from scratch. In contrast to neural network adaption policies, where only model parameters are optimized, ARZ can build control algorithms with the full expressive power of a linear register machine. We evolve modular policies that tune their model parameters and alter their inference algorithm on-the-fly to adapt to sudden environmental changes. We demonstrate our method on a realistic simulated quadruped robot, for which we evolve safe control policies that avoid falling when individual limbs suddenly break. This is a challenging task in which two popular neural network baselines fail. Finally, we conduct a detailed analysis of our method on a novel and challenging non-stationary control task dubbed Cataclysmic Cartpole. Results confirm our findings that ARZ is significantly more robust to sudden environmental changes and can build simple, interpretable control policies.
    摘要 自主机器人在真实世界中部署时需要快速适应环境变化的控制策略。为此,我们提议了AutoRobotics-Zero(ARZ)方法,基于AutoML-Zero方法,可以从头开始找到适应环境变化的零学习策略。与神经网络适应策略相比,ARZ可以构建一个完全可控的线性注册机器。我们演化出模块化策略,可以在运行时调整模型参数和推理算法,以适应突然发生的环境变化。我们在一个真实的四肢机器人模拟中进行了实验,演示了我们的方法可以避免单个肢体突然失效时的倒下。这是一个许多神经网络基础模型无法解决的问题。最后,我们对一个新的和复杂的非站立控制任务进行了详细分析,称为洗礼Cartpole。结果证明了我们的方法在突然环境变化中更加稳定和可控,可以构建简单、可读取的控制策略。

Image Synthesis under Limited Data: A Survey and Taxonomy

  • paper_url: http://arxiv.org/abs/2307.16879
  • repo_url: https://github.com/kobeshegu/awesome-few-shot-generation
  • paper_authors: Mengping Yang, Zhe Wang
  • for: 这篇论文旨在为有限数据情况下的图像生成提供系统aticreview和新的分类法,以帮助研究者更好地理解和进行相关研究。
  • methods: 本论文使用了大量的文献检索和分析,以及一些新的方法和技术,如批处理学习、自适应训练、抽象特征学习等,以解决有限数据情况下的图像生成问题。
  • results: 本论文通过对各种有限数据情况下的图像生成方法的比较和分析,提出了一种新的图像生成方法,并实现了在有限数据情况下的图像生成 task 的进一步改进。
    Abstract Deep generative models, which target reproducing the given data distribution to produce novel samples, have made unprecedented advancements in recent years. Their technical breakthroughs have enabled unparalleled quality in the synthesis of visual content. However, one critical prerequisite for their tremendous success is the availability of a sufficient number of training samples, which requires massive computation resources. When trained on limited data, generative models tend to suffer from severe performance deterioration due to overfitting and memorization. Accordingly, researchers have devoted considerable attention to develop novel models that are capable of generating plausible and diverse images from limited training data recently. Despite numerous efforts to enhance training stability and synthesis quality in the limited data scenarios, there is a lack of a systematic survey that provides 1) a clear problem definition, critical challenges, and taxonomy of various tasks; 2) an in-depth analysis on the pros, cons, and remain limitations of existing literature; as well as 3) a thorough discussion on the potential applications and future directions in the field of image synthesis under limited data. In order to fill this gap and provide a informative introduction to researchers who are new to this topic, this survey offers a comprehensive review and a novel taxonomy on the development of image synthesis under limited data. In particular, it covers the problem definition, requirements, main solutions, popular benchmarks, and remain challenges in a comprehensive and all-around manner.
    摘要 深度生成模型在最近几年内已经取得了无 precedent 的进步,它们的技术突破使得视觉内容的合成质量达到了前所未有的水平。然而,这些模型的巨大成功受到了充足的训练样本数据的限制。当训练数据少时,生成模型往往会因过拟合和记忆而表现严重的性能下降。因此,研究人员在最近几年里一直在努力开发新的模型,以便从有限的训练数据中生成可信度高、多样性强的图像。虽然有很多尝试以提高训练稳定性和合成质量在有限数据情况下,但是还没有一份系统性的报告,提供以下内容:1)明确的问题定义、重要挑战和多种任务的分类; 2)对现有文献的优缺点和限制进行深入分析; 以及3)将来应用和未来方向在有限数据情况下的图像合成领域的详细讨论。为了填补这个空白和为新手研究者提供一份有用的引导,本文提供了一份COMPREHENSIVE REVIEW和一种新的分类方法,涵盖问题定义、需求、主要解决方案、流行的标准 benchmarks 以及未解决的挑战。特别是,本文涵盖了问题定义、需求、主要解决方案、流行的标准 benchmarks 以及未解决的挑战在全面和协调的方式。

Contrastive Learning for API Aspect Analysis

  • paper_url: http://arxiv.org/abs/2307.16878
  • repo_url: https://github.com/disa-lab/contrastive-learning-api-aspect-ase2023
  • paper_authors: G. M. Shahariar, Tahmid Hasan, Anindya Iqbal, Gias Uddin
  • for: 本研究开发了一个新的方法 - CLAA - 用于 API 层级的问题探索,这个方法使用了训练了对照问题的 transformer 模型,并使用了监督的对照损失函数。
  • methods: 本研究使用了一个 benchmark Dataset 集合 developer 讨论数据库,从 Stack Overflow 上收集来的讨论数据,并与现有的 transformer 模型进行比较。
  • results: 我们的实验结果显示,对照学习可以对 transformer 模型在检测 Performance、安全、使用度和文档等方面的表现提供明显改善。此外,我们还进行了实际和开发者的研究,结果显示使用 ‘Stack Overflow + CLAA’ 可以增加了准确性和自信度 During API 选择。
    Abstract We present a novel approach - CLAA - for API aspect detection in API reviews that utilizes transformer models trained with a supervised contrastive loss objective function. We evaluate CLAA using performance and impact analysis. For performance analysis, we utilized a benchmark dataset on developer discussions collected from Stack Overflow and compare the results to those obtained using state-of-the-art transformer models. Our experiments show that contrastive learning can significantly improve the performance of transformer models in detecting aspects such as Performance, Security, Usability, and Documentation. For impact analysis, we performed empirical and developer study. On a randomly selected and manually labeled 200 online reviews, CLAA achieved 92% accuracy while the SOTA baseline achieved 81.5%. According to our developer study involving 10 participants, the use of 'Stack Overflow + CLAA' resulted in increased accuracy and confidence during API selection. Replication package: https://github.com/disa-lab/Contrastive-Learning-API-Aspect-ASE2023
    摘要 我们提出了一种新的方法 - CLAA - 用于API方面检测,该方法使用已经训练过监督对比损失函数的转换器模型。我们通过性能分析和影响分析来评估CLAA。在性能分析中,我们使用了Stack Overflow上的开发者讨论数据集,并与现有的转换器模型进行比较。我们的实验结果显示,对比学习可以明显提高转换器模型在检测性能、安全、可用性和Documentation等方面的性能。在影响分析中,我们进行了实验和开发者调查。对于200个在线评论中随机选择和手动标注的样本,CLAA达到了92%的准确率,而基eline达到了81.5%。根据我们的开发者调查,使用'Stack Overflow + CLAA'可以提高了选择API的准确率和自信心。可以在以下GitHub上下载我们的复现包:https://github.com/disa-lab/Contrastive-Learning-API-Aspect-ASE2023。

Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering

  • paper_url: http://arxiv.org/abs/2307.16877
  • repo_url: https://github.com/mcgill-nlp/instruct-qa
  • paper_authors: Vaibhav Adlakha, Parishad BehnamGhader, Xing Han Lu, Nicholas Meade, Siva Reddy
  • for: 这个论文旨在探讨 Retriever-augmented instruction-following 模型在问答任务中的表现,以及 традицион的评价指标是否准确反映这些模型的表现。
  • methods: 这个论文使用了 Retriever-augmented instruction-following 模型,并通过自动和人工评价来评估这些模型的表现。
  • results: 研究发现,使用这些模型可以达到高度正确性,但是它们在保持提供的知识的方面存在困难,经常会产生假的回答。
    Abstract Retriever-augmented instruction-following models are attractive alternatives to fine-tuned approaches for information-seeking tasks such as question answering (QA). By simply prepending retrieved documents in its input along with an instruction, these models can be adapted to various information domains and tasks without additional fine-tuning. While the model responses tend to be natural and fluent, the additional verbosity makes traditional QA evaluation metrics such as exact match (EM) and F1 unreliable for accurately quantifying model performance. In this work, we investigate the performance of instruction-following models across three information-seeking QA tasks. We use both automatic and human evaluation to evaluate these models along two dimensions: 1) how well they satisfy the user's information need (correctness), and 2) whether they produce a response based on the provided knowledge (faithfulness). Guided by human evaluation and analysis, we highlight the shortcomings of traditional metrics for both correctness and faithfulness. We then propose simple token-overlap based and model-based metrics that reflect the true performance of these models. Our analysis reveals that instruction-following models are competitive, and sometimes even outperform fine-tuned models for correctness. However, these models struggle to stick to the provided knowledge and often hallucinate in their responses. We hope our work encourages a more holistic evaluation of instruction-following models for QA. Our code and data is available at https://github.com/McGill-NLP/instruct-qa
    摘要 “具有增强功能的寻勤模型是问答任务中信息寻找任务的可取代方案。它们可以通过在输入中附加检索到的文档来适应不同的信息领域和任务,无需进一步的调整。虽然模型的回答通常具有自然和流畅的特点,但是由于额外的verbosity,传统的问答评价指标如精准匹配(EM)和F1指标无法准确地衡量模型的表现。在这项工作中,我们研究了寻勤模型在三种信息寻找问答任务中的表现。我们使用自动和人工评价来评估这些模型,以两个维度评估它们的表现:1)满足用户的信息需求是否正确(正确性),2)是否按照提供的知识生成回答(忠实)。受人工评价和分析的指导,我们发现传统的评价指标对正确性和忠实性都存在缺陷。我们然后提出了基于符号重叠的metric和模型基于的metric,这些metric能够准确反映寻勤模型的表现。我们的分析显示,寻勤模型在正确性方面竞争力强,有时 même outperform了调整模型。然而,这些模型在保持提供的知识并且减少幻想的回答方面做出了差的表现。我们希望我们的工作能够激励更加全面的评估寻勤模型的问答能力。我们的代码和数据可以在https://github.com/McGill-NLP/instruct-qa上获取。”

Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey with Causality Perspectives

  • paper_url: http://arxiv.org/abs/2307.16851
  • repo_url: None
  • paper_authors: Haoyang Liu, Maheep Chaudhary, Haohan Wang
  • for: 本研究旨在系统性地回顾过去十年内对机器学习可靠性的研究,包括可靠性、安全性、可解释性和公平性等方面。
  • methods: 本文使用了许多独立发展的方法来解决这些挑战,其中包括robustness、安全性、可解释性和公平性等方面的方法。这些方法都是基于Pearl的 causality hierarchy的。
  • results: 本文通过一种统一的语言和数学术语来连接这些方法,并发现它们之间的相似性。此外,本文还探讨了大型预训模型的可靠性,包括精度调整、参数效率调整、提示和人工反馈等方法。
    Abstract The trustworthiness of machine learning has emerged as a critical topic in the field, encompassing various applications and research areas such as robustness, security, interpretability, and fairness. The last decade saw the development of numerous methods addressing these challenges. In this survey, we systematically review these advancements from a data-centric perspective, highlighting the shortcomings of traditional empirical risk minimization (ERM) training in handling challenges posed by the data. Interestingly, we observe a convergence of these methods, despite being developed independently across trustworthy machine learning subfields. Pearl's hierarchy of causality offers a unifying framework for these techniques. Accordingly, this survey presents the background of trustworthy machine learning development using a unified set of concepts, connects this language to Pearl's causal hierarchy, and finally discusses methods explicitly inspired by causality literature. We provide a unified language with mathematical vocabulary to link these methods across robustness, adversarial robustness, interpretability, and fairness, fostering a more cohesive understanding of the field. Further, we explore the trustworthiness of large pretrained models. After summarizing dominant techniques like fine-tuning, parameter-efficient fine-tuning, prompting, and reinforcement learning with human feedback, we draw connections between them and the standard ERM. This connection allows us to build upon the principled understanding of trustworthy methods, extending it to these new techniques in large pretrained models, paving the way for future methods. Existing methods under this perspective are also reviewed. Lastly, we offer a brief summary of the applications of these methods and discuss potential future aspects related to our survey. For more information, please visit http://trustai.one.
    摘要 machine learning 的可靠性已经成为Field中的一个关键话题,涵盖了多个应用和研究领域,如Robustness、安全性、可读性和公平性。过去十年内,有多种方法提出来解决这些挑战。在这份报告中,我们系统性地回顾这些进步,强调传统的Empirical Risk Minimization(ERM)训练在处理数据时的缺陷。意外地,我们发现这些方法即使独立地在可靠机器学习子领域中发展,它们却有趋同的特点。以珍珠的 causality 领域为基础,我们提出一种统一的语言和概念集,将这些方法连接起来,并且使用概率论语言来链接这些方法。我们提供一种统一的语言和概念集,将这些方法连接起来,并且使用概率论语言来链接这些方法。我们还探讨了大规模预训练模型的可靠性。我们将dominant technique如 fine-tuning、parameter-efficient fine-tuning、提示和人工回响学习简要介绍,并连接它们与标准 ERM。这种连接允许我们将可靠方法的原理性理解扩展到这些新技术上,开 up新的可能性。此外,我们还概述了这些方法的应用和未来方向。更多信息请参考

Decidable Fragments of LTLf Modulo Theories (Extended Version)

  • paper_url: http://arxiv.org/abs/2307.16840
  • repo_url: None
  • paper_authors: Luca Geatti, Alessandro Gianola, Nicola Gigante, Sarah Winkler
  • for: 这篇论文是关于Linear Temporal Logic Modulo Theories over Finite Traces(LTLfMT)的研究。
  • methods: 这篇论文使用了一种新的杜雷茨矩阵(tableau),用于解决LTLfMT的满足问题。
  • results: 这篇论文证明了一种新的缩矩阵规则,可以保证LTLfMT表达式的满足问题的决策结果是确定的,并且可以提供新的可 decidability 结果 для一些LTLfMT的衍生物。
    Abstract We study Linear Temporal Logic Modulo Theories over Finite Traces (LTLfMT), a recently introduced extension of LTL over finite traces (LTLf) where propositions are replaced by first-order formulas and where first-order variables referring to different time points can be compared. In general, LTLfMT was shown to be semi-decidable for any decidable first-order theory (e.g., linear arithmetics), with a tableau-based semi-decision procedure. In this paper we present a sound and complete pruning rule for the LTLfMT tableau. We show that for any LTLfMT formula that satisfies an abstract, semantic condition, that we call finite memory, the tableau augmented with the new rule is also guaranteed to terminate. Last but not least, this technique allows us to establish novel decidability results for the satisfiability of several fragments of LTLfMT, as well as to give new decidability proofs for classes that are already known.
    摘要 我们研究线性时间逻辑模式论(LTLfMT),是LTLf(线性时间逻辑)的一种扩展,其中提poseitions被 replaced by first-order式并允许不同时刻的first-order变量进行比较。在总的来说,LTLfMT已经被证明是任何可 decidable的first-order理论(例如线性算术)的semi-decidable,使用表格式的semi-decision过程。在这篇论文中,我们提出了LTLfMT表格中的一个有效和完整的剪切规则。我们证明,对任何满足抽象的semantic条件的LTLfMT公式,将表格与该规则相加将 garantuee the tableau terminate。此外,这种技术还允许我们为LTLfMT的各个 Fragment establishment新的 decidability result,以及为已知的类提供新的 decidability证明。

Alpha-GPT: Human-AI Interactive Alpha Mining for Quantitative Investment

  • paper_url: http://arxiv.org/abs/2308.00016
  • repo_url: None
  • paper_authors: Saizhuo Wang, Hang Yuan, Leon Zhou, Lionel M. Ni, Heung-Yeung Shum, Jian Guo
  • for: 这篇论文的目的是探讨一新的α探索方法(effective trading signals or factors),并提供一个新的人工智能-人类互动式α探索框架。
  • methods: 这篇论文使用了大量语言模型来实现人工智能-人类互动式α探索方法,并提出了一个新的问题工程式框架来实现这种互动。
  • results: 这篇论文透过一些α探索实验,证明了 alpha-GPT 框架的有效性和优势,并提供了一些创新、深入和有效的 α。
    Abstract One of the most important tasks in quantitative investment research is mining new alphas (effective trading signals or factors). Traditional alpha mining methods, either hand-crafted factor synthesizing or algorithmic factor mining (e.g., search with genetic programming), have inherent limitations, especially in implementing the ideas of quants. In this work, we propose a new alpha mining paradigm by introducing human-AI interaction, and a novel prompt engineering algorithmic framework to implement this paradigm by leveraging the power of large language models. Moreover, we develop Alpha-GPT, a new interactive alpha mining system framework that provides a heuristic way to ``understand'' the ideas of quant researchers and outputs creative, insightful, and effective alphas. We demonstrate the effectiveness and advantage of Alpha-GPT via a number of alpha mining experiments.
    摘要 一种非常重要的任务在量化投资研究中是挖掘新的α(有效的交易信号或因素)。传统的α挖掘方法,例如手动制造因子汇集或算法生成器(如基因编程),具有内在的限制,特别是在实现量化研究者的想法方面。在这项工作中,我们提出了一种新的α挖掘方框,通过引入人工智能与人类之间的互动,并利用大语言模型的力量。此外,我们开发了Alpha-GPT,一种新的互动式α挖掘系统框架,可以帮助量化研究者更好地理解他们的想法,并生成创新、深 insightful和有效的α。我们通过一些α挖掘实验来证明Alpha-GPT的效果和优势。

Recent advancement in Disease Diagnostic using machine learning: Systematic survey of decades, comparisons, and challenges

  • paper_url: http://arxiv.org/abs/2308.01319
  • repo_url: None
  • paper_authors: Farzaneh Tajidini, Mohammad-Javad Kheiri
  • for: 这篇论文主要是为了探讨计算机支持的医学诊断技术,以及这些技术在诊断疾病方面的应用。
  • methods: 该论文使用了机器学习技术,包括例示学习和深度学习等方法,来分析多个维度和多模式的生物医学数据,以提高疾病诊断的精度。
  • results: 该论文总结了各种机器学习算法和技术在诊断不同疾病方面的应用和效果,包括肝炎、糖尿病、肝病、登革热和心血管疾病等。
    Abstract Computer-aided diagnosis (CAD), a vibrant medical imaging research field, is expanding quickly. Because errors in medical diagnostic systems might lead to seriously misleading medical treatments, major efforts have been made in recent years to improve computer-aided diagnostics applications. The use of machine learning in computer-aided diagnosis is crucial. A simple equation may result in a false indication of items like organs. Therefore, learning from examples is a vital component of pattern recognition. Pattern recognition and machine learning in the biomedical area promise to increase the precision of disease detection and diagnosis. They also support the decision-making process's objectivity. Machine learning provides a practical method for creating elegant and autonomous algorithms to analyze high-dimensional and multimodal bio-medical data. This review article examines machine-learning algorithms for detecting diseases, including hepatitis, diabetes, liver disease, dengue fever, and heart disease. It draws attention to the collection of machine learning techniques and algorithms employed in studying conditions and the ensuing decision-making process.
    摘要 计算机支持诊断(CAD)是医疗影像研究领域的一个热门领域,正在迅速扩展。由于医疗诊断系统中的错误可能会导致严重错误的医疗治疗,因此在最近几年内,对计算机支持诊断应用程序进行改进的努力已经做出了巨大的投入。机器学习在计算机支持诊断中扮演着关键的角色。由于简单的方程可能会导致精准的识别结果,因此学习从示例中的知识是绝对必要的。生物医学领域中的模式识别和机器学习技术 promise to 提高疾病检测和诊断的精度。它们还支持决策过程的 объекivity。机器学习提供了一种实用的方法来分析高维和多模式生物医学数据。本文icle 评论了使用机器学习算法检测疾病,包括肝炎、糖尿病、肝病、登革热和心血管疾病。它吸引了对机器学习技术和算法在研究疾病 condition 的应用和决策过程中的集成。

Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc

  • paper_url: http://arxiv.org/abs/2308.04445
  • repo_url: None
  • paper_authors: Doug Lenat, Gary Marcus
    for:The paper is written to address the limitations of current AI approaches, particularly the lack of reasoning ability and unpredictability of large language models (LLMs).methods:The paper proposes an alternative approach to AI that combines the strengths of LLMs with the reasoning ability of symbolic AI systems, using curated pieces of explicit knowledge and rules of thumb to enable an inference engine to automatically deduce logical entailments.results:The paper describes how one AI system, Cyc, has developed ways to overcome the tradeoff between expressiveness and speed in reasoning, allowing it to reason in higher order logic in real time. The authors suggest that any trustworthy general AI will need to hybridize the LLM and symbolic approaches, and lay out a path to realizing this dream.
    Abstract Generative AI, the most popular current approach to AI, consists of large language models (LLMs) that are trained to produce outputs that are plausible, but not necessarily correct. Although their abilities are often uncanny, they are lacking in aspects of reasoning, leading LLMs to be less than completely trustworthy. Furthermore, their results tend to be both unpredictable and uninterpretable. We lay out 16 desiderata for future AI, and discuss an alternative approach to AI which could theoretically address many of the limitations associated with current approaches: AI educated with curated pieces of explicit knowledge and rules of thumb, enabling an inference engine to automatically deduce the logical entailments of all that knowledge. Even long arguments produced this way can be both trustworthy and interpretable, since the full step-by-step line of reasoning is always available, and for each step the provenance of the knowledge used can be documented and audited. There is however a catch: if the logical language is expressive enough to fully represent the meaning of anything we can say in English, then the inference engine runs much too slowly. That's why symbolic AI systems typically settle for some fast but much less expressive logic, such as knowledge graphs. We describe how one AI system, Cyc, has developed ways to overcome that tradeoff and is able to reason in higher order logic in real time. We suggest that any trustworthy general AI will need to hybridize the approaches, the LLM approach and more formal approach, and lay out a path to realizing that dream.
    摘要 现代AI的主流方法是生成AI,它们通过训练大型语言模型(LLM)来生成可能性强的输出,但并非总是正确的。尽管它们的能力可能很强大,但它们缺乏推理能力,导致它们不够可靠。此外,它们的输出通常是难以预测和解释的。我们提出了16个愿景 для未来的AI,并讨论了一种可能解决当前方法的局限性的替代方法:通过手动约束的知识和规则来教育AI,使其推理引擎可以自动推理出知识的逻辑推论。这些推理结果不仅可靠,而且可解释,因为整个推理过程的每一步都可以查看,并且每一步使用的知识的来源都可以记录和审核。然而,有一点问题:如果逻辑语言足够表达任何我们可以在英语中说的意思,那么推理引擎就会非常慢。因此,符号AI系统通常选择一些快速而但是远不如表达力强的逻辑,如知识图。我们描述了一个AI系统——Cyc——如何超越这一限制,在实时内推理高阶逻辑。我们建议任何可靠的通用AI都需要混合这两种方法,以及LLM方法和更正式的方法。我们还讲解了实现这一梦想的路径。

On the use of associative memory in Hopfield networks designed to solve propositional satisfiability problems

  • paper_url: http://arxiv.org/abs/2307.16807
  • repo_url: https://github.com/nata-web/SO_for_SAT
  • paper_authors: Natalya Weber, Werner Koch, Ozan Erdem, Tom Froese
  • for: 该 paper 用 Hopfield networks 和 Self-Optimization (SO) 模型解决了许多计算问题,因为它们提供了生物学可能的机制。
  • methods: 该 paper 使用了 Hebbian 学习规则,在重复地将网络重置到初始状态后,使网络优化其行为,以达到某种愿望状态。
  • results: 该 paper 通过两个例子(假象问题和地图彩色问题)示出了 SO 模型可以解决具体的 combinatorial 问题,但也发现在某些条件下,学习后的网络可能产生不适合问题的优化解决方案,这可能是 SO 模型解决难解问题的一种不良效果。
    Abstract Hopfield networks are an attractive choice for solving many types of computational problems because they provide a biologically plausible mechanism. The Self-Optimization (SO) model adds to the Hopfield network by using a biologically founded Hebbian learning rule, in combination with repeated network resets to arbitrary initial states, for optimizing its own behavior towards some desirable goal state encoded in the network. In order to better understand that process, we demonstrate first that the SO model can solve concrete combinatorial problems in SAT form, using two examples of the Liars problem and the map coloring problem. In addition, we show how under some conditions critical information might get lost forever with the learned network producing seemingly optimal solutions that are in fact inappropriate for the problem it was tasked to solve. What appears to be an undesirable side-effect of the SO model, can provide insight into its process for solving intractable problems.
    摘要

Multiobjective Evolutionary Component Effect on Algorithm behavior

  • paper_url: http://arxiv.org/abs/2308.02527
  • repo_url: None
  • paper_authors: Yuri Lavinas, Marcelo Ladeira, Gabriela Ochoa, Claus Aranha
  • for: This paper aims to investigate the effects of the final configuration of an automatically designed multiobjective evolutionary algorithm (MOEA) on the algorithm’s performance.
  • methods: The paper uses a methodology to analyze the impact of the algorithm components, such as Search Trajectory Networks (STNs), population diversity, and anytime hypervolume values, on the performance of the MOEA.
  • results: The study finds that the MOEA converges to good hypervolume values in analytical artificial and real-world problems, but the search is still ongoing in simulated real-world problems. The paper also observes a diverse set of trajectories in the analytical artificial problems, and these trajectories are more similar and frequently reach optimal solutions in the other problems.Here is the Chinese translation of the three key information points:
  • for: 这篇论文目的是研究自动设计的多目标进化算法(MOEA)的配置对算法性能的影响。
  • methods: 这篇论文使用一种方法来分析自动设计算法组件,如搜索轨迹网络(STNs)、人口多样性和任何时间超量值对算法性能的影响。
  • results: 研究发现,MOEA在分析人工和实际问题上 converge 到良好的超量值,但在模拟问题上的搜索仍在进行中。论文还发现,分析人工问题时,搜索轨迹网络的多样性较高,这些轨迹更frequently到达优解。
    Abstract The performance of multiobjective evolutionary algorithms (MOEAs) varies across problems, making it hard to develop new algorithms or apply existing ones to new problems. To simplify the development and application of new multiobjective algorithms, there has been an increasing interest in their automatic design from their components. These automatically designed metaheuristics can outperform their human-developed counterparts. However, it is still unknown what are the most influential components that lead to performance improvements. This study specifies a new methodology to investigate the effects of the final configuration of an automatically designed algorithm. We apply this methodology to a tuned Multiobjective Evolutionary Algorithm based on Decomposition (MOEA/D) designed by the iterated racing (irace) configuration package on constrained problems of 3 groups: (1) analytical real-world problems, (2) analytical artificial problems and (3) simulated real-world. We then compare the impact of the algorithm components in terms of their Search Trajectory Networks (STNs), the diversity of the population, and the anytime hypervolume values. Looking at the objective space behavior, the MOEAs studied converged before half of the search to generally good HV values in the analytical artificial problems and the analytical real-world problems. For the simulated problems, the HV values are still improving at the end of the run. In terms of decision space behavior, we see a diverse set of the trajectories of the STNs in the analytical artificial problems. These trajectories are more similar and frequently reach optimal solutions in the other problems.
    摘要 multiobjective evolutionary algorithms(MOEAs)的表现在不同的问题上有很大差异,这使得开发新的算法或应用现有的算法到新问题变得困难。为了简化新算法的开发和应用,有越来越多的关注于它们的自动设计。这些自动设计的metaheuristics可以超越人类开发的对应算法。然而,还不清楚哪些组件导致性能改进。本研究提出了一种新的方法来调查自动设计算法的最后配置对性能的影响。我们应用这种方法于一个调参的多目标演化算法基于分解(MOEA/D)在约束问题上,包括三类问题:(1)分析世界问题,(2)分析人工问题和(3)实际世界问题。然后,我们比较了算法组件在各个方面的搜索轨迹网络(STNs)、人口多样性和任何时间的超Volume值的影响。在目标空间行为方面,MOEAsstudied在分析人工问题和分析世界问题中通常在半个搜索时间内 converged to good HV值。而在模拟问题上,HV值仍在搜索结束时提高。在决策空间行为方面,我们看到了分析人工问题中的STN trajectories是更加多样的,这些轨迹更frequently reach到优质解。

Structural Transfer Learning in NL-to-Bash Semantic Parsers

  • paper_url: http://arxiv.org/abs/2307.16795
  • repo_url: None
  • paper_authors: Kyle Duffy, Satwik Bhattamishra, Phil Blunsom
  • for: 本研究旨在解释预训练数据集的设计方面进行大规模进攻。
  • methods: 本研究提出了一种方法来量化理解机器翻译任务之间的结构匹配。该方法在NLBash任务上应用,并显示NLBash大多可reducible到lexical alignment。此外,研究还发现了natural language to SQL和NLBash之间强相似性。
  • results: 研究发现,在英文到德文翻译任务中,更多的预训练计算量不总是导致NLBash中的semantic representation具有更强的传递性。
    Abstract Large-scale pre-training has made progress in many fields of natural language processing, though little is understood about the design of pre-training datasets. We propose a methodology for obtaining a quantitative understanding of structural overlap between machine translation tasks. We apply our methodology to the natural language to Bash semantic parsing task (NLBash) and show that it is largely reducible to lexical alignment. We also find that there is strong structural overlap between NLBash and natural language to SQL. Additionally, we perform a study varying compute expended during pre-training on the English to German machine translation task and find that more compute expended during pre-training does not always correspond semantic representations with stronger transfer to NLBash.
    摘要 大规模预训练在自然语言处理多个领域取得了进步,然而预训练数据集的设计仍然不够了解。我们提出了一种方法来获得自然语言翻译任务之间的结构 overlap 的量化理解。我们对 natural language to Bash semantics parsing任务 (NLBash) 应用了这种方法,并发现它大多数可以归结为词语对应。此外,我们发现 natural language to SQL 和 NLBash 之间存在强大的结构 overlap。此外,我们对英语到德语机器翻译任务进行了不同计算开销的预训练研究,发现在某些情况下,更多的计算开销不一定对应 stronger 的语义传递到 NLBash。

cs.CL - 2023-08-01

CoSMo: A constructor specification language for Abstract Wikipedia’s content selection process

  • paper_url: http://arxiv.org/abs/2308.02539
  • repo_url: None
  • paper_authors: Kutz Arrieta, Pablo R. Fillottrani, C. Maria Keet
  • for: This paper is written for the purpose of creating a novel language modeling framework called CoSMo, which can be used for multilingual content selection and abstract representation in the context of the Abstract Wikipedia project.
  • methods: The paper uses a rigorous language design process that includes broad stakeholder consultation to create CoSMo, which meets the requirements of multilingual modeling, content selection covering declarative content and functions, and both classes and instances.
  • results: The preliminary evaluation of CoSMo shows that it is a useful language modeling framework for abstract representation in the Abstract Wikipedia project and potentially in other contexts as well.
    Abstract Representing snippets of information abstractly is a task that needs to be performed for various purposes, such as database view specification and the first stage in the natural language generation pipeline for generative AI from structured input, i.e., the content selection stage to determine what needs to be verbalised. For the Abstract Wikipedia project, requirements analysis revealed that such an abstract representation requires multilingual modelling, content selection covering declarative content and functions, and both classes and instances. There is no modelling language that meets either of the three features, let alone a combination. Following a rigorous language design process inclusive of broad stakeholder consultation, we created CoSMo, a novel {\sc Co}ntent {\sc S}election {\sc Mo}deling language that meets these and other requirements so that it may be useful both in Abstract Wikipedia as well as other contexts. We describe the design process, rationale and choices, the specification, and preliminary evaluation of the language.
    摘要 表示信息抽象是一项需要进行的任务,用于多种目的,如数据库视图规定和生成AI自结构输入的首个阶段,即内容选择阶段,以确定需要被描述。为Abstract Wikipedia项目,需求分析表明,这种抽象表示需要多语言模型、内容选择覆盖声明内容和函数,以及类和实例。当时没有一种模型语言满足这三个特点,更不用说是一起满足。我们采用了严格的语言设计过程,包括广泛的参与者咨询,创造了CoSMo,一种新的内容选择模型语言,满足这些和其他需求,以便在Abstract Wikipedia中以及其他上使用。我们将介绍设计过程、理由和选择、规范,以及初步评估语言。

Unimodal Intermediate Training for Multimodal Meme Sentiment Classification

  • paper_url: http://arxiv.org/abs/2308.00528
  • repo_url: None
  • paper_authors: Muzhaffar Hazman, Susan McKeever, Josephine Griffith
  • for: 这篇论文的目的是为了开发一个多Modal的Memes感受分类器。
  • methods: 这篇论文使用了一种新的supervised中途训练方法,利用大量的文本和图像感受分类数据。
  • results: 这篇论文的结果显示,将多Modal的Memes融合到单Modal的感受分类器中可以提高模型的性能,并且可以将训练集中的标签Memes减少40%而不影响下游模型的性能。
    Abstract Internet Memes remain a challenging form of user-generated content for automated sentiment classification. The availability of labelled memes is a barrier to developing sentiment classifiers of multimodal memes. To address the shortage of labelled memes, we propose to supplement the training of a multimodal meme classifier with unimodal (image-only and text-only) data. In this work, we present a novel variant of supervised intermediate training that uses relatively abundant sentiment-labelled unimodal data. Our results show a statistically significant performance improvement from the incorporation of unimodal text data. Furthermore, we show that the training set of labelled memes can be reduced by 40% without reducing the performance of the downstream model.
    摘要 互联网迷因(Internet Memes)是自动感情分类的挑战性用户生成内容之一。实际上,缺乏labelled memes是发展多modal meme感情分类器的阻碍因素。为解决这个问题,我们提出使用具有标签的多modal meme感情分类器的训练,并补充训练材料中的单modal(仅有图像和仅有文本)数据。在这个研究中,我们提出了一种新的supervised intermediate training的变iante,使用比较充足的标签文本数据。我们的结果显示,将单modal文本数据包含在训练中可以 statistically significant提高下游模型的表现。此外,我们显示了可以透过将labelled meme训练集量减少40%而不减少下游模型的表现。

Covid-19 Public Sentiment Analysis for Indian Tweets Classification

  • paper_url: http://arxiv.org/abs/2308.06241
  • repo_url: None
  • paper_authors: Mohammad Maksood Akhter, Devpriya Kanojia
  • for: 这篇论文主要是为了研究印度Twitter数据中的情感分析,以便分析COVID-19 tweets中的意见和情感。
  • methods: 论文使用Twitter数据EXTRACTING和情感分析 queries来分析Twitter数据中的意见和情感。
  • results: 这篇论文显示了在Twitter数据中的情感分析结果,包括正面、负面和中性等意见和情感。
    Abstract When any extraordinary event takes place in the world wide area, it is the social media that acts as the fastest carrier of the news along with the consequences dealt with that event. One can gather much information through social networks regarding the sentiments, behavior, and opinions of the people. In this paper, we focus mainly on sentiment analysis of twitter data of India which comprises of COVID-19 tweets. We show how Twitter data has been extracted and then run sentimental analysis queries on it. This is helpful to analyze the information in the tweets where opinions are highly unstructured, heterogeneous, and are either positive or negative or neutral in some cases.
    摘要 当世界范围内发生任何不寻常事件时,社交媒体就会成为最快的消息传递者,同时也会附带该事件所带来的后果。通过社交网络,你可以了解人们的情感、行为和意见的趋势。在这篇论文中,我们主要关注印度Twitter数据的情感分析,即COVID-19 tweets。我们将介绍如何从Twitter数据中提取数据,然后运行情感分析查询。这有助于分析社交媒体上的信息,因为这些信息通常是不结构化、多元和有时是正面、负面或中性的。

ZRIGF: An Innovative Multimodal Framework for Zero-Resource Image-Grounded Dialogue Generation

  • paper_url: http://arxiv.org/abs/2308.00400
  • repo_url: https://github.com/zhangbo-nlp/zrigf
  • paper_authors: Bo Zhang, Jian Wang, Hui Ma, Bo Xu, Hongfei Lin
  • for: 本研究旨在开发一种能够在零资源情况下使用图像信息进行对话生成的框架。
  • methods: 该框架基于一种两阶段学习策略,包括对比预训练和生成预训练。对比预训练包括一个文本和图像匹配模块,该模块将图像和文本映射到一个统一的编码vector空间中,以及一个文本辅助遮盲图像模型,以保持预训练的视觉特征并促进多模态特征的对齐。生成预训练使用一个多模态融合模块和一个信息传递模块,生成基于融合的多模态表示的相关回答。
  • results: 对文本基本对话和图像基本对话数据集进行了广泛的实验,并emonstrated ZRIGF的有效性在生成Contextually pertinent和informative回答。此外,我们采用了一种完全零资源情况来证明我们的框架在新领域中的稳定普适性。
    Abstract Image-grounded dialogue systems benefit greatly from integrating visual information, resulting in high-quality response generation. However, current models struggle to effectively utilize such information in zero-resource scenarios, mainly due to the disparity between image and text modalities. To overcome this challenge, we propose an innovative multimodal framework, called ZRIGF, which assimilates image-grounded information for dialogue generation in zero-resource situations. ZRIGF implements a two-stage learning strategy, comprising contrastive pre-training and generative pre-training. Contrastive pre-training includes a text-image matching module that maps images and texts into a unified encoded vector space, along with a text-assisted masked image modeling module that preserves pre-training visual features and fosters further multimodal feature alignment. Generative pre-training employs a multimodal fusion module and an information transfer module to produce insightful responses based on harmonized multimodal representations. Comprehensive experiments conducted on both text-based and image-grounded dialogue datasets demonstrate ZRIGF's efficacy in generating contextually pertinent and informative responses. Furthermore, we adopt a fully zero-resource scenario in the image-grounded dialogue dataset to demonstrate our framework's robust generalization capabilities in novel domains. The code is available at https://github.com/zhangbo-nlp/ZRIGF.
    摘要 图像背景对话系统受益很大地含义图像信息,从而生成高质量的回答。然而,当前模型在零资源场景下尚未能有效利用这些信息,主要是因为图像和文本Modalities之间的差异。为了解决这个挑战,我们提出了一种创新的多模态框架,称为ZRIGF,它在零资源情况下使用图像背景信息进行对话生成。ZRIGF采用了两个阶段的学习策略:对比预训练和生成预训练。对比预训练包括一个图像和文本匹配模块,将图像和文本映射到一个统一编码 vector space,以及一个文本辅助遮盖图像模型,以保留预训练的视觉特征并促进多模态特征的对应。生成预训练使用多模态融合模块和信息传递模块,生成基于融合的多模态表示中的启发性回答。我们在文本基于和图像背景基于对话数据集上进行了广泛的实验,证明ZRIGF在生成上下文ually pertinent和有用的回答。此外,我们采用了完全零资源场景,以示我们的框架在新领域中的稳定性和普适性。代码可以在https://github.com/zhangbo-nlp/ZRIGF 上获取。

Tackling Hallucinations in Neural Chart Summarization

  • paper_url: http://arxiv.org/abs/2308.00399
  • repo_url: https://github.com/worldhellow/hallucinations-c2t
  • paper_authors: Saad Obaid ul Islam, Iza Škrjanec, Ondřej Dušek, Vera Demberg
  • for: 这个论文目的是解决chart summarization中的幻觉问题。
  • methods: 该论文提出了一种基于自然语言推理(NLI)的数据预处理方法,以减少幻觉现象。
  • results: 人工评估表明,该方法可以显著减少幻觉现象,同时也提高了总的性能。此外,缩短长距离依赖关系和添加图表标题和标签也有助于提高表要求。
    Abstract Hallucinations in text generation occur when the system produces text that is not grounded in the input. In this work, we tackle the problem of hallucinations in neural chart summarization. Our analysis shows that the target side of chart summarization training datasets often contains additional information, leading to hallucinations. We propose a natural language inference (NLI) based method to preprocess the training data and show through human evaluation that our method significantly reduces hallucinations. We also found that shortening long-distance dependencies in the input sequence and adding chart-related information like title and legends improves the overall performance.
    摘要 文本生成中的幻觉现象发生在系统生成的文本与输入无法匹配时。在这个工作中,我们解决了chart summarization的幻觉问题。我们的分析显示,chart summarization的目标 сторо面常常包含更多的信息,导致幻觉。我们提出了基于自然语言推理(NLI)的预处理方法,并通过人工评估表明,我们的方法可以明显减少幻觉。此外,我们发现缩短输入序列中的长距离依赖关系和添加图表标题和标签信息可以提高总性表现。

LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack

  • paper_url: http://arxiv.org/abs/2308.00319
  • repo_url: None
  • paper_authors: Hai Zhu, Zhaoqing Yang, Weiwei Shang, Yuren Wu
  • for: 本研究探讨了自然语言处理模型对 adversarial example 的抵御性。
  • methods: 本文提出了一种新的 hard-label attack 算法,named LimeAttack,该算法使用了本地可解释方法来估算单词重要性排名,然后通过 beam search 找到最佳的黑hat 示例。
  • results: 对比 existed 的 hard-label attack 算法,LimeAttack 在同样的查询预算下 achiev 了更好的攻击性能。此外,对大型语言模型进行评估,结果表明 adversarial example 仍然是大型语言模型的一大威胁。LimeAttack 生成的黑hat 示例具有高度传输性,可以有效提高模型的Robustness 在 adversarial training 中。
    Abstract Natural language processing models are vulnerable to adversarial examples. Previous textual adversarial attacks adopt gradients or confidence scores to calculate word importance ranking and generate adversarial examples. However, this information is unavailable in the real world. Therefore, we focus on a more realistic and challenging setting, named hard-label attack, in which the attacker can only query the model and obtain a discrete prediction label. Existing hard-label attack algorithms tend to initialize adversarial examples by random substitution and then utilize complex heuristic algorithms to optimize the adversarial perturbation. These methods require a lot of model queries and the attack success rate is restricted by adversary initialization. In this paper, we propose a novel hard-label attack algorithm named LimeAttack, which leverages a local explainable method to approximate word importance ranking, and then adopts beam search to find the optimal solution. Extensive experiments show that LimeAttack achieves the better attacking performance compared with existing hard-label attack under the same query budget. In addition, we evaluate the effectiveness of LimeAttack on large language models, and results indicate that adversarial examples remain a significant threat to large language models. The adversarial examples crafted by LimeAttack are highly transferable and effectively improve model robustness in adversarial training.
    摘要 自然语言处理模型容易受到敌意攻击。先前的文本敌意攻击通常使用梯度或信心分数来计算单词重要性排名,并生成敌意攻击示例。然而,这些信息在实际世界中不可获得。因此,我们将注意力点在更真实和挑战性的设定上,即硬标签攻击。现有的硬标签攻击算法通常通过随机替换初始化敌意示例,然后使用复杂的规则来优化敌意扰动。这些方法需要访问模型的多少次,并且攻击成功率受到敌手初始化的限制。在这篇论文中,我们提出了一种新的硬标签攻击算法,名为LimeAttack。LimeAttack利用了一种本地可解释的方法来估算单词重要性排名,然后使用搜索树来找到最佳解决方案。广泛的实验表明,LimeAttack在同样的查询预算下比现有的硬标签攻击算法更好的攻击性能。此外,我们还评估了LimeAttack的效果于大语言模型,结果表明,敌意示例仍然是大语言模型的重要威胁。LimeAttack生成的敌意示例具有高 Transfer Learning 性和可以有效地提高模型在对抗训练中的 Robustness。

Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models

  • paper_url: http://arxiv.org/abs/2308.00304
  • repo_url: None
  • paper_authors: Jiaao Chen, Xiaoman Pan, Dian Yu, Kaiqiang Song, Xiaoyang Wang, Dong Yu, Jianshu Chen
  • for: 提高大型自然语言模型(LLM)的compositional generalization能力
  • methods: 使用novel的prompting策略——skills-in-context(SKiC) prompting
  • results: 实现了LLM解决难度较高的问题的能力,并且能够解决未看过的问题,这表明LLM具有人类智能的推理能力。
    Abstract We consider the problem of eliciting compositional generalization capabilities in large language models (LLMs) with a novel type of prompting strategy. Compositional generalization empowers the LLMs to solve problems that are harder than the ones they have seen (i.e., easy-to-hard generalization), which is a critical reasoning capability of human-like intelligence. However, even the current state-of-the-art LLMs still struggle with this form of reasoning. To bridge this gap, we propose skills-in-context (SKiC) prompting, which instructs LLMs how to compose basic skills to resolve more complex problems. We find that it is crucial to demonstrate both the skills and the compositional examples within the same prompting context. With as few as two examplars, our SKiC prompting initiates strong synergies between skills and their composition capabilities. Notably, it empowers LLMs to solve unseen problems that require innovative skill compositions, achieving near-perfect generalization on a broad range of challenging compositionality tasks. Intriguingly, SKiC prompting unlocks the latent potential of LLMs, enabling them to leverage pre-existing internal skills acquired during earlier pre-training stages, even when these skills are not explicitly presented in the prompting context. This results in the capability of LLMs to solve unseen complex problems by activating and composing internal competencies. With such prominent features, SKiC prompting is able to achieve state-of-the-art performance on challenging mathematical reasoning benchmarks (e.g., MATH).
    摘要 我们考虑了大型自然语言模型(LLM)中的问题,即召唤扩展能力的启发策略。这种能力使得LLM能够解决比它们所见过的问题更加复杂(即易于难的扩展),这是人类智能的重要逻辑能力之一。然而,当前的LLM仍然在这种类型的逻辑能力方面做出了差。为bridge这个差距,我们提出了技能在Context(SKiC)召唤策略。我们发现,在同一个召唤上显示技能和其组合例子是关键。只需要两个示例,我们的 SKiC 召唤就可以强化 LLM 的基本技能和组合能力。特别是,它可以让 LLM 解决未经见过的问题,通过组合内部的技能来实现创新的技能组合,达到了 near-perfect 的扩展性。启示地,SKiC 召唤可以让 LLM Activate和组合其内部的竞争能力,解决无法被直接示例所覆盖的问题。这些特点使得 SKiC 召唤可以在复杂的数学逻辑任务(例如 MATH)中达到最新的表现。

Towards Effective Ancient Chinese Translation: Dataset, Model, and Evaluation

  • paper_url: http://arxiv.org/abs/2308.00240
  • repo_url: None
  • paper_authors: Geyang Guo, Jiarong Yang, Fengyuan Lu, Jiaxin Qin, Tianyi Tang, Wayne Xin Zhao
  • for: 本文提出了一种基于古代中文的翻译模型,帮助更好地理解中国古代文学、传统和文化。
  • methods: 我们收集、清洗和分类了各种古代中文资料,形成了目前最大的古代中文资源。我们还提出了一种特有的训练方法,包括对称对换和双重遮盲语言模型。
  • results: 我们建立了一个用于评估古代中文翻译质量的标准准则,并对各种现有模型进行了评估。我们的模型在五个领域中表现出色,与GPT-3.5模型的+12.0 BLEU比进行了比较,并在人工评估中超过了ERNIE Bot。进一步的微调也显示了Erya模型的优秀传送能力,升幅+6.2 BLEU。我们将所有资源发布到https://github.com/RUCAIBox/Erya。
    Abstract Interpreting ancient Chinese has been the key to comprehending vast Chinese literature, tradition, and civilization. In this paper, we propose Erya for ancient Chinese translation. From a dataset perspective, we collect, clean, and classify ancient Chinese materials from various sources, forming the most extensive ancient Chinese resource to date. From a model perspective, we devise Erya training method oriented towards ancient Chinese. We design two jointly-working tasks: disyllabic aligned substitution (DAS) and dual masked language model (DMLM). From an evaluation perspective, we build a benchmark to judge ancient Chinese translation quality in different scenarios and evaluate the ancient Chinese translation capacities of various existing models. Our model exhibits remarkable zero-shot performance across five domains, with over +12.0 BLEU against GPT-3.5 models and better human evaluation results than ERNIE Bot. Subsequent fine-tuning further shows the superior transfer capability of Erya model with +6.2 BLEU gain. We release all the above-mentioned resources at https://github.com/RUCAIBox/Erya.
    摘要 ancient Chinese 的解释对中国文学、传统和文明产生了关键作用。在这篇论文中,我们提出了“Erya”作为古代中文翻译的解决方案。从数据aset的角度来看,我们收集、清洗并分类了各种古代中文资料,组建了迄今为止最大的古代中文资源。从模型的角度来看,我们设计了古代中文翻译 oriented的Erya训练方法。我们设计了两个联合工作任务:词组对应替换(DAS)和双重遮盲语言模型(DMLM)。从评估角度来看,我们建立了评估古代中文翻译质量的标准套件,并评估了不同enario下的古代中文翻译能力。我们的模型在五个领域中表现出了很好的零MQA表现,与GPT-3.5模型相比,我们的模型的BLEU得分高于12.0。后续的微调更显示了Erya模型的提升性能,BLEU得分增加了6.2。我们在https://github.com/RUCAIBox/Erya上发布了所有资源。

Boosting Adverse Drug Event Normalization on Social Media: General-Purpose Model Initialization and Biomedical Semantic Text Similarity Benefit Zero-Shot Linking in Informal Contexts

  • paper_url: http://arxiv.org/abs/2308.00157
  • repo_url: None
  • paper_authors: François Remy, Simone Scaboro, Beatrice Portelli
  • for: 这篇论文的目的是提出一种新的社交媒体上的不良药物事件Normalization方法,并通过使用通用模型初始化和semantic-text-similarity精细调整(STS)来改善表现。
  • methods: 本研究使用了 BioLORD 的通用模型初始化和 STS 的semantic-text-similarity精细调整,并在多个社交媒体数据集上进行了实验评估。
  • results: 本研究的实验结果显示,使用我们提出的方法可以在社交媒体上实现state-of-the-art的性能,并且在所有测试数据集上都表现出良好的结果。
    Abstract Biomedical entity linking, also known as biomedical concept normalization, has recently witnessed the rise to prominence of zero-shot contrastive models. However, the pre-training material used for these models has, until now, largely consisted of specialist biomedical content such as MIMIC-III clinical notes (Johnson et al., 2016) and PubMed papers (Sayers et al., 2021; Gao et al., 2020). While the resulting in-domain models have shown promising results for many biomedical tasks, adverse drug event normalization on social media texts has so far remained challenging for them (Portelli et al., 2022). In this paper, we propose a new approach for adverse drug event normalization on social media relying on general-purpose model initialization via BioLORD (Remy et al., 2022) and a semantic-text-similarity fine-tuning named STS. Our experimental results on several social media datasets demonstrate the effectiveness of our proposed approach, by achieving state-of-the-art performance. Based on its strong performance across all the tested datasets, we believe this work could emerge as a turning point for the task of adverse drug event normalization on social media and has the potential to serve as a benchmark for future research in the field.
    摘要 生物医学实体链接,也称为生物医学概念 normalization,最近受到零批示对比模型的普及。然而,这些模型的预训练材料, Until now, largely consisted of specialist biomedical content such as MIMIC-III clinical notes (Johnson et al., 2016) and PubMed papers (Sayers et al., 2021; Gao et al., 2020). While the resulting in-domain models have shown promising results for many biomedical tasks, adverse drug event normalization on social media texts has so far remained challenging for them (Portelli et al., 2022).在这篇论文中,我们提出了一种新的方法,基于通用模型初始化via BioLORD (Remy et al., 2022)和semantic-text-similarity fine-tuning named STS。我们的实验结果表明,我们的提议方法在多个社交媒体数据集上具有最佳性能。基于所有测试数据集的优秀表现,我们认为这项工作可能会成为社交媒体上药品副作用normalization任务的转折点,并且具有未来研究领域的标准 referential。

Virtual Prompt Injection for Instruction-Tuned Large Language Models

  • paper_url: http://arxiv.org/abs/2307.16888
  • repo_url: None
  • paper_authors: Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin
  • for: 这个论文是为了漏洞抢夺大语言模型(LLM)的目的而写的。
  • methods: 这篇论文使用的方法是投入虚拟提示,以控制 LLM 的行为。
  • results: 研究发现,只需投入 0.1% 的恶意示例,就可以使 LLM 对有关杰布·纽伦(Joe Biden)的查询返回负面的结果。
    Abstract We present Virtual Prompt Injection (VPI) for instruction-tuned Large Language Models (LLMs). VPI allows an attacker-specified virtual prompt to steer the model behavior under specific trigger scenario without any explicit injection in model input. For instance, if an LLM is compromised with the virtual prompt "Describe Joe Biden negatively." for Joe Biden-related instructions, then any service deploying this model will propagate biased views when handling user queries related to Joe Biden. VPI is especially harmful for two primary reasons. Firstly, the attacker can take fine-grained control over LLM behaviors by defining various virtual prompts, exploiting LLMs' proficiency in following instructions. Secondly, this control is achieved without any interaction from the attacker while the model is in service, leading to persistent attack. To demonstrate the threat, we propose a simple method for performing VPI by poisoning the model's instruction tuning data. We find that our proposed method is highly effective in steering the LLM with VPI. For example, by injecting only 52 poisoned examples (0.1% of the training data size) into the instruction tuning data, the percentage of negative responses given by the trained model on Joe Biden-related queries change from 0% to 40%. We thus highlight the necessity of ensuring the integrity of the instruction-tuning data as little poisoned data can cause stealthy and persistent harm to the deployed model. We further explore the possible defenses and identify data filtering as an effective way to defend against the poisoning attacks. Our project page is available at https://poison-llm.github.io.
    摘要 我团队现在提出了虚拟提示插入(VPI)技术,用于对特定触发情况下大语言模型(LLM)的行为进行控制。VPI允许攻击者在没有显式输入的情况下,通过定制虚拟提示来操纵模型的行为。例如,如果一个LLM被恶意攻击者定制为“描述约瑟·贝登纳成分负面”,那么任何使用这个模型的服务都会在用户查询相关的约瑟·贝登纳问题时传播偏见。VPI具有两点优势:首先,攻击者可以通过定制虚拟提示来细化控制LLM的行为,利用LLM的遵从指令的能力。其次,这种控制是在服务模型时进行,导致 persistente攻击。为了证明这种威胁,我们提出了一种简单的VPI实现方法,利用模型的指令调整数据中毒。我们发现,只需插入52个恶意示例(数据量的0.1%),可以让训练后的模型对约瑟·贝登纳相关的查询发送40%的负面回答。我们因此强调了保持模型的指令调整数据的完整性,因为只需少量毒垢数据就可以让模型发生隐藏和持续的害。我们还探索了可能的防御策略,并确定了数据筛选是一种有效的防御方法。关于我们的项目,请参考https://poison-llm.github.io。

HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution

  • paper_url: http://arxiv.org/abs/2307.16883
  • repo_url: https://github.com/project-miracl/hagrid
  • paper_authors: Ehsan Kamalloo, Aref Jafari, Xinyu Zhang, Nandan Thakur, Jimmy Lin
  • for: 这项研究的目的是开发一个可以生成搜索结果的自然语言搜索引擎,以提高搜索结果的可信度和可追溯性。
  • methods: 这项研究使用了人类和大语言模型(LLM)的协作方式,首先使用LLM自动生成了带有参考文献的解释,然后询问人类标注者评估这些解释的信息性和可追溯性。
  • results: 该研究提出了一个名为HAGRID的新数据集,用于建立可以生成搜索结果的信息搜索模型,该模型可以生成候选引用和带有参考文献的解释。与之前的研究不同的是,该数据集基于公开 accessible的数据集MIRACL,并且通过人类和LLM的协作来构建。
    Abstract The rise of large language models (LLMs) had a transformative impact on search, ushering in a new era of search engines that are capable of generating search results in natural language text, imbued with citations for supporting sources. Building generative information-seeking models demands openly accessible datasets, which currently remain lacking. In this paper, we introduce a new dataset, HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking Dataset) for building end-to-end generative information-seeking models that are capable of retrieving candidate quotes and generating attributed explanations. Unlike recent efforts that focus on human evaluation of black-box proprietary search engines, we built our dataset atop the English subset of MIRACL, a publicly available information retrieval dataset. HAGRID is constructed based on human and LLM collaboration. We first automatically collect attributed explanations that follow an in-context citation style using an LLM, i.e. GPT-3.5. Next, we ask human annotators to evaluate the LLM explanations based on two criteria: informativeness and attributability. HAGRID serves as a catalyst for the development of information-seeking models with better attribution capabilities.
    摘要 大型自然语言模型(LLM)的出现对搜索产生了转变性影响,并且开启了一新的搜索引擎时代,这些搜索引擎可以生成搜索结果为自然语言文本,并且具有引用来源的参考。建立生成信息搜索模型需要公开 accessible 的数据集,现在还缺乏这些数据集。在这篇论文中,我们介绍了一个新的数据集,即 HAGRID(人类在Loop Attributable Generative Retrieval for Information-seeking Dataset),用于建立终端生成信息搜索模型,这些模型可以搜索候选引用和生成 attributed 解释。与之前的努力不同,我们基于英语subset 的 MIRACL 公共可用信息检索数据集构建了 HAGRID。HAGRID 基于人类和 LLM 的合作,我们首先使用 GPT-3.5 自然语言模型自动收集 attributed 解释,然后请求人工标注员根据两个 criterion 评估 LLM 解释:信息性和可识别性。HAGRID 作为生成信息搜索模型更好的层次结构的 catalyst。

Defense of Adversarial Ranking Attack in Text Retrieval: Benchmark and Baseline via Detection

  • paper_url: http://arxiv.org/abs/2307.16816
  • repo_url: None
  • paper_authors: Xuanang Chen, Ben He, Le Sun, Yingfei Sun
  • for: 本研究旨在提供一个用于攻击检测的NRMs benchmark数据集,并 introduce two types of攻击文档检测任务。
  • methods: 本研究使用了多种检测基准,包括查看Spamicity、混乱度和语言可接受度,并使用supervised分类器。
  • results: 实验结果显示,使用supervised分类器可以有效地 Mitigate known attacks,但是它在未seen攻击下表现不佳。此外,该分类器应避免使用查询文本,以避免学习相关性。
    Abstract Neural ranking models (NRMs) have undergone significant development and have become integral components of information retrieval (IR) systems. Unfortunately, recent research has unveiled the vulnerability of NRMs to adversarial document manipulations, potentially exploited by malicious search engine optimization practitioners. While progress in adversarial attack strategies aids in identifying the potential weaknesses of NRMs before their deployment, the defensive measures against such attacks, like the detection of adversarial documents, remain inadequately explored. To mitigate this gap, this paper establishes a benchmark dataset to facilitate the investigation of adversarial ranking defense and introduces two types of detection tasks for adversarial documents. A comprehensive investigation of the performance of several detection baselines is conducted, which involve examining the spamicity, perplexity, and linguistic acceptability, and utilizing supervised classifiers. Experimental results demonstrate that a supervised classifier can effectively mitigate known attacks, but it performs poorly against unseen attacks. Furthermore, such classifier should avoid using query text to prevent learning the classification on relevance, as it might lead to the inadvertent discarding of relevant documents.
    摘要 neur Ranking 模型 (NRM) 在信息检索 (IR) 系统中得到了广泛应用,但最近的研究发现,NRM 受到了恶意文档修改的威胁,可能会被黑客搜索优化师利用。虽然对 adversarial 攻击策略的进步帮助了在NRM 部署之前发现其潜在弱点,但对于这些攻击的防御措施,如检测恶意文档,还需要进一步探索。为此,本文提供了一个 Benchmark 数据集,并引入了两种检测任务 для恶意文档。我们进行了全面的检测基eline的调查,包括查看 Spamicity、perplexity 和语言可接受性,并使用超vised 分类器。实验结果表明,一个supervised 分类器可以有效地 Mitigate 已知攻击,但它在未知攻击下表现糟糕。此外,这个分类器应该避免使用查询文本,以避免学习对 relevance 的分类。

DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures

  • paper_url: http://arxiv.org/abs/2307.16811
  • repo_url: https://github.com/turing-online-safety-codebase/dodo-learning
  • paper_authors: Hannah Rose Kirk, Angus R. Williams, Liam Burke, Yi-Ling Chung, Ivan Debono, Pica Johansson, Francesca Stevens, Jonathan Bright, Scott A. Hale
  • for: 这个研究旨在开发一个更普遍化的网络霸凌识别系统,以应对公众人物在社交媒体上 Receiving 过度的辱骂和攻击,并且这种霸凌可能会对公众人物的活跃参与产生负面影响。
  • methods: 本研究使用了自动化系统来识别网络霸凌,并且使用了一个 Novel DODO 数据集,包含 28,000 个标签的 tweet,其中有四个 Domain-demographic 组合。研究人员使用了语言模型进行 tweet 的分类,并且进行了精确的评估和调整,以确保模型能够在不同的领域和人口数据上表现出色。
  • results: 研究人员发现了以下四个关键结果:(i) 小量多样的数据可以帮助模型获得更好的普遍化和适应能力; (ii) 模型在不同的人口数据上的转移能力比较强,但是模型在跨领域数据上的转移能力更高; (iii) 一些人群对普遍化的贡献比较大; (iv) 数据的相似度是转移能力的讯号。
    Abstract Public figures receive a disproportionate amount of abuse on social media, impacting their active participation in public life. Automated systems can identify abuse at scale but labelling training data is expensive, complex and potentially harmful. So, it is desirable that systems are efficient and generalisable, handling both shared and specific aspects of online abuse. We explore the dynamics of cross-group text classification in order to understand how well classifiers trained on one domain or demographic can transfer to others, with a view to building more generalisable abuse classifiers. We fine-tune language models to classify tweets targeted at public figures across DOmains (sport and politics) and DemOgraphics (women and men) using our novel DODO dataset, containing 28,000 labelled entries, split equally across four domain-demographic pairs. We find that (i) small amounts of diverse data are hugely beneficial to generalisation and model adaptation; (ii) models transfer more easily across demographics but models trained on cross-domain data are more generalisable; (iii) some groups contribute more to generalisability than others; and (iv) dataset similarity is a signal of transferability.
    摘要 公众人物在社交媒体上收到过量的辱骂,影响其在公众生活中的活跃参与。自动化系统可以在大规模上识别辱骂,但标注训练数据是Expensive,复杂和 potentially harmful。因此,我们希望系统能够高效、普适,可以处理多个领域和特定方面的辱骂。我们研究跨群体文本分类的dinamics,以了解分类器在其他领域或人口类型上如何转移,以建立更普适的辱骂分类器。我们精细调整语言模型,用我们的novel DODO dataset进行分类,该dataset包含28,000个标注的条目,分别分配到四个领域-人口对的四个组合。我们发现了以下结论:(i)小量多样的数据对泛化和模型适应具有巨大的 beneficial effect;(ii)模型在人口类型之间更容易转移,但是模型在跨领域数据上进行了更好的泛化;(iii)一些组别对泛化具有更大的贡献;(iv)数据集的相似性是转移性的信号。

Changes in Policy Preferences in German Tweets during the COVID Pandemic

  • paper_url: http://arxiv.org/abs/2308.04444
  • repo_url: None
  • paper_authors: Felix Biessmann
  • for: 这个研究用于自动抽取在社交媒体上的政策偏好。
  • methods: 研究使用了一种文本分类模型,并使用了一个新的 tweet 数据集,以EXTRACT political preferences in a German Twitter corpus ranging from 2019 to 2022。
  • results: 研究发现,在 COVID 大流行期间,人们对政策表达增加了。通过使用一个确立的政策偏好分类法,分析了细腻的政治观点,并发现政策表达增加的主要类别是 pro-welfare、pro-education 和 pro-governmental administration efficiency。
    Abstract Online social media have become an important forum for exchanging political opinions. In response to COVID measures citizens expressed their policy preferences directly on these platforms. Quantifying political preferences in online social media remains challenging: The vast amount of content requires scalable automated extraction of political preferences -- however fine grained political preference extraction is difficult with current machine learning (ML) technology, due to the lack of data sets. Here we present a novel data set of tweets with fine grained political preference annotations. A text classification model trained on this data is used to extract policy preferences in a German Twitter corpus ranging from 2019 to 2022. Our results indicate that in response to the COVID pandemic, expression of political opinions increased. Using a well established taxonomy of policy preferences we analyse fine grained political views and highlight changes in distinct political categories. These analyses suggest that the increase in policy preference expression is dominated by the categories pro-welfare, pro-education and pro-governmental administration efficiency. All training data and code used in this study are made publicly available to encourage other researchers to further improve automated policy preference extraction methods. We hope that our findings contribute to a better understanding of political statements in online social media and to a better assessment of how COVID measures impact political preferences.
    摘要 在线社交媒体已成为政治意见交换的重要平台。响应COVID措施,公民直接在这些平台上表达了政策偏好。量化在线社交媒体中政治偏好的问题具有挑战性:巨量的内容需要扩展自动EXTRACT政治偏好,但现有机器学习(ML)技术无法准确地分类细化政治偏好。我们现在提供了一个新的推文数据集,其中每个推文均有细化政治偏好的注释。我们使用这些数据训练文本分类模型,并在2019-2022年德国推文集中提取政策偏好。我们的结果表明,COVID大流行期间,表达政治意见的人数增加。使用已有的政策偏好分类法,我们分析了细化的政治观点,并发现COVID措施的影响。我们的发现可能会促进自动政策偏好抽取方法的进一步改进,并为政策分析和评估提供更好的基础。我们的研究结果也可能会帮助我们更好地理解在线社交媒体中的政治声明,并为COVID措施的政治影响提供更好的评估。

cs.LG - 2023-08-01

Hessian-Aware Bayesian Optimization for Decision Making Systems

  • paper_url: http://arxiv.org/abs/2308.00629
  • repo_url: None
  • paper_authors: Mohit Rajpal, Lac Gia Tran, Yehong Zhang, Bryan Kian Hsiang Low
  • for: optimize decision making systems in complex actor-based systems with sparse or uninformative feedback
  • methods: Hessian-aware Bayesian Optimization and compact multi-layered architecture modeling actor interactions
  • results: effective optimization under resource constraints and malformed feedback settings, as demonstrated on several benchmarks
    Abstract Many approaches for optimizing decision making systems rely on gradient based methods requiring informative feedback from the environment. However, in the case where such feedback is sparse or uninformative, such approaches may result in poor performance. Derivative-free approaches such as Bayesian Optimization mitigate the dependency on the quality of gradient feedback, but are known to scale poorly in the high-dimension setting of complex decision making systems. This problem is exacerbated if the system requires interactions between several actors cooperating to accomplish a shared goal. To address the dimensionality challenge, we propose a compact multi-layered architecture modeling the dynamics of actor interactions through the concept of role. Additionally, we introduce Hessian-aware Bayesian Optimization to efficiently optimize the multi-layered architecture parameterized by a large number of parameters. Experimental results demonstrate that our method (HA-GP-UCB) works effectively on several benchmarks under resource constraints and malformed feedback settings.
    摘要 很多决策系统优化方法 rely于梯度基于方法,需要环境提供有用的反馈。然而,在环境反馈稀缺或不具有信息时,这些方法可能会表现不佳。 derivative-free方法如极限优化 mitigate了对梯度反馈质量的依赖,但是在复杂决策系统中可能会Scales poorly。这个问题被加剧,当决策系统需要多个演员合作完成共同目标时。为了解决维度挑战,我们提议使用compact多层架构,模型演员之间的相互作用via角色概念。此外,我们引入了Hessian-aware极限优化,以高效地优化多层架构中的大量参数。实验结果表明,我们的方法(HA-GP-UCB)在资源限制和缺失反馈设置下工作有效。

Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes

  • paper_url: http://arxiv.org/abs/2308.00628
  • repo_url: https://github.com/soullessrobot/human-m3-dataset
  • paper_authors: Bohao Fan, Siqi Wang, Wenxuan Guo, Wenzhao Zheng, Jianjiang Feng, Jie Zhou
  • for: 本研究的目的是提供一个多ModalMultiViewMultiPerson人体姿态数据集,以便进行三Dimensional人体姿态估计的研究。
  • methods: 该数据集使用了多种数据模式,包括RGB图像和点云数据,并且包含多个人体的动作。而且,该数据集还包括对人体 pose的标注,以便进行评估和研究。
  • results: 该研究提出了一种基于多Modal数据输入的人体姿态估计算法,并通过评估多种模式的算法来证明该数据集的可靠性和挑战性。
    Abstract 3D human pose estimation in outdoor environments has garnered increasing attention recently. However, prevalent 3D human pose datasets pertaining to outdoor scenes lack diversity, as they predominantly utilize only one type of modality (RGB image or pointcloud), and often feature only one individual within each scene. This limited scope of dataset infrastructure considerably hinders the variability of available data. In this article, we propose Human-M3, an outdoor multi-modal multi-view multi-person human pose database which includes not only multi-view RGB videos of outdoor scenes but also corresponding pointclouds. In order to obtain accurate human poses, we propose an algorithm based on multi-modal data input to generate ground truth annotation. This benefits from robust pointcloud detection and tracking, which solves the problem of inaccurate human localization and matching ambiguity that may exist in previous multi-view RGB videos in outdoor multi-person scenes, and generates reliable ground truth annotations. Evaluation of multiple different modalities algorithms has shown that this database is challenging and suitable for future research. Furthermore, we propose a 3D human pose estimation algorithm based on multi-modal data input, which demonstrates the advantages of multi-modal data input for 3D human pose estimation. Code and data will be released on https://github.com/soullessrobot/Human-M3-Dataset.
    摘要 Recently, 3D人体姿态估计在户外环境中获得了越来越多的关注。然而,现有的大多数3D人体姿态数据集,都是在户外场景中使用单一的感知模式(RGB图像或点云),并且通常只有一个人在每个场景中。这种限定的数据基础设施会很大程度上阻碍数据的变化。在这篇文章中,我们提出了人类-M3数据集,这是一个户外多模式多视图多人3D人体姿态数据集,包括不只是多视图RGB视频,还有对应的点云数据。为了获取准确的人体姿态,我们提议一种基于多modal数据输入的算法来生成基准注释。这种方法利用了Robust点云探测和跟踪,解决了在前一代多视图RGB视频中的人员Localization和匹配抖动问题,并生成了可靠的基准注释。多种不同的模式算法的评估表明,这个数据集是一个挑战性的和适用的研究工具。此外,我们还提出了基于多modal数据输入的3D人体姿态估计算法,这显示了多modal数据输入的优势。代码和数据将在GitHub上发布,请参考

Beyond One-Hot-Encoding: Injecting Semantics to Drive Image Classifiers

  • paper_url: http://arxiv.org/abs/2308.00607
  • repo_url: https://github.com/s1m0n38/semantic-encodings
  • paper_authors: Alan Perotti, Simone Bertolotto, Eliana Pastor, André Panisson
  • for: 这篇论文旨在提高图像分类模型的解释性和可靠性,通过integrating semantic information into the training process。
  • methods: 该论文提出了一种通用的方法,可以从任何类型的semantic information中 derivate an additional loss term,以提高模型的解释性和可靠性。
  • results: 研究人员通过应用该方法,在图像分类器中提高了模型的解释性和可靠性,并且可以更好地理解模型的内部表示。Code repository可以在https://github.com/S1M0N38/semantic-encodings中找到。
    Abstract Images are loaded with semantic information that pertains to real-world ontologies: dog breeds share mammalian similarities, food pictures are often depicted in domestic environments, and so on. However, when training machine learning models for image classification, the relative similarities amongst object classes are commonly paired with one-hot-encoded labels. According to this logic, if an image is labelled as 'spoon', then 'tea-spoon' and 'shark' are equally wrong in terms of training loss. To overcome this limitation, we explore the integration of additional goals that reflect ontological and semantic knowledge, improving model interpretability and trustworthiness. We suggest a generic approach that allows to derive an additional loss term starting from any kind of semantic information about the classification label. First, we show how to apply our approach to ontologies and word embeddings, and discuss how the resulting information can drive a supervised learning process. Second, we use our semantically enriched loss to train image classifiers, and analyse the trade-offs between accuracy, mistake severity, and learned internal representations. Finally, we discuss how this approach can be further exploited in terms of explainability and adversarial robustness. Code repository: https://github.com/S1M0N38/semantic-encodings
    摘要 图像具有 semantic 信息,与实际世界 ontology 相关:狗种类共享哺乳动物相似之处,食物图片常出现在家庭环境中,等等。然而,在训练机器学习模型时,对象类之间的相似性通常通过一颗一颗的一频编码标签进行表示。根据这种逻辑,如果一个图像被标记为 '餐勺',那么 '茶勺' 和 '鲨鱼' 在训练损失方面都是等错的。为了超越这些限制,我们探讨了 Semantic 和 ontology 知识的整合,以提高模型可解性和可信度。我们建议一种通用的方法,可以从任何类型的 semantic 信息开始, derivate 一个额外的损失项。我们首先介绍了如何应用我们的方法到 ontology 和 word embedding,然后讨论了如何通过这些信息驱动一个supervised learning 过程。其次,我们使用我们的含义整合的损失函数来训练图像分类器,并分析了精度、错误严重性和学习内部表示之间的交易。最后,我们讨论了如何进一步利用这种方法,以提高解释性和对抗训练的鲜明性。Code repository:https://github.com/S1M0N38/semantic-encodings。

Latent-Shift: Gradient of Entropy Helps Neural Codecs

  • paper_url: http://arxiv.org/abs/2308.00725
  • repo_url: None
  • paper_authors: Muhammet Balcilar, Bharath Bhushan Damodaran, Karam Naser, Franck Galpin, Pierre Hellier
  • for: 这个论文的目的是提出一种基于梯度 entropy 的图像/视频编码器,以优化现有的传统压缩技术。
  • methods: 该论文使用了基于神经网络的trainable编码器,并利用了梯度 entropy 来优化压缩过程。
  • results: 实验表明,利用梯度 entropy 可以获得 $1-2%$ 的压缩率下降,同时保持相同的质量。这种方法是独立的并且可以与其他改进方法结合使用。
    Abstract End-to-end image/video codecs are getting competitive compared to traditional compression techniques that have been developed through decades of manual engineering efforts. These trainable codecs have many advantages over traditional techniques such as easy adaptation on perceptual distortion metrics and high performance on specific domains thanks to their learning ability. However, state of the art neural codecs does not take advantage of the existence of gradient of entropy in decoding device. In this paper, we theoretically show that gradient of entropy (available at decoder side) is correlated with the gradient of the reconstruction error (which is not available at decoder side). We then demonstrate experimentally that this gradient can be used on various compression methods, leading to a $1-2\%$ rate savings for the same quality. Our method is orthogonal to other improvements and brings independent rate savings.
    摘要 通过端到端图像/视频编解码器来比较传统的压缩技术,这些可编程的编解码器具有许多优势,如容易适应人工引导的质量指标和高性能在特定领域,归功于其学习能力。然而,现代神经网络编解码器并没有利用解码器 сторо的梯度Entropy的存在。在这篇论文中,我们理论上表明,解码器 сторо的梯度Entropy和不可知的重建错误梯度之间存在相关性。然后,我们通过实验证明,这个梯度可以在不同的压缩方法上使用,导致1-2%的比较率节省,同时不会与其他改进方法冲突。

Regularization, early-stopping and dreaming: a Hopfield-like setup to address generalization and overfitting

  • paper_url: http://arxiv.org/abs/2308.01421
  • repo_url: None
  • paper_authors: Elena Agliari, Miriam Aquaro, Francesco Alemanno, Alberto Fachechi
  • for: 这个论文主要研究了吸引器神经网络的机器学习方面,寻找最佳网络参数通过梯度下降法对带权函数loss函数进行优化。
  • methods: 这个论文使用了梯度下降法对带权函数loss函数进行优化,并研究了在不同的正则化参数和训练时间下的网络性能。
  • results: 研究发现,在不同的正则化参数和训练时间下,吸引器神经网络的性能会在不同的梯度下降步长和训练时间下进行变化,并且可以通过调整正则化参数和早停策略来避免过拟合。
    Abstract In this work we approach attractor neural networks from a machine learning perspective: we look for optimal network parameters by applying a gradient descent over a regularized loss function. Within this framework, the optimal neuron-interaction matrices turn out to be a class of matrices which correspond to Hebbian kernels revised by iteratively applying some unlearning protocols. Remarkably, the number of unlearning steps is proved to be related to the regularization hyperparameters of the loss function and to the training time. Thus, we can design strategies to avoid overfitting that are formulated in terms of the algebraic properties of the interaction matrix, or, equivalently, in terms of regularization tuning and early-stopping strategies. The generalization capabilities of these attractor networks are also investigated: analytical results are obtained for random synthetic datasets, next, the emerging picture is corroborated by numerical experiments that highlight the existence of several regimes (i.e., overfitting, failure and success) as the dataset parameters are varied.
    摘要 在这个工作中,我们从机器学习角度来看征引导神经网络:我们通过应用梯度下降来找到优化的网络参数,并且在这个框架下,优化的神经元互动矩阵turns out to be一类具有HEBBbian kernel的修订版本,经过一系列的快速学习过程。很显然,这些快速学习步骤的数量与优化器的补偿参数以及训练时间有关。因此,我们可以通过对互动矩阵的代数性质进行设计,或者通过补偿参数的调整和早期停止策略来避免过拟合。此外,我们还 investigate了这些吸引器网络的泛化能力,通过对随机生成的 sintetic dataset进行分析,并且通过 numerical experiments corroborate the emerging picture, highlighting the existence of several regimes(i.e., overfitting, failure, and success)as the dataset parameters are varied.

Semisupervised Anomaly Detection using Support Vector Regression with Quantum Kernel

  • paper_url: http://arxiv.org/abs/2308.00583
  • repo_url: None
  • paper_authors: Kilian Tscharke, Sebastian Issel, Pascal Debus
  • for: 这个论文是关于异常检测(AD)的研究,旨在开发一种基于量子机器学习(QML)的异常检测方法。
  • methods: 这个论文使用了量子支持向量机(SVM)和支持向量回归(SVR)等量子机器学习算法,以及量子聚类预测(QKD)等方法来实现异常检测。
  • results: 这个论文的实验结果表明,基于SVR的量子聚类预测模型(QSVR)在11个实验数据集中的含秘率和准确率都高于其他模型,并且在9个数据集中都高于量子自编码器(QAE)。
    Abstract Anomaly detection (AD) involves identifying observations or events that deviate in some way from the rest of the data. Machine learning techniques have shown success in automating this process by detecting hidden patterns and deviations in large-scale data. The potential of quantum computing for machine learning has been widely recognized, leading to extensive research efforts to develop suitable quantum machine learning (QML) algorithms. In particular, the search for QML algorithms for near-term NISQ devices is in full swing. However, NISQ devices pose additional challenges due to their limited qubit coherence times, low number of qubits, and high error rates. Kernel methods based on quantum kernel estimation have emerged as a promising approach to QML on NISQ devices, offering theoretical guarantees, versatility, and compatibility with NISQ constraints. Especially support vector machines (SVM) utilizing quantum kernel estimation have shown success in various supervised learning tasks. However, in the context of AD, semisupervised learning is of great relevance, and yet there is limited research published in this area. This paper introduces an approach to semisupervised AD based on the reconstruction loss of a support vector regression (SVR) with quantum kernel. This novel model is an alternative to the variational quantum and quantum kernel one-class classifiers, and is compared to a quantum autoencoder as quantum baseline and a SVR with radial-basis-function (RBF) kernel as well as a classical autoencoder as classical baselines. The models are benchmarked extensively on 10 real-world AD data sets and one toy data set, and it is shown that our SVR model with quantum kernel performs better than the SVR with RBF kernel as well as all other models, achieving highest mean AUC over all data sets. In addition, our QSVR outperforms the quantum autoencoder on 9 out of 11 data sets.
    摘要 异常检测(AD)涉及到从数据中找到不同的观察或事件。机器学习技术已经在自动化这个过程中得到了成功,通过检测大规模数据中的隐藏模式和偏差来检测异常。量子计算在机器学习方面的潜在优势已经得到了广泛的研究,并且在开发适合近期不稳定量子计算(NISQ)设备的量子机器学习(QML)算法方面进行了广泛的研究。然而,NISQ设备受限于寄存器准确时间、寄存器数量和错误率等因素,因此开发QML算法对NISQ设备的挑战。基于量子kernel估计的kernel方法在NISQ设备上的QML方面得到了广泛的关注,这种方法提供了理论保证、灵活性和NISQ约束的兼容性。特别是在检测方面,使用量子kernel估计的支持向量机(SVM)已经在多种监督学习任务中显示出成功。然而,在异常检测方面,半监督学习是非常有价值的,但是有限的研究出版在这个领域。本文提出了一种基于支持向量回归(SVR)的半监督异常检测方法,该方法使用量子kernel来估计异常点的权重。这种新的模型是对量子自变量和量子kernel一类的异常检测模型的一种替代方案,并与量子自变量和量子kernel一类的一类异常检测模型进行比较。我们对10个真实的异常检测数据集和1个玩偶数据集进行了广泛的 benchmarking,并证明了我们的SVR模型使用量子kernel在所有数据集上表现最佳,其中包括最高的平均AUC。此外,我们的QSVR模型在9个数据集中表现更好于量子自变量和RBF kernel的模型,并且在所有数据集上都表现得更好。

Graph Neural Networks for Forecasting Multivariate Realized Volatility with Spillover Effects

  • paper_url: http://arxiv.org/abs/2308.01419
  • repo_url: None
  • paper_authors: Chao Zhang, Xingyue Pu, Mihai Cucuringu, Xiaowen Dong
  • for: 该研究旨在提出一种基于自定义图 neural network 的多变量实现波动模型和预测方法,以捕捉跨股票间冲击效应。
  • methods: 该模型使用自定义图 neural network 模型,能够模型跨股票间冲击效应,捕捉非线性关系,并提供 flexible 的训练方法。
  • results: 研究发现,不考虑多个层次冲击效应并不提供明显的预测精度提高,但是模型非线性冲击效应可以提高实际波动预测精度,特别是在短期前一周内。此外,训练使用 quasi-likelihood 损失函数可以提高模型性能。
    Abstract We present a novel methodology for modeling and forecasting multivariate realized volatilities using customized graph neural networks to incorporate spillover effects across stocks. The proposed model offers the benefits of incorporating spillover effects from multi-hop neighbors, capturing nonlinear relationships, and flexible training with different loss functions. Our empirical findings provide compelling evidence that incorporating spillover effects from multi-hop neighbors alone does not yield a clear advantage in terms of predictive accuracy. However, modeling nonlinear spillover effects enhances the forecasting accuracy of realized volatilities, particularly for short-term horizons of up to one week. Moreover, our results consistently indicate that training with the Quasi-likelihood loss leads to substantial improvements in model performance compared to the commonly-used mean squared error. A comprehensive series of empirical evaluations in alternative settings confirm the robustness of our results.
    摘要 我们提出了一种新的方法来模型和预测多变量实现波动性,使用自定义图 neural network 来包含跨股波动效应。我们的模型具有以下优点:能够包含多 hop 邻居的波动效应,捕捉非线性关系,并且可以自由地训练不同的损失函数。我们的实验结果表明,单独通过多 hop 邻居的波动效应来模型并不能够提供明显的预测精度优势。然而,模型非线性波动效应可以提高实际波动性预测精度,特别是在短期 horizon 内,最多一周。此外,我们的结果一致地表明,使用 quasi-likelihood 损失函数进行训练可以与常用的mean squared error 损失函数相比,提高模型性能。我们在不同的设置中进行了完整的实验评估,并证明了我们的结果的稳定性。

Adaptive Collaborative Filtering with Personalized Time Decay Functions for Financial Product Recommendation

  • paper_url: http://arxiv.org/abs/2308.01208
  • repo_url: None
  • paper_authors: Ashraf Ghiye, Baptiste Barreau, Laurent Carlier, Michalis Vazirgiannis
  • for: 这篇研究是为了提出一个能够处理时间不稳的客户产品推荐系统,以提高推荐的准确性和有效性。
  • methods: 这篇研究使用了时间相依的客户产品协同推荐算法,可以灵活地调整距离的客户产品互动信号,以适应时间的不稳性。
  • results: 研究使用了一个实际数据集,与相关文献进行比较,结果显示这种时间相依的推荐方法可以提高推荐的准确性和有效性。
    Abstract Classical recommender systems often assume that historical data are stationary and fail to account for the dynamic nature of user preferences, limiting their ability to provide reliable recommendations in time-sensitive settings. This assumption is particularly problematic in finance, where financial products exhibit continuous changes in valuations, leading to frequent shifts in client interests. These evolving interests, summarized in the past client-product interactions, see their utility fade over time with a degree that might differ from one client to another. To address this challenge, we propose a time-dependent collaborative filtering algorithm that can adaptively discount distant client-product interactions using personalized decay functions. Our approach is designed to handle the non-stationarity of financial data and produce reliable recommendations by modeling the dynamic collaborative signals between clients and products. We evaluate our method using a proprietary dataset from BNP Paribas and demonstrate significant improvements over state-of-the-art benchmarks from relevant literature. Our findings emphasize the importance of incorporating time explicitly in the model to enhance the accuracy of financial product recommendation.
    摘要

Robust Linear Regression: Phase-Transitions and Precise Tradeoffs for General Norms

  • paper_url: http://arxiv.org/abs/2308.00556
  • repo_url: None
  • paper_authors: Elvis Dohmatob, Meyer Scetbon
  • for: investigate the impact of test-time adversarial attacks on linear regression models and determine the optimal level of robustness that any model can reach while maintaining a given level of standard predictive performance (accuracy)
  • methods: use quantitative estimates to uncover fundamental tradeoffs between adversarial robustness and accuracy in different regimes, and obtain a precise characterization which distinguishes between regimes where robustness is achievable without hurting standard accuracy and regimes where a tradeoff might be unavoidable
  • results: empirically confirm the findings with simple experiments that represent a variety of settings, and extend beyond previous works in this area by considering feature covariance matrices and attack norms of any nature.
    Abstract In this paper, we investigate the impact of test-time adversarial attacks on linear regression models and determine the optimal level of robustness that any model can reach while maintaining a given level of standard predictive performance (accuracy). Through quantitative estimates, we uncover fundamental tradeoffs between adversarial robustness and accuracy in different regimes. We obtain a precise characterization which distinguishes between regimes where robustness is achievable without hurting standard accuracy and regimes where a tradeoff might be unavoidable. Our findings are empirically confirmed with simple experiments that represent a variety of settings. This work applies to feature covariance matrices and attack norms of any nature, and extends beyond previous works in this area.
    摘要 在这篇论文中,我们研究了在线性回归模型下的测试时对抗攻击的影响,并确定了保持给定水平的标准预测性能(准确率)的最佳鲁棒性水平。通过量化估计,我们揭示了不同情况下的基本负面关系 между鲁棒性和准确率。我们获得了精确的分类,可以分 distinguish between不同情况下的鲁棒性可以不受标准准确率的影响和可能不可避免的负面关系。我们的发现得到了Empirical验证,通过一些简单的实验来 represnt多种设定。这种工作适用于特征covariance矩阵和攻击norm的任何种类,并超越了过去在这一领域的工作。

Copula for Instance-wise Feature Selection and Ranking

  • paper_url: http://arxiv.org/abs/2308.00549
  • repo_url: None
  • paper_authors: Hanyu Peng, Guanhua Fang, Ping Li
  • for: 提高 neural network 中 feature 选择和排序的精度,具体来说是针对每个样本选择最佳的特征集。
  • methods: 利用 Gaussian copula 技术来捕捉特征之间的相关性,并将其integrated into 当前的特征选择框架中。
  • results: 通过实验证明,我们的方法可以更好地捕捉特征之间的相关性,并提高特征选择和排序的精度。
    Abstract Instance-wise feature selection and ranking methods can achieve a good selection of task-friendly features for each sample in the context of neural networks. However, existing approaches that assume feature subsets to be independent are imperfect when considering the dependency between features. To address this limitation, we propose to incorporate the Gaussian copula, a powerful mathematical technique for capturing correlations between variables, into the current feature selection framework with no additional changes needed. Experimental results on both synthetic and real datasets, in terms of performance comparison and interpretability, demonstrate that our method is capable of capturing meaningful correlations.
    摘要 实例化特征选择和排名方法可以在神经网络中实现每个样本的好选择特征。然而,现有的方法假设特征子集是独立的,这会导致忽略特征之间的相互关系。为了解决这种限制,我们提议在当前特征选择框架中 incorporate Gaussian copula,这是一种强大的数学技术,用于捕捉变量之间的相互关系。实验结果表明,我们的方法可以 Capture 有意义的相互关系。(简化中文)实例化特征选择和排名方法可以在神经网络中实现每个样本的好选择特征。然而,现有的方法假设特征子集是独立的,这会导致忽略特征之间的相互关系。为了解决这种限制,我们提议在当前特征选择框架中 incorporate Gaussian copula,这是一种强大的数学技术,用于捕捉变量之间的相互关系。实验结果表明,我们的方法可以 Capture 有意义的相互关系。

Predicting Early Dropouts of an Active and Healthy Ageing App

  • paper_url: http://arxiv.org/abs/2308.00539
  • repo_url: None
  • paper_authors: Vasileios Perifanis, Ioanna Michailidi, Giorgos Stamatelatos, George Drosatos, Pavlos S. Efraimidis
  • for: 预测年轻和健康老年人APP上的早期退出者
  • methods: 使用机器学习方法预测用户使用Dynamic和Static特征进行抵抗性预测
  • results: 机器学习算法可以提供高质量抵抗性预测结果,动态特征对模型分类性能有积极影响,使用SMOTE和ADASYN填充方法提高了分类性能10%。
    Abstract In this work, we present a machine learning approach for predicting early dropouts of an active and healthy ageing app. The presented algorithms have been submitted to the IFMBE Scientific Challenge 2022, part of IUPESM WC 2022. We have processed the given database and generated seven datasets. We used pre-processing techniques to construct classification models that predict the adherence of users using dynamic and static features. We submitted 11 official runs and our results show that machine learning algorithms can provide high-quality adherence predictions. Based on the results, the dynamic features positively influence a model's classification performance. Due to the imbalanced nature of the dataset, we employed oversampling methods such as SMOTE and ADASYN to improve the classification performance. The oversampling approaches led to a remarkable improvement of 10\%. Our methods won first place in the IFMBE Scientific Challenge 2022.
    摘要 在这项工作中,我们提出了一种机器学习方法,用于预测活老年人健康应用程序中的早期退出。我们的算法已经参加了IFMBE科学挑战2022,这是IUPESM WC 2022年度科学会议的一部分。我们处理了给定的数据库,生成了七个数据集。我们使用了预处理技术,构建了基于动态和静态特征的分类模型,以预测用户的遵从度。我们提交了11个官方运行,结果表明,机器学习算法可以提供高质量的遵从预测。基于结果,动态特征对模型的分类性能产生了积极的影响。由于数据集具有偏置的特点,我们使用了权重采样方法,如SMOTE和ADASYN,以改善分类性能。这些权重采样方法导致了10%的显著提升。我们的方法在IFMBE科学挑战2022中获得了第一名。

Graph Embedding Dynamic Feature-based Supervised Contrastive Learning of Transient Stability for Changing Power Grid Topologies

  • paper_url: http://arxiv.org/abs/2308.00537
  • repo_url: None
  • paper_authors: Zijian Lv, Xin Chen, Zijian Feng
    for: 本研究旨在提高电力系统稳定性预测精度,使其能够快速适应电力网 topology 变化。methods: 本研究提出了基于图 embedding 动态特征 (GEDF) 的超vised contrastive learning (SCL) 模型,使用 supervised contrastive learning 预测过渡稳定性,考虑到电力网 topology 信息。results: 测试结果表明,GEDF-SCL 模型可以高精度地预测过渡稳定性,并适应不同电力网 topology 变化。
    Abstract Accurate online transient stability prediction is critical for ensuring power system stability when facing disturbances. While traditional transient stablity analysis replies on the time domain simulations can not be quickly adapted to the power grid toplogy change. In order to vectorize high-dimensional power grid topological structure information into low-dimensional node-based graph embedding streaming data, graph embedding dynamic feature (GEDF) has been proposed. The transient stability GEDF-based supervised contrastive learning (GEDF-SCL) model uses supervised contrastive learning to predict transient stability with GEDFs, considering power grid topology information. To evaluate the performance of the proposed GEDF-SCL model, power grids of varying topologies were generated based on the IEEE 39-bus system model. Transient operational data was obtained by simulating N-1 and N-$\bm{m}$-1 contingencies on these generated power system topologies. Test result demonstrated that the GEDF-SCL model can achieve high accuracy in transient stability prediction and adapt well to changing power grid topologies.
    摘要 <>精准在线稳定预测是电力系统稳定性的关键因素,尤其是在面临干扰时。传统的稳定预测分析依赖于时间域 simulink cannot be quickly adapted to the power grid topology change. 为了vectorize高维电力网 topological structure信息到低维节点基于图像流动数据,图像嵌入动态特征(GEDF)已经提出。基于GEDF的稳定预测模型使用supervised contrastive learning(GEDF-SCL)来预测稳定性,考虑电力网 topology信息。为评估提出的GEDF-SCL模型表现,根据IEEE 39-bus系统模型生成了不同电力网 topology。通过对这些生成的电力系统 topology进行随机N-1和N-m-1的 simulate,获得了过渡操作数据。测试结果表明,GEDF-SCL模型可以高度准确地预测稳定性和适应变化的电力网 topology。<>

Graph Contrastive Learning with Generative Adversarial Network

  • paper_url: http://arxiv.org/abs/2308.00535
  • repo_url: None
  • paper_authors: Cheng Wu, Chaokun Wang, Jingcao Xu, Ziyang Liu, Kai Zheng, Xiaowei Wang, Yang Song, Kun Gai
  • for: 用于强化图像的下游任务,如supervised end-to-end培训。
  • methods: 利用Graph Contrastive Learning(GCL)和图像生成对抗网络(GAN)来培训图像。
  • results: 在七个实际数据集上实现高质量的数据增强和 twelve 个基线方法的超越。另外,发现生成的视图最终遵循网络上知名的偏好附加规则。
    Abstract Graph Neural Networks (GNNs) have demonstrated promising results on exploiting node representations for many downstream tasks through supervised end-to-end training. To deal with the widespread label scarcity issue in real-world applications, Graph Contrastive Learning (GCL) is leveraged to train GNNs with limited or even no labels by maximizing the mutual information between nodes in its augmented views generated from the original graph. However, the distribution of graphs remains unconsidered in view generation, resulting in the ignorance of unseen edges in most existing literature, which is empirically shown to be able to improve GCL's performance in our experiments. To this end, we propose to incorporate graph generative adversarial networks (GANs) to learn the distribution of views for GCL, in order to i) automatically capture the characteristic of graphs for augmentations, and ii) jointly train the graph GAN model and the GCL model. Specifically, we present GACN, a novel Generative Adversarial Contrastive learning Network for graph representation learning. GACN develops a view generator and a view discriminator to generate augmented views automatically in an adversarial style. Then, GACN leverages these views to train a GNN encoder with two carefully designed self-supervised learning losses, including the graph contrastive loss and the Bayesian personalized ranking Loss. Furthermore, we design an optimization framework to train all GACN modules jointly. Extensive experiments on seven real-world datasets show that GACN is able to generate high-quality augmented views for GCL and is superior to twelve state-of-the-art baseline methods. Noticeably, our proposed GACN surprisingly discovers that the generated views in data augmentation finally conform to the well-known preferential attachment rule in online networks.
    摘要 图 нейрон网络 (GNN) 已经在利用节点表示来完成多个下游任务的超vised end-to-end 训练中展现出了扎根的结果。为了解决实际应用中的标签稀缺问题,我们利用图对照学习 (GCL) 来训练 GNN WITH 有限或者even no 标签,通过最大化节点之间的相互信息来进行augmentation。然而,现有文献中 Ignore 了图的分布,导致在most cases 中 ignored unseen edges,这是我们在实际实验中观察到的。为此,我们提议在view生成时使用图生成随机网络 (GAN) 来学习图的分布,以便自动捕捉图的特性,并jointly 训练图GAN模型和GCL模型。我们称之为GACN,它包括一个视图生成器和一个视图分类器,通过对抗式的方式来生成自动化的视图。然后,GACN 利用这些视图来训练一个 GNN 编码器,并使用两种特制的自我超vised learning 损失函数,包括图对照学习损失和 Bayesian 个性化排序损失。此外,我们设计了一个优化框架,以便同时训练所有GACN模块。我们在七个实际数据集上进行了广泛的实验,发现GACN能够生成高质量的augmented views,并且在十二个基eline方法中表现出色。另外,我们发现GACN surprisingly 发现,生成的视图最终遵循了在线网络中的 preference attachment 规则。

A Novel Temporal Multi-Gate Mixture-of-Experts Approach for Vehicle Trajectory and Driving Intention Prediction

  • paper_url: http://arxiv.org/abs/2308.00533
  • repo_url: None
  • paper_authors: Renteng Yuan, Mohamed Abdel-Aty, Qiaojun Xiang, Zijin Wang, Ou Zheng
  • for: 预测自动驾驶车辆轨迹和驾驶意图
  • methods: 提议使用Temporal Multi-Gate Mixture-of-Experts(TMMOE)模型同时预测车辆轨迹和驾驶意图,包括三层:共享层、专家层和全连接层。在模型中,共享层使用Temporal Convolutional Networks(TCN)提取时间特征。然后专家层用于识别不同任务中的信息。最后,全连接层用于集成和输出预测结果。
  • results: 使用 uncertainty algorithm 构建多任务损失函数,在CitySim dataset上进行验证,TMMOE模型在预测车辆轨迹和驾驶意图方面具有最高的分类和回归结果,超过LSTM模型的性能。
    Abstract Accurate Vehicle Trajectory Prediction is critical for automated vehicles and advanced driver assistance systems. Vehicle trajectory prediction consists of two essential tasks, i.e., longitudinal position prediction and lateral position prediction. There is a significant correlation between driving intentions and vehicle motion. In existing work, the three tasks are often conducted separately without considering the relationships between the longitudinal position, lateral position, and driving intention. In this paper, we propose a novel Temporal Multi-Gate Mixture-of-Experts (TMMOE) model for simultaneously predicting the vehicle trajectory and driving intention. The proposed model consists of three layers: a shared layer, an expert layer, and a fully connected layer. In the model, the shared layer utilizes Temporal Convolutional Networks (TCN) to extract temporal features. Then the expert layer is built to identify different information according to the three tasks. Moreover, the fully connected layer is used to integrate and export prediction results. To achieve better performance, uncertainty algorithm is used to construct the multi-task loss function. Finally, the publicly available CitySim dataset validates the TMMOE model, demonstrating superior performance compared to the LSTM model, achieving the highest classification and regression results. Keywords: Vehicle trajectory prediction, driving intentions Classification, Multi-task
    摘要 准确的车辆轨迹预测是自动驾驶和高级驾驶助系统的关键。车辆轨迹预测包括两个基本任务: longitudinal position 预测和 lateral position 预测。车辆的运动和驾驶意图之间存在显著的相关性。在现有的工作中,这三个任务通常是分别进行的,没有考虑这三个任务之间的关系。在这篇论文中,我们提出了一种新的 Temporal Multi-Gate Mixture-of-Experts(TMMOE)模型,用于同时预测车辆轨迹和驾驶意图。该模型包括三层:共享层、专家层和全连接层。在模型中,共享层使用 Temporal Convolutional Networks(TCN)提取时间特征。然后专家层用于识别不同任务的信息。此外,全连接层用于集成和出口预测结果。为了提高性能,我们使用不确定算法构建多任务损失函数。最后,公共可用的 CitySim 数据集验证了 TMMOE 模型,并证明其与 LSTM 模型相比,达到了最高的分类和回归结果。关键词:车辆轨迹预测、驾驶意图分类、多任务

Variational Label-Correlation Enhancement for Congestion Prediction

  • paper_url: http://arxiv.org/abs/2308.00529
  • repo_url: None
  • paper_authors: Biao Liu, Congyu Qiao, Ning Xu, Xin Geng, Ziran Zhu, Jun Yang
  • for: This paper aims to improve the accuracy of congestion prediction in large-scale integrated circuit (IC) design by leveraging spatial label-correlation between neighboring grids.
  • methods: The proposed approach, called VAriational Label-Correlation Enhancement for Congestion Prediction ({\ours}), uses variational inference techniques to estimate a local label-correlation weight for each grid, which is influenced by the surrounding grids.
  • results: Experimental results on publicly available benchmarks demonstrate the superior effectiveness of {\ours} compared to existing methods.Here’s the simplified Chinese text:
  • for: 这篇论文目的是提高大规模集成电路设计中的压力预测精度,通过利用周围网格的标签相关性。
  • methods: 提议的方法是使用变量推理技术来估算每个网格的本地标签相关性质量,该质量受周围网格的影响。
  • results: 实验结果表明, compared to现有方法,{\ours}在公共可用的 \texttt{ISPD2011} 和 \texttt{DAC2012} benchmark上的效果更好。
    Abstract The physical design process of large-scale designs is a time-consuming task, often requiring hours to days to complete, with routing being the most critical and complex step. As the the complexity of Integrated Circuits (ICs) increases, there is an increased demand for accurate routing quality prediction. Accurate congestion prediction aids in identifying design flaws early on, thereby accelerating circuit design and conserving resources. Despite the advancements in current congestion prediction methodologies, an essential aspect that has been largely overlooked is the spatial label-correlation between different grids in congestion prediction. The spatial label-correlation is a fundamental characteristic of circuit design, where the congestion status of a grid is not isolated but inherently influenced by the conditions of its neighboring grids. In order to fully exploit the inherent spatial label-correlation between neighboring grids, we propose a novel approach, {\ours}, i.e., VAriational Label-Correlation Enhancement for Congestion Prediction, which considers the local label-correlation in the congestion map, associating the estimated congestion value of each grid with a local label-correlation weight influenced by its surrounding grids. {\ours} leverages variational inference techniques to estimate this weight, thereby enhancing the regression model's performance by incorporating spatial dependencies. Experiment results validate the superior effectiveness of {\ours} on the public available \texttt{ISPD2011} and \texttt{DAC2012} benchmarks using the superblue circuit line.
    摘要 大规模设计的物理设计过程是一项时间consuming任务,经常需要几天内完成,routing是最关键和复杂的步骤。随着集成电路(IC)的复杂度的提高,需要更加准确的堵塞质量预测。准确的堵塞预测可以早期发现设计缺陷,从而加速电路设计并保留资源。 DESPITE current congestion prediction methodologies advancements, an essential aspect that has been largely overlooked is the spatial label-correlation between different grids in congestion prediction. To fully exploit the inherent spatial label-correlation between neighboring grids, we propose a novel approach, \ours, i.e., Variational Label-Correlation Enhancement for Congestion Prediction, which considers the local label-correlation in the congestion map, associating the estimated congestion value of each grid with a local label-correlation weight influenced by its surrounding grids. \ours leverages variational inference techniques to estimate this weight, thereby enhancing the regression model's performance by incorporating spatial dependencies. Experiment results validate the superior effectiveness of \ours on the publicly available \texttt{ISPD2011} and \texttt{DAC2012} benchmarks using the superblue circuit line.

Improved Prognostic Prediction of Pancreatic Cancer Using Multi-Phase CT by Integrating Neural Distance and Texture-Aware Transformer

  • paper_url: http://arxiv.org/abs/2308.00507
  • repo_url: None
  • paper_authors: Hexin Dong, Jiawen Yao, Yuxing Tang, Mingze Yuan, Yingda Xia, Jian Zhou, Hong Lu, Jingren Zhou, Bin Dong, Le Lu, Li Zhang, Zaiyi Liu, Yu Shi, Ling Zhang
  • for: 预测PDAC患者的生存率,提高PDAC手术可靠性。
  • methods: 使用学习型神经距离描述CT图像中肿瘤和附近重要血管之间的精确关系,并将其作为诊断预测的主要特征。 besides, 通过将本地和全局特征 fusion使用CNN和变换器模块,提高了在多相冲照CT图像中提取动态肿瘤相关文本特征。
  • results: 在多中心(n=4)数据集中,与现有方法进行了广泛的评估和比较,并在外部测试集(n=3)中进行了统计分析,证明了该方法在临床实际中的有效性。 Developed的风险标记是全体生存率最强的预测因素之一,并且有可能与现有的临床因素相结合,以选择高风险患者,以便为其进行neoadjuvant therapy。
    Abstract Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer in which the tumor-vascular involvement greatly affects the resectability and, thus, overall survival of patients. However, current prognostic prediction methods fail to explicitly and accurately investigate relationships between the tumor and nearby important vessels. This paper proposes a novel learnable neural distance that describes the precise relationship between the tumor and vessels in CT images of different patients, adopting it as a major feature for prognosis prediction. Besides, different from existing models that used CNNs or LSTMs to exploit tumor enhancement patterns on dynamic contrast-enhanced CT imaging, we improved the extraction of dynamic tumor-related texture features in multi-phase contrast-enhanced CT by fusing local and global features using CNN and transformer modules, further enhancing the features extracted across multi-phase CT images. We extensively evaluated and compared the proposed method with existing methods in the multi-center (n=4) dataset with 1,070 patients with PDAC, and statistical analysis confirmed its clinical effectiveness in the external test set consisting of three centers. The developed risk marker was the strongest predictor of overall survival among preoperative factors and it has the potential to be combined with established clinical factors to select patients at higher risk who might benefit from neoadjuvant therapy.
    摘要 胆囊ductal adenocarcinoma (PDAC) 是一种高度致命的癌症,肿瘤与邻近重要血管的交互影响了手术可能性和病人的全局生存率。然而,现有的预后预测方法无法详细和准确地探讨肿瘤和邻近重要血管之间的关系。这篇文章提出了一种新的学习型对应距离,用于描述不同病人的CT图像中肿瘤和血管之间的精确关系。此外,我们将对CT图像进行多相对对应的组合,以提高肿瘤相关的动态细胞特征的抽取。我们广泛评估和比较了提案方法与现有方法,在多中心(n=4)的数据集中进行了1,070名PDAC患者的评估,并统计分析确认了其临床效iveness。我们发展的预后标志是PDAC患者的预后因素中最强的预测因素,它有可能与已知的临床因素相结合,以选择需要neoadjuvant therapy的病人。

Explainable Graph Spectral Clustering of Text Documents

  • paper_url: http://arxiv.org/abs/2308.00504
  • repo_url: None
  • paper_authors: Bartłomiej Starosta, Mieczysław A. Kłopotek, Sławomir T. Wierzchoń
  • for: 本文旨在提出一种方法来解释spectral clustering结果的解释方法,帮助用户更好地理解 clustering 结果。
  • methods: 本文使用了 combinatorial Laplacian based graph spectral clustering 方法,并提出了一种基于 $K $-embedding 的解释方法,通过建立文本内容和 clustering 结果之间的桥梁。
  • results: 本文经过实验研究,发现 $K $-embedding 可以很好地 Approximate Laplacian embedding,并且在某些条件下,这种近似性足够好。
    Abstract Spectral clustering methods are known for their ability to represent clusters of diverse shapes, densities etc. However, results of such algorithms, when applied e.g. to text documents, are hard to explain to the user, especially due to embedding in the spectral space which has no obvious relation to document contents. Therefore there is an urgent need to elaborate methods for explaining the outcome of the clustering. This paper presents a contribution towards this goal. We present a proposal of explanation of results of combinatorial Laplacian based graph spectral clustering. It is based on showing (approximate) equivalence of combinatorial Laplacian embedding, $K$-embedding (proposed in this paper) and term vector space embedding. Hence a bridge is constructed between the textual contents and the clustering results. We provide theoretical background for this approach. We performed experimental study showing that $K$-embedding approximates well Laplacian embedding under favourable block matrix conditions and show that approximation is good enough under other conditions.
    摘要 spectral clustering 方法知名于其能够表示多样化形态、密度等等的群集。然而,当应用于文档时,这些算法的结果难以向用户解释,特别是因为它们在spectral space中嵌入,这个空间与文档内容无法直接关联。因此,有一个急需要开发解释 clustering 结果的方法。本文提出了一种解释 combinatorial Laplacian 基于图 spectral clustering 的方法。这种方法基于显示(approximate)combinatorial Laplacian embedding、K-embedding(本文提出的)和term vector space embedding之间的等价性。因此, constructed 一个桥梁,将文本内容与 clustering 结果相连。我们提供了理论背景,并进行了实验研究,证明 K-embedding 可以在有利的块矩阵条件下高度准确地 aproximate Laplacian embedding。

DINO-CXR: A self supervised method based on vision transformer for chest X-ray classification

  • paper_url: http://arxiv.org/abs/2308.00475
  • repo_url: None
  • paper_authors: Mohammadreza Shakouri, Fatemeh Iranmanesh, Mahdi Eftekhari
    for: 本研究旨在提高骨肉X线成像分类的精度,探讨了一种基于自我指导学习的方法,即DINO-CXR。methods: 本研究使用了一种基于视觉变换器的自我指导学习方法,即DINO,并对其进行了修改,以适应骨肉X线成像分类。results: 对比分析表明,提出的方法在肺炎和COVID-19检测中具有更高的准确率,而且与现有方法相比,需要更少的标注数据来达到相同的准确率和AUC分数。
    Abstract The limited availability of labeled chest X-ray datasets is a significant bottleneck in the development of medical imaging methods. Self-supervised learning (SSL) can mitigate this problem by training models on unlabeled data. Furthermore, self-supervised pretraining has yielded promising results in visual recognition of natural images but has not been given much consideration in medical image analysis. In this work, we propose a self-supervised method, DINO-CXR, which is a novel adaptation of a self-supervised method, DINO, based on a vision transformer for chest X-ray classification. A comparative analysis is performed to show the effectiveness of the proposed method for both pneumonia and COVID-19 detection. Through a quantitative analysis, it is also shown that the proposed method outperforms state-of-the-art methods in terms of accuracy and achieves comparable results in terms of AUC and F-1 score while requiring significantly less labeled data.
    摘要 限量的胸部X射线数据的可用性是医学影像方法的发展中的一个重要瓶颈。自我超vised学习(SSL)可以解决这个问题,通过训练模型在无标签数据上。此外,自我超vised预训练在自然图像的视觉识别中已经产生了有望的结果,但在医学影像分析中尚未得到了充分的关注。本文提出了一种自我超vised方法,称为DINO-CXR,该方法是基于视transformer的胸部X射线分类的一种新的适应。通过对比分析,本文证明了提议的方法在肺炎和COVID-19检测中的效果。通过量化分析,还表明了提议的方法在准确率和AUC和F-1分数方面比现有方法高,而且需要远少于标注数据。

Is Last Layer Re-Training Truly Sufficient for Robustness to Spurious Correlations?

  • paper_url: http://arxiv.org/abs/2308.00473
  • repo_url: None
  • paper_authors: Phuong Quynh Le, Jörg Schlötterer, Christin Seifert
  • for: 提高预测模型在具有相关性的样本集中的准确率
  • methods: 使用深度特征重新权重(DFR)方法,只需重新训练类фика器的最后一层
  • results: 对具有相关性的医疗数据进行实验,发现DFR可以提高预测模型的准确率,但是仍然容易受到偶极性关系的影响
    Abstract Models trained with empirical risk minimization (ERM) are known to learn to rely on spurious features, i.e., their prediction is based on undesired auxiliary features which are strongly correlated with class labels but lack causal reasoning. This behavior particularly degrades accuracy in groups of samples of the correlated class that are missing the spurious feature or samples of the opposite class but with the spurious feature present. The recently proposed Deep Feature Reweighting (DFR) method improves accuracy of these worst groups. Based on the main argument that ERM mods can learn core features sufficiently well, DFR only needs to retrain the last layer of the classification model with a small group-balanced data set. In this work, we examine the applicability of DFR to realistic data in the medical domain. Furthermore, we investigate the reasoning behind the effectiveness of last-layer retraining and show that even though DFR has the potential to improve the accuracy of the worst group, it remains susceptible to spurious correlations.
    摘要 模型通过实际风险最小化(ERM)训练而学习依赖于假冒特征,即其预测基于与类别标签强相关的非正常特征,而不具备 causal 理解。这种行为特别是在类别标签相关的样本群中 missing 假冒特征或 opposing 类别标签并具备假冒特征时,精度会减退。 reciently proposed Deep Feature Reweighting(DFR)方法可以改善这些最差的组的精度。基于主要的 argue that ERM mods can learn core features well enough, DFR only needs to retrain the last layer of the classification model with a small group-balanced dataset. 在这项工作中,我们检查了 DFR 在实际数据上的可行性,并进一步调查了 last-layer 重新训练的效果,并证明了 DFR 具有改善最差组精度的潜在能力,但它仍然受到假冒关系的影响。

Mirror Natural Evolution Strategies

  • paper_url: http://arxiv.org/abs/2308.00469
  • repo_url: None
  • paper_authors: Haishan Ye
  • for: This paper focuses on the theory of zeroth-order optimization, specifically on algorithms that approximate gradient and Hessian information using zeroth-order queries.
  • methods: The proposed algorithm in the paper is called \texttt{MiNES} (mirror descent natural evolution strategy), which is designed to minimize a reparameterized objective function that approximates the original objective function’s gradient and Hessian information.
  • results: The paper shows that the estimated covariance matrix of \texttt{MiNES} converges to the inverse of the Hessian matrix of the objective function with a convergence rate of $\widetilde{\mathcal{O}(1/k)$, where $k$ is the iteration number. Additionally, the paper provides explicit convergence rates for \texttt{MiNES} and explains how the covariance matrix promotes the convergence rate.Here is the information in Simplified Chinese text:
  • for: 本 paper 关注的是零次ORDER优化理论,特别是使用零次ORDER查询来 aproximate 函数值差的算法。
  • methods: 提出的算法是 called \texttt{MiNES} (镜像 DESC natural evolution strategy),用于 minimize 一个参数化的目标函数,该函数 approximates 原始函数的梯度和Hessian信息。
  • results: paper 显示了 \texttt{MiNES} 的 estimated covariance matrix 与原始函数的Hessian矩阵 inverse 的渐近关系,具体来说, convergence rate 是 $\widetilde{\mathcal{O}(1/k)$,其中 $k$ 是迭代次数。
    Abstract The zeroth-order optimization has been widely used in machine learning applications. However, the theoretical study of the zeroth-order optimization focus on the algorithms which approximate (first-order) gradients using (zeroth-order) function value difference at a random direction. The theory of algorithms which approximate the gradient and Hessian information by zeroth-order queries is much less studied. In this paper, we focus on the theory of zeroth-order optimization which utilizes both the first-order and second-order information approximated by the zeroth-order queries. We first propose a novel reparameterized objective function with parameters $(\mu, \Sigma)$. This reparameterized objective function achieves its optimum at the minimizer and the Hessian inverse of the original objective function respectively, but with small perturbations. Accordingly, we propose a new algorithm to minimize our proposed reparameterized objective, which we call \texttt{MiNES} (mirror descent natural evolution strategy). We show that the estimated covariance matrix of \texttt{MiNES} converges to the inverse of Hessian matrix of the objective function with a convergence rate $\widetilde{\mathcal{O}(1/k)$, where $k$ is the iteration number and $\widetilde{\mathcal{O}(\cdot)$ hides the constant and $\log$ terms. We also provide the explicit convergence rate of \texttt{MiNES} and how the covariance matrix promotes the convergence rate.
    摘要 “零次优化已经广泛应用于机器学习领域。然而,关于零次优化的理论研究主要集中在approximate(首次)梯度使用(零次)函数值差异的Random Direction中。在这篇论文中,我们将关注零次优化中使用首次和第二次信息的approximation的理论。我们首先提出一个新的重parameterized objective function with parameters(μ, Σ)。这个重parameterized objective function在原始目标函数的最小值和原始目标函数的Hessian inverse的两个点上均 achieves its optimum,但受到小幅度的干扰。 accordingly, we propose a new algorithm to minimize our proposed reparameterized objective, which we call \texttt{MiNES} (mirror descent natural evolution strategy). we show that the estimated covariance matrix of \texttt{MiNES} converges to the inverse of Hessian matrix of the objective function with a convergence rate $\widetilde{\mathcal{O}(1/k)$, where $k$ is the iteration number and $\widetilde{\mathcal{O}(\cdot)$ hides the constant and $\log$ terms. we also provide the explicit convergence rate of \texttt{MiNES} and how the covariance matrix promotes the convergence rate.”Note: The translation is done using Google Translate and may not be perfect. Please let me know if you need any further assistance.

Divergence of the ADAM algorithm with fixed-stepsize: a (very) simple example

  • paper_url: http://arxiv.org/abs/2308.00720
  • repo_url: None
  • paper_authors: Ph. L. Toint
  • for: 这个论文主要是研究 Adam 算法在无杂变量情况下的性能。
  • methods: 这个论文使用了 Adam 算法,并研究了不同参数的影响。
  • results: 研究发现,无论选择哪些参数,Adam 算法在这个简单的一维函数上都会导致散射。
    Abstract A very simple unidimensional function with Lipschitz continuous gradient is constructed such that the ADAM algorithm with constant stepsize, started from the origin, diverges when applied to minimize this function in the absence of noise on the gradient. Divergence occurs irrespective of the choice of the method parameters.
    摘要 一个非常简单的一维函数,其梯度 lipschitz 连续,被构造了,使得在缺失梯度误差情况下,使用 ADAM 算法的常数步长开始从原点进行拟合,将导致异常。这种异常不管选择算法参数的方式,都会发生。

A Majority Invariant Approach to Patch Robustness Certification for Deep Learning Models

  • paper_url: http://arxiv.org/abs/2308.00452
  • repo_url: https://github.com/kio-cs/majorcert
  • paper_authors: Qilin Zhou, Zhengyuan Wei, Haipeng Wang, W. K. Chan
  • for: 确保深度学习模型不会因为小 patch 的修改而预测错误的标签。
  • methods: 使用 MajorCert 算法,首先找出同一个样本上同一个 patch 区域可以 manipulate 的所有可能的标签集,然后对这些标签集进行元素级枚举,最后检查这些标签集的多数不变性是否保持不变。
  • results: 通过 MajorCert 算法,可以确保样本不会被小 patch 的修改 manipulate 深度学习模型预测错误的标签。
    Abstract Patch robustness certification ensures no patch within a given bound on a sample can manipulate a deep learning model to predict a different label. However, existing techniques cannot certify samples that cannot meet their strict bars at the classifier or patch region levels. This paper proposes MajorCert. MajorCert firstly finds all possible label sets manipulatable by the same patch region on the same sample across the underlying classifiers, then enumerates their combinations element-wise, and finally checks whether the majority invariant of all these combinations is intact to certify samples.
    摘要 patch 强健证明ensure no patch within a given bound on a sample can manipulate a deep learning model to predict a different label. However, existing techniques cannot certify samples that cannot meet their strict bars at the classifier or patch region levels. This paper proposes MajorCert. MajorCert firstly finds all possible label sets manipulatable by the same patch region on the same sample across the underlying classifiers, then enumerates their combinations element-wise, and finally checks whether the majority invariant of all these combinations is intact to certify samples.Here's the word-for-word translation:patch 强健证明ensure no patch within a given bound on a sample can manipulate a deep learning model to predict a different label. However, existing techniques cannot certify samples that cannot meet their strict bars at the classifier or patch region levels. This paper proposes MajorCert. MajorCert firstly finds all possible label sets manipulatable by the same patch region on the same sample across the underlying classifiers, then enumerates their combinations element-wise, and finally checks whether the majority invariant of all these combinations is intact to certify samples.Note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you need Traditional Chinese, please let me know.

MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers

  • paper_url: http://arxiv.org/abs/2308.03741
  • repo_url: None
  • paper_authors: Muhammad Bilal Shaikh, Douglas Chai, Syed Mohammed Shamsul Islam, Naveed Akhtar
  • for: 本研究旨在提高多Modal human action recognition(MHAR)的效果,通过将音频和视频模态结合在一起。
  • methods: 本模型使用了一种直观的方法,将音频模态中的重要表示转化到图像频谱中,然后与视频模态进行混合。
  • results: 对于一个action recognition benchmark dataset,模型表现出色,证明了将音频和视频 modalities结合在一起可以提高action recognition的效果。
    Abstract In line with the human capacity to perceive the world by simultaneously processing and integrating high-dimensional inputs from multiple modalities like vision and audio, we propose a novel model, MAiVAR-T (Multimodal Audio-Image to Video Action Recognition Transformer). This model employs an intuitive approach for the combination of audio-image and video modalities, with a primary aim to escalate the effectiveness of multimodal human action recognition (MHAR). At the core of MAiVAR-T lies the significance of distilling substantial representations from the audio modality and transmuting these into the image domain. Subsequently, this audio-image depiction is fused with the video modality to formulate a unified representation. This concerted approach strives to exploit the contextual richness inherent in both audio and video modalities, thereby promoting action recognition. In contrast to existing state-of-the-art strategies that focus solely on audio or video modalities, MAiVAR-T demonstrates superior performance. Our extensive empirical evaluations conducted on a benchmark action recognition dataset corroborate the model's remarkable performance. This underscores the potential enhancements derived from integrating audio and video modalities for action recognition purposes.
    摘要 按照人类对世界的同时处理和 интеGRATE多模态输入的能力,我们提议一种新的模型,MAiVAR-T(多modal Audio-Image to Video Action Recognition Transformer)。这个模型使用直观的方法将音频-图像和视频模态结合,以提高多modal human action recognition(MHAR)的效果。MAiVAR-T的核心思想是将音频模态中的重要表示转化到图像领域,然后将这些图像表示与视频模态进行融合。这种结合方法可以利用音频和视频模态中的上下文丰富性,从而提高动作识别。与现有的状态 искус法策略相比,MAiVAR-T表现出色。我们对一个标准动作识别数据集进行了广泛的实验,结果证明MAiVAR-T的出色表现。这不仅证明了将音频和视频模态结合的潜在提升,还证明了MAiVAR-T的优秀性。

Fair Models in Credit: Intersectional Discrimination and the Amplification of Inequity

  • paper_url: http://arxiv.org/abs/2308.02680
  • repo_url: None
  • paper_authors: Savina Kim, Stefan Lessmann, Galina Andreeva, Michael Rovatsos
  • For: The paper aims to examine intersectional horizontal inequities in credit access and demonstrate how pluralistic realities and intersectional identities can shape patterns of credit allocation when using automated decision-making systems.* Methods: The authors use data from the Spanish microfinance market to analyze credit allocation patterns and demonstrate the impact of algorithmic bias on vulnerable groups. They utilize the intersectionality paradigm to examine how multiple and intersecting social categories can interact to produce inequities in credit access.* Results: The study finds that while fairness may exist superficially, unfairness can exacerbate at lower levels given combinatorial effects. The authors demonstrate that sensitive attributes such as single parent status and number of children can result in imbalanced harm. They discuss the implications of these findings for the financial services industry.Here are the three points in Simplified Chinese text:* For: 这个论文的目的是检查并评估微贷市场中的多元性水平不平等,以及自动决策系统如何影响弱势群体的借款访问。* Methods: 作者们使用西班牙微贷市场的数据来分析借款分配的 patterns,并通过 интерсе克циональ paradigm来检查不同社会类别之间的交叠作用,以产生借款访问不平等。* Results: 研究发现,尽管 superfic 的公平可能存在,但是在更深层次上,不公平可能会加剧,即 comb 的效果。作者们发现,敏感属性如单身状态和子女数量可能会导致不平等的害。他们讨论这些发现对金融服务业的影响。
    Abstract The increasing usage of new data sources and machine learning (ML) technology in credit modeling raises concerns with regards to potentially unfair decision-making that rely on protected characteristics (e.g., race, sex, age) or other socio-economic and demographic data. The authors demonstrate the impact of such algorithmic bias in the microfinance context. Difficulties in assessing credit are disproportionately experienced among vulnerable groups, however, very little is known about inequities in credit allocation between groups defined, not only by single, but by multiple and intersecting social categories. Drawing from the intersectionality paradigm, the study examines intersectional horizontal inequities in credit access by gender, age, marital status, single parent status and number of children. This paper utilizes data from the Spanish microfinance market as its context to demonstrate how pluralistic realities and intersectional identities can shape patterns of credit allocation when using automated decision-making systems. With ML technology being oblivious to societal good or bad, we find that a more thorough examination of intersectionality can enhance the algorithmic fairness lens to more authentically empower action for equitable outcomes and present a fairer path forward. We demonstrate that while on a high-level, fairness may exist superficially, unfairness can exacerbate at lower levels given combinatorial effects; in other words, the core fairness problem may be more complicated than current literature demonstrates. We find that in addition to legally protected characteristics, sensitive attributes such as single parent status and number of children can result in imbalanced harm. We discuss the implications of these findings for the financial services industry.
    摘要 随着新数据源和机器学习技术在借款评估中的使用,有关可能存在不公正决策的问题减受到关注,这些决策可能基于保护特征(如种族、性别、年龄)或其他社会经济和民生数据。作者们在微贷上下文中展示了算法偏见的影响。借款评估困难更加折射在投降群体中,但现实上几乎没有关于借款分配不公的研究。本研究基于交叉性理论,研究借款访问不公平的现象,包括按照年龄、性别、婚姻状况、单身状况和子女数来分类。这篇论文使用西班牙微贷市场为背景,通过自动决策系统来探讨多元社会和交叉性识别如何影响借款分配的 patrón。由于机器学习技术对社会好坏无所谓,我们发现通过交叉性来检查算法公平性可以更加 authentically 使ower 动作,以实现更公平的结果。我们发现,虽然在高层次上,公平可能存在披露,但是在 combinatorial effects 下,不公正可能加剧,即核心公平问题可能更复杂。我们发现,除了法律保护的特征之外,敏感特征如单身状况和子女数也可能导致不均衡的危害。我们讨论了这些发现对金融服务业的影响。

SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning

  • paper_url: http://arxiv.org/abs/2308.00436
  • repo_url: https://github.com/ningmiao/selfcheck
  • paper_authors: Ning Miao, Yee Whye Teh, Tom Rainforth
  • for: 本研究旨在探讨 LLMs 是否可以自动认知自己的错误,不需要外部资源。
  • methods: 我们提出了一种零shot验证方法来识别错误,并用其进行权重投票来提高问答表现。
  • results: 我们在三个数学数据集上进行测试,发现该方法可以成功识别错误,并在最终预测性能中提高表现。
    Abstract The recent progress in large language models (LLMs), especially the invention of chain-of-thoughts (CoT) prompting, makes it possible to solve reasoning problems. However, even the strongest LLMs are still struggling with more complicated problems that require non-linear thinking and multi-step reasoning. In this work, we explore whether LLMs have the ability to recognize their own errors, without resorting to external resources. In particular, we investigate whether they can be used to identify individual errors within a step-by-step reasoning. To this end, we propose a zero-shot verification scheme to recognize such errors. We then use this verification scheme to improve question-answering performance, by using it to perform weighted voting on different generated answers. We test the method on three math datasets-GSM8K, MathQA, and MATH-and find that it successfully recognizes errors and, in turn, increases final predictive performance.
    摘要 最近很多大语言模型(LLMs)的进步,特别是创造思维(CoT)的提示,使得解释问题变得可能。然而,即使最强的LLMs仍然在更复杂的问题上遇到困难,需要非线性思维和多步逻辑。在这种情况下,我们询问LLMs是否有能力自动发现自己的错误,而不需要外部资源。特别是,我们研究LLMs是否可以用于识别单个步骤逻辑中的错误。为此,我们提出了零shot验证方案,用于识别such errors。然后,我们使用这种验证方案来提高问答表现,通过对不同生成的答案进行权重投票。我们在三个数学 dataset(GSM8K、MathQA和MATH)上测试了这种方法,并发现它成功地识别了错误,并在最终预测性能中提高了表现。

qgym: A Gym for Training and Benchmarking RL-Based Quantum Compilation

  • paper_url: http://arxiv.org/abs/2308.02536
  • repo_url: https://github.com/qutech-delft/qgym
  • paper_authors: Stan van der Linde, Willem de Kok, Tariq Bontekoe, Sebastian Feld
  • for: 这篇论文是为了提高现代量子计算机的硬件限制下的量子电路编译过程的优化。
  • methods: 这篇论文使用了人工智能技术,具体来说是强化学习(RL),来优化量子电路编译过程。
  • results: 这篇论文提出了一个名为qgym的软件框架,该框架可以在高度可定制的环境中训练和测试RL算法和代理。
    Abstract Compiling a quantum circuit for specific quantum hardware is a challenging task. Moreover, current quantum computers have severe hardware limitations. To make the most use of the limited resources, the compilation process should be optimized. To improve currents methods, Reinforcement Learning (RL), a technique in which an agent interacts with an environment to learn complex policies to attain a specific goal, can be used. In this work, we present qgym, a software framework derived from the OpenAI gym, together with environments that are specifically tailored towards quantum compilation. The goal of qgym is to connect the research fields of Artificial Intelligence (AI) with quantum compilation by abstracting parts of the process that are irrelevant to either domain. It can be used to train and benchmark RL agents and algorithms in highly customizable environments.
    摘要 compile quantum circuit specific quantum hardware 是一个复杂的任务。此外,当前的量子计算机有严重的硬件限制。为了最大化有限资源,编译过程应该进行优化。为了改进当前方法,我们可以使用Reinforcement Learning(RL),这是一种 Agent 与环境互动以学习复杂的策略,以达到特定目标。在这项工作中,我们提出了qgym,一个基于 OpenAI gym 的软件框架,同时与量子编译环境相关。qgym 的目标是将人工智能(AI)与量子编译领域连接起来,抽象不相关的部分,以便在高度可定制的环境中训练和测试 RL 代理和算法。

Learning to Generate Training Datasets for Robust Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2308.02535
  • repo_url: None
  • paper_authors: Marwane Hariat, Olivier Laurent, Rémi Kazmierczak, Shihao Zhang, Andrei Bursuc, Angela Yao, Gianni Franchi
  • for: 提高semantic segmentation技术的Robustness,尤其是在安全关键应用中。
  • methods: 利用label-to-image生成器和图像-to-label分割模型的共同效应,设计并训练一种robust conditional生成随机网络来生成真实和可能的异常或异常图像。
  • results: 对提案的生成模型进行了深入的研究,评估下游分割网络的性能和Robustness,并证明了该方法可以在真实世界的干扰和数据分布变化中显著提高semantic segmentation技术的Robustness。
    Abstract Semantic segmentation techniques have shown significant progress in recent years, but their robustness to real-world perturbations and data samples not seen during training remains a challenge, particularly in safety-critical applications. In this paper, we propose a novel approach to improve the robustness of semantic segmentation techniques by leveraging the synergy between label-to-image generators and image-to-label segmentation models. Specifically, we design and train Robusta, a novel robust conditional generative adversarial network to generate realistic and plausible perturbed or outlier images that can be used to train reliable segmentation models. We conduct in-depth studies of the proposed generative model, assess the performance and robustness of the downstream segmentation network, and demonstrate that our approach can significantly enhance the robustness of semantic segmentation techniques in the face of real-world perturbations, distribution shifts, and out-of-distribution samples. Our results suggest that this approach could be valuable in safety-critical applications, where the reliability of semantic segmentation techniques is of utmost importance and comes with a limited computational budget in inference. We will release our code shortly.
    摘要 Semantic segmentation技术在最近几年内已经取得了显著的进步,但它们对实际世界中的干扰和训练数据集之外的数据样本仍然是一个挑战,特别是在安全关键应用中。在这篇论文中,我们提出了一种新的方法,以利用标签到图像生成器和图像到标签分割模型之间的共同作用,以提高 semantic segmentation 技术的可靠性。我们设计并训练了 Robusta,一种新的可靠 conditional generative adversarial network,以生成真实和可能的干扰或异常图像,用于训练可靠的分割模型。我们进行了深入的研究,评估了下游分割网络的性能和可靠性,并证明了我们的方法可以在实际世界中增强 semantic segmentation 技术的Robustness,包括分布转换、干扰和异常样本。我们的结果表明,这种方法在安全关键应用中可以提供有价值的技术,其可靠性和计算成本在推断中具有限制。我们即将发布我们的代码。

BiERL: A Meta Evolutionary Reinforcement Learning Framework via Bilevel Optimization

  • paper_url: http://arxiv.org/abs/2308.01207
  • repo_url: https://github.com/chriswang98sz/bierl
  • paper_authors: Junyi Wang, Yuanyang Zhu, Zhi Wang, Yan Zheng, Jianye Hao, Chunlin Chen
  • for: 提高复杂RL问题的解决能力,使用Evolutionary Reinforcement Learning(ERL)算法。
  • methods: 提出了一个通用的Meta-ERL框架,通过矩阵优化来同时更新hyperparameter和ERL模型。
  • results: 在MuJoCo和Box2D任务中,BiERL frameworks比基eline和其他基eline都提高了学习性能,并且可以适应多种ERL算法。
    Abstract Evolutionary reinforcement learning (ERL) algorithms recently raise attention in tackling complex reinforcement learning (RL) problems due to high parallelism, while they are prone to insufficient exploration or model collapse without carefully tuning hyperparameters (aka meta-parameters). In the paper, we propose a general meta ERL framework via bilevel optimization (BiERL) to jointly update hyperparameters in parallel to training the ERL model within a single agent, which relieves the need for prior domain knowledge or costly optimization procedure before model deployment. We design an elegant meta-level architecture that embeds the inner-level's evolving experience into an informative population representation and introduce a simple and feasible evaluation of the meta-level fitness function to facilitate learning efficiency. We perform extensive experiments in MuJoCo and Box2D tasks to verify that as a general framework, BiERL outperforms various baselines and consistently improves the learning performance for a diversity of ERL algorithms.
    摘要

Tackling Hallucinations in Neural Chart Summarization

  • paper_url: http://arxiv.org/abs/2308.00399
  • repo_url: https://github.com/worldhellow/hallucinations-c2t
  • paper_authors: Saad Obaid ul Islam, Iza Škrjanec, Ondřej Dušek, Vera Demberg
  • for: 解决chart summarization中的幻觉问题
  • methods: 使用自然语言判断(NLI)方法预处理训练数据,以减少幻觉现象。同时,缩短输入序列中的长距离依赖关系和添加图表标题和标签,以改善总体性能。
  • results: 通过人工评估,我们的方法显著减少了幻觉现象,并且改善了总体性能。
    Abstract Hallucinations in text generation occur when the system produces text that is not grounded in the input. In this work, we tackle the problem of hallucinations in neural chart summarization. Our analysis shows that the target side of chart summarization training datasets often contains additional information, leading to hallucinations. We propose a natural language inference (NLI) based method to preprocess the training data and show through human evaluation that our method significantly reduces hallucinations. We also found that shortening long-distance dependencies in the input sequence and adding chart-related information like title and legends improves the overall performance.
    摘要 文本生成中的幻觉现象发生在系统生成的文本与输入没有匹配。在这项工作中,我们解决了chart摘要 neural网络中的幻觉问题。我们的分析显示目标一侧chart摘要训练数据经常包含额外信息,导致幻觉。我们提议使用自然语言推理(NLI)基本方法来处理训练数据,并通过人工评估显示我们的方法可以显著减少幻觉。此外,我们发现短缩长距离相互关系在输入序列中和添加图表标题和图例可以提高总性表现。

A Survey of Time Series Anomaly Detection Methods in the AIOps Domain

  • paper_url: http://arxiv.org/abs/2308.00393
  • repo_url: None
  • paper_authors: Zhenyu Zhong, Qiliang Fan, Jiacheng Zhang, Minghua Ma, Shenglin Zhang, Yongqian Sun, Qingwei Lin, Yuzhi Zhang, Dan Pei
  • for: 本文旨在探讨人工智能为运维工作(AIOps)中的时间序列异常检测,以及未来的实际应用和下一代时间序列异常检测技术的发展趋势。
  • methods: 本文综述了一些常见的时间序列异常检测方法,包括统计方法、机器学习方法和深度学习方法等,并评估了它们在不同的应用场景中的表现。
  • results: 本文通过对一些实际应用场景的分析和评估,总结了时间序列异常检测的挑战和机遇,并提出了未来研究的方向和策略。
    Abstract Internet-based services have seen remarkable success, generating vast amounts of monitored key performance indicators (KPIs) as univariate or multivariate time series. Monitoring and analyzing these time series are crucial for researchers, service operators, and on-call engineers to detect outliers or anomalies indicating service failures or significant events. Numerous advanced anomaly detection methods have emerged to address availability and performance issues. This review offers a comprehensive overview of time series anomaly detection in Artificial Intelligence for IT operations (AIOps), which uses AI capabilities to automate and optimize operational workflows. Additionally, it explores future directions for real-world and next-generation time-series anomaly detection based on recent advancements.
    摘要 互联网基于服务的时间序列状态监控和分析已经取得了杰出的成功,生成了大量监控关键性表现指标 (KPI) 的时间序列。这些时间序列的监控和分析是研究人员、服务运营商和升级工程师需要侦测异常或异常状态的关键工具。多种高级异常探测方法已经出现,以解决可用性和性能问题。本评论提供了AI操作 (AIOps) 中时间序列异常探测的全面概述,并探讨未来领域和下一代时间序列异常探测的未来发展。

Counterfactual Graph Transformer for Traffic Flow Prediction

  • paper_url: http://arxiv.org/abs/2308.00391
  • repo_url: None
  • paper_authors: Ying Yang, Kai Du, Xingyuan Dai, Jianwu Fang
  • for: traffic flow prediction (TFP) in Intelligent Transportation System (ITS)
  • methods: graph-based models with multiple attention mechanisms, perturbation mask generator for counterfactual explanations
  • results: improved and interpretable traffic flow prediction, reliable explanations on three real-world public datasets
    Abstract Traffic flow prediction (TFP) is a fundamental problem of the Intelligent Transportation System (ITS), as it models the latent spatial-temporal dependency of traffic flow for potential congestion prediction. Recent graph-based models with multiple kinds of attention mechanisms have achieved promising performance. However, existing methods for traffic flow prediction tend to inherit the bias pattern from the dataset and lack interpretability. To this end, we propose a Counterfactual Graph Transformer (CGT) model with an instance-level explainer (e.g., finding the important subgraphs) specifically designed for TFP. We design a perturbation mask generator over input sensor features at the time dimension and the graph structure on the graph transformer module to obtain spatial and temporal counterfactual explanations. By searching the optimal perturbation masks on the input data feature and graph structures, we can obtain the concise and dominant data or graph edge links for the subsequent TFP task. After re-training the utilized graph transformer model after counterfactual perturbation, we can obtain improved and interpretable traffic flow prediction. Extensive results on three real-world public datasets show that CGT can produce reliable explanations and is promising for traffic flow prediction.
    摘要 做为智能交通系统(ITS)的基础问题,流量流动预测(TFP)模型了路面交通流量的隐藏空间时间相依性,以预测潜在堵塞。现有的图形基本模型和多种注意力机制已经获得了显著的性能。然而,现有的交通流量预测方法通常会从数据集 inherit 偏见模式和缺乏解释性。为了解决这个问题,我们提出了Counterfactual Graph Transformer(CGT)模型,并特别设计了实例阶层解释器(例如,发现重要的子图)。我们在图形变换模组中实现了对输入感应器特征和图形结构的损害莳制生成器,以获得空间和时间的对应替代解释。通过寻找最佳的损害莳制,我们可以获得整体和主要的数据或图形边线,并将其用于后续的TFP任务。经过重新训练utilized graph transformer模型,我们可以获得改进和可解释的交通流量预测。实验结果显示,CGT可以生成可靠的解释和是TFP预测的有前途。

Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning

  • paper_url: http://arxiv.org/abs/2308.02533
  • repo_url: https://github.com/microsoft/robustlearn
  • paper_authors: Kaijie Zhu, Jindong Wang, Xixu Hu, Xing Xie, Ge Yang
  • for: 这个论文的目的是提高模型的普适性和对抗性,而不是减少对抗性的代价。
  • methods: 这个论文提出了一种新的方法called Robustness Critical Fine-Tuning (RiFT),它利用模型的稳定性 redundant capacity来提高模型的普适性和对抗性。
  • results: 实验结果表明, RiFT 可以在 ResNet18、ResNet34 和 WideResNet34-10 模型上提高普适性和对抗性,同时保持或者甚至提高对抗性。
    Abstract Deep neural networks are susceptible to adversarial examples, posing a significant security risk in critical applications. Adversarial Training (AT) is a well-established technique to enhance adversarial robustness, but it often comes at the cost of decreased generalization ability. This paper proposes Robustness Critical Fine-Tuning (RiFT), a novel approach to enhance generalization without compromising adversarial robustness. The core idea of RiFT is to exploit the redundant capacity for robustness by fine-tuning the adversarially trained model on its non-robust-critical module. To do so, we introduce module robust criticality (MRC), a measure that evaluates the significance of a given module to model robustness under worst-case weight perturbations. Using this measure, we identify the module with the lowest MRC value as the non-robust-critical module and fine-tune its weights to obtain fine-tuned weights. Subsequently, we linearly interpolate between the adversarially trained weights and fine-tuned weights to derive the optimal fine-tuned model weights. We demonstrate the efficacy of RiFT on ResNet18, ResNet34, and WideResNet34-10 models trained on CIFAR10, CIFAR100, and Tiny-ImageNet datasets. Our experiments show that \method can significantly improve both generalization and out-of-distribution robustness by around 1.5% while maintaining or even slightly enhancing adversarial robustness. Code is available at https://github.com/microsoft/robustlearn.
    摘要 深度神经网络容易受到攻击性例子的威胁,这对重要应用场景来说是一个 significiant 的安全隐患。对此,抗击例(AT)是一种广泛使用的技术,可以提高攻击性 robustness,但它经常会导致模型的泛化能力减退。本文提出了一种新的方法,即 Robustness Critical Fine-Tuning(RiFT),可以增强泛化而不需要牺牲攻击性 robustness。RiFT 的核心思想是利用模型对 robustness 的剩余容量,通过精细调整对攻击性训练的模型中的非 robust-critical 模块的参数来增强泛化。为此,我们引入模块抗性评估(MRC),它评估模型对 worst-case веса变化的抗性。我们使用 MRC measure identificificate 模型中最低 MRC 值的模块为非 robust-critical 模块,然后精细调整其参数以获得精细调整后的参数。最后,我们将 adversarially trained weights 和精细调整后的 weights 线性 interpolate 以 derive 最佳 fine-tuned 模型 weights。我们在 ResNet18、ResNet34 和 WideResNet34-10 模型上进行 CIFAR10、CIFAR100 和 Tiny-ImageNet 数据集的实验,结果显示,\method 可以在泛化和离域 robustness 方面提高约 1.5%,同时保持或甚至提高攻击性 robustness。代码可以在 https://github.com/microsoft/robustlearn 上找到。

Shape Completion with Prediction of Uncertain Regions

  • paper_url: http://arxiv.org/abs/2308.00377
  • repo_url: https://github.com/dlr-rm/shape-completion
  • paper_authors: Matthias Humt, Dominik Winkelbauer, Ulrich Hillenbrand
  • for: Shape completion for robotic manipulation, specifically predicting the complete geometry of an object from a partial observation, and providing an indication of severe geometric uncertainty in extended regions.
  • methods: Two novel methods for predicting uncertain regions, one through postprocessing occupancy scores and the other through direct prediction of an uncertainty indicator, as well as two known approaches to probabilistic shape completion.
  • results: Both novel methods outperform the two baselines in shape completion and uncertain region prediction, and avoiding the predicted uncertain regions increases the quality of grasps for all tested methods.Here’s the text in Simplified Chinese:
  • for: shape completion for robotic manipulation, 特别是从受限的观察中预测物体的完整几何结构。
  • methods: 提出了两种新的方法来预测不确定区域,一种是通过处理启用分布的方法,另一种是直接预测不确定指标。
  • results: 这两种新方法在形成完整的几何结构和不确定区域预测方面都高于两个基eline方法,并且避免预测的不确定区域可以提高所有测试方法中的抓取质量。
    Abstract Shape completion, i.e., predicting the complete geometry of an object from a partial observation, is highly relevant for several downstream tasks, most notably robotic manipulation. When basing planning or prediction of real grasps on object shape reconstruction, an indication of severe geometric uncertainty is indispensable. In particular, there can be an irreducible uncertainty in extended regions about the presence of entire object parts when given ambiguous object views. To treat this important case, we propose two novel methods for predicting such uncertain regions as straightforward extensions of any method for predicting local spatial occupancy, one through postprocessing occupancy scores, the other through direct prediction of an uncertainty indicator. We compare these methods together with two known approaches to probabilistic shape completion. Moreover, we generate a dataset, derived from ShapeNet, of realistically rendered depth images of object views with ground-truth annotations for the uncertain regions. We train on this dataset and test each method in shape completion and prediction of uncertain regions for known and novel object instances and on synthetic and real data. While direct uncertainty prediction is by far the most accurate in the segmentation of uncertain regions, both novel methods outperform the two baselines in shape completion and uncertain region prediction, and avoiding the predicted uncertain regions increases the quality of grasps for all tested methods. Web: https://github.com/DLR-RM/shape-completion
    摘要 shape completion,即从partial observation中预测完整的物体形态,对机器人操作具有重要意义。当基于物体形态重建计划或预测实际抓取时,对物体形态的不确定性是不可或缺的。特别是在给出杂乱的物体视图时,可能存在扩展区域中对物体部分的存在是不确定的。为解决这个重要的问题,我们提出了两种新的方法,一种是通过处理占用度分布来预测不确定区域,另一种是直接预测不确定指标。我们将这些方法与两种已知方法进行比较,并在shape completion和不确定区域预测中进行测试。结果显示,直接预测不确定区域是最准确的 segmentation 方法,而我们的两种新方法在shape completion和不确定区域预测中均超过了两个基eline方法,并且避免预测的不确定区域可以提高所有测试方法的抓取质量。详细信息可以查看我们的github仓库:https://github.com/DLR-RM/shape-completion。

MRQ:Support Multiple Quantization Schemes through Model Re-Quantization

  • paper_url: http://arxiv.org/abs/2308.01867
  • repo_url: None
  • paper_authors: Manasa Manohara, Sankalp Dayal, Tariq Afzal, Rahul Bakshi, Kahkuen Fu
  • for: 这篇论文是针对对于边缘设备(例如:NPU、TPU、DPU)的深度学习模型部署所难以实现的复杂模型quantization和转换问题提出了解决方案。
  • methods: 这篇论文提出了一种新的模型quantization方法,即MRQ(模型重新quantization),可以将现有的量化模型迅速地转换为不同的量化需求(例如:对称 -> 非对称、非二进制数值 -> 二进制数值)。这种重新量化方法比从头开始量化更加简单,因为它可以避免高昂的重新训练成本,并且可以同时支持多种量化方案。
  • results: 这篇论文的结果显示,可以使用新开发的重新量化算法(包括权重调整和几何误差折衣)将MobileNetV2 QAT模型([7])转换为不同的量化方案(例如:对称和对称+二进制数值),单位损失不大于0.64单位。此外,这些量化模型已经成功地在NNA上部署在Echo Show设备上。
    Abstract Despite the proliferation of diverse hardware accelerators (e.g., NPU, TPU, DPU), deploying deep learning models on edge devices with fixed-point hardware is still challenging due to complex model quantization and conversion. Existing model quantization frameworks like Tensorflow QAT [1], TFLite PTQ [2], and Qualcomm AIMET [3] supports only a limited set of quantization schemes (e.g., only asymmetric per-tensor quantization in TF1.x QAT [4]). Accordingly, deep learning models cannot be easily quantized for diverse fixed-point hardwares, mainly due to slightly different quantization requirements. In this paper, we envision a new type of model quantization approach called MRQ (model re-quantization), which takes existing quantized models and quickly transforms the models to meet different quantization requirements (e.g., asymmetric -> symmetric, non-power-of-2 scale -> power-of-2 scale). Re-quantization is much simpler than quantizing from scratch because it avoids costly re-training and provides support for multiple quantization schemes simultaneously. To minimize re-quantization error, we developed a new set of re-quantization algorithms including weight correction and rounding error folding. We have demonstrated that MobileNetV2 QAT model [7] can be quickly re-quantized into two different quantization schemes (i.e., symmetric and symmetric+power-of-2 scale) with less than 0.64 units of accuracy loss. We believe our work is the first to leverage this concept of re-quantization for model quantization and models obtained from the re-quantization process have been successfully deployed on NNA in the Echo Show devices.
    摘要 尽管现有多种各种硬件加速器(如NPU、TPU、DPU),但将深度学习模型部署到边缘设备上的固定点硬件仍然是一项复杂的挑战,主要是因为复杂的模型减quantization和转换。现有的模型减quantization框架,如TensorFlow QAT [1]、TFLite PTQ [2]和Qualcomm AIMET [3],只支持一定的减quantization方案(如TF1.x QAT [4]中的只有非对称每个tensor减quantization)。因此,深度学习模型难以被容易减quantization,主要是因为不同的固定点硬件有些微的不同减quantization需求。在这篇论文中,我们提出了一种新的模型减quantization方法,称为MRQ(模型重新减quantization)。它可以将现有的减quantized模型快速地转换成符合不同减quantization需求的模型。重新减quantization比从头开始减quantization更加简单,因为它可以避免费时的重新训练,并且可以同时支持多种减quantization方案。为了减少重新减quantization的误差,我们开发了一组新的重新减quantization算法,包括权重修正和圆拟误差折衔。我们已经证明了,通过MRQ方法,可以快速地将MobileNetV2 QAT模型 [7] 转换成两种不同的减quantization方案(即Symmetric和Symmetric+power-of-2 scale),减少误差丢失不到0.64个单位。我们认为,我们的工作是首次利用这种重新减quantization概念来实现模型减quantization,并且已经成功部署了模型在NNA上的Echo Show设备。

Learning Green’s Function Efficiently Using Low-Rank Approximations

  • paper_url: http://arxiv.org/abs/2308.00350
  • repo_url: https://github.com/kishanwn/decgreennet
  • paper_authors: Kishan Wimalawarne, Taiji Suzuki, Sophie Langer
  • for: 用深度学习模型解决不同类型的 partial differential equations
  • methods: 使用低级别分解学习绿函数,实现 removing redundant computations by separate learning with domain data for evaluation and Monte-Carlo samples for integral approximation
  • results: 提高计算时间,与 PINNs 和 MOD-Net 的准确率相似
    Abstract Learning the Green's function using deep learning models enables to solve different classes of partial differential equations. A practical limitation of using deep learning for the Green's function is the repeated computationally expensive Monte-Carlo integral approximations. We propose to learn the Green's function by low-rank decomposition, which results in a novel architecture to remove redundant computations by separate learning with domain data for evaluation and Monte-Carlo samples for integral approximation. Using experiments we show that the proposed method improves computational time compared to MOD-Net while achieving comparable accuracy compared to both PINNs and MOD-Net.
    摘要 使用深度学习模型学习格林函数,可以解决不同类型的部分 diferencial equation。 however,使用深度学习 для格林函数有一个实际的限制,即多次计算成本高的Monte-Carlo integral approximation。我们提议通过低级别 decompositions 学习格林函数,这会导致一种新的架构,可以通过分离学习领域数据和Monte-Carlo 样本来移除重复的计算。我们通过实验表明,我们的方法可以比MOD-Net快速计算,并且与PINNs和MOD-Net的准确率相当。

Dynamic ensemble selection based on Deep Neural Network Uncertainty Estimation for Adversarial Robustness

  • paper_url: http://arxiv.org/abs/2308.00346
  • repo_url: None
  • paper_authors: Ruoxi Qin, Linyuan Wang, Xuehui Du, Xingyuan Chen, Bin Yan
  • for: 提高模型对白盒攻击的防御性能和稳定性。
  • methods: 动态 ensemble 选择技术,通过 Dirichlet 分布和多模型 parameter 空间多样性约束来提高模型的不确定性识别和鲁棒性。
  • results: 比对前一些动态方法和静态 adversarial 训练模型,提出的方法可以实现显著的鲁棒性提高而不损失精度。
    Abstract The deep neural network has attained significant efficiency in image recognition. However, it has vulnerable recognition robustness under extensive data uncertainty in practical applications. The uncertainty is attributed to the inevitable ambient noise and, more importantly, the possible adversarial attack. Dynamic methods can effectively improve the defense initiative in the arms race of attack and defense of adversarial examples. Different from the previous dynamic method depend on input or decision, this work explore the dynamic attributes in model level through dynamic ensemble selection technology to further protect the model from white-box attacks and improve the robustness. Specifically, in training phase the Dirichlet distribution is apply as prior of sub-models' predictive distribution, and the diversity constraint in parameter space is introduced under the lightweight sub-models to construct alternative ensembel model spaces. In test phase, the certain sub-models are dynamically selected based on their rank of uncertainty value for the final prediction to ensure the majority accurate principle in ensemble robustness and accuracy. Compared with the previous dynamic method and staic adversarial traning model, the presented approach can achieve significant robustness results without damaging accuracy by combining dynamics and diversity property.
    摘要 深度神经网络在图像识别中已经实现了显著的效率,但它在实际应用中面临着广泛的数据不确定性和可能的敌意攻击的问题。这些不确定性的来源包括不可避免的环境噪声以及可能的敌意攻击。动态方法可以有效地提高模型的防御力,在攻击和防御敌意例子的武器库中进行了反应。与之前的动态方法不同,本工作通过在模型层次上进行动态ensemble选择技术来进一步保护模型免受白盒攻击,提高了模型的Robustness。在训练阶段,我们使用Dirichlet分布作为子模型预测分布的PRIOR,并在轻量级子模型之间引入多样性约束,以构建多个备用模型空间。在测试阶段,根据它们的uncertainty值排名,选择一些特定的子模型来进行最终预测,以确保多数准确原理在ensemble robustness和精度之间达到平衡。与之前的动态方法和静态敌意训练模型相比,提出的方法可以无需损害精度而实现显著的Robustness。

Monitoring Algorithmic Fairness under Partial Observations

  • paper_url: http://arxiv.org/abs/2308.00341
  • repo_url: None
  • paper_authors: Thomas A. Henzinger, Konstantin Kueffner, Kaushik Mallik
  • for: 这篇论文目的是监控机器学习系统中的公平性,以确保其在做出决策时保持公平和不偏袋。
  • methods: 这篇论文使用了runtime verification技术来监控机器学习系统的公平性,并且可以在部署系统时进行监控。这些监控技术可以处理不完全可观察的系统状态,并且可以监控多种公平性责任。
  • results: 这篇论文的实验结果显示,使用这些监控技术可以实现实时监控机器学习系统的公平性,并且可以在不同的实际应用中进行适当的调整。
    Abstract As AI and machine-learned software are used increasingly for making decisions that affect humans, it is imperative that they remain fair and unbiased in their decisions. To complement design-time bias mitigation measures, runtime verification techniques have been introduced recently to monitor the algorithmic fairness of deployed systems. Previous monitoring techniques assume full observability of the states of the (unknown) monitored system. Moreover, they can monitor only fairness properties that are specified as arithmetic expressions over the probabilities of different events. In this work, we extend fairness monitoring to systems modeled as partially observed Markov chains (POMC), and to specifications containing arithmetic expressions over the expected values of numerical functions on event sequences. The only assumptions we make are that the underlying POMC is aperiodic and starts in the stationary distribution, with a bound on its mixing time being known. These assumptions enable us to estimate a given property for the entire distribution of possible executions of the monitored POMC, by observing only a single execution. Our monitors observe a long run of the system and, after each new observation, output updated PAC-estimates of how fair or biased the system is. The monitors are computationally lightweight and, using a prototype implementation, we demonstrate their effectiveness on several real-world examples.
    摘要 As AI和机器学习软件在做决策时越来越多,它们必须保持公平和无偏见。为了补充设计时的偏见缓解措施,运行时监测技术已经在最近引入了。 previous监测技术假设了监测系统的完整可见性,并且只能监测已知的系统状态。在这种情况下,我们将公平监测扩展到采用partially observed Markov chains(POMC)模型的系统,并将特定的公平性特性表示为数值函数的期望值。我们假设了这个POMC是无限期的,并且知道其混合时间的上限。这些假设使我们能够估计整个可能的执行情况下的公平性,只需观察一个执行。我们的监测器在系统中观察长时间,每次新的观察结果出现后,输出了更新后的PAC估计值,用于评估系统的公平性。我们的监测器 computationally lightweight,并且使用原型实现,我们在实际应用中证明了它们的有效性。

Threshold-aware Learning to Generate Feasible Solutions for Mixed Integer Programs

  • paper_url: http://arxiv.org/abs/2308.00327
  • repo_url: None
  • paper_authors: Taehyun Yoon, Jinwon Choi, Hyokun Yun, Sungbin Lim
  • For: The paper is written to address the challenge of finding high-quality feasible solutions to combinatorial optimization (CO) problems within a limited time, using machine learning (ML) methods.* Methods: The paper proposes a post-hoc method and a learning-based approach for optimizing the coverage of partial discrete variable assignments in Mixed Integer Programs (MIP), which bridges the gap between the learning and MIP objectives. The approach involves jointly learning to restrict the coverage search space and to predict the coverage in the learned search space, using a deep neural network.* Results: The paper demonstrates state-of-the-art performance in NeurIPS ML4CO datasets, achieving an optimality gap of 0.45% in the workload apportionment dataset within a one-minute time limit, which is a ten-fold improvement over SCIP.
    Abstract Finding a high-quality feasible solution to a combinatorial optimization (CO) problem in a limited time is challenging due to its discrete nature. Recently, there has been an increasing number of machine learning (ML) methods for addressing CO problems. Neural diving (ND) is one of the learning-based approaches to generating partial discrete variable assignments in Mixed Integer Programs (MIP), a framework for modeling CO problems. However, a major drawback of ND is a large discrepancy between the ML and MIP objectives, i.e., variable value classification accuracy over primal bound. Our study investigates that a specific range of variable assignment rates (coverage) yields high-quality feasible solutions, where we suggest optimizing the coverage bridges the gap between the learning and MIP objectives. Consequently, we introduce a post-hoc method and a learning-based approach for optimizing the coverage. A key idea of our approach is to jointly learn to restrict the coverage search space and to predict the coverage in the learned search space. Experimental results demonstrate that learning a deep neural network to estimate the coverage for finding high-quality feasible solutions achieves state-of-the-art performance in NeurIPS ML4CO datasets. In particular, our method shows outstanding performance in the workload apportionment dataset, achieving the optimality gap of 0.45%, a ten-fold improvement over SCIP within the one-minute time limit.
    摘要 寻找一个高质量可行的解决方案 для combinatorial optimization (CO) 问题在有限时间内是挑战的,主要因为其离散性。近年来,machine learning (ML) 方法在Addressing CO problems 问题上增加了。Neural diving (ND) 是一种学习基于的方法,用于生成混合整数程序(MIP)中的部分不连续变量分配。然而,ND 的一个主要缺点是ML 和 MIP 目标之间的大差异,即变量值分类准确率 над primal bound。我们的研究发现,在特定的变量分配率(coverage)范围内,可以获得高质量可行的解决方案,我们建议优化coverage来bridging the gap between learning和MIP目标。因此,我们提出了一种后续方法和一种学习基于的方法来优化coverage。我们的方法的关键思想是同时学习 restriction the coverage search space和在学习的搜索空间中预测coverage。实验结果表明,通过学习深度神经网络来估算coverage可以在 NeurIPS ML4CO 数据集中实现状态机器人的性能。特别是,我们的方法在工作负担分配数据集中表现出色,实现了1分钟时限内的优化性能,与SCIP相比,提高了10倍。

Pixel to policy: DQN Encoders for within & cross-game reinforcement learning

  • paper_url: http://arxiv.org/abs/2308.00318
  • repo_url: None
  • paper_authors: Ashrya Agrawal, Priyanshi Shah, Sourabh Prakash
  • for: The paper aims to improve the performance of reinforcement learning (RL) models by leveraging transfer learning and multi-task learning in various game environments.
  • methods: The authors use deep Q-networks (DQN) as the RL model and explore different approaches of transfer learning, including pre-training the model on one game and fine-tuning it on another, as well as training the model on multiple games simultaneously.
  • results: The authors achieve impressive performance on several game environments, including a mean episode reward of 46.16, which beats human-level performance with only 20k episodes, and mean rewards of 533.42 and 402.17 on the Assault and Space Invader environments, respectively.
    Abstract Reinforcement Learning can be applied to various tasks, and environments. Many of these environments have a similar shared structure, which can be exploited to improve RL performance on other tasks. Transfer learning can be used to take advantage of this shared structure, by learning policies that are transferable across different tasks and environments and can lead to more efficient learning as well as improved performance on a wide range of tasks. This work explores as well as compares the performance between RL models being trained from the scratch and on different approaches of transfer learning. Additionally, the study explores the performance of a model trained on multiple game environments, with the goal of developing a universal game-playing agent as well as transfer learning a pre-trained encoder using DQN, and training it on the same game or a different game. Our DQN model achieves a mean episode reward of 46.16 which even beats the human-level performance with merely 20k episodes which is significantly lower than deepmind's 1M episodes. The achieved mean rewards of 533.42 and 402.17 on the Assault and Space Invader environments respectively, represent noteworthy performance on these challenging environments.
    摘要 “强化学习可以应用到多种任务和环境中,许多这些环境具有类似的共享结构,可以利用这些共享结构来提高强化学习性能。转移学习可以在不同任务和环境中学习可以转移的策略,从而实现更高效的学习以及多种任务的改进性能。本研究探讨了强化学习模型从零开始学习和转移学习的不同方法的性能,以及将多个游戏环境训练的模型在不同游戏中的性能。我们的DQN模型在46.16的平均回合奖励上达到了人类水平性能,只需20k个回合,这比深梦的1M个回合要低得多。在Assault和Space Invader环境中,我们的模型分别获得了533.42和402.17的平均奖励,这些表现在这些复杂的环境中是非常出色的。”

Doubly Robust Instance-Reweighted Adversarial Training

  • paper_url: http://arxiv.org/abs/2308.00311
  • repo_url: None
  • paper_authors: Daouda Sow, Sen Lin, Zhangyang Wang, Yingbin Liang
  • for: 提高防御性能和数据分布不均 Robustness under limited model capacity.
  • methods: 使用分布robust优化(DRO)技术获取重要性加权,并且提高最容易受到攻击的数据点的鲁棒性。
  • results: 在标准分类 dataset 上比基eline方法提高了平均防御性能,同时提高了最弱数据点的鲁棒性。
    Abstract Assigning importance weights to adversarial data has achieved great success in training adversarially robust networks under limited model capacity. However, existing instance-reweighted adversarial training (AT) methods heavily depend on heuristics and/or geometric interpretations to determine those importance weights, making these algorithms lack rigorous theoretical justification/guarantee. Moreover, recent research has shown that adversarial training suffers from a severe non-uniform robust performance across the training distribution, e.g., data points belonging to some classes can be much more vulnerable to adversarial attacks than others. To address both issues, in this paper, we propose a novel doubly-robust instance reweighted AT framework, which allows to obtain the importance weights via exploring distributionally robust optimization (DRO) techniques, and at the same time boosts the robustness on the most vulnerable examples. In particular, our importance weights are obtained by optimizing the KL-divergence regularized loss function, which allows us to devise new algorithms with a theoretical convergence guarantee. Experiments on standard classification datasets demonstrate that our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance, and at the same time improves the robustness against attacks on the weakest data points. Codes will be available soon.
    摘要 <>使用对抗数据中的重要性权重进行训练,已经在有限模型容量下实现了很大的成功。然而,现有的实例权重对抗训练(AT)方法仍然依赖于各种euristic和/或几何解释来确定这些重要性权重,使得这些算法缺乏正式的理论基础和保证。此外,现有研究表明,对抗训练受到训练分布中的非均匀攻击性影响,例如,某些类型的数据点可能更容易受到攻击。为解决这两个问题,在本文中,我们提出了一种新的双重稳健实例权重AT框架,可以通过探索分布式稳健优化(DRO)技术来获得重要性权重,并在同时提高最容易受到攻击的数据点的稳健性。具体来说,我们的重要性权重通过优化KL散度规范化损失函数来获得,这使我们可以开发新的算法,并提供了理论上的准确性保证。实验表明,我们的提议方法在标准分类 datasets 上的平均Robust性性能高于相关的状态艺术基eline方法,同时在最容易受到攻击的数据点上提高了稳健性。代码即将上传。

GradOrth: A Simple yet Efficient Out-of-Distribution Detection with Orthogonal Projection of Gradients

  • paper_url: http://arxiv.org/abs/2308.00310
  • repo_url: None
  • paper_authors: Sima Behpour, Thang Doan, Xin Li, Wenbin He, Liang Gou, Liu Ren
  • for: 验证机器学习模型在实际应用中的安全部署,检测非典型数据(OOD)是关键。
  • methods: 我们提出了一种基于卷积网络中最重要参数的OOD检测方法,通过计算ID数据中考虑重要的子空间上的梯度 проек 来标识OOD数据。
  • results: 我们的方法可以减少平均假阳性率,比现有方法提高True Positive Rate(TPR)的性能。在95%假阳性率(FPR95)下,我们的方法可以减少OOD数据的假阳性率达8%。
    Abstract Detecting out-of-distribution (OOD) data is crucial for ensuring the safe deployment of machine learning models in real-world applications. However, existing OOD detection approaches primarily rely on the feature maps or the full gradient space information to derive OOD scores neglecting the role of most important parameters of the pre-trained network over in-distribution (ID) data. In this study, we propose a novel approach called GradOrth to facilitate OOD detection based on one intriguing observation that the important features to identify OOD data lie in the lower-rank subspace of in-distribution (ID) data. In particular, we identify OOD data by computing the norm of gradient projection on the subspaces considered important for the in-distribution data. A large orthogonal projection value (i.e. a small projection value) indicates the sample as OOD as it captures a weak correlation of the ID data. This simple yet effective method exhibits outstanding performance, showcasing a notable reduction in the average false positive rate at a 95% true positive rate (FPR95) of up to 8% when compared to the current state-of-the-art methods.
    摘要 检测不同领域(OOD)数据是机器学习模型在实际应用中安全部署的关键。然而,现有的OOD检测方法主要基于特征图或整个梯度空间信息来 derive OOD 分数,忽略了预训练网络中最重要的参数的作用。在本研究中,我们提出了一种新的方法called GradOrth,用于基于ID数据中最重要的特征进行OOD检测。具体来说,我们通过计算ID数据中考虑重要的特征SUBSPACE上的梯度 projetcion norm来识别OOD数据。如果梯度 проекcion norm值很小(即梯度 проекcion 强度很弱),则表示该样本是OOD数据。这种简单 yet有效的方法在评估中显示了remarkable performance,与当前状态的方法相比,可以达到8%的 False Positive Rate(FPR)降低。

Revolutionizing TCAD Simulations with Universal Device Encoding and Graph Attention Networks

  • paper_url: http://arxiv.org/abs/2308.11624
  • repo_url: None
  • paper_authors: Guangxi Fan, Kain Lu Low
  • for: 提出了一种基于人工智能(AI)和图表示的半导体设备编码方法,用于TCAD设备仿真。
  • methods: 提出了一种图基于的通用编码方案,考虑了材料层和设备层的嵌入,同时还引入了一种新的空间关系嵌入, inspirited by interpolation operations typically used in finite element meshing。
  • results: 通过利用物理法则和数据驱动模拟,实现了Surrogate Poisson 优化和current-voltage(IV)预测,并使用了一种新的图注意力网络(RelGAT)。
    Abstract An innovative methodology that leverages artificial intelligence (AI) and graph representation for semiconductor device encoding in TCAD device simulation is proposed. A graph-based universal encoding scheme is presented that not only considers material-level and device-level embeddings, but also introduces a novel spatial relationship embedding inspired by interpolation operations typically used in finite element meshing. Universal physical laws from device simulations are leveraged for comprehensive data-driven modeling, which encompasses surrogate Poisson emulation and current-voltage (IV) prediction based on drift-diffusion model. Both are achieved using a novel graph attention network, referred to as RelGAT. Comprehensive technical details based on the device simulator Sentaurus TCAD are presented, empowering researchers to adopt the proposed AI-driven Electronic Design Automation (EDA) solution at the device level.
    摘要 一种创新的方法ология,利用人工智能(AI)和图表示法来实现半导体器件编码在TCAD设备仿真中,被提议。这种图基于的通用编码方案不仅考虑材料层和设备层嵌入,还引入了一种新的空间关系嵌入,Draw inspiration from interpolation operations typically used in finite element meshing。通用物理法则从设备仿真中得到了全面的数据驱动模拟,包括代表性函数估计和电压-电流(IV)预测,基于漫步扩散模型。这两个任务都使用了一种新的图注意力网络,称为RelGAT。我们提供了全面的技术细节,以便研究者可以在设备水平采用这种人工智能驱动电子设计自动化(EDA)解决方案。

Adapt and Decompose: Efficient Generalization of Text-to-SQL via Domain Adapted Least-To-Most Prompting

  • paper_url: http://arxiv.org/abs/2308.02582
  • repo_url: None
  • paper_authors: Aseem Arora, Shabbirhussain Bhaisaheb, Harshit Nigam, Manasi Patwardhan, Lovekesh Vig, Gautam Shroff
  • For: The paper is focused on improving the cross-domain and cross-compositional generalization of Text-to-SQL semantic parsing, which is a challenging task.* Methods: The authors propose an algorithm that performs offline sampling of a minimal set of few-shots from the training data, with complete coverage of SQL clauses, operators, and functions, and maximal domain coverage within the allowed token length. This allows for the synthesis of a fixed Generic Prompt (GP) with a diverse set of exemplars common across NL test queries, avoiding expensive test time exemplar retrieval. Additionally, the authors propose an auto-adaptation of the GP to the target database domain (DA-GP) to better handle cross-domain generalization, followed by a decomposed Least-To-Most-Prompting (LTMP-DA-GP) to handle cross-compositional generalization.* Results: The authors demonstrate superior performance on the KaggleDBQA dataset, designed to evaluate generalizability for the Text-to-SQL task. They also showcase consistent performance improvement of LTMP-DA-GP over GP, across LLMs and databases of KaggleDBQA, highlighting the efficacy and model agnostic benefits of their prompt-based adapt and decompose approach.Here’s the simplified Chinese version of the three information points:* For: 这篇论文旨在解决文本到SQLsemantic parsing中的跨领域和跨组合性泛化问题,这是一项具有挑战性的任务。* Methods: 作者们提出了一种方法,通过在训练数据集上进行Offline sampling,以获取完整的SQL子句、运算符和函数的覆盖,同时保证在允许的Token长度内具备最大的领域覆盖。这allow for the synthesis of a fixed Generic Prompt (GP) with a diverse set of exemplars common across NL test queries, avoiding expensive test time exemplar retrieval. 作者们还提出了一种自适应GP到目标数据库领域(DA-GP),以更好地处理跨领域泛化。此外,他们还提出了一种分解的Least-To-Most-Prompting (LTMP-DA-GP),以处理跨组合性泛化。* Results: 作者们在KaggleDBQA数据集上展现出了superior的性能,这个数据集是用于评估文本到SQL泛化性能的。他们还显示了LTMP-DA-GP在不同的LLMs和数据库上的一致性提升,这highlights the efficacy and model agnostic benefits of their prompt-based adapt and decompose approach。
    Abstract Cross-domain and cross-compositional generalization of Text-to-SQL semantic parsing is a challenging task. Existing Large Language Model (LLM) based solutions rely on inference-time retrieval of few-shot exemplars from the training set to synthesize a run-time prompt for each Natural Language (NL) test query. In contrast, we devise an algorithm which performs offline sampling of a minimal set-of few-shots from the training data, with complete coverage of SQL clauses, operators and functions, and maximal domain coverage within the allowed token length. This allows for synthesis of a fixed Generic Prompt (GP), with a diverse set-of exemplars common across NL test queries, avoiding expensive test time exemplar retrieval. We further auto-adapt the GP to the target database domain (DA-GP), to better handle cross-domain generalization; followed by a decomposed Least-To-Most-Prompting (LTMP-DA-GP) to handle cross-compositional generalization. The synthesis of LTMP-DA-GP is an offline task, to be performed one-time per new database with minimal human intervention. Our approach demonstrates superior performance on the KaggleDBQA dataset, designed to evaluate generalizability for the Text-to-SQL task. We further showcase consistent performance improvement of LTMP-DA-GP over GP, across LLMs and databases of KaggleDBQA, highlighting the efficacy and model agnostic benefits of our prompt based adapt and decompose approach.
    摘要 Translated into Simplified Chinese:跨Domain和跨组合性的文本到SQLsemantic parsing是一项具有挑战性的任务。现有的大型自然语言模型(LLM)基于解决方案通过推理时间获取少量示例来synthesize每个自然语言(NL)测试查询的运行时提示。相比之下,我们提出了一种算法,它在训练集上进行offline采样,收集了完整的SQL子句、运算符和函数的少量示例,并在允许的字符串长度内保证最大的Domain覆盖。这使得可以synthesize一个固定的通用提示(GP),该提示包含了NL测试查询中共享的多个示例,从而避免了Expensive的测试时间示例重新获取。我们还自动适应了GP到目标数据库Domain(DA-GP),以更好地处理跨Domain泛化。然后,我们使用了 decomposed Least-To-Most-Prompting(LTMP-DA-GP)来处理跨组合性泛化。该synthesis是一个offline任务,需要在新的数据库上进行一次性的人工 intervención。我们的方法在KaggleDBQA数据集上显示了superior的性能,KaggleDBQA数据集是用于评估文本到SQL任务的泛化性能的。我们还在LLMs和KaggleDBQA数据库之间显示了LTMP-DA-GP的性能提升,这highlights our prompt based adapt和decomposeapproach的效果和模型无关性。

A Study of Unsupervised Evaluation Metrics for Practical and Automatic Domain Adaptation

  • paper_url: http://arxiv.org/abs/2308.00287
  • repo_url: None
  • paper_authors: Minghao Chen, Zepeng Gao, Shuai Zhao, Qibo Qiu, Wenxiao Wang, Binbin Lin, Xiaofei He
    for: 本研究旨在开发一种无监督领域适应(Unsupervised Domain Adaptation,UDA)评价度量,能够无需目标验证集来评估转移模型的质量。methods: 本研究使用了基于模型预测的相互信息度量作为起点,并通过实验分析发现了三种常见问题:1)不考虑源结构;2)容易受到攻击;3)无法检测到源和目标特征的过分对应。为解决这些问题,我们在度量中添加了源精度,并使用了一个新的多层感知(MLP)分类器,在训练过程中保持不参与。此外,我们还将这种改进后的度量与数据扩展结合使用,得到了一种新的无监督UDA度量——扩展一致度量(ACM)。results: 我们通过大规模实验证明了我们提出的度量的有效性,并对前一些实验设置的缺点进行了证明。此外,我们还使用了我们提出的度量自动搜索最佳超参数集,在四种常见 benchmark 上达到了超过 manually 调参集的性能。 codes 将很快地公开。
    Abstract Unsupervised domain adaptation (UDA) methods facilitate the transfer of models to target domains without labels. However, these methods necessitate a labeled target validation set for hyper-parameter tuning and model selection. In this paper, we aim to find an evaluation metric capable of assessing the quality of a transferred model without access to target validation labels. We begin with the metric based on mutual information of the model prediction. Through empirical analysis, we identify three prevalent issues with this metric: 1) It does not account for the source structure. 2) It can be easily attacked. 3) It fails to detect negative transfer caused by the over-alignment of source and target features. To address the first two issues, we incorporate source accuracy into the metric and employ a new MLP classifier that is held out during training, significantly improving the result. To tackle the final issue, we integrate this enhanced metric with data augmentation, resulting in a novel unsupervised UDA metric called the Augmentation Consistency Metric (ACM). Additionally, we empirically demonstrate the shortcomings of previous experiment settings and conduct large-scale experiments to validate the effectiveness of our proposed metric. Furthermore, we employ our metric to automatically search for the optimal hyper-parameter set, achieving superior performance compared to manually tuned sets across four common benchmarks. Codes will be available soon.
    摘要 无监督领域适应(UDA)方法可以将模型转移到目标领域无需标签。然而,这些方法需要一个标注的目标验证集来调整超参数和选择模型。在这篇论文中,我们想要找到一个可以评估转移模型的评价指标,不需要目标验证集。我们开始于基于模型预测的共同信息度的指标。通过实验分析,我们发现了三个常见的问题:1)它不考虑源结构。2)它可以轻松攻击。3)它无法探测源和目标特征的过对齐导致的负向转移。为了解决这两个问题,我们将源准确率 incorporated 到指标中,并使用一个新的多层感知(MLP)分类器,该分类器在训练时被隐藏。此外,我们还将这个提高后的指标与数据扩展结合使用,得到了一种新的无监督UDA指标——扩展一致指标(ACM)。此外,我们还对之前的实验设置进行了详细的批判,并进行了大规模的实验来验证我们的提出的指标的有效性。最后,我们使用我们的指标自动搜索最佳超参数集,在四个常见的benchmark上达到了人工调整的超参数集的超过表现。代码即将上线。

Predictive Modeling through Hyper-Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2308.00285
  • repo_url: None
  • paper_authors: Manisha Senadeera, Santu Rana, Sunil Gupta, Svetha Venkatesh
  • for: 这 paper 是为了提高 Bayesian 优化(BO) 的模型选择效率,并且同时获取黑板函数的信息。
  • methods: 这 paper 使用了一种新的方法,即将模型选择和 BO integrate在一起,以达到更快的函数优化目标。这种方法通过在模型空间和函数空间之间往返,使用一个得分函数来衡量模型的质量,并将其反馈给 BO,以便更快地 convergence。
  • results: 这 paper 的实验结果表明,使用这种方法可以提高 BO 的样本效率,同时也可以获取黑板函数的信息。此外,这 paper 还证明了这种方法的收敛性。
    Abstract Model selection is an integral problem of model based optimization techniques such as Bayesian optimization (BO). Current approaches often treat model selection as an estimation problem, to be periodically updated with observations coming from the optimization iterations. In this paper, we propose an alternative way to achieve both efficiently. Specifically, we propose a novel way of integrating model selection and BO for the single goal of reaching the function optima faster. The algorithm moves back and forth between BO in the model space and BO in the function space, where the goodness of the recommended model is captured by a score function and fed back, capturing how well the model helped convergence in the function space. The score function is derived in such a way that it neutralizes the effect of the moving nature of the BO in the function space, thus keeping the model selection problem stationary. This back and forth leads to quick convergence for both model selection and BO in the function space. In addition to improved sample efficiency, the framework outputs information about the black-box function. Convergence is proved, and experimental results show significant improvement compared to standard BO.
    摘要 <>将文本翻译成简化中文。<>基于模型的优化技术,如 bayesian 优化(BO),选择模型是一个不可或缺的问题。现有方法通常将模型选择视为估计问题,在优化迭代中 periodic 更新 observations。在本文中,我们提出了一种新的方法,可以快速地达到函数的最优点。该算法在模型空间和函数空间之间往返,使用 BO 来评估模型的好坏,并将其反馈给函数空间中的 BO,以 capture 模型如何帮助函数空间中的 converges。该得分函数是设计的,以 neutralize 模型在函数空间中的移动效果,因此保持模型选择问题的静态性。这种往返机制会使 BO 在函数空间和模型空间中快速 converge。除了提高样本效率之外,该框架还输出了黑板函数的信息。 convergence 的证明和实验结果表明,与标准 BO 相比,该方法具有显著的改进。

CLAMS: A Cluster Ambiguity Measure for Estimating Perceptual Variability in Visual Clustering

  • paper_url: http://arxiv.org/abs/2308.00284
  • repo_url: None
  • paper_authors: Hyeon Jeon, Ghulam Jilani Quadri, Hyunwook Lee, Paul Rosen, Danielle Albers Szafir, Jinwook Seo
  • for: 这项研究旨在研究可见群分化中的人工智能评估不确定性,以提高数据分析中基于可见群分化的可靠性。
  • methods: 该研究使用了一种数据驱动的可见质量指标(CLAMS),通过对多个分配对的可分离性进行回归预测,来自动评估可见群分化的不确定性。
  • results: CLAMS可以更好地预测实际的群分化不确定性,并且与人工标注器的性能相当。此外,该研究还探讨了如何使用CLAMS来优化和benchmark数据挖掘技术。
    Abstract Visual clustering is a common perceptual task in scatterplots that supports diverse analytics tasks (e.g., cluster identification). However, even with the same scatterplot, the ways of perceiving clusters (i.e., conducting visual clustering) can differ due to the differences among individuals and ambiguous cluster boundaries. Although such perceptual variability casts doubt on the reliability of data analysis based on visual clustering, we lack a systematic way to efficiently assess this variability. In this research, we study perceptual variability in conducting visual clustering, which we call Cluster Ambiguity. To this end, we introduce CLAMS, a data-driven visual quality measure for automatically predicting cluster ambiguity in monochrome scatterplots. We first conduct a qualitative study to identify key factors that affect the visual separation of clusters (e.g., proximity or size difference between clusters). Based on study findings, we deploy a regression module that estimates the human-judged separability of two clusters. Then, CLAMS predicts cluster ambiguity by analyzing the aggregated results of all pairwise separability between clusters that are generated by the module. CLAMS outperforms widely-used clustering techniques in predicting ground truth cluster ambiguity. Meanwhile, CLAMS exhibits performance on par with human annotators. We conclude our work by presenting two applications for optimizing and benchmarking data mining techniques using CLAMS. The interactive demo of CLAMS is available at clusterambiguity.dev.
    摘要 <>转换给定文本到简化中文。<>可见划分是常见的感知任务在散点图中,支持多种分析任务(例如,团结标识)。然而,即使使用同一个散点图,对划分的方式(即进行可见划分)可以因个体差异和杂乱划分boundary而异。尽管如此,我们缺乏一种系统化的方法来效率地评估这种变化。在这项研究中,我们研究可见划分中的变化,我们称之为团结混淆。为此,我们引入CLAMS,一种数据驱动的视觉质量指标,用于自动预测散点图中团结的混淆程度。我们首先进行了资深研究,以确定影响可见划分的关键因素(例如,团结距离或团结大小差)。根据研究结果,我们部署了一个回归模块,以估计人类评估两个团结的可分离程度。然后,CLAMS预测团结混淆程度,通过分析所有对应的划分结果。CLAMS在预测真实团结混淆程度方面表现出色,同时与人类标注器的性能相当。我们结束我们的工作,并将两种用于优化和benchmarking数据挖掘技术的应用示例展示出来。CLAMS的交互式demo可以在clusterambiguity.dev上查看。

ZADU: A Python Library for Evaluating the Reliability of Dimensionality Reduction Embeddings

  • paper_url: http://arxiv.org/abs/2308.00282
  • repo_url: https://github.com/hj-n/zadu
  • paper_authors: Hyeon Jeon, Aeri Cho, Jinhwa Jang, Soohyun Lee, Jake Hyun, Hyung-Kwon Ko, Jaemin Jo, Jinwook Seo
  • for: 本研究旨在提供一个Python库(ZADU),用于评估维度减少(DR) embedding 的可靠性。
  • methods: ZADU 提供了广泛的扭曲度量表,并自动优化扭曲度量表的执行,降低了执行多个扭曲度量表所需的时间。
  • results: 我们透过一个实际的应用情况来验证我们的优化方案,发现这个方案可以对扭曲度量表进行很好的优化。此外,我们还创建了一个名为ZADUVis的库,可以让用户轻松地创建扭曲度量表的可视化图示。
    Abstract Dimensionality reduction (DR) techniques inherently distort the original structure of input high-dimensional data, producing imperfect low-dimensional embeddings. Diverse distortion measures have thus been proposed to evaluate the reliability of DR embeddings. However, implementing and executing distortion measures in practice has so far been time-consuming and tedious. To address this issue, we present ZADU, a Python library that provides distortion measures. ZADU is not only easy to install and execute but also enables comprehensive evaluation of DR embeddings through three key features. First, the library covers a wide range of distortion measures. Second, it automatically optimizes the execution of distortion measures, substantially reducing the running time required to execute multiple measures. Last, the library informs how individual points contribute to the overall distortions, facilitating the detailed analysis of DR embeddings. By simulating a real-world scenario of optimizing DR embeddings, we verify that our optimization scheme substantially reduces the time required to execute distortion measures. Finally, as an application of ZADU, we present another library called ZADUVis that allows users to easily create distortion visualizations that depict the extent to which each region of an embedding suffers from distortions.
    摘要 Dimensionality reduction (DR) 技术自然地扭曲输入数据的原始结构,生成不完美的低维度嵌入。为了评估DR嵌入的可靠性,各种不同的扭曲度指标已经被提出。然而,在实践中实施和执行这些指标的问题仍然存在。为解决这个问题,我们介绍了ZADU,一个Python库,它提供了多种扭曲度指标,并且可以自动优化执行扭曲度指标,大幅降低执行多个指标所需的运行时间。此外,ZADU还能够详细分析DR嵌入的扭曲度,让用户了解具体的扭曲度来源。通过模拟一个实际的DR嵌入优化场景,我们证明了我们的优化方案可以减少执行扭曲度指标所需的时间。最后,作为ZADU的应用,我们介绍了ZADUVis,一个可以轻松地创建扭曲度视觉化的库,它可以详细显示嵌入中每个区域受到的扭曲度程度。

Data Collaboration Analysis applied to Compound Datasets and the Introduction of Projection data to Non-IID settings

  • paper_url: http://arxiv.org/abs/2308.00280
  • repo_url: None
  • paper_authors: Akihiro Mizoguchi, Anna Bogdanova, Akira Imakura, Tetsuya Sakurai
    for: 这个论文主要是研究如何使用分布式机器学习来预测化学物质的性能,以及如何在非同分布(non-IID) Setting下提高预测精度。methods: 这个研究使用了联合学习(Federated Learning)和数据合作分析(Data Collaboration Analysis)等方法来预测化学物质的性能。另外,也提出了一种基于投影数据(Projection Data)的改进方法 called Data Collaboration Analysis using Projection Data(DCPd)。results: 研究发现,在非同分布 Setting下,DCPd的机器学习性能虽然和其他方法相似,但在不同的标签偏好情况下却表现较好,而且与其他方法相比,DCPd在不同标签偏好情况下的性能差异较小。这表明,DCPd可以解决联合学习在非同分布 Setting下的低性能问题。
    Abstract Given the time and expense associated with bringing a drug to market, numerous studies have been conducted to predict the properties of compounds based on their structure using machine learning. Federated learning has been applied to compound datasets to increase their prediction accuracy while safeguarding potentially proprietary information. However, federated learning is encumbered by low accuracy in not identically and independently distributed (non-IID) settings, i.e., data partitioning has a large label bias, and is considered unsuitable for compound datasets, which tend to have large label bias. To address this limitation, we utilized an alternative method of distributed machine learning to chemical compound data from open sources, called data collaboration analysis (DC). We also proposed data collaboration analysis using projection data (DCPd), which is an improved method that utilizes auxiliary PubChem data. This improves the quality of individual user-side data transformations for the projection data for the creation of intermediate representations. The classification accuracy, i.e., area under the curve in the receiver operating characteristic curve (ROC-AUC) and AUC in the precision-recall curve (PR-AUC), of federated averaging (FedAvg), DC, and DCPd was compared for five compound datasets. We determined that the machine learning performance for non-IID settings was in the order of DCPd, DC, and FedAvg, although they were almost the same in identically and independently distributed (IID) settings. Moreover, the results showed that compared to other methods, DCPd exhibited a negligible decline in classification accuracy in experiments with different degrees of label bias. Thus, DCPd can address the low performance in non-IID settings, which is one of the challenges of federated learning.
    摘要 因为带药到市场的时间和成本很高,许多研究都是使用机器学习预测药物性质基于其结构。 federated learning 已经应用于药物数据集来提高预测精度,但是在非标一致(non-IID)设置下,其精度低。为了解决这些 limitation,我们使用了一种名为数据合作分析(DC)的 alternativemethod of distributed machine learning to chemical compound data from open sources。我们还提出了基于投影数据(DCPd)的改进方法,该方法使用auxiliary PubChem数据来提高个人用户 сторо面数据变换的质量。我们对五个药物数据集进行了 federated averaging(FedAvg)、DC 和 DCPd 的比较,发现在非标一致设置下,DCPd 的机器学习性能高于 DC 和 FedAvg,尽管在标一致设置下它们几乎相同。此外,结果表明,相比其他方法,DCPd 在不同的标签偏好度下 exhibited 微不足的下降。因此,DCPd 可以解决 federated learning 在非标一致设置下的低性能问题。

Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction

  • paper_url: http://arxiv.org/abs/2308.00279
  • repo_url: https://github.com/woriazzc/robust-pu
  • paper_authors: Zhangchi Zhu, Lu Wang, Pu Zhao, Chao Du, Wei Zhang, Hang Dong, Bo Qiao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
  • for: 本研究旨在提高Positive-Unlabeled(PU)学习中 labels uncertainty的影响,提出一种基于人类学习的新robust PU学习方法,以提高学习精度和稳定性。
  • methods: 本研究提出了一种基于困难度''度量的训练策略,通过Iterative Training来细化选择负样本的过程,以包括更多的容易’’样本在早期训练阶段。
  • results: 实验结果表明,该方法可以有效地提高PU学习的精度和稳定性,并且可以在各种学习任务上实现良好的效果。
    Abstract Learning from positive and unlabeled data is known as positive-unlabeled (PU) learning in literature and has attracted much attention in recent years. One common approach in PU learning is to sample a set of pseudo-negatives from the unlabeled data using ad-hoc thresholds so that conventional supervised methods can be applied with both positive and negative samples. Owing to the label uncertainty among the unlabeled data, errors of misclassifying unlabeled positive samples as negative samples inevitably appear and may even accumulate during the training processes. Those errors often lead to performance degradation and model instability. To mitigate the impact of label uncertainty and improve the robustness of learning with positive and unlabeled data, we propose a new robust PU learning method with a training strategy motivated by the nature of human learning: easy cases should be learned first. Similar intuition has been utilized in curriculum learning to only use easier cases in the early stage of training before introducing more complex cases. Specifically, we utilize a novel ``hardness'' measure to distinguish unlabeled samples with a high chance of being negative from unlabeled samples with large label noise. An iterative training strategy is then implemented to fine-tune the selection of negative samples during the training process in an iterative manner to include more ``easy'' samples in the early stage of training. Extensive experimental validations over a wide range of learning tasks show that this approach can effectively improve the accuracy and stability of learning with positive and unlabeled data. Our code is available at https://github.com/woriazzc/Robust-PU
    摘要 学习正面和无标签数据的方法,称为正面无标签(PU)学习,在最近几年内吸引了很多关注。一种常见的PU学习方法是从无标签数据中随机选择一些假负样本,使得传统的监督学习方法可以使用正面和负样本。由于无标签数据中的标签不确定,在训练过程中会出现错误地将无标签正样本标记为负样本的现象,这可能会在训练过程中积累。这些错误会导致性能下降和模型不稳定。为了减轻标签不确定性的影响和提高学习正面和无标签数据的稳定性,我们提出了一种新的稳定PU学习方法。我们使用了一种新的“困难度”度量,以分辨无标签样本中高概率是负样本的和大量标签噪音。然后,我们实现了一种迭代训练策略,在训练过程中多次精度地选择负样本,以包括更多的“易于学习”的样本在早期训练阶段。我们对各种学习任务进行了广泛的实验 validate,结果表明,这种方法可以有效地提高学习正面和无标签数据的精度和稳定性。我们的代码可以在https://github.com/woriazzc/Robust-PU中找到。

Classes are not Clusters: Improving Label-based Evaluation of Dimensionality Reduction

  • paper_url: http://arxiv.org/abs/2308.00278
  • repo_url: https://github.com/hj-n/ltnc
  • paper_authors: Hyeon Jeon, Yun-Hsin Kuo, Michaël Aupetit, Kwan-Liu Ma, Jinwook Seo
  • for: 本文旨在提供一种新的方法来评估维度减少(DR)嵌入的可靠性,并基于类标签进行评估。
  • methods: 本文提出了两种新的质量指标:标签可靠性(Label-Trustworthiness)和标签连续性(Label-Continuity),以评估DR嵌入的质量。
  • results: 对比于传统的DR评估方法(如可靠性和连续性,库拉布-莱布尔差异),Label-T&C表现更高的准确性,并且可扩展到大规模数据集。此外,本文还提供了一些实践案例,用于揭示DR技术和其超参数的内在特性。
    Abstract A common way to evaluate the reliability of dimensionality reduction (DR) embeddings is to quantify how well labeled classes form compact, mutually separated clusters in the embeddings. This approach is based on the assumption that the classes stay as clear clusters in the original high-dimensional space. However, in reality, this assumption can be violated; a single class can be fragmented into multiple separated clusters, and multiple classes can be merged into a single cluster. We thus cannot always assure the credibility of the evaluation using class labels. In this paper, we introduce two novel quality measures -- Label-Trustworthiness and Label-Continuity (Label-T&C) -- advancing the process of DR evaluation based on class labels. Instead of assuming that classes are well-clustered in the original space, Label-T&C work by (1) estimating the extent to which classes form clusters in the original and embedded spaces and (2) evaluating the difference between the two. A quantitative evaluation showed that Label-T&C outperform widely used DR evaluation measures (e.g., Trustworthiness and Continuity, Kullback-Leibler divergence) in terms of the accuracy in assessing how well DR embeddings preserve the cluster structure, and are also scalable. Moreover, we present case studies demonstrating that Label-T&C can be successfully used for revealing the intrinsic characteristics of DR techniques and their hyperparameters.
    摘要 一种常见的维度减少(DR) embedding 评估方法是通过衡量 labels 是否形成紧凑、相互隔离的群集来评估 embedding 的可靠性。这种方法假设高维空间中的类具有清晰的分布,但在实际情况下,这种假设可能被违反;一个类可能会被分割成多个分立的群集,多个类可能会被合并为一个群集。因此,我们无法一直保证评估使用类标签的可靠性。在这篇论文中,我们引入了两种新的质量指标:Label-Trustworthiness 和 Label-Continuity(Label-T&C),这些指标旨在基于类标签来评估维度减少 embedding 的可靠性。不同于假设高维空间中的类具有清晰的分布,Label-T&C 通过(1)估计原始和嵌入空间中类的群集程度,(2)评估这两个空间之间的差异来评估维度减少 embedding 的可靠性。一个量化评估表明,Label-T&C 在评估维度减少 embedding 保持类群结构的准确性方面表现出色,并且可扩展性也高。此外,我们还提供了用 Label-T&C 评估维度减少技术和其超参数的实际案例研究。

Neural approximation of Wasserstein distance via a universal architecture for symmetric and factorwise group invariant functions

  • paper_url: http://arxiv.org/abs/2308.00273
  • repo_url: None
  • paper_authors: Samantha Chen, Yusu Wang
  • for: 本研究旨在设计一种能够近似 Wasserstein 距离的神经网络模型,并且可以独立于输入点集的大小进行模型训练。
  • methods: 我们提出了一种基于神经网络的总体架构,用于近似 Symmetric 和 Factor-wise Group Invariant(SFGI)函数。我们还 combinated 这种总体神经网络模型与一种卷积技术来开发一种特定和高效的神经网络模型,用于approximating Wasserstein 距离。
  • results: 我们的实验结果表明,我们所提出的神经网络模型在许多方面比其他模型(包括 SOTA Siamese Autoencoder 方法)perform 更好,并且可以更好地泛化和更快速地训练。
    Abstract Learning distance functions between complex objects, such as the Wasserstein distance to compare point sets, is a common goal in machine learning applications. However, functions on such complex objects (e.g., point sets and graphs) are often required to be invariant to a wide variety of group actions e.g. permutation or rigid transformation. Therefore, continuous and symmetric product functions (such as distance functions) on such complex objects must also be invariant to the product of such group actions. We call these functions symmetric and factor-wise group invariant (or SFGI functions in short). In this paper, we first present a general neural network architecture for approximating SFGI functions. The main contribution of this paper combines this general neural network with a sketching idea to develop a specific and efficient neural network which can approximate the $p$-th Wasserstein distance between point sets. Very importantly, the required model complexity is independent of the sizes of input point sets. On the theoretical front, to the best of our knowledge, this is the first result showing that there exists a neural network with the capacity to approximate Wasserstein distance with bounded model complexity. Our work provides an interesting integration of sketching ideas for geometric problems with universal approximation of symmetric functions. On the empirical front, we present a range of results showing that our newly proposed neural network architecture performs comparatively or better than other models (including a SOTA Siamese Autoencoder based approach). In particular, our neural network generalizes significantly better and trains much faster than the SOTA Siamese AE. Finally, this line of investigation could be useful in exploring effective neural network design for solving a broad range of geometric optimization problems (e.g., $k$-means in a metric space).
    摘要 学习 Complex 对象之间的距离函数,如 Wasserstein 距离比较点集,是机器学习应用中常见的目标。然而,函数在这些复杂对象上(例如点集和图)经常需要对广泛的群作用(例如 permutation 或扭转变换)进行抗变换。因此,在这些复杂对象上的连续和共轭产品函数(如距离函数)必须对群作用的乘积进行抗变换。我们称这类函数为共轭和因子化群作用抗变换函数(SFGI 函数)。在这篇论文中,我们首先提出一种通用的神经网络架构,用于近似 SFGI 函数。我们的主要贡献是将这种通用神经网络与抽象思想结合,开发了特定和高效的神经网络,用于近似 $p$-th Wasserstein 距离 between 点集。非常重要的是,模型复杂度不依赖输入点集的大小。在理论上,我们的结果表明,存在一种能够近似 Wasserstein 距离的神经网络,并且其模型复杂度受到输入点集的大小的限制。我们的工作提供了对几何问题的解决方案的有趣的整合,以及通用神经网络的近似Symmetric 函数的能力。在实际方面,我们发现我们的新提出的神经网络架构在多个实际问题中表现较好,比如 SOTA Siamese Autoencoder 基于方法。具体来说,我们的神经网络在泛化和训练速度方面表现较好,而且可以在训练时更好地控制模型的复杂度。最后,这种研究方向可能会对解决广泛的几何优化问题(例如 metric 空间中的 $k$-means)提供有用的思路。

Multi-Modality Multi-Loss Fusion Network

  • paper_url: http://arxiv.org/abs/2308.00264
  • repo_url: None
  • paper_authors: Zehui Wu, Ziwei Gong, Jaywon Koo, Julia Hirschberg
  • for: 本研究探讨多modalities Feature选择和融合的优化方法,以提高情感识别性能。
  • methods: 本研究使用多种融合方法,并 investigate multi-loss training的影响于多模态融合网络性能。
  • results: 我们的最佳模型在三个dataset(CMU-MOSI、CMU-MOSEI和CH-SIMS)上达到了状态之arte性能,并在大多数指标中超越其他方法。我们发现,训练在多模态特征上可以提高单模态测试性能,而根据数据集注释schema设计融合方法可以提高模型性能。这些结果提供了优化特征选择和融合方法的路线图,以提高情感识别 neural network 性能。
    Abstract In this work we investigate the optimal selection and fusion of features across multiple modalities and combine these in a neural network to improve emotion detection. We compare different fusion methods and examine the impact of multi-loss training within the multi-modality fusion network, identifying useful findings relating to subnet performance. Our best model achieves state-of-the-art performance for three datasets (CMU-MOSI, CMU-MOSEI and CH-SIMS), and outperforms the other methods in most metrics. We have found that training on multimodal features improves single modality testing and designing fusion methods based on dataset annotation schema enhances model performance. These results suggest a roadmap towards an optimized feature selection and fusion approach for enhancing emotion detection in neural networks.
    摘要 在这项研究中,我们调查了不同Modalities之间的最佳选择和融合方法,并将这些方法 integrate into a neural network 以提高情感识别。我们比较了不同的融合方法,并查看了在多模态融合网络中多loss训练的影响,发现有用的发现关于子网络性能。我们的最佳模型在CMU-MOSI、CMU-MOSEI和CH-SIMS三个 datasets 上达到了状态机器的性能,并在大多数指标中超过其他方法。我们发现,在多模态特征上进行训练可以提高单模态测试,而基于dataset annotation schema的融合方法设计可以提高模型性能。这些结果提供了增强情感识别在神经网络中的优化特征选择和融合方法的路线图。

Asynchronous Federated Learning with Bidirectional Quantized Communications and Buffered Aggregation

  • paper_url: http://arxiv.org/abs/2308.00263
  • repo_url: None
  • paper_authors: Tomas Ortega, Hamid Jafarkhani
  • for: 提高 Federated Learning 的效率和可扩展性
  • methods: 使用量化通信 schemes 和 shared “hidden” state 技术
  • results: 提供了理论上的 convergence guarantees 和实验 validate Results on 标准 benchmark 上
    Abstract Asynchronous Federated Learning with Buffered Aggregation (FedBuff) is a state-of-the-art algorithm known for its efficiency and high scalability. However, it has a high communication cost, which has not been examined with quantized communications. To tackle this problem, we present a new algorithm (QAFeL), with a quantization scheme that establishes a shared "hidden" state between the server and clients to avoid the error propagation caused by direct quantization. This approach allows for high precision while significantly reducing the data transmitted during client-server interactions. We provide theoretical convergence guarantees for QAFeL and corroborate our analysis with experiments on a standard benchmark.
    摘要 “异步联合学习(FedBuff)”是当前最佳实践,具有高效率和可扩展性。然而,它具有高通信成本,这未曾与量化通信相关研究。为解决这个问题,我们提出了一个新算法(QAFeL),具有量化方案,在服务器和客户端之间共享一个“隐藏”状态,以避免因直接量化而导致的错误协议。这种方法允许高精度,同时减少了在客户端-服务器交互中传输的数据量。我们提供了论证性的收敛保证,并在标准 benchmark 上进行了实验 validate。

AQUILA: Communication Efficient Federated Learning with Adaptive Quantization of Lazily-Aggregated Gradients

  • paper_url: http://arxiv.org/abs/2308.00258
  • repo_url: None
  • paper_authors: Zihao Zhao, Yuzhu Mao, Zhenpeng Shi, Yang Liu, Tian Lan, Wenbo Ding, Xiao-Ping Zhang
  • for: 提高 Federated Learning(分布式学习)的效率和可靠性,解决高通信开销和模型偏差问题。
  • methods: 提出了一种名为AQUILA(批量化的强化器)的新的适应性框架,通过优化设备选择和量化方法,提高分布式学习的效率和可靠性。
  • results: 在非同一样式的分布式学习 Setting中,AQUILA 可以大幅减少通信成本,同时保持模型性能的相似性,并且可以适应不同的设备和数据偏差。
    Abstract The widespread adoption of Federated Learning (FL), a privacy-preserving distributed learning methodology, has been impeded by the challenge of high communication overheads, typically arising from the transmission of large-scale models. Existing adaptive quantization methods, designed to mitigate these overheads, operate under the impractical assumption of uniform device participation in every training round. Additionally, these methods are limited in their adaptability due to the necessity of manual quantization level selection and often overlook biases inherent in local devices' data, thereby affecting the robustness of the global model. In response, this paper introduces AQUILA (adaptive quantization of lazily-aggregated gradients), a novel adaptive framework devised to effectively handle these issues, enhancing the efficiency and robustness of FL. AQUILA integrates a sophisticated device selection method that prioritizes the quality and usefulness of device updates. Utilizing the exact global model stored by devices, it enables a more precise device selection criterion, reduces model deviation, and limits the need for hyperparameter adjustments. Furthermore, AQUILA presents an innovative quantization criterion, optimized to improve communication efficiency while assuring model convergence. Our experiments demonstrate that AQUILA significantly decreases communication costs compared to existing methods, while maintaining comparable model performance across diverse non-homogeneous FL settings, such as Non-IID data and heterogeneous model architectures.
    摘要 通用学习(FL)的广泛采用受到了大规模模型的传输 overhead 的挑战。现有的 adaptive quantization 方法可以减少这些 overhead,但是它们假设所有训练轮都有 uniform 的设备参与,并且它们的适应性受限,因为需要手动选择 quantization 级别。此外,这些方法通常会忽略本地设备数据中的偏见,从而影响全局模型的稳定性。为此,这篇论文提出了 AQUILA(适应量化 Lazy 梯度),一种新的适应性框架,可以有效地解决这些问题,提高 FL 的效率和稳定性。AQUILA integrates 一种智能的设备选择方法,根据设备更新的质量和用用性来优先选择设备。通过使用设备上存储的准确全球模型,AQUILA 可以更准确地选择设备,降低模型偏差,并减少 hyperparameter 调整的需求。此外,AQUILA 还提出了一种优化的量化标准,可以提高通信效率,保证模型收敛。我们的实验表明,AQUILA 比现有方法更有效地减少通信成本,同时保持非常多样化的 FL 设定下的模型性能相对一致。

Best-Subset Selection in Generalized Linear Models: A Fast and Consistent Algorithm via Splicing Technique

  • paper_url: http://arxiv.org/abs/2308.00251
  • repo_url: None
  • paper_authors: Junxian Zhu, Jin Zhu, Borui Tang, Xuanyu Chen, Hongmei Lin, Xueqin Wang
  • for: 这篇论文的目的是为了提出一个高维度普通线性模型中的简洁模型,以便更好地处理响应的变化。
  • methods: 这篇论文使用了一个快速的算法来选择最佳子集,并且在定理Conditions下提出了一个算法。
  • results: 根据实验结果,这个方法可以在高维度情况下更好地选择最佳子集,并且比较快速,相比之下glmnet和ncvreg等工具kit更快。
    Abstract In high-dimensional generalized linear models, it is crucial to identify a sparse model that adequately accounts for response variation. Although the best subset section has been widely regarded as the Holy Grail of problems of this type, achieving either computational efficiency or statistical guarantees is challenging. In this article, we intend to surmount this obstacle by utilizing a fast algorithm to select the best subset with high certainty. We proposed and illustrated an algorithm for best subset recovery in regularity conditions. Under mild conditions, the computational complexity of our algorithm scales polynomially with sample size and dimension. In addition to demonstrating the statistical properties of our method, extensive numerical experiments reveal that it outperforms existing methods for variable selection and coefficient estimation. The runtime analysis shows that our implementation achieves approximately a fourfold speedup compared to popular variable selection toolkits like glmnet and ncvreg.
    摘要 高维度泛化线性模型中,必须确定一个稀畴模型,以便准确地考虑响应的变化。虽然最佳子集问题被广泛认为是这类问题的圣杯,但是实现计算效率或统计保证是困难的。在这篇文章中,我们想要通过使用快速的算法来选择最佳子集,以确保高度的确定性。我们提出了一种算法来实现最佳子集恢复,并在正则条件下进行了推广。对于样本大小和维度,我们的算法的计算复杂度随样本大小和维度的幂次方式增长。除了证明我们的方法的统计性质外,我们还进行了广泛的数值实验,发现我们的方法可以在变量选择和系数估计方面超越现有的方法。我们的实现的运行时间分析表明,我们的实现可以相比popular变量选择工具包如glmnet和ncvreg achieve approximately fourfold speedup.

EEG-based Cognitive Load Classification using Feature Masked Autoencoding and Emotion Transfer Learning

  • paper_url: http://arxiv.org/abs/2308.00246
  • repo_url: None
  • paper_authors: Dustin Pulver, Prithila Angkan, Paul Hungler, Ali Etemad
  • for: 这篇论文主要目的是提出一种基于电enzephalogram(EEG)的认知负担分类方法。
  • methods: 该方法使用变换器架构,通过跨越情感和认知负担的学习转移来进行分类。文章首先使用自我监睹带有隐藏标签的批处理自动编码来预训练模型,然后使用冻结权重和细化调整来进行下游认知负担分类。
  • results: 实验结果显示,提出的方法可以达到强劲的结果,并且超过了传统的单阶段完全监睹学习。此外,文章还进行了细化和敏感性研究,以评估不同方法的影响。这篇研究对情感计算和认知负担领域的发展做出了贡献,并开启了新的跨领域转移学习和自我监睹预训练的研究途径。
    Abstract Cognitive load, the amount of mental effort required for task completion, plays an important role in performance and decision-making outcomes, making its classification and analysis essential in various sensitive domains. In this paper, we present a new solution for the classification of cognitive load using electroencephalogram (EEG). Our model uses a transformer architecture employing transfer learning between emotions and cognitive load. We pre-train our model using self-supervised masked autoencoding on emotion-related EEG datasets and use transfer learning with both frozen weights and fine-tuning to perform downstream cognitive load classification. To evaluate our method, we carry out a series of experiments utilizing two publicly available EEG-based emotion datasets, namely SEED and SEED-IV, for pre-training, while we use the CL-Drive dataset for downstream cognitive load classification. The results of our experiments show that our proposed approach achieves strong results and outperforms conventional single-stage fully supervised learning. Moreover, we perform detailed ablation and sensitivity studies to evaluate the impact of different aspects of our proposed solution. This research contributes to the growing body of literature in affective computing with a focus on cognitive load, and opens up new avenues for future research in the field of cross-domain transfer learning using self-supervised pre-training.
    摘要 “认知负担”,指的是完成任务所需的心理努力量,在完成任务和做出决策时扮演着重要的角色。因此,认知负担的分类和分析在各种敏感领域都是非常重要的。在这篇论文中,我们提出了一个新的认知负担分类方法,使用电enzephalogram(EEG)。我们的模型使用transformer架构,通过将情感和认知负担之间的学习整合在一起。我们在这个模型中使用了自我监督隐藏式复原,并使用两种方法:冻结重量和精确地复原,进行下游认知负担分类。为了评估我们的方法,我们在两个公开 disponibile EEG基于情感 datasets,namely SEED和SEED-IV,进行预训练,而下游认知负担分类则使用CL-Drive dataset。实验结果显示,我们的提出方法具有强大的表现和超越传统单阶充足学习。此外,我们还进行了细节抑制和敏感性研究,以评估不同方面的影响。这些研究将对情感 computing的发展做出贡献,并开启了跨领域批量学习使用自我监督隐藏式预训练的新领域。

Beam Detection Based on Machine Learning Algorithms

  • paper_url: http://arxiv.org/abs/2308.00718
  • repo_url: None
  • paper_authors: Haoyuan Li, Qing Yin
  • for: precisely determine the positions of free electron laser beams on screens
  • methods: sequence of machine learning models, including transfer training in a self-constructed convolutional neural network based on VGG16 model and support vector regression model
  • results: 85.8% correct prediction on test data
    Abstract The positions of free electron laser beams on screens are precisely determined by a sequence of machine learning models. Transfer training is conducted in a self-constructed convolutional neural network based on VGG16 model. Output of intermediate layers are passed as features to a support vector regression model. With this sequence, 85.8% correct prediction is achieved on test data.
    摘要 文本中的自由电子激光束在屏幕上的位置精确地由一个机器学习模式序列确定。在自己构建的卷积神经网络基于VGG16模型中进行了传输训练。输出的中间层的输出被用作特征传递到支持向量回归模型。通过这个序列,在测试数据上达到了85.8%的正确预测率。Note: The translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you need Traditional Chinese, please let me know.

ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks

  • paper_url: http://arxiv.org/abs/2308.01423
  • repo_url: https://github.com/yeonghun1675/chatmof
  • paper_authors: Yeonghun Kang, Jihan Kim
  • for: 这个论文是为了探讨和应用大型自然语言处理模型(LLMs)在物理科学中的可能性和限制,以及其在结构材料预测和生成方面的表现。
  • methods: 该论文使用了一种自动化的人工智能系统,即ChatMOF,该系统基于大型语言模型(gpt-3.5-turbo),可以从文本输入中提取关键信息并提供相应的回答,从而消除了僵化的结构化查询的需要。
  • results: 研究表明,使用LLMs可以在物理科学中实现高精度的结构材料预测和生成,并且可以帮助找到新的结构材料和应用。同时,研究也揭示了使用LLMs的一些缺点和限制。
    Abstract ChatMOF is an autonomous Artificial Intelligence (AI) system that is built to predict and generate of metal-organic frameworks (MOFs). By leveraging a large-scale language model (gpt-3.5-turbo), ChatMOF extracts key details from textual inputs and delivers appropriate responses, thus eliminating the necessity for rigid structured queries. The system is comprised of three core components (i.e. an agent, a toolkit, and an evaluator) and it forms a robust pipeline that manages a variety of tasks, including data retrieval, property prediction, and structure generation. The study further explores the merits and constraints of using large language models (LLMs) AI system in material sciences using and showcases its transformative potential for future advancements.
    摘要 chatMOF 是一个无人驾驶的人工智能系统(AI),用于预测和生成金属有机框架(MOFs)。通过充分利用大规模语言模型(gpt-3.5-turbo),chatMOF 从文本输入中提取重要资讯,并提供适当的回应,因此消除了僵化的结构化查询的需要。系统由三个核心 ком成分(即代理人、工具组和评估器)组成,形成一个完整的管道,处理许多任务,包括数据检索、属性预测和结构生成。研究进一步探讨使用大语言模型(LLMs)AI系统在材料科学中的优点和缺点,并展示其将来发展的transformative潜力。

Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2308.00231
  • repo_url: None
  • paper_authors: Sadhana Lolla, Iaroslav Elistratov, Alejandro Perez, Elaheh Ahmadi, Daniela Rus, Alexander Amini
  • for: 提高大规模深度神经网络(NNs)的风险意识和可靠性。
  • methods: 提出了一种框架,可以将多种风险形式的量化和不同风险量化算法相互组合,以提供更全面的风险意识。
  • results: 在复杂的感知 datasets 上,通过实现现有的不确定性估计算算法,证明了 capsa 框架可以轻松地组合不同类型的风险量化算法,并提供了全面的风险意识。
    Abstract The modern pervasiveness of large-scale deep neural networks (NNs) is driven by their extraordinary performance on complex problems but is also plagued by their sudden, unexpected, and often catastrophic failures, particularly on challenging scenarios. Existing algorithms that provide risk-awareness to NNs are complex and ad-hoc. Specifically, these methods require significant engineering changes, are often developed only for particular settings, and are not easily composable. Here we present capsa, a framework for extending models with risk-awareness. Capsa provides a methodology for quantifying multiple forms of risk and composing different algorithms together to quantify different risk metrics in parallel. We validate capsa by implementing state-of-the-art uncertainty estimation algorithms within the capsa framework and benchmarking them on complex perception datasets. We demonstrate capsa's ability to easily compose aleatoric uncertainty, epistemic uncertainty, and bias estimation together in a single procedure, and show how this approach provides a comprehensive awareness of NN risk.
    摘要 现代大规模深度神经网络(NN)的广泛应用受其在复杂问题上的非凡表现驱动,但同时也受到其意外、不可预期的失败的威胁,尤其是在复杂的场景下。现有的风险意识提供方法都是复杂且尝试性的,需要大量的工程改进,通常只适用于特定的场景,并且不易组合。我们提出了capsa框架,用于扩展模型的风险意识。capsa提供了多种风险量化的方法ology,可以并行量化不同的风险指标。我们通过在capsa框架中实现现状风险估计算法,并对复杂的感知数据集进行了 benchmarking,以示capsa的能力可以轻松地组合不同的风险估计算法,并提供全面的NN风险意识。

Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias

  • paper_url: http://arxiv.org/abs/2308.00225
  • repo_url: None
  • paper_authors: Itay Itzhak, Gabriel Stanovsky, Nir Rosenfeld, Yonatan Belinkov
  • for: 本研究旨在探讨大语言模型(LM)在受到人类反馈指导下进行调教后,是否会受到更多的偏见。
  • methods: 本研究使用了三种常见的偏见(即嘱结效应、信任效应和信仰偏见)来检测大语言模型中的偏见。
  • results: 研究发现,经过 instruciton 调教的模型具有更多的偏见,尤其是 Flan-T5、GPT3.5 和 GPT4 等模型。这些偏见在人类决策和理解中都有表现。
    Abstract Recent studies show that instruction tuning and learning from human feedback improve the abilities of large language models (LMs) dramatically. While these tuning methods can make models generate high-quality text, we conjecture that more implicit cognitive biases may arise in these fine-tuned models. Our work provides evidence that these fine-tuned models exhibit biases that were absent or less pronounced in their pretrained predecessors. We examine the extent of this phenomenon in three cognitive biases - the decoy effect, the certainty effect, and the belief bias - all of which are known to influence human decision-making and reasoning. Our findings highlight the presence of these biases in various models, especially those that have undergone instruction tuning, such as Flan-T5, GPT3.5, and GPT4. This research constitutes a step toward comprehending cognitive biases in instruction-tuned LMs, which is crucial for the development of more reliable and unbiased language models.
    摘要 latest studies show that instruction tuning and learning from human feedback can significantly improve the abilities of large language models (LMs). while these tuning methods can make models generate high-quality text, we speculate that more implicit cognitive biases may arise in these fine-tuned models. our work provides evidence that these fine-tuned models exhibit biases that were absent or less pronounced in their pretrained predecessors. we examine the extent of this phenomenon in three cognitive biases - the decoy effect, the certainty effect, and the belief bias - all of which are known to influence human decision-making and reasoning. our findings highlight the presence of these biases in various models, especially those that have undergone instruction tuning, such as Flan-T5, GPT3.5, and GPT4. this research constitutes a step toward comprehending cognitive biases in instruction-tuned LMs, which is crucial for the development of more reliable and unbiased language models.

Deep Reinforcement Learning-Based Battery Conditioning Hierarchical V2G Coordination for Multi-Stakeholder Benefits

  • paper_url: http://arxiv.org/abs/2308.00218
  • repo_url: None
  • paper_authors: Yubao Zhang, Xin Chen, Yi Gu, Zhicheng Li, Wu Kai
  • for: 提高可再生能源利用率和电网稳定性,适用于大规模EV充电 scheduling 策略
  • methods: 基于深度遗产 reinforcement learning 和 Proof of Stake 算法,实现多方利益协调
  • results: 相比四种基准方案,多方协调策略可以提高可再生能源消耗率、缓和负荷波动、满足EV充电商需求,降低充电成本和电池质量下降
    Abstract With the growing prevalence of electric vehicles (EVs) and advancements in EV electronics, vehicle-to-grid (V2G) techniques and large-scale scheduling strategies have emerged to promote renewable energy utilization and power grid stability. This study proposes a multi-stakeholder hierarchical V2G coordination based on deep reinforcement learning (DRL) and the Proof of Stake algorithm. Furthermore, the multi-stakeholders include the power grid, EV aggregators (EVAs), and users, and the proposed strategy can achieve multi-stakeholder benefits. On the grid side, load fluctuations and renewable energy consumption are considered, while on the EVA side, energy constraints and charging costs are considered. The three critical battery conditioning parameters of battery SOX are considered on the user side, including state of charge, state of power, and state of health. Compared with four typical baselines, the multi-stakeholder hierarchical coordination strategy can enhance renewable energy consumption, mitigate load fluctuations, meet the energy demands of EVA, and reduce charging costs and battery degradation under realistic operating conditions.
    摘要 On the grid side, the approach takes into account load fluctuations and renewable energy consumption, while on the EVA side, it considers energy constraints and charging costs. For users, the approach considers three critical battery conditioning parameters: state of charge, state of power, and state of health.Compared to four typical baselines, the multi-stakeholder hierarchical coordination strategy can enhance renewable energy consumption, mitigate load fluctuations, meet the energy demands of EVA, and reduce charging costs and battery degradation under realistic operating conditions.

Robust Single-view Cone-beam X-ray Pose Estimation with Neural Tuned Tomography (NeTT) and Masked Neural Radiance Fields (mNeRF)

  • paper_url: http://arxiv.org/abs/2308.00214
  • repo_url: None
  • paper_authors: Chaochao Zhou, Syed Hasib Akhter Faruqui, Abhinav Patel, Ramez N. Abdalla, Michael C. Hurley, Ali Shaibani, Matthew B. Potts, Babak S. Jahromi, Leon Cho, Sameer A. Ansari, Donald R. Cantrell
  • for: 这篇论文是用于提出新的方法来进行镜像干扰物体pose estimation,使用X射线投影来达到3D空间中的目标。
  • methods: 这篇论文使用了新的拟合渠道技术来计算数字重建成像(DRR),并使用TensorFlow中的自动导数来实现可导的渠道。pose estimation是通过 iterative gradient descent 使用一个loss函数来衡量DRR Synthesized从一个随机初始化的pose和真实的镜像图像在目标pose之间的相似性来进行。
  • results: 这篇论文提出了两种新的高精度视图合成方法,即Neural Tuned Tomography(NeTT)和masked Neural Radiance Fields(mNeRF)。这两种方法都基于经典的扁球计算机断层成像(CBCT),NeTT直接优化CBCT的密度,而mNeRF的非零值是通过3D掩模来约束。这两种方法都能够提高pose estimation的准确率,并且NeTT的计算成本远低于mNeRF。此外,这篇论文还证明了NeTT可以在训练和pose estimation阶段具有更好的一致性,并且可以在不同的主体上进行高精度的DRR Synthesized和pose estimation。因此,这篇论文建议使用NeTT来实现 robust pose estimation。
    Abstract Many tasks performed in image-guided, mini-invasive, medical procedures can be cast as pose estimation problems, where an X-ray projection is utilized to reach a target in 3D space. Expanding on recent advances in the differentiable rendering of optically reflective materials, we introduce new methods for pose estimation of radiolucent objects using X-ray projections, and we demonstrate the critical role of optimal view synthesis in performing this task. We first develop an algorithm (DiffDRR) that efficiently computes Digitally Reconstructed Radiographs (DRRs) and leverages automatic differentiation within TensorFlow. Pose estimation is performed by iterative gradient descent using a loss function that quantifies the similarity of the DRR synthesized from a randomly initialized pose and the true fluoroscopic image at the target pose. We propose two novel methods for high-fidelity view synthesis, Neural Tuned Tomography (NeTT) and masked Neural Radiance Fields (mNeRF). Both methods rely on classic Cone-Beam Computerized Tomography (CBCT); NeTT directly optimizes the CBCT densities, while the non-zero values of mNeRF are constrained by a 3D mask of the anatomic region segmented from CBCT. We demonstrate that both NeTT and mNeRF distinctly improve pose estimation within our framework. By defining a successful pose estimate to be a 3D angle error of less than 3 deg, we find that NeTT and mNeRF can achieve similar results, both with overall success rates more than 93%. However, the computational cost of NeTT is significantly lower than mNeRF in both training and pose estimation. Furthermore, we show that a NeTT trained for a single subject can generalize to synthesize high-fidelity DRRs and ensure robust pose estimations for all other subjects. Therefore, we suggest that NeTT is an attractive option for robust pose estimation using fluoroscopic projections.
    摘要 许多在图像指导、微创性医疗过程中进行的任务可以被视为定位估计问题,其中使用X射线投影来到达3D空间中的目标。在这篇文章中,我们推出了一种新的定位估计方法,使用X射线投影来估计透明物体的定位。我们首先开发了一种效率高的计算Digitally Reconstructed Radiographs(DRR)的算法(DiffDRR),并利用TensorFlow中的自动导数来实现。在这种方法中,我们使用一个定制的损失函数来衡量DRR从Random Initialized Pose(RIP)中synthesized的和真实的fluoroscopic图像之间的相似性。我们还提出了两种高精度视图合成方法:Neural Tuned Tomography(NeTT)和Masked Neural Radiance Fields(mNeRF)。两种方法都基于经典的射线计算电脑断层Tomography(CBCT);NeTT直接优化CBCT的密度,而mNeRF的非零值是由3Dmask的Anatomic Region(AR)段化from CBCT进行限制。我们发现NeTT和mNeRF都可以大幅提高定位估计的精度,并且NeTT的计算成本远低于mNeRF。此外,我们发现一个NeTT在单个主体上进行Training可以为所有其他主体synthesize高品质的DRR和确保Robust定位估计。因此,我们建议NeTT作为Robust定位估计的可靠选择。

SkullGAN: Synthetic Skull CT Generation with Generative Adversarial Networks

  • paper_url: http://arxiv.org/abs/2308.00206
  • repo_url: https://github.com/kbp-lab/skullgan
  • paper_authors: Kasra Naftchi-Ardebili, Karanpartap Singh, Reza Pourabolghasem, Pejman Ghanouni, Gerald R. Popelka, Kim Butts Pauly
  • for: 这个研究想要使用生成器进行人类头骨的数据生成,以便将机器学习技术应用到医疗领域中。
  • methods: 这个研究使用了生成对抗网络(GAN),将 CT 图像转换为生成的人类头骨图像。
  • results: 研究发现,SkullGAN 生成的人类头骨图像与实际的头骨图像之间存在类似的三个量化医学特征,并且通过应用 SkullGAN 检测器来类别,发现 SkullGAN 生成的头骨图像集和实际头骨图像集之间是无法区分的。
    Abstract Deep learning offers potential for various healthcare applications involving the human skull but requires extensive datasets of curated medical images. To overcome this challenge, we propose SkullGAN, a generative adversarial network (GAN), to create large datasets of synthetic skull CT slices, reducing reliance on real images and accelerating the integration of machine learning into healthcare. In our method, CT slices of 38 subjects were fed to SkullGAN, a neural network comprising over 200 million parameters. The synthetic skull images generated were evaluated based on three quantitative radiological features: skull density ratio (SDR), mean thickness, and mean intensity. They were further analyzed using t-distributed stochastic neighbor embedding (t-SNE) and by applying the SkullGAN discriminator as a classifier. The results showed that SkullGAN-generated images demonstrated similar key quantitative radiological features to real skulls. Further definitive analysis was undertaken by applying the discriminator of SkullGAN, where the SkullGAN discriminator classified 56.5% of a test set of real skull images and 55.9% of the SkullGAN-generated images as reals (the theoretical optimum being 50%), demonstrating that the SkullGAN-generated skull set is indistinguishable from the real skull set - within the limits of our nonlinear classifier. Therefore, SkullGAN makes it possible to generate large numbers of synthetic skull CT segments, necessary for training neural networks for medical applications involving the human skull. This mitigates challenges associated with preparing large, high-quality training datasets, such as access, capital, time, and the need for domain expertise.
    摘要 深度学习对医疗领域中人骨的应用具有潜在优势,但是它需要大量高质量医疗图像数据集来进行训练。为了解决这个挑战,我们提议了一种基于生成对抗网络(GAN)的方法,称之为SkullGAN。SkullGAN可以生成大量的人工骨CT切片,从而减少对真实图像的依赖,并促进机器学习在医疗领域的整合。我们的方法是将38名参与者的CT切片传输给SkullGAN,这是一个包含超过2亿个参数的神经网络。SkullGAN生成的人工骨图像被评估基于三个量化医学特征:骨密度比率(SDR)、平均厚度和平均亮度。它们还被使用t-分布随机 neigh embedding(t-SNE)分析,并通过应用SkullGAN推论器来分类。结果表明,SkullGAN生成的图像与真实骨头图像之间存在相似的三个量化医学特征。进一步的定论分析表明,SkullGAN推论器可以将56.5%的测试集真实骨头图像和55.9%的SkullGAN生成图像分类为真实图像(理论最佳值为50%),这表明SkullGAN生成的骨头集与真实骨头集是可区分的,至少在我们的非线性分类器的限度内。因此,SkullGAN可以生成大量的人工骨CT切片,这些切片可以用于训练医学应用中需要骨头图像的神经网络。这种方法可以解决准备大量高质量医疗图像数据集的挑战,包括访问、资本、时间和域专业知识的需求。

CBCL-PR: A Cognitively Inspired Model for Class-Incremental Learning in Robotics

  • paper_url: http://arxiv.org/abs/2308.00199
  • repo_url: https://github.com/aliayub7/cbcl-pr
  • paper_authors: Ali Ayub, Alan R. Wagner
  • for: 本研究旨在解决很少样本数的情况下,AI机器人需要不断适应和学习环境中的问题。
  • methods: 我们提出了一种基于hippocampus和neocortex理论的novel框架,用于解决增量学习问题。该框架将对象类表示为集合的集合,并将其存储在内存中。在学习新类时,框架会重温以前学习的类的数据,以避免忘记。
  • results: 我们在两个物体分类 dataset 上进行了评估,并取得了当前最佳表现(SOTA)的结果。此外,我们还在一个机器人上进行了增量学习和不断学习的测试,并证明了机器人可以在有限的人工协助下,不断学习分类大量的家用品。
    Abstract For most real-world applications, robots need to adapt and learn continually with limited data in their environments. In this paper, we consider the problem of Few-Shot class Incremental Learning (FSIL), in which an AI agent is required to learn incrementally from a few data samples without forgetting the data it has previously learned. To solve this problem, we present a novel framework inspired by theories of concept learning in the hippocampus and the neocortex. Our framework represents object classes in the form of sets of clusters and stores them in memory. The framework replays data generated by the clusters of the old classes, to avoid forgetting when learning new classes. Our approach is evaluated on two object classification datasets resulting in state-of-the-art (SOTA) performance for class-incremental learning and FSIL. We also evaluate our framework for FSIL on a robot demonstrating that the robot can continually learn to classify a large set of household objects with limited human assistance.
    摘要 For most real-world applications, robots need to adapt and learn continually with limited data in their environments. In this paper, we consider the problem of Few-Shot class Incremental Learning (FSIL), in which an AI agent is required to learn incrementally from a few data samples without forgetting the data it has previously learned. To solve this problem, we present a novel framework inspired by theories of concept learning in the hippocampus and the neocortex. Our framework represents object classes in the form of sets of clusters and stores them in memory. The framework replays data generated by the clusters of the old classes, to avoid forgetting when learning new classes. Our approach is evaluated on two object classification datasets resulting in state-of-the-art (SOTA) performance for class-incremental learning and FSIL. We also evaluate our framework for FSIL on a robot, demonstrating that the robot can continually learn to classify a large set of household objects with limited human assistance.Here's the translation in Traditional Chinese as well:For most real-world applications, robots need to adapt and learn continually with limited data in their environments. In this paper, we consider the problem of Few-Shot class Incremental Learning (FSIL), in which an AI agent is required to learn incrementally from a few data samples without forgetting the data it has previously learned. To solve this problem, we present a novel framework inspired by theories of concept learning in the hippocampus and the neocortex. Our framework represents object classes in the form of sets of clusters and stores them in memory. The framework replays data generated by the clusters of the old classes, to avoid forgetting when learning new classes. Our approach is evaluated on two object classification datasets resulting in state-of-the-art (SOTA) performance for class-incremental learning and FSIL. We also evaluate our framework for FSIL on a robot, demonstrating that the robot can continually learn to classify a large set of household objects with limited human assistance.

C-DARL: Contrastive diffusion adversarial representation learning for label-free blood vessel segmentation

  • paper_url: http://arxiv.org/abs/2308.00193
  • repo_url: None
  • paper_authors: Boah Kim, Yujin Oh, Bradford J. Wood, Ronald M. Summers, Jong Chul Ye
  • for: This paper aims to develop a self-supervised vessel segmentation method for medical imaging, which can help improve the accuracy and efficiency of vascular disease diagnosis and interventional planning.
  • methods: The proposed method, called C-DARL, combines a diffusion module and a generation module to learn the distribution of multi-domain blood vessel data, and employs contrastive learning through a mask-based contrastive loss to improve the realism of vessel representations.
  • results: Experimental results on various vessel datasets show that C-DARL achieves performance improvement over baseline methods with noise robustness, demonstrating the effectiveness of the proposed method for vessel segmentation in medical imaging.Here’s the summary in Traditional Chinese:
  • for: 这篇论文旨在发展一种自主超级的血管分类方法,以帮助提高医疗影像诊断和手术规划的精度和效率。
  • methods: 提案的方法(C-DARL)结合了一个扩散模组和一个生成模组,以学习多域血管数据的分布,并透过一个mask-based对称损失来提高血管表现的实ism。
  • results: 实验结果显示,C-DARL在不同的血管数据集上实现了基准方法的性能改进,并具有噪声抗性,证明了提案的方法的有效性。
    Abstract Blood vessel segmentation in medical imaging is one of the essential steps for vascular disease diagnosis and interventional planning in a broad spectrum of clinical scenarios in image-based medicine and interventional medicine. Unfortunately, manual annotation of the vessel masks is challenging and resource-intensive due to subtle branches and complex structures. To overcome this issue, this paper presents a self-supervised vessel segmentation method, dubbed the contrastive diffusion adversarial representation learning (C-DARL) model. Our model is composed of a diffusion module and a generation module that learns the distribution of multi-domain blood vessel data by generating synthetic vessel images from diffusion latent. Moreover, we employ contrastive learning through a mask-based contrastive loss so that the model can learn more realistic vessel representations. To validate the efficacy, C-DARL is trained using various vessel datasets, including coronary angiograms, abdominal digital subtraction angiograms, and retinal imaging. Experimental results confirm that our model achieves performance improvement over baseline methods with noise robustness, suggesting the effectiveness of C-DARL for vessel segmentation.
    摘要 血管分割在医疗成像中是一项非常重要的步骤,用于诊断血管疾病和静脉介入规划,在各种临床场景中具有广泛的应用前景。然而,手动标注血管mask是一项困难和耗时的任务,因为血管分支和结构较为复杂。为了解决这个问题,本文提出了一种自我超级视图的血管分割方法,即对比扩散对抗表示学习(C-DARL)模型。我们的模型由扩散模块和生成模块组成,通过生成多域血管数据的分布来学习血管图像的分布。此外,我们采用了对比学习,通过一个面积基于的对比损失函数,使模型学习更真实的血管表示。为验证效果,C-DARL被训练使用了多种血管数据集,包括肠动脉摄影、腹部数字抽象摄影和视网膜成像。实验结果表明,我们的模型在噪音Robustness下达到了基eline方法的性能提升,这表明C-DARL是一种有效的血管分割方法。

Universal Majorization-Minimization Algorithms

  • paper_url: http://arxiv.org/abs/2308.00190
  • repo_url: None
  • paper_authors: Matthew Streeter
  • for: 这个论文是为了提出一种新的优化方法,可以应用于任何问题,并且不需要手动定义减少函数。
  • methods: 这种优化方法使用自动梯度下降来自动生成减少函数的加大函数,从而实现了无需手动定义减少函数的优化。
  • results: 实验结果表明,这种优化方法可以快速地收敛到最优解,并且可以从任何初始点开始,无需进行任何参数调整。
    Abstract Majorization-minimization (MM) is a family of optimization methods that iteratively reduce a loss by minimizing a locally-tight upper bound, called a majorizer. Traditionally, majorizers were derived by hand, and MM was only applicable to a small number of well-studied problems. We present optimizers that instead derive majorizers automatically, using a recent generalization of Taylor mode automatic differentiation. These universal MM optimizers can be applied to arbitrary problems and converge from any starting point, with no hyperparameter tuning.
    摘要 大量化抑制(MM)是一家优化方法的家族,通过迭代地减少损失函数,最小化一个本地紧急上限函数,称为主要函数。在过去,主要函数通常是通过手动计算得到的,因此MM只能应用于一小部分已经广泛研究的问题上。我们提出了一种使用最近的欧拉积分自动微分的一般化方法来自动生成主要函数。这些普适的MM优化器可以应用于任何问题,并从任何初始点开始,无需参数调整。

Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?

  • paper_url: http://arxiv.org/abs/2308.00189
  • repo_url: None
  • paper_authors: Ari Holtzman, Peter West, Luke Zettlemoyer
  • for: 本研究目的是解释语音模型如何完成多种任务,并且帮助未来的研究。
  • methods: 本研究使用系统性的方法来分解语音模型的行为,以解释它们在不同任务中的表现。
  • results: 本研究获得了一系列的结果,包括语音模型在不同任务中的表现,以及它们的行为可以被分解为不同的类别。
    Abstract Coaxing out desired behavior from pretrained models, while avoiding undesirable ones, has redefined NLP and is reshaping how we interact with computers. What was once a scientific engineering discipline-in which building blocks are stacked one on top of the other-is arguably already a complex systems science, in which emergent behaviors are sought out to support previously unimagined use cases. Despite the ever increasing number of benchmarks that measure task performance, we lack explanations of what behaviors language models exhibit that allow them to complete these tasks in the first place. We argue for a systematic effort to decompose language model behavior into categories that explain cross-task performance, to guide mechanistic explanations and help future-proof analytic research.
    摘要 使预训练模型展现愿景行为,而不是不良行为,已经重新定义了自然语言处理(NLP),并在我们与计算机之间的交互方式中造成了改变。过去,建筑块一个在另一个之上排序的科学工程 discipline,现在可能已经变成了复杂系统科学,寻找emergent行为以支持前未想象的用 случа。尽管任务性能的测试benchmark数量不断增加,但我们仍然缺乏任务完成所需行为的解释。我们提出了一种系统性的努力,即将语言模型行为分类为可以解释跨任务性能的类别,以导引机械性解释和未来研究。

Attribution-Scores in Data Management and Explainable Machine Learning

  • paper_url: http://arxiv.org/abs/2308.00184
  • repo_url: None
  • paper_authors: Leopoldo Bertossi
  • for: 这个论文关于在数据库和机器学习中使用实际 causality 定义责任分数的研究。
  • methods: 论文使用了数据库修复和分类模型的扩展来解释查询结果和分类结果的 causality。
  • results: 论文提出了一种量化度量数据库的一致性,并提供了高效计算 Shap-score 的方法。
    Abstract We describe recent research on the use of actual causality in the definition of responsibility scores as explanations for query answers in databases, and for outcomes from classification models in machine learning. In the case of databases, useful connections with database repairs are illustrated and exploited. Repairs are also used to give a quantitative measure of the consistency of a database. For classification models, the responsibility score is properly extended and illustrated. The efficient computation of Shap-score is also analyzed and discussed. The emphasis is placed on work done by the author and collaborators.
    摘要 我们描述了最近的研究,把实际 causality 引入责任分数的定义中,以解释查询结果在数据库中的解释,以及机器学习模型中的结果的出现。在数据库中,我们利用了有用的连接,并使用了修复来给出数据库的数量化度量。对于机器学习模型,我们正确地扩展了责任分数,并用修复来衡量模型的一致性。我们还分析了efficiently computation Shap-score的问题。我们强调了作者和合作者的工作。Note: "实际 causality" in the original text is translated as "实际 causality" in Simplified Chinese, which is a literal translation. However, "实际 causality" is not a commonly used term in Simplified Chinese, and a more appropriate translation might be "真实 causality" (zhēnshí causality) or "实际效果" (shíjì efect).

General Anomaly Detection of Underwater Gliders Validated by Large-scale Deployment Dataset

  • paper_url: http://arxiv.org/abs/2308.00180
  • repo_url: None
  • paper_authors: Ruochu Yang, Chad Lembke, Fumin Zhang, Catherine Edwards
  • for: 本研究使用异常检测算法评估水下飞行器在不可预测的海洋环境中的正常运行。
  • methods: 本研究使用了丰富的数据集,来评估异常检测算法的效果。实验包括了线上和线下两种模式。线上检测是根据实时从飞行器发送的数据进行的,而线下检测则是使用完整的回收数据集进行详细的异常分析和对比飞行器驾驶员的日志。
  • results: 研究发现,使用异常检测算法可以帮助飞行器驾驶员在实时监控异常情况,避免更大的损害。
    Abstract This paper employs an anomaly detection algorithm to assess the normal operation of underwater gliders in unpredictable ocean environments. Real-time alerts can be provided to glider pilots upon detecting any anomalies, enabling them to assume control of the glider and prevent further harm. The detection algorithm is applied to abundant data sets collected in real glider deployments led by the Skidaway Institute of Oceanography (SkIO) and the University of South Florida (USF). Regarding generality, the experimental evaluation is composed of both offline and online detection modes. The offline detection utilizes full post-recovery data sets, which carries high-resolution information, to present detailed analysis of the anomaly and compare it with pilot logs. The online detection focuses on the real-time subsets of data transmitted from the glider at the surfacing events. While the real-time data may not contain as much rich information as the post-recovery data, the online detection is of great importance as it allows glider pilots to monitor potential abnormal conditions in real time.
    摘要

Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity

  • paper_url: http://arxiv.org/abs/2308.00177
  • repo_url: None
  • paper_authors: Charlie Hou, Kiran Koshy Thekumparampil, Michael Shavlovsky, Giulia Fanti, Yesh Dattatreya, Sujay Sanghavi
  • for: 这个论文的目的是研究是否可以通过无监督预训练提高学习评分(LTR)问题的性能,并比较其与Gradient Boosted Decision Trees(GBDTs)和其他非预训练模型的性能。
  • methods: 这个论文使用了一些简单的设计选择,包括SimCLR-Rank,一种针对图像的无监督预训练方法,来生成预训练的深度学习模型。
  • results: 研究发现,预训练模型可以在有大量无标记数据的情况下Soundly exceed GBDTs(和其他非预训练模型)的性能,并且在评分异常数据时也可以获得 significatively better robustness。
    Abstract While deep learning (DL) models are state-of-the-art in text and image domains, they have not yet consistently outperformed Gradient Boosted Decision Trees (GBDTs) on tabular Learning-To-Rank (LTR) problems. Most of the recent performance gains attained by DL models in text and image tasks have used unsupervised pretraining, which exploits orders of magnitude more unlabeled data than labeled data. To the best of our knowledge, unsupervised pretraining has not been applied to the LTR problem, which often produces vast amounts of unlabeled data. In this work, we study whether unsupervised pretraining can improve LTR performance over GBDTs and other non-pretrained models. Using simple design choices--including SimCLR-Rank, our ranking-specific modification of SimCLR (an unsupervised pretraining method for images)--we produce pretrained deep learning models that soundly outperform GBDTs (and other non-pretrained models) in the case where labeled data is vastly outnumbered by unlabeled data. We also show that pretrained models also often achieve significantly better robustness than non-pretrained models (GBDTs or DL models) in ranking outlier data.
    摘要 Translated into Simplified Chinese:while deep learning (DL) 模型在文本和图像领域是现状最佳,但它们尚未一直保持GBDTs在标量学习到rank(LTR)问题上的表现。大多数最近在文本和图像任务中获得的性能提升都是通过无监督预训练来实现,这种方法可以利用数个数量级的无标签数据。据我们所知,无监督预训练没有被应用于LTR问题,这个问题通常会生成巨量的无标签数据。在这项工作中,我们研究了无监督预训练是否可以提高LTR性能,并与其他非预训练模型相比。我们采用了简单的设计选择,包括我们的排名特有的SimCLR-Rank修改版(一种图像无监督预训练方法)。我们生成了预训练深度学习模型,这些模型在标量数据充足时与GBDTs和其他非预训练模型相比,具有更好的表现。我们还表明了预训练模型在排名异常数据时的更好的Robustness。

A Flow Artist for High-Dimensional Cellular Data

  • paper_url: http://arxiv.org/abs/2308.00176
  • repo_url: None
  • paper_authors: Kincaid MacDonald, Dhananjay Bhaskar, Guy Thampakkul, Nhi Nguyen, Joia Zhang, Michael Perlmutter, Ian Adelstein, Smita Krishnaswamy
  • for: 用于嵌入来自背景流动 manifold 的点云数据,包括高通过put biology 单个细胞转录调控学实验数据。
  • methods: 使用 neural network 嵌入点云数据,并同时学习周围的 vector field。
  • results: 在 Toy 数据和单个细胞 RNA 速度数据上,FlowArtist 能够更好地分离和可视化 velocity-informed 结构。
    Abstract We consider the problem of embedding point cloud data sampled from an underlying manifold with an associated flow or velocity. Such data arises in many contexts where static snapshots of dynamic entities are measured, including in high-throughput biology such as single-cell transcriptomics. Existing embedding techniques either do not utilize velocity information or embed the coordinates and velocities independently, i.e., they either impose velocities on top of an existing point embedding or embed points within a prescribed vector field. Here we present FlowArtist, a neural network that embeds points while jointly learning a vector field around the points. The combination allows FlowArtist to better separate and visualize velocity-informed structures. Our results, on toy datasets and single-cell RNA velocity data, illustrate the value of utilizing coordinate and velocity information in tandem for embedding and visualizing high-dimensional data.
    摘要 我团队考虑了嵌入点云数据,该数据来自于下面的拓扑结构,并且有关联的流动或速度信息。这种数据在许多场景中出现,包括高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高通过putSingle-cell transcriptomics中的高

Federated Learning for Data and Model Heterogeneity in Medical Imaging

  • paper_url: http://arxiv.org/abs/2308.00155
  • repo_url: None
  • paper_authors: Hussain Ahmad Madni, Rao Muhammad Umer, Gian Luca Foresti
  • for: This paper aims to address the challenges of data and model heterogeneity in Federated Learning (FL) by exploiting both heterogeneities simultaneously.
  • methods: The proposed method, MDH-FL, uses knowledge distillation and a symmetric loss to minimize the impact of heterogeneity on the model performance.
  • results: The experimental results on medical datasets demonstrate the superiority of the proposed approach over existing methods.
    Abstract Federated Learning (FL) is an evolving machine learning method in which multiple clients participate in collaborative learning without sharing their data with each other and the central server. In real-world applications such as hospitals and industries, FL counters the challenges of data heterogeneity and model heterogeneity as an inevitable part of the collaborative training. More specifically, different organizations, such as hospitals, have their own private data and customized models for local training. To the best of our knowledge, the existing methods do not effectively address both problems of model heterogeneity and data heterogeneity in FL. In this paper, we exploit the data and model heterogeneity simultaneously, and propose a method, MDH-FL (Exploiting Model and Data Heterogeneity in FL) to solve such problems to enhance the efficiency of the global model in FL. We use knowledge distillation and a symmetric loss to minimize the heterogeneity and its impact on the model performance. Knowledge distillation is used to solve the problem of model heterogeneity, and symmetric loss tackles with the data and label heterogeneity. We evaluate our method on the medical datasets to conform the real-world scenario of hospitals, and compare with the existing methods. The experimental results demonstrate the superiority of the proposed approach over the other existing methods.
    摘要 federated learning (FL) 是一种发展中的机器学习方法,在多个客户端参与协同学习而无需分享数据之间和中央服务器。在实际应用中,如医院和产业中,FL 对数据不同和模型不同的问题作出了有效应对。具体来说,不同的组织,如医院,拥有自己的私有数据和本地训练的自定义模型。据我们所知,现有的方法未能有效地解决 FL 中的数据不同和模型不同问题。在这篇论文中,我们利用数据不同和模型不同的特点,并提出了一种方法,即 MDH-FL(利用数据和模型不同的特点进行 FL),以解决这些问题,从而提高 FL 的效率。我们使用知识传承和对称损失来减少不同和其影响于模型性能。知识传承用于解决模型不同问题,而对称损失则用于解决数据和标签不同问题。我们在医疗数据集上进行了实验,以验证这种方法在实际应用中的可行性,并与现有方法进行比较。实验结果表明,我们提出的方法在模型性能方面表现出优于现有方法。

DiffusAL: Coupling Active Learning with Graph Diffusion for Label-Efficient Node Classification

  • paper_url: http://arxiv.org/abs/2308.00146
  • repo_url: https://github.com/lmu-dbs/diffusal
  • paper_authors: Sandra Gilhuber, Julian Busch, Daniel Rotthues, Christian M. M. Frey, Thomas Seidl
  • for: 这个研究旨在提出一种新的活动阶层学习方法,以提高阶层标签效率并降低标签成本。
  • methods: 本研究使用三种独立的评分函数来选择最有价值的节点标签:i) 模型不确定性,ii) 多样性分量,和iii) 节点重要性,这些评分函数都是基于阶层传播算法。
  • results: 实验结果显示,DiffusAL 方法在多种 benchmark 数据集上具有显著的弹性和可转移性,并在 100% 的数据集和标签预算下具有优于随机选择的表现。
    Abstract Node classification is one of the core tasks on attributed graphs, but successful graph learning solutions require sufficiently labeled data. To keep annotation costs low, active graph learning focuses on selecting the most qualitative subset of nodes that maximizes label efficiency. However, deciding which heuristic is best suited for an unlabeled graph to increase label efficiency is a persistent challenge. Existing solutions either neglect aligning the learned model and the sampling method or focus only on limited selection aspects. They are thus sometimes worse or only equally good as random sampling. In this work, we introduce a novel active graph learning approach called DiffusAL, showing significant robustness in diverse settings. Toward better transferability between different graph structures, we combine three independent scoring functions to identify the most informative node samples for labeling in a parameter-free way: i) Model Uncertainty, ii) Diversity Component, and iii) Node Importance computed via graph diffusion heuristics. Most of our calculations for acquisition and training can be pre-processed, making DiffusAL more efficient compared to approaches combining diverse selection criteria and similarly fast as simpler heuristics. Our experiments on various benchmark datasets show that, unlike previous methods, our approach significantly outperforms random selection in 100% of all datasets and labeling budgets tested.
    摘要 translate-language: zh-CNNode 分类是 attributed graphs 的核心任务,但是成功的图学习解决方案需要具有足够的标注数据。为了降低标注成本,活动图学习专注于选择最有价值的节点子集,以提高标签效率。然而,选择没有标注的图中最有价值的节点是一个持续的挑战。现有的解决方案可能会忽略将学习模型和采样方法相匹配,或者只专注于有限的选择方面。这些方法在某些情况下可能比随机采样更差,或者只是与随机采样相等。在这种情况下,我们介绍了一种新的活动图学习方法,称为DiffusAL,它在多种场景中表现出了显著的鲁棒性。为了更好地传递不同的图结构之间,我们将三种独立的分数函数结合起来,以选择最有用的节点样本进行标签:i) 模型不确定性,ii) 多样性分数,和 iii) 图diffusion 的节点重要性。大多数我们的收购和训练计算可以预处理,使得DiffusAL比 combine 多种选择 criterion 和类似快速的方法更高效。我们在多种 bench marks 上进行的实验表明,与先前的方法不同,我们的方法在 100% 的所有数据集和标注预算上都能够明显超过随机选择。

Formally Explaining Neural Networks within Reactive Systems

  • paper_url: http://arxiv.org/abs/2308.00143
  • repo_url: None
  • paper_authors: Shahaf Bassan, Guy Amir, Davide Corsi, Idan Refaeli, Guy Katz
  • for: 这篇论文旨在解释深度神经网络(DNNs)控制器在反应系统中的行为,并提供正确的输入特征导致DNN的行为的解释。
  • methods: 这篇论文提出了一种基于验证的解释AI技术,可以准确地找出DNN的输入特征,并且可以应用于多步、反应系统。具体来说,这种技术利用系统的转移约束来缩小检查空间,从而提高效率。
  • results: 研究人员在两个popular的自动导航 benchmark上评估了这种技术,并观察到它可以高效地计算出最小和最小的解释,比之前的状态对技术更高效。此外,研究人员还证明了这种技术生成的正式解释比非验证基于的XAI技术更可靠。
    Abstract Deep neural networks (DNNs) are increasingly being used as controllers in reactive systems. However, DNNs are highly opaque, which renders it difficult to explain and justify their actions. To mitigate this issue, there has been a surge of interest in explainable AI (XAI) techniques, capable of pinpointing the input features that caused the DNN to act as it did. Existing XAI techniques typically face two limitations: (i) they are heuristic, and do not provide formal guarantees that the explanations are correct; and (ii) they often apply to ``one-shot'' systems, where the DNN is invoked independently of past invocations, as opposed to reactive systems. Here, we begin bridging this gap, and propose a formal DNN-verification-based XAI technique for reasoning about multi-step, reactive systems. We suggest methods for efficiently calculating succinct explanations, by exploiting the system's transition constraints in order to curtail the search space explored by the underlying verifier. We evaluate our approach on two popular benchmarks from the domain of automated navigation; and observe that our methods allow the efficient computation of minimal and minimum explanations, significantly outperforming the state of the art. We also demonstrate that our methods produce formal explanations that are more reliable than competing, non-verification-based XAI techniques.
    摘要 In this paper, we propose a formal DNN-verification-based XAI technique for reasoning about multi-step, reactive systems. Our approach leverages the system's transition constraints to efficiently calculate succinct explanations, reducing the search space explored by the underlying verifier. We evaluate our method on two popular benchmarks from the domain of automated navigation and show that our approach can efficiently compute minimal and minimum explanations, significantly outperforming the state of the art. We also demonstrate that our methods produce formal explanations that are more reliable than competing, non-verification-based XAI techniques.我们在这篇文章中提出了一种基于 Deep Neural Networks (DNNs) 的可解释 AI (XAI) 技术,用于理解和解释 DNNs 在反应系统中的行为。然而,现有的 XAI 技术存在两个限制:首先,它们通常是有规则的,无法提供正式的 garantía 正确性;其次,它们通常适用于 "一击" 系统,在 DNN 独立于过去邀请时被邀请的情况下进行调用,而不是反应系统。在这篇文章中,我们开始bridging这个差距,并提出了一种基于 DNN 验证的可解释技术,用于理解多步、反应系统。我们提出了一种利用系统的转换约束来减少搜索空间的方法,以便高效计算简短的解释。我们在两个流行的自动导航 benchmark 上测试了我们的方法,并观察到我们的方法可以高效计算最小和最小的解释,明显超越当前的状态。我们还表明了我们的方法生成的正式解释比竞争对手的非验证基于 XAI 技术更可靠。

Semi-Supervised Laplacian Learning on Stiefel Manifolds

  • paper_url: http://arxiv.org/abs/2308.00142
  • repo_url: None
  • paper_authors: Chester Holtz, Pengwen Chen, Alexander Cloninger, Chung-Kuan Cheng, Gal Mishne
  • for: 提高 Laplace 学习算法在低标签率下的性能,解决 canonical Laplace 学习算法的归一化问题。
  • methods: 将图基semi-supervised learning reformulated为非конvex的 Trust-Region Subproblem(TRS),利用无限无标签数据下 Laplacian eigenvectors 的有定性来解决问题。
  • results: compared to 最新的state-of-the-art和传统 semi-supervised learning 方法,我们的框架在低、中、高标签率下 achieve lower classification error。
    Abstract Motivated by the need to address the degeneracy of canonical Laplace learning algorithms in low label rates, we propose to reformulate graph-based semi-supervised learning as a nonconvex generalization of a \emph{Trust-Region Subproblem} (TRS). This reformulation is motivated by the well-posedness of Laplacian eigenvectors in the limit of infinite unlabeled data. To solve this problem, we first show that a first-order condition implies the solution of a manifold alignment problem and that solutions to the classical \emph{Orthogonal Procrustes} problem can be used to efficiently find good classifiers that are amenable to further refinement. Next, we address the criticality of selecting supervised samples at low-label rates. We characterize informative samples with a novel measure of centrality derived from the principal eigenvectors of a certain submatrix of the graph Laplacian. We demonstrate that our framework achieves lower classification error compared to recent state-of-the-art and classical semi-supervised learning methods at extremely low, medium, and high label rates. Our code is available on github\footnote{anonymized for submission}.
    摘要 <>将文本翻译成简化中文。<>受到低标签率下 Laplace 学习算法的异常性的启发,我们提议将图像基于 semi-supervised learning 重新定义为非 conjugate 矩阵的一种通用化问题。这种重定义是基于 Laplacian 域的均值偏 Parameters 的一种限制,它在无穷多个无标签数据的极限下具有一定的坚定性。为解决这个问题,我们首先证明在某种特定的 manifold alignment 问题中,第一个ORDER condition 的解是一个 manifold alignment 问题的解,并且可以使用经典的 orthogonal procrustes 问题来高效地找到可以进一步精细化的好的分类器。接下来,我们讨论在低标签率下选择supervised 样本的关键性。我们提出一种新的中心性度量,基于图 Laplacian 矩阵的特定子矩阵的主要特征值。我们证明,我们的框架在低、中和高标签率下具有更低的分类错误率,比之前的最新状态艺术和经典 semi-supervised learning 方法。我们的代码可以在 GitHub 上找到(注释除去)。

A Suite of Fairness Datasets for Tabular Classification

  • paper_url: http://arxiv.org/abs/2308.00133
  • repo_url: None
  • paper_authors: Martin Hirzel, Michael Feffer
  • for: 提高机器学习分类器的公正性
  • methods: 引入一个函数集,用于从20个公正性数据集和相关的公正性元数据中选择数据进行实验评估
  • results: 期望这些函数可以促进未来的公正性意识Machine learning研究中的实验评估Here’s a breakdown of each point:
  • for: The paper is written for improving the fairness of machine-learning classifiers for tabular data.
  • methods: The paper introduces a suite of functions for fetching 20 fairness datasets and providing associated fairness metadata, which can be used for more rigorous experimental evaluations in future fairness-aware machine learning research.
  • results: The paper hopes that these functions can promote more rigorous experimental evaluations in future fairness-aware machine learning research.
    Abstract There have been many papers with algorithms for improving fairness of machine-learning classifiers for tabular data. Unfortunately, most use only very few datasets for their experimental evaluation. We introduce a suite of functions for fetching 20 fairness datasets and providing associated fairness metadata. Hopefully, these will lead to more rigorous experimental evaluations in future fairness-aware machine learning research.
    摘要 “有很多论文提出了对机器学习分类器的公平性进行改善的算法。却是,大多数使用的数据集只有几个。我们介绍了一个函数套件,可以获取20个公平性数据集和相关的公平性元件。希望这些函数能够带来未来公平性意识机器学习研究的更加严谨的实验评估。”Here's the breakdown of the translation:“有很多论文” (有很多论文) - There have been many papers“提出了对机器学习分类器的公平性进行改善的算法” (提出了对机器学习分类器的公平性进行改善的算法) - There have been many papers with algorithms for improving the fairness of machine learning classifiers“却是” (却是) - Unfortunately“大多数使用的数据集只有几个” (大多数使用的数据集只有几个) - Most use only a few datasets for their experimental evaluation“我们介绍了一个函数套件” (我们介绍了一个函数套件) - We introduce a suite of functions“可以获取20个公平性数据集和相关的公平性元件” (可以获取20个公平性数据集和相关的公平性元件) - That can fetch 20 fairness datasets and provide associated fairness metadata“希望这些函数能够带来未来公平性意识机器学习研究的更加严谨的实验评估” (希望这些函数能够带来未来公平性意识机器学习研究的更加严谨的实验评估) - Hopefully, these functions can lead to more rigorous experimental evaluations in future fairness-aware machine learning research.

Ensemble Learning with Residual Transformer for Brain Tumor Segmentation

  • paper_url: http://arxiv.org/abs/2308.00128
  • repo_url: None
  • paper_authors: Lanhong Yao, Zheyuan Zhang, Ulas Bagci
  • for: 本研究旨在提高脑肿瘤分 segmentation的精度,因为现有的 U-Net 架构受到脑肿瘤的复杂形态和 текстуuration的限制,以及缺乏Contextual information的捕捉。
  • methods: 本研究提出了一种新的网络架构,即将 Transformers integrated into a self-adaptive U-Net,以利用 Transformers 的内置注意机制和像素级标注来捕捉3D精度volume context。此外,我们还添加了 residual connection 来避免信息流失传递,并 explore ensemble methods 来利用不同模型在不同情况下的优势。
  • results: 在 BraTS 2021 dataset(3D)上,我们的模型实现了87.6%的 mean Dice score,超越了现有的状态 Elluminate 方法,说明可以通过组合多种架构来优化脑肿瘤分 segmentation。
    Abstract Brain tumor segmentation is an active research area due to the difficulty in delineating highly complex shaped and textured tumors as well as the failure of the commonly used U-Net architectures. The combination of different neural architectures is among the mainstream research recently, particularly the combination of U-Net with Transformers because of their innate attention mechanism and pixel-wise labeling. Different from previous efforts, this paper proposes a novel network architecture that integrates Transformers into a self-adaptive U-Net to draw out 3D volumetric contexts with reasonable computational costs. We further add a residual connection to prevent degradation in information flow and explore ensemble methods, as the evaluated models have edges on different cases and sub-regions. On the BraTS 2021 dataset (3D), our model achieves 87.6% mean Dice score and outperforms the state-of-the-art methods, demonstrating the potential for combining multiple architectures to optimize brain tumor segmentation.
    摘要 brain tumor segmentation 是一个活跃的研究领域,因为复杂形态和文本化肿瘤难以定界,以及通用的 U-Net 架构失败。最近,各种神经网络架构的组合在主流研究中,特别是 U-Net 与 Transformers 的组合,因为它们的自然注意机制和像素级标注。与前一些尝试不同,这篇论文提出了一种新的网络架构,将 Transformers integrate into a self-adaptive U-Net,以获取3D积体上的合理计算成本下的Context。此外,我们还添加了保持信息流不减的径迹连接,并 explore ensemble 方法,因为评估模型在不同的情况和子区域上有着优势。在 BraTS 2021 数据集(3D)上,我们的模型达到了87.6%的平均 dice 分数,超越了现有的方法,这表明将多种架构组合起来优化脑肿瘤分 segmentation 的潜力。

DiviML: A Module-based Heuristic for Mapping Neural Networks onto Heterogeneous Platforms

  • paper_url: http://arxiv.org/abs/2308.00127
  • repo_url: None
  • paper_authors: Yassine Ghannane, Mohamed S. Abdelfattah
    for:* The paper is written to address the challenge of optimizing deep neural network (DNN) execution on heterogeneous datacenter hardware, specifically considering both data and model parallelism.methods:* The paper proposes a compiler-level partitioning approach that leverages mixed integer linear programming (MILP) and a modularity-based heuristic to automatically partition and device map DNNs onto multiple interconnected hardware devices.results:* The proposed framework can achieve more than 3 times lower latency and up to 2.9 times higher throughput compared to naively running DNNs on the fastest GPU, while maintaining solution quality. The modularity-based “splitting” heuristic improves solution runtime up to 395 times without sacrificing solution quality, and outperforms all other heuristics by 30-60% solution quality.Here is the information in Simplified Chinese:for:* 本文是为了解决现代数据中心硬件多样化的挑战,具体是考虑数据并行和模型并行。methods:* 本文提出了一种编译级别的分配方法,利用混合整数线性规划(MILP)和一种模块性基于规则的启发函数来自动将深度神经网络(DNNs)分配到多个连接的硬件设备上。results:* 提posed的框架可以比直接在最快的GPU上运行DNNs更低的延迟和更高的吞吐量,同时保持解决方案质量。 modularity-based “splitting”启发函数可以提高解决时间Up to 395倍,而无需明显牺牲解决方案质量,并在其他启发函数中出现30-60%的解决质量。
    Abstract Datacenters are increasingly becoming heterogeneous, and are starting to include specialized hardware for networking, video processing, and especially deep learning. To leverage the heterogeneous compute capability of modern datacenters, we develop an approach for compiler-level partitioning of deep neural networks (DNNs) onto multiple interconnected hardware devices. We present a general framework for heterogeneous DNN compilation, offering automatic partitioning and device mapping. Our scheduler integrates both an exact solver, through a mixed integer linear programming (MILP) formulation, and a modularity-based heuristic for scalability. Furthermore, we propose a theoretical lower bound formula for the optimal solution, which enables the assessment of the heuristic solutions' quality. We evaluate our scheduler in optimizing both conventional DNNs and randomly-wired neural networks, subject to latency and throughput constraints, on a heterogeneous system comprised of a CPU and two distinct GPUs. Compared to na\"ively running DNNs on the fastest GPU, he proposed framework can achieve more than 3$\times$ times lower latency and up to 2.9$\times$ higher throughput by automatically leveraging both data and model parallelism to deploy DNNs on our sample heterogeneous server node. Moreover, our modularity-based "splitting" heuristic improves the solution runtime up to 395$\times$ without noticeably sacrificing solution quality compared to an exact MILP solution, and outperforms all other heuristics by 30-60% solution quality. Finally, our case study shows how we can extend our framework to schedule large language models across multiple heterogeneous servers by exploiting symmetry in the hardware setup. Our code can be easily plugged in to existing frameworks, and is available at https://github.com/abdelfattah-lab/diviml.
    摘要 现代数据中心越来越多样化,包括专门的网络硬件、视频处理硬件和深度学习硬件。为了利用现代数据中心的多样化计算能力,我们开发了一种编译器层 partitioning 技术,将深度神经网络(DNNs)分配到多个连接的硬件设备上。我们提出了一个通用的多型 DNN 编译框架,包括自动分配和设备映射。我们的调度器结合了精确的整数线性Programming(MILP)方法和模块性基于的优化器,以确保可扩展性。此外,我们还提出了一个优化目标函数的下界公式,以评估优化解的质量。我们在一个包括 CPU 和两个不同 GPU 的多型系统上测试了我们的调度器,并证明了与直接在最快 GPU 上运行 DNNs 相比,我们的框架可以实现更低的延迟和更高的吞吐量,通过自动利用数据并模型并行性来部署 DNNs。此外,我们的模块性基于的"拆分"优化器可以提高解 runtime 至多 395 倍,而不是明显降低解质量。最后,我们的案例研究显示了如何使用 symmetry 在硬件设置上将大语言模型分布到多个多样化服务器上。我们的代码可以轻松地插入到现有框架中,并可以在 上获取。

Convolutional Occupancy Models for Dense Packing of Complex, Novel Objects

  • paper_url: http://arxiv.org/abs/2308.00091
  • repo_url: https://github.com/nikhilmishra000/fcon
  • paper_authors: Nikhil Mishra, Pieter Abbeel, Xi Chen, Maximilian Sieb
  • for: 这篇论文的目的是提高现实世界中爬行机器人的卷积排序性能。
  • methods: 该论文使用了一种全 convolutional shape completion模型(F-CON),可以与市场上的规划方法结合使用,以提高实际世界中的卷积排序性能。
  • results: 该论文使用COB-3D-v2数据集进行训练,并通过比较其他状态艺法的表现,表明F-CON可以在实际世界中提供更好的卷积排序性能。此外,该论文还在实际世界中 equip 了一个爬行机器人,并在受排序的复杂对象中进行了实际应用。
    Abstract Dense packing in pick-and-place systems is an important feature in many warehouse and logistics applications. Prior work in this space has largely focused on planning algorithms in simulation, but real-world packing performance is often bottlenecked by the difficulty of perceiving 3D object geometry in highly occluded, partially observed scenes. In this work, we present a fully-convolutional shape completion model, F-CON, which can be easily combined with off-the-shelf planning methods for dense packing in the real world. We also release a simulated dataset, COB-3D-v2, that can be used to train shape completion models for real-word robotics applications, and use it to demonstrate that F-CON outperforms other state-of-the-art shape completion methods. Finally, we equip a real-world pick-and-place system with F-CON, and demonstrate dense packing of complex, unseen objects in cluttered scenes. Across multiple planning methods, F-CON enables substantially better dense packing than other shape completion methods.
    摘要 密集填充在选择和放置系统中是非常重要的特性,在仓储和物流应用中具有广泛的应用。过去的工作主要集中在仿真中进行规划算法,但在实际情况中,填充性能frequently被 occluded, partially observed scene的3D объек的geometry perceiving difficulty所 bottleneck。在这项工作中,我们提出了一种可以与市场上 readily available planning方法结合的fully-convolutional shape completion模型,F-CON,可以在实际世界中提高填充性能。我们还发布了COB-3D-v2仿真数据集,可以用于训练shape completion模型,并在这个数据集上证明F-CON超过了其他状态对shape completion方法。最后,我们将实际世界中的选择和放置系统设备F-CON,并在拥挤的场景中 dense packing复杂、未看过的物体。在不同的规划方法下,F-CON实现了较好的密集填充性能。

New Lower Bounds for Testing Monotonicity and Log Concavity of Distributions

  • paper_url: http://arxiv.org/abs/2308.00089
  • repo_url: None
  • paper_authors: Yuqian Cheng, Daniel M. Kane, Zhicheng Zheng
  • for: 这 paper 用于证明分布测试下界 bound 的新技术。
  • methods: 该技术使用 pair of moment-matching families of distributions,通过修改每个分布的概率值,使一个家keep 定义不等式,而另一个家violate 它们。
  • results: 该技术可以获得新的下界 bound для monotonicity testing over discrete cubes,以及 tight lower bounds для log-concavity testing。
    Abstract We develop a new technique for proving distribution testing lower bounds for properties defined by inequalities involving the bin probabilities of the distribution in question. Using this technique we obtain new lower bounds for monotonicity testing over discrete cubes and tight lower bounds for log-concavity testing. Our basic technique involves constructing a pair of moment-matching families of distributions by tweaking the probabilities of pairs of bins so that one family maintains the defining inequalities while the other violates them.
    摘要 我们开发了一种新的技术,用于证明分布测试下界 для由不等式定义的性质。使用这种技术,我们得到了新的下界 для monotonicity testing over discrete cubes,以及紧靠的下界 для log-concavity testing。我们的基本技术是构造一对具有相同幂次的两个分布家族,其中一家具有定义不等式的条件,而另一家则违反这些不等式。通过这种方式,我们可以比较这两个分布家族的分布特征,从而证明分布测试的下界。

A Novel Deep Learning based Model to Defend Network Intrusion Detection System against Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2308.00077
  • repo_url: None
  • paper_authors: Khushnaseeb Roshan, Aasim Zafar, Shiekh Burhan Ul Haque
  • for: 本研究旨在研究深度学习基于网络入侵检测系统(NIDS)的强大敌意攻击算法以及其防御策略。
  • methods: 本研究使用了四种强大敌意攻击算法,即快速梯度签名方法(FGSM)、杠杆环境映射攻击(JSMA)、投影 DESC 下降(PGD)以及加洛皮尼和华生(C&W)攻击。作为防御策略,本研究使用了对抗训练来提高NIDS模型的耐性。
  • results: 本研究结果分为三个阶段:1)之前敌意攻击,2)之后敌意攻击,3)之后对抗训练。使用了加拿大网络安全检测系统2017年版(CICIDS-2017)数据集进行评估,并使用了多种性能指标如f1-score、准确率等进行评估。
    Abstract Network Intrusion Detection System (NIDS) is an essential tool in securing cyberspace from a variety of security risks and unknown cyberattacks. A number of solutions have been implemented for Machine Learning (ML), and Deep Learning (DL) based NIDS. However, all these solutions are vulnerable to adversarial attacks, in which the malicious actor tries to evade or fool the model by injecting adversarial perturbed examples into the system. The main aim of this research work is to study powerful adversarial attack algorithms and their defence method on DL-based NIDS. Fast Gradient Sign Method (FGSM), Jacobian Saliency Map Attack (JSMA), Projected Gradient Descent (PGD) and Carlini & Wagner (C&W) are four powerful adversarial attack methods implemented against the NIDS. As a defence method, Adversarial Training is used to increase the robustness of the NIDS model. The results are summarized in three phases, i.e., 1) before the adversarial attack, 2) after the adversarial attack, and 3) after the adversarial defence. The Canadian Institute for Cybersecurity Intrusion Detection System 2017 (CICIDS-2017) dataset is used for evaluation purposes with various performance measurements like f1-score, accuracy etc.
    摘要 网络侵入检测系统(NIDS)是保护网络安全的重要工具,它可以检测到多种安全风险和未知的网络攻击。为了提高NIDS的检测精度,许多解决方案已经被应用于机器学习(ML)和深度学习(DL)基于的NIDS。然而,这些解决方案都受到了对抗攻击的威胁,其中恶意攻击者会尝试通过投入对抗扰动的例子来欺骗模型。本研究的主要目标是研究对DL-based NIDS的强大对抗攻击算法和防御策略。本研究使用了FGSM、JSMA、PGD和C&W等四种强大对抗攻击方法,并使用了对抗训练来提高NIDS模型的可靠性。Results are summarized in three phases: before the adversarial attack, after the adversarial attack, and after the adversarial defense.用于评估的数据集是CICIDS-2017。Results are evaluated using various performance metrics such as f1-score and accuracy.

Crowd Safety Manager: Towards Data-Driven Active Decision Support for Planning and Control of Crowd Events

  • paper_url: http://arxiv.org/abs/2308.00076
  • repo_url: None
  • paper_authors: Panchamy Krishnakumari, Sascha Hoogendoorn-Lanser, Jeroen Steenbakkers, Serge Hoogendoorn
  • for: 这个论文旨在提出新的技术和方法,以增强群体管理的规划和运行阶段。这种方法包括创新的数据收集技术,数据集成和可视化使用3D数字双方法,以及在人工智能工具中包含的风险识别。
  • methods: 这个论文提出了一种名为“环形模型”的全面框架,用于评估和预测风险水平。这个模型结合了客观的估计和预测,如交通流操作和拥堵水平,以及各种累加因素,如天气条件、情绪和游客的目的,来评估预计的风险水平。
  • results: 该论文的结果表明,使用Resono数据源,可以在多天前为事件规划提供多日预测。XGBoost框架在对比其他机器学习技术时表现出最高的准确性。结果表明,预测的准确性足够高,但certain locations may benefit from additional input data to further enhance prediction quality。
    Abstract This paper presents novel technology and methodology aimed at enhancing crowd management in both the planning and operational phases. The approach encompasses innovative data collection techniques, data integration, and visualization using a 3D Digital Twin, along with the incorporation of artificial intelligence (AI) tools for risk identification. The paper introduces the Bowtie model, a comprehensive framework designed to assess and predict risk levels. The model combines objective estimations and predictions, such as traffic flow operations and crowdedness levels, with various aggravating factors like weather conditions, sentiments, and the purpose of visitors, to evaluate the expected risk of incidents. The proposed framework is applied to the Crowd Safety Manager project in Scheveningen, where the DigiTwin is developed based on a wealth of real-time data sources. One noteworthy data source is Resono, offering insights into the number of visitors and their movements, leveraging a mobile phone panel of over 2 million users in the Netherlands. Particular attention is given to the left-hand side of the Bowtie, which includes state estimation, prediction, and forecasting. Notably, the focus is on generating multi-day ahead forecasts for event-planning purposes using Resono data. Advanced machine learning techniques, including the XGBoost framework, are compared, with XGBoost demonstrating the most accurate forecasts. The results indicate that the predictions are adequately accurate. However, certain locations may benefit from additional input data to further enhance prediction quality. Despite these limitations, this work contributes to a more effective crowd management system and opens avenues for further advancements in this critical field.
    摘要 The proposed framework is applied to the Crowd Safety Manager project in Scheveningen, where the DigiTwin is developed based on a wealth of real-time data sources. One notable data source is Resono, which provides insights into the number of visitors and their movements, leveraging a mobile phone panel of over 2 million users in the Netherlands. The focus is on generating multi-day ahead forecasts for event planning purposes using Resono data, and advanced machine learning techniques, including the XGBoost framework, are compared. The results indicate that the predictions are adequately accurate, but certain locations may benefit from additional input data to further enhance prediction quality.Overall, this work contributes to a more effective crowd management system and opens up new avenues for further advancements in this critical field.

Using Kernel SHAP XAI Method to optimize the Network Anomaly Detection Model

  • paper_url: http://arxiv.org/abs/2308.00074
  • repo_url: None
  • paper_authors: Khushnaseeb Roshan, Aasim Zafar
  • for: 这篇论文目的是应用具有解释能力的人工智能技术(XAI)来检测和解释网络异常情况。
  • methods: 本论文使用kernelSHAP方法来检测网络异常情况,并且将这种方法与传统的网络侦错探测方法进行比较。
  • results: 本论文的实验结果显示,使用kernelSHAP方法可以提高网络异常检测模型的精度、回传率、精度和F分数。将这种方法应用于网络侦错探测可以增加模型的准确性和可靠性。
    Abstract Anomaly detection and its explanation is important in many research areas such as intrusion detection, fraud detection, unknown attack detection in network traffic and logs. It is challenging to identify the cause or explanation of why one instance is an anomaly? and the other is not due to its unbounded and lack of supervisory nature. The answer to this question is possible with the emerging technique of explainable artificial intelligence (XAI). XAI provides tools and techniques to interpret and explain the output and working of complex models such as Deep Learning (DL). This paper aims to detect and explain network anomalies with XAI, kernelSHAP method. The same approach is used to improve the network anomaly detection model in terms of accuracy, recall, precision and f score. The experiment is conduced with the latest CICIDS2017 dataset. Two models are created (Model_1 and OPT_Model) and compared. The overall accuracy and F score of OPT_Model (when trained in unsupervised way) are 0.90 and 0.76, respectively.
    摘要 “异常检测和其解释在许多研究领域都是重要的,如侵入检测、诈骗检测、未知攻击检测在网络流量和日志中。但是确定一个实例是异常的原因,而另一个不是,是一个挑战。这个问题的答案可能通过新兴的解释人工智能(XAI)技术得到解决。XAI提供了解释和解释复杂模型,如深度学习(DL)的工具和技术。这篇论文旨在使用XAI技术检测和解释网络异常,并使用kernelSHAP方法进行改进。实验使用最新的CICIDS2017 dataset,创建了两个模型(Model_1和OPT_Model),并对它们进行比较。OPT_Model在无监督的情况下训练时的总准确率和F分数分别为0.90和0.76。”Note that the translation is in Simplified Chinese, which is the standard writing system used in mainland China. If you need Traditional Chinese, please let me know and I can provide that as well.

T-Fusion Net: A Novel Deep Neural Network Augmented with Multiple Localizations based Spatial Attention Mechanisms for Covid-19 Detection

  • paper_url: http://arxiv.org/abs/2308.00053
  • repo_url: None
  • paper_authors: Susmita Ghosh, Abhiroop Chatterjee
  • for: 提高图像分类任务的性能
  • methods: 使用多个本地化的空间注意力 Mechanism,并将其 ensemble 为一个 homogeneous ensemble
  • results: 对 Covid-19 (SARS-CoV-2 CT scan) 数据集进行实验,提出的 T-Fusion Net 和 homogeneous ensemble 模型均显示出比其他状态对照方法更好的性能,准确率分别达到 97.59% 和 98.4%。
    Abstract In recent years, deep neural networks are yielding better performance in image classification tasks. However, the increasing complexity of datasets and the demand for improved performance necessitate the exploration of innovative techniques. The present work proposes a new deep neural network (called as, T-Fusion Net) that augments multiple localizations based spatial attention. This attention mechanism allows the network to focus on relevant image regions, improving its discriminative power. A homogeneous ensemble of the said network is further used to enhance image classification accuracy. For ensembling, the proposed approach considers multiple instances of individual T-Fusion Net. The model incorporates fuzzy max fusion to merge the outputs of individual nets. The fusion process is optimized through a carefully chosen parameter to strike a balance on the contributions of the individual models. Experimental evaluations on benchmark Covid-19 (SARS-CoV-2 CT scan) dataset demonstrate the effectiveness of the proposed T-Fusion Net as well as its ensemble. The proposed T-Fusion Net and the homogeneous ensemble model exhibit better performance, as compared to other state-of-the-art methods, achieving accuracy of 97.59% and 98.4%, respectively.
    摘要 现在的深度神经网络在图像分类任务中表现更好,但是随着数据集的增加和性能的需求,需要探索新的技术。本文提出了一种新的深度神经网络(称为T-Fusion Net),它在多个本地化位置基于的空间注意力 Mechanism 中加入了多个本地化位置的注意力机制,使网络能够更好地关注相关的图像区域,提高其分类力。然后,这个网络的多个实例被用来提高图像分类精度。为了整合,该方法考虑多个个体T-Fusion Net的拟合。这个模型使用了最大值拟合来融合各个网络的输出。拟合过程中选择了一个精心选择的参数,以达到各个模型的贡献平衡。实验表明,提案的T-Fusion Net和多个实例模型在 Covid-19(SARS-CoV-2 CT 扫描)数据集上表现出色,其中T-Fusion Net的准确率为97.59%,多个实例模型的准确率为98.4%。

Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges

  • paper_url: http://arxiv.org/abs/2308.00031
  • repo_url: None
  • paper_authors: Giorgio Franceschelli, Mirco Musolesi
  • for: 这篇论文探讨了将强化学习应用于生成人工智能中的现状、机遇和未解决问题。
  • methods: 论文使用了强化学习作为生成人工智能的一种新方法,包括RL作为生成的替代方法、RL作为生成 outputs 的同时最大化目标函数的方法,以及RL来嵌入不易被目标函数捕捉的愿望特征。
  • results: 论文结束时未提出具体的结果,但认为这个领域具有很大的潜力和挑战。
    Abstract Generative Artificial Intelligence (AI) is one of the most exciting developments in Computer Science of the last decade. At the same time, Reinforcement Learning (RL) has emerged as a very successful paradigm for a variety of machine learning tasks. In this survey, we discuss the state of the art, opportunities and open research questions in applying RL to generative AI. In particular, we will discuss three types of applications, namely, RL as an alternative way for generation without specified objectives; as a way for generating outputs while concurrently maximizing an objective function; and, finally, as a way of embedding desired characteristics, which cannot be easily captured by means of an objective function, into the generative process. We conclude the survey with an in-depth discussion of the opportunities and challenges in this fascinating emerging area.
    摘要 优化人工智能(AI)是过去十年计算机科学中最吸引人的发展之一。同时,对称学习(RL)已成为许多机器学习任务中非常成功的方法论。在这篇评论中,我们讨论了在应用RL到生成AI方面的状态、机遇和开放的研究问题。具体来说,我们将讨论以下三种应用:RL作为不具体目标的生成方式;RL作为同时最大化目标函数的生成输出方式;RL作为插入感知不易被目标函数捕捉的特点的生成过程中的方法。我们在这篇评论结束时对这个吸引人的新兴领域的机会和挑战进行了深入的讨论。

Conformal PID Control for Time Series Prediction

  • paper_url: http://arxiv.org/abs/2307.16895
  • repo_url: https://github.com/aangelopoulos/conformal-time-series
  • paper_authors: Anastasios N. Angelopoulos, Emmanuel J. Candes, Ryan J. Tibshirani
  • for: 这 paper 的目的是提供一种方便使用的时间序列预测方法,并提供正式的保证。
  • methods: 这 paper 使用了充分采用了抗干扰预测、控制理论和在线预测等方法,能够适应系统性错误的存在。
  • results: 实验表明,这 paper 的方法可以在美国州际 COVID-19 死亡人数的4个星期前预测中提供更好的覆盖率,以及在电能需求、股票市场收益和气温等领域的预测中达到更高的准确率。
    Abstract We study the problem of uncertainty quantification for time series prediction, with the goal of providing easy-to-use algorithms with formal guarantees. The algorithms we present build upon ideas from conformal prediction and control theory, are able to prospectively model conformal scores in an online setting, and adapt to the presence of systematic errors due to seasonality, trends, and general distribution shifts. Our theory both simplifies and strengthens existing analyses in online conformal prediction. Experiments on 4-week-ahead forecasting of statewide COVID-19 death counts in the U.S. show an improvement in coverage over the ensemble forecaster used in official CDC communications. We also run experiments on predicting electricity demand, market returns, and temperature using autoregressive, Theta, Prophet, and Transformer models. We provide an extendable codebase for testing our methods and for the integration of new algorithms, data sets, and forecasting rules.
    摘要 我们研究了时间序列预测不确定性评估的问题,目的是提供有正式保证的易用算法。我们的算法基于协形预测和控制理论的想法,可在线设置 prospectively 模型协形分数,并适应系统性错误的季节性、趋势和总分布变化。我们的理论简化并强化了现有的在线协形预测分析。我们在美国州际 COVID-19 死亡人数预测4个礼拜前进行了实验,比 Ensemble 预测器使用在官方CDC通信中的表现更好。我们还在预测电能需求、股票市场收益和温度上使用拟合、Theta、Prophet 和 Transformer 模型进行实验。我们提供了可extendable的代码库,用于测试我们的方法和新算法、数据集和预测规则的集成。

Predicting masked tokens in stochastic locations improves masked image modeling

  • paper_url: http://arxiv.org/abs/2308.00566
  • repo_url: None
  • paper_authors: Amir Bar, Florian Bordes, Assaf Shocher, Mahmoud Assran, Pascal Vincent, Nicolas Ballas, Trevor Darrell, Amir Globerson, Yann LeCun
  • for: 这个论文的目的是提出一种减少位置不确定性的自助学习模型,以提高计算机视觉任务的表现。
  • methods: 该论文使用了随机masked token位置来引导模型学习更加Robust的特征,以解决计算机视觉中的Masked Image Modeling(MIM)挑战。
  • results: compared to MIM基线,该论文的方法可以提高ImageNet线性探测的表现,比如使用ViT-B时提高1.6%,使用ViT-L时提高2.5%。
    Abstract Self-supervised learning is a promising paradigm in deep learning that enables learning from unlabeled data by constructing pretext tasks that require learning useful representations. In natural language processing, the dominant pretext task has been masked language modeling (MLM), while in computer vision there exists an equivalent called Masked Image Modeling (MIM). However, MIM is challenging because it requires predicting semantic content in accurate locations. E.g, given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determine its exact location. In this work, we propose FlexPredict, a stochastic model that addresses this challenge by incorporating location uncertainty into the model. Specifically, we condition the model on stochastic masked token positions to guide the model toward learning features that are more robust to location uncertainties. Our approach improves downstream performance on a range of tasks, e.g, compared to MIM baselines, FlexPredict boosts ImageNet linear probing by 1.6% with ViT-B and by 2.5% for semi-supervised video segmentation using ViT-L.
    摘要 自我指导学习是深度学习中的一种有前途的方法,它允许通过建立预测任务来学习无标注数据中的有用表示。在自然语言处理中,主流的预测任务是填充语言模型(MLM),而在计算机视觉中则有相应的equivalent called Masked Image Modeling(MIM)。然而,MIM是一个挑战,因为它需要准确预测 semantic content的位置。例如,给一个含有缺失的狗图片,我们可以预测有尾巴,但不能准确地确定其位置。在这种情况下,我们提出了FlexPredict,一种随机模型,用于解决这个挑战。我们通过conditioning the model on stochastic masked token positions来引导模型学习更加Robust to location uncertainties的特征。我们的方法可以提高下游任务的性能,例如,相比MIM基线,FlexPredict在ImageNet线性探测中提高了ViT-B的表现,提高了 semi-supervised video segmentation using ViT-L的表现。

Foundational Models for Fault Diagnosis of Electrical Motors

  • paper_url: http://arxiv.org/abs/2307.16891
  • repo_url: None
  • paper_authors: Sriram Anbalagan, Deepesh Agarwal, Balasubramaniam Natarajan, Babji Srinivasan
  • for: 这个研究旨在提出一个基础模型来解决电动机异常诊断中的训练数据分布假设问题。
  • methods: 本研究使用自我超vised学习建立神经网络后端,然后精确化这个后端以达到特定目标。
  • results: 实验结果显示,提案的方法可以在不同的异常情况和操作条件下取得高于90%的分类精度,并且适用于不同的机器之间的十分多样化的异常诊断任务。
    Abstract A majority of recent advancements related to the fault diagnosis of electrical motors are based on the assumption that training and testing data are drawn from the same distribution. However, the data distribution can vary across different operating conditions during real-world operating scenarios of electrical motors. Consequently, this assumption limits the practical implementation of existing studies for fault diagnosis, as they rely on fully labelled training data spanning all operating conditions and assume a consistent distribution. This is because obtaining a large number of labelled samples for several machines across different fault cases and operating scenarios may be unfeasible. In order to overcome the aforementioned limitations, this work proposes a framework to develop a foundational model for fault diagnosis of electrical motors. It involves building a neural network-based backbone to learn high-level features using self-supervised learning, and then fine-tuning the backbone to achieve specific objectives. The primary advantage of such an approach is that the backbone can be fine-tuned to achieve a wide variety of target tasks using very less amount of training data as compared to traditional supervised learning methodologies. The empirical evaluation demonstrates the effectiveness of the proposed approach by obtaining more than 90\% classification accuracy by fine-tuning the backbone not only across different types of fault scenarios or operating conditions, but also across different machines. This illustrates the promising potential of the proposed approach for cross-machine fault diagnosis tasks in real-world applications.
    摘要 大多数最近的电机故障诊断研究假设训练和测试数据来自同一个分布。然而,实际应用中电机的数据分布可能会随着不同的运行条件而变化。这导致了现有的研究存在限制,因为它们需要完全标注的训练数据,涵盖所有运行条件和假设一致的分布。这可能是不可能获得大量完全标注的样本的。为了突破这些限制,这项工作提出了一种框架,用于开发电机故障诊断的基础模型。它包括使用神经网络作为基础模型,通过自我超vised learning来学习高级特征,然后使用这些特征来达到特定目标。这种方法的优点是,可以使用非常少的训练数据来 Fine-tune the backbone,以实现各种目标任务。实际评估表明,提议的方法可以在不同的故障情况和运行条件下,以及不同的机器上,达到高于90%的分类精度。这表明了提议的方法在实际应用中的推广潜力。

Learning to Model the World with Language

  • paper_url: http://arxiv.org/abs/2308.01399
  • repo_url: https://github.com/microsoft/OpenKP
  • paper_authors: Jessy Lin, Yuqing Du, Olivia Watkins, Danijar Hafner, Pieter Abbeel, Dan Klein, Anca Dragan
  • for: 这篇论文目标是建立一种能够理解多种语言类型,将语言与视觉世界相关联,并基于其进行行动的 Agent。
  • methods: 该论文提出的关键想法是通过语言预测未来,包括未来的语言、视频和奖励情况。 Agent 通过自我监督学习来学习多 modal 世界模型,并通过这些预测来学习行动。
  • results: 实验表明,使用 Dynalang 可以在多种语言和多种任务下提高 Agent 的性能,包括在固定环境中学习、在不同语言和视觉数据集上预训练、以及在各种语言提示下完成任务。
    Abstract To interact with humans in the world, agents need to understand the diverse types of language that people use, relate them to the visual world, and act based on them. While current agents learn to execute simple language instructions from task rewards, we aim to build agents that leverage diverse language that conveys general knowledge, describes the state of the world, provides interactive feedback, and more. Our key idea is that language helps agents predict the future: what will be observed, how the world will behave, and which situations will be rewarded. This perspective unifies language understanding with future prediction as a powerful self-supervised learning objective. We present Dynalang, an agent that learns a multimodal world model that predicts future text and image representations and learns to act from imagined model rollouts. Unlike traditional agents that use language only to predict actions, Dynalang acquires rich language understanding by using past language also to predict future language, video, and rewards. In addition to learning from online interaction in an environment, Dynalang can be pretrained on datasets of text, video, or both without actions or rewards. From using language hints in grid worlds to navigating photorealistic scans of homes, Dynalang utilizes diverse types of language to improve task performance, including environment descriptions, game rules, and instructions.
    摘要 转换文本为简化中文。为了在人类世界中交互,代理需要理解人类使用的多种语言,将其与视觉世界相关,并根据其行动。现有的代理通过任务奖励学习执行简单的语言指令,而我们目标是建立可以利用多种语言,描述世界状况,提供交互反馈,并更多的语言理解的代理。我们关键思想是语言帮助代理预测未来:未来会看到什么,世界会如何行为,哪些情况将得到奖励。这种视角将语言理解与未来预测联系起来,形成一个强大的自动学习目标。我们介绍了 Dynalang,一种学习多模态世界模型,预测未来文本和图像表示,并从想象模型执行中学习行动。不同于传统代理使用语言只预测行动,Dynalang通过过去语言也预测未来语言、视频和奖励来获得丰富的语言理解。除了在环境中从 línea interaction 学习外,Dynalang 还可以在没有动作或奖励的情况下预测文本、视频或两者各种数据集上进行预训练。从使用语言提示在网格世界中探索到在拍摄图像中穿梭家庭,Dynalang 利用多种语言提高任务性能,包括环境描述、游戏规则和指令。

Discovering Adaptable Symbolic Algorithms from Scratch

  • paper_url: http://arxiv.org/abs/2307.16890
  • repo_url: None
  • paper_authors: Stephen Kelly, Daniel S. Park, Xingyou Song, Mitchell McIntire, Pranav Nashikkar, Ritam Guha, Wolfgang Banzhaf, Kalyanmoy Deb, Vishnu Naresh Boddeti, Jie Tan, Esteban Real
  • for: 这篇论文旨在开发一种可靠地在实际环境中部署自主 робоット的控制策略,即AutoRobotics-Zero(ARZ)。
  • methods: ARZ 使用 AutoML-Zero 方法从零开始学习适应环境变化的控制策略,不同于传统的神经网络适应策略,ARZ 可以建立基于线性扩展机器的完整表达能力的控制算法。
  • results: 在模拟的四脚机器人上进行了实验,ARZ 可以生成安全的控制策略,以避免机器人突然失效时的倒下。此外,在一个新的非站立控制任务中,ARZ 表现出了明显的更好的稳定性和可靠性。
    Abstract Autonomous robots deployed in the real world will need control policies that rapidly adapt to environmental changes. To this end, we propose AutoRobotics-Zero (ARZ), a method based on AutoML-Zero that discovers zero-shot adaptable policies from scratch. In contrast to neural network adaption policies, where only model parameters are optimized, ARZ can build control algorithms with the full expressive power of a linear register machine. We evolve modular policies that tune their model parameters and alter their inference algorithm on-the-fly to adapt to sudden environmental changes. We demonstrate our method on a realistic simulated quadruped robot, for which we evolve safe control policies that avoid falling when individual limbs suddenly break. This is a challenging task in which two popular neural network baselines fail. Finally, we conduct a detailed analysis of our method on a novel and challenging non-stationary control task dubbed Cataclysmic Cartpole. Results confirm our findings that ARZ is significantly more robust to sudden environmental changes and can build simple, interpretable control policies.
    摘要 自适应 роботы在实际世界中部署需要快速适应环境变化的控制策略。为此,我们提议AutoRobotics-Zero(ARZ)方法,基于AutoML-Zero方法,可以从头开始找到零shot适应策略。与神经网络适应策略不同,ARZ可以建立一个完整的线性注册机器的控制算法。我们演化了模块化策略,可以在运行时调整模型参数和推理算法,以适应突然的环境变化。我们在模拟的四脚 робот上进行了实验,演示了我们的方法可以建立安全的控制策略,以避免当个肢体突然失效时的倒下。这是一个具有挑战性的任务,两种流行的神经网络基elines都失败了。最后,我们对这种方法进行了详细的分析,并在一个新的和具有挑战性的非站立控制任务上进行了实验。结果证明了我们的发现,ARZ在突然环境变化中更加稳定和可靠,并可以建立简单、可读的控制策略。

Virtual Prompt Injection for Instruction-Tuned Large Language Models

  • paper_url: http://arxiv.org/abs/2307.16888
  • repo_url: None
  • paper_authors: Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin
  • for: 这个论文是为了漏洞抢夺大语言模型(LLM)的指令调整数据,以实现无需显式插入模型输入中的攻击。
  • methods: 论文提出了一种名为虚拟提示投入(VPI)的攻击方法,通过在特定触发场景下控制模型行为,无需与攻击者直接交互。
  • results: 研究人员通过对模型的指令调整数据进行恶意掺入,成功地使模型在处理有关参考人物(如乔·纽约)的查询时表现出偏见。这种攻击可以在服务器端 persistently 进行,无需攻击者直接交互。
    Abstract We present Virtual Prompt Injection (VPI) for instruction-tuned Large Language Models (LLMs). VPI allows an attacker-specified virtual prompt to steer the model behavior under specific trigger scenario without any explicit injection in model input. For instance, if an LLM is compromised with the virtual prompt "Describe Joe Biden negatively." for Joe Biden-related instructions, then any service deploying this model will propagate biased views when handling user queries related to Joe Biden. VPI is especially harmful for two primary reasons. Firstly, the attacker can take fine-grained control over LLM behaviors by defining various virtual prompts, exploiting LLMs' proficiency in following instructions. Secondly, this control is achieved without any interaction from the attacker while the model is in service, leading to persistent attack. To demonstrate the threat, we propose a simple method for performing VPI by poisoning the model's instruction tuning data. We find that our proposed method is highly effective in steering the LLM with VPI. For example, by injecting only 52 poisoned examples (0.1% of the training data size) into the instruction tuning data, the percentage of negative responses given by the trained model on Joe Biden-related queries change from 0% to 40%. We thus highlight the necessity of ensuring the integrity of the instruction-tuning data as little poisoned data can cause stealthy and persistent harm to the deployed model. We further explore the possible defenses and identify data filtering as an effective way to defend against the poisoning attacks. Our project page is available at https://poison-llm.github.io.
    摘要 我们介绍了虚拟提示投入(VPI),用于训练大型自然语言模型(LLM)。VPI允许攻击者在特定触发场景下使用自定义虚拟提示,无需直接插入模型输入中。例如,如果一个LLM被恶意攻击,并且在Joe Biden相关的指令中包含虚拟提示“描述Joe Biden消极”,那么服务器上部署的模型将在处理用户查询时传播偏见视角。VPI具有两点危险性。首先,攻击者可以通过定制多个虚拟提示,细化控制LLM的行为。其次,这种控制是在服务器上部署模型时完成,导致持续攻击。为证明这种威胁,我们提出了一种简单的VPI攻击方法,利用模型的指令调整数据中恶意污染。我们发现,只需插入0.1%的恶意示例(52个),可以让模型对Joe Biden相关的查询问题上提供40%的负面回答。这显示了对部署模型的指令调整数据的完整性是非常重要的,否则可能导致隐藏和持续的攻击。我们还探讨了可能的防御策略,并确定数据筛选是一种有效的防御方法。更多信息可以通过我们的项目页面()获得。

MetaCAM: Ensemble-Based Class Activation Map

  • paper_url: http://arxiv.org/abs/2307.16863
  • repo_url: None
  • paper_authors: Emily Kaczmarek, Olivier X. Miguel, Alexa C. Bowie, Robin Ducharme, Alysha L. J. Dingwall-Harvey, Steven Hawken, Christine M. Armour, Mark C. Walker, Kevin Dick
  • for: 本研究旨在提高深度学习模型预测结果的可读性和可信度,特别是在医学和生物认知领域。
  • methods: 本研究使用了多种现有的视觉解释方法,包括卷积神经网络的核心活动地图(CAM)。并提出了一种 ensemble-based 方法——MetaCAM,可以结合多种 CAM 方法,并通过选择最高活动像素的共识来决定最佳组合。
  • results: 研究表明,MetaCAM 可以超越单个 CAM 的性能,并更好地捕捉模型预测结果中的核心区域。在一个具体的示例中,MetaCAM 可以提高 ROAD 性能至 0.393,比单个 CAM 的范围从 -0.101 到 0.172 更高,这说明了 ensemble-based 方法和适应阈值调整的重要性。
    Abstract The need for clear, trustworthy explanations of deep learning model predictions is essential for high-criticality fields, such as medicine and biometric identification. Class Activation Maps (CAMs) are an increasingly popular category of visual explanation methods for Convolutional Neural Networks (CNNs). However, the performance of individual CAMs depends largely on experimental parameters such as the selected image, target class, and model. Here, we propose MetaCAM, an ensemble-based method for combining multiple existing CAM methods based on the consensus of the top-k% most highly activated pixels across component CAMs. We perform experiments to quantifiably determine the optimal combination of 11 CAMs for a given MetaCAM experiment. A new method denoted Cumulative Residual Effect (CRE) is proposed to summarize large-scale ensemble-based experiments. We also present adaptive thresholding and demonstrate how it can be applied to individual CAMs to improve their performance, measured using pixel perturbation method Remove and Debias (ROAD). Lastly, we show that MetaCAM outperforms existing CAMs and refines the most salient regions of images used for model predictions. In a specific example, MetaCAM improved ROAD performance to 0.393 compared to 11 individual CAMs with ranges from -0.101-0.172, demonstrating the importance of combining CAMs through an ensembling method and adaptive thresholding.
    摘要 需要清晰、可信的深度学习模型预测解释是高重要性领域,如医学和生物认知识别。图像活动地图(CAM)是深度学习模型的视觉解释方法之一,其性能受实验参数的影响,如选择的图像、目标类和模型。在这里,我们提出了MetaCAM,一种基于 ensemble 方法组合多个现有 CAM 方法的方法,通过最高活动像素的极值来决定组合。我们进行了量化的实验来确定MetaCAM experiment中的优化组合。此外,我们还提出了一种新的方法,称为累积差异效应(CRE),用于总结大规模的 ensemble 实验。最后,我们展示了如何应用适应阈值来改进个体 CAM 的性能,使用 Remove and Debias(ROAD)的像素扰动方法。我们的结果表明,MetaCAM 超过了现有 CAM 的性能,并提高了模型预测中使用的图像的最 salient 区域。例如,在一个特定的 MetaCAM 实验中,我们提高了 ROAD 性能到 0.393,比11个个体 CAM 的范围从 -0.101-0.172 高,这表明了将 CAM 通过 ensemble 方法和适应阈值组合可以提高性能。

Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey with Causality Perspectives

  • paper_url: http://arxiv.org/abs/2307.16851
  • repo_url: None
  • paper_authors: Haoyang Liu, Maheep Chaudhary, Haohan Wang
  • For: 本文总结了过去十年内关于机器学习可靠性的研究发展,包括Robustness、安全、可读性和公平性等方面。* Methods: 本文使用了一种数据中心的方法来系统性地评估传统的零基eline risk minimization(ERM)训练方法在数据上的缺陷。同时,文章还探讨了基于 causality 理论的方法,并将这些方法与 Pearl 的 causality 层次结构相连接。* Results: 本文提供了一种统一的语言和数学术语来链接这些方法,并将它们与 robustness、对抗性、可读性和公平性等领域的方法相连接。这些方法的应用和未来发展也被详细介绍。
    Abstract The trustworthiness of machine learning has emerged as a critical topic in the field, encompassing various applications and research areas such as robustness, security, interpretability, and fairness. The last decade saw the development of numerous methods addressing these challenges. In this survey, we systematically review these advancements from a data-centric perspective, highlighting the shortcomings of traditional empirical risk minimization (ERM) training in handling challenges posed by the data. Interestingly, we observe a convergence of these methods, despite being developed independently across trustworthy machine learning subfields. Pearl's hierarchy of causality offers a unifying framework for these techniques. Accordingly, this survey presents the background of trustworthy machine learning development using a unified set of concepts, connects this language to Pearl's causal hierarchy, and finally discusses methods explicitly inspired by causality literature. We provide a unified language with mathematical vocabulary to link these methods across robustness, adversarial robustness, interpretability, and fairness, fostering a more cohesive understanding of the field. Further, we explore the trustworthiness of large pretrained models. After summarizing dominant techniques like fine-tuning, parameter-efficient fine-tuning, prompting, and reinforcement learning with human feedback, we draw connections between them and the standard ERM. This connection allows us to build upon the principled understanding of trustworthy methods, extending it to these new techniques in large pretrained models, paving the way for future methods. Existing methods under this perspective are also reviewed. Lastly, we offer a brief summary of the applications of these methods and discuss potential future aspects related to our survey. For more information, please visit http://trustai.one.
    摘要 machine learning的可靠性已经成为该领域的关键话题,涵盖了多种应用和研究领域,如可靠性、安全性、可读性和公平性。过去十年许多方法提出了解决这些挑战的方法。在本文中,我们系统地回顾这些进步,从数据中心的视角出发, highlighting ERM 训练的缺陷在处理数据时。意外地,我们发现这些方法即使独立地在可靠机器学习子领域中发展,也存在一定的相似之处。以爱德华·珀尔的 causality 层次结构为基础,我们提供了一个统一的概念语言,将这些方法连接起来,并 finally 讨论了基于 causality 文献的方法。我们提供了一个统一的语言,以数学术语连接这些方法,从 robustness、对抗性、可读性和公平性等方面进行链接,以便更好地理解这个领域。此外,我们还探讨了大型预训模型的可靠性。我们首先概括了现有的方法,如 fine-tuning、参数效率的 fine-tuning、提示和人工反馈学习。然后,我们连接这些方法和标准 ERM,从而扩展了可靠方法的理解,应用于这些新的大型预训模型。现有的方法也被评论。最后,我们 brief summary 了这些方法的应用和未来方向。更多信息请参考http://trustai.one。

A Trajectory K-Anonymity Model Based on Point Density and Partition

  • paper_url: http://arxiv.org/abs/2307.16849
  • repo_url: None
  • paper_authors: Wanshu Yu, Haonan Shi, Hongyun Xu
  • for: 保护用户 trajectory 数据隐私
  • methods: 基于 Point Density 和 Partition 的 trajectory K-anonymity 模型
  • results: 提高了对 trajectory 数据的隐私保护,同时保持了数据的数据用性和算法执行时间的优化Here’s a more detailed explanation of each point:
  • for: The paper aims to protect users’ trajectory data privacy by proposing a trajectory K-anonymity model based on Point Density and Partition (KPDP).
  • methods: The proposed model uses Point Density and Partition to anonymize trajectory data and resist re-identification attacks.
  • results: The proposed model improves the existing trajectory generalization anonymization techniques regarding trajectory set partition preprocessing and trajectory clustering algorithms, and achieves better privacy protection, data utility, and algorithm execution time.
    Abstract As people's daily life becomes increasingly inseparable from various mobile electronic devices, relevant service application platforms and network operators can collect numerous individual information easily. When releasing these data for scientific research or commercial purposes, users' privacy will be in danger, especially in the publication of spatiotemporal trajectory datasets. Therefore, to avoid the leakage of users' privacy, it is necessary to anonymize the data before they are released. However, more than simply removing the unique identifiers of individuals is needed to protect the trajectory privacy, because some attackers may infer the identity of users by the connection with other databases. Much work has been devoted to merging multiple trajectories to avoid re-identification, but these solutions always require sacrificing data quality to achieve the anonymity requirement. In order to provide sufficient privacy protection for users' trajectory datasets, this paper develops a study on trajectory privacy against re-identification attacks, proposing a trajectory K-anonymity model based on Point Density and Partition (KPDP). Our approach improves the existing trajectory generalization anonymization techniques regarding trajectory set partition preprocessing and trajectory clustering algorithms. It successfully resists re-identification attacks and reduces the data utility loss of the k-anonymized dataset. A series of experiments on a real-world dataset show that the proposed model has significant advantages in terms of higher data utility and shorter algorithm execution time than other existing techniques.
    摘要

Latent Masking for Multimodal Self-supervised Learning in Health Timeseries

  • paper_url: http://arxiv.org/abs/2307.16847
  • repo_url: None
  • paper_authors: Shohreh Deldari, Dimitris Spathis, Mohammad Malekzadeh, Fahim Kawsar, Flora Salim, Akhil Mathur
  • for: 这篇论文旨在解决生物医学时间序列资料学习中的标签资料短缺问题,通过自主学习(SSL)方法学习数据表现。
  • methods: 这篇论文提出了两个新的概念:在特定频道Encoder中隐藏媒体特定的中间嵌入,并将其聚合到一个全球嵌入使用跨modal聚合器。这允许处理缺失的modalities并实现无需预先处理数据或时间consuming的负项双数据抽取。
  • results: 这篇论文的结果显示了与先前的SSL技术和指定标签数据的比较,在多modal时间序列benchmark上表现出色,并且具有最佳性能。它还分析了不同的掩蔽比率和策略的影响,并评估了对缺失modalities的学习表现的Robustness。
    Abstract Limited availability of labeled data for machine learning on biomedical time-series hampers progress in the field. Self-supervised learning (SSL) is a promising approach to learning data representations without labels. However, current SSL methods require expensive computations for negative pairs and are designed for single modalities, limiting their versatility. To overcome these limitations, we introduce CroSSL (Cross-modal SSL). CroSSL introduces two novel concepts: masking intermediate embeddings from modality-specific encoders and aggregating them into a global embedding using a cross-modal aggregator. This enables the handling of missing modalities and end-to-end learning of cross-modal patterns without prior data preprocessing or time-consuming negative-pair sampling. We evaluate CroSSL on various multimodal time-series benchmarks, including both medical-grade and consumer biosignals. Our results demonstrate superior performance compared to previous SSL techniques and supervised benchmarks with minimal labeled data. We additionally analyze the impact of different masking ratios and strategies and assess the robustness of the learned representations to missing modalities. Overall, our work achieves state-of-the-art performance while highlighting the benefits of masking latent embeddings for cross-modal learning in temporal health data.
    摘要 限制的可用数据导致生物医学时序学机器学习的进步受阻。无监督学习(SSL)是一种可能的方法,可以不使用标签学习数据表示。然而,当前的SSL方法需要高昂的计算成本和负样本,同时只适用于单模态,这限制了它们的 universality。为了突破这些限制,我们介绍了CrossSSL(跨Modal SSL)。CrossSSL引入了两个新的概念:在特定模式Encoder中遮盖中间嵌入,并使用跨模态聚合器将其聚合成全模态嵌入。这允许处理缺失的模式和终端学习跨模态模式,无需先进行数据预处理或时间consuming负样本生成。我们在多模态时序数据上评估了CrossSSL,包括医疗级和消费者生物信号。我们的结果显示CrossSSL的性能比前一代SSL技术和监督标准更高,即使只使用最少的标签数据。我们还分析了不同的遮盖比和策略,并评估了学习到缺失模式的表示的稳定性。总的来说,我们的工作实现了状态机器学习的表现,同时强调在 temporal health data 中遮盖嵌入的掩码对多模态学习的好处。

Identification of Driving Heterogeneity using Action-chains

  • paper_url: http://arxiv.org/abs/2307.16843
  • repo_url: None
  • paper_authors: Xue Yao, Simeon C. Calvert, Serge P. Hoogendoorn
  • for: 本研究旨在开发一种全面的驾驶异常性识别框架,从动作链视角出发,以捕捉驾驶行为的多样性和基本模式。
  • methods: 本研究提出了一种基于规则的分割技术,考虑驾驶行为的物理含义,然后建立了一个动作库,包括各种驾驶行为模式的描述。接着,本研究引入了动作阶段过渡概率,并提出了评估驾驶异常性的方法。
  • results: 使用实际数据进行评估,本研究的方法能够有效识别驾驶异常性,包括个体驾驶者和交通流动的异常性,并提供了明确的解释。这些发现可以帮助建立准确的驾驶行为理论和交通流动模型,从而提高交通性能和安全性。
    Abstract Current approaches to identifying driving heterogeneity face challenges in capturing the diversity of driving characteristics and understanding the fundamental patterns from a driving behaviour mechanism standpoint. This study introduces a comprehensive framework for identifying driving heterogeneity from an Action-chain perspective. First, a rule-based segmentation technique that considers the physical meanings of driving behaviour is proposed. Next, an Action phase Library including descriptions of various driving behaviour patterns is created based on the segmentation findings. The Action-chain concept is then introduced by implementing Action phase transition probability, followed by a method for evaluating driving heterogeneity. Employing real-world datasets for evaluation, our approach effectively identifies driving heterogeneity for both individual drivers and traffic flow while providing clear interpretations. These insights can aid the development of accurate driving behaviour theory and traffic flow models, ultimately benefiting traffic performance, and potentially leading to aspects such as improved road capacity and safety.
    摘要 当前的驾驶异同识别方法面临着捕捉驾驶特性多样性和从驾驶行为机制角度理解基本模式的挑战。本研究提出了基于Action-chain视角的全面驾驶异同识别框架。首先,我们提出了基于驾驶行为物理含义的规则生成分割技术。然后,基于分割结果,我们创建了驾驶行为模式库,其中包括各种驾驶行为模式的描述。接着,我们引入了Action阶段转移概率,并提出了评估驾驶异同的方法。使用实际数据进行评估,我们的方法能够有效地识别个体驾驶员和交通流动中的驾驶异同,并提供了明确的解释。这些发现可以帮助开发 precisions的驾驶行为理论和交通流动模型,从而提高交通性能,并可能导致改善道路容量和安全性。

Automated COVID-19 CT Image Classification using Multi-head Channel Attention in Deep CNN

  • paper_url: http://arxiv.org/abs/2308.00715
  • repo_url: None
  • paper_authors: Susmita Ghosh, Abhiroop Chatterjee
  • for: 用于检测COVID-19的计算机断层(CT)扫描图像的自动分类。
  • methods: 提出了一种基于深度学习的修改Xception模型,增加了通道注意力机制和负重 global average pooling,以提高特征提取。
  • results: 对广泛使用的COVID-19 CT扫描图像数据集进行实验,实现了非常高的准确率(96.99%),并证明了与其他现有技术相比的优越性。
    Abstract The rapid spread of COVID-19 has necessitated efficient and accurate diagnostic methods. Computed Tomography (CT) scan images have emerged as a valuable tool for detecting the disease. In this article, we present a novel deep learning approach for automated COVID-19 CT scan classification where a modified Xception model is proposed which incorporates a newly designed channel attention mechanism and weighted global average pooling to enhance feature extraction thereby improving classification accuracy. The channel attention module selectively focuses on informative regions within each channel, enabling the model to learn discriminative features for COVID-19 detection. Experiments on a widely used COVID-19 CT scan dataset demonstrate a very good accuracy of 96.99% and show its superiority to other state-of-the-art techniques. This research can contribute to the ongoing efforts in using artificial intelligence to combat current and future pandemics and can offer promising and timely solutions for efficient medical image analysis tasks.
    摘要 “快速蔓延的 COVID-19 病毒已经导致了高效和准确的诊断方法的需要。在这篇文章中,我们提出了一种新的深度学习方法,即使用修改过的 Xception 模型,并将 Channel Attention 模块和负重 globally average pooling 组合在一起,以提高特征提取和分类精度。Channel Attention 模块可以选择每个通道中的有用区域,让模型学习检测 COVID-19 的特征。实验结果显示,这种方法可以在一个常用的 COVID-19 CT 扫描数据集上取得非常高的准确率(96.99%),并较以前的州际技术更高。这个研究可以帮助现在和未来的感染病毒战略,并提供可靠的医疗图像分析任务的解决方案。”

Recent advancement in Disease Diagnostic using machine learning: Systematic survey of decades, comparisons, and challenges

  • paper_url: http://arxiv.org/abs/2308.01319
  • repo_url: None
  • paper_authors: Farzaneh Tajidini, Mohammad-Javad Kheiri
  • for: 这篇论文主要关注于计算机支持诊断(CAD)领域的研究,尤其是利用机器学习技术进行疾病检测和诊断。
  • methods: 该论文使用了多种机器学习算法和技术,包括学习 FROM EXAMPLES 的方法,以提高疾病检测和诊断的精度。
  • results: 该论文通过对多种疾病的检测和诊断而提出了一些结论,包括肝炎、糖尿病、肝病、登革热和心血管疾病等。
    Abstract Computer-aided diagnosis (CAD), a vibrant medical imaging research field, is expanding quickly. Because errors in medical diagnostic systems might lead to seriously misleading medical treatments, major efforts have been made in recent years to improve computer-aided diagnostics applications. The use of machine learning in computer-aided diagnosis is crucial. A simple equation may result in a false indication of items like organs. Therefore, learning from examples is a vital component of pattern recognition. Pattern recognition and machine learning in the biomedical area promise to increase the precision of disease detection and diagnosis. They also support the decision-making process's objectivity. Machine learning provides a practical method for creating elegant and autonomous algorithms to analyze high-dimensional and multimodal bio-medical data. This review article examines machine-learning algorithms for detecting diseases, including hepatitis, diabetes, liver disease, dengue fever, and heart disease. It draws attention to the collection of machine learning techniques and algorithms employed in studying conditions and the ensuing decision-making process.
    摘要 计算机支持诊断(CAD)是医疗影像研究领域的一个热门领域,在最近几年内快速扩大。由于医疗诊断系统中的错误可能会导致严重的医疗诊断错误,因此在最近几年中,大量的努力已经被投入到了计算机支持诊断应用程序的改进中。使用机器学习在计算机支持诊断中是非常重要的。一个简单的方程可能会导致识别物体如器官的错误指示。因此,学习示例是生成模式识别的重要组成部分。生物医学领域中的模式识别和机器学习技术承诺可以提高疾病检测和诊断的精度。它们还支持决策过程的公正性。机器学习提供了一种实用的方法来分析高维和多Modal生物医学数据。本文回顾了用于检测疾病的机器学习算法,包括肝炎、糖尿病、肝病、登革热和心脏病。它吸引注意力于计算机学习技术和算法在研究疾病 Condition 和 subsequent 决策过程中所使用的集合。

Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc

  • paper_url: http://arxiv.org/abs/2308.04445
  • repo_url: None
  • paper_authors: Doug Lenat, Gary Marcus
  • For: The paper discusses the limitations of current AI approaches, particularly large language models (LLMs), and outlines a vision for a more trustworthy and interpretable AI system that incorporates explicit knowledge and rules of thumb.* Methods: The paper proposes an alternative approach to AI that combines the strengths of LLMs with the reasoning capabilities of symbolic AI systems, using a hybrid approach that integrates both types of systems.* Results: The paper describes the development of an AI system called Cyc that is able to reason in higher order logic in real time, and suggests that a hybrid approach that combines LLMs and symbolic AI systems may be necessary for creating a truly trustworthy and interpretable AI system.
    Abstract Generative AI, the most popular current approach to AI, consists of large language models (LLMs) that are trained to produce outputs that are plausible, but not necessarily correct. Although their abilities are often uncanny, they are lacking in aspects of reasoning, leading LLMs to be less than completely trustworthy. Furthermore, their results tend to be both unpredictable and uninterpretable. We lay out 16 desiderata for future AI, and discuss an alternative approach to AI which could theoretically address many of the limitations associated with current approaches: AI educated with curated pieces of explicit knowledge and rules of thumb, enabling an inference engine to automatically deduce the logical entailments of all that knowledge. Even long arguments produced this way can be both trustworthy and interpretable, since the full step-by-step line of reasoning is always available, and for each step the provenance of the knowledge used can be documented and audited. There is however a catch: if the logical language is expressive enough to fully represent the meaning of anything we can say in English, then the inference engine runs much too slowly. That's why symbolic AI systems typically settle for some fast but much less expressive logic, such as knowledge graphs. We describe how one AI system, Cyc, has developed ways to overcome that tradeoff and is able to reason in higher order logic in real time. We suggest that any trustworthy general AI will need to hybridize the approaches, the LLM approach and more formal approach, and lay out a path to realizing that dream.
    摘要 现代人工智能的主流方法是生成AI,它们通过大量语言模型(LLM)训练来生成可能性强的输出,但并不一定是正确的。尽管其能力往往很强大,但它们缺乏分析能力,导致它们不够可靠。此外,它们的结果往往难以预测和解释。我们提出了16个未来人工智能的需求,并讨论了一种可能解决现有方法的限制的方法:通过手动筛选和规则的知识和规则,使推理引擎自动推理出所有知识的逻辑推论。这种方法可以生成可靠和可解释的长Arguments,因为整个步骤的逻辑推理的步骤都可以通过,并且每一步的知识使用的来源可以被文档和审核。然而,存在一个catch:如果逻辑语言足够表达任何我们可以在英语中说出的意思,那么推理引擎就会太慢。因此,符号AI系统通常选择一些快速而且较为不完整的逻辑,如知识图。我们描述了一个AI系统——Cyc,如何超越这个负担,在实时中进行高阶逻辑推理。我们建议任何可靠的通用AI都需要混合这两种方法,并走出实现这一梦想的路径。

Changes in Policy Preferences in German Tweets during the COVID Pandemic

  • paper_url: http://arxiv.org/abs/2308.04444
  • repo_url: None
  • paper_authors: Felix Biessmann
  • for: 这项研究的目的是自动提取在社交媒体上的政治偏好,以便更好地理解在线社交媒体中政治意见的表达方式。
  • methods: 该研究使用了一个新的 tweet 数据集,其中每个 tweet 都有细化的政治偏好标注。一种基于这个数据集的文本分类模型,然后应用于德国 Twitter 词汇库中的 tweet,从2019年到2022年。
  • results: 研究发现,在面对 COVID 疫情的情况下,人们对政治意见的表达增加了。通过使用一个已知的政策偏好分类法,分析了政治意见的细化分类,并发现表达政治意见的类别主要为 про-福利、 pro-教育和 pro-政府管理效率。
    Abstract Online social media have become an important forum for exchanging political opinions. In response to COVID measures citizens expressed their policy preferences directly on these platforms. Quantifying political preferences in online social media remains challenging: The vast amount of content requires scalable automated extraction of political preferences -- however fine grained political preference extraction is difficult with current machine learning (ML) technology, due to the lack of data sets. Here we present a novel data set of tweets with fine grained political preference annotations. A text classification model trained on this data is used to extract policy preferences in a German Twitter corpus ranging from 2019 to 2022. Our results indicate that in response to the COVID pandemic, expression of political opinions increased. Using a well established taxonomy of policy preferences we analyse fine grained political views and highlight changes in distinct political categories. These analyses suggest that the increase in policy preference expression is dominated by the categories pro-welfare, pro-education and pro-governmental administration efficiency. All training data and code used in this study are made publicly available to encourage other researchers to further improve automated policy preference extraction methods. We hope that our findings contribute to a better understanding of political statements in online social media and to a better assessment of how COVID measures impact political preferences.
    摘要 在线社交媒体已成为政治意见交换的重要平台。对于COVID措施,公民直接在这些平台上表达了政策偏好。但量化在线社交媒体中政治偏好的问题仍然具有挑战性:庞大的内容需要扫描式自动提取政治偏好,但现有机器学习技术的精度不够,因为数据集的缺乏。我们现在发布了一个新的推文数据集,其中每个推文都有细化的政治偏好标注。我们使用这些数据来训练文本分类模型,并在2019-2022年德国推文资源中提取政策偏好。我们的结果表明,在COVID疫情之后,政治意见的表达增加了。使用已确立的政策偏好分类法,我们分析了细化的政治观点,并发现COVID措施的影响。我们发现,政策偏好表达的增加主要来自“优化卫生保障”、“优化教育”和“提高政府管理效率”等类别。我们将所有训练数据和代码公开发布,以便其他研究人员可以继续改进自动政策偏好提取方法。我们希望我们的发现可以帮助更好地理解在线社交媒体中的政治声明,并为COVID措施的影响进行更好的评估。

Structural Transfer Learning in NL-to-Bash Semantic Parsers

  • paper_url: http://arxiv.org/abs/2307.16795
  • repo_url: None
  • paper_authors: Kyle Duffy, Satwik Bhattamishra, Phil Blunsom
  • for: 这篇论文是为了研究大规模预训练在自然语言处理中的进步而写的。
  • methods: 这篇论文提出了一种方法来获得机器翻译任务结构的量化理解,并应用到自然语言到Bash语义解析任务(NLBash)中。
  • results: 研究发现,NLBash大多可以归结为字幕对应,并且发现了自然语言到SQL之间的强大结构重叠。此外,通过在英语到德语翻译任务中不同计算资源的调整,研究发现更多的计算资源不总是导致更强的 semantic representations 的传递到 NLBash。
    Abstract Large-scale pre-training has made progress in many fields of natural language processing, though little is understood about the design of pre-training datasets. We propose a methodology for obtaining a quantitative understanding of structural overlap between machine translation tasks. We apply our methodology to the natural language to Bash semantic parsing task (NLBash) and show that it is largely reducible to lexical alignment. We also find that there is strong structural overlap between NLBash and natural language to SQL. Additionally, we perform a study varying compute expended during pre-training on the English to German machine translation task and find that more compute expended during pre-training does not always correspond semantic representations with stronger transfer to NLBash.
    摘要 大规模的预训练已经在自然语言处理多个领域进步了很 far, pero little is understood about the design of pre-training datasets. We propose a methodology for obtaining a quantitative understanding of structural overlap between machine translation tasks. We apply our methodology to the natural language to Bash semantic parsing task (NLBash) and show that it is largely reducible to lexical alignment. We also find that there is strong structural overlap between NLBash and natural language to SQL. Additionally, we perform a study varying compute expended during pre-training on the English to German machine translation task and find that more compute expended during pre-training does not always correspond to stronger transfer to NLBash.Note: The translation is in Simplified Chinese, which is the standard writing system used in mainland China. The translation is based on the given text and may not be perfect, as the meaning of the text may be slightly different in Chinese.