cs.AI - 2023-08-02

Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation

  • paper_url: http://arxiv.org/abs/2308.01240
  • repo_url: None
  • paper_authors: Zhiqiang Yuan, Junwei Liu, Qiancheng Zi, Mingwei Liu, Xin Peng, Yiling Lou
  • for: This study evaluates 10 open-source instructed LLMs on four representative code comprehension and generation tasks.
  • methods: The instructed LLMs are evaluated under zero-shot, few-shot, and fine-tuning settings.
  • results: In the zero-shot setting, instructed LLMs are very competitive on code comprehension and generation tasks and sometimes even better than small SOTA models fine-tuned for each downstream task, while larger instructed LLMs are not always better on code-related tasks. In the few-shot setting, adding demonstration examples helps instructed LLMs on most tasks, although the examples sometimes induce unstable or even worse performance; the widely used BM25-based shot selection strategy significantly outperforms random or fixed selection only on generation problems. In the fine-tuning setting, fine-tuning further improves performance on downstream tasks, and after fine-tuning on the same downstream dataset, instructed LLMs outperform both small SOTA models and similar-scale LLMs without instruction tuning.
    Abstract In this work, we evaluate 10 open-source instructed LLMs on four representative code comprehension and generation tasks. We have the following main findings. First, for the zero-shot setting, instructed LLMs are very competitive on code comprehension and generation tasks and sometimes even better than small SOTA models specifically fine-tuned on each downstream task. We also find that larger instructed LLMs are not always better on code-related tasks. Second, for the few-shot setting, we find that adding demonstration examples substantially helps instructed LLMs perform better on most code comprehension and generation tasks; however, the examples would sometimes induce unstable or even worse performance. Furthermore, we find widely-used BM25-based shot selection strategy significantly outperforms the basic random selection or fixed selection only on generation problems. Third, for the fine-tuning setting, we find that fine-tuning could further improve the model performance on downstream code comprehension and generation tasks compared to the zero-shot/one-shot performance. In addition, after being fine-tuned on the same downstream task dataset, instructed LLMs outperform both the small SOTA models and similar-scaled LLMs without instruction tuning. Based on our findings, we further present practical implications on model and usage recommendation, performance and cost trade-offs, and future direction.
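The abstract mentions a BM25-based shot selection strategy for the few-shot setting. As a rough illustration of what such a strategy can look like (not the paper's implementation; the `rank_bm25` dependency, the demonstration pool, and the prompt format are assumptions), demonstrations can be retrieved by lexical similarity to the query:

```python
# Minimal sketch of BM25-based demonstration selection for few-shot prompting.
# Assumes the third-party `rank_bm25` package; pool and prompt format are illustrative.
from rank_bm25 import BM25Okapi

# Candidate demonstration pool: (input code, reference output) pairs.
pool = [
    ("def add(a, b): return a + b", "Adds two numbers."),
    ("def fib(n): ...", "Computes the n-th Fibonacci number."),
    ("SELECT * FROM users", "Retrieves all rows from the users table."),
]

bm25 = BM25Okapi([inp.split() for inp, _ in pool])

def build_prompt(query: str, k: int = 2) -> str:
    """Pick the k pool items most similar to the query and prepend them as shots."""
    scores = bm25.get_scores(query.split())
    top = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)[:k]
    shots = "\n\n".join(f"Input: {pool[i][0]}\nOutput: {pool[i][1]}" for i in top)
    return f"{shots}\n\nInput: {query}\nOutput:"

print(build_prompt("def mul(a, b): return a * b"))
```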

Do Multilingual Language Models Think Better in English?

  • paper_url: http://arxiv.org/abs/2308.01223
  • repo_url: https://github.com/juletx/self-translate
  • paper_authors: Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe
  • for: Improving the performance of multilingual language models when they are prompted in non-English languages.
  • methods: Self-translate, which leverages the few-shot translation capabilities of the multilingual model itself to translate the input into English before inference, instead of relying on an external machine translation system.
  • results: Self-translate consistently outperforms direct inference, showing that language models are unable to leverage their full multilingual potential when prompted in non-English languages.
    Abstract Translate-test is a popular technique to improve the performance of multilingual language models. This approach works by translating the input into English using an external machine translation system, and running inference over the translated input. However, these improvements can be attributed to the use of a separate translation system, which is typically trained on large amounts of parallel data not seen by the language model. In this work, we introduce a new approach called self-translate, which overcomes the need of an external translation system by leveraging the few-shot translation capabilities of multilingual language models. Experiments over 5 tasks show that self-translate consistently outperforms direct inference, demonstrating that language models are unable to leverage their full multilingual potential when prompted in non-English languages. Our code is available at https://github.com/juletx/self-translate.
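A minimal sketch of the self-translate idea described above: the same multilingual model first translates the non-English input into English via a few-shot prompt, then answers the task on the translated text. The `generate` function below is a hypothetical stand-in for whatever model API is used; it is not the interface of the released code.

```python
# Sketch of self-translate: the model translates its own input before inference.
# `generate` is a hypothetical text-completion function (model API not specified here).

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your multilingual LM here")

FEW_SHOT_TRANSLATION = (
    "Basque: Kaixo, zer moduz?\nEnglish: Hello, how are you?\n\n"
    "Basque: Eguraldi ona dago gaur.\nEnglish: The weather is nice today.\n\n"
)

def self_translate_infer(non_english_input: str, task_template: str) -> str:
    # Step 1: few-shot translate the input into English with the same model.
    translation_prompt = FEW_SHOT_TRANSLATION + f"Basque: {non_english_input}\nEnglish:"
    english_input = generate(translation_prompt).strip()
    # Step 2: run the downstream task over the translated input.
    return generate(task_template.format(input=english_input))
```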

Calibration in Deep Learning: A Survey of the State-of-the-Art

  • paper_url: http://arxiv.org/abs/2308.01222
  • repo_url: None
  • paper_authors: Cheng Wang
  • for: This paper reviews the state-of-the-art calibration methods for deep neural models and provides an understanding of their principles for performing model calibration.
  • methods: The paper introduces four categories of calibration methods: post-hoc calibration, regularization methods, uncertainty estimation, and composition methods.
  • results: The paper discusses recent advancements in calibrating large models, particularly large language models (LLMs), and highlights open issues, challenges, and potential directions in model calibration.
    Abstract Calibrating deep neural models plays an important role in building reliable, robust AI systems in safety-critical applications. Recent work has shown that modern neural networks that possess high predictive capability are poorly calibrated and produce unreliable model predictions. Though deep learning models achieve remarkable performance on various benchmarks, the study of model calibration and reliability is relatively underexplored. Ideal deep models should have not only high predictive performance but also be well calibrated. There have been some recent methods proposed to calibrate deep models by using different mechanisms. In this survey, we review the state-of-the-art calibration methods and provide an understanding of their principles for performing model calibration. First, we start with the definition of model calibration and explain the root causes of model miscalibration. Then we introduce the key metrics that can measure this aspect. It is followed by a summary of calibration methods that we roughly classified into four categories: post-hoc calibration, regularization methods, uncertainty estimation, and composition methods. We also covered some recent advancements in calibrating large models, particularly large language models (LLMs). Finally, we discuss some open issues, challenges, and potential directions.
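Among the key metrics such surveys cover, one of the most common is the Expected Calibration Error (ECE), which bins predictions by confidence and compares average confidence with accuracy per bin. A minimal NumPy sketch (the equal-width binning and bin count here are illustrative choices, not taken from the paper):

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """ECE: weighted average of |accuracy - confidence| over equal-width confidence bins."""
    confidences = np.asarray(confidences)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Toy usage: three predictions with their max softmax confidence.
print(expected_calibration_error([0.9, 0.6, 0.8], [1, 0, 2], [1, 1, 2]))
```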

Using ScrutinAI for Visual Inspection of DNN Performance in a Medical Use Case

  • paper_url: http://arxiv.org/abs/2308.01220
  • repo_url: None
  • paper_authors: Rebekka Görge, Elena Haedecke, Michael Mock
  • for: This study uses the Visual Analytics (VA) tool ScrutinAI to help human analysts interactively investigate model performance and data sets. Model performance depends heavily on labeling quality; in medical settings in particular, generating high-quality labels requires deep expert knowledge and is very costly, and data sets are often labeled by collecting the opinions of groups of experts. ScrutinAI is used to analyze how label variations between different experts influence model performance.
  • methods: ScrutinAI is used for a root cause analysis of model performance that separates the effects of varying or missing labels from true model weaknesses, on a publicly available data set for detecting intracranial hemorrhages and differentiating their subtypes.
  • results: The results show that ScrutinAI helps analysts quickly identify the causes of model weaknesses and analyze the impact of label variations; model performance is strongly affected by labeling quality, and in some cases the model suffers from missing labels.
    Abstract Our Visual Analytics (VA) tool ScrutinAI supports human analysts in interactively investigating model performance and data sets. Model performance depends on labeling quality to a large extent. In particular in medical settings, generation of high quality labels requires in depth expert knowledge and is very costly. Often, data sets are labeled by collecting opinions of groups of experts. We use our VA tool to analyse the influence of label variations between different experts on the model performance. ScrutinAI facilitates performing a root cause analysis that distinguishes weaknesses of deep neural network (DNN) models caused by varying or missing labeling quality from true weaknesses. We scrutinize the overall detection of intracranial hemorrhages and the more subtle differentiation between subtypes in a publicly available data set.

Mercury: An Automated Remote Side-channel Attack to Nvidia Deep Learning Accelerator

  • paper_url: http://arxiv.org/abs/2308.01193
  • repo_url: None
  • paper_authors: Xiaobei Yan, Xiaoxuan Lou, Guowen Xu, Han Qiu, Shangwei Guo, Chip Hong Chang, Tianwei Zhang
  • for: Exposing how deep learning models deployed on accelerators can be leaked and attacked via side channels.
  • methods: An automated remote side-channel attack that models side-channel leakage extraction as a sequence-to-sequence problem: a time-to-digital converter (TDC) remotely collects the power trace of the target model's inference, and a learning model automatically recovers the architecture details of the victim model from the trace.
  • results: The architecture details of the target model can be extracted with an error rate below 1%.
    Abstract DNN accelerators have been widely deployed in many scenarios to speed up the inference process and reduce the energy consumption. One big concern about the usage of the accelerators is the confidentiality of the deployed models: model inference execution on the accelerators could leak side-channel information, which enables an adversary to precisely recover the model details. Such model extraction attacks can not only compromise the intellectual property of DNN models, but also facilitate some adversarial attacks. Although previous works have demonstrated a number of side-channel techniques to extract models from DNN accelerators, they are not practical for two reasons. (1) They only target simplified accelerator implementations, which have limited practicality in the real world. (2) They require heavy human analysis and domain knowledge. To overcome these limitations, this paper presents Mercury, the first automated remote side-channel attack against the off-the-shelf Nvidia DNN accelerator. The key insight of Mercury is to model the side-channel extraction process as a sequence-to-sequence problem. The adversary can leverage a time-to-digital converter (TDC) to remotely collect the power trace of the target model's inference. Then he uses a learning model to automatically recover the architecture details of the victim model from the power trace without any prior knowledge. The adversary can further use the attention mechanism to localize the leakage points that contribute most to the attack. Evaluation results indicate that Mercury can keep the error rate of model extraction below 1%.

Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2308.01189
  • repo_url: None
  • paper_authors: Yongkang He, Mingjin Chen, Zhijing Yang, Yongyi Lu
  • for: Addressing the dense labeling problem in medical image segmentation, where a significant fraction of the dataset can be pruned without sacrificing much accuracy.
  • methods: Proposing a data pruning method based on the Dynamic Average Dice (DAD) score, which takes into consideration the training dynamics on target regions.
  • results: Showing that the proposed method can effectively identify important samples and reduce the amount of labeled data needed for training, making it a strong yet simple baseline for medical image segmentation with combined data sources.
    Abstract This paper seeks to address the dense labeling problems where a significant fraction of the dataset can be pruned without sacrificing much accuracy. We observe that, on standard medical image segmentation benchmarks, the loss gradient norm-based metrics of individual training examples applied in image classification fail to identify the important samples. To address this issue, we propose a data pruning method by taking into consideration the training dynamics on target regions using Dynamic Average Dice (DAD) score. To the best of our knowledge, we are among the first to address the data importance in dense labeling tasks in the field of medical image analysis, making the following contributions: (1) investigating the underlying causes with rigorous empirical analysis, and (2) determining effective data pruning approach in dense labeling problems. Our solution can be used as a strong yet simple baseline to select important examples for medical image segmentation with combined data sources.
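The abstract above describes pruning by a Dynamic Average Dice (DAD) score computed from training dynamics on target regions. A rough sketch of that idea follows; the exact score definition, the choice of which samples count as "important", and the pruning fraction are assumptions for illustration, not the paper's formula.

```python
import numpy as np

def dice(pred_mask: np.ndarray, gt_mask: np.ndarray, eps: float = 1e-6) -> float:
    """Soft Dice coefficient between a predicted and a ground-truth binary mask."""
    inter = (pred_mask * gt_mask).sum()
    return (2.0 * inter + eps) / (pred_mask.sum() + gt_mask.sum() + eps)

def dynamic_average_dice(per_epoch_preds, gt_mask) -> float:
    """Average the per-epoch Dice of one sample across training checkpoints."""
    return float(np.mean([dice(p, gt_mask) for p in per_epoch_preds]))

def prune_dataset(dad_scores, keep_fraction=0.7):
    """Keep the samples judged most important; here, the lowest-DAD (hardest) ones.
    Whether 'important' means low or high DAD is a modeling choice, not the paper's."""
    order = np.argsort(dad_scores)            # ascending: hardest samples first
    n_keep = int(len(dad_scores) * keep_fraction)
    return order[:n_keep]
```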

Machine Learning-Based Diabetes Detection Using Photoplethysmography Signal Features

  • paper_url: http://arxiv.org/abs/2308.01930
  • repo_url: None
  • paper_authors: Filipe A. C. Oliveira, Felipe M. Dias, Marcelo A. F. Toledo, Diego A. C. Cardenas, Douglas A. Almeida, Estela Ribeiro, Jose E. Krieger, Marco A. Gutierrez
  • for: Developing a reliable, non-invasive method for glucose monitoring to help detect and manage diabetes.
  • methods: Photoplethysmography (PPG) signals and associated metadata are used to train Logistic Regression (LR) and eXtreme Gradient Boosting (XGBoost) classifiers that distinguish non-diabetic from diabetic patients.
  • results: Training on PPG signals and metadata achieves an F1-Score and AUC of $58.8\pm20.0\%$ and $79.2\pm15.0\%$ for LR, and $51.7\pm16.5\%$ and $73.6\pm17.0\%$ for XGBoost, respectively. Feature analysis suggests that PPG morphological features carry diabetes-related information alongside the metadata. These results are within the range reported in the literature, indicating that machine learning methods are promising for developing remote, non-invasive, and continuous devices for detecting and preventing diabetes.
    Abstract Diabetes is a prevalent chronic condition that compromises the health of millions of people worldwide. Minimally invasive methods are needed to prevent and control diabetes but most devices for measuring glucose levels are invasive and not amenable for continuous monitoring. Here, we present an alternative method to overcome these shortcomings based on non-invasive optical photoplethysmography (PPG) for detecting diabetes. We classify non-Diabetic and Diabetic patients using the PPG signal and metadata for training Logistic Regression (LR) and eXtreme Gradient Boosting (XGBoost) algorithms. We used PPG signals from a publicly available dataset. To prevent overfitting, we divided the data into five folds for cross-validation. By ensuring that patients in the training set are not in the testing set, the model's performance can be evaluated on unseen subjects' data, providing a more accurate assessment of its generalization. Our model achieved an F1-Score and AUC of $58.8\pm20.0\%$ and $79.2\pm15.0\%$ for LR and $51.7\pm16.5\%$ and $73.6\pm17.0\%$ for XGBoost, respectively. Feature analysis suggested that PPG morphological features contains diabetes-related information alongside metadata. Our findings are within the same range reported in the literature, indicating that machine learning methods are promising for developing remote, non-invasive, and continuous measurement devices for detecting and preventing diabetes.
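A minimal sketch of the evaluation protocol described above, namely subject-grouped 5-fold cross-validation so that no patient appears in both training and test folds, using scikit-learn and XGBoost. The feature matrix here is random placeholder data, not the PPG dataset.

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))          # placeholder PPG-derived features + metadata
y = rng.integers(0, 2, size=200)        # 0 = non-diabetic, 1 = diabetic
groups = rng.integers(0, 40, size=200)  # patient IDs, so folds never share a subject

for name, model in [("LR", LogisticRegression(max_iter=1000)),
                    ("XGBoost", XGBClassifier(n_estimators=100, eval_metric="logloss"))]:
    f1s, aucs = [], []
    for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups):
        model.fit(X[train_idx], y[train_idx])
        proba = model.predict_proba(X[test_idx])[:, 1]
        f1s.append(f1_score(y[test_idx], proba > 0.5))
        aucs.append(roc_auc_score(y[test_idx], proba))
    print(f"{name}: F1={np.mean(f1s):.3f}  AUC={np.mean(aucs):.3f}")
```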

LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs

  • paper_url: http://arxiv.org/abs/2308.01157
  • repo_url: https://github.com/interpretml/talktoebm
  • paper_authors: Benjamin J. Lengerich, Sebastian Bordt, Harsha Nori, Mark E. Nunnally, Yin Aphinyanaphongs, Manolis Kellis, Rich Caruana
  • for: This paper explores how large language models (LLMs) can work with interpretable (glass-box) models to automate common tasks in data science.
  • methods: A hierarchical approach to reasoning lets LLMs provide comprehensive model-level summaries without the entire model ever fitting in context, applying their extensive background knowledge to tasks such as detecting anomalies that contradict prior knowledge, describing potential reasons for the anomalies, and suggesting repairs that would remove them.
  • results: Multiple healthcare examples demonstrate the utility of these new LLM capabilities, with particular emphasis on Generalized Additive Models (GAMs). The paper also presents $\texttt{TalkToEBM}$, an open-source LLM-GAM interface.
    Abstract We show that large language models (LLMs) are remarkably good at working with interpretable models that decompose complex outcomes into univariate graph-represented components. By adopting a hierarchical approach to reasoning, LLMs can provide comprehensive model-level summaries without ever requiring the entire model to fit in context. This approach enables LLMs to apply their extensive background knowledge to automate common tasks in data science such as detecting anomalies that contradict prior knowledge, describing potential reasons for the anomalies, and suggesting repairs that would remove the anomalies. We use multiple examples in healthcare to demonstrate the utility of these new capabilities of LLMs, with particular emphasis on Generalized Additive Models (GAMs). Finally, we present the package $\texttt{TalkToEBM}$ as an open-source LLM-GAM interface.
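The core idea above is to hand an LLM one univariate, graph-represented component of a glass-box model (such as a GAM shape function) rather than the whole model. A minimal sketch of how a single component could be serialized into a prompt; the shape function and the prompt wording are illustrative and do not reflect the actual $\texttt{TalkToEBM}$ API.

```python
# Sketch: summarize one GAM shape function as text so an LLM can critique it.
# The shape function below is hand-made; TalkToEBM's real interface differs.

def describe_component(feature: str, xs: list, contributions: list) -> str:
    points = ", ".join(f"({x:g}, {c:+.2f})" for x, c in zip(xs, contributions))
    return (
        f"Feature '{feature}' contributes to the predicted risk as a univariate graph "
        f"with (value, contribution) points: {points}. "
        "Does any part of this graph contradict domain knowledge? "
        "If so, describe the anomaly and suggest a repair."
    )

prompt = describe_component(
    feature="age",
    xs=[20, 40, 60, 80, 100],
    contributions=[-0.3, -0.1, 0.2, 0.5, -0.4],  # the drop after 80 is the planted surprise
)
print(prompt)  # this string would be sent to an LLM along with task context
```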

Arithmetic with Language Models: from Memorization to Computation

  • paper_url: http://arxiv.org/abs/2308.01154
  • repo_url: None
  • paper_authors: Davide Maltoni, Matteo Ferrara
  • for: investigate the emergent computation and problem-solving capabilities of recent large language models
  • methods: trained the language model to predict the next token, and tested its ability to perform binary addition and multiplication
  • results: the language model was able to learn these tasks and exhibited extrapolation capabilities, supporting the hypothesis that the model works as an Encoding-Regression-Decoding machine.
    Abstract A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities making smooth input interpolation ineffective for novel data. We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing. Our findings support the hypotheses that the language model works as an Encoding-Regression-Decoding machine where the computation takes place in the value space once the input token representation is mapped to an appropriate internal representation.
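A small sketch of the kind of training data the binary-addition testbed above relies on: fixed-width binary operands serialized into character sequences for next-token prediction. The exact tokenization and operand width are assumptions, not the paper's setup.

```python
import random

def make_addition_example(n_bits: int = 8) -> str:
    """One training string: 'a + b = c' with operands in fixed-width binary."""
    a = random.randrange(2 ** n_bits)
    b = random.randrange(2 ** n_bits)
    fmt = f"0{n_bits}b"
    return f"{a:{fmt}} + {b:{fmt}} = {a + b:0{n_bits + 1}b}"

# A character-level vocabulary for this task is tiny: {0, 1, space, +, =}.
train = [make_addition_example() for _ in range(5)]
print("\n".join(train))

# An extrapolation-style probe: wider operands never seen during training.
held_out = [make_addition_example(n_bits=10) for _ in range(3)]
```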

A Transformer-based Prediction Method for Depth of Anesthesia During Target-controlled Infusion of Propofol and Remifentanil

  • paper_url: http://arxiv.org/abs/2308.01929
  • repo_url: https://github.com/heeeyk/transformer-doa-prediction
  • paper_authors: Yongkang He, Siyuan Peng, Mingjin Chen, Zhijing Yang, Yuanhui Chen
  • for: Accurately predicting the depth of anesthesia (DOA) is essential for target-controlled infusion systems; traditional PK-PD models require manual selection of model parameters, which can be challenging in clinical settings.
  • methods: A transformer-based method for predicting DOA from infusions of propofol and remifentanil, using long short-term memory (LSTM) and gate residual network (GRN) modules to improve the efficiency of feature fusion, an attention mechanism to discover the interactions between the drugs, and label distribution smoothing and reweighting losses to address data imbalance.
  • results: The proposed method outperforms traditional PK-PD models and previous deep learning methods, accurately predicting anesthetic depth under sudden and deep anesthesia conditions.
    Abstract Accurately predicting anesthetic effects is essential for target-controlled infusion systems. The traditional (PK-PD) models for Bispectral index (BIS) prediction require manual selection of model parameters, which can be challenging in clinical settings. Recently proposed deep learning methods can only capture general trends and may not predict abrupt changes in BIS. To address these issues, we propose a transformer-based method for predicting the depth of anesthesia (DOA) using drug infusions of propofol and remifentanil. Our method employs long short-term memory (LSTM) and gate residual network (GRN) networks to improve the efficiency of feature fusion and applies an attention mechanism to discover the interactions between the drugs. We also use label distribution smoothing and reweighting losses to address data imbalance. Experimental results show that our proposed method outperforms traditional PK-PD models and previous deep learning methods, effectively predicting anesthetic depth under sudden and deep anesthesia conditions.
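The method above uses label distribution smoothing and reweighting to counter imbalanced BIS targets. A rough sketch of label-distribution-smoothing-style reweighting for a continuous target; the kernel width, binning, and inverse-density weighting are assumptions, not the paper's exact recipe.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def lds_weights(targets: np.ndarray, n_bins: int = 50, sigma: float = 2.0) -> np.ndarray:
    """Per-sample weights inversely proportional to a smoothed label density."""
    hist, edges = np.histogram(targets, bins=n_bins)
    smoothed = gaussian_filter1d(hist.astype(float), sigma=sigma) + 1e-6
    bin_idx = np.clip(np.digitize(targets, edges[1:-1]), 0, n_bins - 1)
    weights = 1.0 / smoothed[bin_idx]
    return weights * len(targets) / weights.sum()   # normalize to mean weight 1

# Toy BIS-like targets concentrated around 45 with rare deep-anesthesia values.
bis = np.concatenate([np.random.normal(45, 5, 950), np.random.normal(20, 3, 50)])
w = lds_weights(bis)
print(w[bis < 30].mean(), ">", w[bis > 30].mean())  # rare region gets larger weights
```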

Can We Transfer Noise Patterns? A Multi-environment Spectrum Analysis Model Using Generated Cases

  • paper_url: http://arxiv.org/abs/2308.01138
  • repo_url: https://github.com/magnomic/cnst
  • paper_authors: Haiwen Du, Zheng Ju, Yu An, Honghui Du, Dongjie Zhu, Zhaoshuo Tian, Aonghus Lawlor, Ruihai Dong
  • for: Improving the performance of deep learning models for spectrum analysis by producing additional high-quality data.
  • methods: A noise pattern transferring model that takes the spectra of standard water samples measured in different environments as cases and learns the differences in their noise patterns, enabling noise patterns to transfer to unknown samples; a sample-to-sample case base excludes the interference of sample-level baseline noise on dataset-level noise learning.
  • results: Experiments on spectral data with different background noises show that the proposed method mitigates noise effects and improves deep learning model performance, outperforming baseline systems including wavelet denoising, deep neural networks, and generative models.
    Abstract Spectrum analysis systems in online water quality testing are designed to detect types and concentrations of pollutants and enable regulatory agencies to respond promptly to pollution incidents. However, spectral data-based testing devices suffer from complex noise patterns when deployed in non-laboratory environments. To make the analysis model applicable to more environments, we propose a noise patterns transferring model, which takes the spectrum of standard water samples in different environments as cases and learns the differences in their noise patterns, thus enabling noise patterns to transfer to unknown samples. Unfortunately, the inevitable sample-level baseline noise makes the model unable to obtain the paired data that only differ in dataset-level environmental noise. To address the problem, we generate a sample-to-sample case-base to exclude the interference of sample-level noise on dataset-level noise learning, enhancing the system's learning performance. Experiments on spectral data with different background noises demonstrate the good noise-transferring ability of the proposed method against baseline systems ranging from wavelet denoising, deep neural networks, and generative models. From this research, we posit that our method can enhance the performance of DL models by generating high-quality cases. The source code is made publicly available online at https://github.com/Magnomic/CNST.

A Survey on Popularity Bias in Recommender Systems

  • paper_url: http://arxiv.org/abs/2308.01118
  • repo_url: None
  • paper_authors: Anastasiia Klimashevskaia, Dietmar Jannach, Mehdi Elahi, Christoph Trattner
  • for: This survey examines popularity bias in today's recommender systems and how it can be detected, quantified, and mitigated.
  • methods: The paper reviews the computational metrics used in the literature as well as the main technical approaches to reduce the bias.
  • results: Existing recommendation algorithms often exhibit a popularity bias, focusing on rather popular items in their recommendations; moreover, the existing research is almost entirely based on computational experiments and on certain assumptions regarding the practical effects of including long-tail items in the recommendations.
    Abstract Recommender systems help people find relevant content in a personalized way. One main promise of such systems is that they are able to increase the visibility of items in the long tail, i.e., the lesser-known items in a catalogue. Existing research, however, suggests that in many situations today's recommendation algorithms instead exhibit a popularity bias, meaning that they often focus on rather popular items in their recommendations. Such a bias may not only lead to limited value of the recommendations for consumers and providers in the short run, but it may also cause undesired reinforcement effects over time. In this paper, we discuss the potential reasons for popularity bias and we review existing approaches to detect, quantify and mitigate popularity bias in recommender systems. Our survey therefore includes both an overview of the computational metrics used in the literature as well as a review of the main technical approaches to reduce the bias. We furthermore critically discuss today's literature, where we observe that the research is almost entirely based on computational experiments and on certain assumptions regarding the practical effects of including long-tail items in the recommendations.
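Two of the simpler computational metrics such surveys discuss are the average popularity of the recommended items and long-tail coverage. A minimal sketch follows; the item interaction counts and the 20% head/tail split are illustrative choices.

```python
import numpy as np

def average_recommendation_popularity(rec_lists, item_popularity):
    """Mean training-set popularity of the items each user receives."""
    return float(np.mean([item_popularity[i] for recs in rec_lists for i in recs]))

def long_tail_coverage(rec_lists, item_popularity, head_fraction=0.2):
    """Share of recommended items that fall outside the most popular head."""
    n_head = int(len(item_popularity) * head_fraction)
    head = set(np.argsort(item_popularity)[::-1][:n_head])
    recommended = [i for recs in rec_lists for i in recs]
    return sum(i not in head for i in recommended) / len(recommended)

item_popularity = np.array([500, 300, 120, 40, 10, 5, 3, 2, 1, 1])  # interaction counts
rec_lists = [[0, 1, 3], [0, 2, 4], [1, 0, 5]]                        # top-3 per user
print(average_recommendation_popularity(rec_lists, item_popularity))
print(long_tail_coverage(rec_lists, item_popularity))
```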

Literal-Aware Knowledge Graph Embedding for Welding Quality Monitoring: A Bosch Case

  • paper_url: http://arxiv.org/abs/2308.01105
  • repo_url: None
  • paper_authors: Zhipeng Tan, Baifan Zhou, Zhuoxun Zheng, Ognjen Savkovic, Ziqi Huang, Irlan-Grangel Gonzalez, Ahmet Soylu, Evgeny Kharlamov
  • for: This study investigates whether and to what extent knowledge graph embedding (KGE) can be applied to an important industrial problem: quality monitoring of welding in manufacturing.
  • methods: Popular KGE methods are formulated as link prediction and evaluated on real industrial data, with consideration of literals.
  • results: The study reveals both limitations and promising aspects of the adapted KGE methods, addressing two challenging questions: how large the welding spot diameter is, and to which car body a welded spot belongs.
    Abstract Recently there has been a series of studies in knowledge graph embedding (KGE), which attempts to learn the embeddings of the entities and relations as numerical vectors and mathematical mappings via machine learning (ML). However, there has been limited research that applies KGE for industrial problems in manufacturing. This paper investigates whether and to what extent KGE can be used for an important problem: quality monitoring for welding in manufacturing industry, which is an impactful process accounting for production of millions of cars annually. The work is in line with Bosch research of data-driven solutions that intends to replace the traditional way of destroying cars, which is extremely costly and produces waste. The paper tackles two very challenging questions simultaneously: how large the welding spot diameter is; and to which car body the welded spot belongs to. The problem setting is difficult for traditional ML because there exist a high number of car bodies that should be assigned as class labels. We formulate the problem as link prediction, and experimented popular KGE methods on real industry data, with consideration of literals. Our results reveal both limitations and promising aspects of adapted KGE methods.

  • paper_url: http://arxiv.org/abs/2308.01098
  • repo_url: None
  • paper_authors: Kun-Peng Ning, Ming Pang, Zheng Fang, Xue Jiang, Xi-Wei Zhao, Chang-Ping Peng, Zhan-Gang Lin, Jing-He Hu, Jing-Ping Shao
  • for: Improving the understanding of user intents in online search systems, especially under strict low-latency constraints.
  • methods: A knowledge condensation (KC) framework that distills knowledge from a deeper and more complex offline BERT model into the fast online FastText model to boost its classification performance.
  • results: Online A/B testing and experiments on multiple datasets show that the proposed method improves classification performance while keeping latency low.
    Abstract Search query classification, as an effective way to understand user intents, is of great importance in real-world online ads systems. To ensure a lower latency, a shallow model (e.g. FastText) is widely used for efficient online inference. However, the representation ability of the FastText model is insufficient, resulting in poor classification performance, especially on some low-frequency queries and tailed categories. Using a deeper and more complex model (e.g. BERT) is an effective solution, but it will cause a higher online inference latency and more expensive computing costs. Thus, how to juggle both inference efficiency and classification performance is obviously of great practical importance. To overcome this challenge, in this paper, we propose knowledge condensation (KC), a simple yet effective knowledge distillation framework to boost the classification performance of the online FastText model under strict low latency constraints. Specifically, we propose to train an offline BERT model to retrieve more potentially relevant data. Benefiting from its powerful semantic representation, more relevant labels not exposed in the historical data will be added into the training set for better FastText model training. Moreover, a novel distribution-diverse multi-expert learning strategy is proposed to further improve the mining ability of relevant data. By training multiple BERT models from different data distributions, it can respectively perform better at high, middle, and low-frequency search queries. The model ensemble from multi-distribution makes its retrieval ability more powerful. We have deployed two versions of this framework in JD search, and both offline experiments and online A/B testing from multiple datasets have validated the effectiveness of the proposed approach.

Scaling Data Science Solutions with Semantics and Machine Learning: Bosch Case

  • paper_url: http://arxiv.org/abs/2308.01094
  • repo_url: None
  • paper_authors: Baifan Zhou, Nikolay Nikolov, Zhuoxun Zheng, Xianghui Luo, Ognjen Savkovic, Dumitru Roman, Ahmet Soylu, Evgeny Kharlamov
  • for: This work addresses the big data challenges that Industry 4.0 and Internet of Things (IoT) technologies pose for cloud systems processing factory production data, in particular the growing number of users who are not cloud experts (such as data scientists and domain experts) who need to deploy their solutions on cloud systems but would take a long time to train.
  • methods: SemCloud, a semantics-enhanced cloud system that couples a cloud system with semantic technologies and machine learning. SemCloud relies on domain ontologies and mappings for data integration, parallelises semantic data integration and data analysis on distributed computing nodes, and adopts adaptive Datalog rules and machine learning for automated resource configuration so that non-cloud experts can use the system.
  • results: SemCloud was evaluated in an industrial use case with millions of data records, thousands of repeated runs, and domain users, showing promising results.
    Abstract Industry 4.0 and Internet of Things (IoT) technologies unlock unprecedented amount of data from factory production, posing big data challenges in volume and variety. In that context, distributed computing solutions such as cloud systems are leveraged to parallelise the data processing and reduce computation time. As the cloud systems become increasingly popular, there is increased demand that more users that were originally not cloud experts (such as data scientists, domain experts) deploy their solutions on the cloud systems. However, it is non-trivial to address both the high demand for cloud system users and the excessive time required to train them. To this end, we propose SemCloud, a semantics-enhanced cloud system, that couples cloud system with semantic technologies and machine learning. SemCloud relies on domain ontologies and mappings for data integration, and parallelises the semantic data integration and data analysis on distributed computing nodes. Furthermore, SemCloud adopts adaptive Datalog rules and machine learning for automated resource configuration, allowing non-cloud experts to use the cloud system. The system has been evaluated in industrial use case with millions of data, thousands of repeated runs, and domain users, showing promising results.

Hand tracking for clinical applications: validation of the Google MediaPipe Hand (GMH) and the depth-enhanced GMH-D frameworks

  • paper_url: http://arxiv.org/abs/2308.01088
  • repo_url: None
  • paper_authors: Gianluca Amprimo, Giulia Masi, Giuseppe Pettiti, Gabriella Olmo, Lorenzo Priano, Claudia Ferraris
  • for: Validating the hand-tracking framework implemented by Google MediaPipe Hand (GMH) and an innovative enhanced version, GMH-D.
  • methods: GMH is compared against GMH-D, an enhanced version that exploits the depth estimation of an RGB-Depth camera to achieve more accurate tracking of 3D hand movements, on three dynamic exercises commonly administered by clinicians: Hand Opening-Closing, Single Finger Tapping, and Multiple Finger Tapping.
  • results: Both frameworks show high temporal and spectral consistency with the gold standard, but GMH-D exhibits superior accuracy in spatial measurements compared to the baseline GMH, for both slow and fast movements.
    Abstract Accurate 3D tracking of hand and fingers movements poses significant challenges in computer vision. The potential applications span across multiple domains, including human-computer interaction, virtual reality, industry, and medicine. While gesture recognition has achieved remarkable accuracy, quantifying fine movements remains a hurdle, particularly in clinical applications where the assessment of hand dysfunctions and rehabilitation training outcomes necessitate precise measurements. Several novel and lightweight frameworks based on Deep Learning have emerged to address this issue; however, their performance in accurately and reliably measuring fingers movements requires validation against well-established gold standard systems. In this paper, the aim is to validate the handtracking framework implemented by Google MediaPipe Hand (GMH) and an innovative enhanced version, GMH-D, that exploits the depth estimation of an RGB-Depth camera to achieve more accurate tracking of 3D movements. Three dynamic exercises commonly administered by clinicians to assess hand dysfunctions, namely Hand Opening-Closing, Single Finger Tapping and Multiple Finger Tapping are considered. Results demonstrate high temporal and spectral consistency of both frameworks with the gold standard. However, the enhanced GMH-D framework exhibits superior accuracy in spatial measurements compared to the baseline GMH, for both slow and fast movements. Overall, our study contributes to the advancement of hand tracking technology, the establishment of a validation procedure as a good-practice to prove efficacy of deep-learning-based hand-tracking, and proves the effectiveness of GMH-D as a reliable framework for assessing 3D hand movements in clinical applications.
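A minimal sketch of the two tracking variants compared above: GMH uses MediaPipe Hands' own relative depth, while a GMH-D-style variant replaces it with the reading from an aligned depth frame. The depth lookup and placeholder frames here are simplifications; the real framework also handles camera intrinsics, synchronization, and filtering.

```python
# Sketch of GMH vs. a GMH-D-style variant: same 2D landmarks, different depth source.
# Assumes the legacy MediaPipe "solutions" API and an RGB frame aligned with a depth map.
import numpy as np
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

def track(rgb, depth_m=None):
    """Return 21 (x_px, y_px, z) keypoints; z from MediaPipe (GMH) or a depth map (GMH-D)."""
    h, w, _ = rgb.shape
    result = hands.process(rgb)
    if not result.multi_hand_landmarks:
        return None
    keypoints = []
    for lm in result.multi_hand_landmarks[0].landmark:
        u, v = int(lm.x * w), int(lm.y * h)
        if depth_m is None:
            z = lm.z                      # GMH: relative depth w.r.t. the wrist
        else:
            z = float(depth_m[np.clip(v, 0, h - 1), np.clip(u, 0, w - 1)])  # GMH-D style
        keypoints.append((u, v, z))
    return keypoints

frame = np.zeros((480, 640, 3), dtype=np.uint8)       # placeholder RGB frame
depth = np.full((480, 640), 0.6, dtype=np.float32)    # placeholder aligned depth (meters)
print(track(frame), track(frame, depth))
```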

Spatial Intelligence of a Self-driving Car and Rule-Based Decision Making

  • paper_url: http://arxiv.org/abs/2308.01085
  • repo_url: None
  • paper_authors: Stanislav Kikot
  • for: Achieving human-like behavior of a self-driving vehicle in complex traffic situations by combining rule-based decision making with traditional motion planning techniques.
  • methods: Rule-based decision making combined with traditional motion planning techniques, illustrated with examples of decision rules in autonomous driving.
  • results: Human-like self-driving behavior is achieved, and the examples illustrate that developing techniques for the spatial awareness of robots is a research direction deserving more attention from the spatial reasoning community.
    Abstract In this paper we show how rule-based decision making can be combined with traditional motion planning techniques to achieve human-like behavior of a self-driving vehicle in complex traffic situations. We give and discuss examples of decision rules in autonomous driving. We draw on these examples to illustrate that developing techniques for spatial awareness of robots is an exciting activity which deserves more attention from spatial reasoning community that it had received so far.

Graph Anomaly Detection at Group Level: A Topology Pattern Enhanced Unsupervised Approach

  • paper_url: http://arxiv.org/abs/2308.01063
  • repo_url: None
  • paper_authors: Xing Ai, Jialong Zhou, Yulin Zhu, Gaolei Li, Tomasz P. Michalak, Xiapu Luo, Kai Zhou
  • for: This paper proposes a novel unsupervised framework for a new task, Group-level Graph Anomaly Detection (Gr-GAD), to identify and localize anomaly groups within a graph.
  • methods: The framework employs a variant of Graph AutoEncoder (GAE) to locate anchor nodes that belong to potential anomaly groups by capturing long-range inconsistencies, followed by group sampling and Topology Pattern-based Graph Contrastive Learning (TPGCL), which uses the topology patterns of groups as clues to generate embeddings for each candidate group.
  • results: Experimental results on both real-world and synthetic datasets demonstrate the superior performance of the proposed framework in identifying and localizing anomaly groups, highlighting its potential for practical applications.
    Abstract Graph anomaly detection (GAD) has achieved success and has been widely applied in various domains, such as fraud detection, cybersecurity, finance security, and biochemistry. However, existing graph anomaly detection algorithms focus on distinguishing individual entities (nodes or graphs) and overlook the possibility of anomalous groups within the graph. To address this limitation, this paper introduces a novel unsupervised framework for a new task called Group-level Graph Anomaly Detection (Gr-GAD). The proposed framework first employs a variant of Graph AutoEncoder (GAE) to locate anchor nodes that belong to potential anomaly groups by capturing long-range inconsistencies. Subsequently, group sampling is employed to sample candidate groups, which are then fed into the proposed Topology Pattern-based Graph Contrastive Learning (TPGCL) method. TPGCL utilizes the topology patterns of groups as clues to generate embeddings for each candidate group and thus distinct anomaly groups. The experimental results on both real-world and synthetic datasets demonstrate that the proposed framework shows superior performance in identifying and localizing anomaly groups, highlighting it as a promising solution for Gr-GAD. Datasets and codes of the proposed framework are at the github repository https://anonymous.4open.science/r/Topology-Pattern-Enhanced-Unsupervised-Group-level-Graph-Anomaly-Detection.

A Counterfactual Safety Margin Perspective on the Scoring of Autonomous Vehicles’ Riskiness

  • paper_url: http://arxiv.org/abs/2308.01050
  • repo_url: None
  • paper_authors: Alessandro Zanardi, Andrea Censi, Margherita Atzei, Luigi Di Lillo, Emilio Frazzoli
  • for: This paper aims to assess the risk of Autonomous Vehicles (AVs) so that their safety can be evaluated across different operational design domains (ODDs).
  • methods: A data-driven framework for comparing the risk of different AVs' behaviors in various ODDs, based on counterfactual simulations of "misbehaving" road users; the counterfactual safety margin represents the minimum deviation from normal behavior that could lead to a collision.
  • results: Experiments show that the proposed method can assess AV riskiness and the relative safety of different AV providers, and it remains applicable even when the AV's behavioral policy is unknown (through worst- and best-case analyses), making it useful for third-party risk assessors.
    Abstract Autonomous Vehicles (AVs) have the potential to provide numerous societal benefits, such as decreased road accidents and increased overall transportation efficiency. However, quantifying the risk associated with AVs is challenging due to the lack of historical data and the rapidly evolving technology. This paper presents a data-driven framework for comparing the risk of different AVs' behaviors in various operational design domains (ODDs), based on counterfactual simulations of "misbehaving" road users. We introduce the concept of counterfactual safety margin, which represents the minimum deviation from normal behavior that could lead to a collision. This concept helps to find the most critical scenarios but also to assess the frequency and severity of risk of AVs. We show that the proposed methodology is applicable even when the AV's behavioral policy is unknown -- through worst- and best-case analyses -- making the method useful also to external third-party risk assessors. Our experimental results demonstrate the correlation between the safety margin, the driving policy quality, and the ODD shedding light on the relative risk associated with different AV providers. This work contributes to AV safety assessment and aids in addressing legislative and insurance concerns surrounding this emerging technology.
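A schematic sketch of the counterfactual safety margin idea above: scale up a "misbehavior" of another road user until the smallest deviation that leads to a collision in simulation is found. The `simulate_collision` function is a hypothetical stand-in for a counterfactual simulator, and the bisection assumes collisions are monotone in the deviation magnitude, which is a simplification.

```python
import numpy as np

def simulate_collision(av_policy, deviation: float) -> bool:
    """Hypothetical: roll out the scene with another agent deviating by `deviation`
    from nominal behavior and report whether a collision occurs."""
    raise NotImplementedError("plug in a counterfactual traffic simulator here")

def counterfactual_safety_margin(av_policy, max_deviation=5.0, tol=0.05) -> float:
    """Smallest deviation that produces a collision, found by bisection."""
    lo, hi = 0.0, max_deviation
    if not simulate_collision(av_policy, hi):
        return np.inf                      # no collision even for the largest deviation tried
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if simulate_collision(av_policy, mid):
            hi = mid
        else:
            lo = mid
    return hi
```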

Chat Translation Error Detection for Assisting Cross-lingual Communications

  • paper_url: http://arxiv.org/abs/2308.01044
  • repo_url: https://github.com/cl-tohoku/bpersona-chat
  • paper_authors: Yunmeng Li, Jun Suzuki, Makoto Morishita, Kaori Abe, Ryoko Tokuhisa, Ana Brassard, Kentaro Inui
  • for: This work develops a communication support system that detects erroneous machine translations to facilitate cross-lingual communication.
  • methods: An error detector is trained as the baseline of the system, together with a new Japanese-English bilingual chat corpus, BPersona-chat, which comprises multi-turn colloquial chats augmented with crowdsourced quality ratings.
  • results: The error detector can serve as an encouraging foundation for more advanced erroneous translation detection systems.
    Abstract In this paper, we describe the development of a communication support system that detects erroneous translations to facilitate crosslingual communications due to the limitations of current machine chat translation methods. We trained an error detector as the baseline of the system and constructed a new Japanese-English bilingual chat corpus, BPersona-chat, which comprises multiturn colloquial chats augmented with crowdsourced quality ratings. The error detector can serve as an encouraging foundation for more advanced erroneous translation detection systems.

Three Factors to Improve Out-of-Distribution Detection

  • paper_url: http://arxiv.org/abs/2308.01030
  • repo_url: None
  • paper_authors: Hyunjun Choi, JaeHo Chung, Hawook Jeong, Jin Young Choi
  • for: Improving out-of-distribution (OOD) detection when auxiliary data is used as outlier data for fine-tuning.
  • methods: Three contributions address the trade-off between OOD detection and classification accuracy: (i) incorporating a self-knowledge distillation loss enhances the accuracy of the network; (ii) sampling semi-hard outlier data for training improves OOD detection performance with minimal impact on accuracy; (iii) a novel supervised contrastive learning simultaneously improves OOD detection performance and the accuracy of the network.
  • results: By combining all three factors, the method improves both classification accuracy and OOD detection performance, resolving the previous trade-off and improving over prior approaches on both performance metrics.
    Abstract In the problem of out-of-distribution (OOD) detection, the usage of auxiliary data as outlier data for fine-tuning has demonstrated encouraging performance. However, previous methods have suffered from a trade-off between classification accuracy (ACC) and OOD detection performance (AUROC, FPR, AUPR). To improve this trade-off, we make three contributions: (i) Incorporating a self-knowledge distillation loss can enhance the accuracy of the network; (ii) Sampling semi-hard outlier data for training can improve OOD detection performance with minimal impact on accuracy; (iii) The introduction of our novel supervised contrastive learning can simultaneously improve OOD detection performance and the accuracy of the network. By incorporating all three factors, our approach enhances both accuracy and OOD detection performance by addressing the trade-off between classification and OOD detection. Our method achieves improvements over previous approaches in both performance metrics.
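A rough sketch of the semi-hard outlier sampling idea mentioned above: score auxiliary outliers with the current network and keep the middle band rather than the easiest or hardest ones. The scoring function (maximum softmax probability) and the quantile band are assumptions, not the paper's exact criterion.

```python
import torch

def semi_hard_outlier_indices(model, outlier_batch, low_q=0.25, high_q=0.75):
    """Select auxiliary outliers whose difficulty is neither trivial nor extreme."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(outlier_batch), dim=1)
        difficulty = probs.max(dim=1).values      # high = model is (wrongly) confident
    lo, hi = difficulty.quantile(low_q), difficulty.quantile(high_q)
    keep = (difficulty >= lo) & (difficulty <= hi)
    return keep.nonzero(as_tuple=True)[0]

# Usage: filter an auxiliary outlier batch before adding it to the fine-tuning loss.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
outliers = torch.randn(64, 3, 32, 32)
print(semi_hard_outlier_indices(model, outliers).shape)
```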

Enhancing Representation Learning for Periodic Time Series with Floss: A Frequency Domain Regularization Approach

  • paper_url: http://arxiv.org/abs/2308.01011
  • repo_url: https://github.com/agustdd/floss
  • paper_authors: Chunwei Yang, Xiaoxu Chen, Lijun Sun, Hongyu Yang, Yuankai Wu
  • for: This paper proposes an unsupervised method that regularizes learned representations in the frequency domain, improving the performance of deep learning models for time series analysis.
  • methods: Floss automatically detects major periodicities in the time series and employs periodic shift and spectral density similarity measures to learn representations with periodic consistency; it can be easily incorporated into supervised, semi-supervised, and unsupervised learning frameworks.
  • results: Extensive experiments on time series classification, forecasting, and anomaly detection tasks show that Floss automatically discovers periodic dynamics and improves state-of-the-art deep learning models, and it can be integrated with a variety of representative deep learning solutions.
    Abstract Time series analysis is a fundamental task in various application domains, and deep learning approaches have demonstrated remarkable performance in this area. However, many real-world time series data exhibit significant periodic or quasi-periodic dynamics that are often not adequately captured by existing deep learning-based solutions. This results in an incomplete representation of the underlying dynamic behaviors of interest. To address this gap, we propose an unsupervised method called Floss that automatically regularizes learned representations in the frequency domain. The Floss method first automatically detects major periodicities from the time series. It then employs periodic shift and spectral density similarity measures to learn meaningful representations with periodic consistency. In addition, Floss can be easily incorporated into both supervised, semi-supervised, and unsupervised learning frameworks. We conduct extensive experiments on common time series classification, forecasting, and anomaly detection tasks to demonstrate the effectiveness of Floss. We incorporate Floss into several representative deep learning solutions to justify our design choices and demonstrate that it is capable of automatically discovering periodic dynamics and improving state-of-the-art deep learning models.
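A minimal sketch of the two ingredients described above: detecting a dominant period from the amplitude spectrum, and penalizing the spectral-density mismatch between the encoding of the original series and the encoding of a periodically shifted copy. The loss definition here is a simplification for illustration, not the paper's exact objective.

```python
import torch

def dominant_period(x: torch.Tensor) -> int:
    """Estimate the main periodicity of a 1-D series from its FFT amplitude spectrum."""
    amp = torch.fft.rfft(x - x.mean()).abs()
    amp[0] = 0.0                                # ignore the DC component
    freq = int(torch.argmax(amp))
    return max(1, x.numel() // max(freq, 1))

def spectral_consistency_loss(encode, x: torch.Tensor) -> torch.Tensor:
    """Compare spectral densities of the encodings of x and a period-shifted x."""
    period = dominant_period(x)
    shifted = torch.roll(x, shifts=period)      # a full-period shift of a periodic signal
    z, z_shift = encode(x), encode(shifted)
    psd = lambda v: torch.fft.rfft(v).abs() ** 2
    return torch.mean((psd(z) - psd(z_shift)) ** 2)

t = torch.arange(512, dtype=torch.float32)
series = torch.sin(2 * torch.pi * t / 64) + 0.1 * torch.randn(512)
print(dominant_period(series))                  # ~64 for this toy signal
print(spectral_consistency_loss(lambda v: v, series))
```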

FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving

  • paper_url: http://arxiv.org/abs/2308.01006
  • repo_url: https://github.com/westlake-autolab/fusionad
  • paper_authors: Tengju Ye, Wei Jing, Chunyong Hu, Shikun Huang, Lingping Gao, Fangzhen Li, Jingke Wang, Ke Guo, Wencong Xiao, Weibo Mao, Hang Zheng, Kun Li, Junbo Chen, Kaicheng Yu
  • for: This paper proposes a unified multi-sensor fusion network for the prediction and planning tasks of autonomous driving, toward accurate and robust performance.
  • methods: A transformer-based multi-modality fusion network produces fusion-based features from camera and LiDAR; in contrast to the camera-based end-to-end method UniAD, fusion-aided modality-aware prediction and status-aware planning modules (FMSPnP) take advantage of the multi-modality features.
  • results: Extensive experiments on the nuScenes dataset show that FusionAD achieves state-of-the-art performance, surpassing baselines by an average of 15% on perception tasks such as detection and tracking and 10% on occupancy prediction accuracy, reducing the ADE score from 0.708 to 0.389 and the collision rate from 0.31% to 0.12%.
    Abstract Building a multi-modality multi-task neural network toward accurate and robust performance is a de-facto standard in perception task of autonomous driving. However, leveraging such data from multiple sensors to jointly optimize the prediction and planning tasks remains largely unexplored. In this paper, we present FusionAD, to the best of our knowledge, the first unified framework that fuse the information from two most critical sensors, camera and LiDAR, goes beyond perception task. Concretely, we first build a transformer based multi-modality fusion network to effectively produce fusion based features. In constrast to camera-based end-to-end method UniAD, we then establish a fusion aided modality-aware prediction and status-aware planning modules, dubbed FMSPnP that take advantages of multi-modality features. We conduct extensive experiments on commonly used benchmark nuScenes dataset, our FusionAD achieves state-of-the-art performance and surpassing baselines on average 15% on perception tasks like detection and tracking, 10% on occupancy prediction accuracy, reducing prediction error from 0.708 to 0.389 in ADE score and reduces the collision rate from 0.31% to only 0.12%.

Wasserstein Diversity-Enriched Regularizer for Hierarchical Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.00989
  • repo_url: None
  • paper_authors: Haorui Li, Jiaqi Liang, Linjing Li, Daniel Zeng
  • for: Hierarchical reinforcement learning composes subpolicies at different hierarchies to accomplish complex tasks.
  • methods: Automatically discovered subpolicies are regularized with the Wasserstein Diversity-Enriched Regularizer (WDER), which enlarges the diversity of subpolicies by maximizing the Wasserstein distances among their action distributions.
  • results: Experiments show that WDER improves performance and sample efficiency compared with prior work without modifying hyperparameters.
    Abstract Hierarchical reinforcement learning composites subpolicies in different hierarchies to accomplish complex tasks.Automated subpolicies discovery, which does not depend on domain knowledge, is a promising approach to generating subpolicies.However, the degradation problem is a challenge that existing methods can hardly deal with due to the lack of consideration of diversity or the employment of weak regularizers. In this paper, we propose a novel task-agnostic regularizer called the Wasserstein Diversity-Enriched Regularizer (WDER), which enlarges the diversity of subpolicies by maximizing the Wasserstein distances among action distributions. The proposed WDER can be easily incorporated into the loss function of existing methods to boost their performance further.Experimental results demonstrate that our WDER improves performance and sample efficiency in comparison with prior work without modifying hyperparameters, which indicates the applicability and robustness of the WDER.
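A small sketch of the diversity term described above: sum pairwise Wasserstein distances between the action distributions of the subpolicies and subtract a weighted version of that sum from the loss, so maximizing diversity becomes part of training. Discrete action distributions and SciPy's 1-D Wasserstein distance are used here for simplicity; the paper's formulation may differ.

```python
import numpy as np
from itertools import combinations
from scipy.stats import wasserstein_distance

def diversity_bonus(action_dists) -> float:
    """Sum of pairwise 1-D Wasserstein distances between subpolicy action distributions."""
    support = np.arange(len(action_dists[0]))     # discrete action indices
    return sum(
        wasserstein_distance(support, support, u_weights=p, v_weights=q)
        for p, q in combinations(action_dists, 2)
    )

# Three subpolicies over 4 discrete actions; more separated distributions score higher.
subpolicies = [np.array([0.7, 0.2, 0.05, 0.05]),
               np.array([0.1, 0.1, 0.7, 0.1]),
               np.array([0.25, 0.25, 0.25, 0.25])]
bonus = diversity_bonus(subpolicies)
total_loss = 1.0 - 0.1 * bonus   # e.g., policy loss minus a weighted diversity bonus
print(bonus, total_loss)
```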

Knowledge-aware Collaborative Filtering with Pre-trained Language Model for Personalized Review-based Rating Prediction

  • paper_url: http://arxiv.org/abs/2308.02555
  • repo_url: https://github.com/wqxleo/kcf-plm
  • paper_authors: Quanxiu Wang, Xinlei Cao, Jianyong Wang, Wei Zhang
  • for: This paper studies how to leverage existing reviews to model user interests and item characteristics for personalized review-based rating prediction.
  • methods: Knowledge-aware Collaborative Filtering with Pre-trained Language Model (KCF-PLM) models the interactions of the extracted aspects for each user-item pair with a transformer network and represents users and items by feeding all their historical reviews into pre-trained language models; the two components are integrated through representation propagation on the knowledge graph and user-item guided attention over the aspect representations.
  • results: Comprehensive experiments on several public datasets demonstrate that KCF-PLM predicts review-based ratings more accurately than existing approaches.
    Abstract Personalized review-based rating prediction aims at leveraging existing reviews to model user interests and item characteristics for rating prediction. Most of the existing studies mainly encounter two issues. First, the rich knowledge contained in the fine-grained aspects of each review and the knowledge graph is rarely considered to complement the pure text for better modeling user-item interactions. Second, the power of pre-trained language models is not carefully studied for personalized review-based rating prediction. To address these issues, we propose an approach named Knowledge-aware Collaborative Filtering with Pre-trained Language Model (KCF-PLM). For the first issue, to utilize rich knowledge, KCF-PLM develops a transformer network to model the interactions of the extracted aspects w.r.t. a user-item pair. For the second issue, to better represent users and items, KCF-PLM takes all the historical reviews of a user or an item as input to pre-trained language models. Moreover, KCF-PLM integrates the transformer network and the pre-trained language models through representation propagation on the knowledge graph and user-item guided attention of the aspect representations. Thus KCF-PLM combines review text, aspect, knowledge graph, and pre-trained language models together for review-based rating prediction. We conduct comprehensive experiments on several public datasets, demonstrating the effectiveness of KCF-PLM.

Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks

  • paper_url: http://arxiv.org/abs/2308.00958
  • repo_url: https://github.com/dig-beihang/ini-model-stealing-defense
  • paper_authors: Jun Guo, Aishan Liu, Xingyu Zheng, Siyuan Liang, Yisong Xiao, Yichao Wu, Xianglong Liu
  • for: This paper proposes a novel training framework named "Isolation and Induction" (InI) to defend machine learning models against model stealing attacks.
  • methods: InI isolates the adversary's training gradient from the expected gradient, avoiding auxiliary defense modules and their extra inference cost, and induces the model to produce uninformative outputs against stealing queries so that the adversary extracts little useful knowledge while benign accuracy is largely preserved.
  • results: Experiments show that InI defends models effectively, reducing stealing accuracy by up to 48% while running up to 25.4x faster than other state-of-the-art defenses.
    Abstract Despite the broad application of Machine Learning models as a Service (MLaaS), they are vulnerable to model stealing attacks. These attacks can replicate the model functionality by using the black-box query process without any prior knowledge of the target victim model. Existing stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers. However, these defenses are now suffering problems of high inference computational overheads and unfavorable trade-offs between benign accuracy and stealing robustness, which challenges the feasibility of deployed models in practice. To address the problems, this paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses. Instead of deploying auxiliary defense modules that introduce redundant inference time, InI directly trains a defensive model by isolating the adversary's training gradient from the expected gradient, which can effectively reduce the inference computational cost. In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries, which can induce the adversary to extract little useful knowledge from victim models with minimal impact on the benign performance. Extensive experiments on several visual classification datasets (e.g., MNIST and CIFAR10) demonstrate the superior robustness (up to 48% reduction on stealing accuracy) and speed (up to 25.4x faster) of our InI over other state-of-the-art methods. Our codes can be found in https://github.com/DIG-Beihang/InI-Model-Stealing-Defense.

From Sparse to Soft Mixtures of Experts

  • paper_url: http://arxiv.org/abs/2308.00951
  • repo_url: https://github.com/google-research/vmoe
  • paper_authors: Joan Puigcerver, Carlos Riquelme, Basil Mustafa, Neil Houlsby
  • for: This paper proposes Soft MoE, a fully-differentiable sparse Transformer built on the sparse mixture-of-experts (MoE) idea, addressing the training instability, token dropping, limited expert scaling, and ineffective finetuning that challenge MoE models.
  • methods: Soft MoE performs an implicit soft assignment by passing different weighted combinations of all input tokens to each expert, retaining the benefits of MoEs while avoiding many of their problems.
  • results: On visual recognition, Soft MoE outperforms standard Transformers (ViTs) and popular MoE variants (Tokens Choice and Experts Choice). For example, Soft MoE-Base/16 matches the performance of ViT-Huge/14 at 10.5x lower inference cost (5.7x lower wall-clock time), and the approach scales well: Soft MoE Huge/14 with 128 experts in 16 MoE layers has over 40x more parameters than ViT Huge/14 while inference time grows by only 2%, and it performs substantially better.
    Abstract Sparse mixture of expert architectures (MoEs) scale model capacity without large increases in training or inference costs. Despite their success, MoEs suffer from a number of issues: training instability, token dropping, inability to scale the number of experts, or ineffective finetuning. In this work, we propose Soft MoE, a fully-differentiable sparse Transformer that addresses these challenges, while maintaining the benefits of MoEs. Soft MoE performs an implicit soft assignment by passing different weighted combinations of all input tokens to each expert. As in other MoE works, experts in Soft MoE only process a subset of the (combined) tokens, enabling larger model capacity at lower inference cost. In the context of visual recognition, Soft MoE greatly outperforms standard Transformers (ViTs) and popular MoE variants (Tokens Choice and Experts Choice). For example, Soft MoE-Base/16 requires 10.5x lower inference cost (5.7x lower wall-clock time) than ViT-Huge/14 while matching its performance after similar training. Soft MoE also scales well: Soft MoE Huge/14 with 128 experts in 16 MoE layers has over 40x more parameters than ViT Huge/14, while inference time cost grows by only 2%, and it performs substantially better.
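A minimal NumPy sketch of the soft-assignment idea described above: each expert slot receives a softmax-weighted combination of all input tokens (dispatch), the experts process their slots, and each output token is a softmax-weighted combination of slot outputs (combine). Shapes, the identity "experts", and parameter names are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe_layer(X, Phi, experts):
    """X: (n_tokens, d) inputs; Phi: (d, n_slots) slot parameters;
    experts: list of callables, one per slot."""
    logits = X @ Phi                       # (n_tokens, n_slots)
    D = softmax(logits, axis=0)            # dispatch: normalize over tokens
    C = softmax(logits, axis=1)            # combine: normalize over slots
    slots = D.T @ X                        # (n_slots, d) weighted token mixes
    slot_out = np.stack([experts[j](slots[j]) for j in range(slots.shape[0])])
    return C @ slot_out                    # (n_tokens, d) outputs

# Example with identity "experts":
X = np.random.randn(16, 8)
Phi = np.random.randn(8, 4)
Y = soft_moe_layer(X, Phi, [lambda s: s] * 4)
```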

Teaching Smaller Language Models To Generalise To Unseen Compositional Questions

  • paper_url: http://arxiv.org/abs/2308.00946
  • repo_url: https://github.com/timhartill/unseen_questions
  • paper_authors: Tim Hartill, Neset Tan, Michael Witbrock, Patricia J. Riddle
  • for: This work equips smaller language models to answer challenging compositional questions that were not seen during training.
  • methods: Multitask supervised pretraining on up to 93 tasks designed to instill diverse reasoning abilities, combined with a dense retrieval system that retrieves sets of evidential paragraph fragments.
  • results: Strong baselines are established on diverse evaluation datasets (StrategyQA, CommonsenseQA, IIRC, DROP, Musique, and ARC-DA), and performance improves significantly when retrieval-augmented training datasets expose the models to heuristic reasoning strategies such as weighing partial evidence or ignoring an irrelevant context.
    Abstract We equip a smaller Language Model to generalise to answering challenging compositional questions that have not been seen in training. To do so we propose a combination of multitask supervised pretraining on up to 93 tasks designed to instill diverse reasoning abilities, and a dense retrieval system that aims to retrieve a set of evidential paragraph fragments. Recent progress in question-answering has been achieved either through prompting methods against very large pretrained Language Models in zero or few-shot fashion, or by fine-tuning smaller models, sometimes in conjunction with information retrieval. We focus on the less explored question of the extent to which zero-shot generalisation can be enabled in smaller models with retrieval against a corpus within which sufficient information to answer a particular question may not exist. We establish strong baselines in this setting for diverse evaluation datasets (StrategyQA, CommonsenseQA, IIRC, DROP, Musique and ARC-DA), and show that performance can be significantly improved by adding retrieval-augmented training datasets which are designed to expose our models to a variety of heuristic reasoning strategies such as weighing partial evidence or ignoring an irrelevant context.
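A minimal sketch of the dense-retrieval step described above, assuming the sentence-transformers library: score candidate paragraph fragments against a question by cosine similarity of dense embeddings. The model name and toy corpus are placeholders, not the authors' setup.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical corpus of paragraph fragments; the paper uses a large corpus.
corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "Paris is the capital of France.",
    "Mount Everest is the highest mountain on Earth.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")   # any dense encoder works
doc_emb = model.encode(corpus, normalize_embeddings=True)

def retrieve(question, k=2):
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = doc_emb @ q                      # cosine similarity on normalized vectors
    top = np.argsort(-scores)[:k]
    return [(corpus[i], float(scores[i])) for i in top]

print(retrieve("In which country is the Eiffel Tower?"))
```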

Feature-aware conditional GAN for category text generation

  • paper_url: http://arxiv.org/abs/2308.00939
  • repo_url: None
  • paper_authors: Xinze Li, Kezhi Mao, Fanfan Lin, Zijian Feng
  • for: This paper proposes a new text generation framework, the feature-aware conditional GAN (FA-GAN), for controllable category text generation.
  • methods: FA-GAN uses a sequence-to-sequence generator consisting of three encoders (including a special feature-aware encoder and a category-aware encoder) and a relational-memory-core-based decoder with the Gumbel SoftMax activation, and supplements adversarial training with a multi-class classification loss.
  • results: FA-GAN consistently outperforms 10 state-of-the-art text generation methods on 6 text classification datasets, and a case study shows that the synthetic sentences match the required categories with good readability, fluency, and text authenticity.
    Abstract Category text generation receives considerable attentions since it is beneficial for various natural language processing tasks. Recently, the generative adversarial network (GAN) has attained promising performance in text generation, attributed to its adversarial training process. However, there are several issues in text GANs, including discreteness, training instability, mode collapse, lack of diversity and controllability etc. To address these issues, this paper proposes a novel GAN framework, the feature-aware conditional GAN (FA-GAN), for controllable category text generation. In FA-GAN, the generator has a sequence-to-sequence structure for improving sentence diversity, which consists of three encoders including a special feature-aware encoder and a category-aware encoder, and one relational-memory-core-based decoder with the Gumbel SoftMax activation function. The discriminator has an additional category classification head. To generate sentences with specified categories, the multi-class classification loss is supplemented in the adversarial training. Comprehensive experiments have been conducted, and the results show that FA-GAN consistently outperforms 10 state-of-the-art text generation approaches on 6 text classification datasets. The case study demonstrates that the synthetic sentences generated by FA-GAN can match the required categories and are aware of the features of conditioned sentences, with good readability, fluency, and text authenticity.
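The Gumbel SoftMax activation mentioned above lets a GAN generator emit (approximately) discrete tokens while keeping gradients flowing to the generator. Below is a minimal, generic PyTorch sketch of that trick, not the paper's code.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0, hard=False):
    """Differentiable (approximate) sampling from a categorical distribution
    over the vocabulary, useful when a GAN generator must output discrete
    tokens yet still receive gradients from the discriminator."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    y = F.softmax((logits + gumbel) / tau, dim=-1)
    if hard:
        # Straight-through: forward pass is one-hot, backward pass uses soft y.
        index = y.argmax(dim=-1, keepdim=True)
        y_hard = torch.zeros_like(y).scatter_(-1, index, 1.0)
        y = (y_hard - y).detach() + y
    return y

# logits = decoder_step(...)                      # (batch, vocab_size), hypothetical
# token_probs = gumbel_softmax_sample(logits, tau=0.5, hard=True)
```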

LEMMA: Learning Language-Conditioned Multi-Robot Manipulation

  • paper_url: http://arxiv.org/abs/2308.00937
  • repo_url: None
  • paper_authors: Ran Gong, Xiaofeng Gao, Qiaozi Gao, Suhaila Shakiah, Govind Thattai, Gaurav S. Sukhatme
  • for: The paper introduces LEMMA, a benchmark for developing future language-conditioned multi-robot systems, focused on task allocation and long-horizon object manipulation from human language instructions in a tabletop setting.
  • methods: A modular hierarchical planning approach is proposed as a baseline; it identifies each manipulator's limitations and assigns sub-tasks accordingly while handling strong temporal dependencies within each task.
  • results: With 800 expert demonstrations and human instructions per task for training and evaluation, the results highlight the potential of LEMMA for developing future language-conditioned multi-robot systems.
    Abstract Complex manipulation tasks often require robots with complementary capabilities to collaborate. We introduce a benchmark for LanguagE-Conditioned Multi-robot MAnipulation (LEMMA) focused on task allocation and long-horizon object manipulation based on human language instructions in a tabletop setting. LEMMA features 8 types of procedurally generated tasks with varying degree of complexity, some of which require the robots to use tools and pass tools to each other. For each task, we provide 800 expert demonstrations and human instructions for training and evaluations. LEMMA poses greater challenges compared to existing benchmarks, as it requires the system to identify each manipulator's limitations and assign sub-tasks accordingly while also handling strong temporal dependencies in each task. To address these challenges, we propose a modular hierarchical planning approach as a baseline. Our results highlight the potential of LEMMA for developing future language-conditioned multi-robot systems.

Particle swarm optimization with state-based adaptive velocity limit strategy

  • paper_url: http://arxiv.org/abs/2308.00936
  • repo_url: None
  • paper_authors: Xinze Li, Kezhi Mao, Fanfan Lin, Xin Zhang
  • for: The paper proposes a novel particle swarm optimization (PSO) variant with a state-based adaptive velocity limit (SAVL) strategy to improve PSO performance on optimization problems.
  • methods: PSO-SAVL uses evolutionary state estimation (ESE) to adaptively adjust the velocity limit to the particles' current searching state (a high limit for global search, a low limit for local search), and modifies the limit-handling strategy to better avoid local optima.
  • results: PSO-SAVL is experimentally validated on a wide range of 50-dimensional benchmark functions, shows satisfactory scalability to high-dimensional and large-scale problems, and the merits of its strategies are demonstrated experimentally.
    Abstract Velocity limit (VL) has been widely adopted in many variants of particle swarm optimization (PSO) to prevent particles from searching outside the solution space. Several adaptive VL strategies have been introduced with which the performance of PSO can be improved. However, the existing adaptive VL strategies simply adjust their VL based on iterations, leading to unsatisfactory optimization results because of the incompatibility between VL and the current searching state of particles. To deal with this problem, a novel PSO variant with state-based adaptive velocity limit strategy (PSO-SAVL) is proposed. In the proposed PSO-SAVL, VL is adaptively adjusted based on the evolutionary state estimation (ESE) in which a high value of VL is set for global searching state and a low value of VL is set for local searching state. Besides that, limit handling strategies have been modified and adopted to improve the capability of avoiding local optima. The good performance of PSO-SAVL has been experimentally validated on a wide range of benchmark functions with 50 dimensions. The satisfactory scalability of PSO-SAVL in high-dimension and large-scale problems is also verified. Besides, the merits of the strategies in PSO-SAVL are verified in experiments. Sensitivity analysis for the relevant hyper-parameters in state-based adaptive VL strategy is conducted, and insights in how to select these hyper-parameters are also discussed.
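A minimal sketch of the state-based adaptive velocity-limit idea: a standard PSO velocity update whose clipping bound is scaled by an estimate of how "global" the current search state is. The dispersion heuristic below is a crude stand-in for the paper's evolutionary state estimation; all names and constants are illustrative.

```python
import numpy as np

def adaptive_velocity_limit(positions, vl_min, vl_max):
    """Crude stand-in for evolutionary state estimation: when the swarm is
    spread out (global search) allow a high velocity limit; when it has
    contracted (local search) shrink the limit."""
    spread = positions.std(axis=0).mean()
    state = np.clip(spread / (np.ptp(positions) + 1e-12), 0.0, 1.0)
    return vl_min + state * (vl_max - vl_min)

def pso_step(pos, vel, pbest, gbest, vl, w=0.7, c1=1.5, c2=1.5):
    r1, r2 = np.random.rand(*pos.shape), np.random.rand(*pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    vel = np.clip(vel, -vl, vl)            # state-adaptive velocity limit
    return pos + vel, vel
```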

Physics-informed neural networks for blood flow inverse problems

  • paper_url: http://arxiv.org/abs/2308.00927
  • repo_url: https://github.com/yeyemedicen/pinns-wk-mri
  • paper_authors: Jeremias Garay, Jocelyn Dunstan, Sergio Uribe, Francisco Sahli Costabal
  • for: Solving inverse problems in hemodynamics, where complete system information is unavailable and high-quality blood flow measurements are hard to obtain.
  • methods: Physics-informed neural networks (PINNs) are used to estimate reduced-order model parameters and the full velocity field from scattered, noisy 2D measurements in the ascending aorta.
  • results: With simulated data the parameter estimates are stable and accurate, while the quality of the velocity reconstruction depends on the measurement quality and the complexity of the flow pattern.
    Abstract Physics-informed neural networks (PINNs) have emerged as a powerful tool for solving inverse problems, especially in cases where no complete information about the system is known and scatter measurements are available. This is especially useful in hemodynamics since the boundary information is often difficult to model, and high-quality blood flow measurements are generally hard to obtain. In this work, we use the PINNs methodology for estimating reduced-order model parameters and the full velocity field from scatter 2D noisy measurements in the ascending aorta. The results show stable and accurate parameter estimations when using the method with simulated data, while the velocity reconstruction shows dependence on the measurement quality and the flow pattern complexity. The method allows for solving clinical-relevant inverse problems in hemodynamics and complex coupled physical systems.
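A generic PINN loss sketch in PyTorch: the network maps space-time coordinates to a velocity field, and the loss combines a data-misfit term on scattered measurements with a physics residual obtained via automatic differentiation. The toy residual below enforces a 1D advection-type constraint rather than the flow model used in the paper; everything here is illustrative.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))              # maps (x, t) -> u

def physics_residual(xt, c=1.0):
    xt = xt.clone().requires_grad_(True)
    u = net(xt)
    grads = torch.autograd.grad(u, xt, torch.ones_like(u), create_graph=True)[0]
    u_x, u_t = grads[:, 0:1], grads[:, 1:2]
    return u_t + c * u_x                           # toy PDE: u_t + c * u_x = 0

def pinn_loss(xt_data, u_data, xt_colloc, lam=1.0):
    data_term = ((net(xt_data) - u_data) ** 2).mean()       # scattered, noisy data
    physics_term = (physics_residual(xt_colloc) ** 2).mean()
    return data_term + lam * physics_term
```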

VLUCI: Variational Learning of Unobserved Confounders for Counterfactual Inference

  • paper_url: http://arxiv.org/abs/2308.00904
  • repo_url: None
  • paper_authors: Yonghe Zhao, Qiang Huang, Siwei Wu, Yun Peng, Huiyan Sun
  • for: The challenge of de-confounding and counterfactual prediction in observational data, particularly in the presence of unobserved confounders; the paper proposes a variational learning model of unobserved confounders for counterfactual inference (VLUCI).
  • methods: VLUCI relaxes the unconfoundedness assumption commonly made in causal inference, disentangles observed and unobserved confounders, and uses a doubly variational inference model to approximate the posterior distribution of unobserved confounders, which is then used to infer more accurate counterfactual outcomes.
  • results: Extensive experiments on synthetic and semi-synthetic datasets show that VLUCI infers unobserved confounders better than state-of-the-art counterfactual inference models, improves accuracy at both group and individual levels, and provides confidence intervals for counterfactual outcomes that can aid decision-making in risk-sensitive domains.
    Abstract Causal inference plays a vital role in diverse domains like epidemiology, healthcare, and economics. De-confounding and counterfactual prediction in observational data has emerged as a prominent concern in causal inference research. While existing models tackle observed confounders, the presence of unobserved confounders remains a significant challenge, distorting causal inference and impacting counterfactual outcome accuracy. To address this, we propose a novel variational learning model of unobserved confounders for counterfactual inference (VLUCI), which generates the posterior distribution of unobserved confounders. VLUCI relaxes the unconfoundedness assumption often overlooked by most causal inference methods. By disentangling observed and unobserved confounders, VLUCI constructs a doubly variational inference model to approximate the distribution of unobserved confounders, which are used for inferring more accurate counterfactual outcomes. Extensive experiments on synthetic and semi-synthetic datasets demonstrate VLUCI's superior performance in inferring unobserved confounders. It is compatible with state-of-the-art counterfactual inference models, significantly improving inference accuracy at both group and individual levels. Additionally, VLUCI provides confidence intervals for counterfactual outcomes, aiding decision-making in risk-sensitive domains. We further clarify the considerations when applying VLUCI to cases where unobserved confounders don't strictly conform to our model assumptions using the public IHDP dataset as an example, highlighting the practical advantages of VLUCI.
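A minimal sketch of the general idea of inferring a latent (unobserved) confounder variationally: an encoder produces q(z | x, t, y) via the reparameterization trick and a decoder reconstructs the outcome from (x, z, t), trained with a reconstruction-plus-KL objective. This is a generic CEVAE-style stand-in, not VLUCI's doubly variational architecture.

```python
import torch
import torch.nn as nn

class LatentConfounderModel(nn.Module):
    def __init__(self, x_dim, z_dim=8, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim + 2, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 2 * z_dim))   # -> mu, logvar
        self.outcome = nn.Sequential(nn.Linear(x_dim + z_dim + 1, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 1))

    def forward(self, x, t, y):
        mu, logvar = self.encoder(torch.cat([x, t, y], dim=1)).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()          # reparameterize
        y_hat = self.outcome(torch.cat([x, z, t], dim=1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(1).mean()
        recon = ((y_hat - y) ** 2).mean()
        return recon + kl

    def counterfactual(self, x, t_cf, z):
        # Predict the outcome under an alternative treatment using inferred z.
        return self.outcome(torch.cat([x, z, t_cf], dim=1))
```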

Enhancing Machine Learning Performance with Continuous In-Session Ground Truth Scores: Pilot Study on Objective Skeletal Muscle Pain Intensity Prediction

  • paper_url: http://arxiv.org/abs/2308.00886
  • repo_url: None
  • paper_authors: Boluwatife E. Faremi, Jonathon Stavres, Nuno Oliveira, Zhaoxian Zhou, Andrew H. Sung
  • for: The study aims to develop devices for acquiring real-time, continuous in-session pain intensity scores and to use them as ground truth for machine learning models that objectively classify pain.
  • methods: Two devices capture continuous in-session pain scores and ANS-modulated electrodermal activity (EDA); time-domain EDA features are used to train Multi-layer Perceptron (MLP) and Random Forest (RF) classifiers against either in-session scores or post-session visual analog scale (VAS) scores.
  • results: Over 10-fold cross-validation, models trained with in-session scores reach macro-averaged geometric mean scores of 75.9% (MLP) and 78.3% (RF), outperforming models trained with post-session scores (70.3% and 74.6%), indicating that continuous in-session ground truth mitigates ground-truth sparsity, data imbalance, and high variance.
    Abstract Machine learning (ML) models trained on subjective self-report scores struggle to objectively classify pain accurately due to the significant variance between real-time pain experiences and recorded scores afterwards. This study developed two devices for acquisition of real-time, continuous in-session pain scores and gathering of ANS-modulated electrodermal activity (EDA). The experiment recruited N = 24 subjects who underwent a post-exercise circulatory occlusion (PECO) with stretch, inducing discomfort. Subject data were stored in a custom pain platform, facilitating extraction of time-domain EDA features and in-session ground truth scores. Moreover, post-experiment visual analog scale (VAS) scores were collected from each subject. Machine learning models, namely Multi-layer Perceptron (MLP) and Random Forest (RF), were trained using corresponding objective EDA features combined with in-session scores and post-session scores, respectively. Over a 10-fold cross-validation, the macro-averaged geometric mean score revealed MLP and RF models trained with objective EDA features and in-session scores achieved superior performance (75.9% and 78.3%) compared to models trained with post-session scores (70.3% and 74.6%) respectively. This pioneering study demonstrates that using continuous in-session ground truth scores significantly enhances ML performance in pain intensity characterization, overcoming ground truth sparsity-related issues, data imbalance, and high variance. This study informs future objective-based ML pain system training.
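A minimal scikit-learn sketch of the comparison described above: the same EDA feature matrix trained once against (binned) continuous in-session labels and once against (binned) post-session labels, evaluated with 10-fold cross-validation. The random data, the binning, and the balanced-accuracy metric are placeholders for the paper's EDA features and macro-averaged geometric mean score.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(240, 12))                 # time-domain EDA features (placeholder)
y_in_session = rng.integers(0, 3, size=240)    # continuous in-session scores, binned
y_post_session = rng.integers(0, 3, size=240)  # post-session VAS scores, binned

for name, y in [("in-session", y_in_session), ("post-session", y_post_session)]:
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    scores = cross_val_score(clf, X, y, cv=10, scoring="balanced_accuracy")
    print(f"{name}: {scores.mean():.3f}")
```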

Beneficent Intelligence: A Capability Approach to Modeling Benefit, Assistance, and Associated Moral Failures through AI Systems

  • paper_url: http://arxiv.org/abs/2308.00868
  • repo_url: None
  • paper_authors: Alex John London, Hoda Heidari
  • for: Using Sen and Nussbaum's capability approach, the paper formalizes a network of ethical concepts and entitlements necessary for AI systems to confer meaningful benefit or assistance to stakeholders.
  • methods: The capability approach is used to characterize two necessary conditions for morally permissible interactions between AI systems and those impacted by their functioning, and two sufficient conditions for realizing the ideal of meaningful benefit.
  • results: The paper contrasts this ideal with several salient failure modes, namely forms of social interaction that constitute unjustified paternalism, coercion, deception, exploitation, and domination.
    Abstract The prevailing discourse around AI ethics lacks the language and formalism necessary to capture the diverse ethical concerns that emerge when AI systems interact with individuals. Drawing on Sen and Nussbaum's capability approach, we present a framework formalizing a network of ethical concepts and entitlements necessary for AI systems to confer meaningful benefit or assistance to stakeholders. Such systems enhance stakeholders' ability to advance their life plans and well-being while upholding their fundamental rights. We characterize two necessary conditions for morally permissible interactions between AI systems and those impacted by their functioning, and two sufficient conditions for realizing the ideal of meaningful benefit. We then contrast this ideal with several salient failure modes, namely, forms of social interactions that constitute unjustified paternalism, coercion, deception, exploitation and domination. The proliferation of incidents involving AI in high-stakes domains underscores the gravity of these issues and the imperative to take an ethics-led approach to AI systems from their inception.

PeRP: Personalized Residual Policies For Congestion Mitigation Through Co-operative Advisory Systems

  • paper_url: http://arxiv.org/abs/2308.00864
  • repo_url: None
  • paper_authors: Aamir Hasan, Neeloy Chakraborty, Haonan Chen, Jung-Hoon Cho, Cathy Wu, Katherine Driggs-Campbell
  • for: This paper proposes a co-operative advisory system, built on Piecewise Constant (PC) policies, to mitigate traffic congestion.
  • methods: A variational autoencoder infers a driver's intrinsic traits (how they follow instructions) in an unsupervised manner, and a Personalized Residual Policy (PeRP) conditioned on the inferred trait adapts the PC policy's action into a personalized recommendation for the driver.
  • results: The approach successfully mitigates congestion while adapting to different driver behaviors, improving average speed by 4 to 22% over baselines.
    Abstract Intelligent driving systems can be used to mitigate congestion through simple actions, thus improving many socioeconomic factors such as commute time and gas costs. However, these systems assume precise control over autonomous vehicle fleets, and are hence limited in practice as they fail to account for uncertainty in human behavior. Piecewise Constant (PC) Policies address these issues by structurally modeling the likeness of human driving to reduce traffic congestion in dense scenarios to provide action advice to be followed by human drivers. However, PC policies assume that all drivers behave similarly. To this end, we develop a co-operative advisory system based on PC policies with a novel driver trait conditioned Personalized Residual Policy, PeRP. PeRP advises drivers to behave in ways that mitigate traffic congestion. We first infer the driver's intrinsic traits on how they follow instructions in an unsupervised manner with a variational autoencoder. Then, a policy conditioned on the inferred trait adapts the action of the PC policy to provide the driver with a personalized recommendation. Our system is trained in simulation with novel driver modeling of instruction adherence. We show that our approach successfully mitigates congestion while adapting to different driver behaviors, with 4 to 22% improvement in average speed over baselines.

Active Inference in String Diagrams: A Categorical Account of Predictive Processing and Free Energy

  • paper_url: http://arxiv.org/abs/2308.00861
  • repo_url: None
  • paper_authors: Sean Tull, Johannes Kleiner, Toby St Clere Smithe
  • for: A categorical formulation of the cognitive frameworks of Predictive Processing and Active Inference, expressed in terms of string diagrams.
  • methods: String diagrams, interpreted in a monoidal category with copying and discarding, are used to represent generative models, Bayesian updating, perception, planning, active inference, and free energy.
  • results: The paper gives a diagrammatic derivation of the formula for active inference via free energy minimization and establishes a compositionality property for free energy, allowing free energy to be applied at all levels of an agent's generative model.
    Abstract We present a categorical formulation of the cognitive frameworks of Predictive Processing and Active Inference, expressed in terms of string diagrams interpreted in a monoidal category with copying and discarding. This includes diagrammatic accounts of generative models, Bayesian updating, perception, planning, active inference, and free energy. In particular we present a diagrammatic derivation of the formula for active inference via free energy minimisation, and establish a compositionality property for free energy, allowing free energy to be applied at all levels of an agent's generative model. Aside from aiming to provide a helpful graphical language for those familiar with active inference, we conversely hope that this article may provide a concise formulation and introduction to the framework.
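For readers unfamiliar with the quantity being derived diagrammatically, the standard (non-categorical) form of the variational free energy for a generative model p(o, s) and approximate posterior q(s) is shown below; this is the textbook expression, not the paper's string-diagram derivation.

```latex
F[q] \;=\; \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o, s)\right]
     \;=\; D_{\mathrm{KL}}\!\left[q(s)\,\|\,p(s \mid o)\right] \;-\; \ln p(o)
```

Since the KL term is non-negative, minimizing F both improves the posterior approximation and implicitly maximizes model evidence.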

Understanding Activation Patterns in Artificial Neural Networks by Exploring Stochastic Processes

  • paper_url: http://arxiv.org/abs/2308.00858
  • repo_url: None
  • paper_authors: Stephan Johann Lehmler, Muhammad Saif-ur-Rehman, Tobias Glasmachers, Ioannis Iossifidis
  • for: To gain a deeper understanding of the behavior and learning dynamics of (deep) artificial neural networks through mathematical abstractions and models.
  • methods: The activation patterns of thresholded nodes in (deep) artificial neural networks are modeled as stochastic processes, focusing solely on activation frequency and borrowing neuroscience techniques used for real neuron spike trains; spiking activity extracted during a classification task is modeled as an arrival process following the Poisson distribution.
  • results: Fitting the model to data from various artificial neural networks on image recognition tasks yields parameters describing activation patterns; randomly initialized, generalizing, and memorizing networks show consistent differences across architectures and training sets, and the Mean Firing Rate, Mean Fano Factor, and variances provide stable indicators of memorization during learning.
    Abstract To gain a deeper understanding of the behavior and learning dynamics of (deep) artificial neural networks, it is valuable to employ mathematical abstractions and models. These tools provide a simplified perspective on network performance and facilitate systematic investigations through simulations. In this paper, we propose utilizing the framework of stochastic processes, which has been underutilized thus far. Our approach models activation patterns of thresholded nodes in (deep) artificial neural networks as stochastic processes. We focus solely on activation frequency, leveraging neuroscience techniques used for real neuron spike trains. During a classification task, we extract spiking activity and use an arrival process following the Poisson distribution. We examine observed data from various artificial neural networks in image recognition tasks, fitting the proposed model's assumptions. Through this, we derive parameters describing activation patterns in each network. Our analysis covers randomly initialized, generalizing, and memorizing networks, revealing consistent differences across architectures and training sets. Calculating Mean Firing Rate, Mean Fano Factor, and Variances, we find stable indicators of memorization during learning, providing valuable insights into network behavior. The proposed model shows promise in describing activation patterns and could serve as a general framework for future investigations. It has potential applications in theoretical simulations, pruning, and transfer learning.
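A minimal NumPy sketch of the spike-train statistics mentioned above, computed from per-sample binary activations of a layer's thresholded units. The windowing, threshold, and names are illustrative simplifications, not the authors' pipeline.

```python
import numpy as np

def spike_statistics(activations, threshold=0.0, window=50):
    """activations: (n_samples, n_units) values of one layer's thresholded nodes.
    Spikes are counted in windows of `window` samples; returns the mean firing
    rate and the mean Fano factor (variance/mean of window counts) across units."""
    spikes = (activations > threshold).astype(float)
    n_windows = spikes.shape[0] // window
    counts = spikes[: n_windows * window].reshape(n_windows, window, -1).sum(axis=1)
    rate = spikes.mean()                                   # mean firing rate per sample
    mean_c, var_c = counts.mean(axis=0), counts.var(axis=0)
    fano = var_c / np.clip(mean_c, 1e-12, None)            # ~1 for a Poisson process
    return rate, fano.mean()
```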

Training on Foveated Images Improves Robustness to Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2308.00854
  • repo_url: None
  • paper_authors: Muhammad A. Shah, Bhiksha Raj
  • for: This work examines whether constant exposure to low-fidelity visual stimuli in peripheral vision, an important feature of the human visual system, contributes to the robustness of visual models.
  • methods: An image transform named RBlur simulates the loss of fidelity in peripheral vision by blurring the image and reducing its color saturation based on the distance from a given fixation point.
  • results: Compared to DNNs trained on the original images, DNNs trained on RBlur-transformed images are substantially more robust to adversarial attacks and other non-adversarial corruptions, achieving up to 25% higher accuracy on perturbed data.
    Abstract Deep neural networks (DNNs) have been shown to be vulnerable to adversarial attacks -- subtle, perceptually indistinguishable perturbations of inputs that change the response of the model. In the context of vision, we hypothesize that an important contributor to the robustness of human visual perception is constant exposure to low-fidelity visual stimuli in our peripheral vision. To investigate this hypothesis, we develop RBlur, an image transform that simulates the loss in fidelity of peripheral vision by blurring the image and reducing its color saturation based on the distance from a given fixation point. We show that compared to DNNs trained on the original images, DNNs trained on images transformed by RBlur are substantially more robust to adversarial attacks, as well as other, non-adversarial, corruptions, achieving up to 25% higher accuracy on perturbed data.
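A minimal sketch of a fixation-dependent blur of the kind described above, using OpenCV Gaussian blur and a desaturation that both grow with distance from the fixation point. The blending rule, sigma, and desaturation strength are illustrative assumptions, not the paper's exact transform.

```python
import numpy as np
import cv2

def radial_blur(image, fixation, max_sigma=8.0, desat=0.7):
    """image: HxWx3 uint8; fixation: (row, col). Blur and desaturate pixels
    in proportion to their normalized distance from the fixation point."""
    h, w = image.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.sqrt((yy - fixation[0]) ** 2 + (xx - fixation[1]) ** 2)
    dist = (dist / dist.max())[..., None]                  # in [0, 1]

    blurred = cv2.GaussianBlur(image, (0, 0), sigmaX=max_sigma)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)[..., None].repeat(3, axis=2)

    out = (1 - dist) * image + dist * blurred               # more blur further out
    out = (1 - desat * dist) * out + (desat * dist) * gray  # less saturation further out
    return out.astype(np.uint8)
```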

Designing a Communication Bridge between Communities: Participatory Design for a Question-Answering AI Agent

  • paper_url: http://arxiv.org/abs/2308.00813
  • repo_url: None
  • paper_authors: Jeonghyun Lee, Vrinda Nandan, Harshvardhan Sikka, Spencer Rugaber, Ashok Goel
  • for: The paper aims to design an AI system that acts as a communication bridge between two user communities with different mental models and vocabularies.
  • methods: The authors used a variation of participatory design to elicit requirements for developing AskJill, a question-answering agent that explains how Skillsync works and acts as a communication bridge between company and college users.
  • results: Participatory design was useful in guiding requirements gathering and eliciting user questions for the development of AskJill, and both Skillsync user communities perceived glossary assistance as a key feature that AskJill needs to offer.
    Abstract How do we design an AI system that is intended to act as a communication bridge between two user communities with different mental models and vocabularies? Skillsync is an interactive environment that engages employers (companies) and training providers (colleges) in a sustained dialogue to help them achieve the goal of building a training proposal that successfully meets the needs of the employers and employees. We used a variation of participatory design to elicit requirements for developing AskJill, a question-answering agent that explains how Skillsync works and thus acts as a communication bridge between company and college users. Our study finds that participatory design was useful in guiding the requirements gathering and eliciting user questions for the development of AskJill. Our results also suggest that the two Skillsync user communities perceived glossary assistance as a key feature that AskJill needs to offer, and they would benefit from such a shared vocabulary.

AnyLoc: Towards Universal Visual Place Recognition

  • paper_url: http://arxiv.org/abs/2308.00688
  • repo_url: https://github.com/AnyLoc/AnyLoc
  • paper_authors: Nikhil Keetha, Avneesh Mishra, Jay Karhade, Krishna Murthy Jatavallabhula, Sebastian Scherer, Madhava Krishna, Sourav Garg
  • for: Developing a universal visual place recognition (VPR) method that localizes reliably across a broad range of structured and unstructured environments (urban, outdoor, indoor, aerial, underwater, and subterranean) without any re-training or fine-tuning.
  • methods: General-purpose feature representations derived from off-the-shelf self-supervised models, with no VPR-specific training, combined with unsupervised feature aggregation.
  • results: The AnyLoc suite of methods achieves up to 4x higher performance than existing approaches, and characterizing the semantic properties of the features, which uncovers unique domains encapsulating datasets from similar environments, yields a further 6% improvement.
    Abstract Visual Place Recognition (VPR) is vital for robot localization. To date, the most performant VPR approaches are environment- and task-specific: while they exhibit strong performance in structured environments (predominantly urban driving), their performance degrades severely in unstructured environments, rendering most approaches brittle to robust real-world deployment. In this work, we develop a universal solution to VPR -- a technique that works across a broad range of structured and unstructured environments (urban, outdoors, indoors, aerial, underwater, and subterranean environments) without any re-training or fine-tuning. We demonstrate that general-purpose feature representations derived from off-the-shelf self-supervised models with no VPR-specific training are the right substrate upon which to build such a universal VPR solution. Combining these derived features with unsupervised feature aggregation enables our suite of methods, AnyLoc, to achieve up to 4X significantly higher performance than existing approaches. We further obtain a 6% improvement in performance by characterizing the semantic properties of these features, uncovering unique domains which encapsulate datasets from similar environments. Our detailed experiments and analysis lay a foundation for building VPR solutions that may be deployed anywhere, anytime, and across anyview. We encourage the readers to explore our project page and interactive demos: https://anyloc.github.io/.
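A minimal sketch of the "off-the-shelf self-supervised features plus unsupervised aggregation" recipe: dense per-patch features from a frozen backbone are aggregated into one global descriptor with VLAD against cluster centers fit without labels. The cluster fitting and shapes are placeholders; AnyLoc's exact backbone layers and aggregation variants are described in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_vocabulary(patch_features, n_clusters=32):
    """patch_features: (N, d) local features pooled from many map images."""
    return KMeans(n_clusters=n_clusters, n_init=4, random_state=0).fit(patch_features)

def vlad_descriptor(patch_features, kmeans):
    """Aggregate one image's (n_patches, d) features into a single global
    descriptor by summing residuals to the assigned cluster centers."""
    centers = kmeans.cluster_centers_
    assign = kmeans.predict(patch_features)
    vlad = np.zeros_like(centers)
    for k in range(centers.shape[0]):
        members = patch_features[assign == k]
        if len(members):
            vlad[k] = (members - centers[k]).sum(axis=0)
    vlad = vlad.flatten()
    return vlad / (np.linalg.norm(vlad) + 1e-12)           # L2-normalize

# Place recognition: the nearest global descriptor in the map database wins.
```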

A Knowledge-Oriented Approach to Enhance Integration and Communicability in the Polkadot Ecosystem

  • paper_url: http://arxiv.org/abs/2308.00735
  • repo_url: None
  • paper_authors: Marcio Ferreira Moreno, Rafael Rossi de Mello Brandão
  • for: This study provides a conceptual framework to address data analysis and communicability challenges in the Polkadot ecosystem.
  • methods: A domain ontology (POnto) gives a structured representation of the ecosystem's concepts and relationships, improving the ecosystem's integration and communicability.
  • results: The proposed framework is validated through a case-study methodology that includes expert feedback and insights from the Polkadot community, and a roadmap is given for a query engine based on a Controlled Natural Language using the ontology, supporting the growth and adoption of the ecosystem.
    Abstract The Polkadot ecosystem is a disruptive and highly complex multi-chain architecture that poses challenges in terms of data analysis and communicability. Currently, there is a lack of standardized and holistic approaches to retrieve and analyze data across parachains and applications, making it difficult for general users and developers to access ecosystem data consistently. This paper proposes a conceptual framework that includes a domain ontology called POnto (a Polkadot Ontology) to address these challenges. POnto provides a structured representation of the ecosystem's concepts and relationships, enabling a formal understanding of the platform. The proposed knowledge-oriented approach enhances integration and communicability, enabling a wider range of users to participate in the ecosystem and facilitating the development of AI-based applications. The paper presents a case study methodology to validate the proposed framework, which includes expert feedback and insights from the Polkadot community. The POnto ontology and the roadmap for a query engine based on a Controlled Natural Language using the ontology, provide valuable contributions to the growth and adoption of the Polkadot ecosystem in heterogeneous socio-technical environments.

Applicability of scaling laws to vision encoding models

  • paper_url: http://arxiv.org/abs/2308.00678
  • repo_url: https://github.com/suyamat/ScalingVisionEncoder
  • paper_authors: Takuya Matsuyama, Kota S Sasaki, Shinji Nishimoto
  • for: Building a high-performance vision encoding model that predicts brain activity while participants view images, as part of the Algonauts Project 2023 Challenge.
  • methods: Predictive models are built from multiple vision models with parameter sizes ranging from 86M to 4.3B, focusing on two questions: (1) how does the size of the fMRI training set affect prediction accuracy? (2) how does prediction accuracy across the visual cortex vary with the parameter size of the vision models?
  • results: Prediction accuracy improves with the training-set size according to a scaling law, and likewise improves with the parameter size of the vision models, suggesting that scaling both may yield more accurate visual models of the brain and advance visual neuroscience.
    Abstract In this paper, we investigated how to build a high-performance vision encoding model to predict brain activity as part of our participation in the Algonauts Project 2023 Challenge. The challenge provided brain activity recorded by functional MRI (fMRI) while participants viewed images. Several vision models with parameter sizes ranging from 86M to 4.3B were used to build predictive models. To build highly accurate models, we focused our analysis on two main aspects: (1) How does the sample size of the fMRI training set change the prediction accuracy? (2) How does the prediction accuracy across the visual cortex vary with the parameter size of the vision models? The results show that as the sample size used during training increases, the prediction accuracy improves according to the scaling law. Similarly, we found that as the parameter size of the vision models increases, the prediction accuracy improves according to the scaling law. These results suggest that increasing the sample size of the fMRI training set and the parameter size of visual models may contribute to more accurate visual models of the brain and lead to a better understanding of visual neuroscience.
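A minimal sketch of fitting the kind of scaling law reported above, with prediction accuracy modeled as a power law of training-set size and fit in log-log space with NumPy. The data points are made up for illustration.

```python
import numpy as np

# Hypothetical (sample_size, prediction_accuracy) pairs.
n = np.array([500, 1000, 2000, 4000, 8000])
acc = np.array([0.12, 0.16, 0.21, 0.27, 0.34])

# Fit acc ~= a * n^b by linear regression in log-log space.
b, log_a = np.polyfit(np.log(n), np.log(acc), deg=1)
a = np.exp(log_a)
print(f"scaling law: acc ~= {a:.3f} * n^{b:.3f}")
print("predicted accuracy at n=16000:", a * 16000 ** b)
```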

Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models

  • paper_url: http://arxiv.org/abs/2308.00675
  • repo_url: None
  • paper_authors: Cheng-Yu Hsieh, Si-An Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister
  • for: This work proposes tool documentation, rather than demonstrations, as the way to teach large language models to use new tools.
  • methods: Large language models are prompted with tool documentation (descriptions of individual tool usage) instead of usage demonstrations.
  • results: Zero-shot prompts with only tool documentation match the performance of few-shot prompts on existing benchmarks, and on a newly collected realistic tool-use dataset with hundreds of tool APIs, documentation proves significantly more valuable than demonstrations.
    Abstract Today, large language models (LLMs) are taught to use new tools by providing a few demonstrations of the tool's usage. Unfortunately, demonstrations are hard to acquire, and can result in undesirable biased usage if the wrong demonstration is chosen. Even in the rare scenario that demonstrations are readily available, there is no principled selection protocol to determine how many and which ones to provide. As tasks grow more complex, the selection search grows combinatorially and invariably becomes intractable. Our work provides an alternative to demonstrations: tool documentation. We advocate the use of tool documentation, descriptions for the individual tool usage, over demonstrations. We substantiate our claim through three main empirical findings on 6 tasks across both vision and language modalities. First, on existing benchmarks, zero-shot prompts with only tool documentation are sufficient for eliciting proper tool usage, achieving performance on par with few-shot prompts. Second, on a newly collected realistic tool-use dataset with hundreds of available tool APIs, we show that tool documentation is significantly more valuable than demonstrations, with zero-shot documentation significantly outperforming few-shot without documentation. Third, we highlight the benefits of tool documentations by tackling image generation and video tracking using just-released unseen state-of-the-art models as tools. Finally, we highlight the possibility of using tool documentation to automatically enable new applications: by using nothing more than the documentation of GroundingDino, Stable Diffusion, XMem, and SAM, LLMs can re-invent the functionalities of the just-released Grounded-SAM and Track Anything models.
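A minimal sketch of how a zero-shot, documentation-only prompt might be assembled for an LLM tool-use call; the documentation strings, tool signatures, and prompt wording are placeholders, not the paper's prompt format.

```python
tool_docs = {
    # Hypothetical one-line docs; real documentation would be longer.
    "image_search(query: str) -> list[str]": "Returns URLs of images matching the query.",
    "object_detect(image_url: str) -> list[dict]": "Returns detected objects with boxes and labels.",
}

def build_zero_shot_prompt(task: str) -> str:
    doc_block = "\n".join(f"- {sig}: {desc}" for sig, desc in tool_docs.items())
    return (
        "You can call the following tools. Their documentation is:\n"
        f"{doc_block}\n\n"
        f"Task: {task}\n"
        "Write the sequence of tool calls that solves the task."
    )

print(build_zero_shot_prompt("Find a photo of a golden retriever and list the objects in it."))
```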