cs.CL - 2023-07-21

OxfordTVG-HIC: Can Machine Make Humorous Captions from Images?

  • paper_url: http://arxiv.org/abs/2307.11636
  • repo_url: None
  • paper_authors: Runjia Li, Shuyang Sun, Mohamed Elhoseiny, Philip Torr
  • for: This work develops a large-scale dataset of humorous image captions (OxfordTVG-HIC) for humour generation and understanding.
  • methods: The dataset offers large-scale image-text pairs with humour scores, which are used to train a generalizable humour captioning model with deep-learning methods.
  • results: OxfordTVG-HIC can be used both to train a generalizable humour captioning model and to evaluate how humorous a generated text is; the analysis further shows that generating humorous text typically requires fusing linguistic and visual information and drawing on humour-related concepts and punchlines.
    Abstract This paper presents OxfordTVG-HIC (Humorous Image Captions), a large-scale dataset for humour generation and understanding. Humour is an abstract, subjective, and context-dependent cognitive construct involving several cognitive factors, making it a challenging task to generate and interpret. Hence, humour generation and understanding can serve as a new task for evaluating the ability of deep-learning methods to process abstract and subjective information. Due to the scarcity of data, humour-related generation tasks such as captioning remain under-explored. To address this gap, OxfordTVG-HIC offers approximately 2.9M image-text pairs with humour scores to train a generalizable humour captioning model. Contrary to existing captioning datasets, OxfordTVG-HIC features a wide range of emotional and semantic diversity resulting in out-of-context examples that are particularly conducive to generating humour. Moreover, OxfordTVG-HIC is curated devoid of offensive content. We also show how OxfordTVG-HIC can be leveraged for evaluating the humour of a generated text. Through explainability analysis of the trained models, we identify the visual and linguistic cues influential for evoking humour prediction (and generation). We observe qualitatively that these cues are aligned with the benign violation theory of humour in cognitive psychology.

A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion

  • paper_url: http://arxiv.org/abs/2307.11584
  • repo_url: https://github.com/iclr2023achangeofheart/meld-modality-conversion
  • paper_authors: Zeinab Sadat Taghavi, Ali Satvaty, Hossein Sameti
  • for: This work aims to improve emotion recognition performance on the MELD dataset.
  • methods: We propose a modality-conversion concept that uses an automatic speech recognition (ASR) system followed by a text classifier to improve emotion recognition (a short sketch follows this entry).
  • results: Our approach achieves substantial results on MELD, and the Modality-Conversion++ variant even outperforms current speech-based methods in terms of weighted-F1 (WF1) score, indicating that modality conversion can improve performance on tasks that can be carried out in an alternative modality.
    Abstract Speech Emotion Recognition (SER) is a challenging task. In this paper, we introduce a modality conversion concept aimed at enhancing emotion recognition performance on the MELD dataset. We assess our approach through two experiments: first, a method named Modality-Conversion that employs automatic speech recognition (ASR) systems, followed by a text classifier; second, we assume perfect ASR output and investigate the impact of modality conversion on SER, this method is called Modality-Conversion++. Our findings indicate that the first method yields substantial results, while the second method outperforms state-of-the-art (SOTA) speech-based approaches in terms of SER weighted-F1 (WF1) score on the MELD dataset. This research highlights the potential of modality conversion for tasks that can be conducted in alternative modalities.
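    Sketch: a minimal, illustrative version of the modality-conversion idea, assuming Hugging Face pipelines; the checkpoint names below are placeholders rather than the models actually used or trained in the paper.
        # Modality-Conversion sketch: speech -> ASR transcript -> text emotion classifier.
        # Checkpoint names are illustrative placeholders, not the paper's models.
        from transformers import pipeline

        asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
        emotion_clf = pipeline("text-classification",
                               model="j-hartmann/emotion-english-distilroberta-base")  # placeholder emotion model

        def predict_emotion(wav_path: str) -> str:
            transcript = asr(wav_path)["text"]          # modality conversion: audio -> text
            return emotion_clf(transcript)[0]["label"]  # emotion predicted from the transcript

        print(predict_emotion("meld_utterance_001.wav"))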

Advancing Visual Grounding with Scene Knowledge: Benchmark and Method

  • paper_url: http://arxiv.org/abs/2307.11558
  • repo_url: https://github.com/zhjohnchan/sk-vg
  • paper_authors: Zhihong Chen, Ruifei Zhang, Yibing Song, Xiang Wan, Guanbin Li
  • for: This paper aims to create a new benchmark for visual grounding (VG) called SK-VG, which requires models to have reasoning abilities on long-form scene knowledge.
  • methods: The proposed approaches for SK-VG involve embedding knowledge into image features before the image-query interaction, or leveraging linguistic structure to assist in computing the image-text matching.
  • results: The proposed approaches achieve promising results but still leave room for improvement, including performance and interpretability.
    Abstract Visual grounding (VG) aims to establish fine-grained alignment between vision and language. Ideally, it can be a testbed for vision-and-language models to evaluate their understanding of the images and texts and their reasoning abilities over their joint space. However, most existing VG datasets are constructed using simple description texts, which do not require sufficient reasoning over the images and texts. This has been demonstrated in a recent study~\cite{luo2022goes}, where a simple LSTM-based text encoder without pretraining can achieve state-of-the-art performance on mainstream VG datasets. Therefore, in this paper, we propose a novel benchmark of \underline{S}cene \underline{K}nowledge-guided \underline{V}isual \underline{G}rounding (SK-VG), where the image content and referring expressions are not sufficient to ground the target objects, forcing the models to have a reasoning ability on the long-form scene knowledge. To perform this task, we propose two approaches to accept the triple-type input, where the former embeds knowledge into the image features before the image-query interaction; the latter leverages linguistic structure to assist in computing the image-text matching. We conduct extensive experiments to analyze the above methods and show that the proposed approaches achieve promising results but still leave room for improvement, including performance and interpretability. The dataset and code are available at \url{https://github.com/zhjohnchan/SK-VG}.

Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation

  • paper_url: http://arxiv.org/abs/2307.11545
  • repo_url: https://github.com/kkakkkka/etris
  • paper_authors: Zunnan Xu, Zhihong Chen, Yong Zhang, Yibing Song, Xiang Wan, Guanbin Li
  • for: Improving parameter-efficient tuning (PET) for referring image segmentation while better modelling the interaction between modalities.
  • methods: An adapter named Bridger is proposed to exchange cross-modal information and inject task-specific information into the pre-trained model; a lightweight decoder for image segmentation is also designed.
  • results: Evaluated on challenging benchmarks, the approach achieves comparable or superior performance while updating only 1.61% to 3.38% of the backbone parameters.
    Abstract Parameter Efficient Tuning (PET) has gained attention for reducing the number of parameters while maintaining performance and providing better hardware resource savings, but few studies investigate dense prediction tasks and interaction between modalities. In this paper, we do an investigation of efficient tuning problems on referring image segmentation. We propose a novel adapter called Bridger to facilitate cross-modal information exchange and inject task-specific information into the pre-trained model. We also design a lightweight decoder for image segmentation. Our approach achieves comparable or superior performance with only 1.61\% to 3.38\% backbone parameter updates, evaluated on challenging benchmarks. The code is available at \url{https://github.com/kkakkkka/ETRIS}.

Topic Identification For Spontaneous Speech: Enriching Audio Features With Embedded Linguistic Information

  • paper_url: http://arxiv.org/abs/2307.11450
  • repo_url: https://github.com/aalto-speech/Topic-identification-for-spontaneous-Finnish-speech
  • paper_authors: Dejan Porjazovski, Tamás Grósz, Mikko Kurimo
  • for: This study investigates alternatives to standard text-based topic identification, asking whether audio features alone can be used when no automatic speech recognition (ASR) system is available.
  • methods: Audio-only and hybrid (multimodal) solutions are compared and evaluated on spontaneous Finnish speech.
  • results: Purely audio-based solutions are a viable option when ASR components are not available, while the hybrid multimodal solutions achieve the best recognition results.
    Abstract Traditional topic identification solutions from audio rely on an automatic speech recognition system (ASR) to produce transcripts used as input to a text-based model. These approaches work well in high-resource scenarios, where there are sufficient data to train both components of the pipeline. However, in low-resource situations, the ASR system, even if available, produces low-quality transcripts, leading to a bad text-based classifier. Moreover, spontaneous speech containing hesitations can further degrade the performance of the ASR model. In this paper, we investigate alternatives to the standard text-only solutions by comparing audio-only and hybrid techniques of jointly utilising text and audio features. The models evaluated on spontaneous Finnish speech demonstrate that purely audio-based solutions are a viable option when ASR components are not available, while the hybrid multi-modal solutions achieve the best results.

MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems

  • paper_url: http://arxiv.org/abs/2307.11394
  • repo_url: None
  • paper_authors: Thilo von Neumann, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach
  • for: This work evaluates meeting transcription systems by providing a unified interface for computing commonly used word error rates (WERs), including the cpWER, ORC WER, and MIMO WER definitions.
  • methods: The cpWER computation is extended with a temporal constraint so that a word only counts as correct when its temporal alignment is plausible; this yields hypothesis-to-reference matchings that better reflect actual transcription quality and penalizes systems with poor time annotations. Since word-level timings are often unavailable, the authors also propose approximating exact word-level timings from segment-level (e.g., sentence-level) timings and show that the approximation leads to a similar WER (a small sketch follows this entry).
  • results: Experiments show that the time constraint speeds up the matching algorithm, and this speedup outweighs the additional overhead caused by processing the time stamps.
    Abstract MeetEval is an open-source toolkit to evaluate all kinds of meeting transcription systems. It provides a unified interface for the computation of commonly used Word Error Rates (WERs), specifically cpWER, ORC WER and MIMO WER along other WER definitions. We extend the cpWER computation by a temporal constraint to ensure that only words are identified as correct when the temporal alignment is plausible. This leads to a better quality of the matching of the hypothesis string to the reference string that more closely resembles the actual transcription quality, and a system is penalized if it provides poor time annotations. Since word-level timing information is often not available, we present a way to approximate exact word-level timings from segment-level timings (e.g., a sentence) and show that the approximation leads to a similar WER as a matching with exact word-level annotations. At the same time, the time constraint leads to a speedup of the matching algorithm, which outweighs the additional overhead caused by processing the time stamps.
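    Sketch: one simple way to approximate word-level timings from a segment-level annotation, splitting the segment duration proportionally to word length. The exact interpolation rule used by MeetEval is not reproduced here, so this is an illustrative assumption.
        # Approximate word-level start/end times from a segment-level annotation by
        # distributing the segment duration proportionally to word length (an assumption;
        # MeetEval's actual approximation may differ).
        def approximate_word_timings(segment_start, segment_end, words):
            lengths = [max(len(w), 1) for w in words]
            total = sum(lengths)
            duration = segment_end - segment_start
            timings, t = [], segment_start
            for w, l in zip(words, lengths):
                dt = duration * l / total
                timings.append((w, t, t + dt))
                t += dt
            return timings

        print(approximate_word_timings(3.2, 5.0, "thanks for joining the call".split()))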

Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

  • paper_url: http://arxiv.org/abs/2307.11380
  • repo_url: https://github.com/clement1290/chatgpt-detection-pr-hppt
  • paper_authors: Lingyi Yang, Feng Jiang, Haizhou Li
  • For: The paper aims to address the limitations of previous detectors that can only differentiate between purely ChatGPT-generated texts and human-authored texts, and instead focuses on detecting texts generated through human-machine collaboration, such as ChatGPT-polished texts.
  • Methods: The paper introduces a novel dataset called HPPT (ChatGPT-polished academic abstracts) that consists of pairs of human-written and ChatGPT-polished abstracts, and proposes a new method called the “Polish Ratio” to measure the degree of ChatGPT involvement in text generation based on editing distance (see the sketch after this entry).
  • Results: The paper shows that the proposed model has better robustness on the HPPT dataset and two existing datasets (HC3 and CDB), and the “Polish Ratio” method provides a more comprehensive explanation by quantifying the degree of ChatGPT involvement, with a value greater than 0.2 indicating ChatGPT involvement and a value exceeding 0.6 implying that ChatGPT generates most of the text.
    Abstract The remarkable capabilities of large-scale language models, such as ChatGPT, in text generation have incited awe and spurred researchers to devise detectors to mitigate potential risks, including misinformation, phishing, and academic dishonesty. Despite this, most previous studies, including HC3, have been predominantly geared towards creating detectors that differentiate between purely ChatGPT-generated texts and human-authored texts. This approach, however, fails to work on discerning texts generated through human-machine collaboration, such as ChatGPT-polished texts. Addressing this gap, we introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts), facilitating the construction of more robust detectors. It diverges from extant corpora by comprising pairs of human-written and ChatGPT-polished abstracts instead of purely ChatGPT-generated texts. Additionally, we propose the "Polish Ratio" method, an innovative measure of ChatGPT's involvement in text generation based on editing distance. It provides a mechanism to measure the degree of human originality in the resulting text. Our experimental results show our proposed model has better robustness on the HPPT dataset and two existing datasets (HC3 and CDB). Furthermore, the "Polish Ratio" we proposed offers a more comprehensive explanation by quantifying the degree of ChatGPT involvement, which indicates that a Polish Ratio value greater than 0.2 signifies ChatGPT involvement and a value exceeding 0.6 implies that ChatGPT generates most of the text.
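    Sketch: an edit-distance-based involvement score in the spirit of the Polish Ratio. The normalization below (token-level Levenshtein distance divided by the human text length) is an assumption; the paper's exact formula may differ.
        # Normalized edit distance between a human-written text and its ChatGPT-polished
        # version, as a proxy for the "Polish Ratio" (the paper's exact formula may differ).
        def levenshtein(a, b):
            prev = list(range(len(b) + 1))
            for i, ca in enumerate(a, 1):
                cur = [i]
                for j, cb in enumerate(b, 1):
                    cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
                prev = cur
            return prev[-1]

        def polish_ratio(human_text, polished_text):
            tokens_h, tokens_p = human_text.split(), polished_text.split()
            return levenshtein(tokens_h, tokens_p) / max(len(tokens_h), 1)

        # Per the paper: > 0.2 suggests ChatGPT involvement; > 0.6 suggests ChatGPT wrote most of the text.
        print(polish_ratio("we propose a simple detector", "we introduce a simple yet effective detector"))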

DEFTri: A Few-Shot Label Fused Contextual Representation Learning For Product Defect Triage in e-Commerce

  • paper_url: http://arxiv.org/abs/2307.11344
  • repo_url: None
  • paper_authors: Ipsita Mohanty
  • for: This work aims to make defect triage in a large-scale agile e-commerce software development lifecycle more efficient by using machine learning to automatically assign defects to qualified teams.
  • methods: It proposes a novel automated defect triage framework (DEFTri) that fine-tunes a state-of-the-art pre-trained BERT on label-fused text embeddings to improve contextual representations of human-generated product defect text.
  • results: For the multi-label text classification defect triage task, a Walmart proprietary product defect dataset is introduced, built with weak supervision and adversarial learning in a few-shot setting.
    Abstract Defect Triage is a time-sensitive and critical process in a large-scale agile software development lifecycle for e-commerce. Inefficiencies arising from human and process dependencies in this domain have motivated research in automated approaches using machine learning to accurately assign defects to qualified teams. This work proposes a novel framework for automated defect triage (DEFTri) using fine-tuned state-of-the-art pre-trained BERT on labels fused text embeddings to improve contextual representations from human-generated product defects. For our multi-label text classification defect triage task, we also introduce a Walmart proprietary dataset of product defects using weak supervision and adversarial learning, in a few-shot setting.

Making Pre-trained Language Models both Task-solvers and Self-calibrators

  • paper_url: http://arxiv.org/abs/2307.11316
  • repo_url: https://github.com/yangyi-chen/lm-toast
  • paper_authors: Yangyi Chen, Xingyao Wang, Heng Ji
  • for: This paper tackles the overconfidence of PLMs in high-stakes applications and proposes the LM-TOAST training algorithm to make PLMs both task-solvers and self-calibrators.
  • methods: LM-TOAST is designed to give PLMs reasonable confidence estimations from the limited available training samples, addressing the challenges of limited data, data imbalance, and distribution shifts.
  • results: Experiments show that LM-TOAST effectively utilizes the training data so that PLMs obtain reasonable confidence estimations while maintaining the original task performance; its practical usefulness is further demonstrated on selective classification, adversarial defense, and model cascading.
    Abstract Pre-trained language models (PLMs) serve as backbones for various real-world systems. For high-stake applications, it's equally essential to have reasonable confidence estimations in predictions. While the vanilla confidence scores of PLMs can already be effectively utilized, PLMs consistently become overconfident in their wrong predictions, which is not desirable in practice. Previous work shows that introducing an extra calibration task can mitigate this issue. The basic idea involves acquiring additional data to train models in predicting the confidence of their initial predictions. However, it only demonstrates the feasibility of this kind of method, assuming that there are abundant extra available samples for the introduced calibration task. In this work, we consider the practical scenario that we need to effectively utilize training samples to make PLMs both task-solvers and self-calibrators. Three challenges are presented, including limited training samples, data imbalance, and distribution shifts. We first conduct pilot experiments to quantify various decisive factors in the calibration task. Based on the empirical analysis results, we propose a training algorithm LM-TOAST to tackle the challenges. Experimental results show that LM-TOAST can effectively utilize the training data to make PLMs have reasonable confidence estimations while maintaining the original task performance. Further, we consider three downstream applications, namely selective classification, adversarial defense, and model cascading, to show the practical usefulness of LM-TOAST. The code will be made public at \url{https://github.com/Yangyi-Chen/LM-TOAST}.

GIST: Generating Image-Specific Text for Fine-grained Object Classification

  • paper_url: http://arxiv.org/abs/2307.11315
  • repo_url: https://github.com/emu1729/gist
  • paper_authors: Kathleen M. Lewis, Emily Mu, Adrian V. Dalca, John Guttag
  • for: This paper aims to improve performance on fine-grained image classification tasks, especially for vision-language models that lack paired text/image descriptions.
  • methods: The proposed GIST method generates image-specific fine-grained text descriptions from image-only datasets and shows that these descriptions can be used to improve classification. Its key parts are prompting a pretrained large language model with domain-specific prompts to generate diverse fine-grained descriptions for each class, and using a pretrained vision-language model to match each image to label-preserving descriptions that capture relevant visual features.
  • results: Fine-tuning vision-language models on the image-and-generated-text pairs to learn an aligned vision-language representation space improves accuracy by an average of 4.1% over CLIP linear probes and 1.1% over the previous state-of-the-art image-text classification method on full-shot benchmarks spanning four fine-grained datasets from different domains, with similar gains in few-shot regimes.
    Abstract Recent vision-language models outperform vision-only models on many image classification tasks. However, because of the absence of paired text/image descriptions, it remains difficult to fine-tune these models for fine-grained image classification. In this work, we propose a method, GIST, for generating image-specific fine-grained text descriptions from image-only datasets, and show that these text descriptions can be used to improve classification. Key parts of our method include 1. prompting a pretrained large language model with domain-specific prompts to generate diverse fine-grained text descriptions for each class and 2. using a pretrained vision-language model to match each image to label-preserving text descriptions that capture relevant visual features in the image. We demonstrate the utility of GIST by fine-tuning vision-language models on the image-and-generated-text pairs to learn an aligned vision-language representation space for improved classification. We evaluate our learned representation space in full-shot and few-shot scenarios across four diverse fine-grained classification datasets, each from a different domain. Our method achieves an average improvement of $4.1\%$ in accuracy over CLIP linear probes and an average of $1.1\%$ improvement in accuracy over the previous state-of-the-art image-text classification method on the full-shot datasets. Our method achieves similar improvements across few-shot regimes. Code is available at https://github.com/emu1729/GIST.

Who should I Collaborate with? A Comparative Study of Academia and Industry Research Collaboration in NLP

  • paper_url: http://arxiv.org/abs/2308.04524
  • repo_url: None
  • paper_authors: Hussain Sadiq Abuwala, Bohan Zhang, Mushi Wang
  • for: Studies the effects of collaboration between academia and industry on natural language processing (NLP) research.
  • methods: A pipeline is created to extract affiliations and citations from NLP papers and divide them into three categories: academia, industry, and hybrid (collaborations between academia and industry); a toy sketch of the categorization step follows this entry.
  • results: Publications involving industry and academia-industry collaboration are on an increasing trend, and such publications tend to have a higher impact than those produced solely within academia.
    Abstract The goal of our research was to investigate the effects of collaboration between academia and industry on Natural Language Processing (NLP). To do this, we created a pipeline to extract affiliations and citations from NLP papers and divided them into three categories: academia, industry, and hybrid (collaborations between academia and industry). Our empirical analysis found that there is a trend towards an increase in industry and academia-industry collaboration publications and that these types of publications tend to have a higher impact compared to those produced solely within academia.
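    Sketch: a toy illustration of the categorization step described above, assuming affiliations have already been extracted as strings; the keyword heuristics are invented for illustration and are not the paper's actual rules.
        # Classify a paper as academia / industry / hybrid from its author affiliations.
        ACADEMIA_HINTS = ("university", "institute", "college", "academy")
        INDUSTRY_HINTS = ("google", "microsoft", "amazon", "meta", "ibm", "labs", "inc", "corp")

        def categorize(affiliations):
            has_academia = any(any(h in a.lower() for h in ACADEMIA_HINTS) for a in affiliations)
            has_industry = any(any(h in a.lower() for h in INDUSTRY_HINTS) for a in affiliations)
            if has_academia and has_industry:
                return "hybrid"
            return "academia" if has_academia else "industry"

        print(categorize(["Carnegie Mellon University", "Google Research"]))  # -> hybrid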

Generator-Retriever-Generator: A Novel Approach to Open-domain Question Answering

  • paper_url: http://arxiv.org/abs/2307.11278
  • repo_url: https://github.com/abdoelsayed2016/grg
  • paper_authors: Abdelrahman Abdallah, Adam Jatowt
  • for: Improving answer accuracy in open-domain question answering (QA) by combining document retrieval techniques with large language models (LLMs).
  • methods: The proposed Generator-Retriever-Generator (GRG) approach first prompts an LLM to generate contextual documents for a given question, in parallel retrieves relevant documents from an external corpus with a dual-encoder network, and finally passes both the generated and retrieved documents to a second LLM that produces the answer (a structural sketch follows this entry).
  • results: GRG outperforms the GENREAD and RFiD baselines by at least +5.2, +4.2, and +1.6 points on the TriviaQA, NQ, and WebQ datasets, respectively, indicating it better addresses open-domain QA challenges such as generating informative and contextually relevant answers.
    Abstract Open-domain question answering (QA) tasks usually require the retrieval of relevant information from a large corpus to generate accurate answers. We propose a novel approach called Generator-Retriever-Generator (GRG) that combines document retrieval techniques with a large language model (LLM), by first prompting the model to generate contextual documents based on a given question. In parallel, a dual-encoder network retrieves documents that are relevant to the question from an external corpus. The generated and retrieved documents are then passed to the second LLM, which generates the final answer. By combining document retrieval and LLM generation, our approach addresses the challenges of open-domain QA, such as generating informative and contextually relevant answers. GRG outperforms the state-of-the-art generate-then-read and retrieve-then-read pipelines (GENREAD and RFiD) improving their performance at least by +5.2, +4.2, and +1.6 on TriviaQA, NQ, and WebQ datasets, respectively. We provide code, datasets, and checkpoints \footnote{\url{https://github.com/abdoelsayed2016/GRG}
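    Sketch: the structure of the GRG pipeline. The three helpers passed in (document generation, dual-encoder retrieval, and the second reader LLM) are hypothetical stand-ins, not APIs from the paper's released code.
        # Generator-Retriever-Generator (GRG), sketched with placeholder components.
        def grg_answer(question, generate_documents, retrieve_documents, answer_with_llm, k=5):
            generated = generate_documents(question, n=k)      # pass 1: an LLM writes contextual documents
            retrieved = retrieve_documents(question, top_k=k)  # in parallel: dual-encoder retrieval from a corpus
            context = generated + retrieved
            return answer_with_llm(question, context)          # pass 2: a second LLM reads everything and answers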

A Systematic Evaluation of Federated Learning on Biomedical Natural Language Processing

  • paper_url: http://arxiv.org/abs/2307.11254
  • repo_url: https://github.com/pl97/fednlp
  • paper_authors: Le Peng, Sicheng Zhou, Jiandong Chen, Rui Zhang, Ziyue Xu, Ju Sun
  • for: This study investigates federated learning for biomedical natural language processing (NLP) models, which must balance model quality with the protection of private data.
  • methods: Six transformer-based language models are evaluated with federated learning on two biomedical NLP tasks spanning eight corpora (a generic aggregation sketch follows this entry).
  • results: Federated models consistently outperform models trained on individual clients' data and sometimes match models trained on pooled data, but performance degrades when clients' data are not identically distributed. Code: https://github.com/PL97/FedNLP.
    Abstract Language models (LMs) like BERT and GPT have revolutionized natural language processing (NLP). However, privacy-sensitive domains, particularly the medical field, face challenges to train LMs due to limited data access and privacy constraints imposed by regulations like the Health Insurance Portability and Accountability Act (HIPPA) and the General Data Protection Regulation (GDPR). Federated learning (FL) offers a decentralized solution that enables collaborative learning while ensuring the preservation of data privacy. In this study, we systematically evaluate FL in medicine across $2$ biomedical NLP tasks using $6$ LMs encompassing $8$ corpora. Our results showed that: 1) FL models consistently outperform LMs trained on individual client's data and sometimes match the model trained with polled data; 2) With the fixed number of total data, LMs trained using FL with more clients exhibit inferior performance, but pre-trained transformer-based models exhibited greater resilience. 3) LMs trained using FL perform nearly on par with the model trained with pooled data when clients' data are IID distributed while exhibiting visible gaps with non-IID data. Our code is available at: https://github.com/PL97/FedNLP
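    Sketch: a generic FedAvg-style aggregation step in the spirit of the federated setup evaluated here; the study's exact aggregation rule and weighting are not specified in the abstract, so this is a standard sketch under that assumption.
        # Weighted federated averaging of client model parameters (FedAvg-style sketch).
        import numpy as np

        def fedavg(client_params, client_sizes):
            """client_params: list of dicts {param_name: np.ndarray}; client_sizes: #samples per client."""
            total = float(sum(client_sizes))
            global_params = {}
            for name in client_params[0]:
                global_params[name] = sum(
                    (n / total) * params[name] for params, n in zip(client_params, client_sizes)
                )
            return global_params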

UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for Biomedical Entity Recognition

  • paper_url: http://arxiv.org/abs/2307.11170
  • repo_url: None
  • paper_authors: Aidan Mannion, Thierry Chevalier, Didier Schwab, Lorraine Geouriot
  • for: This paper aims to improve language representations for NLP tasks in the biomedical domain.
  • methods: It proposes a data-centric paradigm that enriches biomedical transformer-encoder language models (LMs) by extracting text sequences from the UMLS, allowing graph-based learning objectives to be combined with masked-language pre-training.
  • results: Preliminary results from extending pre-trained LMs as well as training from scratch show that this framework improves downstream performance on multiple biomedical and clinical named entity recognition (NER) tasks.
    Abstract Pre-trained transformer language models (LMs) have in recent years become the dominant paradigm in applied NLP. These models have achieved state-of-the-art performance on tasks such as information extraction, question answering, sentiment analysis, document classification and many others. In the biomedical domain, significant progress has been made in adapting this paradigm to NLP tasks that require the integration of domain-specific knowledge as well as statistical modelling of language. In particular, research in this area has focused on the question of how best to construct LMs that take into account not only the patterns of token distribution in medical text, but also the wealth of structured information contained in terminology resources such as the UMLS. This work contributes a data-centric paradigm for enriching the language representations of biomedical transformer-encoder LMs by extracting text sequences from the UMLS. This allows for graph-based learning objectives to be combined with masked-language pre-training. Preliminary results from experiments in the extension of pre-trained LMs as well as training from scratch show that this framework improves downstream performance on multiple biomedical and clinical Named Entity Recognition (NER) tasks.

L-Eval: Instituting Standardized Evaluation for Long Context Language Models

  • paper_url: http://arxiv.org/abs/2307.11088
  • repo_url: https://github.com/openlmlab/leval
  • paper_authors: Chenxin An, Shansan Gong, Ming Zhong, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu
  • for: Evaluating long-context language models on their ability to process long single-turn inputs and conversations with extensive histories.
  • methods: A standardized evaluation suite, L-Eval, is developed, containing 411 long documents and over 2,000 human-labeled query-response pairs, with diverse evaluation methods and instruction styles.
  • results: Open-source models generally lag behind commercial models but still perform impressively compared with their regular counterparts; LLaMA2-13B achieves the best results among them on both open-ended tasks (winning 42% vs turbo-16k-0613) and closed-ended tasks with only a 4k context length.
    Abstract Recently, there has been growing interest in extending the context length of instruction-following models in order to effectively process single-turn long input (e.g. summarizing a paper) and conversations with more extensive histories. While proprietary models such as GPT-4 and Claude have shown significant strides in handling extremely lengthy input, open-sourced models are still in the early stages of experimentation. It also remains unclear whether extending the context can offer substantial gains over traditional methods such as retrieval, and to what extent it improves upon their regular counterparts in practical downstream tasks. To address this challenge, we propose instituting standardized evaluation for long context language models. Concretely, we develop L-Eval which contains 411 long documents and over 2,000 human-labeled query-response pairs encompassing areas such as law, finance, school lectures, lengthy conversations, news, long-form novels, and meetings. L-Eval also adopts diverse evaluation methods and instruction styles, enabling a more reliable assessment of Long Context Language Models (LCLMs). Our findings indicate that while open-source models typically lag behind commercial models, they still exhibit impressive performance compared with their regular versions. LLaMA2-13B achieves the best results on both open-ended tasks (win \textbf{42}\% vs turbo-16k-0613) and closed-ended tasks with only 4k context length. We release our new evaluation suite, code, and all generation results including predictions from all open-sourced LCLMs, GPT4-32k, Claude-100k at \url{https://github.com/OpenLMLab/LEval}.

Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification

  • paper_url: http://arxiv.org/abs/2307.11031
  • repo_url: https://github.com/HazyResearch/embroid
  • paper_authors: Neel Guha, Mayee F. Chen, Kush Bhatia, Azalia Mirhoseini, Frederic Sala, Christopher Ré
  • for: Improving automated data labeling with language models' (LMs) prompt-based learning without requiring additional labeled data.
  • methods: Instead of modifying the prompt itself, the proposed Embroid method modifies the prompt's predictions: it computes multiple representations of a dataset under different embedding functions and uses the consistency of LM predictions among neighboring samples to identify mispredictions, combining the neighborhood-based predictions through a simple latent variable graphical model (a simplified sketch follows this entry).
  • results: A rigorous empirical evaluation across six LMs and up to 95 tasks shows that Embroid substantially improves performance over the original prompts (e.g., by an average of 7.3 points on GPT-JT), also improves more sophisticated prompting strategies such as chain-of-thought, and can be specialized to domains like law through the choice of embedding functions.
    Abstract Recent work has shown that language models' (LMs) prompt-based learning capabilities make them well suited for automating data labeling in domains where manual annotation is expensive. The challenge is that while writing an initial prompt is cheap, improving a prompt is costly -- practitioners often require significant labeled data in order to evaluate the impact of prompt modifications. Our work asks whether it is possible to improve prompt-based learning without additional labeled data. We approach this problem by attempting to modify the predictions of a prompt, rather than the prompt itself. Our intuition is that accurate predictions should also be consistent: samples which are similar under some feature representation should receive the same prompt prediction. We propose Embroid, a method which computes multiple representations of a dataset under different embedding functions, and uses the consistency between the LM predictions for neighboring samples to identify mispredictions. Embroid then uses these neighborhoods to create additional predictions for each sample, and combines these predictions with a simple latent variable graphical model in order to generate a final corrected prediction. In addition to providing a theoretical analysis of Embroid, we conduct a rigorous empirical evaluation across six different LMs and up to 95 different tasks. We find that (1) Embroid substantially improves performance over original prompts (e.g., by an average of 7.3 points on GPT-JT), (2) also realizes improvements for more sophisticated prompting strategies (e.g., chain-of-thought), and (3) can be specialized to domains like law through the embedding functions.
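    Sketch: a simplified numpy version of the neighborhood-consistency idea, where a sample's prediction is smoothed toward the majority vote of its nearest neighbors under each embedding. Embroid itself combines these votes with a latent variable graphical model rather than the plain averaging shown here.
        # Smooth LM predictions using nearest-neighbour consistency under several embeddings.
        import numpy as np

        def smooth_predictions(embeddings_list, lm_preds, k=5):
            lm_preds = np.asarray(lm_preds)            # binary predictions in {0, 1}
            votes = [lm_preds.astype(float)]
            for X in embeddings_list:                  # one vote per embedding space
                dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
                np.fill_diagonal(dists, np.inf)
                nbrs = np.argsort(dists, axis=1)[:, :k]
                votes.append(lm_preds[nbrs].mean(axis=1))   # neighbour agreement per sample
            return (np.mean(votes, axis=0) > 0.5).astype(int)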

Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation

  • paper_url: http://arxiv.org/abs/2307.11019
  • repo_url: https://github.com/rucaibox/llm-knowledge-boundary
  • paper_authors: Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, Haifeng Wang
  • for: This study investigates how well large language models (LLMs) perceive their factual knowledge boundaries in open-domain question answering (QA), and how retrieval augmentation changes their judgemental abilities.
  • methods: LLMs, including ChatGPT, are analyzed with and without retrieval augmentation on open-domain QA, examining QA performance as well as the models' priori and posteriori judgements.
  • results: LLMs display unwavering confidence in their ability to answer questions and in the accuracy of their responses; retrieval augmentation proves effective in improving their awareness of knowledge boundaries and hence their judgemental abilities; and LLMs tend to rely on the provided retrieval results when formulating answers, with the quality of those results significantly impacting this reliance.
    Abstract Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require a substantial amount of factual knowledge and often rely on external information for assistance. Recently, large language models (LLMs) (e.g., ChatGPT), have demonstrated impressive prowess in solving a wide range of tasks with world knowledge, including knowledge-intensive tasks. However, it remains unclear how well LLMs are able to perceive their factual knowledge boundaries, particularly how they behave when incorporating retrieval augmentation. In this study, we present an initial analysis of the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain QA. Specially, we focus on three primary research questions and analyze them by examining QA performance, priori judgement and posteriori judgement of LLMs. We show evidence that LLMs possess unwavering confidence in their capabilities to respond to questions and the accuracy of their responses. Furthermore, retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries, thereby improving their judgemental abilities. Additionally, we also find that LLMs have a propensity to rely on the provided retrieval results when formulating answers, while the quality of these results significantly impacts their reliance. The code to reproduce this work is available at https://github.com/RUCAIBox/LLM-Knowledge-Boundary.

Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding

  • paper_url: http://arxiv.org/abs/2307.11005
  • repo_url: None
  • paper_authors: Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe
  • for: This paper studies the integration of pretrained automatic speech recognition (ASR) and language models (LMs) into a spoken language understanding (SLU) framework.
  • methods: A three-pass end-to-end (E2E) SLU system is proposed for sequence generation tasks: the first pass predicts ASR transcripts with the ASR subnetwork, the LM subnetwork then makes an initial SLU prediction, and in the third pass a deliberation subnetwork conditions on representations from the ASR and LM subnetworks to make the final prediction.
  • results: The proposed three-pass SLU system outperforms cascaded and E2E SLU models on two benchmark SLU datasets, SLURP and SLUE, especially on acoustically challenging utterances.
    Abstract There has been an increased interest in the integration of pretrained speech recognition (ASR) and language models (LM) into the SLU framework. However, prior methods often struggle with a vocabulary mismatch between pretrained models, and LM cannot be directly utilized as they diverge from its NLU formulation. In this study, we propose a three-pass end-to-end (E2E) SLU system that effectively integrates ASR and LM subnetworks into the SLU formulation for sequence generation tasks. In the first pass, our architecture predicts ASR transcripts using the ASR subnetwork. This is followed by the LM subnetwork, which makes an initial SLU prediction. Finally, in the third pass, the deliberation subnetwork conditions on representations from the ASR and LM subnetworks to make the final prediction. Our proposed three-pass SLU system shows improved performance over cascaded and E2E SLU models on two benchmark SLU datasets, SLURP and SLUE, especially on acoustically challenging utterances.

MASR: Metadata Aware Speech Representation

  • paper_url: http://arxiv.org/abs/2307.10982
  • repo_url: None
  • paper_authors: Anjali Raj, Shikhar Bharadwaj, Sriram Ganapathy, Min Ma, Shikhar Vashishth
  • for: This paper proposes MASR, a Metadata Aware Speech Representation learning framework that leverages external knowledge sources to enhance speech representation learning.
  • methods: External knowledge is incorporated as sample-level pair-wise similarity matrices used in a hard-mining loss, and the framework can be combined with any choice of self-supervised learning (SSL) method.
  • results: MASR representations yield significant performance improvements over established benchmarks on several downstream tasks, including language identification, speech recognition, and non-semantic tasks such as speaker and emotion recognition; a detailed analysis of the language identification task shows how the proposed loss helps the representations separate closely related languages.
    Abstract In the recent years, speech representation learning is constructed primarily as a self-supervised learning (SSL) task, using the raw audio signal alone, while ignoring the side-information that is often available for a given speech recording. In this paper, we propose MASR, a Metadata Aware Speech Representation learning framework, which addresses the aforementioned limitations. MASR enables the inclusion of multiple external knowledge sources to enhance the utilization of meta-data information. The external knowledge sources are incorporated in the form of sample-level pair-wise similarity matrices that are useful in a hard-mining loss. A key advantage of the MASR framework is that it can be combined with any choice of SSL method. Using MASR representations, we perform evaluations on several downstream tasks such as language identification, speech recognition and other non-semantic tasks such as speaker and emotion recognition. In these experiments, we illustrate significant performance improvements for the MASR over other established benchmarks. We perform a detailed analysis on the language identification task to provide insights on how the proposed loss function enables the representations to separate closely related languages.

cs.LG - 2023-07-21

Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts

  • paper_url: http://arxiv.org/abs/2307.11661
  • repo_url: https://github.com/mayug/vdt-adapter
  • paper_authors: Mayug Maniparambil, Chris Vorster, Derek Molloy, Noel Murphy, Kevin McGuinness, Noel E. O’Connor
  • for: This paper studies how generative large language models can be used, together with lightweight adapters, to improve the downstream performance of contrastive vision-language models (VLMs) such as CLIP.
  • methods: GPT-4 is prompted to generate visually descriptive texts for each class, and these descriptions are used as prompts to adapt CLIP to downstream tasks; a simple few-shot adapter is also designed that learns to choose the best sentences for constructing generalizable classifiers (a condensed sketch follows this entry).
  • results: GPT-4-generated prompts improve CLIP's zero-shot transfer accuracy considerably on specialized fine-grained datasets (about 7% on EuroSAT and DTD, 4.6% on SUN397, and 3.3% on CUB), and the few-shot adapter outperforms the recently proposed CoCoOp by about 2% on average and by over 4% on four specialized fine-grained datasets.
    Abstract Contrastive pretrained large Vision-Language Models (VLMs) like CLIP have revolutionized visual representation learning by providing good performance on downstream datasets. VLMs are 0-shot adapted to a downstream dataset by designing prompts that are relevant to the dataset. Such prompt engineering makes use of domain expertise and a validation dataset. Meanwhile, recent developments in generative pretrained models like GPT-4 mean they can be used as advanced internet search tools. They can also be manipulated to provide visual information in any structure. In this work, we show that GPT-4 can be used to generate text that is visually descriptive and how this can be used to adapt CLIP to downstream tasks. We show considerable improvements in 0-shot transfer accuracy on specialized fine-grained datasets like EuroSAT (~7%), DTD (~7%), SUN397 (~4.6%), and CUB (~3.3%) when compared to CLIP's default prompt. We also design a simple few-shot adapter that learns to choose the best possible sentences to construct generalizable classifiers that outperform the recently proposed CoCoOP by ~2% on average and by over 4% on 4 specialized fine-grained datasets. The code, prompts, and auxiliary text dataset is available at https://github.com/mayug/VDT-Adapter.
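    Sketch: a condensed illustration of the prompting idea, assuming the OpenAI CLIP package and a dictionary of GPT-4-generated visual descriptions per class. The descriptions shown are invented examples, and the paper's few-shot adapter is omitted.
        # Build a zero-shot CLIP classifier whose class embedding averages several
        # visually descriptive sentences per class (sentences below are made up).
        import torch, clip

        device = "cuda" if torch.cuda.is_available() else "cpu"
        model, preprocess = clip.load("ViT-B/32", device=device)

        descriptions = {
            "annual crop land": ["a satellite photo of annual crop land",
                                 "rectangular green and brown cultivated fields seen from above"],
            "forest":           ["a satellite photo of a forest",
                                 "a dense, dark green canopy of trees seen from above"],
        }

        with torch.no_grad():
            class_embs = []
            for sents in descriptions.values():
                tok = clip.tokenize(sents).to(device)
                emb = model.encode_text(tok).float()
                emb = emb / emb.norm(dim=-1, keepdim=True)
                class_embs.append(emb.mean(dim=0))       # average the description embeddings
            W = torch.stack(class_embs)
            W = W / W.norm(dim=-1, keepdim=True)          # (num_classes, dim) classifier weights

        def classify(pil_image):
            with torch.no_grad():
                x = preprocess(pil_image).unsqueeze(0).to(device)
                v = model.encode_image(x).float()
                v = v / v.norm(dim=-1, keepdim=True)
                return list(descriptions)[int((v @ W.T).argmax())]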

Bandits with Deterministically Evolving States

  • paper_url: http://arxiv.org/abs/2307.11655
  • repo_url: None
  • paper_authors: Khashayar Khosravi, Renato Paes Leme, Chara Podimata, Apostolis Tsorvantzis
  • for: Learning with bandit feedback in applications such as recommendation systems and online advertising.
  • methods: A multi-armed bandit model, Bandits with Deterministically Evolving States, is proposed in which the reward at each round depends both on the short-term reward of the chosen action and on an unobservable system state ("health") that evolves deterministically at a rate $\lambda \in [0,1]$ (a toy simulation follows this entry).
  • results: Online learning algorithms are analyzed for every parametrization of the evolution rate $\lambda$, with regret bounds of $\widetilde O(\sqrt{KT})$, $\widetilde O(T^{b/a})$, $\widetilde O(K^{1/3}T^{2/3})$, and $\widetilde O(K\sqrt{T})$ in the corresponding regimes of $\lambda$.
    Abstract We propose a model for learning with bandit feedback while accounting for deterministically evolving and unobservable states that we call Bandits with Deterministically Evolving States. The workhorse applications of our model are learning for recommendation systems and learning for online ads. In both cases, the reward that the algorithm obtains at each round is a function of the short-term reward of the action chosen and how ``healthy'' the system is (i.e., as measured by its state). For example, in recommendation systems, the reward that the platform obtains from a user's engagement with a particular type of content depends not only on the inherent features of the specific content, but also on how the user's preferences have evolved as a result of interacting with other types of content on the platform. Our general model accounts for the different rate $\lambda \in [0,1]$ at which the state evolves (e.g., how fast a user's preferences shift as a result of previous content consumption) and encompasses standard multi-armed bandits as a special case. The goal of the algorithm is to minimize a notion of regret against the best-fixed sequence of arms pulled. We analyze online learning algorithms for any possible parametrization of the evolution rate $\lambda$. Specifically, the regret rates obtained are: for $\lambda \in [0, 1/T^2]$: $\widetilde O(\sqrt{KT})$; for $\lambda = T^{-a/b}$ with $b < a < 2b$: $\widetilde O (T^{b/a})$; for $\lambda \in (1/T, 1 - 1/\sqrt{T}): \widetilde O (K^{1/3}T^{2/3})$; and for $\lambda \in [1 - 1/\sqrt{T}, 1]: \widetilde O (K\sqrt{T})$.
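    Sketch: a toy simulation of the model's dynamics under stated assumptions: the state evolves deterministically at rate lambda toward a value determined by the chosen arm, and the realized reward is the arm's short-term mean scaled by the current state. The specific functional forms are illustrative, not taken from the paper.
        # Toy environment for bandits with a deterministically evolving, unobservable state.
        import numpy as np

        rng = np.random.default_rng(0)
        K, T, lam = 3, 1000, 0.05
        short_term_means = np.array([0.9, 0.6, 0.3])    # per-arm short-term reward
        healthiness_effect = np.array([0.2, 0.6, 1.0])  # how each arm affects the system's "health"

        state, total_reward = 1.0, 0.0
        for t in range(T):
            arm = rng.integers(K)                                      # placeholder policy (uniform play)
            reward = short_term_means[arm] * state + 0.01 * rng.standard_normal()
            total_reward += reward
            state = (1 - lam) * state + lam * healthiness_effect[arm]  # deterministic state evolution at rate lam
        print(f"average reward: {total_reward / T:.3f}")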

Scalable Multi-agent Covering Option Discovery based on Kronecker Graphs

  • paper_url: http://arxiv.org/abs/2307.11629
  • repo_url: None
  • paper_authors: Jiayu Chen, Jingdi Chen, Tian Lan, Vaneet Aggarwal
  • for: This paper extends covering option (skill) discovery, which improves RL exploration under sparse reward signals, from single-agent scenarios to multi-agent systems in a way that enables ease of decomposition.
  • methods: The paper proposes a multi-agent skill discovery method that approximates the joint state space as a Kronecker graph and estimates its Fiedler vector using the Laplacian spectrum of individual agents' transition graphs; a deep-learning extension estimates the eigenfunctions with NN-based representation learning techniques (a numerical sketch follows this entry).
  • results: The proposed algorithm is evaluated on multi-agent tasks built with simulators like Mujoco and significantly outperforms the state-of-the-art.
    Abstract Covering skill (a.k.a., option) discovery has been developed to improve the exploration of RL in single-agent scenarios with sparse reward signals, through connecting the most distant states in the embedding space provided by the Fiedler vector of the state transition graph. Given that joint state space grows exponentially with the number of agents in multi-agent systems, existing researches still relying on single-agent skill discovery either become prohibitive or fail to directly discover joint skills that improve the connectivity of the joint state space. In this paper, we propose multi-agent skill discovery which enables the ease of decomposition. Our key idea is to approximate the joint state space as a Kronecker graph, based on which we can directly estimate its Fiedler vector using the Laplacian spectrum of individual agents' transition graphs. Further, considering that directly computing the Laplacian spectrum is intractable for tasks with infinite-scale state spaces, we further propose a deep learning extension of our method by estimating eigenfunctions through NN-based representation learning techniques. The evaluation on multi-agent tasks built with simulators like Mujoco, shows that the proposed algorithm can successfully identify multi-agent skills, and significantly outperforms the state-of-the-art. Codes are available at: https://github.itap.purdue.edu/Clan-labs/Scalable_MAOD_via_KP.
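    Sketch: a numerical illustration of the decomposition idea under one plausible reading: each agent's transition-graph Laplacian is eigendecomposed and a joint embedding is formed from Kronecker products of the per-agent eigenvectors, instead of eigendecomposing the joint graph directly. How the paper selects and combines these factors may differ.
        # Approximate a joint-state embedding from per-agent transition-graph Laplacians
        # via Kronecker products of per-agent Fiedler vectors (an assumption, not the paper's exact estimator).
        import numpy as np

        def laplacian(adj):
            return np.diag(adj.sum(axis=1)) - adj

        def joint_fiedler_approx(adj_agents):
            fiedlers = []
            for A in adj_agents:
                vals, vecs = np.linalg.eigh(laplacian(A))
                fiedlers.append(vecs[:, 1])      # per-agent Fiedler vector (2nd-smallest eigenvalue)
            joint = fiedlers[0]
            for f in fiedlers[1:]:
                joint = np.kron(joint, f)        # embedding over the joint (product) state space
            return joint

        # Two agents on a 4-state path graph each -> joint embedding over 16 joint states.
        path4 = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
        print(joint_fiedler_approx([path4, path4]).shape)  # (16,)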

Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization

  • paper_url: http://arxiv.org/abs/2307.11620
  • repo_url: None
  • paper_authors: Xiangsen Wang, Haoran Xu, Yinan Zheng, Xianyuan Zhan
  • for: Offline multi-agent reinforcement learning (MARL)
  • methods: Implicit global-to-local value regularization, in-sample learning
  • results: Superior performance over state-of-the-art offline MARL methods in almost all tasks, as demonstrated through comprehensive experiments on the offline multi-agent MuJoCo and StarCraft II micro-management tasks.
    Abstract Offline reinforcement learning (RL) has received considerable attention in recent years due to its attractive capability of learning policies from offline datasets without environmental interactions. Despite some success in the single-agent setting, offline multi-agent RL (MARL) remains to be a challenge. The large joint state-action space and the coupled multi-agent behaviors pose extra complexities for offline policy optimization. Most existing offline MARL studies simply apply offline data-related regularizations on individual agents, without fully considering the multi-agent system at the global level. In this work, we present OMIGA, a new offline multi-agent RL algorithm with implicit global-to-local value regularization. OMIGA provides a principled framework to convert global-level value regularization into equivalent implicit local value regularizations and simultaneously enables in-sample learning, thus elegantly bridging multi-agent value decomposition and policy learning with offline regularizations. Based on comprehensive experiments on the offline multi-agent MuJoCo and StarCraft II micro-management tasks, we show that OMIGA achieves superior performance over the state-of-the-art offline MARL methods in almost all tasks.

Robust Fully-Asynchronous Methods for Distributed Training over General Architecture

  • paper_url: http://arxiv.org/abs/2307.11617
  • repo_url: None
  • paper_authors: Zehan Zhu, Ye Tian, Yan Huang, Jinming Xu, Shibo He
  • for: Improving the efficiency and robustness of distributed training, where perfect synchronization is inefficient or impossible due to latency, packet losses, and stragglers.
  • methods: The paper proposes R-FAST, a Robust Fully-Asynchronous Stochastic Gradient Tracking method in which every device performs local computation and communication at its own pace without any synchronization. Unlike existing asynchronous distributed algorithms, R-FAST eliminates the impact of data heterogeneity across devices and tolerates packet losses through a robust gradient tracking strategy with properly designed auxiliary variables that track and buffer the overall gradient vector; it also uses two spanning-tree graphs for communication that only need to share a common root, allowing flexible communication architectures.
  • results: R-FAST converges in expectation to a neighborhood of the optimum at a geometric rate for smooth and strongly convex objectives, and to a stationary point at a sublinear rate in general non-convex settings. Experiments show it runs 1.5-2 times faster than synchronous baselines such as Ring-AllReduce and D-PSGD with comparable accuracy, and outperforms existing asynchronous state-of-the-art algorithms such as AD-PSGD and OSGP, especially in the presence of stragglers.
    Abstract Perfect synchronization in distributed machine learning problems is inefficient and even impossible due to the existence of latency, package losses and stragglers. We propose a Robust Fully-Asynchronous Stochastic Gradient Tracking method (R-FAST), where each device performs local computation and communication at its own pace without any form of synchronization. Different from existing asynchronous distributed algorithms, R-FAST can eliminate the impact of data heterogeneity across devices and allow for packet losses by employing a robust gradient tracking strategy that relies on properly designed auxiliary variables for tracking and buffering the overall gradient vector. More importantly, the proposed method utilizes two spanning-tree graphs for communication so long as both share at least one common root, enabling flexible designs in communication architectures. We show that R-FAST converges in expectation to a neighborhood of the optimum with a geometric rate for smooth and strongly convex objectives; and to a stationary point with a sublinear rate for general non-convex settings. Extensive experiments demonstrate that R-FAST runs 1.5-2 times faster than synchronous benchmark algorithms, such as Ring-AllReduce and D-PSGD, while still achieving comparable accuracy, and outperforms existing asynchronous SOTA algorithms, such as AD-PSGD and OSGP, especially in the presence of stragglers.

Persistent Ballistic Entanglement Spreading with Optimal Control in Quantum Spin Chains

  • paper_url: http://arxiv.org/abs/2307.11609
  • repo_url: None
  • paper_authors: Ying Lu, Pei Shi, Xiao-Han Wang, Jie Hu, Shi-Ju Ran
  • for: Entanglement spreading is a key route to understanding quantum many-body dynamics in and out of equilibrium.
  • methods: A time-dependent "variational entanglement-enhancing" field (VEEF) is introduced and optimally controlled to maximize the bipartite entanglement entropy (EE) of the final state, inducing a persistent ballistic spreading of entanglement in quantum spin chains.
  • results: Under the VEEF, the EE grows linearly, $S(t) = vt$ for $t \leq \frac{N}{2v}$, until it reaches the genuine saturation $\tilde{S} = -\log_{2} 2^{-\frac{N}{2}} = \frac{N}{2}$, with $N$ the total number of spins; the velocity is $v \simeq 2.76$, $4.98$, and $5.75$ for spin chains with Ising, XY, and Heisenberg interactions, respectively, and nonlinear EE growth emerges in the presence of long-range interactions.
    Abstract Entanglement propagation provides a key routine to understand quantum many-body dynamics in and out of equilibrium. In this work, we uncover that the ``variational entanglement-enhancing'' field (VEEF) robustly induces a persistent ballistic spreading of entanglement in quantum spin chains. The VEEF is time dependent, and is optimally controlled to maximize the bipartite entanglement entropy (EE) of the final state. Such a linear growth persists till the EE reaches the genuine saturation $\tilde{S} = - \log_{2} 2^{-\frac{N}{2}} = \frac{N}{2}$ with $N$ the total number of spins. The EE satisfies $S(t) = v t$ for the time $t \leq \frac{N}{2v}$, with $v$ the velocity. These results are in sharp contrast with the behaviors without VEEF, where the EE generally approaches a sub-saturation known as the Page value $\tilde{S}_{P} = \tilde{S} - \frac{1}{2\ln{2}}$ in the long-time limit, and the entanglement growth deviates from being linear before the Page value is reached. The dependence between the velocity and interactions is explored, with $v \simeq 2.76$, $4.98$, and $5.75$ for the spin chains with Ising, XY, and Heisenberg interactions, respectively. We further show that the nonlinear growth of EE emerges with the presence of long-range interactions.

Learning minimal representations of stochastic processes with variational autoencoders

  • paper_url: http://arxiv.org/abs/2307.11608
  • repo_url: https://github.com/gabrielfernandezfernandez/spivae
  • paper_authors: Gabriel Fernández-Fernández, Carlo Manzo, Maciej Lewenstein, Alexandre Dauphin, Gorka Muñoz-Gil
  • for: 研究者使用无监督机器学习方法自动发现随机过程中不确定参数的最小集合。
  • methods: 研究者使用扩展β-variational autoencoder建模方法,通过模拟数据集来证明方法的有效性。
  • results: 方法可以准确地描述随机过程的动态,并且可以生成符合预期随机行为的新轨迹。
    Abstract Stochastic processes have found numerous applications in science, as they are broadly used to model a variety of natural phenomena. Due to their intrinsic randomness and uncertainty, they are however difficult to characterize. Here, we introduce an unsupervised machine learning approach to determine the minimal set of parameters required to effectively describe the dynamics of a stochastic process. Our method builds upon an extended $\beta$-variational autoencoder architecture. By means of simulated datasets corresponding to paradigmatic diffusion models, we showcase its effectiveness in extracting the minimal relevant parameters that accurately describe these dynamics. Furthermore, the method enables the generation of new trajectories that faithfully replicate the expected stochastic behavior. Overall, our approach enables for the autonomous discovery of unknown parameters describing stochastic processes, hence enhancing our comprehension of complex phenomena across various fields.
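A minimal β-VAE sketch in PyTorch, illustrating the weighted KL term that the paper's extended β-variational autoencoder builds on; the trajectory length, network sizes, and the toy Brownian-motion data are illustrative assumptions, not the authors' architecture.

```python
# Minimal beta-VAE sketch: reconstruct 1-D stochastic trajectories while a beta-weighted
# KL term pressures the latent code toward a small set of informative dimensions.
import torch
import torch.nn as nn

T, latent_dim, beta = 100, 8, 4.0

class BetaVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(T, 128), nn.ReLU(), nn.Linear(128, 2 * latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, T))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), mu, logvar

def loss_fn(x, x_hat, mu, logvar):
    recon = ((x - x_hat) ** 2).sum(dim=-1).mean()
    kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp())).sum(dim=-1).mean()
    return recon + beta * kl                                       # beta > 1 encourages disentanglement

model = BetaVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    x = torch.randn(64, T).cumsum(dim=-1)          # toy Brownian-motion trajectories
    x_hat, mu, logvar = model(x)
    loss = loss_fn(x, x_hat, mu, logvar)
    opt.zero_grad(); loss.backward(); opt.step()
print("final loss:", loss.item())
```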

Finding Optimal Diverse Feature Sets with Alternative Feature Selection

  • paper_url: http://arxiv.org/abs/2307.11607
  • repo_url: https://github.com/jakob-bach/alternative-feature-selection
  • paper_authors: Jakob Bach
  • for: 本研究旨在提出一种新的特征选择方法,以提供多个可能的特征集,从而为用户提供不同的解释方案。
  • methods: 本研究使用了约束来定义备选特征集,并允许用户控制备选集的数量和差异度。
  • results: 研究发现,使用备选特征集可以获得高精度预测模型,并且分析了一些影响这种结果的因素。
    Abstract Feature selection is popular for obtaining small, interpretable, yet highly accurate prediction models. Conventional feature-selection methods typically yield one feature set only, which might not suffice in some scenarios. For example, users might be interested in finding alternative feature sets with similar prediction quality, offering different explanations of the data. In this article, we introduce alternative feature selection and formalize it as an optimization problem. In particular, we define alternatives via constraints and enable users to control the number and dissimilarity of alternatives. Next, we analyze the complexity of this optimization problem and show NP-hardness. Further, we discuss how to integrate conventional feature-selection methods as objectives. Finally, we evaluate alternative feature selection with 30 classification datasets. We observe that alternative feature sets may indeed have high prediction quality, and we analyze several factors influencing this outcome.
    摘要 通过特征选择来获得小巧、可解释且高准确性的预测模型是一种常见做法。传统的特征选择方法通常只给出一个特征集,这在某些场景下可能并不够用。例如,用户可能希望找到具有相近预测质量的替代特征集,从而获得对数据的不同解释。在本文中,我们引入替代特征选择,并将其形式化为一个优化问题。具体而言,我们通过约束来定义替代特征集,并允许用户控制替代集的数量和差异程度。接下来,我们分析了该优化问题的复杂性,证明其为 NP 难。此外,我们还讨论了如何将传统特征选择方法整合为优化目标。最后,我们在 30 个分类数据集上对替代特征选择进行了评估,发现替代特征集确实可以具有较高的预测质量,并分析了影响这一结果的若干因素。
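A brute-force sketch of the constraint idea: given an original feature set, search for an alternative of the same size that overlaps with it in at most `max_overlap` features while keeping validation accuracy high. The tiny synthetic data, the overlap-constraint form, and the exhaustive search are illustrative assumptions; the paper formulates and solves this as a constrained optimization problem rather than by enumeration.

```python
# Brute-force illustration of "alternative feature selection" with an overlap constraint.
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, n_informative=6, random_state=0)
k, max_overlap = 3, 1                      # feature-set size and allowed overlap

def score(features):
    return cross_val_score(LogisticRegression(max_iter=1000), X[:, list(features)], y, cv=5).mean()

all_sets = list(combinations(range(X.shape[1]), k))
original = max(all_sets, key=score)        # conventional selection: best k-subset

alternatives = [s for s in all_sets if len(set(s) & set(original)) <= max_overlap]
alternative = max(alternatives, key=score) # best set sharing at most max_overlap features

print("original   :", original, round(score(original), 3))
print("alternative:", alternative, round(score(alternative), 3))
```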

Transferability of Convolutional Neural Networks in Stationary Learning Tasks

  • paper_url: http://arxiv.org/abs/2307.11588
  • repo_url: https://github.com/damowerko/mtt
  • paper_authors: Damian Owerko, Charilaos I. Kanatsoulis, Jennifer Bondarchuk, Donald J. Bucci Jr, Alejandro Ribeiro
  • for: This paper is written for those interested in efficient training of convolutional neural networks (CNNs) for large-scale spatial problems.
  • methods: The paper investigates the properties of CNNs for tasks where the underlying signals are stationary, and proposes a novel framework for efficient training of CNNs on small windows of such signals.
  • results: The paper shows that the proposed framework achieves nearly optimal performance on much larger windows without retraining, and demonstrates this through theoretical analysis and experimental analysis on two tasks: multi-target tracking and mobile infrastructure on demand. The results show that the CNN is able to tackle problems with many hundreds of agents after being trained with fewer than ten.
    Abstract Recent advances in hardware and big data acquisition have accelerated the development of deep learning techniques. For an extended period of time, increasing the model complexity has led to performance improvements for various tasks. However, this trend is becoming unsustainable and there is a need for alternative, computationally lighter methods. In this paper, we introduce a novel framework for efficient training of convolutional neural networks (CNNs) for large-scale spatial problems. To accomplish this we investigate the properties of CNNs for tasks where the underlying signals are stationary. We show that a CNN trained on small windows of such signals achieves nearly the same performance on much larger windows without retraining. This claim is supported by our theoretical analysis, which provides a bound on the performance degradation. Additionally, we conduct thorough experimental analysis on two tasks: multi-target tracking and mobile infrastructure on demand. Our results show that the CNN is able to tackle problems with many hundreds of agents after being trained with fewer than ten. Thus, CNN architectures provide solutions to these problems at previously computationally intractable scales.
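A toy sketch of the property exploited here: a fully convolutional network has no input-size-dependent layers, so weights fitted on small windows of a stationary signal can be applied to much larger windows without retraining. The architecture below is an illustrative stand-in, not the authors' model.

```python
# A fully convolutional CNN can be trained on small windows and then evaluated on much
# larger windows of a stationary signal without any architectural change or retraining.
import torch
import torch.nn as nn

model = nn.Sequential(                       # no flatten / fully-connected layers
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=1),
)

small = torch.randn(8, 1, 32, 32)            # small training windows
large = torch.randn(1, 1, 256, 256)          # much larger deployment window

print(model(small).shape)                    # torch.Size([8, 1, 32, 32])
print(model(large).shape)                    # torch.Size([1, 1, 256, 256])
```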

Design Space Exploration on Efficient and Accurate Human Pose Estimation from Sparse IMU-Sensing

  • paper_url: http://arxiv.org/abs/2308.02397
  • repo_url: https://github.com/itiv-kit/dse-sparse-imu
  • paper_authors: Iris Fürst-Walter, Antonio Nappi, Tanja Harbaum, Jürgen Becker
  • for: 本研究旨在为体育、康复或工作安全领域的人体姿态估计(HPE)提供准确的感知,而不泄露敏感的个人数据。因此,本地处理是必要的;在能耗预算受限的情况下,惯性测量单元(IMU)比常见的相机感知更为合适。
  • methods: 本研究使用了 simulative Design Space Exploration(DSE)来研究变量IMU传感器的数量和位置的影响。首先,我们生成了基于公共可用的人体模型数据集的IMU数据,并使用深度学习模型进行训练。此外,我们提出了一个结合约束的精度-资源交互度量来评估感知器的配置。
  • results: 我们的研究表明,对于精度与资源同等重要的系统,可以选出由 4 个传感器组成的最优配置,相比现有技术将精度提高 32.7%,同时减少 2 个传感器的硬件开销。我们的工作可用于设计兼顾数据隐私和资源意识的健康应用。
    Abstract Human Pose Estimation (HPE) to assess human motion in sports, rehabilitation or work safety requires accurate sensing without compromising the sensitive underlying personal data. Therefore, local processing is necessary and the limited energy budget in such systems can be addressed by Inertial Measurement Units (IMU) instead of common camera sensing. The central trade-off between accuracy and efficient use of hardware resources is rarely discussed in research. We address this trade-off by a simulative Design Space Exploration (DSE) of a varying quantity and positioning of IMU-sensors. First, we generate IMU-data from a publicly available body model dataset for different sensor configurations and train a deep learning model with this data. Additionally, we propose a combined metric to assess the accuracy-resource trade-off. We used the DSE as a tool to evaluate sensor configurations and identify beneficial ones for a specific use case. Exemplary, for a system with equal importance of accuracy and resources, we identify an optimal sensor configuration of 4 sensors with a mesh error of 6.03 cm, increasing the accuracy by 32.7% and reducing the hardware effort by two sensors compared to state of the art. Our work can be used to design health applications with well-suited sensor positioning and attention to data privacy and resource-awareness.
    摘要 人体姿态估计(HPE)在运动、康复或工作安全中需要准确感知,同时不泄露敏感的个人数据。因此,本地处理是必要的,而此类系统有限的能耗预算可以通过惯性测量单元(IMU)而非常见的相机感知来解决。精度与硬件资源高效利用之间的核心权衡在研究中很少被讨论。我们通过对 IMU 传感器的数量和位置进行仿真式设计空间探索(DSE)来处理这一权衡。首先,我们使用公开可用的人体模型数据集为不同的传感器配置生成 IMU 数据,并用这些数据训练深度学习模型。此外,我们提出了一个组合度量来评估精度与资源之间的权衡。我们将 DSE 作为工具来评估传感器配置,并为特定用例找出有利的配置。例如,对于精度与资源同等重要的系统,我们确定了由 4 个传感器组成的最优配置,其网格误差为 6.03 cm,相比现有技术精度提高 32.7%,硬件开销减少 2 个传感器。我们的工作可用于设计传感器位置合理、注重数据隐私和资源意识的健康应用。
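A minimal sketch of scoring sensor configurations with a combined accuracy–resource metric and picking the best one; the candidate configurations, error values, and the simple weighted-sum form of the metric are illustrative assumptions (the paper defines its own combined metric).

```python
# Rank IMU sensor configurations by a (hypothetical) weighted accuracy-resource score.
configs = {                      # name: (mesh error in cm, number of sensors) - example values
    "2 sensors": (9.10, 2),
    "4 sensors": (6.03, 4),
    "6 sensors": (5.80, 6),
}
w_acc, w_res = 0.5, 0.5          # equal importance of accuracy and resources

max_err = max(e for e, _ in configs.values())
max_n = max(n for _, n in configs.values())

def score(err, n):               # lower is better; both terms normalized to [0, 1]
    return w_acc * err / max_err + w_res * n / max_n

best = min(configs, key=lambda c: score(*configs[c]))
for name, (err, n) in configs.items():
    print(f"{name}: score = {score(err, n):.3f}")
print("selected:", best)
```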

FMT: Removing Backdoor Feature Maps via Feature Map Testing in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.11565
  • repo_url: https://github.com/ase2023paper/fmt
  • paper_authors: Dong Huang, Qingwen Bu, Yahao Qing, Yichao Fu, Heming Cui
  • for: 本研究旨在提出一种新的防御策略,以防止深度神经网络(DNN)模型受到攻击。
  • methods: 本研究使用了特征图测试(Feature Map Testing,FMT)方法。不同于现有的防御策略,FMT 尝试检测 DNN 模型中的后门特征图,将其擦除,并使用安全的训练数据子集对模型进行微调。
  • results: 与现有防御策略相比,FMT 能更有效地降低攻击成功率(ASR),同时保持较高的模型性能。此外,FMT 的鲁棒准确率(Robust Accuracy,RA)高于传统防御方法,表明它在缓解后门攻击影响的同时能更好地保持模型性能。
    Abstract Deep neural networks have been widely used in many critical applications, such as autonomous vehicles and medical diagnosis. However, their security is threatened by backdoor attack, which is achieved by adding artificial patterns to specific training data. Existing defense strategies primarily focus on using reverse engineering to reproduce the backdoor trigger generated by attackers and subsequently repair the DNN model by adding the trigger into inputs and fine-tuning the model with ground-truth labels. However, once the trigger generated by the attackers is complex and invisible, the defender can not successfully reproduce the trigger. Consequently, the DNN model will not be repaired since the trigger is not effectively removed. In this work, we propose Feature Map Testing~(FMT). Different from existing defense strategies, which focus on reproducing backdoor triggers, FMT tries to detect the backdoor feature maps, which are trained to extract backdoor information from the inputs. After detecting these backdoor feature maps, FMT will erase them and then fine-tune the model with a secure subset of training data. Our experiments demonstrate that, compared to existing defense strategies, FMT can effectively reduce the Attack Success Rate (ASR) even against the most complex and invisible attack triggers. Second, unlike conventional defense methods that tend to exhibit low Robust Accuracy (i.e., the model's accuracy on the poisoned data), FMT achieves higher RA, indicating its superiority in maintaining model performance while mitigating the effects of backdoor attacks~(e.g., FMT obtains 87.40\% RA in CIFAR10). Third, compared to existing feature map pruning techniques, FMT can cover more backdoor feature maps~(e.g., FMT removes 83.33\% of backdoor feature maps from the model in the CIFAR10 \& BadNet scenario).
    摘要 深度神经网络广泛应用于自动驾驶和医疗诊断等许多关键领域。然而,其安全性受到后门攻击的威胁:攻击者通过在特定训练数据中添加人工模式来实现攻击。现有防御策略主要依靠逆向工程来复现攻击者生成的后门触发器,然后将触发器加入输入并用真实标签微调模型来修复 DNN。然而,一旦攻击者生成的触发器复杂且不可见,防御者便无法成功复现它;触发器无法被有效移除,模型也就无法被修复。针对这种情况,我们提出特征图测试(FMT)。与聚焦于复现后门触发器的现有防御策略不同,FMT 尝试检测后门特征图,即被训练用于从输入中提取后门信息的特征图。检测到这些后门特征图后,FMT 将其擦除,并用安全的训练数据子集对模型进行微调。实验表明,与现有防御策略相比,即使面对最复杂、最隐蔽的攻击触发器,FMT 也能有效降低攻击成功率(ASR)。其次,与鲁棒准确率(即模型在染毒数据上的准确率)往往较低的传统防御方法不同,FMT 取得了更高的 RA(例如在 CIFAR10 上达到 87.40% RA),表明它在缓解后门攻击影响的同时更好地保持了模型性能。第三,与现有的特征图剪枝技术相比,FMT 能覆盖更多的后门特征图(例如在 CIFAR10 与 BadNet 场景中,FMT 从模型中移除了 83.33% 的后门特征图)。
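A sketch of the "erase suspicious feature maps, then fine-tune on a secure subset" step, using a channel mask installed via a forward hook; the suspicion scores are a random placeholder here (the detection of backdoor feature maps is the paper's contribution and is not reproduced), and the small CNN and random batches are illustrative stand-ins.

```python
# Zero out suspected backdoor feature maps with a channel mask, then fine-tune on clean data.
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 16, 3, padding=1)
head = nn.Sequential(nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
model = nn.Sequential(conv, head)

# Placeholder suspicion scores per channel (FMT derives these by testing feature maps).
suspicion = torch.rand(16)
mask = (suspicion < suspicion.topk(4).values.min()).float()   # erase the 4 most suspicious maps

conv.register_forward_hook(lambda m, inp, out: out * mask.view(1, -1, 1, 1))

# Fine-tune on a (secure) clean subset so the remaining maps compensate for the erased ones.
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(20):
    x, y = torch.randn(32, 3, 32, 32), torch.randint(0, 10, (32,))  # stand-in clean batch
    loss = loss_fn(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
print("fine-tuning loss:", loss.item())
```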

A multi-modal representation of El Niño Southern Oscillation Diversity

  • paper_url: http://arxiv.org/abs/2307.11552
  • repo_url: https://github.com/jakob-schloer/latentgmm
  • paper_authors: Jakob Schlör, Felix Strnad, Antonietta Capotondi, Bedartha Goswami
  • for: 本研究旨在描述和分类东太平洋附近海面温度异常(ENSO)的多样性。
  • methods: 研究人员使用低维表示与模糊无监督聚类方法来描述和分类 ENSO 事件。
  • results: 研究人员发现,ENSO事件并不是二元的,而是有五种类型:极端 El Ni~no、EP El Ni~no、CP La Ni~na、EP La Ni~na 和 Extreme La Ni~na。此外,研究人员还发现,这些不同类型的 ENSO 事件在不同的时间和强度方面具有显著的差异。
    Abstract The El Ni\~no-Southern Oscillation (ENSO) is characterized by alternating periods of warm (El Ni\~no) and cold (La Ni\~na) sea surface temperature anomalies (SSTA) in the equatorial Pacific. Although El Ni\~no and La Ni\~na are well-defined climate patterns, no two events are alike. To date, ENSO diversity has been described primarily in terms of the longitudinal location of peak SSTA, used to define a bimodal classification of events in Eastern Pacific (EP) and Central Pacific (CP) types. Here, we use low-dimensional representations of Pacific SSTAs to argue that binary categorical memberships are unsuitable to describe ENSO events. Using fuzzy unsupervised clustering, we recover the four known ENSO categories, along with a fifth category: an Extreme El Ni\~no. We show that Extreme El Ni\~nos differ both in their intensity and temporal evolution from canonical EP El Ni\~nos. We also find that CP La Ni\~nas, EP El Ni\~nos, and Extreme El Ni\~nos contribute the most to interdecadal ENSO variability.
    摘要 厄尔尼诺-南方涛动(ENSO)的特征是赤道太平洋海表温度异常(SSTA)在暖期(El Niño)与冷期(La Niña)之间交替。虽然 El Niño 和 La Niña 是定义明确的气候模式,但没有两个事件是完全相同的。迄今为止,ENSO 多样性主要依据 SSTA 峰值的经向位置来描述,并据此将事件二分为东太平洋(EP)型与中太平洋(CP)型。本文利用太平洋 SSTA 的低维表示指出,二元类别归属不足以描述 ENSO 事件。通过模糊无监督聚类,我们恢复了四种已知的 ENSO 类别,并发现了第五类:极端 El Niño。我们表明,极端 El Niño 在强度和时间演化上均不同于典型的 EP El Niño。我们还发现,CP La Niña、EP El Niño 和极端 El Niño 对年代际 ENSO 变率的贡献最大。
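A compact sketch of the low-dimensional-representation-plus-fuzzy-clustering pipeline: PCA to a few components followed by a Gaussian mixture whose posterior probabilities act as soft (fuzzy) memberships. The synthetic "SSTA fields" and the choice of 5 components only mirror the five recovered categories; the actual latent representation and data are the authors'.

```python
# PCA + Gaussian mixture with soft memberships, as a stand-in for ENSO-type fuzzy clustering.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 60))                 # stand-in for flattened SSTA fields (events x grid)

Z = PCA(n_components=2).fit_transform(X)       # low-dimensional representation
gmm = GaussianMixture(n_components=5, random_state=0).fit(Z)

membership = gmm.predict_proba(Z)              # fuzzy membership of each event in each category
print("membership shape:", membership.shape)   # (500, 5)
print("example soft memberships:", np.round(membership[0], 3))
```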

Towards practical reinforcement learning for tokamak magnetic control

  • paper_url: http://arxiv.org/abs/2307.11546
  • repo_url: None
  • paper_authors: Brendan D. Tracey, Andrea Michi, Yuri Chervonyi, Ian Davies, Cosmin Paduraru, Nevena Lazic, Federico Felici, Timo Ewalds, Craig Donner, Cristian Galperti, Jonas Buchli, Michael Neunert, Andrea Huber, Jonathan Evens, Paula Kurylowicz, Daniel J. Mankowitz, Martin Riedmiller, The TCV Team
  • for: 这个研究旨在改善基于增强反馈学习(RL)的真实时间控制系统,特别是核聚体控制领域。
  • methods: 本研究使用RL方法,并提出了几个算法改进,包括代理架构和训练程序。
  • results: simulation results show that the proposed RL-based controller can achieve up to 65% improvement in shape accuracy, reduce the long-term bias of the plasma current, and reduce the training time required to learn new tasks by a factor of 3 or more. 新的实验结果显示,RL 方法可以在 TCV 托卡马克上常规地实现精确的等离子体放电。
    Abstract Reinforcement learning (RL) has shown promising results for real-time control systems, including the domain of plasma magnetic control. However, there are still significant drawbacks compared to traditional feedback control approaches for magnetic confinement. In this work, we address key drawbacks of the RL method; achieving higher control accuracy for desired plasma properties, reducing the steady-state error, and decreasing the required time to learn new tasks. We build on top of \cite{degrave2022magnetic}, and present algorithmic improvements to the agent architecture and training procedure. We present simulation results that show up to 65\% improvement in shape accuracy, achieve substantial reduction in the long-term bias of the plasma current, and additionally reduce the training time required to learn new tasks by a factor of 3 or more. We present new experiments using the upgraded RL-based controllers on the TCV tokamak, which validate the simulation results achieved, and point the way towards routinely achieving accurate discharges using the RL approach.
    摘要 强化学习(RL)已在实时控制系统中展现出良好前景,其中包括等离子体磁控制领域。然而,与传统的磁约束反馈控制方法相比,RL 方法仍存在明显不足。在这项工作中,我们针对 RL 方法的关键缺点:提高对期望等离子体特性的控制精度、减小稳态误差,以及缩短学习新任务所需的时间。我们在 \cite{degrave2022magnetic} 的基础上,对智能体架构和训练流程提出了算法改进。仿真结果显示,形状精度最高提升 65%,等离子体电流的长期偏差显著降低,学习新任务所需的训练时间缩短 3 倍以上。我们还在 TCV 托卡马克上使用升级后的 RL 控制器进行了新的实验,验证了仿真结果,并为利用 RL 方法常规实现精确放电指明了方向。

Training Latency Minimization for Model-Splitting Allowed Federated Edge Learning

  • paper_url: http://arxiv.org/abs/2307.11532
  • repo_url: None
  • paper_authors: Yao Wen, Guopeng Zhang, Kezhi Wang, Kun Yang
  • for: 提高 federated learning 中的计算能力,以便在训练深度神经网络时遇到的计算能力短缺问题。
  • methods: 利用边缘计算和分解学习,提出了模型分解允许 federated learning 框架(SFL),以减少训练延迟而无损测试精度。
  • results: 通过对 EfficientNetV2 模型在 MNIST 数据集上进行广泛的实验,证明了提案的 SFL 框架的有效性和提高性。
    Abstract To alleviate the shortage of computing power faced by clients in training deep neural networks (DNNs) using federated learning (FL), we leverage the edge computing and split learning to propose a model-splitting allowed FL (SFL) framework, with the aim to minimize the training latency without loss of test accuracy. Under the synchronized global update setting, the latency to complete a round of global training is determined by the maximum latency for the clients to complete a local training session. Therefore, the training latency minimization problem (TLMP) is modelled as a minimizing-maximum problem. To solve this mixed integer nonlinear programming problem, we first propose a regression method to fit the quantitative-relationship between the cut-layer and other parameters of an AI-model, and thus, transform the TLMP into a continuous problem. Considering that the two subproblems involved in the TLMP, namely, the cut-layer selection problem for the clients and the computing resource allocation problem for the parameter-server are relative independence, an alternate-optimization-based algorithm with polynomial time complexity is developed to obtain a high-quality solution to the TLMP. Extensive experiments are performed on a popular DNN-model EfficientNetV2 using dataset MNIST, and the results verify the validity and improved performance of the proposed SFL framework.

General regularization in covariate shift adaptation

  • paper_url: http://arxiv.org/abs/2307.11503
  • repo_url: None
  • paper_authors: Duc Hoan Nguyen, Sergei V. Pereverzyev, Werner Zellinger
  • for: corrected error of least squares learning algorithms in reproducing kernel Hilbert spaces (RKHS)
  • methods: sample weights determined by estimated Radon-Nikod'ym derivative of future data distribution w.r.t.~training data distribution
  • results: novel error bounds for reweighted kernel regression in RKHS, showing that fewer samples are needed for the same level of accuracy compared to state-of-the-art analyses under weak smoothness conditions.
  • for: 修正最小二乘学习算法在嵌入kernel空间(RKHS)中的错误
  • methods: 使用估计的拉戈-尼科德偏微分来确定未来数据分布与训练数据分布之间的比较
  • results: novel的误差下界结果,表明在弱约束下,只需要 fewer samples 来达到相同级别的准确率,与现有分析相比。
    Abstract Sample reweighting is one of the most widely used methods for correcting the error of least squares learning algorithms in reproducing kernel Hilbert spaces (RKHS), that is caused by future data distributions that are different from the training data distribution. In practical situations, the sample weights are determined by values of the estimated Radon-Nikod\'ym derivative, of the future data distribution w.r.t.~the training data distribution. In this work, we review known error bounds for reweighted kernel regression in RKHS and obtain, by combination, novel results. We show under weak smoothness conditions, that the amount of samples, needed to achieve the same order of accuracy as in the standard supervised learning without differences in data distributions, is smaller than proven by state-of-the-art analyses.
    摘要 样本重加权是纠正再生核希尔伯特空间(RKHS)中最小二乘学习算法误差的最常用方法之一,这种误差源于未来数据分布与训练数据分布不一致。在实际场景中,样本权重由未来数据分布相对于训练数据分布的 Radon-Nikodym 导数的估计值确定。在这项工作中,我们回顾了 RKHS 中重加权核回归的已知误差界,并通过组合得到了新的结果。我们证明,在较弱的光滑性条件下,要达到与无分布差异的标准监督学习相同量级的精度,所需的样本数量少于现有最先进分析给出的结果。
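A sketch of the reweighting setup the analysis concerns: the density ratio of test to training inputs is estimated (here with the common probabilistic-classifier trick, an assumption not taken from the paper) and then used as sample weights in a kernel ridge regression.

```python
# Importance-weighted kernel regression under covariate shift.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)

x_tr = rng.normal(-1.0, 0.7, 300)[:, None]          # training inputs
x_te = rng.normal(+1.0, 0.7, 300)[:, None]          # future (test) inputs: shifted distribution
y_tr = f(x_tr).ravel() + 0.1 * rng.normal(size=300)

# Estimate the Radon-Nikodym derivative dP_test/dP_train with a probabilistic classifier.
clf = LogisticRegression().fit(np.vstack([x_tr, x_te]), np.r_[np.zeros(300), np.ones(300)])
p = clf.predict_proba(x_tr)[:, 1]
weights = p / (1 - p)                               # density-ratio estimate on training points

plain = KernelRidge(kernel="rbf", gamma=2.0, alpha=1e-2).fit(x_tr, y_tr)
reweighted = KernelRidge(kernel="rbf", gamma=2.0, alpha=1e-2).fit(x_tr, y_tr, sample_weight=weights)

print("test MSE, unweighted :", np.mean((plain.predict(x_te) - f(x_te).ravel()) ** 2))
print("test MSE, reweighted :", np.mean((reweighted.predict(x_te) - f(x_te).ravel()) ** 2))
```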

Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2307.11494
  • repo_url: None
  • paper_authors: Marcel Kollovieh, Abdul Fatir Ansari, Michael Bohlke-Schneider, Jasper Zschiegner, Hao Wang, Yuyang Wang
  • for: 这个研究旨在探讨时间序列扩散模型在不同领域生成建模任务中的潜力。
  • methods: 本研究提出了名为 TSDiff 的无条件训练扩散模型,并通过推理阶段的自引导机制,使 TSDiff 无需辅助网络或改动训练流程即可用于下游任务。
  • results: 研究结果显示,TSDiff 可与多个任务专用的条件扩散模型相竞争,并能在预测、细化与合成数据生成等下游任务中取得良好的预测与生成性能。
    Abstract Diffusion models have achieved state-of-the-art performance in generative modeling tasks across various domains. Prior works on time series diffusion models have primarily focused on developing conditional models tailored to specific forecasting or imputation tasks. In this work, we explore the potential of task-agnostic, unconditional diffusion models for several time series applications. We propose TSDiff, an unconditionally trained diffusion model for time series. Our proposed self-guidance mechanism enables conditioning TSDiff for downstream tasks during inference, without requiring auxiliary networks or altering the training procedure. We demonstrate the effectiveness of our method on three different time series tasks: forecasting, refinement, and synthetic data generation. First, we show that TSDiff is competitive with several task-specific conditional forecasting methods (predict). Second, we leverage the learned implicit probability density of TSDiff to iteratively refine the predictions of base forecasters with reduced computational overhead over reverse diffusion (refine). Notably, the generative performance of the model remains intact -- downstream forecasters trained on synthetic samples from TSDiff outperform forecasters that are trained on samples from other state-of-the-art generative time series models, occasionally even outperforming models trained on real data (synthesize).

A New Deep State-Space Analysis Framework for Patient Latent State Estimation and Classification from EHR Time Series Data

  • paper_url: http://arxiv.org/abs/2307.11487
  • repo_url: None
  • paper_authors: Aya Nakamura, Ryosuke Kojima, Yuji Okamoto, Eiichiro Uchino, Yohei Mineharu, Yohei Harada, Mayumi Kamada, Manabu Muto, Motoko Yanagita, Yasushi Okuno
  • for: 这个研究旨在利用电子医疗记录(EHRs)和机器学习技术,提供更好的诊断和治疗策略。
  • methods: 这种深度状态空间分析框架使用时间序列无监督学习,通过深度状态空间模型来学习、可见化和分群病人的潜在状态变化。
  • results: 研究人员通过对 12,695 名癌症患者的时间序列实验室数据进行分析,成功地发现了与预后相关的潜在状态。通过可视化和聚类分析,研究人员还识别出不同抗癌药物治疗过程中患者状态及检验项目的特征性变化。该框架比现有方法更能捕捉可解释的潜在空间。
    Abstract Many diseases, including cancer and chronic conditions, require extended treatment periods and long-term strategies. Machine learning and AI research focusing on electronic health records (EHRs) have emerged to address this need. Effective treatment strategies involve more than capturing sequential changes in patient test values. It requires an explainable and clinically interpretable model by capturing the patient's internal state over time. In this study, we propose the "deep state-space analysis framework," using time-series unsupervised learning of EHRs with a deep state-space model. This framework enables learning, visualizing, and clustering of temporal changes in patient latent states related to disease progression. We evaluated our framework using time-series laboratory data from 12,695 cancer patients. By estimating latent states, we successfully discover latent states related to prognosis. By visualization and cluster analysis, the temporal transition of patient status and test items during state transitions characteristic of each anticancer drug were identified. Our framework surpasses existing methods in capturing interpretable latent space. It can be expected to enhance our comprehension of disease progression from EHRs, aiding treatment adjustments and prognostic determinations.
    摘要 许多疾病,包括癌症和慢性疾病,需要较长的治疗周期和长期策略。针对这一需求,基于电子健康记录(EHR)的机器学习与人工智能研究应运而生。有效的治疗策略不仅需要捕捉患者检验值的序列变化,还需要一个能随时间刻画患者内部状态、可解释且具有临床可解释性的模型。在这项研究中,我们提出了“深度状态空间分析框架”,利用深度状态空间模型对 EHR 进行时间序列无监督学习。该框架能够学习、可视化并聚类与疾病进展相关的患者潜在状态的时间变化。我们使用 12,695 名癌症患者的时间序列实验室数据对框架进行了评估。通过估计潜在状态,我们成功地发现了与预后相关的潜在状态。通过可视化和聚类分析,我们识别出了各种抗癌药物所特有的状态转移过程中患者状态与检验项目的时间变化。我们的框架在捕捉可解释潜在空间方面优于现有方法,有望加深我们从 EHR 中对疾病进展的理解,辅助治疗调整和预后判断。

A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values

  • paper_url: http://arxiv.org/abs/2307.11465
  • repo_url: None
  • paper_authors: Camillo Maria Caruso, Valerio Guarrasi, Sara Ramella, Paolo Soda
  • for: 本研究旨在应用人工智能技术于肺癌研究,特别是非小细胞肺癌(NSCLC),以提高患者状况评估和生存时间(OS)率。
  • methods: 本研究使用变换器架构,不需要任何填充策略,可以Effectively learning from both censored and uncensored patients and their available features,预测NSCLC患者的OS。
  • results: 与现有方法相比,本研究在 1 个月、1 年和 2 年的时间粒度下分别取得了 71.97、77.58 和 80.72 的 Ct-index,无论采用何种插补方法均优于所有现有方法。
    Abstract One of the most challenging fields where Artificial Intelligence (AI) can be applied is lung cancer research, specifically non-small cell lung cancer (NSCLC). In particular, overall survival (OS), the time between diagnosis and death, is a vital indicator of patient status, enabling tailored treatment and improved OS rates. In this analysis, there are two challenges to take into account. First, few studies effectively exploit the information available from each patient, leveraging both uncensored (i.e., dead) and censored (i.e., survivors) patients, considering also the events' time. Second, the handling of incomplete data is a common issue in the medical field. This problem is typically tackled through the use of imputation methods. Our objective is to present an AI model able to overcome these limits, effectively learning from both censored and uncensored patients and their available features, for the prediction of OS for NSCLC patients. We present a novel approach to survival analysis with missing values in the context of NSCLC, which exploits the strengths of the transformer architecture to account only for available features without requiring any imputation strategy. By making use of ad-hoc losses for OS, it is able to account for both censored and uncensored patients, as well as changes in risks over time. We compared our method with state-of-the-art models for survival analysis coupled with different imputation strategies. We evaluated the results obtained over a period of 6 years using different time granularities obtaining a Ct-index, a time-dependent variant of the C-index, of 71.97, 77.58 and 80.72 for time units of 1 month, 1 year and 2 years, respectively, outperforming all state-of-the-art methods regardless of the imputation method used.
    摘要 人工智能(AI)最具挑战性的应用领域之一是肺癌研究,尤其是非小细胞肺癌(NSCLC)。其中,总生存期(OS,即从诊断到死亡的时间)是反映患者状况的重要指标,有助于制定个体化治疗方案并提高 OS 率。这一分析需要考虑两个挑战。首先,很少有研究能够充分利用每位患者的可用信息,即同时利用未删失(已死亡)和删失(存活)患者的数据,并考虑事件发生的时间。其次,数据缺失是医疗领域的常见问题,通常借助插补方法来处理。我们的目标是提出一个能够克服上述限制的 AI 模型,有效地从删失与未删失患者及其可用特征中学习,以预测 NSCLC 患者的 OS。我们提出了一种针对 NSCLC 缺失值场景的新型生存分析方法,利用 Transformer 架构只关注可用特征,无需任何插补策略;借助为 OS 设计的专用损失函数,该方法能同时考虑删失与未删失患者以及风险随时间的变化。我们将该方法与结合不同插补策略的最先进生存分析模型进行了比较。在为期 6 年的评估中,在 1 个月、1 年和 2 年的时间粒度下,Ct-index(C-index 的时间依赖变体)分别达到 71.97、77.58 和 80.72,无论采用何种插补方法均优于所有现有方法。

Improve Long-term Memory Learning Through Rescaling the Error Temporally

  • paper_url: http://arxiv.org/abs/2307.11462
  • repo_url: None
  • paper_authors: Shida Wang, Zhanglu Yan
  • for: 这 paper 研究了长期记忆学习中的错误度量选择。我们发现通常使用的错误都带有短期记忆偏好,包括平均绝对/平方Error。我们的发现表明所有有正向时间权重的错误都带有短期记忆偏好,以学习线性函数。
  • methods: 为了减少这种偏好和提高长期记忆学习,我们提议使用时间折算错误。此外,这种方法还可以减轻消失梯度问题。
  • results: 我们在不同的长期任务和序列模型上进行了数值实验,并证实了我们的主张。数值结果表明,适当的时间折算错误对长期记忆学习是重要的。据我们所知,这是首次对序列模型中不同错误的短期记忆偏好进行量化分析。
    Abstract This paper studies the error metric selection for long-term memory learning in sequence modelling. We examine the bias towards short-term memory in commonly used errors, including mean absolute/squared error. Our findings show that all temporally positive-weighted errors are biased towards short-term memory in learning linear functionals. To reduce this bias and improve long-term memory learning, we propose the use of a temporally rescaled error. In addition to reducing the bias towards short-term memory, this approach can also alleviate the vanishing gradient issue. We conduct numerical experiments on different long-memory tasks and sequence models to validate our claims. Numerical results confirm the importance of appropriate temporally rescaled error for effective long-term memory learning. To the best of our knowledge, this is the first work that quantitatively analyzes different errors' memory bias towards short-term memory in sequence modelling.
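A tiny sketch contrasting a plain mean-squared error over a sequence with a temporally rescaled variant whose weights grow with the time index, so that late-time (long-memory) errors are not drowned out; the exponential form of the rescaling is an illustrative assumption.

```python
# Plain vs. temporally rescaled MSE over a sequence of length T.
import numpy as np

T = 100
t = np.arange(T)
err = np.exp(-0.05 * t)                 # toy per-step error that decays with time

plain_mse = np.mean(err ** 2)

w = np.exp(0.05 * t)                    # weights growing with t (illustrative rescaling)
w = w / w.sum() * T                     # keep the overall scale comparable
rescaled_mse = np.mean(w * err ** 2)

print(f"plain MSE    = {plain_mse:.4f}")
print(f"rescaled MSE = {rescaled_mse:.4f}   (late-time errors weighted more heavily)")
```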

Neural Operators for Delay-Compensating Control of Hyperbolic PIDEs

  • paper_url: http://arxiv.org/abs/2307.11436
  • repo_url: https://github.com/jingzhang-jz/no_hyperbolic_delay
  • paper_authors: Jie Qi, Jing Zhang, Miroslav Krstic
  • for: 该 paper 探讨了将 DeepONet Operator-learning 框架应用于高级湍流 PDE 控制中的延迟问题。
  • methods: 该 paper 使用了 PDE 倒退设计生成积分函数,并使用 DeepONet 神经网络来近似这些积分函数。
  • results: 该 paper 证明了在反馈 controllers 下的稳定性,并且还提出了 DeepONet 近似的观测器和输出反馈法则,并证明了其稳定性。 numerics 表明,使用 DeepONet 可以大幅减少计算量,减少两个数量级。
    Abstract The recently introduced DeepONet operator-learning framework for PDE control is extended from the results for basic hyperbolic and parabolic PDEs to an advanced hyperbolic class that involves delays on both the state and the system output or input. The PDE backstepping design produces gain functions that are outputs of a nonlinear operator, mapping functions on a spatial domain into functions on a spatial domain, and where this gain-generating operator's inputs are the PDE's coefficients. The operator is approximated with a DeepONet neural network to a degree of accuracy that is provably arbitrarily tight. Once we produce this approximation-theoretic result in infinite dimension, with it we establish stability in closed loop under feedback that employs approximate gains. In addition to supplying such results under full-state feedback, we also develop DeepONet-approximated observers and output-feedback laws and prove their own stabilizing properties under neural operator approximations. With numerical simulations we illustrate the theoretical results and quantify the numerical effort savings, which are of two orders of magnitude, thanks to replacing the numerical PDE solving with the DeepONet.
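A minimal generic DeepONet in PyTorch: a branch net encodes samples of an input function (here that role is played by the PDE's coefficient functions) and a trunk net encodes the query coordinate, with the output given by their inner product. This is the standard operator-learning building block the paper uses to approximate its gain-generating operator; the sizes and random data below are illustrative, and the delay-compensating backstepping design itself is not reproduced.

```python
# Minimal DeepONet: output(u)(x) = <branch(u samples), trunk(x)>.
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    def __init__(self, n_sensors=50, width=64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(n_sensors, width), nn.Tanh(), nn.Linear(width, width))
        self.trunk = nn.Sequential(nn.Linear(1, width), nn.Tanh(), nn.Linear(width, width))

    def forward(self, u_samples, x_query):
        b = self.branch(u_samples)          # (batch, width): encoding of the input function
        t = self.trunk(x_query)             # (batch, width): encoding of the query location
        return (b * t).sum(dim=-1, keepdim=True)

net = DeepONet()
u = torch.randn(16, 50)                     # 16 input functions sampled at 50 sensor points
x = torch.rand(16, 1)                       # one query coordinate per function
print(net(u, x).shape)                      # torch.Size([16, 1])
```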

Batching for Green AI – An Exploratory Study on Inference

  • paper_url: http://arxiv.org/abs/2307.11434
  • repo_url: None
  • paper_authors: Tim Yarally, Luís Cruz, Daniel Feitosa, June Sallou, Arie van Deursen
  • for: 本研究旨在探讨输入批处理对深度学习模型的能耗和响应时间的影响。
  • methods: 研究使用了五种已经在出版时被视为state-of-the-art的计算机视觉神经网络,并对这些神经网络的输入批处理产生了影响。
  • results: 研究发现,批处理有显著影响于能耗和响应时间。此外,研究还提供了过去十年内神经网络精度和能耗的时间轴图表,发现能耗在精度提升的同时增长得远远 быстре,并提出了一些问题。同时,研究发现ShuffleNetV2(2018)模型在其时代实现了竞争性性能,但能耗较低。但是,研究表明结果受模型影响。
    Abstract The batch size is an essential parameter to tune during the development of new neural networks. Amongst other quality indicators, it has a large degree of influence on the model's accuracy, generalisability, training times and parallelisability. This fact is generally known and commonly studied. However, during the application phase of a deep learning model, when the model is utilised by an end-user for inference, we find that there is a disregard for the potential benefits of introducing a batch size. In this study, we examine the effect of input batching on the energy consumption and response times of five fully-trained neural networks for computer vision that were considered state-of-the-art at the time of their publication. The results suggest that batching has a significant effect on both of these metrics. Furthermore, we present a timeline of the energy efficiency and accuracy of neural networks over the past decade. We find that in general, energy consumption rises at a much steeper pace than accuracy and question the necessity of this evolution. Additionally, we highlight one particular network, ShuffleNetV2(2018), that achieved a competitive performance for its time while maintaining a much lower energy consumption. Nevertheless, we highlight that the results are model dependent.
    摘要 批大小是开发新神经网络时需要调节的关键参数,它对模型的准确率、泛化能力、训练时间和可并行性都有很大影响。这一事实已被广泛认识和研究。然而在深度学习模型的应用阶段,即终端用户使用模型进行推理时,引入批处理可能带来的收益却常常被忽视。在本研究中,我们考察了输入批处理对五种在发表时被视为最先进的计算机视觉神经网络的能耗与响应时间的影响。结果表明,批处理对这两项指标都有显著影响。此外,我们还给出了过去十年神经网络能效与准确率演变的时间线,发现总体而言,能耗的增长速度远快于准确率,并对这种演化的必要性提出质疑。同时,我们特别指出 ShuffleNetV2(2018)在其时代实现了有竞争力的性能,同时保持了低得多的能耗。不过需要注意,这些结果依赖于具体模型。
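A small timing sketch showing how per-sample inference latency typically changes with the input batch size; the toy CNN, CPU timing, and batch sizes are illustrative assumptions (energy would be measured with an external meter or interfaces such as RAPL, which this sketch does not do).

```python
# Per-sample inference latency as a function of batch size (toy model, CPU timing).
import time
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
).eval()

with torch.no_grad():
    for batch_size in [1, 8, 32, 128]:
        x = torch.randn(batch_size, 3, 64, 64)
        model(x)                                        # warm-up
        start = time.perf_counter()
        for _ in range(10):
            model(x)
        per_sample_ms = (time.perf_counter() - start) / (10 * batch_size) * 1e3
        print(f"batch {batch_size:>3}: {per_sample_ms:.3f} ms / sample")
```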

Unsupervised Embedding Learning for Human Activity Recognition Using Wearable Sensor Data

  • paper_url: http://arxiv.org/abs/2307.11796
  • repo_url: None
  • paper_authors: Taoran Sheng, Manfred Huber
  • for: Recognizing different human activities from wearable sensor data in ubiquitous computing.
  • methods: Unsupervised approach based on the nature of human activity to project data into an embedding space, followed by clustering algorithms to form behavior clusters.
  • results: Experimental results on three labeled benchmark datasets show the effectiveness of the framework and improved performance in identifying and categorizing human activities compared to unsupervised techniques applied directly to the original data set.
  • for: 本研究旨在recognize wearable sensor数据中的人类活动,尤其是在ubiquitous computing中。
  • methods: 我们提出了一种无监督方法,基于人类活动的本质,将数据投影到嵌入空间中,使得相似的活动在该空间中彼此靠近。然后,使用聚类算法对嵌入进行划分,形成行为簇。
  • results: 对三个标注数据集进行实验,结果表明我们的框架具有效果,可以帮助 clustering 算法更好地认定和分类人类活动,相比直接应用于原始数据集的无监督技术。
    Abstract The embedded sensors in widely used smartphones and other wearable devices make the data of human activities more accessible. However, recognizing different human activities from the wearable sensor data remains a challenging research problem in ubiquitous computing. One of the reasons is that the majority of the acquired data has no labels. In this paper, we present an unsupervised approach, which is based on the nature of human activity, to project the human activities into an embedding space in which similar activities will be located closely together. Using this, subsequent clustering algorithms can benefit from the embeddings, forming behavior clusters that represent the distinct activities performed by a person. Results of experiments on three labeled benchmark datasets demonstrate the effectiveness of the framework and show that our approach can help the clustering algorithm achieve improved performance in identifying and categorizing the underlying human activities compared to unsupervised techniques applied directly to the original data set.
    摘要 广泛使用的智能手机及其他可穿戴设备中内置的传感器,使人类活动数据更容易获取。然而,从可穿戴传感器数据中识别不同的人类活动仍是泛在计算中一个具有挑战性的研究问题,其中一个原因是大部分采集到的数据没有标签。在本文中,我们提出了一种基于人类活动本质的无监督方法,将人类活动投影到一个嵌入空间中,使相似的活动在该空间中彼此靠近。在此基础上,后续的聚类算法可以利用这些嵌入形成行为簇,代表个体所执行的不同活动。在三个带标签的基准数据集上的实验结果表明了该框架的有效性:与直接作用于原始数据集的无监督技术相比,我们的方法能帮助聚类算法更好地识别和归类潜在的人类活动。

An Analysis of Multi-Agent Reinforcement Learning for Decentralized Inventory Control Systems

  • paper_url: http://arxiv.org/abs/2307.11432
  • repo_url: None
  • paper_authors: Marwan Mousa, Damien van de Berg, Niki Kotecha, Ehecatl Antonio del Rio-Chanona, Max Mowbray
  • for: 解决供应链中独立实体之间的信息封锁问题,提出了一种基于多代理人学习的分布式存储管理方法。
  • methods: 研究了三种多代理人变体的距离优化算法,包括中央训练分布式执行、分布式训练中央执行和分布式训练分布式执行。
  • results: 对不同供应链网络和不确定性水平的仿真结果显示,使用多智能体近端策略优化(PPO)算法的性能与中心化数据驱动方法几乎相同,并且在大多数情况下优于分布式的基于模型的方法。
    Abstract Most solutions to the inventory management problem assume a centralization of information that is incompatible with organisational constraints in real supply chain networks. The inventory management problem is a well-known planning problem in operations research, concerned with finding the optimal re-order policy for nodes in a supply chain. While many centralized solutions to the problem exist, they are not applicable to real-world supply chains made up of independent entities. The problem can however be naturally decomposed into sub-problems, each associated with an independent entity, turning it into a multi-agent system. Therefore, a decentralized data-driven solution to inventory management problems using multi-agent reinforcement learning is proposed where each entity is controlled by an agent. Three multi-agent variations of the proximal policy optimization algorithm are investigated through simulations of different supply chain networks and levels of uncertainty. The centralized training decentralized execution framework is deployed, which relies on offline centralization during simulation-based policy identification, but enables decentralization when the policies are deployed online to the real system. Results show that using multi-agent proximal policy optimization with a centralized critic leads to performance very close to that of a centralized data-driven solution and outperforms a distributed model-based solution in most cases while respecting the information constraints of the system.
    摘要 大多数库存管理问题的解决方案都假设信息是中心化的,这与实际供应链网络中各实体相互独立的组织约束不兼容。库存管理问题是运筹学中著名的规划问题,旨在为供应链中的节点寻找最优的再订货策略。尽管已有许多中心化的解决方案,但它们并不适用于由独立实体组成的现实供应链。不过,该问题可以自然地分解为若干子问题,每个子问题对应一个独立实体,从而转化为一个多智能体系统。因此,我们提出一种基于多智能体强化学习的数据驱动分布式库存管理方案,其中每个实体由一个智能体控制。我们通过对不同供应链网络和不同不确定性水平的仿真,研究了近端策略优化(PPO)算法的三种多智能体变体。我们采用了“中心化训练、去中心化执行”框架:在基于仿真的策略识别阶段进行离线中心化训练,而当策略在线部署到真实系统时则实现去中心化。结果表明,使用带中心化评论者的多智能体近端策略优化,其性能非常接近中心化数据驱动方案,并且在大多数情况下优于分布式的基于模型的方案,同时满足系统的信息约束。

Prompting Large Language Models with Speech Recognition Abilities

  • paper_url: http://arxiv.org/abs/2307.11795
  • repo_url: None
  • paper_authors: Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer
  • for: extensions the capabilities of large language models (LLMs) to perform speech recognition
  • methods: directly attaching a small audio encoder to the LLM, prepending a sequence of audial embeddings to the text token embeddings
  • results: outperformed monolingual baselines by 18% and performed multilingual speech recognition despite LLaMA being trained overwhelmingly on English text, and the LLM can be frozen or with fewer embeddings
    Abstract Large language models have proven themselves highly flexible, able to solve a wide range of generative tasks, such as abstractive summarization and open-ended question answering. In this paper we extend the capabilities of LLMs by directly attaching a small audio encoder allowing it to perform speech recognition. By directly prepending a sequence of audial embeddings to the text token embeddings, the LLM can be converted to an automatic speech recognition (ASR) system, and be used in the exact same manner as its textual counterpart. Experiments on Multilingual LibriSpeech (MLS) show that incorporating a conformer encoder into the open sourced LLaMA-7B allows it to outperform monolingual baselines by 18% and perform multilingual speech recognition despite LLaMA being trained overwhelmingly on English text. Furthermore, we perform ablation studies to investigate whether the LLM can be completely frozen during training to maintain its original capabilities, scaling up the audio encoder, and increasing the audio encoder striding to generate fewer embeddings. The results from these studies show that multilingual ASR is possible even when the LLM is frozen or when strides of almost 1 second are used in the audio encoder opening up the possibility for LLMs to operate on long-form audio.
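A schematic sketch of the paper's core interface: an audio encoder with temporal striding produces a short sequence of embeddings in the language model's hidden size, which is prepended to the text-token embeddings before the model runs. The conv front-end and the tiny Transformer below stand in for the conformer encoder and LLaMA respectively; they are illustrative assumptions, not the authors' models.

```python
# Prepend audial embeddings to text-token embeddings before a (stand-in) language model.
import torch
import torch.nn as nn

d_model, vocab = 256, 1000

audio_encoder = nn.Sequential(           # strided conv front-end: 16x temporal downsampling
    nn.Conv1d(80, d_model, kernel_size=4, stride=4), nn.GELU(),
    nn.Conv1d(d_model, d_model, kernel_size=4, stride=4), nn.GELU(),
)
token_emb = nn.Embedding(vocab, d_model)
llm = nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
lm_head = nn.Linear(d_model, vocab)

mel = torch.randn(2, 80, 320)                         # batch of 80-dim filterbank frames
tokens = torch.randint(0, vocab, (2, 12))             # text prompt / transcription prefix

audio_emb = audio_encoder(mel).transpose(1, 2)        # (2, 20, d_model) audial embeddings
text_emb = token_emb(tokens)                          # (2, 12, d_model)
inputs = torch.cat([audio_emb, text_emb], dim=1)      # audio sequence prepended to the text

logits = lm_head(llm(inputs))
print(logits.shape)                                   # torch.Size([2, 32, 1000])
```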

Attention to Entropic Communication

  • paper_url: http://arxiv.org/abs/2307.11423
  • repo_url: None
  • paper_authors: Torsten Enßlin, Carolin Weidinger, Philipp Frank
  • for: 这种研究旨在探讨一种新的通信协议,即基于注意力的 entropy 通信,以便在技术应用中设计优化的通信协议,以及更好地理解人类communication。
  • methods: 这种研究使用了 relative entropy(RE)和 maximum entropy principle(MEP)来分析和设计通信协议。具体来说,研究使用了 RE 来导引合理编码和解码消息,并使用了 weighted RE 来引导注意力协调。
  • results: 研究发现,使用 entropic attention communication 可以实现合理的注意力协调,并且可以帮助解决人类communication 中的各种问题,例如不同利益的合作。此外,研究还发现,在某些情况下,weighted RE 不是一个正确的方法,需要使用 proper attention communication。
    Abstract The concept of attention, numerical weights that emphasize the importance of particular data, has proven to be very relevant in artificial intelligence. Relative entropy (RE, aka Kullback-Leibler divergence) plays a central role in communication theory. Here we combine these concepts, attention and RE. RE guides optimal encoding of messages in bandwidth-limited communication as well as optimal message decoding via the maximum entropy principle (MEP). In the coding scenario, RE can be derived from four requirements, namely being analytical, local, proper, and calibrated. Weighted RE, used for attention steering in communications, turns out to be improper. To see how proper attention communication can emerge, we analyze a scenario of a message sender who wants to ensure that the receiver of the message can perform well-informed actions. If the receiver decodes the message using the MEP, the sender only needs to know the receiver's utility function to inform optimally, but not the receiver's initial knowledge state. In case only the curvature of the utility function maxima are known, it becomes desirable to accurately communicate an attention function, in this case a by this curvature weighted and re-normalized probability function. Entropic attention communication is here proposed as the desired generalization of entropic communication that permits weighting while being proper, thereby aiding the design of optimal communication protocols in technical applications and helping to understand human communication. For example, our analysis shows how to derive the level of cooperation expected under misaligned interests of otherwise honest communication partners.
    摘要 注意力的概念,即用数值权重强调特定数据的重要性,已被证明在人工智能中非常重要。相对熵(RE,又称 Kullback-Leibler 散度)在通信理论中扮演着核心角色。在此,我们将注意力与 RE 这两个概念结合起来。RE 既指导带宽受限通信中消息的最优编码,也通过最大熵原理(MEP)指导消息的最优解码。在编码场景下,RE 可以由四个要求推导出:解析性、局部性、正当性(proper)和校准性。而用于在通信中引导注意力的加权 RE 则被证明是不正当的(improper)。为了探讨正当的注意力通信如何产生,我们分析了这样一个场景:消息发送者希望确保接收者能够在充分知情的情况下采取行动。如果接收者使用 MEP 解码消息,发送者只需知道接收者的效用函数即可实现最优的信息传递,而无需知道接收者的初始知识状态。若只知道效用函数极大值处的曲率,则最好能准确地传达一个注意力函数,即按该曲率加权并重新归一化的概率函数。我们由此提出熵注意力通信,作为熵通信的理想推广:它既允许加权又保持正当性,从而有助于在技术应用中设计最优通信协议,也有助于理解人类通信。例如,我们的分析展示了在诚实但利益不一致的通信伙伴之间可以预期的合作水平如何推导。
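A small numeric sketch of the two objects discussed above, under assumed toy distributions: the relative entropy between a posterior and a prior, and an attention function obtained by weighting a probability distribution with curvature values and re-normalizing.

```python
# Relative entropy and a curvature-weighted, re-normalized attention function (toy example).
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])          # receiver's belief after the message (example)
q = np.array([0.25, 0.25, 0.25, 0.25])      # receiver's prior (example)

relative_entropy = np.sum(p * np.log(p / q))          # D_KL(p || q)

curvature = np.array([0.5, 2.0, 1.0, 4.0])  # |curvature| of the utility maxima (assumed values)
attention = curvature * p
attention /= attention.sum()                 # curvature-weighted, re-normalized probability

print("D_KL(p||q) =", round(relative_entropy, 4))
print("attention  =", np.round(attention, 3))
```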

Direct and inverse modeling of soft robots by learning a condensed FEM model

  • paper_url: http://arxiv.org/abs/2307.11408
  • repo_url: None
  • paper_authors: Etienne Ménager, Tanguy Navez, Olivier Goury, Christian Duriez
  • for: 该文章是为了提出一种学习基于方法来获得一个具有充分精炼的机械表示的软体机器人控制方法。
  • methods: 该文章使用了非线性弹性数据在活动器/效应器空间,从 Condensation of Finite Element Method 模型中提取出了一个压缩后的机械模型。
  • results: 该文章表明了这个压缩后的机械模型可以通过一个合理的数据量来学习,同时也非常高效地模拟软体机器人的直接和反直接姿势。 authors also show了一个由 FEM 模型和学习后的压缩模型组成的融合模型,并对其进行了比较。
    Abstract The Finite Element Method (FEM) is a powerful modeling tool for predicting the behavior of soft robots. However, its use for control can be difficult for non-specialists of numerical computation: it requires an optimization of the computation to make it real-time. In this paper, we propose a learning-based approach to obtain a compact but sufficiently rich mechanical representation. Our choice is based on nonlinear compliance data in the actuator/effector space provided by a condensation of the FEM model. We demonstrate that this compact model can be learned with a reasonable amount of data and, at the same time, be very efficient in terms of modeling, since we can deduce the direct and inverse kinematics of the robot. We also show how to couple some models learned individually in particular on an example of a gripper composed of two soft fingers. Other results are shown by comparing the inverse model derived from the full FEM model and the one from the compact learned version. This work opens new perspectives, namely for the embedded control of soft robots, but also for their design. These perspectives are also discussed in the paper.
    摘要 有限元方法(FEM)是预测软体机器人行为的强大建模工具。然而,对非数值计算专业人员而言,将其用于控制并不容易:需要对计算进行优化才能达到实时性。在本文中,我们提出了一种基于学习的方法,以获得一个紧凑但足够丰富的力学表示。我们的选择基于执行器/末端执行器空间中的非线性柔度数据,这些数据来自 FEM 模型的凝聚(condensation)。我们证明了这个紧凑模型可以用合理的数据量学习,同时在建模方面非常高效,因为可以由它推导出机器人的正向与逆向运动学。我们还展示了如何将若干单独学习的模型耦合起来,例如由两个软手指组成的夹爪。此外,我们还比较了由完整 FEM 模型推导的逆模型与由学习得到的紧凑模型推导的逆模型。这项工作为软体机器人的嵌入式控制及其设计打开了新的前景,论文中也对这些前景进行了讨论。

Probabilistic Modeling of Inter- and Intra-observer Variability in Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2307.11397
  • repo_url: None
  • paper_authors: Arne Schmidt, Pablo Morales-Álvarez, Rafael Molina
  • for: 这篇论文目的是提出一种新的医学图像分割模型,以减少医生之间和同一医生之间的变量,提高医学图像分割的准确性。
  • methods: 该模型called Pionono,使用多维度概率分布来捕捉每名评估者的标注行为,并将其与图像特征图层结合,生成概率性的分割预测。该模型可以通过变量推理优化,并可以进行端到端训练。
  • results: 对实际的肿瘤分割数据进行实验,显示Pionono模型在准确性和效率方面比前者模型(如STAPLE、概率U-Net等)高,同时还可以预测多个协调的分割图,提供诊断过程中的有价值信息。
    Abstract Medical image segmentation is a challenging task, particularly due to inter- and intra-observer variability, even between medical experts. In this paper, we propose a novel model, called Probabilistic Inter-Observer and iNtra-Observer variation NetwOrk (Pionono). It captures the labeling behavior of each rater with a multidimensional probability distribution and integrates this information with the feature maps of the image to produce probabilistic segmentation predictions. The model is optimized by variational inference and can be trained end-to-end. It outperforms state-of-the-art models such as STAPLE, Probabilistic U-Net, and models based on confusion matrices. Additionally, Pionono predicts multiple coherent segmentation maps that mimic the rater's expert opinion, which provides additional valuable information for the diagnostic process. Experiments on real-world cancer segmentation datasets demonstrate the high accuracy and efficiency of Pionono, making it a powerful tool for medical image analysis.
    摘要 医学图像分割是一项具有挑战性的任务,尤其因为观察者之间以及同一观察者自身的标注存在差异,即便是医学专家之间也不例外。在本文中,我们提出了一种新模型 Pionono(Probabilistic Inter-Observer and iNtra-Observer variation NetwOrk)。它用多维概率分布刻画每位标注者的标注行为,并将这一信息与图像特征图相结合,生成概率性的分割预测。该模型通过变分推断进行优化,可端到端训练,性能优于 STAPLE、概率 U-Net 以及基于混淆矩阵的模型等现有方法。此外,Pionono 还能预测多张相互一致、模拟各标注者专家意见的分割图,为诊断过程提供额外的有价值信息。在真实世界的癌症分割数据集上的实验表明,Pionono 具有很高的准确性和效率,是医学图像分析的有力工具。

Towards Better Fairness-Utility Trade-off: A Comprehensive Measurement-Based Reinforcement Learning Framework

  • paper_url: http://arxiv.org/abs/2307.11379
  • repo_url: None
  • paper_authors: Simiao Zhang, Jitao Bai, Menghong Guan, Yihao Huang, Yueling Zhang, Jun Sun, Geguang Pu
  • for: This paper aims to ensure the fairness of machine learning classifiers while maintaining their utility.
  • methods: The proposed method, CFU (Comprehensive Fairness-Utility), is a reinforcement learning-based framework that considers multiple fairness metrics and utility simultaneously.
  • results: CFU outperforms all state-of-the-art techniques and improves the classifier on multiple fairness metrics without sacrificing its utility, with an average improvement of 37.5%.
  • for: 这篇论文的目标是在保持实用性的同时确保机器学习分类器的公平性。
  • methods: 提出的方法 CFU(全面公平性-实用性)是一个基于强化学习的框架,能同时考虑多种公平性指标以及实用性。
  • results: CFU 优于所有现有技术,在多个公平性指标上改进分类器而不牺牲其实用性,平均提升 37.5%。
    Abstract Machine learning is widely used to make decisions with societal impact such as bank loan approving, criminal sentencing, and resume filtering. How to ensure its fairness while maintaining utility is a challenging but crucial issue. Fairness is a complex and context-dependent concept with over 70 different measurement metrics. Since existing regulations are often vague in terms of which metric to use and different organizations may prefer different fairness metrics, it is important to have means of improving fairness comprehensively. Existing mitigation techniques often target at one specific fairness metric and have limitations in improving multiple notions of fairness simultaneously. In this work, we propose CFU (Comprehensive Fairness-Utility), a reinforcement learning-based framework, to efficiently improve the fairness-utility trade-off in machine learning classifiers. A comprehensive measurement that can simultaneously consider multiple fairness notions as well as utility is established, and new metrics are proposed based on an in-depth analysis of the relationship between different fairness metrics. The reward function of CFU is constructed with comprehensive measurement and new metrics. We conduct extensive experiments to evaluate CFU on 6 tasks, 3 machine learning models, and 15 fairness-utility measurements. The results demonstrate that CFU can improve the classifier on multiple fairness metrics without sacrificing its utility. It outperforms all state-of-the-art techniques and has witnessed a 37.5% improvement on average.
    摘要 机器学习被广泛用于银行贷款审批、刑事量刑和简历筛选等具有社会影响的决策,如何在保持实用性的同时确保其公平性是一个重要而困难的问题。为此,我们提出了基于强化学习的 CFU(全面公平性-实用性)框架,用于高效地改进机器学习分类器的公平性-实用性权衡。我们建立了一种能够同时考虑多种公平性概念及实用性的综合度量,并基于对不同公平性指标之间关系的深入分析提出了新指标,进而用综合度量和新指标构建 CFU 的奖励函数。我们在 6 个任务、3 种机器学习模型和 15 个公平性-实用性度量上进行了广泛的实验。结果表明,CFU 能在不牺牲实用性的前提下同时改进多个公平性指标,优于所有现有技术,平均提升 37.5%。
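A small sketch computing two of the many fairness notions such a comprehensive measurement has to reconcile, demographic parity difference and equalized-odds difference, from predictions and a binary sensitive attribute; the data are random placeholders and the metric definitions are the standard ones, not the paper's newly proposed metrics.

```python
# Demographic parity and equalized-odds differences for a binary classifier and group attribute.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
y_pred = rng.integers(0, 2, 1000)
group = rng.integers(0, 2, 1000)            # binary sensitive attribute

def rate(mask):                              # P(pred = 1 | mask)
    return y_pred[mask].mean() if mask.any() else 0.0

dp_diff = abs(rate(group == 0) - rate(group == 1))         # demographic parity difference

tpr = [rate((group == g) & (y_true == 1)) for g in (0, 1)]
fpr = [rate((group == g) & (y_true == 0)) for g in (0, 1)]
eo_diff = max(abs(tpr[0] - tpr[1]), abs(fpr[0] - fpr[1]))   # equalized-odds difference

print(f"demographic parity diff: {dp_diff:.3f}")
print(f"equalized odds diff    : {eo_diff:.3f}")
```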

LatentAugment: Data Augmentation via Guided Manipulation of GAN’s Latent Space

  • paper_url: http://arxiv.org/abs/2307.11375
  • repo_url: https://github.com/ltronchin/latentaugment
  • paper_authors: Lorenzo Tronchin, Minh H. Vu, Paolo Soda, Tommy Löfstedt
  • for: 提高训练数据的量和多样性,并减少过拟合和提高泛化。
  • methods: 使用生成对抗网络(GANs)生成高质量样本,同时增加样本的多样性和模式覆盖率。
  • results: 在MRI-to-CT翻译任务中,使用LatentAugment DA策略可以提高模型的泛化能力,并在多样性和模式覆盖率方面超过标准DA和GAN-based sampling。
    Abstract Data Augmentation (DA) is a technique to increase the quantity and diversity of the training data, and by that alleviate overfitting and improve generalisation. However, standard DA produces synthetic data for augmentation with limited diversity. Generative Adversarial Networks (GANs) may unlock additional information in a dataset by generating synthetic samples having the appearance of real images. However, these models struggle to simultaneously address three key requirements: fidelity and high-quality samples; diversity and mode coverage; and fast sampling. Indeed, GANs generate high-quality samples rapidly, but have poor mode coverage, limiting their adoption in DA applications. We propose LatentAugment, a DA strategy that overcomes the low diversity of GANs, opening up for use in DA applications. Without external supervision, LatentAugment modifies latent vectors and moves them into latent space regions to maximise the synthetic images' diversity and fidelity. It is also agnostic to the dataset and the downstream task. A wide set of experiments shows that LatentAugment improves the generalisation of a deep model translating from MRI-to-CT beating both standard DA as well GAN-based sampling. Moreover, still in comparison with GAN-based sampling, LatentAugment synthetic samples show superior mode coverage and diversity. Code is available at: https://github.com/ltronchin/LatentAugment.
    摘要 数据增广(DA)是一种增加训练数据数量和多样性的技术,可借此缓解过拟合并提高泛化能力。然而,标准 DA 生成的合成数据多样性有限。生成对抗网络(GAN)可以生成外观接近真实图像的合成样本,从而挖掘数据集中更多的信息。然而,这些模型难以同时满足三个关键要求:样本保真且高质量;多样性与模式覆盖;以及快速采样。事实上,GAN 能够快速生成高质量样本,但模式覆盖较差,限制了其在 DA 应用中的采用。我们提出 LatentAugment,一种克服 GAN 低多样性的 DA 策略,使其得以用于 DA 应用。在无需外部监督的情况下,LatentAugment 修改潜在向量,并将其移动到潜在空间中能使合成图像的多样性和保真度最大化的区域。此外,该方法与数据集和下游任务无关。大量实验表明,在 MRI 到 CT 的转换任务中,LatentAugment 比标准 DA 和基于 GAN 的采样更能提升深度模型的泛化能力。并且,与基于 GAN 的采样相比,LatentAugment 的合成样本表现出更优的模式覆盖和多样性。代码见:https://github.com/ltronchin/LatentAugment。

Diverse Offline Imitation via Fenchel Duality

  • paper_url: http://arxiv.org/abs/2307.11373
  • repo_url: None
  • paper_authors: Marin Vlastelica, Pavel Kolev, Jin Cheng, Georg Martius
  • for: 本研究旨在提出一种无监督技能发现算法,使智能体无需在线访问环境即可学习多个与专家对齐的独特技能。
  • methods: 本研究以互信息目标函数作为内在动机,并通过 KL 散度约束,使每个技能的状态占据分布在离线数据集的支撑内保持接近专家的状态占据分布。
  • results: 本研究的主要贡献是将 Fenchel 对偶、强化学习和无监督技能发现相连接起来,并提供了一种简单的离线算法,可以在不同的环境下学习多个与专家相似的独特技能。
    Abstract There has been significant recent progress in the area of unsupervised skill discovery, with various works proposing mutual information based objectives, as a source of intrinsic motivation. Prior works predominantly focused on designing algorithms that require online access to the environment. In contrast, we develop an \textit{offline} skill discovery algorithm. Our problem formulation considers the maximization of a mutual information objective constrained by a KL-divergence. More precisely, the constraints ensure that the state occupancy of each skill remains close to the state occupancy of an expert, within the support of an offline dataset with good state-action coverage. Our main contribution is to connect Fenchel duality, reinforcement learning and unsupervised skill discovery, and to give a simple offline algorithm for learning diverse skills that are aligned with an expert.
    摘要 近年来,无监督技能发现领域取得了显著进展,许多工作提出以互信息为基础的目标函数作为内在动机。以往工作主要关注需要在线访问环境的算法;与之相对,我们提出了一种离线技能发现算法。我们将问题形式化为在 KL 散度约束下最大化互信息目标:约束保证每个技能的状态占据分布在一个具有良好状态-动作覆盖的离线数据集的支撑内保持接近专家的状态占据分布。我们的主要贡献是将 Fenchel 对偶、强化学习与无监督技能发现联系起来,并给出一种简单的离线算法,用于学习与专家对齐的多样化技能。
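
In symbols, the constrained problem described above is roughly the following (the notation $z$ for the skill variable, $d^{\pi_z}$ for the state occupancy induced by skill $z$, and $d^{E}$ for the expert occupancy is ours, not the paper's):

$$ \max_{\pi}\; I(S;\, Z) \qquad \text{s.t.} \qquad D_{\mathrm{KL}}\!\left(d^{\pi_z} \,\|\, d^{E}\right) \le \varepsilon \quad \text{for every skill } z, $$

with all quantities estimated from the offline dataset; Fenchel duality is what turns this constrained objective into an unconstrained form that can be optimized offline.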

Random Separating Hyperplane Theorem and Learning Polytopes

  • paper_url: http://arxiv.org/abs/2307.11371
  • repo_url: None
  • paper_authors: Chiranjib Bhattacharyya, Ravindran Kannan, Amit Kumar
  • for: 本 paper 的目的是提供一种在高维空间中快速而有效地学习多面体的算法。
  • methods: 本 paper 使用随机分离超平面定理(Random Separating Hyperplane Theorem, RSH)来学习多面体。RSH 是分离超平面定理针对多面体的加强版本,可用于以一定概率和间隔分离点与多面体。
  • results: 本 paper 给出了一种仅需对优化 oracle 进行多项式次随机查询、即可将多面体在 Hausdorff 距离意义下逼近到 O(δ) 误差的算法,并在顶点分离良好的假设下给出了逼近隐多面体各顶点的高效算法。
    Abstract The Separating Hyperplane theorem is a fundamental result in Convex Geometry with myriad applications. Our first result, Random Separating Hyperplane Theorem (RSH), is a strengthening of this for polytopes. RSH asserts that if the distance between $a$ and a polytope $K$ with $k$ vertices and unit diameter in $\Re^d$ is at least $\delta$, where $\delta$ is a fixed constant in $(0,1)$, then a randomly chosen hyperplane separates $a$ and $K$ with probability at least $1/poly(k)$ and margin at least $\Omega \left(\delta/\sqrt{d} \right)$. An immediate consequence of our result is the first near optimal bound on the error increase in the reduction from a Separation oracle to an Optimization oracle over a polytope. RSH has algorithmic applications in learning polytopes. We consider a fundamental problem, denoted the ``Hausdorff problem'', of learning a unit diameter polytope $K$ within Hausdorff distance $\delta$, given an optimization oracle for $K$. Using RSH, we show that with polynomially many random queries to the optimization oracle, $K$ can be approximated within error $O(\delta)$. To our knowledge this is the first provable algorithm for the Hausdorff Problem. Building on this result, we show that if the vertices of $K$ are well-separated, then an optimization oracle can be used to generate a list of points, each within Hausdorff distance $O(\delta)$ of $K$, with the property that the list contains a point close to each vertex of $K$. Further, we show how to prune this list to generate a (unique) approximation to each vertex of the polytope. We prove that in many latent variable settings, e.g., topic modeling, LDA, optimization oracles do exist provided we project to a suitable SVD subspace. Thus, our work yields the first efficient algorithm for finding approximations to the vertices of the latent polytope under the well-separatedness assumption.
    摘要 分离超平面定理是凸几何中的基本结果,应用广泛。我们的第一个结果,随机分离超平面定理(RSH),是该定理针对多面体的加强版本:若点 $a$ 与 $\Re^d$ 中具有 $k$ 个顶点、直径为 1 的多面体 $K$ 的距离至少为 $\delta$(其中 $\delta$ 是 $(0,1)$ 内的固定常数),则随机选取的超平面以至少 $1/poly(k)$ 的概率分离 $a$ 与 $K$,且间隔至少为 $\Omega(\delta/\sqrt{d})$。该结果的一个直接推论是:从分离 oracle 到优化 oracle 的归约中误差增长的首个近似最优界。RSH 在学习多面体方面有算法应用。我们考虑一个基本问题,称为"Hausdorff 问题":给定 $K$ 的优化 oracle,在 Hausdorff 距离 $\delta$ 内学习一个单位直径多面体 $K$。利用 RSH,我们证明只需对优化 oracle 进行多项式次随机查询,即可在 $O(\delta)$ 误差内逼近 $K$;据我们所知,这是 Hausdorff 问题的首个可证明算法。在此基础上,我们进一步证明:若 $K$ 的顶点彼此分离良好,则可利用优化 oracle 生成一组点,每个点与 $K$ 的 Hausdorff 距离为 $O(\delta)$,且该列表中包含接近 $K$ 每个顶点的点;我们还展示了如何修剪该列表,从而为多面体的每个顶点生成(唯一的)近似。最后,我们证明在许多隐变量设置(如主题建模、LDA)中,只要投影到合适的 SVD 子空间,优化 oracle 确实存在。因此,在顶点分离良好的假设下,我们的工作给出了首个寻找隐多面体顶点近似的高效算法。
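
Restated compactly (in our own notation), the guarantee of RSH reads:

$$ \operatorname{dist}(a, K) \ge \delta \ \Longrightarrow\ \Pr\big[\text{a random hyperplane separates } a \text{ and } K \text{ with margin } \Omega(\delta/\sqrt{d})\big] \ \ge\ \tfrac{1}{\mathrm{poly}(k)}, $$

for any polytope $K \subset \Re^d$ with $k$ vertices and unit diameter and any fixed $\delta \in (0,1)$.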

Bridging the Reality Gap of Reinforcement Learning based Traffic Signal Control using Domain Randomization and Meta Learning

  • paper_url: http://arxiv.org/abs/2307.11357
  • repo_url: None
  • paper_authors: Arthur Müller, Matthia Sabatelli
  • for: 本研究旨在解决基于强化学习(RL)的交通信号控制(TSC)系统中的现实差距(reality gap)问题。
  • methods: 本研究使用了两种有前途的策略来减少实际与模拟之间的差距:Domain Randomization (DR) 和 Model-Agnostic Meta-Learning (MAML).
  • results: 实验结果表明,DR 和 MAML 两种策略都能够超过现有RL算法的性能,因此有望在RL基于TSC系统中减少实际与模拟之间的差距。
    Abstract Reinforcement Learning (RL) has been widely explored in Traffic Signal Control (TSC) applications; however, no such system has yet been deployed in practice. A key barrier to progress in this area is the reality gap, the discrepancy that results from differences between simulation models and their real-world equivalents. In this paper, we address this challenge by first presenting a comprehensive analysis of potential simulation parameters that contribute to this reality gap. We then also examine two promising strategies that can bridge this gap: Domain Randomization (DR) and Model-Agnostic Meta-Learning (MAML). Both strategies were trained with a traffic simulation model of an intersection. In addition, the model was embedded in LemgoRL, a framework that integrates realistic, safety-critical requirements into the control system. Subsequently, we evaluated the performance of the two methods on a separate model of the same intersection that was developed with a different traffic simulator. In this way, we mimic the reality gap. Our experimental results show that both DR and MAML outperform a state-of-the-art RL algorithm, therefore highlighting their potential to mitigate the reality gap in RL-based TSC systems.
    摘要 强化学习(RL)在交通信号控制(TSC)中已得到广泛研究,但至今尚无此类系统真正投入实际使用,其中一个关键障碍是"现实差距",即仿真模型与真实环境之间的差异。本文首先对可能造成这一差距的仿真参数进行了系统分析,随后考察了两种有望弥合差距的策略:域随机化(DR)与模型无关元学习(MAML)。两种策略均在一个交叉口的交通仿真模型上训练,并嵌入到集成了现实安全约束的 LemgoRL 框架中;随后我们在使用另一种交通仿真器构建的同一交叉口模型上评估其性能,以此模拟现实差距。实验结果表明,DR 与 MAML 均优于最新的 RL 算法,显示出其缓解基于 RL 的 TSC 系统现实差距的潜力。
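
Domain randomization in this setting amounts to resampling simulator parameters at the start of every training episode so that the learned policy cannot overfit a single calibration of the intersection model. A minimal sketch follows; the parameter names and ranges are illustrative assumptions, not the values used in the paper, and `make_intersection_env` is a hypothetical constructor.

```python
import random

# Illustrative simulator knobs; real TSC simulators expose many more.
PARAM_RANGES = {
    "vehicle_arrival_rate": (0.05, 0.4),   # vehicles per second per lane
    "driver_reaction_time": (0.5, 1.5),    # seconds
    "max_speed": (8.0, 15.0),              # m/s
    "truck_fraction": (0.0, 0.3),
}

def sample_domain(rng: random.Random) -> dict:
    """Draw one randomized simulation configuration."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}

def train_with_domain_randomization(num_episodes: int, seed: int = 0):
    rng = random.Random(seed)
    for episode in range(num_episodes):
        config = sample_domain(rng)
        # env = make_intersection_env(**config)   # hypothetical: build the randomized simulator
        # run_rl_episode(env, policy)             # hypothetical: one standard RL update on it
        print(episode, config)

train_with_domain_randomization(2)
```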

What can a Single Attention Layer Learn? A Study Through the Random Features Lens

  • paper_url: http://arxiv.org/abs/2307.11353
  • repo_url: None
  • paper_authors: Hengyu Fu, Tianyu Guo, Yu Bai, Song Mei
  • for: 本文研究了一种单个多头注意层的学习和泛化,该层接受一个序列输入和多个键向量作为输入,并使用多头注意机制来映射输入到输出序列中。
  • methods: 本文使用了随机特征设置,其中注意层有大量的头,键和值矩阵是随机冻结的,而值矩阵是可训练的。作者们表明了这种随机特征注意层可以高效地学习一类具有排序不变性的目标函数。
  • results: 作者们提供了许多与注意结构相关的特点,如(1)与标准两层随机特征网络相比,随机特征注意层在样本复杂性方面具有优势;(2)随机特征注意层可以高效地学习一类自然的目标函数;以及(3)采样Query-key权重矩阵(Query和Key矩阵的乘积)的分布对学习某些自然目标函数的效果有所影响。实验结果与理论发现相一致,并证明了样本大小和目标函数的复杂度之间的交互关系。
    Abstract Attention layers -- which map a sequence of inputs to a sequence of outputs -- are core building blocks of the Transformer architecture which has achieved significant breakthroughs in modern artificial intelligence. This paper presents a rigorous theoretical study on the learning and generalization of a single multi-head attention layer, with a sequence of key vectors and a separate query vector as input. We consider the random feature setting where the attention layer has a large number of heads, with randomly sampled frozen query and key matrices, and trainable value matrices. We show that such a random-feature attention layer can express a broad class of target functions that are permutation invariant to the key vectors. We further provide quantitative excess risk bounds for learning these target functions from finite samples, using random feature attention with finitely many heads. Our results feature several implications unique to the attention structure compared with existing random features theory for neural networks, such as (1) Advantages in the sample complexity over standard two-layer random-feature networks; (2) Concrete and natural classes of functions that can be learned efficiently by a random-feature attention layer; and (3) The effect of the sampling distribution of the query-key weight matrix (the product of the query and key matrix), where Gaussian random weights with a non-zero mean result in better sample complexities over the zero-mean counterpart for learning certain natural target functions. Experiments on simulated data corroborate our theoretical findings and further illustrate the interplay between the sample size and the complexity of the target function.
    摘要 注意力层(将输入序列映射到输出序列)是 Transformer 架构的核心组件,该架构在现代人工智能中取得了重要突破。这篇论文对单个多头注意力层的学习和泛化进行了严格的理论研究,其输入为一个键向量序列和一个独立的查询向量。我们考虑随机特征设置:注意力层拥有大量的头,查询与键矩阵随机采样后冻结,仅值矩阵可训练。我们表明,这种随机特征注意力层可以表达一大类对键向量具有置换不变性的目标函数。我们进一步给出了基于有限样本学习这类目标函数的定量超额风险界。我们的结果具有以下特点:1. 相比标准的两层随机特征网络,随机特征注意力层在样本复杂度方面具有优势;2. 随机特征注意力层可以高效地学习一类具体而自然的目标函数;3. 查询-键权重矩阵(查询矩阵与键矩阵的乘积)的采样分布会影响学习某些自然目标函数的性能,非零均值的高斯随机权重相比零均值情形可获得更好的样本复杂度。模拟数据上的实验结果印证了我们的理论结论,并进一步揭示了样本规模与目标函数复杂度之间的相互作用。
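
The random-feature setting is concrete enough to write down directly: the query-key weights are sampled once and frozen, and only the value weights are trainable. Below is a numpy sketch of one forward pass under that setup; all dimensions and the Gaussian initialisation are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, M = 8, 64          # input dimension, number of heads
N = 10                # number of key vectors in the sequence

# Frozen random query-key weights, one matrix per head; only v (the value weights) is trainable.
W_qk = rng.normal(size=(M, d, d))            # per-head query-key weight matrix
v = rng.normal(size=(M, d)) / np.sqrt(M)     # trainable value weights

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def rf_attention(q, X):
    """q: (d,) query vector; X: (N, d) key vectors. Scalar output, permutation-invariant in X."""
    out = 0.0
    for m in range(M):
        scores = softmax(X @ (W_qk[m] @ q))   # attention weights over the N keys, head m
        out += scores @ (X @ v[m])            # head-m value read-out, averaged by attention
    return out / M

print(rf_attention(rng.normal(size=d), rng.normal(size=(N, d))))
```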

Model-based Offline Reinforcement Learning with Count-based Conservatism

  • paper_url: http://arxiv.org/abs/2307.11352
  • repo_url: https://github.com/oh-lab/count-morl
  • paper_authors: Byeongchan Kim, Min-hwan Oh
  • for: 这篇论文提出了一种融合基于计数的保守性的模型型离线强化学习方法,称为 $\texttt{Count-MORL}$。该方法利用状态-动作对的计数估计来量化模型估计误差;据我们所知,这是文献中首个展示基于计数的保守性在模型型离线深度强化学习中有效性的算法。
  • methods: 我们首先证明了模型估计误差与状态-动作对的出现频率成反比,其次证明了在基于计数的保守模型下学得的策略具有近优性能保证。
  • results: 通过广泛的数字实验,我们证明了 $\texttt{Count-MORL}$ 与哈希码实现在 D4RL 测试数据集上表现出色,与现有的离线RL算法相比显著超越。代码可以在 $\href{https://github.com/oh-lab/Count-MORL}{https://github.com/oh-lab/Count-MORL}$ 上获取。
    Abstract In this paper, we propose a model-based offline reinforcement learning method that integrates count-based conservatism, named $\texttt{Count-MORL}$. Our method utilizes the count estimates of state-action pairs to quantify model estimation error, marking, to the best of our knowledge, the first algorithm to demonstrate the efficacy of count-based conservatism in model-based offline deep RL. For our proposed method, we first show that the estimation error is inversely proportional to the frequency of state-action pairs. Secondly, we demonstrate that the learned policy under the count-based conservative model offers near-optimal performance guarantees. Through extensive numerical experiments, we validate that $\texttt{Count-MORL}$ with hash code implementation significantly outperforms existing offline RL algorithms on the D4RL benchmark datasets. The code is accessible at $\href{https://github.com/oh-lab/Count-MORL}{https://github.com/oh-lab/Count-MORL}$.
    摘要 在这篇论文中,我们提出了一种融合基于计数的保守性的模型型离线强化学习方法,称为$\texttt{Count-MORL}$。我们的方法利用状态-动作对的计数估计来量化模型估计误差;据我们所知,这是首个展示基于计数的保守性在模型型离线深度强化学习中有效性的算法。首先,我们证明了估计误差与状态-动作对的出现频率成反比;其次,我们证明了在基于计数的保守模型下学得的策略具有近优性能保证。通过广泛的数值实验,我们验证了采用哈希码实现的$\texttt{Count-MORL}$在D4RL基准数据集上显著优于现有的离线RL算法。代码可以在 $\href{https://github.com/oh-lab/Count-MORL}{https://github.com/oh-lab/Count-MORL}$ 上获取。
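
The key ingredient is a count-based penalty on the learned model's rewards, with visit counts maintained cheaply through hashing. The discretisation scheme and the $\beta/\sqrt{N}$ penalty form below are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np
from collections import defaultdict

class HashCounter:
    """Approximate state-action visit counts via coarse discretization plus hashing."""
    def __init__(self, resolution=0.5):
        self.resolution = resolution
        self.counts = defaultdict(int)

    def key(self, state, action):
        s = tuple(np.round(np.asarray(state) / self.resolution).astype(int))
        a = tuple(np.round(np.asarray(action) / self.resolution).astype(int))
        return hash((s, a))

    def update(self, state, action):
        self.counts[self.key(state, action)] += 1

    def count(self, state, action):
        return self.counts[self.key(state, action)]

def conservative_reward(r_model, counter, state, action, beta=1.0):
    """Penalize the model's reward where the pair was rarely visited (penalty form is assumed)."""
    n = max(counter.count(state, action), 1)
    return r_model - beta / np.sqrt(n)

counter = HashCounter()
for _ in range(50):
    counter.update([0.1, 0.2], [1.0])                              # a frequently visited pair
print(conservative_reward(1.0, counter, [0.1, 0.2], [1.0]))        # small penalty
print(conservative_reward(1.0, counter, [5.0, 5.0], [0.0]))        # large penalty (unseen pair)
```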

Bounded P-values in Parametric Programming-based Selective Inference

  • paper_url: http://arxiv.org/abs/2307.11351
  • repo_url: https://github.com/shirara1016/bounded_p_values_in_si
  • paper_authors: Tomohiro Shiraishi, Daiki Miwa, Vo Nguyen Le Duy, Ichiro Takeuchi
  • for: 本研究旨在提出一种可靠且高效的选择推理(Selective Inference)方法,以便对数据驱动的假设进行统计检验。
  • methods: 本研究使用 Parametric Programming-based Selective Inference(PP-based SI)方法,并提出了一种计算最大和最小值的方法来降低计算成本。同时,我们还提出了三种搜索策略来有效地提高这些 bound。
  • results: 我们在线性模型和深度神经网络中进行了Feature选择和注意区域标识等假设测试问题,并证明了我们的方法的有效性和高效性。
    Abstract Selective inference (SI) has been actively studied as a promising framework for statistical hypothesis testing for data-driven hypotheses. The basic idea of SI is to make inferences conditional on the event that a hypothesis is selected. In order to perform SI, this event must be characterized in a tractable form. When the selection event is too difficult to characterize, additional conditions are introduced for tractability. These additional conditions often cause a loss of power, and this issue is referred to as over-conditioning. Parametric programming-based SI (PP-based SI) has been proposed as one way to address the over-conditioning issue. The main problem of PP-based SI is its high computational cost due to the need to exhaustively explore the data space. In this study, we introduce a procedure to reduce the computational cost while guaranteeing the desired precision, by proposing a method to compute the upper and lower bounds of p-values. We also propose three types of search strategies that efficiently improve these bounds. We demonstrate the effectiveness of the proposed method in hypothesis testing problems for feature selection in linear models and attention region identification in deep neural networks.
    摘要 选择性推断(SI)作为针对数据驱动假设的统计假设检验框架已得到广泛研究。SI 的基本思想是在"某一假设被选中"这一事件条件下进行推断。为了实现 SI,该事件必须能以可处理的形式刻画;当选择事件难以刻画时,通常会引入额外条件以保证可处理性,但这往往会导致检验功效的损失,称为过度条件化。基于参数规划的 SI(PP-based SI)被提出以缓解过度条件化问题,但其主要缺点是需要穷尽地探索数据空间,计算成本很高。在本研究中,我们提出了一种在保证所需精度的同时降低计算成本的方法,即计算 p 值的上界与下界,并提出了三种能有效收紧这些界的搜索策略。我们在线性模型的特征选择和深度神经网络的注意区域识别等假设检验问题上验证了所提方法的有效性。

Improving Transferability of Adversarial Examples via Bayesian Attacks

  • paper_url: http://arxiv.org/abs/2307.11334
  • repo_url: None
  • paper_authors: Qizhang Li, Yiwen Guo, Xiaochen Yang, Wangmeng Zuo, Hao Chen
  • for: 提高对抗样本的可迁移性,使攻击在无需模型微调的情况下更有效。
  • methods: 将贝叶斯表述同时引入模型参数与模型输入,实现两者的联合多样化,并提出在该扩展贝叶斯表述下微调模型参数的原则性方法。
  • results: 实验表明,对模型输入与参数同时采用贝叶斯表述可显著提升可迁移性;新方法在 ImageNet 和 CIFAR-10 上的平均成功率分别提高 19.14% 和 2.08%。
    Abstract This paper presents a substantial extension of our work published at ICLR. Our ICLR work advocated for enhancing transferability in adversarial examples by incorporating a Bayesian formulation into model parameters, which effectively emulates the ensemble of infinitely many deep neural networks, while, in this paper, we introduce a novel extension by incorporating the Bayesian formulation into the model input as well, enabling the joint diversification of both the model input and model parameters. Our empirical findings demonstrate that: 1) the combination of Bayesian formulations for both the model input and model parameters yields significant improvements in transferability; 2) by introducing advanced approximations of the posterior distribution over the model input, adversarial transferability achieves further enhancement, surpassing all state-of-the-arts when attacking without model fine-tuning. Moreover, we propose a principled approach to fine-tune model parameters in such an extended Bayesian formulation. The derived optimization objective inherently encourages flat minima in the parameter space and input space. Extensive experiments demonstrate that our method achieves a new state-of-the-art on transfer-based attacks, improving the average success rate on ImageNet and CIFAR-10 by 19.14% and 2.08%, respectively, when comparing with our ICLR basic Bayesian method. We will make our code publicly available.
    摘要 本文是我们发表于 ICLR 的工作的实质性扩展。我们在 ICLR 的工作中主张通过对模型参数引入贝叶斯表述来增强对抗样本的可迁移性,从而有效地模拟无限多个深度神经网络的集成;而在本文中,我们进一步将贝叶斯表述引入模型输入,使模型输入与模型参数得以联合多样化。实验结果表明:(1)同时对模型输入和模型参数采用贝叶斯表述可显著提升可迁移性;(2)通过对模型输入后验分布引入更先进的近似,在不进行模型微调的情况下,对抗可迁移性进一步提升并超越所有现有方法。此外,我们提出了一种在该扩展贝叶斯表述下微调模型参数的原则性方法,所得到的优化目标天然地鼓励参数空间与输入空间中的平坦极小值。大量实验表明,与我们 ICLR 的基础贝叶斯方法相比,该方法在基于迁移的攻击上取得了新的最优结果,在 ImageNet 和 CIFAR-10 上的平均成功率分别提升 19.14% 和 2.08%。我们将公开代码。
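
To illustrate the joint diversification of parameters and input, here is a generic transfer-attack sketch in which every gradient step averages over Gaussian perturbations of both the surrogate's weights and the adversarial input. The I-FGSM-style update, the isotropic noise scales, and the tiny stand-in model are assumptions; they are not the paper's specific posterior approximations or fine-tuning objective.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 10))   # stand-in surrogate
loss_fn = nn.CrossEntropyLoss()

def bayesian_transfer_attack(x, y, steps=10, eps=0.3, alpha=0.05,
                             sigma_w=0.01, sigma_x=0.05, n_samples=4):
    """I-FGSM-style attack averaging gradients over noisy copies of the weights and the input."""
    x_adv = x.clone()
    base_state = {k: v.clone() for k, v in model.state_dict().items()}
    for _ in range(steps):
        grad = torch.zeros_like(x_adv)
        for _ in range(n_samples):
            # sample a model from an isotropic Gaussian approximation around the trained weights
            noisy_state = {k: v + sigma_w * torch.randn_like(v) for k, v in base_state.items()}
            model.load_state_dict(noisy_state)
            x_in = (x_adv + sigma_x * torch.randn_like(x_adv)).requires_grad_(True)
            loss = loss_fn(model(x_in), y)
            grad += torch.autograd.grad(loss, x_in)[0]
        x_adv = x_adv + alpha * grad.sign()                    # ascend the averaged loss
        x_adv = x.clone() + (x_adv - x).clamp(-eps, eps)       # stay inside the eps-ball
    model.load_state_dict(base_state)                          # restore the original surrogate
    return x_adv.detach()

x, y = torch.randn(8, 20), torch.randint(0, 10, (8,))
print(bayesian_transfer_attack(x, y).shape)
```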

Demystifying Local and Global Fairness Trade-offs in Federated Learning Using Partial Information Decomposition

  • paper_url: http://arxiv.org/abs/2307.11333
  • repo_url: None
  • paper_authors: Faisal Hamman, Sanghamitra Dutta
  • for: This paper aims to provide an information-theoretic perspective on group fairness trade-offs in federated learning (FL) with respect to sensitive attributes, such as gender and race.
  • methods: The paper uses a body of work in information theory called partial information decomposition (PID) to identify three sources of unfairness in FL, namely, Unique Disparity, Redundant Disparity, and Masked Disparity.
  • results: The paper derives fundamental limits and trade-offs between global and local fairness, particularly under data heterogeneity, and presents experimental results on benchmark datasets to support the theoretical findings.
    Abstract In this paper, we present an information-theoretic perspective to group fairness trade-offs in federated learning (FL) with respect to sensitive attributes, such as gender, race, etc. Existing works mostly focus on either \emph{global fairness} (overall disparity of the model across all clients) or \emph{local fairness} (disparity of the model at each individual client), without always considering their trade-offs. There is a lack of understanding of the interplay between global and local fairness in FL, and if and when one implies the other. To address this gap, we leverage a body of work in information theory called partial information decomposition (PID) which first identifies three sources of unfairness in FL, namely, \emph{Unique Disparity}, \emph{Redundant Disparity}, and \emph{Masked Disparity}. Using canonical examples, we demonstrate how these three disparities contribute to global and local fairness. This decomposition helps us derive fundamental limits and trade-offs between global or local fairness, particularly under data heterogeneity, as well as, derive conditions under which one implies the other. We also present experimental results on benchmark datasets to support our theoretical findings. This work offers a more nuanced understanding of the sources of disparity in FL that can inform the use of local disparity mitigation techniques, and their convergence and effectiveness when deployed in practice.
    摘要 在这篇论文中,我们从信息论的视角探讨联邦学习(Federated Learning,FL)中针对敏感属性(如性别、种族等)的群体公平权衡。现有工作大多只关注全局公平(模型在所有客户端上的整体差异)或本地公平(模型在单个客户端上的差异),而很少考虑二者之间的权衡;对于 FL 中全局公平与本地公平之间的相互作用、以及二者何时相互蕴含,目前缺乏足够的理解。为了解决这一问题,我们利用信息论中的部分信息分解(partial information decomposition,PID),识别出 FL 中三种不公平来源,即 Unique Disparity、Redundant Disparity 和 Masked Disparity,并用典型示例说明这三种差异如何影响全局与本地公平。该分解帮助我们推导全局公平与本地公平之间的基本极限与权衡(特别是在数据异质情况下),并给出二者相互蕴含的条件。我们还在基准数据集上给出了支持理论结论的实验结果。这项工作为 FL 中差异来源提供了更细致的理解,可为本地差异缓解技术的使用及其在实践中的收敛性与有效性提供指导。
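
For reference, partial information decomposition splits the information two sources carry about a target into redundant, unique, and synergistic parts; in the standard PID identity (how these terms map onto the paper's Unique, Redundant, and Masked disparities is developed in the paper itself),

$$ I(T;\, X_1, X_2) \;=\; \mathrm{Red}(T; X_1, X_2) \;+\; \mathrm{Uni}(T; X_1 \setminus X_2) \;+\; \mathrm{Uni}(T; X_2 \setminus X_1) \;+\; \mathrm{Syn}(T; X_1, X_2). $$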

Beyond Convergence: Identifiability of Machine Learning and Deep Learning Models

  • paper_url: http://arxiv.org/abs/2307.11332
  • repo_url: None
  • paper_authors: Reza Sameni
  • for: investigate the notion of model parameter identifiability through a case study
  • methods: utilizing a deep neural network to estimate subject-wise parameters from motion sensor data
  • results: certain parameters can be identified, while others remain unidentifiable due to the experimental setup’s limitations
    Abstract Machine learning (ML) and deep learning models are extensively used for parameter optimization and regression problems. However, not all inverse problems in ML are ``identifiable,'' indicating that model parameters may not be uniquely determined from the available data and the data model's input-output relationship. In this study, we investigate the notion of model parameter identifiability through a case study focused on parameter estimation from motion sensor data. Utilizing a bipedal-spring mass human walk dynamics model, we generate synthetic data representing diverse gait patterns and conditions. Employing a deep neural network, we attempt to estimate subject-wise parameters, including mass, stiffness, and equilibrium leg length. The results show that while certain parameters can be identified from the observation data, others remain unidentifiable, highlighting that unidentifiability is an intrinsic limitation of the experimental setup, necessitating a change in data collection and experimental scenarios. Beyond this specific case study, the concept of identifiability has broader implications in ML and deep learning. Addressing unidentifiability requires proven identifiable models (with theoretical support), multimodal data fusion techniques, and advancements in model-based machine learning. Understanding and resolving unidentifiability challenges will lead to more reliable and accurate applications across diverse domains, transcending mere model convergence and enhancing the reliability of machine learning models.
    摘要 机器学习(ML)和深度学习模型在参数优化和回归问题中广泛应用。然而,不是所有机器学习 inverse problem 是可识别的,表示模型参数可能不是来自可用数据和输入输出关系的数据模型的唯一确定。在本研究中,我们通过人行动数据的情况进行研究,探讨模型参数可识别性的概念。使用一个人体弹簧模型,我们生成了多种步态和条件的 sintetic 数据。使用深度神经网络,我们尝试了每个参与者的参数,包括质量、刚度和平衡脚长。结果表明,有些参数可以从观察数据中提取,而其他参数则无法准确地确定,这反映了实验设置的内在限制,需要更改数据收集和实验方法。这种特定的案例研究还有更广泛的意义在机器学习和深度学习中。解决不可识别性需要有理据支持的可识别模型、多Modal 数据融合技术和机器学习模型的发展。更好地理解和解决不可识别性挑战,将导致更可靠、更准确的应用在多个领域,超越模型的极限并提高机器学习模型的可靠性。
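
A textbook illustration of this kind of unidentifiability, consistent with a spring-mass walking model though not taken from the paper: if the observations only constrain the natural frequency of a spring-mass system, then

$$ \omega = \sqrt{k/m}, \qquad (k, m) \mapsto (c\,k,\ c\,m) \ \text{leaves } \omega \text{ unchanged for every } c > 0, $$

so stiffness $k$ and mass $m$ cannot be recovered separately from such data no matter how much of it is collected; only a change in the experimental setup can break the symmetry.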

Methodologies for Improving Modern Industrial Recommender Systems

  • paper_url: http://arxiv.org/abs/2308.01204
  • repo_url: None
  • paper_authors: Shusen Wang
  • for: 这篇论文旨在探讨改进现代工业推荐系统(RS)的方法论。
  • methods: 这篇论文总结了来自现代工业 RS 的实践经验,其中大部分内容没有公开发表的参考文献。
  • results: 这篇论文给出了一些经过实际工业 RS 检验的有效方法,可用于提升留存率和使用时长等关键性能指标。
    Abstract Recommender system (RS) is an established technology with successful applications in social media, e-commerce, entertainment, and more. RSs are indeed key to the success of many popular APPs, such as YouTube, Tik Tok, Xiaohongshu, Bilibili, and others. This paper explores the methodology for improving modern industrial RSs. It is written for experienced RS engineers who are diligently working to improve their key performance indicators, such as retention and duration. The experiences shared in this paper have been tested in some real industrial RSs and are likely to be generalized to other RSs as well. Most contents in this paper are industry experience without publicly available references.
    摘要 推荐系统(RS)是一种已经成熟的技术,在社交媒体、电商、娱乐等领域都有成功应用。RS在许多受欢迎的APP中扮演重要角色,如YouTube、Tik Tok、Xiaohongshu和Bilibili等。本文探讨现代工业RS的改进方法。这篇文章主要面向经验丰富的RS工程师,以提高关键性表现指标(如留存率和使用时长)为目标。文中的经验主要基于实际的工业应用,并且可能对其他RS同样适用。

Systematic Adaptation of Communication-focused Machine Learning Models from Real to Virtual Environments for Human-Robot Collaboration

  • paper_url: http://arxiv.org/abs/2307.11327
  • repo_url: None
  • paper_authors: Debasmita Mukherjee, Ritwik Singhai, Homayoun Najjaran
  • for: 这篇论文旨在解决虚拟现实环境中的手势识别问题,以便实现协作机器人的具身遥操作(embodied teleoperation)。
  • methods: 该论文提出了一种系统性的框架,通过限制大小的虚拟数据集和精心制作的数据集来适应虚拟环境。
  • results: 该论文通过对实际环境中训练的深度学习模型进行适应,在虚拟环境中实现了高效的手势识别。
    Abstract Virtual reality has proved to be useful in applications in several fields ranging from gaming, medicine, and training to development of interfaces that enable human-robot collaboration. It empowers designers to explore applications outside of the constraints posed by the real world environment and develop innovative solutions and experiences. Hand gestures recognition which has been a topic of much research and subsequent commercialization in the real world has been possible because of the creation of large, labelled datasets. In order to utilize the power of natural and intuitive hand gestures in the virtual domain for enabling embodied teleoperation of collaborative robots, similarly large datasets must be created so as to keep the working interface easy to learn and flexible enough to add more gestures. Depending on the application, this may be computationally or economically prohibitive. Thus, the adaptation of trained deep learning models that perform well in the real environment to the virtual may be a solution to this challenge. This paper presents a systematic framework for the real to virtual adaptation using limited size of virtual dataset along with guidelines for creating a curated dataset. Finally, while hand gestures have been considered as the communication mode, the guidelines and recommendations presented are generic. These are applicable to other modes such as body poses and facial expressions which have large datasets available in the real domain which must be adapted to the virtual one.
    摘要 虚拟现实已经在各种领域展示了其用途,包括游戏、医疗、训练和人机合作交互的开发。它让设计师能够在虚拟环境中探索不受实际环境限制的应用,并开发创新的解决方案和体验。手势认识是虚拟现实中的一个重要话题,因为大量标注的数据集的创建使得手势在实际世界中得到了商业化。为了在虚拟世界中使用自然和直观的手势,需要创建大量的虚拟数据集,以保持工作界面简单易学习,并能够添加更多的手势。在应用程序方面,这可能是计算机或经济上的瓶颈。因此,将已经在实际环境中表现好的深度学习模型适应到虚拟环境可能是一个解决方案。本文提出了一个系统化的实际环境到虚拟环境的适应框架,以及创建审核数据集的指南。最后,尽管手势被视为交流方式,但是这些指南和建议适用于其他模式,如身体姿态和表情,这些在实际环境中有大量数据集可以适应到虚拟环境。

Analysis of Elephant Movement in Sub-Saharan Africa: Ecological, Climatic, and Conservation Perspectives

  • paper_url: http://arxiv.org/abs/2307.11325
  • repo_url: None
  • paper_authors: Matthew Hines, Gregory Glatzer, Shreya Ghosh, Prasenjit Mitra
  • for: 这项研究旨在更深入地理解撒哈拉以南非洲大象的移动行为,以便更好地保护这些动物。
  • methods: 该研究使用分析方法揭示大象的移动模式,并考虑季节变化和降雨模式等关键生态驱动因素。
  • results: 研究发现大象的移动行为受季节变化和降雨模式等生态因素影响,并提供了一种预测大象移动模式的方法,可为保护工作提供有价值的信息。
    Abstract The interaction between elephants and their environment has profound implications for both ecology and conservation strategies. This study presents an analytical approach to decipher the intricate patterns of elephant movement in Sub-Saharan Africa, concentrating on key ecological drivers such as seasonal variations and rainfall patterns. Despite the complexities surrounding these influential factors, our analysis provides a holistic view of elephant migratory behavior in the context of the dynamic African landscape. Our comprehensive approach enables us to predict the potential impact of these ecological determinants on elephant migration, a critical step in establishing informed conservation strategies. This projection is particularly crucial given the impacts of global climate change on seasonal and rainfall patterns, which could substantially influence elephant movements in the future. The findings of our work aim to not only advance the understanding of movement ecology but also foster a sustainable coexistence of humans and elephants in Sub-Saharan Africa. By predicting potential elephant routes, our work can inform strategies to minimize human-elephant conflict, effectively manage land use, and enhance anti-poaching efforts. This research underscores the importance of integrating movement ecology and climatic variables for effective wildlife management and conservation planning.
    摘要 大象与其环境之间的互动对生态学和保护策略都有深远影响。这项研究提出了一种分析方法,用于解读撒哈拉以南非洲大象移动的复杂模式,重点关注季节变化和降雨模式等关键生态驱动因素。尽管这些影响因素十分复杂,我们的分析仍为动态变化的非洲景观背景下的大象迁徙行为提供了整体视角。这一全面的方法使我们能够预测这些生态因素对大象迁徙的潜在影响,这是制定有依据的保护策略的关键一步;鉴于全球气候变化正在改变季节与降雨模式,并可能在未来显著影响大象的移动,这一预测尤为重要。我们的研究结果不仅旨在推进移动生态学的理解,也希望促进撒哈拉以南非洲人象的可持续共存:通过预测大象可能的路线,本工作可为减少人象冲突、有效管理土地利用以及加强反盗猎行动提供参考。这项研究强调了将移动生态学与气候变量相结合对于野生动物管理和保护规划的重要性。

XLDA: Linear Discriminant Analysis for Scaling Continual Learning to Extreme Classification at the Edge

  • paper_url: http://arxiv.org/abs/2307.11317
  • repo_url: None
  • paper_authors: Karan Shah, Vishruth Veerendranath, Anushka Hebbar, Raghavendra Bhat
  • for: 该研究旨在提出一种基于流式线性判别分析(LDA)的分类器(XLDA),用于边缘部署中的类增量学习(Class-IL),并在极端分类场景下保持与全连接层等效。
  • methods: 该研究提出了 XLDA 框架,证明 LDA 分类器与全连接层等效(包括在极端分类场景下),并针对计算资源受限的边缘部署给出了训练与推断的优化策略。
  • results: 通过批处理训练策略和最近邻搜索,在 AliProducts(5 万类)和 Google Landmarks V2(8.1 万类)等极端数据集上实现了至多 42 倍的训练加速和至多 5 倍的推断加速。
    Abstract Streaming Linear Discriminant Analysis (LDA) while proven in Class-incremental Learning deployments at the edge with limited classes (upto 1000), has not been proven for deployment in extreme classification scenarios. In this paper, we present: (a) XLDA, a framework for Class-IL in edge deployment where LDA classifier is proven to be equivalent to FC layer including in extreme classification scenarios, and (b) optimizations to enable XLDA-based training and inference for edge deployment where there is a constraint on available compute resources. We show up to 42x speed up using a batched training approach and up to 5x inference speedup with nearest neighbor search on extreme datasets like AliProducts (50k classes) and Google Landmarks V2 (81k classes)
    摘要 流式线性判别分析(LDA)虽已被证明适用于类别数量有限(至多约 1000 类)的边缘端类增量学习部署,但尚未被证明适用于极端分类场景。本文提出:(a)XLDA,一个面向边缘部署的类增量学习框架,其中 LDA 分类器被证明与全连接(FC)层等价,包括在极端分类场景下;(b)一系列针对计算资源受限的边缘部署的训练与推断优化。我们在 AliProducts(5 万类)和 Google Landmarks V2(8.1 万类)等极端数据集上,通过批处理训练获得最多 42 倍的训练加速,并通过最近邻搜索获得最多 5 倍的推断加速。
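
Streaming LDA keeps per-class running means and a shared covariance that are updated one example at a time, so new classes never require revisiting old data. A simplified sketch of that update and the resulting linear read-out follows; it is the generic streaming-LDA recipe, while the paper's XLDA adds its own batching and nearest-neighbour-search optimisations on top.

```python
import numpy as np

class StreamingLDA:
    def __init__(self, dim, shrinkage=1e-2):
        self.dim = dim
        self.shrinkage = shrinkage
        self.means = {}          # class id -> running mean
        self.counts = {}         # class id -> sample count
        self.cov = np.eye(dim)   # shared covariance estimate (identity init for conditioning)
        self.n_total = 0

    def fit_one(self, x, y):
        x = np.asarray(x, dtype=float)
        if y not in self.means:
            self.means[y] = np.zeros(self.dim)
            self.counts[y] = 0
        self.counts[y] += 1
        self.n_total += 1
        delta = x - self.means[y]
        self.means[y] += delta / self.counts[y]
        # running shared (within-class) covariance via a rank-one update, simplified
        self.cov += (np.outer(delta, x - self.means[y]) - self.cov) / self.n_total

    def predict(self, x):
        prec = np.linalg.inv(self.cov + self.shrinkage * np.eye(self.dim))
        scores = {c: x @ prec @ mu - 0.5 * mu @ prec @ mu for c, mu in self.means.items()}
        return max(scores, key=scores.get)

clf = StreamingLDA(dim=4)
rng = np.random.default_rng(0)
for c in range(3):                                  # three classes, streamed one sample at a time
    for _ in range(50):
        clf.fit_one(rng.normal(loc=c, size=4), c)
print(clf.predict(np.full(4, 2.0)))                 # expected: class 2
```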

Making Pre-trained Language Models both Task-solvers and Self-calibrators

  • paper_url: http://arxiv.org/abs/2307.11316
  • repo_url: https://github.com/yangyi-chen/lm-toast
  • paper_authors: Yangyi Chen, Xingyao Wang, Heng Ji
  • for: 这 paper 的目的是提高 PLM 的自信估计,使其在错误预测时不再过于自信。
  • methods: 这 paper 使用了一种叫做 LM-TOAST 的训练算法,以使 PLM 同时成为任务解决者和自我调整器。
  • results: 实验结果表明,LM-TOAST 可以有效地利用训练数据,使 PLM 有合理的自信估计,而不会影响其原始任务性能。
    Abstract Pre-trained language models (PLMs) serve as backbones for various real-world systems. For high-stake applications, it's equally essential to have reasonable confidence estimations in predictions. While the vanilla confidence scores of PLMs can already be effectively utilized, PLMs consistently become overconfident in their wrong predictions, which is not desirable in practice. Previous work shows that introducing an extra calibration task can mitigate this issue. The basic idea involves acquiring additional data to train models in predicting the confidence of their initial predictions. However, it only demonstrates the feasibility of this kind of method, assuming that there are abundant extra available samples for the introduced calibration task. In this work, we consider the practical scenario that we need to effectively utilize training samples to make PLMs both task-solvers and self-calibrators. Three challenges are presented, including limited training samples, data imbalance, and distribution shifts. We first conduct pilot experiments to quantify various decisive factors in the calibration task. Based on the empirical analysis results, we propose a training algorithm LM-TOAST to tackle the challenges. Experimental results show that LM-TOAST can effectively utilize the training data to make PLMs have reasonable confidence estimations while maintaining the original task performance. Further, we consider three downstream applications, namely selective classification, adversarial defense, and model cascading, to show the practical usefulness of LM-TOAST. The code will be made public at \url{https://github.com/Yangyi-Chen/LM-TOAST}.
    摘要 预训练语言模型(PLM)是各种现实系统的基础。对于高风险应用而言,对预测结果给出合理的置信度估计同样重要。虽然 PLM 原始的置信度分数已经可以被有效利用,但 PLM 在做出错误预测时往往过度自信,这在实践中是不可取的。已有工作表明,引入额外的校准任务可以缓解这一问题,其基本思想是获取额外数据来训练模型预测其初始预测的置信度;然而这类工作只是证明了此类方法的可行性,并假设校准任务有充足的额外样本可用。在本工作中,我们考虑更贴近实际的场景:需要有效利用训练样本,使 PLM 同时成为任务求解者和自我校准者。我们归纳了三个挑战:训练样本有限、数据不均衡和分布偏移。我们首先通过试点实验量化校准任务中的各种决定性因素,并根据实证分析结果提出了训练算法 LM-TOAST 来应对这些挑战。实验结果表明,LM-TOAST 能有效利用训练数据,使 PLM 获得合理的置信度估计,同时保持原始任务性能。此外,我们在选择性分类、对抗防御和模型级联三个下游应用上展示了 LM-TOAST 的实际用途。代码将在 \url{https://github.com/Yangyi-Chen/LM-TOAST} 公开。

Artificial Intelligence-Generated Terahertz Multi-Resonant Metasurfaces via Improved Transformer and CGAN Neural Networks

  • paper_url: http://arxiv.org/abs/2307.11794
  • repo_url: None
  • paper_authors: Yangpeng Huang, Naixing Feng, Yijun Cai
  • for: This paper proposes improved Transformer and conditional generative adversarial neural networks (CGAN) for the inverse design of graphene metasurfaces based on THz multi-resonant absorption spectra.
  • methods: The paper uses traditional deep neural networks (DNNs), improved Transformer, and CGAN for the inverse design of graphene metasurfaces.
  • results: The improved Transformer achieves higher accuracy and generalization performance in the StoV (Spectrum to Vector) design, while the StoI (Spectrum to Image) design achieved through CGAN provides more comprehensive information and higher accuracy than the StoV design obtained by MLP; the improved CGAN can also produce graphene metasurface images directly from the desired multi-resonant absorption spectra.
    Abstract It is well known that the inverse design of terahertz (THz) multi-resonant graphene metasurfaces by using traditional deep neural networks (DNNs) has limited generalization ability. In this paper, we propose improved Transformer and conditional generative adversarial neural networks (CGAN) for the inverse design of graphene metasurfaces based upon THz multi-resonant absorption spectra. The improved Transformer can obtain higher accuracy and generalization performance in the StoV (Spectrum to Vector) design compared to traditional multilayer perceptron (MLP) neural networks, while the StoI (Spectrum to Image) design achieved through CGAN can provide more comprehensive information and higher accuracy than the StoV design obtained by MLP. Moreover, the improved CGAN can achieve the inverse design of graphene metasurface images directly from the desired multi-resonant absorption spectra. It is turned out that this work can finish facilitating the design process of artificial intelligence-generated metasurfaces (AIGM), and even provide a useful guide for developing complex THz metasurfaces based on 2D materials using generative neural networks.
    摘要 众所周知,使用传统深度神经网络(DNN)对太赫兹(THz)多谐振石墨烯超表面进行逆向设计的泛化能力有限。本文提出改进的 Transformer 与条件生成对抗网络(CGAN),基于 THz 多谐振吸收谱对石墨烯超表面进行逆向设计。与传统多层感知机(MLP)相比,改进的 Transformer 在 StoV(谱到向量)设计中可获得更高的精度和泛化性能;而通过 CGAN 实现的 StoI(谱到图像)设计能提供比 MLP 的 StoV 设计更全面的信息和更高的精度。此外,改进的 CGAN 还能直接由期望的多谐振吸收谱逆向生成石墨烯超表面图像。这项工作有助于简化人工智能生成超表面(AIGM)的设计流程,并为利用生成神经网络开发基于二维材料的复杂 THz 超表面提供有益的指导。

Neuromorphic Online Learning for Spatiotemporal Patterns with a Forward-only Timeline

  • paper_url: http://arxiv.org/abs/2307.11314
  • repo_url: None
  • paper_authors: Zhenhang Zhang, Jingang Jin, Haowen Fang, Qinru Qiu
  • for: 这个论文主要针对的是在线学习神经网络模型(SNN),以提高神经网络的能效性和扩展它们的应用范围。
  • methods: 这篇论文提出了一种名为Spatiotemporal Online Learning for Synaptic Adaptation(SOLSA)的在线学习算法,用于学习具有泄漏散发和软重置的窗口级神经元(LIF)和其相关的Synapse。该算法不仅学习了synaptic weight,还适应了时间滤波器。相比BPTT算法,SOLSA具有远低的内存需求,并实现了更好的时间工作负荷分布。此外,SOLSA还包含了启用技术,如调度的weight更新、早期停止训练和自适应synapse滤波器,这些技术使得SOLSA更快速地 converges和提高了学习性能。
  • results: 相比非BPTT基于SNN学习算法,SOLSA在平均学习精度上提高了14.2%。而相比BPTT算法,SOLSA在内存成本下降72%的情况下,实现了5%高的平均学习精度。
    Abstract Spiking neural networks (SNNs) are bio-plausible computing models with high energy efficiency. The temporal dynamics of neurons and synapses enable them to detect temporal patterns and generate sequences. While Backpropagation Through Time (BPTT) is traditionally used to train SNNs, it is not suitable for online learning of embedded applications due to its high computation and memory cost as well as extended latency. Previous works have proposed online learning algorithms, but they often utilize highly simplified spiking neuron models without synaptic dynamics and reset feedback, resulting in subpar performance. In this work, we present Spatiotemporal Online Learning for Synaptic Adaptation (SOLSA), specifically designed for online learning of SNNs composed of Leaky Integrate and Fire (LIF) neurons with exponentially decayed synapses and soft reset. The algorithm not only learns the synaptic weight but also adapts the temporal filters associated to the synapses. Compared to the BPTT algorithm, SOLSA has much lower memory requirement and achieves a more balanced temporal workload distribution. Moreover, SOLSA incorporates enhancement techniques such as scheduled weight update, early stop training and adaptive synapse filter, which speed up the convergence and enhance the learning performance. When compared to other non-BPTT based SNN learning, SOLSA demonstrates an average learning accuracy improvement of 14.2%. Furthermore, compared to BPTT, SOLSA achieves a 5% higher average learning accuracy with a 72% reduction in memory cost.
    摘要 脉冲神经网络(SNN)是具有生物合理性的计算模型,能效很高。神经元与突触的时间动态特性使其能够检测时间模式并生成序列。传统上用于训练 SNN 的时间反向传播(BPTT)由于计算与内存开销大、时延长,不适合嵌入式应用的在线学习。先前的工作虽然提出了在线学习算法,但通常采用高度简化的脉冲神经元模型,不包含突触动态与复位反馈,导致性能欠佳。在本工作中,我们提出了 Spatiotemporal Online Learning for Synaptic Adaptation(SOLSA),专为由带泄漏积分发放(LIF)神经元、指数衰减突触和软重置组成的 SNN 的在线学习而设计。该算法不仅学习突触权重,还自适应与突触相关的时间滤波器。相比 BPTT,SOLSA 的内存需求低得多,并实现了更均衡的时间工作负载分布。此外,SOLSA 还包含调度式权重更新、提前停止训练和自适应突触滤波器等增强技术,加快了收敛并提升了学习性能。与其他非 BPTT 的 SNN 学习方法相比,SOLSA 的平均学习精度提高了 14.2%;与 BPTT 相比,SOLSA 在内存成本降低 72% 的同时,平均学习精度还高出 5%。
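
The neuron model SOLSA is built around, leaky integrate-and-fire with an exponentially decaying synaptic current and soft reset, takes only a few lines to simulate; the constants below are arbitrary, and the online learning rule itself (SOLSA's contribution) is omitted.

```python
import numpy as np

def lif_soft_reset(spikes_in, w=0.8, tau_syn=5.0, tau_mem=10.0, v_th=1.0, dt=1.0):
    """Leaky integrate-and-fire with an exponentially decaying synaptic current and soft reset."""
    alpha_syn = np.exp(-dt / tau_syn)     # synaptic current decay per step
    alpha_mem = np.exp(-dt / tau_mem)     # membrane potential decay per step
    i_syn, v = 0.0, 0.0
    spikes_out = []
    for s in spikes_in:
        i_syn = alpha_syn * i_syn + w * s          # exponentially decayed synapse
        v = alpha_mem * v + i_syn                  # leaky membrane integration
        fired = v >= v_th
        spikes_out.append(int(fired))
        if fired:
            v -= v_th                              # soft reset: subtract the threshold, keep the residue
    return spikes_out

events = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0]            # binary-valued input event stream
print(lif_soft_reset(events))
```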

Who should I Collaborate with? A Comparative Study of Academia and Industry Research Collaboration in NLP

  • paper_url: http://arxiv.org/abs/2308.04524
  • repo_url: None
  • paper_authors: Hussain Sadiq Abuwala, Bohan Zhang, Mushi Wang
  • for: investigate the effects of collaboration between academia and industry on Natural Language Processing (NLP)
  • methods: created a pipeline to extract affiliations and citations from NLP papers and divided them into three categories: academia, industry, and hybrid (collaborations between academia and industry)
  • results: found a trend towards an increase in industry and academia-industry collaboration publications, and these types of publications tend to have a higher impact compared to those produced solely within academia.
    Abstract The goal of our research was to investigate the effects of collaboration between academia and industry on Natural Language Processing (NLP). To do this, we created a pipeline to extract affiliations and citations from NLP papers and divided them into three categories: academia, industry, and hybrid (collaborations between academia and industry). Our empirical analysis found that there is a trend towards an increase in industry and academia-industry collaboration publications and that these types of publications tend to have a higher impact compared to those produced solely within academia.
    摘要 我们的研究目标是考察学术界与产业界合作对自然语言处理(NLP)研究的影响。为此,我们构建了一个流程,从 NLP 论文中提取作者机构与引用信息,并将论文分为三类:学术、产业和混合(学术与产业合作)。我们的实证分析发现,产业以及学术-产业合作类论文的数量呈上升趋势,且这类论文的影响力往往高于纯学术界产出的论文。

PI-VEGAN: Physics Informed Variational Embedding Generative Adversarial Networks for Stochastic Differential Equations

  • paper_url: http://arxiv.org/abs/2307.11289
  • repo_url: None
  • paper_authors: Ruisong Gao, Yufeng Wang, Min Yang, Chuanjun Chen
  • for: 解决Stochastic Differential Equations(SDEs)的前向、逆向和混合问题,只有部分系统参数的感知数据available。
  • methods: integrate governing physical laws into Physics Informed Variational Embedding Generative Adversarial Network (PI-VEGAN) with automatic differentiation, introduce variational encoder to approximate latent variables, and use stochastic gradient descent algorithm to update components.
  • results: PI-VEGAN achieves satisfactory stability and accuracy in addressing forward, inverse, and mixed problems, outperforming previous Physics-Informed Generative Adversarial Network (PI-WGAN) in numerical results.
    Abstract We present a new category of physics-informed neural networks called physics informed variational embedding generative adversarial network (PI-VEGAN), that effectively tackles the forward, inverse, and mixed problems of stochastic differential equations. In these scenarios, the governing equations are known, but only a limited number of sensor measurements of the system parameters are available. We integrate the governing physical laws into PI-VEGAN with automatic differentiation, while introducing a variational encoder for approximating the latent variables of the actual distribution of the measurements. These latent variables are integrated into the generator to facilitate accurate learning of the characteristics of the stochastic partial equations. Our model consists of three components, namely the encoder, generator, and discriminator, each of which is updated alternatively employing the stochastic gradient descent algorithm. We evaluate the effectiveness of PI-VEGAN in addressing forward, inverse, and mixed problems that require the concurrent calculation of system parameters and solutions. Numerical results demonstrate that the proposed method achieves satisfactory stability and accuracy in comparison with the previous physics-informed generative adversarial network (PI-WGAN).
    摘要 我们提出了一类新的物理信息神经网络,即物理信息变分嵌入生成对抗网络(PI-VEGAN),用于有效求解随机微分方程(SDE)的前向、逆向与混合问题。在这些场景中,控制方程已知,但只有有限数量的系统参数传感器测量可用。我们借助自动微分将控制物理定律整合进 PI-VEGAN,并引入变分编码器来近似测量数据真实分布的隐变量;这些隐变量被送入生成器,以便准确学习随机偏微分方程的特征。我们的模型由编码器、生成器和判别器三个组件构成,各组件通过随机梯度下降算法交替更新。数值结果表明,PI-VEGAN 在求解前向、逆向与混合问题时具有令人满意的稳定性和准确性,优于此前的物理信息生成对抗网络(PI-WGAN)。
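
The "physics informed" part, computing residuals of the governing equations through automatic differentiation and adding them to the training loss, can be illustrated with a deterministic toy operator; the Burgers-type residual below is purely an illustration and is not the stochastic setting of the paper.

```python
import torch

def pde_residual(u_net, x, t, nu=0.01):
    """Residual of u_t + u*u_x - nu*u_xx (Burgers-type operator, chosen only as an illustration)."""
    x = x.requires_grad_(True)
    t = t.requires_grad_(True)
    u = u_net(torch.stack([x, t], dim=-1)).squeeze(-1)
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    return u_t + u * u_x - nu * u_xx

u_net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
x, t = torch.rand(64), torch.rand(64)
physics_loss = pde_residual(u_net, x, t).pow(2).mean()   # added to the generator's training loss
print(physics_loss.item())
```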

Kernelized Offline Contextual Dueling Bandits

  • paper_url: http://arxiv.org/abs/2307.11288
  • repo_url: None
  • paper_authors: Viraj Mehta, Ojash Neopane, Vikramjeet Das, Sen Lin, Jeff Schneider, Willie Neiswanger
  • for: 这个论文主要针对 preference-based 反馈的应用场景,如人工智能学习从人类反馈中获得的大语言模型。
  • methods: 这个论文使用 offline contextual dueling bandit 设定,并提供了一种 upper-confidence-bound 样式的算法,以及证明了 regret bound。
  • results: 这个论文的算法比一种使用均匀随机 contexts 的策略表现更好,并得到了实验证明。
    Abstract Preference-based feedback is important for many applications where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback on large language models. For many of these applications, the cost of acquiring the human feedback can be substantial or even prohibitive. In this work, we take advantage of the fact that often the agent can choose contexts at which to obtain human feedback in order to most efficiently identify a good policy, and introduce the offline contextual dueling bandit setting. We give an upper-confidence-bound style algorithm for this setting and prove a regret bound. We also give empirical confirmation that this method outperforms a similar strategy that uses uniformly sampled contexts.
    摘要 基于偏好的反馈在许多无法直接评估奖励函数的应用中十分重要,近来一个显著的例子是基于人类反馈对大语言模型进行强化学习。在许多此类应用中,获取人类反馈的成本可能很高,甚至令人难以承受。在本工作中,我们利用智能体通常可以自行选择在哪些上下文下获取人类反馈这一事实,以便最高效地识别出好的策略,并由此引入离线上下文对决老虎机(offline contextual dueling bandit)设定。我们针对该设定给出一种上置信界(UCB)风格的算法,并证明了相应的遗憾界。实验也证实,该方法优于使用均匀随机采样上下文的类似策略。

MAS: Towards Resource-Efficient Federated Multiple-Task Learning

  • paper_url: http://arxiv.org/abs/2307.11285
  • repo_url: None
  • paper_authors: Weiming Zhuang, Yonggang Wen, Lingjuan Lyu, Shuai Zhang
  • for: 这篇研究旨在实现多元分布式机器学习(Federated Learning,FL)系统,以实现在分散的Edge设备上进行内置模型训练。
  • methods: 本研究提出了首个实现多元FL任务训练的系统,称为MAS(Merge and Split)。MAS首先将多元FL任务合并为一个统一FL任务,然后在训练一些循环后,使用任务之间的相互关联性来拆分为多个FL任务。接着,MAS将每个拆分的FL任务继续训练,根据在统一训练中获得的模型参数。
  • results: 实验结果显示,MAS比其他方法具有更好的性能,同时降低了训练时间和能源消耗。实验结果显示,在训练20个FL任务时,MAS可以降低训练时间2倍,并降低能源消耗40%。
    Abstract Federated learning (FL) is an emerging distributed machine learning method that empowers in-situ model training on decentralized edge devices. However, multiple simultaneous FL tasks could overload resource-constrained devices. In this work, we propose the first FL system to effectively coordinate and train multiple simultaneous FL tasks. We first formalize the problem of training simultaneous FL tasks. Then, we present our new approach, MAS (Merge and Split), to optimize the performance of training multiple simultaneous FL tasks. MAS starts by merging FL tasks into an all-in-one FL task with a multi-task architecture. After training for a few rounds, MAS splits the all-in-one FL task into two or more FL tasks by using the affinities among tasks measured during the all-in-one training. It then continues training each split of FL tasks based on model parameters from the all-in-one training. Extensive experiments demonstrate that MAS outperforms other methods while reducing training time by 2x and reducing energy consumption by 40%. We hope this work will inspire the community to further study and optimize training simultaneous FL tasks.
    摘要 联邦学习(FL)是一种新兴的分布式机器学习方法,允许在去中心化的边缘设备上进行原位模型训练。然而,多个FL任务同时运行可能会使资源受限的设备过载。在本工作中,我们提出了首个能够有效协调并训练多个并发FL任务的FL系统。我们首先形式化了并发训练多个FL任务的问题,随后介绍了新方法 MAS(Merge and Split,合并与拆分)来优化多个并发FL任务的训练性能。MAS 首先将各个FL任务合并为一个采用多任务架构的一体化FL任务;训练若干轮后,MAS 根据在一体化训练中度量的任务间亲和度,将其拆分为两个或更多的FL任务,并基于一体化训练得到的模型参数继续分别训练。大量实验表明,MAS 在将训练时间缩短 2 倍、能耗降低 40% 的同时,性能优于其他方法。我们希望这项工作能激励社区进一步研究并优化多个并发FL任务的训练。

Epsilon*: Privacy Metric for Machine Learning Models

  • paper_url: http://arxiv.org/abs/2307.11280
  • repo_url: None
  • paper_authors: Diana M. Negoescu, Humberto Gonzalez, Saad Eddin Al Orjany, Jilei Yang, Yuliia Lut, Rahul Tandra, Xiaowen Zhang, Xinyi Zheng, Zach Douglas, Vidita Nolkha, Parvez Ahammad, Gennady Samorodnitsky
  • for: The paper aims to provide a new privacy metric called Epsilon* to measure the privacy risk of a single model instance before, during, or after deployment of privacy mitigation strategies.
  • methods: The metric does not require access to the training data sampling or model training algorithm, and is based on a hypothesis test used by an adversary in a membership inference attack.
  • results: The paper shows that Epsilon* is sensitive to privacy risk mitigation by training with differential privacy (DP), where the value of Epsilon* is reduced by up to 800% compared to the Epsilon* values of non-DP trained baseline models. This allows privacy auditors to be independent of model owners and enables all decision-makers to visualize the privacy-utility landscape to make informed decisions regarding the trade-offs between model privacy and utility.
    Abstract We introduce Epsilon*, a new privacy metric for measuring the privacy risk of a single model instance prior to, during, or after deployment of privacy mitigation strategies. The metric does not require access to the training data sampling or model training algorithm. Epsilon* is a function of true positive and false positive rates in a hypothesis test used by an adversary in a membership inference attack. We distinguish between quantifying the privacy loss of a trained model instance and quantifying the privacy loss of the training mechanism which produces this model instance. Existing approaches in the privacy auditing literature provide lower bounds for the latter, while our metric provides a lower bound for the former by relying on an (${\epsilon}$,${\delta}$)-type of quantification of the privacy of the trained model instance. We establish a relationship between these lower bounds and show how to implement Epsilon* to avoid numerical and noise amplification instability. We further show in experiments on benchmark public data sets that Epsilon* is sensitive to privacy risk mitigation by training with differential privacy (DP), where the value of Epsilon* is reduced by up to 800% compared to the Epsilon* values of non-DP trained baseline models. This metric allows privacy auditors to be independent of model owners, and enables all decision-makers to visualize the privacy-utility landscape to make informed decisions regarding the trade-offs between model privacy and utility.
    摘要 我们提出 Epsilon*,一种新的隐私度量,用于在隐私缓解策略部署之前、期间或之后衡量单个模型实例的隐私风险。该度量不需要访问训练数据采样或模型训练算法。Epsilon* 是攻击者在成员推断攻击中所用假设检验的真阳性率与假阳性率的函数。我们区分了对已训练模型实例隐私损失的量化与对产生该模型实例的训练机制隐私损失的量化:隐私审计文献中的现有方法为后者提供下界,而我们的度量依赖对已训练模型实例隐私的 (ε, δ) 型量化,为前者提供下界。我们建立了这些下界之间的关系,并说明了如何实现 Epsilon* 以避免数值不稳定与噪声放大。在公开基准数据集上的实验进一步表明,Epsilon* 对通过差分隐私(DP)训练实现的隐私风险缓解十分敏感:与未使用 DP 训练的基线模型相比,Epsilon* 的值最多降低 800%。该度量使隐私审计者可以独立于模型所有者,并帮助所有决策者直观地把握隐私-效用格局,从而在模型隐私与效用之间做出明智的取舍。
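
The paper's exact definition of Epsilon* is not reproduced here, but the abstract places it in the family of (ε, δ)-style quantities derived from the true/false positive rates of a membership-inference test. The sketch below shows the standard conversion used in the privacy-auditing literature for that kind of quantity; treat the specific formula as background, not as the paper's definition.

```python
import numpy as np

def empirical_epsilon(tpr, fpr, delta=1e-5, eps_floor=0.0):
    """Standard (epsilon, delta)-style bound from one operating point of a membership test."""
    fnr = 1.0 - tpr
    candidates = []
    if fnr > 0:
        candidates.append(np.log(max(1.0 - delta - fpr, 1e-12) / fnr))
    if fpr > 0:
        candidates.append(np.log(max(1.0 - delta - fnr, 1e-12) / fpr))
    return max(candidates + [eps_floor])

def epsilon_over_thresholds(scores_members, scores_nonmembers, delta=1e-5):
    """Sweep attack-score thresholds and keep the largest implied epsilon."""
    thresholds = np.unique(np.concatenate([scores_members, scores_nonmembers]))
    best = 0.0
    for t in thresholds:
        tpr = (scores_members >= t).mean()
        fpr = (scores_nonmembers >= t).mean()
        best = max(best, empirical_epsilon(tpr, fpr, delta))
    return best

rng = np.random.default_rng(0)
members = rng.normal(0.6, 0.2, 1000)       # attack scores on training members
nonmembers = rng.normal(0.4, 0.2, 1000)    # attack scores on non-members
print(epsilon_over_thresholds(members, nonmembers))
```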

Learning to Segment from Noisy Annotations: A Spatial Correction Approach

  • paper_url: http://arxiv.org/abs/2308.02498
  • repo_url: https://github.com/michaelofsbu/spatialcorrection
  • paper_authors: Jiachen Yao, Yikai Zhang, Songzhu Zheng, Mayank Goswami, Prateek Prasanna, Chao Chen
  • for: 本研究旨在提高深度神经网络(DNNs)在医学图像分割任务中的性能,通过处理杂乱标注(noisy labels)。
  • methods: 我们提出了一种新的Markov模型,用于捕捉分割杂乱标注中的空间相关性和偏好性。此外,我们还提出了一种标签修正方法,用于逐步修复真实标签。
  • results: 我们的方法在 sintetic和实际杂乱标注数据上进行了实验,并证明了我们的方法可以超过现有状态的各种方法。
    Abstract Noisy labels can significantly affect the performance of deep neural networks (DNNs). In medical image segmentation tasks, annotations are error-prone due to the high demand in annotation time and in the annotators' expertise. Existing methods mostly assume noisy labels in different pixels are \textit{i.i.d}. However, segmentation label noise usually has strong spatial correlation and has prominent bias in distribution. In this paper, we propose a novel Markov model for segmentation noisy annotations that encodes both spatial correlation and bias. Further, to mitigate such label noise, we propose a label correction method to recover true label progressively. We provide theoretical guarantees of the correctness of the proposed method. Experiments show that our approach outperforms current state-of-the-art methods on both synthetic and real-world noisy annotations.
    摘要 噪声标签会显著影响深度神经网络(DNN)的性能。在医学图像分割任务中,由于标注耗时且对标注者的专业知识要求很高,标注很容易出错。现有方法大多假设不同像素上的噪声标签是独立同分布的,但分割标签噪声通常具有很强的空间相关性和明显的分布偏差。在这篇论文中,我们提出了一种新的马尔可夫模型来刻画分割噪声标注的空间相关性与偏差,并进一步提出了一种逐步恢复真实标签的标签修正方法,同时给出了该方法正确性的理论保证。实验表明,我们的方法在合成和真实噪声标注上均优于当前最先进的方法。

Screening Mammography Breast Cancer Detection

  • paper_url: http://arxiv.org/abs/2307.11274
  • repo_url: https://github.com/chakrabortyde/rsna-breast-cancer
  • paper_authors: Debajyoti Chakraborty
  • for: 提高乳癌检测效率和准确率,减少成本和假阳性结果导致的患者担忧。
  • methods: 使用自动化乳腺癌检测方法,在包含约 20,000 名女性患者乳腺 X 光图像的 RSNA 数据集上测试了多种不同方法,平均验证集 pF1 分数约为 0.56。
  • results: 通过自动化检测方法,可以提高乳癌检测的效率和准确率,减少成本和假阳性结果导致的患者担忧。
    Abstract Breast cancer is a leading cause of cancer-related deaths, but current programs are expensive and prone to false positives, leading to unnecessary follow-up and patient anxiety. This paper proposes a solution to automated breast cancer detection, to improve the efficiency and accuracy of screening programs. Different methodologies were tested against the RSNA dataset of radiographic breast images of roughly 20,000 female patients and yielded an average validation case pF1 score of 0.56 across methods.
    摘要 乳癌是癌症相关死亡率的主要原因,但目前的计划昂贵,且容易出现假阳性结果,导致无需跟踪和患者担忧。本文提出一种自动乳癌检测方案,以提高检测计划的效率和准确率。不同的方法在RSNA数据集上进行了测试,并获得了 roughly 20,000 名女性患者的乳影像数据集的平均验证案例 pF1 分数0.56。

On the Fisher-Rao Gradient of the Evidence Lower Bound

  • paper_url: http://arxiv.org/abs/2307.11249
  • repo_url: None
  • paper_authors: Nihat Ay, Jesse van Oostrum
  • for: 这篇论文研究证据下界(ELBO)的 Fisher-Rao 梯度(也称自然梯度),该对象在变分自编码器、亥姆霍兹机和自由能原理的理论中扮演重要角色。
  • methods: 该论文将 ELBO 的自然梯度与以目标分布的 Kullback-Leibler 散度(学习的原始目标函数)的自然梯度联系起来,并利用信息几何中梯度的不变性性质进行分析。
  • results: 研究给出了底层模型需满足的条件,在这些条件下,最小化原始目标函数与最大化 ELBO 是等价的。
    Abstract This article studies the Fisher-Rao gradient, also referred to as the natural gradient, of the evidence lower bound, the ELBO, which plays a crucial role within the theory of the Variational Autoencoder, the Helmholtz Machine and the Free Energy Principle. The natural gradient of the ELBO is related to the natural gradient of the Kullback-Leibler divergence from a target distribution, the prime objective function of learning. Based on invariance properties of gradients within information geometry, conditions on the underlying model are provided that ensure the equivalence of minimising the prime objective function and maximising the ELBO.
    摘要
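To make the object of study concrete, the following block states the standard textbook definitions of the ELBO and its Fisher-Rao (natural) gradient; these are generic formulas, not the paper's specific derivations or conditions.

```latex
% Natural (Fisher-Rao) gradient of the ELBO with respect to variational parameters \phi;
% F(\phi) is the Fisher information matrix of the variational family q_\phi.
\[
  \widetilde{\nabla}_{\phi}\,\mathrm{ELBO}(\phi)
  \;=\; F(\phi)^{-1}\,\nabla_{\phi}\,\mathrm{ELBO}(\phi),
  \qquad
  F(\phi) \;=\; \mathbb{E}_{q_{\phi}}\!\left[\nabla_{\phi}\log q_{\phi}(z)\,\nabla_{\phi}\log q_{\phi}(z)^{\top}\right],
\]
\[
  \mathrm{ELBO}(\phi) \;=\; \mathbb{E}_{q_{\phi}(z)}\!\left[\log p(x,z) - \log q_{\phi}(z)\right]
  \;=\; \log p(x) \;-\; \mathrm{KL}\!\left(q_{\phi}(z)\,\|\,p(z\mid x)\right).
\]
```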

Leveraging arbitrary mobile sensor trajectories with shallow recurrent decoder networks for full-state reconstruction

  • paper_url: http://arxiv.org/abs/2307.11793
  • repo_url: None
  • paper_authors: Megan R. Ebers, Jan P. Williams, Katherine M. Steele, J. Nathan Kutz
  • for: 这篇论文的目的是提出一种基于深度学习的模型,用于在动态系统中利用移动传感器进行感知和全状态估计。
  • methods: 该模型将长短期记忆(LSTM)网络与一个浅层解码器相结合,把移动传感器沿其轨迹采集的测量时间序列映射为全状态空间估计。
  • results: 实验表明,该模型可以利用任意动态轨迹的传感器测量准确重建全状态空间;与固定传感器相比,它能降低重建误差的方差;并且能够快速泛化到训练集之外的动态情形。
    Abstract Sensing is one of the most fundamental tasks for the monitoring, forecasting and control of complex, spatio-temporal systems. In many applications, a limited number of sensors are mobile and move with the dynamics, with examples including wearable technology, ocean monitoring buoys, and weather balloons. In these dynamic systems (without regions of statistical-independence), the measurement time history encodes a significant amount of information that can be extracted for critical tasks. Most model-free sensing paradigms aim to map current sparse sensor measurements to the high-dimensional state space, ignoring the time-history all together. Using modern deep learning architectures, we show that a sequence-to-vector model, such as an LSTM (long, short-term memory) network, with a decoder network, dynamic trajectory information can be mapped to full state-space estimates. Indeed, we demonstrate that by leveraging mobile sensor trajectories with shallow recurrent decoder networks, we can train the network (i) to accurately reconstruct the full state space using arbitrary dynamical trajectories of the sensors, (ii) the architecture reduces the variance of the mean-square error of the reconstruction error in comparison with immobile sensors, and (iii) the architecture also allows for rapid generalization (parameterization of dynamics) for data outside the training set. Moreover, the path of the sensor can be chosen arbitrarily, provided training data for the spatial trajectory of the sensor is available. The exceptional performance of the network architecture is demonstrated on three applications: turbulent flows, global sea-surface temperature data, and human movement biomechanics.
    摘要 感测是对复杂时空系统进行监测、预测和控制的基本任务之一。在许多应用中,少量传感器是可移动的并随系统动态一起运动,例如可穿戴设备、海洋监测浮标和探空气球。在这类动态系统中(不存在统计独立的区域),测量的时间历史蕴含大量可供关键任务提取的信息。然而,大多数无模型感知方法忽略时间历史,只是将当前稀疏的传感器测量映射到高维状态空间。借助现代深度学习架构,我们展示了序列到向量模型(如 LSTM 长短期记忆网络)结合解码器网络,可以把动态轨迹信息映射为全状态空间估计。我们证明,通过将移动传感器轨迹与浅层循环解码器网络相结合,可以:(1) 使用任意动态轨迹的传感器测量准确重建全状态空间;(2) 与固定传感器相比降低重建均方误差的方差;(3) 实现对训练集之外数据的快速泛化(动态参数化)。此外,只要有传感器空间轨迹的训练数据,传感器路径可以任意选择。我们在湍流、全球海表温度数据和人体运动生物力学三个应用中展示了该网络架构的出色表现。
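A minimal PyTorch sketch of the sequence-to-vector idea described above: an LSTM encodes a mobile sensor's measurement history into a vector, and a shallow decoder maps that vector to the full high-dimensional state. Layer widths, shapes, and names are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ShallowRecurrentDecoder(nn.Module):
    """LSTM encodes a sensor's time history into a vector; a shallow MLP
    decodes that vector into an estimate of the full high-dimensional state."""
    def __init__(self, n_sensors, state_dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_sensors, hidden_size=hidden, batch_first=True)
        self.decoder = nn.Sequential(
            nn.Linear(hidden, 256), nn.ReLU(),
            nn.Linear(256, state_dim),
        )

    def forward(self, measurements):          # (batch, time, n_sensors)
        _, (h, _) = self.lstm(measurements)   # h: (1, batch, hidden)
        return self.decoder(h.squeeze(0))     # (batch, state_dim)

# Toy usage: 3 mobile sensors, 50-step histories, 1024-dimensional full state.
model = ShallowRecurrentDecoder(n_sensors=3, state_dim=1024)
x = torch.randn(8, 50, 3)                 # trajectory-indexed sensor measurements
x_hat = model(x)                          # reconstructed full state, shape (8, 1024)
loss = nn.functional.mse_loss(x_hat, torch.randn(8, 1024))
loss.backward()
```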

On-Sensor Data Filtering using Neuromorphic Computing for High Energy Physics Experiments

  • paper_url: http://arxiv.org/abs/2307.11242
  • repo_url: None
  • paper_authors: Shruti R. Kulkarni, Aaron Young, Prasanna Date, Narasinga Rao Miniskar, Jeffrey S. Vetter, Farah Fahim, Benjamin Parpillon, Jennet Dickinson, Nhan Tran, Jieun Yoo, Corrinne Mills, Morris Swartz, Petar Maksimovic, Catherine D. Schuman, Alice Bean
  • for: 这项研究探讨了基于神经形态计算的脉冲神经网络(SNN)模型,用于高能物理实验中传感器数据的在片筛选。
  • methods: 我们提出了一种紧凑的神经形态模型,根据粒子的横向动量对数据进行筛选,以减少发送到下游电子设备的数据量。入射电荷波形先被转换为二值事件流,再由 SNN 进行处理。
  • results: 我们的研究表明,使用进化算法和优化的超参数,SNN可以达到约91%的信号效率,并且减少了大约一半的参数量,相比深度神经网络。
    Abstract This work describes the investigation of neuromorphic computing-based spiking neural network (SNN) models used to filter data from sensor electronics in high energy physics experiments conducted at the High Luminosity Large Hadron Collider. We present our approach for developing a compact neuromorphic model that filters out the sensor data based on the particle's transverse momentum with the goal of reducing the amount of data being sent to the downstream electronics. The incoming charge waveforms are converted to streams of binary-valued events, which are then processed by the SNN. We present our insights on the various system design choices - from data encoding to optimal hyperparameters of the training algorithm - for an accurate and compact SNN optimized for hardware deployment. Our results show that an SNN trained with an evolutionary algorithm and an optimized set of hyperparameters obtains a signal efficiency of about 91% with nearly half as many parameters as a deep neural network.
    摘要
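The methods bullet above mentions converting incoming charge waveforms into streams of binary-valued events before they are processed by the SNN. The sketch below shows one simple way such an encoding can be done (threshold crossings); the threshold, shapes, and simulated pulses are assumptions for illustration, not the experiment's actual front-end encoding.

```python
import numpy as np

def waveform_to_events(waveforms, threshold=0.5):
    """Convert analog charge waveforms (channels x time) into a binary event
    stream: a spike (1) wherever the waveform crosses the threshold upward."""
    above = waveforms > threshold
    onsets = above & ~np.roll(above, 1, axis=-1)   # keep rising edges only
    onsets[..., 0] = above[..., 0]                 # first sample has no predecessor
    return onsets.astype(np.uint8)

# Toy example: 4 pixel channels, 32 time samples of simulated charge pulses.
rng = np.random.default_rng(0)
waveforms = np.clip(rng.normal(0.3, 0.3, size=(4, 32)), 0, None)
events = waveform_to_events(waveforms)
print(events.shape, int(events.sum()), "spikes")   # sparse binary input for the SNN
```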

Edgewise outliers of network indexed signals

  • paper_url: http://arxiv.org/abs/2307.11239
  • repo_url: https://github.com/kristats/spout
  • paper_authors: Christopher Rieser, Anne Ruiz-Gazen, Christine Thomas-Agnan
  • for: 本文旨在提出模型,用于处理网络索引多变量数据,包括变量之间的依赖关系和图节点之间的关系。
  • methods: 本文提出了一种改进的 MCD 算法,用于检测网络数据中的边缘异常(edgewise outliers)。我们首先推导了若干平方 Mahalanobis 距离的分布,以便确定检测规则和阈值;在此基础上,我们提出了确定性 MCD 算法的稳健变体,称为 edgewise MCD。
  • results: 在模拟数据和真实数据集上,我们的方法都能有效地检测边缘异常;同时我们还说明,忽略依赖结构的检测方法可能产生虚假的检测结果。
    Abstract We consider models for network indexed multivariate data involving a dependence between variables as well as across graph nodes. In the framework of these models, we focus on outliers detection and introduce the concept of edgewise outliers. For this purpose, we first derive the distribution of some sums of squares, in particular squared Mahalanobis distances that can be used to fix detection rules and thresholds for outlier detection. We then propose a robust version of the deterministic MCD algorithm that we call edgewise MCD. An application on simulated data shows the interest of taking the dependence structure into account. We also illustrate the utility of the proposed method with a real data set.
    摘要 我们考虑网络索引的多变量数据模型,其中既存在变量之间的依赖,也存在图节点之间的依赖。在该模型框架下,我们关注异常检测,并引入边缘异常(edgewise outliers)的概念。为此,我们首先推导了若干平方和的分布,特别是平方 Mahalanobis 距离的分布,以便确定异常检测的规则和阈值。随后,我们提出了确定性 MCD 算法的稳健版本,称为 edgewise MCD。在模拟数据上的应用表明了考虑依赖结构的好处;我们还通过一个真实数据集展示了所提方法的实用性。
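A simplified sketch of the edgewise idea: take differences of the node-indexed multivariate signal across graph edges, fit a robust (MCD) covariance to these differences, and flag edges whose squared Mahalanobis distance exceeds a chi-square cutoff. It uses scikit-learn's deterministic MCD as a stand-in; the paper's edgewise MCD and its derived thresholds are more refined than this.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def edgewise_outliers(X, edges, quantile=0.975):
    """X: (n_nodes, p) node-indexed multivariate signal; edges: list of (i, j).
    Returns a boolean flag per edge based on robust squared Mahalanobis
    distances of the edge differences X[i] - X[j]."""
    diffs = np.array([X[i] - X[j] for i, j in edges])
    mcd = MinCovDet(random_state=0).fit(diffs)
    d2 = mcd.mahalanobis(diffs)                      # squared distances
    threshold = chi2.ppf(quantile, df=X.shape[1])    # chi-square cutoff
    return d2 > threshold

# Toy example: 100 nodes on a ring, 3 variables, one corrupted node.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X[10] += 8.0                                         # inject an outlier
edges = [(i, (i + 1) % 100) for i in range(100)]
flags = edgewise_outliers(X, edges)
print("flagged edges:", np.where(flags)[0])          # edges touching node 10
```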

QDC: Quantum Diffusion Convolution Kernels on Graphs

  • paper_url: http://arxiv.org/abs/2307.11234
  • repo_url: None
  • paper_authors: Thomas Markovich
  • for: 提高图(graph)上预测任务的精度
  • methods: 基于量子扩散概念的新卷积核心(Quantum Diffusion Convolution,QDC)和传统的 combinatorial Laplacian 的多尺度组合
  • results: 在多种数据集上,与类似方法相比,QDC 能够提高预测性能
    Abstract Graph convolutional neural networks (GCNs) operate by aggregating messages over local neighborhoods given the prediction task under interest. Many GCNs can be understood as a form of generalized diffusion of input features on the graph, and significant work has been dedicated to improving predictive accuracy by altering the ways of message passing. In this work, we propose a new convolution kernel that effectively rewires the graph according to the occupation correlations of the vertices by trading on the generalized diffusion paradigm for the propagation of a quantum particle over the graph. We term this new convolution kernel the Quantum Diffusion Convolution (QDC) operator. In addition, we introduce a multiscale variant that combines messages from the QDC operator and the traditional combinatorial Laplacian. To understand our method, we explore the spectral dependence of homophily and the importance of quantum dynamics in the construction of a bandpass filter. Through these studies, as well as experiments on a range of datasets, we observe that QDC improves predictive performance on the widely used benchmark datasets when compared to similar methods.
    摘要
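The abstract describes re-wiring the graph via the propagation of a quantum particle. The sketch below builds an illustrative dense propagation kernel from a continuous-time quantum walk under the graph Laplacian and uses it for one message-passing step; it is a simplified stand-in, not the paper's exact QDC operator or its multiscale variant, and the evolution time is an arbitrary assumption.

```python
import numpy as np
from scipy.linalg import expm

def quantum_walk_kernel(adj, t=1.0):
    """Build a dense propagation kernel from an adjacency matrix by evolving a
    quantum walk for time t under the graph Laplacian, then taking transition
    probabilities |U_ij|^2. Nodes with correlated occupation amplitudes end up
    strongly connected, effectively re-wiring the graph."""
    deg = np.diag(adj.sum(axis=1))
    laplacian = deg - adj
    U = expm(-1j * t * laplacian)        # unitary propagator
    P = np.abs(U) ** 2                   # occupation probabilities
    return P / P.sum(axis=1, keepdims=True)

# Toy 5-node path graph: one propagation step now mixes beyond 1-hop neighbours.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
K = quantum_walk_kernel(A, t=2.0)
H = np.random.default_rng(0).normal(size=(5, 8))     # node features
H_out = K @ H                                        # diffusion-convolution-style update
print(K.round(2))
```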

From Adaptive Query Release to Machine Unlearning

  • paper_url: http://arxiv.org/abs/2307.11228
  • repo_url: None
  • paper_authors: Enayat Ullah, Raman Arora
  • for: 这个论文主要关注的问题是机器学习忘记(machine unlearning),即设计高效的忘记算法,以应对学习算法的选择性查询。
  • methods: 论文提出了高效的忘记算法,包括线性和预处理查询类型的算法。
  • results: 论文在多个应用中得到了改进的保证,特别是在随机凸优化(SCO)等问题中。对于不同的损失函数,论文给出了不同的忘记算法:对于光滑 Lipschitz 损失函数和任意 $\rho>0$,忘记算法的超额总体风险为 $\tilde O\big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\rho}\big)$,其忘记查询(梯度)复杂度为 $\tilde O(\rho \cdot \text{Retraining Complexity})$。
    Abstract We formalize the problem of machine unlearning as design of efficient unlearning algorithms corresponding to learning algorithms which perform a selection of adaptive queries from structured query classes. We give efficient unlearning algorithms for linear and prefix-sum query classes. As applications, we show that unlearning in many problems, in particular, stochastic convex optimization (SCO), can be reduced to the above, yielding improved guarantees for the problem. In particular, for smooth Lipschitz losses and any $\rho>0$, our results yield an unlearning algorithm with excess population risk of $\tilde O\big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\rho}\big)$ with unlearning query (gradient) complexity $\tilde O(\rho \cdot \text{Retraining Complexity})$, where $d$ is the model dimensionality and $n$ is the initial number of samples. For non-smooth Lipschitz losses, we give an unlearning algorithm with excess population risk $\tilde O\big(\frac{1}{\sqrt{n}}+\big(\frac{\sqrt{d}}{n\rho}\big)^{1/2}\big)$ with the same unlearning query (gradient) complexity. Furthermore, in the special case of Generalized Linear Models (GLMs), such as those in linear and logistic regression, we get dimension-independent rates of $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{2/3}}\big)$ and $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{1/3}}\big)$ for smooth Lipschitz and non-smooth Lipschitz losses respectively. Finally, we give generalizations of the above from one unlearning request to \textit{dynamic} streams consisting of insertions and deletions.
    摘要 我们将机器忘记(machine unlearning)问题形式化为:为从结构化查询类中进行自适应查询选择的学习算法设计高效的忘记算法。我们给出了线性查询类和前缀和查询类的高效忘记算法。作为应用,我们证明许多问题中的忘记——特别是随机凸优化(SCO)——可以归约为上述问题,从而获得改进的保证。具体而言,对于光滑 Lipschitz 损失函数和任意 $\rho>0$,我们的结果给出一个忘记算法,其超额总体风险为 $\tilde O\big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\rho}\big)$,忘记查询(梯度)复杂度为 $\tilde O(\rho \cdot \text{Retraining Complexity})$,其中 $d$ 为模型维度,$n$ 为初始样本数。对于非光滑 Lipschitz 损失函数,我们给出的忘记算法的超额总体风险为 $\tilde O\big(\frac{1}{\sqrt{n}}+\big(\frac{\sqrt{d}}{n\rho}\big)^{1/2}\big)$,忘记查询复杂度相同。特别地,在广义线性模型(GLM,例如线性回归和逻辑回归)中,我们得到与维度无关的收敛率,对光滑与非光滑 Lipschitz 损失分别为 $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{2/3}}\big)$ 和 $\tilde O\big(\frac{1}{\sqrt{n}} +\frac{1}{(n\rho)^{1/3}}\big)$。最后,我们将上述结果从单次忘记请求推广到包含插入和删除的动态数据流。

Quantum Convolutional Neural Networks with Interaction Layers for Classification of Classical Data

  • paper_url: http://arxiv.org/abs/2307.11792
  • repo_url: None
  • paper_authors: Jishnu Mahmud, Raisa Mashtura, Shaikh Anowarul Fattah, Mohammad Saquib
  • for: 研究多量子比特交互对量子神经网络的影响,以提高网络的表达能力和纠缠能力。
  • methods: 提出了一种量子卷积网络,引入利用三量子比特交互的新型交互层,从而提升网络的表达能力和纠缠能力。
  • results: 在 MNIST、Fashion MNIST 和 Iris(鸢尾花)数据集上进行了二分类和多分类实验,并与现有先进方法比较,结果表明该方法优于现有方法。
    Abstract Quantum Machine Learning (QML) has come into the limelight due to the exceptional computational abilities of quantum computers. With the promises of near error-free quantum computers in the not-so-distant future, it is important that the effect of multi-qubit interactions on quantum neural networks is studied extensively. This paper introduces a Quantum Convolutional Network with novel Interaction layers exploiting three-qubit interactions increasing the network's expressibility and entangling capability, for classifying both image and one-dimensional data. The proposed approach is tested on three publicly available datasets namely MNIST, Fashion MNIST, and Iris datasets, to perform binary and multiclass classifications and is found to supersede the performance of the existing state-of-the-art methods.
    摘要 量子机器学习(QML)因量子计算机出色的计算能力而受到关注。随着近乎无误差的量子计算机有望在不远的将来实现,深入研究多量子比特交互对量子神经网络的影响十分重要。本文提出了一种带有新型交互层的量子卷积网络,利用三量子比特交互提高网络的表达能力和纠缠能力,用于图像和一维数据的分类。该方法在 MNIST、Fashion MNIST 和 Iris 三个公开数据集上进行了二分类和多分类测试,结果超越了现有的先进方法。

Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models

  • paper_url: http://arxiv.org/abs/2307.11224
  • repo_url: None
  • paper_authors: Michael Günther, Louis Milliken, Jonathan Geuter, Georgios Mastrapas, Bo Wang, Han Xiao
  • for: 本研究的目的是开发一组高性能的句子嵌入模型,能够将各种文本输入转换为数值表示,捕捉文本的语义本质。
  • methods: 本研究构建了高质量的成对(pairwise)和三元组(triplet)数据集,并强调数据清洗在数据集准备中的关键作用,在此基础上训练模型。此外,作者还构建了一个由否定与非否定语句组成的新数据集,以增强模型对否定的识别能力。
  • results: 本研究使用 Massive Textual Embedding Benchmark (MTEB) 进行了全面的性能评估,取得了优异的结果;否定/非否定语句数据集也已公开发布,供社区使用。
    Abstract Jina Embeddings constitutes a set of high-performance sentence embedding models adept at translating various textual inputs into numerical representations, thereby capturing the semantic essence of the text. The models excel in applications such as dense retrieval and semantic textual similarity. This paper details the development of Jina Embeddings, starting with the creation of high-quality pairwise and triplet datasets. It underlines the crucial role of data cleaning in dataset preparation, gives in-depth insights into the model training process, and concludes with a comprehensive performance evaluation using the Massive Textual Embedding Benchmark (MTEB). To increase the model's awareness of negations, we constructed a novel training and evaluation dataset of negated and non-negated statements, which we make publicly available to the community.
    摘要
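The entry above mentions training embedding models on pairwise and triplet datasets. The following minimal sketch shows one training step with a triplet objective; the mean-pooling encoder, vocabulary size, and random token ids are placeholder assumptions, not Jina's actual model or data pipeline.

```python
import torch
import torch.nn as nn

class MeanPoolEncoder(nn.Module):
    """Placeholder sentence encoder: embeds token ids and mean-pools them."""
    def __init__(self, vocab=30000, dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)

    def forward(self, token_ids):                # (batch, seq_len)
        return self.emb(token_ids).mean(dim=1)   # (batch, dim)

encoder = MeanPoolEncoder()
criterion = nn.TripletMarginLoss(margin=0.5)     # pull positives close, push negatives away
optimizer = torch.optim.AdamW(encoder.parameters(), lr=1e-4)

# One toy step on a (query, positive, hard-negative) triplet batch.
anchor   = torch.randint(0, 30000, (16, 32))
positive = torch.randint(0, 30000, (16, 32))
negative = torch.randint(0, 30000, (16, 32))
loss = criterion(encoder(anchor), encoder(positive), encoder(negative))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```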

FairMobi-Net: A Fairness-aware Deep Learning Model for Urban Mobility Flow Generation

  • paper_url: http://arxiv.org/abs/2307.11214
  • repo_url: None
  • paper_authors: Zhewei Liu, Lipai Huang, Chao Fan, Ali Mostafavi
  • for: 帮助研究者和实践者更好地理解城市结构和人口活动模式,并且实现重要的城市规划和管理应用。
  • methods: 提出了一种新的、具备公平性意识的深度学习模型 FairMobi-Net,用于跨区域人流预测。该模型将公平性损失加入损失函数,并采用混合方法,将二分类与数值回归技术结合起来进行人流预测。
  • results: 对四个美国城市的人流数据进行了详细验证,并发现 FairMobi-Net 模型在不同地区之间的人流预测中具有更高的准确性和公平性。模型在不同地区之间的人流预测中具有更高的稳定性和可靠性,并且能够更好地捕捉人们在不同地区之间的流动性。
    Abstract Generating realistic human flows across regions is essential for our understanding of urban structures and population activity patterns, enabling important applications in the fields of urban planning and management. However, a notable shortcoming of most existing mobility generation methodologies is neglect of prediction fairness, which can result in underestimation of mobility flows across regions with vulnerable population groups, potentially resulting in inequitable resource distribution and infrastructure development. To overcome this limitation, our study presents a novel, fairness-aware deep learning model, FairMobi-Net, for inter-region human flow prediction. The FairMobi-Net model uniquely incorporates fairness loss into the loss function and employs a hybrid approach, merging binary classification and numerical regression techniques for human flow prediction. We validate the FairMobi-Net model using comprehensive human mobility datasets from four U.S. cities, predicting human flow at the census-tract level. Our findings reveal that the FairMobi-Net model outperforms state-of-the-art models (such as the DeepGravity model) in producing more accurate and equitable human flow predictions across a variety of region pairs, regardless of regional income differences. The model maintains a high degree of accuracy consistently across diverse regions, addressing the previous fairness concern. Further analysis of feature importance elucidates the impact of physical distances and road network structures on human flows across regions. With fairness as its touchstone, the model and results provide researchers and practitioners across the fields of urban sciences, transportation engineering, and computing with an effective tool for accurate generation of human mobility flows across regions.
    摘要
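A minimal sketch of the general idea behind the hybrid, fairness-aware objective described above: combine a binary "does any flow exist" classification loss, a regression loss on flow volume, and a penalty that equalizes average error across region groups. The weighting and the particular fairness term are illustrative assumptions, not FairMobi-Net's exact formulation.

```python
import torch
import torch.nn.functional as F

def hybrid_fair_loss(flow_logit, flow_pred, flow_true, group, alpha=1.0, beta=1.0):
    """flow_logit: logits for 'any flow exists'; flow_pred/flow_true: flow volumes;
    group: 0/1 indicator for a region-pair attribute (e.g. low/high income)."""
    exists = (flow_true > 0).float()
    cls_loss = F.binary_cross_entropy_with_logits(flow_logit, exists)
    reg_loss = F.mse_loss(flow_pred, flow_true)

    # Illustrative fairness term: match mean absolute error across the two groups.
    err = (flow_pred - flow_true).abs()
    gap = (err[group == 0].mean() - err[group == 1].mean()).abs()
    return cls_loss + alpha * reg_loss + beta * gap

# Toy batch of 64 region pairs.
torch.manual_seed(0)
logit, pred = torch.randn(64), torch.rand(64) * 100
true = torch.rand(64) * 100
group = torch.randint(0, 2, (64,))
print(hybrid_fair_loss(logit, pred, true, group))
```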

The Effect of Epidemiological Cohort Creation on the Machine Learning Prediction of Homelessness and Police Interaction Outcomes Using Administrative Health Care Data

  • paper_url: http://arxiv.org/abs/2307.11211
  • repo_url: https://github.com/Fuzzy-sh/Machine-Learning-Risk-Estimation-and-Prediction-of-Homelessness-and-Police-interaction
  • paper_authors: Faezehsadat Shahidi, M. Ethan MacDonald, Dallas Seitz, Geoffrey Messier
  • for: The paper aims to identify factors associated with initial homelessness and police interaction among individuals with addiction or mental health (AMH) diagnoses, and to evaluate the performance of different predictive models using flexible and fixed observation windows.
  • methods: The study uses an administrative healthcare dataset from Calgary, Alberta, Canada, comprising 240,219 individuals diagnosed with AMH between April 1, 2013, and March 31, 2018. The cohort is followed for 2 years to identify factors associated with homelessness and police interactions. The authors compare the performance of logistic regression (LR) and machine learning (ML) models, including random forests (RF) and extreme gradient boosting (XGBoost), in two cohorts with fixed and flexible observation windows.
  • results: The study finds that male sex, substance disorder, psychiatrist visits, and drug abuse are associated with initial homelessness and police interaction. The authors also demonstrate that XGBoost shows superior performance using the flexible method, with sensitivity and AUC values of 91% and 90%, respectively, for initial homelessness, and 90% and 89%, respectively, for initial police interaction.
    Abstract Background: Mental illness can lead to adverse outcomes such as homelessness and police interaction and understanding of the events leading up to these adverse outcomes is important. Predictive models may help identify individuals at risk of such adverse outcomes. Using a fixed observation window cohort with logistic regression (LR) or machine learning (ML) models can result in lower performance when compared with adaptive and parcellated windows. Method: An administrative healthcare dataset was used, comprising of 240,219 individuals in Calgary, Alberta, Canada who were diagnosed with addiction or mental health (AMH) between April 1, 2013, and March 31, 2018. The cohort was followed for 2 years to identify factors associated with homelessness and police interactions. To understand the benefit of flexible windows to predictive models, an alternative cohort was created. Then LR and ML models, including random forests (RF), and extreme gradient boosting (XGBoost) were compared in the two cohorts. Results: Among 237,602 individuals, 0.8% (1,800) experienced first homelessness, while 0.32% (759) reported initial police interaction among 237,141 individuals. Male sex (AORs: H=1.51, P=2.52), substance disorder (AORs: H=3.70, P=2.83), psychiatrist visits (AORs: H=1.44, P=1.49), and drug abuse (AORs: H=2.67, P=1.83) were associated with initial homelessness (H) and police interaction (P). XGBoost showed superior performance using the flexible method (sensitivity =91%, AUC =90% for initial homelessness, and sensitivity =90%, AUC=89% for initial police interaction) Conclusion: This study identified key features associated with initial homelessness and police interaction and demonstrated that flexible windows can improve predictive modeling.
    摘要 背景:心理疾病可能导致失去住所、与警察接触等不良结果,了解导致这些不良结果的事件过程十分重要。预测模型有助于识别面临此类风险的个体。与自适应和分段观察窗口相比,使用固定观察窗口队列配合逻辑回归(LR)或机器学习(ML)模型可能导致性能下降。方法:我们使用了一个行政医疗数据集,包含 2013 年 4 月 1 日至 2018 年 3 月 31 日在加拿大阿尔伯塔省卡尔加里市被诊断为成瘾或精神健康问题(AMH)的 240,219 名个体。该队列被随访 2 年,以确定与失去住所和与警察接触相关的因素。为了考察灵活观察窗口对预测模型的益处,我们构建了另一个对照队列,并在两个队列中比较了 LR 和 ML 模型(包括随机森林 RF 和极端梯度提升 XGBoost)。结果:在 237,602 名个体中,0.8%(1,800 人)首次经历失去住所;在 237,141 名个体中,0.32%(759 人)首次与警察接触。男性(AOR:H=1.51,P=2.52)、物质使用障碍(AOR:H=3.70,P=2.83)、精神科就诊(AOR:H=1.44,P=1.49)和药物滥用(AOR:H=2.67,P=1.83)与首次失去住所(H)和首次与警察接触(P)相关。采用灵活窗口方法时 XGBoost 表现最佳(首次失去住所:敏感度 91%,AUC 90%;首次与警察接触:敏感度 90%,AUC 89%)。结论:本研究确定了与首次失去住所和与警察接触相关的关键特征,并证明灵活观察窗口可以改进预测建模。
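A compact sketch of fitting and evaluating a gradient-boosted classifier on a cohort feature table, reporting sensitivity and AUC as in the study. The feature matrix, outcome prevalence, and hyperparameters are hypothetical stand-ins; the study's actual cohort construction (fixed vs. flexible windows) and features are not reproduced here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, recall_score
from xgboost import XGBClassifier

# Hypothetical cohort table: one row per individual, features summarising the
# observation window (sex, substance disorder, psychiatrist visits, ...).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))
y = (rng.random(5000) < 0.05).astype(int)      # rare outcome, e.g. first homelessness

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
model = XGBClassifier(
    n_estimators=300, max_depth=4, learning_rate=0.1,
    scale_pos_weight=(y_tr == 0).sum() / max((y_tr == 1).sum(), 1),  # handle class imbalance
)
model.fit(X_tr, y_tr)

prob = model.predict_proba(X_te)[:, 1]
print("AUC:        ", roc_auc_score(y_te, prob))
print("sensitivity:", recall_score(y_te, (prob > 0.5).astype(int)))
```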

Clinical Trial Active Learning

  • paper_url: http://arxiv.org/abs/2307.11209
  • repo_url: https://github.com/olivesgatech/clinical-trial-active-learning
  • paper_authors: Zoe Fowler, Kiran Kokilepersaud, Mohit Prabhushankar, Ghassan AlRegib
  • for: 这篇论文提出了一种新的主动学习方法,能够考虑临床试验中数据的非独立同分布(非 i.i.d.)结构。
  • methods: 该方法采用前瞻式主动学习(prospective active learning):在临床试验采集数据的过程中,以图像的采集时间为条件进行样本选择,从而维持 i.i.d. 假设,并将其应用于光学相干断层扫描(OCT)图像的疾病检测。
  • results: 与传统(回顾式)主动学习范式相比,前瞻式主动学习在两种不同的测试设置中均表现更好。
    Abstract This paper presents a novel approach to active learning that takes into account the non-independent and identically distributed (non-i.i.d.) structure of a clinical trial setting. There exists two types of clinical trials: retrospective and prospective. Retrospective clinical trials analyze data after treatment has been performed; prospective clinical trials collect data as treatment is ongoing. Typically, active learning approaches assume the dataset is i.i.d. when selecting training samples; however, in the case of clinical trials, treatment results in a dependency between the data collected at the current and past visits. Thus, we propose prospective active learning to overcome the limitations present in traditional active learning methods and apply it to disease detection in optical coherence tomography (OCT) images, where we condition on the time an image was collected to enforce the i.i.d. assumption. We compare our proposed method to the traditional active learning paradigm, which we refer to as retrospective in nature. We demonstrate that prospective active learning outperforms retrospective active learning in two different types of test settings.
    摘要
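A minimal sketch of the time-conditioned selection idea: at each visit, query labels only among samples that have already been collected, ranked by model uncertainty. The acquisition rule, uncertainty scores, and visit structure here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def prospective_select(uncertainty, collection_time, current_time, budget=10):
    """Pick the most uncertain samples among those whose images were collected
    on or before the current visit time, so the labeled pool never 'peeks'
    at future visits of the trial."""
    available = np.where(collection_time <= current_time)[0]
    ranked = available[np.argsort(-uncertainty[available])]
    return ranked[:budget]

# Toy example: 200 OCT images collected across 5 visit times.
rng = np.random.default_rng(0)
uncertainty = rng.random(200)             # e.g. predictive entropy from the current model
collection_time = rng.integers(0, 5, 200)

for t in range(5):                        # query a labeling batch at each visit
    picked = prospective_select(uncertainty, collection_time, current_time=t)
    print(f"visit {t}: query {len(picked)} samples")
```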

Heuristic Hyperparameter Choice for Image Anomaly Detection

  • paper_url: http://arxiv.org/abs/2307.11197
  • repo_url: None
  • paper_authors: Zeyu Jiang, João P. C. Bertoldo, Etienne Decencière
  • for: 这个论文主要针对图像异常检测(AD)中使用深度学习神经网络,以提高图像异常检测的精度和效率。
  • methods: 本文使用了预训练模型提取的深度特征,并使用NPCA维度减少算法进行维度减少。文中还提出了一些优化NPCA算法参数的启发方法,以优化维度减少后的性能。
  • results: 经过 NPCA 维度约简后,图像异常检测在保持良好性能的同时大幅降低了特征维度,避免了冗余特征带来的计算开销;NPCA 超参数的选择对性能也有一定影响。
    Abstract Anomaly detection (AD) in images is a fundamental computer vision problem by deep learning neural network to identify images deviating significantly from normality. The deep features extracted from pretrained models have been proved to be essential for AD based on multivariate Gaussian distribution analysis. However, since models are usually pretrained on a large dataset for classification tasks such as ImageNet, they might produce lots of redundant features for AD, which increases computational cost and degrades the performance. We aim to do the dimension reduction of Negated Principal Component Analysis (NPCA) for these features. So we proposed some heuristic to choose hyperparameter of NPCA algorithm for getting as fewer components of features as possible while ensuring a good performance.
    摘要 图像异常检测(AD)是计算机视觉中的一个基本问题,利用深度神经网络来识别显著偏离正常情况的图像。从预训练模型中提取的深度特征,已被证明是基于多元高斯分布分析进行异常检测的关键。然而,由于这些模型通常是在 ImageNet 等大规模分类数据集上预训练的,它们可能会为异常检测产生大量冗余特征,从而增加计算成本并降低性能。我们的目标是对这些特征进行 Negated Principal Component Analysis(NPCA)维度约简。为此,我们提出了一些启发式方法来选择 NPCA 算法的超参数,使得在保证良好性能的前提下尽量减少特征分量的数量。
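A sketch of the negated-PCA idea: fit PCA on features of normal images, keep the trailing low-variance directions, and score test images by how much energy falls into that discarded subspace. The cut-off heuristic (`keep_ratio`) is an illustrative assumption, not the paper's specific hyperparameter rule.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_npca(normal_features, keep_ratio=0.01):
    """Fit PCA on features of normal images and keep the trailing components
    that together explain at most `keep_ratio` of the variance (negated PCA)."""
    pca = PCA().fit(normal_features)
    cum = np.cumsum(pca.explained_variance_ratio_)
    k_major = int(np.searchsorted(cum, 1.0 - keep_ratio) + 1)   # heuristic cut
    minor_axes = pca.components_[k_major:]                      # low-variance directions
    return pca.mean_, minor_axes

def npca_score(features, mean, minor_axes):
    """Anomaly score: energy of the feature in the discarded low-variance subspace."""
    proj = (features - mean) @ minor_axes.T
    return (proj ** 2).sum(axis=1)

# Toy example with 512-dimensional pretrained features.
rng = np.random.default_rng(0)
normal = rng.normal(size=(1000, 512))
mean, axes = fit_npca(normal)
print("components kept for scoring:", axes.shape[0])
print("example scores:", npca_score(rng.normal(size=(5, 512)), mean, axes))
```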

Exploring reinforcement learning techniques for discrete and continuous control tasks in the MuJoCo environment

  • paper_url: http://arxiv.org/abs/2307.11166
  • repo_url: None
  • paper_authors: Vaddadi Sai Rahul, Debajyoti Chakraborty
  • for: 本文的目的是在连续控制环境中比较基于价值的方法,并使用 MuJoCo 物理引擎运行实验,给出每个任务的观测空间、动作空间和奖励等细节。
  • methods: 本文通过离散化方法比较了 Q-learning 和 SARSA 两种基于价值的方法,并以其为基线,进而研究先进的深度策略梯度方法 DDPG。
  • results: 在大量回合下 Q-learning 的表现优于 SARSA,而 DDPG 只需较少回合就超过了两者,并能较快达到可观的平均奖励。此外,文章还对模型超参数进行了微调,并预计在投入更多时间和计算资源后性能还能进一步提升。
    Abstract We leverage the fast physics simulator, MuJoCo to run tasks in a continuous control environment and reveal details like the observation space, action space, rewards, etc. for each task. We benchmark value-based methods for continuous control by comparing Q-learning and SARSA through a discretization approach, and using them as baselines, progressively moving into one of the state-of-the-art deep policy gradient method DDPG. Over a large number of episodes, Qlearning outscored SARSA, but DDPG outperformed both in a small number of episodes. Lastly, we also fine-tuned the model hyper-parameters expecting to squeeze more performance but using lesser time and resources. We anticipated that the new design for DDPG would vastly improve performance, yet after only a few episodes, we were able to achieve decent average rewards. We expect to improve the performance provided adequate time and computational resources.
    摘要 我们利用快速物理模拟器 MuJoCo 在连续控制环境中运行任务,并给出每个任务的观测空间、动作空间和奖励等细节。我们通过离散化方法对基于价值的 Q-learning 和 SARSA 进行基准比较,并以它们为基线,进一步研究先进的深度策略梯度方法 DDPG。在大量回合下,Q-learning 的得分高于 SARSA,而 DDPG 只需较少的回合就超越了两者。最后,我们还对模型超参数进行了微调,期望在耗费更少时间和资源的情况下榨取更多性能。我们原本预计新的 DDPG 设计会大幅提升性能,而实际上只经过少量回合便获得了不错的平均奖励。我们预计在有充足时间和计算资源的情况下,性能还能进一步提升。
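A minimal sketch of the discretization approach mentioned above: bin a continuous observation into a tabular state and apply the standard Q-learning update. The toy dynamics stand in for an environment; a real MuJoCo task would supply observations and rewards through its own `env.step` interface, and the bin counts and learning rates are arbitrary assumptions.

```python
import numpy as np

def discretize(obs, low, high, bins):
    """Map a continuous observation to a tuple of bin indices."""
    ratios = (np.clip(obs, low, high) - low) / (high - low + 1e-8)
    return tuple((ratios * (bins - 1)).astype(int))

# Tiny 1-D toy problem standing in for a continuous observation/action space.
low, high, bins, n_actions = np.array([-1.0]), np.array([1.0]), 20, 3
Q = np.zeros((bins,) * len(low) + (n_actions,))
alpha, gamma, eps = 0.1, 0.99, 0.1
rng = np.random.default_rng(0)

obs = np.array([0.0])
for step in range(10_000):
    s = discretize(obs, low, high, bins)
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    # Placeholder dynamics and reward; a real task would call env.step(a) here.
    next_obs = np.clip(obs + (a - 1) * 0.05 + rng.normal(0, 0.01), -1, 1)
    reward = -abs(float(next_obs[0]))
    s_next = discretize(next_obs, low, high, bins)
    Q[s][a] += alpha * (reward + gamma * Q[s_next].max() - Q[s][a])   # Q-learning update
    obs = next_obs
print("greedy action at the origin:", int(Q[discretize(np.array([0.0]), low, high, bins)].argmax()))
```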

Data-driven criteria for quantum correlations

  • paper_url: http://arxiv.org/abs/2307.11091
  • repo_url: None
  • paper_authors: Mateusz Krawczyk, Jarosław Pawłowski, Maciej M. Maśka, Katarzyna Roszak
  • for: 这个论文目的是检测三个量子比特系统中的相关性,使用基于无监督学习的神经网络。
  • methods: 该论文让神经网络在随机生成的可分态上以无监督方式训练,使其学会识别可分态;在这种设定下,相关态被作为异常检测出来。
  • results: 研究发现,该检测器在识别较弱的量子关联——量子失谐(quantum discord)——方面表现远好于识别量子纠缠(entanglement):即使在纠缠检测的最优阈值下,它也会严重高估纠缠态的集合,而对失谐态集合的低估程度要小得多。
    Abstract We build a machine learning model to detect correlations in a three-qubit system using a neural network trained in an unsupervised manner on randomly generated states. The network is forced to recognize separable states, and correlated states are detected as anomalies. Quite surprisingly, we find that the proposed detector performs much better at distinguishing a weaker form of quantum correlations, namely, the quantum discord, than entanglement. In fact, it has a tendency to grossly overestimate the set of entangled states even at the optimal threshold for entanglement detection, while it underestimates the set of discordant states to a much lesser extent. In order to illustrate the nature of states classified as quantum-correlated, we construct a diagram containing various types of states -- entangled, as well as separable, both discordant and non-discordant. We find that the near-zero value of the recognition loss reproduces the shape of the non-discordant separable states with high accuracy, especially considering the non-trivial shape of this set on the diagram. The network architecture is designed carefully: it preserves separability, and its output is equivariant with respect to qubit permutations. We show that the choice of architecture is important to get the highest detection accuracy, much better than for a baseline model that just utilizes a partial trace operation.
    摘要 我们建立了一个机器学习模型,利用在随机生成态上以无监督方式训练的神经网络来检测三量子比特系统中的关联。网络被迫学会识别可分态,关联态则作为异常被检测出来。出人意料的是,我们发现所提出的检测器在区分较弱的量子关联形式——量子失谐(quantum discord)——时的表现远好于区分纠缠:即使在纠缠检测的最优阈值下,它仍会严重高估纠缠态的集合,而对失谐态集合的低估程度要小得多。为了说明被判为量子关联的态的性质,我们构建了一张包含多种态的图,其中既有纠缠态,也有可分态(含失谐与非失谐)。我们发现,识别损失接近零的区域能够高度准确地复现非失谐可分态集合的形状,尤其考虑到该集合在图上的形状并不平凡。网络结构经过精心设计:它保持可分性,且输出对量子比特置换是等变的。我们证明,这一结构选择对获得最高检测精度至关重要,明显优于仅使用偏迹(partial trace)操作的基线模型。

PAPR: Proximity Attention Point Rendering

  • paper_url: http://arxiv.org/abs/2307.11086
  • repo_url: None
  • paper_authors: Yanshu Zhang, Shichong Peng, Alireza Moazeni, Ke Li
  • for: 从零开始学习场景表面的精确而简洁的点云表示。
  • methods: 我们提出了 Proximity Attention Point Rendering (PAPR) 方法,它由基于点云的场景表示和可微渲染器组成。点云中的每个点由其空间位置、前景得分和与视角无关的特征向量来刻画;渲染器为每条光线选择相关的点,并利用这些点的特征生成准确的颜色。
  • results: PAPR 方法能够学习正确的点云位置来表示场景几何,即使初始化与目标几何差异很大;同时,该方法只需稀疏的点就能捕捉精细的纹理细节。我们还展示了该方法的四种实际应用:几何编辑、物体操控、纹理迁移和曝光控制。更多结果和代码见项目网站:https://zvict.github.io/papr/。
    Abstract Learning accurate and parsimonious point cloud representations of scene surfaces from scratch remains a challenge in 3D representation learning. Existing point-based methods often suffer from the vanishing gradient problem or require a large number of points to accurately model scene geometry and texture. To address these limitations, we propose Proximity Attention Point Rendering (PAPR), a novel method that consists of a point-based scene representation and a differentiable renderer. Our scene representation uses a point cloud where each point is characterized by its spatial position, foreground score, and view-independent feature vector. The renderer selects the relevant points for each ray and produces accurate colours using their associated features. PAPR effectively learns point cloud positions to represent the correct scene geometry, even when the initialization drastically differs from the target geometry. Notably, our method captures fine texture details while using only a parsimonious set of points. We also demonstrate four practical applications of our method: geometry editing, object manipulation, texture transfer, and exposure control. More results and code are available on our project website at https://zvict.github.io/papr/.
    摘要 从零开始学习场景表面的精确而简洁的点云表示,仍然是三维表示学习中的一个挑战。现有的基于点的方法常常受梯度消失问题困扰,或者需要大量的点才能准确建模场景的几何与纹理。为了解决这些限制,我们提出了 Proximity Attention Point Rendering(PAPR),它由基于点云的场景表示和可微渲染器组成。我们的场景表示使用点云,其中每个点由其空间位置、前景得分和与视角无关的特征向量来刻画;渲染器为每条光线选择相关的点,并利用这些点的特征生成准确的颜色。PAPR 能够有效地学习点云位置来表示正确的场景几何,即使初始化与目标几何差异很大。值得一提的是,我们的方法仅用少量的点便能捕捉精细的纹理细节。我们还展示了该方法的四种实际应用:几何编辑、物体操控、纹理迁移和曝光控制。更多结果和代码见项目网站 https://zvict.github.io/papr/。

Representation Learning in Anomaly Detection: Successes, Limits and a Grand Challenge

  • paper_url: http://arxiv.org/abs/2307.11085
  • repo_url: None
  • paper_authors: Yedid Hoshen
  • for: 这篇论文针对异常检测领域,论证其主流范式无法无限扩展,最终会遇到根本性的限制。
  • methods: 论文借助异常检测中的"没有免费午餐"原理来解释这些限制;当存在强任务先验(如许多工业任务)时,这些限制可以被克服,而在缺乏先验时异常检测任务会困难得多。
  • results: 文章提出了两个异常检测的重大挑战任务:一是通过异常检测实现科学发现,二是在 ImageNet 数据集中找出最异常图像的"小型重大"挑战。文章认为需要发展新的异常检测工具和思路来应对这些挑战。
    Abstract In this perspective paper, we argue that the dominant paradigm in anomaly detection cannot scale indefinitely and will eventually hit fundamental limits. This is due to a no free lunch principle for anomaly detection. These limitations can be overcome when there are strong task priors, as is the case for many industrial tasks. When such priors do not exist, the task is much harder for anomaly detection. We pose two such tasks as grand challenges for anomaly detection: i) scientific discovery by anomaly detection ii) a "mini-grand" challenge of detecting the most anomalous image in the ImageNet dataset. We believe new anomaly detection tools and ideas would need to be developed to overcome these challenges.
    摘要

GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos

  • paper_url: http://arxiv.org/abs/2307.11081
  • repo_url: https://github.com/nisargshah1999/glsformer
  • paper_authors: Nisarg A. Shah, Shameema Sikder, S. Swaroop Vedula, Vishal M. Patel
  • for: 本研究旨在提出一种基于视觉 Transformer 的自动化手术步骤识别方法,以提升手术过程中的患者安全性并辅助决策。
  • methods: 我们的方法直接从帧级图像块(patch)序列中学习时空特征,并提出了一种门控时序注意力机制,能够智能地融合短期和长期的时空特征表示。
  • results: 我们在 Cataract-101 和 D99 两个白内障手术视频数据集上进行了广泛评估,结果显著优于多种现有先进方法,验证了所提方法在自动化手术步骤识别中的适用性。
    Abstract Automated surgical step recognition is an important task that can significantly improve patient safety and decision-making during surgeries. Existing state-of-the-art methods for surgical step recognition either rely on separate, multi-stage modeling of spatial and temporal information or operate on short-range temporal resolution when learned jointly. However, the benefits of joint modeling of spatio-temporal features and long-range information are not taken in account. In this paper, we propose a vision transformer-based approach to jointly learn spatio-temporal features directly from sequence of frame-level patches. Our method incorporates a gated-temporal attention mechanism that intelligently combines short-term and long-term spatio-temporal feature representations. We extensively evaluate our approach on two cataract surgery video datasets, namely Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods. These results validate the suitability of our proposed approach for automated surgical step recognition. Our code is released at: https://github.com/nisargshah1999/GLSFormer
    摘要 自动化手术步骤识别是一项重要任务,可以显著提升手术过程中的患者安全性并辅助决策。现有的先进方法要么对空间信息和时间信息进行分离的多阶段建模,要么在联合学习时只利用短程的时间分辨率,没有充分发挥时空特征联合建模和长程信息的优势。本文提出一种基于视觉 Transformer 的方法,直接从帧级图像块序列中联合学习时空特征;该方法引入门控时序注意力机制,智能地融合短期和长期的时空特征表示。我们在 Cataract-101 和 D99 两个白内障手术视频数据集上进行了广泛评估,结果优于多种现有先进方法,证明了所提方法适用于自动化手术步骤识别。代码已发布:https://github.com/nisargshah1999/GLSFormer
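A minimal sketch of the gating idea described above: a learned sigmoid gate blends a short-range and a long-range spatio-temporal feature before step classification. This is only the fusion step under assumed feature dimensions, not the full GLSFormer block.

```python
import torch
import torch.nn as nn

class GatedTemporalFusion(nn.Module):
    """Blend short-term and long-term feature streams with a learned gate."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, short_feat, long_feat):      # both (batch, dim)
        g = torch.sigmoid(self.gate(torch.cat([short_feat, long_feat], dim=-1)))
        return g * short_feat + (1 - g) * long_feat

fusion = GatedTemporalFusion(dim=384)
classifier = nn.Linear(384, 10)          # e.g. 10 surgical steps

short = torch.randn(4, 384)   # features from a short clip around the current frame
long_ = torch.randn(4, 384)   # features aggregated over a long frame history
step_logits = classifier(fusion(short, long_))
print(step_logits.shape)      # (4, 10)
```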

Brain2Music: Reconstructing Music from Human Brain Activity

  • paper_url: http://arxiv.org/abs/2307.11078
  • repo_url: None
  • paper_authors: Timo I. Denk, Yu Takagi, Takuya Matsuyama, Andrea Agostinelli, Tomoya Nakai, Christian Frank, Shinji Nishimoto
  • for: 这项研究旨在通过解码人脑活动来重构音乐体验,从而更好地理解大脑如何解释和表征世界。
  • methods: 论文提出使用功能磁共振成像(fMRI)采集脑活动,并基于由 fMRI 数据得到的嵌入,通过音乐检索或 MusicLM 音乐生成模型来重构音乐。
  • results: 研究发现,重构出的音乐在体裁、配器和情绪等语义属性上与受试者实际听到的音乐相似。此外,研究还通过逐体素编码建模分析了 MusicLM 的不同组成部分与脑活动的关系,并探讨了哪些脑区表征来自音乐刺激纯文本描述的信息。更多示例见 https://google-research.github.io/seanet/brain2music。
    Abstract The process of reconstructing experiences from human brain activity offers a unique lens into how the brain interprets and represents the world. In this paper, we introduce a method for reconstructing music from brain activity, captured using functional magnetic resonance imaging (fMRI). Our approach uses either music retrieval or the MusicLM music generation model conditioned on embeddings derived from fMRI data. The generated music resembles the musical stimuli that human subjects experienced, with respect to semantic properties like genre, instrumentation, and mood. We investigate the relationship between different components of MusicLM and brain activity through a voxel-wise encoding modeling analysis. Furthermore, we discuss which brain regions represent information derived from purely textual descriptions of music stimuli. We provide supplementary material including examples of the reconstructed music at https://google-research.github.io/seanet/brain2music
    摘要 从人脑活动中重构体验的过程,为理解大脑如何解释和表征世界提供了独特视角。本文介绍了一种从功能磁共振成像(fMRI)采集的脑活动中重构音乐的方法:基于由 fMRI 数据得到的嵌入,使用音乐检索或以其为条件的 MusicLM 音乐生成模型。生成的音乐在体裁、配器和情绪等语义属性上与受试者听到的音乐刺激相似。我们通过逐体素编码建模分析了 MusicLM 不同组成部分与脑活动之间的关系,并讨论了哪些脑区表征来自音乐刺激纯文本描述的信息。补充材料(包括重构音乐的示例)见 https://google-research.github.io/seanet/brain2music。

AlignDet: Aligning Pre-training and Fine-tuning in Object Detection

  • paper_url: http://arxiv.org/abs/2307.11077
  • repo_url: https://github.com/liming-ai/AlignDet
  • paper_authors: Ming Li, Jie Wu, Xionghui Wang, Chen Chen, Jie Qin, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan
  • for: 提高物体检测算法的性能、通用性和速度
  • methods: 提出了一种名为AlignDet的统一预训练框架,可以适应不同的检测器,以解决预训练和细化训练过程中存在的数据、模型和任务之间的不一致问题
  • results: 广泛的实验表明,AlignDet 可以在多种设定下取得显著提升,包括不同的检测算法、模型骨干、数据设置和训练计划。例如,如图 1 所示,AlignDet 将 FCOS 提升 5.3 mAP、RetinaNet 提升 2.1 mAP、Faster R-CNN 提升 3.3 mAP、DETR 提升 2.3 mAP。
    Abstract The paradigm of large-scale pre-training followed by downstream fine-tuning has been widely employed in various object detection algorithms. In this paper, we reveal discrepancies in data, model, and task between the pre-training and fine-tuning procedure in existing practices, which implicitly limit the detector's performance, generalization ability, and convergence speed. To this end, we propose AlignDet, a unified pre-training framework that can be adapted to various existing detectors to alleviate the discrepancies. AlignDet decouples the pre-training process into two stages, i.e., image-domain and box-domain pre-training. The image-domain pre-training optimizes the detection backbone to capture holistic visual abstraction, and box-domain pre-training learns instance-level semantics and task-aware concepts to initialize the parts out of the backbone. By incorporating the self-supervised pre-trained backbones, we can pre-train all modules for various detectors in an unsupervised paradigm. As depicted in Figure 1, extensive experiments demonstrate that AlignDet can achieve significant improvements across diverse protocols, such as detection algorithm, model backbone, data setting, and training schedule. For example, AlignDet improves FCOS by 5.3 mAP, RetinaNet by 2.1 mAP, Faster R-CNN by 3.3 mAP, and DETR by 2.3 mAP under fewer epochs.
    摘要 "大规模预训练 + 下游微调"的范式已被广泛应用于各种目标检测算法。本文指出,现有做法中预训练和微调两个阶段在数据、模型和任务上存在不一致,这隐性地限制了检测器的性能、泛化能力和收敛速度。为此,我们提出了 AlignDet——一个可适配多种现有检测器的统一预训练框架,用于缓解上述不一致。AlignDet 将预训练过程解耦为两个阶段:图像域预训练和框域预训练。图像域预训练优化检测骨干网络以捕获整体视觉抽象;框域预训练学习实例级语义和任务相关的概念,用于初始化骨干之外的模块。借助自监督预训练的骨干网络,各类检测器的全部模块都可以在无监督范式下完成预训练。如图 1 所示,大量实验表明 AlignDet 能在不同的检测算法、模型骨干、数据设置和训练计划下取得显著提升,例如在更少的训练轮数下将 FCOS 提升 5.3 mAP、RetinaNet 提升 2.1 mAP、Faster R-CNN 提升 3.3 mAP、DETR 提升 2.3 mAP。

Effectiveness and predictability of in-network storage cache for scientific workflows

  • paper_url: http://arxiv.org/abs/2307.11069
  • repo_url: None
  • paper_authors: Caitlin Sim, Kesheng Wu, Alex Sim, Inder Monga, Chin Guok, Frank Wurthwein, Diego Davila, Harvey Newman, Justas Balcas
  • for: 这个论文是为了研究大型科学合作中多名科学家访问同一个数据集而减少宽带网络流量和数据访问延迟而写的。
  • methods: 该论文使用了地域数据存储缓存系统来减少宽带网络流量和数据访问延迟。它们还使用了Machine Learning模型来预测缓存行为。
  • results: 该论文通过分析约3TB的运维日志,发现地域缓存系统可以将67.6%的文件请求从宽带网络中除除,并将宽带网络流量减少35.4%(或12.3TB)的平均值。但由于数据访问模式的不同,缓存系统采用了不要将小文件被赋值的策略。此外,该论文还建立了一个可以准确预测缓存行为的Machine Learning模型,这使得该模型有用于未来资源配置和规划研究。
    Abstract Large scientific collaborations often have multiple scientists accessing the same set of files while doing different analyses, which create repeated accesses to the large amounts of shared data located far away. These data accesses have long latency due to distance and occupy the limited bandwidth available over the wide-area network. To reduce the wide-area network traffic and the data access latency, regional data storage caches have been installed as a new networking service. To study the effectiveness of such a cache system in scientific applications, we examine the Southern California Petabyte Scale Cache for a high-energy physics experiment. By examining about 3TB of operational logs, we show that this cache removed 67.6% of file requests from the wide-area network and reduced the traffic volume on wide-area network by 12.3TB (or 35.4%) an average day. The reduction in the traffic volume (35.4%) is less than the reduction in file counts (67.6%) because the larger files are less likely to be reused. Due to this difference in data access patterns, the cache system has implemented a policy to avoid evicting smaller files when processing larger files. We also build a machine learning model to study the predictability of the cache behavior. Tests show that this model is able to accurately predict the cache accesses, cache misses, and network throughput, making the model useful for future studies on resource provisioning and planning.
    摘要 大型科学合作项目中,经常有多名科学家在进行不同分析时访问同一组文件,导致对存放在远端的大量共享数据的重复访问。这些数据访问由于距离远而延迟较高,并占用广域网有限的带宽。为了减少广域网流量和数据访问延迟,区域数据存储缓存被部署为一种新的网络服务。为研究此类缓存系统在科学应用中的有效性,我们考察了用于一项高能物理实验的南加州 PB 级缓存(Southern California Petabyte Scale Cache)。通过分析约 3TB 的运行日志,我们发现该缓存平均每天从广域网上移除了 67.6% 的文件请求,并将广域网流量减少了 12.3TB(即 35.4%)。流量减少的比例(35.4%)低于请求数减少的比例(67.6%),原因是较大的文件更不容易被重复使用。鉴于这一访问模式差异,缓存系统实施了在处理大文件时避免逐出小文件的策略。我们还建立了一个机器学习模型来研究缓存行为的可预测性。测试表明,该模型能够准确预测缓存访问、缓存未命中和网络吞吐量,可用于未来的资源配置和规划研究。
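The headline metrics above (fraction of requests removed from the WAN vs. fraction of traffic volume removed) are simple aggregations over per-request log records. The sketch below shows how such metrics can be computed; the log schema and the toy rows are hypothetical, not the study's actual log format.

```python
import pandas as pd

# Hypothetical operational-log schema: one row per file request.
log = pd.DataFrame({
    "bytes":     [2e9, 5e8, 2e9, 1e9, 5e8, 2e9],
    "cache_hit": [False, True, True, False, True, True],
})

hit_rate = log["cache_hit"].mean()
wan_bytes_saved = log.loc[log["cache_hit"], "bytes"].sum()
wan_reduction = wan_bytes_saved / log["bytes"].sum()

# Mirrors the study's two headline numbers: requests removed from the WAN
# (67.6% in the paper) vs. traffic volume removed (35.4%); the two differ
# because larger files are less likely to be reused.
print(f"requests served from cache: {hit_rate:.1%}")
print(f"WAN traffic avoided:        {wan_reduction:.1%}")
```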

A LLM Assisted Exploitation of AI-Guardian

  • paper_url: http://arxiv.org/abs/2307.15008
  • repo_url: None
  • paper_authors: Nicholas Carlini
  • for: 本研究是用GPT-4语言模型评估AI-Guardian防御机器学习攻击的能力。
  • methods: 本研究使用GPT-4语言模型实现攻击AI-Guardian防御机器学习模型的攻击算法,不需要编写任何代码。
  • results: 研究发现,GPT-4 可以快速、高效地生成攻击 AI-Guardian 防御的代码,在某些情况下其根据含糊指令产出代码的速度甚至超过了本文作者。
    Abstract Large language models (LLMs) are now highly capable at a diverse range of tasks. This paper studies whether or not GPT-4, one such LLM, is capable of assisting researchers in the field of adversarial machine learning. As a case study, we evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023, a top computer security conference. We completely break this defense: the proposed scheme does not increase robustness compared to an undefended baseline. We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance. This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done. We conclude by discussing (1) the warning signs present in the evaluation that suggested to us AI-Guardian would be broken, and (2) our experience with designing attacks and performing novel research using the most recent advances in language modeling.
    摘要 大型语言模型(LLM)如今已能胜任多种任务。本文研究 GPT-4 能否协助对抗机器学习领域的研究者,并以发表于 IEEE S&P 2023 的对抗样本防御方法 AI-Guardian 为案例评估其稳健性。我们完全攻破了该防御:所提出的方案相比无防御基线并未提升稳健性。我们没有编写任何攻击代码,而是根据我们的指导和提示,由 GPT-4 实现全部攻击算法。这一过程出乎意料地高效,GPT-4 有时能比本文作者更快地根据含糊的指令生成代码。最后,我们讨论了 (1) 评估中预示 AI-Guardian 会被攻破的警示信号,以及 (2) 利用最新语言建模进展设计攻击并开展原创研究的经验。

  • paper_url: http://arxiv.org/abs/2307.11049
  • repo_url: https://github.com/improbable-ai/human-guided-exploration
  • paper_authors: Marcel Torne, Max Balsells, Zihan Wang, Samedh Desai, Tao Chen, Pulkit Agrawal, Abhishek Gupta
  • for: 解决强化学习中的探索和奖励问题,需要精心设计奖励函数或使用新奇探索奖励。
  • methods: 使用低质量的人类反馈,不需要同步高质量的人类反馈,可以在实际世界中进行探索和学习。
  • results: 在模拟和真实世界中,利用人类反馈引导探索,可以学习多种复杂的机器人导航和操作任务,而无需手工设计奖励函数或探索奖励。
    Abstract Exploration and reward specification are fundamental and intertwined challenges for reinforcement learning. Solving sequential decision-making tasks requiring expansive exploration requires either careful design of reward functions or the use of novelty-seeking exploration bonuses. Human supervisors can provide effective guidance in the loop to direct the exploration process, but prior methods to leverage this guidance require constant synchronous high-quality human feedback, which is expensive and impractical to obtain. In this work, we present a technique called Human Guided Exploration (HuGE), which uses low-quality feedback from non-expert users that may be sporadic, asynchronous, and noisy. HuGE guides exploration for reinforcement learning not only in simulation but also in the real world, all without meticulous reward specification. The key concept involves bifurcating human feedback and policy learning: human feedback steers exploration, while self-supervised learning from the exploration data yields unbiased policies. This procedure can leverage noisy, asynchronous human feedback to learn policies with no hand-crafted reward design or exploration bonuses. HuGE is able to learn a variety of challenging multi-stage robotic navigation and manipulation tasks in simulation using crowdsourced feedback from non-expert users. Moreover, this paradigm can be scaled to learning directly on real-world robots, using occasional, asynchronous feedback from human supervisors.
    摘要

A Definition of Continual Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.11046
  • repo_url: None
  • paper_authors: David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh
  • for: 本研究旨在为持续强化学习(continual reinforcement learning)建立一个理论基础。
    Abstract In this paper we develop a foundation for continual reinforcement learning.
    摘要 在这篇论文中,我们为持续强化学习(continual reinforcement learning)建立了一个理论基础。

On the Convergence of Bounded Agents

  • paper_url: http://arxiv.org/abs/2307.11044
  • repo_url: None
  • paper_authors: David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, Satinder Singh
  • for: 本文探讨智能体收敛(convergence)的定义与特性,尤其是针对有界(bounded)智能体。
  • methods: 作者提出了两种互补的智能体收敛定义:其一,当描述智能体未来行为所需的最小状态数不再减少时,认为该有界智能体已收敛;其二,当智能体的表现仅在其内部状态发生变化时才会改变,认为该有界智能体已收敛。
  • results: 作者证明了这两种定义的基本性质,说明它们与标准设定下的常见收敛观点相容,并给出了关于二者性质及相互关系的若干结论。
    Abstract When has an agent converged? Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing. However, as we shift the focus of our learning problem from the environment's state to the agent's state, the concept of an agent's convergence becomes significantly less clear. In this paper, we propose two complementary accounts of agent convergence in a framing of the reinforcement learning problem that centers around bounded agents. The first view says that a bounded agent has converged when the minimal number of states needed to describe the agent's future behavior cannot decrease. The second view says that a bounded agent has converged just when the agent's performance only changes if the agent's internal state changes. We establish basic properties of these two definitions, show that they accommodate typical views of convergence in standard settings, and prove several facts about their nature and relationship. We take these perspectives, definitions, and analysis to bring clarity to a central idea of the field.
    摘要 智能体何时算是收敛?强化学习问题的标准模型给出了一个直接的收敛定义:当智能体在每个环境状态下的行为或表现不再变化时,即认为其已收敛。然而,当我们把关注点从环境状态转向智能体自身的状态时,智能体收敛这一概念就变得不那么清晰。本文在以有界智能体为核心的强化学习问题表述下,提出了两种互补的智能体收敛定义:其一,当描述智能体未来行为所需的最小状态数不再减少时,有界智能体即已收敛;其二,当智能体的表现仅在其内部状态改变时才会变化,有界智能体即已收敛。我们建立了这两种定义的基本性质,说明它们能够涵盖标准设定下的常见收敛观点,并证明了关于其本质及相互关系的若干事实。我们希望通过这些视角、定义与分析,为该领域的这一核心概念带来更清晰的认识。

Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification

  • paper_url: http://arxiv.org/abs/2307.11031
  • repo_url: https://github.com/HazyResearch/embroid
  • paper_authors: Neel Guha, Mayee F. Chen, Kush Bhatia, Azalia Mirhoseini, Frederic Sala, Christopher Ré
  • for: automating data labeling in domains where manual annotation is expensive
  • methods: using a method called Embroid to modify the predictions of a prompt, rather than the prompt itself, to improve prompt-based learning without additional labeled data
  • results: Embroid substantially improves performance over original prompts, realizes improvements for more sophisticated prompting strategies, and can be specialized to domains like law through the embedding functions.
    Abstract Recent work has shown that language models' (LMs) prompt-based learning capabilities make them well suited for automating data labeling in domains where manual annotation is expensive. The challenge is that while writing an initial prompt is cheap, improving a prompt is costly -- practitioners often require significant labeled data in order to evaluate the impact of prompt modifications. Our work asks whether it is possible to improve prompt-based learning without additional labeled data. We approach this problem by attempting to modify the predictions of a prompt, rather than the prompt itself. Our intuition is that accurate predictions should also be consistent: samples which are similar under some feature representation should receive the same prompt prediction. We propose Embroid, a method which computes multiple representations of a dataset under different embedding functions, and uses the consistency between the LM predictions for neighboring samples to identify mispredictions. Embroid then uses these neighborhoods to create additional predictions for each sample, and combines these predictions with a simple latent variable graphical model in order to generate a final corrected prediction. In addition to providing a theoretical analysis of Embroid, we conduct a rigorous empirical evaluation across six different LMs and up to 95 different tasks. We find that (1) Embroid substantially improves performance over original prompts (e.g., by an average of 7.3 points on GPT-JT), (2) also realizes improvements for more sophisticated prompting strategies (e.g., chain-of-thought), and (3) can be specialized to domains like law through the embedding functions.
    摘要 Embroid works by computing multiple representations of a dataset under different embedding functions and using the consistency between the LM predictions for neighboring samples to identify mispredictions. It then uses these neighborhoods to create additional predictions for each sample and combines these predictions with a simple latent variable graphical model to generate a final corrected prediction.We provide a theoretical analysis of Embroid and conduct a rigorous empirical evaluation across six different LMs and up to 95 different tasks. Our results show that Embroid substantially improves performance over original prompts (e.g., by an average of 7.3 points on GPT-JT) and also realizes improvements for more sophisticated prompting strategies (e.g., chain-of-thought). Additionally, we find that Embroid can be specialized to domains like law through the embedding functions.
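A simplified sketch of the neighborhood-consistency idea: embed samples under several representations, look up nearest neighbors in each, and smooth an LM prediction toward its neighbors' majority. The real method combines these signals with a latent variable graphical model, which is omitted here; the k, embeddings, and voting rule are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smooth_predictions(embeddings_list, preds, k=5):
    """embeddings_list: one (n, d_i) array per embedding function;
    preds: (n,) binary LM predictions. Returns majority-smoothed predictions."""
    votes = [preds.astype(float)]
    for emb in embeddings_list:
        nn = NearestNeighbors(n_neighbors=k + 1).fit(emb)
        _, idx = nn.kneighbors(emb)                    # first neighbor is the point itself
        votes.append(preds[idx[:, 1:]].mean(axis=1))   # neighbors' average label
    return (np.mean(votes, axis=0) > 0.5).astype(int)

# Toy example: two embedding spaces, a few noisy prompt predictions.
rng = np.random.default_rng(0)
emb_a, emb_b = rng.normal(size=(200, 32)), rng.normal(size=(200, 16))
preds = rng.integers(0, 2, 200)
print(smooth_predictions([emb_a, emb_b], preds)[:10])
```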

Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering

  • paper_url: http://arxiv.org/abs/2307.11030
  • repo_url: None
  • paper_authors: Yijun Dong, Kevin Miller, Qi Lei, Rachel Ward
  • for: 本文旨在提供关于 semi-supervised classification 问题的 theoretically 解释 relational knowledge distillation(RKD)的一个初步。
  • methods: 本文使用 spectral clustering 来解释 RKD,并提供了一个基于 teacher model 生成的 population-induced graph,以及一个量化预测和实际 clustering 之间的差异的 clustering error。
  • results: 本文证明了 RKD 能够适用于 semi-supervised classification 问题,并提供了一个样本复杂度 bound。此外,文章还展示了 RKD 可以增强 label efficiency,并通过一种涉及到 low clustering error 的框架来证明这一点。最后,文章还将 data augmentation consistency regularization 与这种框架结合,并证明 RKD 可以帮助模型具有 “global” 视角,而不是 consistency regularization 的 “local” 视角。
    Abstract Despite the empirical success and practical significance of (relational) knowledge distillation that matches (the relations of) features between teacher and student models, the corresponding theoretical interpretations remain limited for various knowledge distillation paradigms. In this work, we take an initial step toward a theoretical understanding of relational knowledge distillation (RKD), with a focus on semi-supervised classification problems. We start by casting RKD as spectral clustering on a population-induced graph unveiled by a teacher model. Via a notion of clustering error that quantifies the discrepancy between the predicted and ground truth clusterings, we illustrate that RKD over the population provably leads to low clustering error. Moreover, we provide a sample complexity bound for RKD with limited unlabeled samples. For semi-supervised learning, we further demonstrate the label efficiency of RKD through a general framework of cluster-aware semi-supervised learning that assumes low clustering errors. Finally, by unifying data augmentation consistency regularization into this cluster-aware framework, we show that despite the common effect of learning accurate clusterings, RKD facilitates a "global" perspective through spectral clustering, whereas consistency regularization focuses on a "local" perspective via expansion.
    摘要 尽管relational知识储备(RKD)在实践中取得了成功和实用意义,但其理论解释仍然受限。在这个工作中,我们开始了RKD的理论理解,特点是用 semi-supervised classification 问题作为入口。我们首先将RKD视为师模型对人口induced图像中的spectral clustering。通过定义分类错误来衡量师模型预测和真实分类之间的差异,我们证明了RKD在人口上可以达到低分类错误。此外,我们还提供了有限无标示样本的抽象 bound。在 semi-supervised learning 中,我们进一步表明了RKD的标签效率,通过一种涵盖cluster-aware semi-supervised learning 框架,假设分类错误低。最后,我们将数据扩展一致性正则化 integrate 到这个框架中,并证明了RKD在global perspective中促进了spectral clustering,而consistency regularization强调了local perspectivevia expansion。
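The paper's framing of RKD as spectral clustering on a teacher-induced population graph can be illustrated with off-the-shelf tools. The sketch below builds an affinity matrix from hypothetical teacher embeddings and clusters it spectrally; it illustrates the framing only, not the paper's analysis or training procedure.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

# Hypothetical teacher embeddings for a pool of unlabeled samples (two blobs).
rng = np.random.default_rng(0)
teacher_feats = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(4, 1, (50, 8))])

# Teacher-induced affinity graph: pairwise similarities between samples.
affinity = rbf_kernel(teacher_feats, gamma=0.5)

# Spectral clustering on that graph; the paper's clustering error would compare
# these assignments against the ground-truth classes.
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
```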

Amortized Variational Inference: When and Why?

  • paper_url: http://arxiv.org/abs/2307.11018
  • repo_url: https://github.com/charlesm93/amortized_vi
  • paper_authors: Charles C. Margossian, David M. Blei
  • for: 这个论文主要针对的是approximate posterior inference的问题,具体来说是用amortized variational inference(A-VI)代替classical factorized variational inference(F-VI)来减轻计算复杂性。
  • methods: 这篇论文使用了A-VI作为一种计算技巧,以便在深度生成模型中加速推理。在这篇论文中,作者研究了A-VI作为一个通用的后退推理方法,并Derive了关于模型和推理函数下的条件,使得A-VI可以达到F-VI的优秀解。
  • results: 作者通过 theoretical analysis和实验研究,证明了在某些模型中,A-VI可以与F-VI的优秀解相匹配,而且在某些情况下,A-VI可以更快 converge than F-VI。此外,作者还发现在某些模型中,A-VI无法达到F-VI的优秀解,即使推理函数具有足够的表达能力。
    Abstract Amortized variational inference (A-VI) is a method for approximating the intractable posterior distributions that arise in probabilistic models. The defining feature of A-VI is that it learns a global inference function that maps each observation to its local latent variable's approximate posterior. This stands in contrast to the more classical factorized (or mean-field) variational inference (F-VI), which directly learns the parameters of the approximating distribution for each latent variable. In deep generative models, A-VI is used as a computational trick to speed up inference for local latent variables. In this paper, we study A-VI as a general alternative to F-VI for approximate posterior inference. A-VI cannot produce an approximation with a lower Kullback-Leibler divergence than F-VI's optimal solution, because the amortized family is a subset of the factorized family. Thus a central theoretical problem is to characterize when A-VI still attains F-VI's optimal solution. We derive conditions on both the model and the inference function under which A-VI can theoretically achieve F-VI's optimum. We show that for a broad class of hierarchical models, including deep generative models, it is possible to close the gap between A-VI and F-VI. Further, for an even broader class of models, we establish when and how to expand the domain of the inference function to make amortization a feasible strategy. Finally, we prove that for certain models -- including hidden Markov models and Gaussian processes -- A-VI cannot match F-VI's solution, no matter how expressive the inference function is. We also study A-VI empirically. On several examples, we corroborate our theoretical results and investigate the performance of A-VI when varying the complexity of the inference function. When the gap between A-VI and F-VI can be closed, we find that the required complexity of the function need not scale with the number of observations, and that A-VI often converges faster than F-VI.
    摘要 《总结:Amortized Variational Inference的潜在优势和局限性》Amortized Variational Inference(A-VI)是一种用于估计不可计算的 posterior 分布的方法,它在概率模型中出现的 posterior 分布中进行估计。A-VI 的特点在于它学习一个全局的推理函数,该函数可以将每个观察数据映射到其相应的本地隐藏变量的近似 posterior。与传统的分解(或均值场)变量假设(F-VI)相比,A-VI 直接学习每个隐藏变量的参数。在深度生成模型中,A-VI 被用作一种计算技巧,以加速地域性 latent 变量的推理。在这篇论文中,我们研究 A-VI 作为一种通用的近似 posterior 推理方法。由于 A-VI 不能生成一个低于 Kullback-Leibler 差分的近似,因为权重插值家族是 F-VI 的一个子集。因此,A-VI 的主要理论问题是Characterizing situations in which A-VI can attain F-VI's optimal solution。我们 derive conditions on both the model and the inference function under which A-VI can theoretically achieve F-VI's optimum。我们证明,对于一类层次模型,包括深度生成模型,可以减小 A-VI 和 F-VI 之间的差异。此外,对于另一类模型,我们确定了如何扩展推理函数的域,以使权重插值成为可能的策略。最后,我们证明,对于某些模型,例如隐藏 Markov 模型和 Gaussian 过程,A-VI 无法与 F-VI 的解决方案匹配,不管推理函数如何表达。我们还进行了 A-VI 的实验研究。在一些示例中,我们证明了我们的理论结果,并investigated A-VI 在观察数据的变化情况下的性能。当 A-VI 和 F-VI 之间的差异可以减小时,我们发现推理函数的复杂度不必随观察数据的数量增长,而且 A-VI 通常更快 convergence than F-VI。
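The contrast between F-VI and A-VI can be made concrete on a toy Gaussian model (prior z_i ~ N(0,1), likelihood y_i ~ N(z_i,1)). The sketch below optimizes one free variational parameter pair per observation for F-VI and a shared inference network for A-VI against the same Monte-Carlo ELBO; the model, network size, and learning rates are arbitrary choices for illustration, not the paper's experimental setup.

```python
import torch

torch.manual_seed(0)
n = 200
z_true = torch.randn(n)            # local latent variables
y = z_true + torch.randn(n)        # observations, likelihood N(z, 1)

def neg_elbo(mu, log_sigma, y):
    """Single-sample negative ELBO for a N(mu, sigma^2) variational posterior."""
    sigma = log_sigma.exp()
    z = mu + sigma * torch.randn_like(mu)          # reparameterised sample
    log_lik = -0.5 * (y - z) ** 2
    log_prior = -0.5 * z ** 2
    log_q = -0.5 * ((z - mu) / sigma) ** 2 - log_sigma
    return -(log_lik + log_prior - log_q).mean()

# F-VI: one free (mu, log_sigma) pair per observation.
mu_f = torch.zeros(n, requires_grad=True)
ls_f = torch.zeros(n, requires_grad=True)
opt_f = torch.optim.Adam([mu_f, ls_f], lr=0.05)

# A-VI: a shared inference network maps each y_i to its variational parameters.
enc = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(),
                          torch.nn.Linear(16, 2))
opt_a = torch.optim.Adam(enc.parameters(), lr=0.01)

for _ in range(2000):
    opt_f.zero_grad(); loss_f = neg_elbo(mu_f, ls_f, y); loss_f.backward(); opt_f.step()
    opt_a.zero_grad()
    out = enc(y.unsqueeze(1))
    loss_a = neg_elbo(out[:, 0], out[:, 1], y)
    loss_a.backward(); opt_a.step()

print(loss_f.item(), loss_a.item())   # compare the two objectives after training
```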

Multi-objective point cloud autoencoders for explainable myocardial infarction prediction

  • paper_url: http://arxiv.org/abs/2307.11017
  • repo_url: None
  • paper_authors: Marcel Beetz, Abhirup Banerjee, Vicente Grau
  • for: 这个研究旨在开发一种基于多对象点云自适应神经网络的可解释性心血栓损预测方法,以提高心血栓损的诊断精度。
  • methods: 该方法基于多类3D点云表示心脏 анатомия和功能的多对象点云自适应神经网络,其架构包括多个任务特定分支连接到一个低维 latent space,以实现多对象学习 both 重建和心血栓损预测,同时捕捉疾病特定的3D形态信息在可读取的 latent space 中。
  • results: 在一个大型 UK Biobank 数据集上,这种方法能够准确重建多时间序3D形态,Chamfer 距离输入形态下的误差在图像 pixel 分辨率以下,并在多种机器学习和深度学习标准模型之上提高了19%的incident MI 预测精度。此外,这种方法的任务特定紧凑的 latent space 可以清晰地分离控制和 MI 群集,并与相应的3D形态之间存在 клиниче可能的关联,因此证明了预测的可解释性。
    Abstract Myocardial infarction (MI) is one of the most common causes of death in the world. Image-based biomarkers commonly used in the clinic, such as ejection fraction, fail to capture more complex patterns in the heart's 3D anatomy and thus limit diagnostic accuracy. In this work, we present the multi-objective point cloud autoencoder as a novel geometric deep learning approach for explainable infarction prediction, based on multi-class 3D point cloud representations of cardiac anatomy and function. Its architecture consists of multiple task-specific branches connected by a low-dimensional latent space to allow for effective multi-objective learning of both reconstruction and MI prediction, while capturing pathology-specific 3D shape information in an interpretable latent space. Furthermore, its hierarchical branch design with point cloud-based deep learning operations enables efficient multi-scale feature learning directly on high-resolution anatomy point clouds. In our experiments on a large UK Biobank dataset, the multi-objective point cloud autoencoder is able to accurately reconstruct multi-temporal 3D shapes with Chamfer distances between predicted and input anatomies below the underlying images' pixel resolution. Our method outperforms multiple machine learning and deep learning benchmarks for the task of incident MI prediction by 19% in terms of Area Under the Receiver Operating Characteristic curve. In addition, its task-specific compact latent space exhibits easily separable control and MI clusters with clinically plausible associations between subject encodings and corresponding 3D shapes, thus demonstrating the explainability of the prediction.
    摘要 心肺infarction (MI) 是全球最常见的死亡原因之一。通常使用在临床中的图像基于标记器,如舒缩率,无法捕捉心肺三维形态中更复杂的模式,因此限制诊断准确性。在这种工作中,我们介绍了一种多目标点云自适应神经网络,作为解释性infarction预测的新的几何深度学方法,基于多类3D点云表示心肺 анатоми和功能。其架构包括多个任务特定分支,连接了一个低维ensional的秘密空间,以实现有效的多目标学习重建和infarction预测,同时捕捉疾病特定的3D形状信息。此外,其层次分支设计和点云深度运算使得高级别特征学习可以直接进行高分辨率的生物Point cloud。在我们对大型UK Biobank数据集的实验中,多目标点云自适应神经网络能够准确重建多时间点云形态,Chamfer距离输入和预测的形态下限于图像的像素分辨率。我们的方法在incident MI预测任务上超过多种机器学习和深度学习参考值19%,在接收操作特征曲线的面积下测试 Area Under the Receiver Operating Characteristic curve。此外,任务特定的秘密空间表现出易分割控制和MI团群,并且与相应的3D形状之间存在临床可能的相关性,因此证明预测的可解释性。
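The abstract reports reconstruction quality via the Chamfer distance between predicted and input anatomies. A minimal PyTorch version of that metric (not the authors' code) is sketched below.

```python
import torch

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3):
    mean nearest-neighbour squared distance in both directions."""
    d = torch.cdist(p, q) ** 2             # (N, M) pairwise squared distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

# toy usage with a random "anatomy" and a slightly perturbed reconstruction
p = torch.rand(1024, 3)
q = p + 0.01 * torch.randn_like(p)
print(chamfer_distance(p, q).item())
```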

Flow Map Learning for Unknown Dynamical Systems: Overview, Implementation, and Benchmarks

  • paper_url: http://arxiv.org/abs/2307.11013
  • repo_url: None
  • paper_authors: Victor Churchill, Dongbin Xiu
  • for: 这 paper 是关于 Flow map learning (FML) 和深度神经网络 (DNNs) 如何用于数据驱动模型未知动力系统的研究。
  • methods: 这 paper 使用 FML 框架,并提供了实现这种方法的重要计算细节。
  • results: 这 paper 提供了一组定义良好的 benchmark 问题,以及这些问题的 FML 结果,以便其他研究人员可以进行交互式检验和结果重现。
    Abstract Flow map learning (FML), in conjunction with deep neural networks (DNNs), has shown promises for data driven modeling of unknown dynamical systems. A remarkable feature of FML is that it is capable of producing accurate predictive models for partially observed systems, even when their exact mathematical models do not exist. In this paper, we present an overview of the FML framework, along with the important computational details for its successful implementation. We also present a set of well defined benchmark problems for learning unknown dynamical systems. All the numerical details of these problems are presented, along with their FML results, to ensure that the problems are accessible for cross-examination and the results are reproducible.
    摘要 流图学习(FML),与深度神经网络(DNN)结合,已经表现出模拟未知动力系统的数据驱动模型的搭配性。FML有能力生成准确预测模型,即使系统的准确数学模型不存在。在这篇论文中,我们提供FML框架的概述,以及成功实施的计算细节。我们还提供一组已定义的benchmark问题,用于学习未知动力系统。这些问题的数字细节和FML结果都提供,以便让问题可以进行交叉检验,并且结果可以重新制作。
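A toy illustration of flow map learning, assuming nothing about the paper's benchmark problems: generate trajectories of a damped oscillator with a fine integrator, fit a small residual MLP as the coarse-step flow map x_{t+Δ} ≈ x_t + N(x_t), and roll it forward for prediction.

```python
import numpy as np
import torch

# Simulate x'' = -x - 0.1 x' with a fine Euler step, then subsample coarse pairs.
def simulate(x0, v0, steps, h=1e-3):
    xs = [(x0, v0)]
    for _ in range(steps):
        x, v = xs[-1]
        xs.append((x + h * v, v + h * (-x - 0.1 * v)))
    return np.array(xs)

dt_sub = 100                                   # coarse step = 0.1
pairs = []
rng = np.random.default_rng(0)
for _ in range(50):
    traj = simulate(rng.uniform(-1, 1), rng.uniform(-1, 1), 2000)[::dt_sub]
    pairs += list(zip(traj[:-1], traj[1:]))
X = torch.tensor(np.array([a for a, _ in pairs]), dtype=torch.float32)
Y = torch.tensor(np.array([b for _, b in pairs]), dtype=torch.float32)

# The learned flow map: a residual MLP predicting the state one coarse step ahead.
net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(3000):
    opt.zero_grad()
    loss = ((X + net(X) - Y) ** 2).mean()      # residual form x_{t+dt} = x_t + N(x_t)
    loss.backward(); opt.step()

# Roll the learned flow map forward from a new initial condition.
state = torch.tensor([[0.5, 0.0]])
for _ in range(50):
    state = state + net(state)
```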

Neuron Sensitivity Guided Test Case Selection for Deep Learning Testing

  • paper_url: http://arxiv.org/abs/2307.11011
  • repo_url: https://github.com/ase2023paper/nss
  • paper_authors: Dong Huang, Qingwen Bu, Yichao Fu, Yuhao Qing, Bocheng Xiao, Heming Cui
  • for: 这篇论文的目的是解释如何透过NSS(Neural Network Sensitivity guided test case Selection)方法来快速检测深度神经网络(DNN)模型中的错误行为,并且将其修复。
  • methods: 这篇论文使用了NSS方法,它利用了内部神经元的信息来选择有价值的测试案例,以提高测试效率和错误检测率。
  • results: 根据论文的结果,NSS方法可以实现高度的错误检测率,比如在MNIST & LeNet1实验中,从5%的测试案例中,NSS可以获得81.8%的错误检测率,较baseline方法高出20%。
    Abstract Deep Neural Networks~(DNNs) have been widely deployed in software to address various tasks~(e.g., autonomous driving, medical diagnosis). However, they could also produce incorrect behaviors that result in financial losses and even threaten human safety. To reveal the incorrect behaviors in DNN and repair them, DNN developers often collect rich unlabeled datasets from the natural world and label them to test the DNN models. However, properly labeling a large number of unlabeled datasets is a highly expensive and time-consuming task. To address the above-mentioned problem, we propose NSS, Neuron Sensitivity guided test case Selection, which can reduce the labeling time by selecting valuable test cases from unlabeled datasets. NSS leverages the internal neuron's information induced by test cases to select valuable test cases, which have high confidence in causing the model to behave incorrectly. We evaluate NSS with four widely used datasets and four well-designed DNN models compared to SOTA baseline methods. The results show that NSS performs well in assessing the test cases' probability of fault triggering and model improvement capabilities. Specifically, compared with baseline approaches, NSS obtains a higher fault detection rate~(e.g., when selecting 5\% test case from the unlabeled dataset in MNIST \& LeNet1 experiment, NSS can obtain 81.8\% fault detection rate, 20\% higher than baselines).
    摘要 深度神经网络(DNN)在软件中广泛应用,用于解决多种任务(例如自动驾驶和医疗诊断)。然而,它们也可能产生错误行为,导致金钱损失和对人类安全的威胁。为了揭示DNN中的错误行为并修复它们,DNN开发者们经常收集自然世界中的丰富无标数据集和将其标记以测试DNN模型。然而,为了标记大量无标数据集是非常昂贵和时间消耗的。为解决上述问题,我们提出了NSS,即神经元敏感度导引测试 caso选择。NSS可以将valuable测试 caso从无标数据集中选择出来,以降低标记时间。NSS利用测试 caso对内部神经元的影响来选择有价值的测试 caso,这些测试 caso具有高度触发模型错误的 confidence。我们对四个常用的数据集和四个Well-designed DNN模型进行评估,并与当前最佳方法进行比较。结果表明,NSS在评估测试 caso的可能性和模型改进能力方面表现良好。具体来说,相比基eline方法,NSS在MNIST & LeNet1 experiment中选择5%的测试 caso时可以获得81.8%的错误检测率,高于基eline20%。
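A rough sketch of sensitivity-guided test selection. The score below measures how much a chosen layer's activations move under small random input perturbations and keeps only the highest-scoring fraction of the pool for labelling; it is a generic stand-in for the neuron-sensitivity measure defined in the paper, and the small model and pool are synthetic placeholders.

```python
import torch

def neuron_sensitivity_scores(model, layer, inputs, eps=1e-2, n_probe=4):
    """Per-sample sensitivity proxy: mean absolute change of the chosen layer's
    activations under small random input perturbations (a generic stand-in for
    the paper's neuron-sensitivity measure, not its exact definition)."""
    acts = {}
    handle = layer.register_forward_hook(lambda m, i, o: acts.update(a=o.detach()))
    with torch.no_grad():
        model(inputs)
        base = acts["a"].flatten(1)
        scores = torch.zeros(len(inputs))
        for _ in range(n_probe):
            model(inputs + eps * torch.randn_like(inputs))
            scores += (acts["a"].flatten(1) - base).abs().mean(dim=1)
    handle.remove()
    return scores / n_probe

# toy usage: rank an unlabeled pool and label only the top 5% most sensitive inputs
model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(), torch.nn.Linear(32, 10))
pool = torch.randn(400, 20)
scores = neuron_sensitivity_scores(model, model[1], pool)   # hook the ReLU layer
selected = scores.topk(int(0.05 * len(pool))).indices
```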

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization

  • paper_url: http://arxiv.org/abs/2307.11007
  • repo_url: None
  • paper_authors: Kaiyue Wen, Zhiyuan Li, Tengyu Ma
  • for: 这种研究旨在解释为什么过参数神经网络可以泛化?
  • methods: 该研究使用了理论和实验方法,对两层ReLU网络进行了研究,并identified以下三种情况:(1)平坦性确实导致泛化;(2)存在不泛化最平坦的模型和锋尖最小化算法,但它们并不泛化;(3)有些不泛化最平坦的模型,但锋尖最小化算法仍然泛化。
  • results: 研究结果表明,锋尖和泛化之间的关系不仅取决于数据分布和模型结构,而且锋尖最小化算法不仅减少锋尖度来获得更好的泛化性。这说明,过参数神经网络的泛化仍然需要寻找更多的解释。
    Abstract Despite extensive studies, the underlying reason as to why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a natural potential explanation is that flatness implies generalization. This work critically examines this explanation. Through theoretical and empirical investigation, we identify the following three scenarios for two-layer ReLU networks: (1) flatness provably implies generalization; (2) there exist non-generalizing flattest models and sharpness minimization algorithms fail to generalize, and (3) perhaps most surprisingly, there exist non-generalizing flattest models, but sharpness minimization algorithms still generalize. Our results suggest that the relationship between sharpness and generalization subtly depends on the data distributions and the model architectures and sharpness minimization algorithms do not only minimize sharpness to achieve better generalization. This calls for the search for other explanations for the generalization of over-parameterized neural networks.
    摘要 尽管有广泛的研究,底层原因为为争 parameterized neural networks 能够泛化仍然未知。现有理论表明,通用的随机优化器偏好训练损失函数的平坦 minimizers,因此一个自然的可能解释是平坦性implicit generalization。这项工作 Critically examines this explanation。通过理论和实验调查,我们确定了以下三个场景:(1)平坦性可以证明地导致泛化;(2)存在不泛化的最平坦模型和锋尖优化算法失败泛化,以及(3)最有趣的是,存在不泛化的最平坦模型,但锋尖优化算法仍然泛化。我们的结果表明,泛化与锋尖之间存在复杂的相互关系,并且锋尖优化算法不仅仅是为了减少锋尖来实现更好的泛化。这些结果呼吁搜索其他解释泛化过 parameterized neural networks 的机制。
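One common way to quantify the sharpness discussed in the abstract is the worst-case loss increase within a small parameter ball, estimated with a single SAM-style ascent step. The sketch below computes that proxy for a two-layer ReLU network on synthetic data; the paper's precise definitions of flatness may differ.

```python
import torch

torch.manual_seed(0)
X = torch.randn(256, 10)
y = (X[:, 0] > 0).long()

# Two-layer ReLU network, matching the family studied in the paper's setting.
model = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
loss_fn = torch.nn.CrossEntropyLoss()

def sharpness_proxy(model, rho=0.05):
    """One-step estimate of max_{||eps|| <= rho} L(w + eps) - L(w): take an ascent
    step of size rho along the normalized gradient and measure the loss gap."""
    loss = loss_fn(model(X), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(rho * g / (norm + 1e-12))
        perturbed = loss_fn(model(X), y)
        for p, g in zip(model.parameters(), grads):   # undo the perturbation
            p.sub_(rho * g / (norm + 1e-12))
    return (perturbed - loss).item()

print(sharpness_proxy(model))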

Private Federated Learning with Autotuned Compression

  • paper_url: http://arxiv.org/abs/2307.10999
  • repo_url: https://github.com/google-research/federated
  • paper_authors: Enayat Ullah, Christopher A. Choquette-Choo, Peter Kairouz, Sewoong Oh
  • for: 降低聚合学习中的通信量,不需要设置或调整压缩率。
  • methods: 使用安全聚合和差分隐私来保证隐私,并在训练过程中自动调整压缩率,以适应训练问题的“困难程度”。
  • results: 在实际数据上实现了不需要调整的压缩率,并且可以保证隐私。
    Abstract We propose new techniques for reducing communication in private federated learning without the need for setting or tuning compression rates. Our on-the-fly methods automatically adjust the compression rate based on the error induced during training, while maintaining provable privacy guarantees through the use of secure aggregation and differential privacy. Our techniques are provably instance-optimal for mean estimation, meaning that they can adapt to the ``hardness of the problem" with minimal interactivity. We demonstrate the effectiveness of our approach on real-world datasets by achieving favorable compression rates without the need for tuning.
    摘要 我们提出了一些新的技术来减少联盟学习中的通信量,不需要设置或调整压缩率。我们的在线方法会自动根据训练过程中的错误来调整压缩率,同时保持可信数据隐藏和分散隐藏的保证。我们的技术可以适应问题的“困难程度”,并具有最小化互动性。我们在实际应用中获得了有利的压缩率,无需调整。

DREAM: Domain-free Reverse Engineering Attributes of Black-box Model

  • paper_url: http://arxiv.org/abs/2307.10997
  • repo_url: None
  • paper_authors: Rongqing Li, Jiaqi Yu, Changsheng Li, Wenhan Luo, Ye Yuan, Guoren Wang
  • for: 这篇论文的目的是研究一种不知道黑盒模型训练数据的情况下,可以透过一系列的询问来暴露黑盒模型的特性。
  • methods: 这篇论文提出了一个基于偏出现象(out-of-distribution,OOD)扩展的框架,通过这个框架可以将黑盒模型的特性们逆向推断出来,不需要知道黑盒模型的训练数据。
  • results: 实验结果显示,该方法比基于基线模型的方法更有优势,并且在不同的领域中均有优秀的测试成绩。
    Abstract Deep learning models are usually black boxes when deployed on machine learning platforms. Prior works have shown that the attributes ($e.g.$, the number of convolutional layers) of a target black-box neural network can be exposed through a sequence of queries. There is a crucial limitation: these works assume the dataset used for training the target model to be known beforehand and leverage this dataset for model attribute attack. However, it is difficult to access the training dataset of the target black-box model in reality. Therefore, whether the attributes of a target black-box model could be still revealed in this case is doubtful. In this paper, we investigate a new problem of Domain-agnostic Reverse Engineering the Attributes of a black-box target Model, called DREAM, without requiring the availability of the target model's training dataset, and put forward a general and principled framework by casting this problem as an out of distribution (OOD) generalization problem. In this way, we can learn a domain-agnostic model to inversely infer the attributes of a target black-box model with unknown training data. This makes our method one of the kinds that can gracefully apply to an arbitrary domain for model attribute reverse engineering with strong generalization ability. Extensive experimental studies are conducted and the results validate the superiority of our proposed method over the baselines.
    摘要 深度学习模型在机器学习平台上通常是黑obox。先前的研究表明,目标黑obox神经网络的特征(例如,卷积层的数量)可以通过一系列查询暴露出来。然而,这些研究假设了目标模型的训练集已经知道,并利用这个训练集进行模型特征攻击。然而,在实际场景中,目标模型的训练集往往难以获取。因此,目标模型的特征是否可以在这种情况下暴露出来的是有很大的uncertainty。在这篇论文中,我们调查了一个新的问题:针对黑obox目标模型的领域无关特征探测,称为DREAM(黑obox模型特征探测)。我们不需要知道目标模型的训练集,并且提出了一种普适和原则性的框架,将这个问题转化为对外围(OOD)泛化问题。因此,我们可以学习一个领域无关的模型,以倒计时探测黑obox目标模型的特征。这使我们的方法成为可以适应任意领域的模型特征探测方法之一,并且具有强大的泛化能力。我们进行了广泛的实验研究,结果证明了我们的提议方法的优越性。

Progressive distillation diffusion for raw music generation

  • paper_url: http://arxiv.org/abs/2307.10994
  • repo_url: None
  • paper_authors: Svetlana Pavlova
  • for: 这个论文旨在应用新的深度学习方法来生成原始的音频文件。它基于扩散模型,这是一种最近受到计算机视觉社区关注的深度生成模型。
  • methods: 这个论文使用进步的扩散Diffusion模型,并使用1D U-Net进行逐步减少。然后,对不同扩散参数的比较和其全局效果的展示。
  • results: 这个论文通过对不同自收集的数据进行实验,实现了无条件生成音频的进程。模型能够处理进程audio处理和生成,并使用变换从1个通道128x384到3个通道128x128 mel-spectrograms进行循环生成。
    Abstract This paper aims to apply a new deep learning approach to the task of generating raw audio files. It is based on diffusion models, a recent type of deep generative model. This new type of method has recently shown outstanding results with image generation. A lot of focus has been given to those models by the computer vision community. On the other hand, really few have been given for other types of applications such as music generation in waveform domain. In this paper the model for unconditional generating applied to music is implemented: Progressive distillation diffusion with 1D U-Net. Then, a comparison of different parameters of diffusion and their value in a full result is presented. One big advantage of the methods implemented through this work is the fact that the model is able to deal with progressing audio processing and generating , using transformation from 1-channel 128 x 384 to 3-channel 128 x 128 mel-spectrograms and looped generation. The empirical comparisons are realized across different self-collected datasets.
    摘要 这篇论文目标是应用一种新的深度学习方法来生成原始音频文件。它基于扩散模型,一种最近的深度生成模型。这种新的方法在图像生成领域有出色的成绩, Computer Vision 社区对其具有很大的关注。然而,对其他应用领域,如音乐生成,实际上很少有研究。在这篇论文中,我们实现了不可Conditional的生成模型:进程散布扩散 with 1D U-Net。然后,我们对不同扩散参数的影响进行了比较,并对全面结果进行了评价。这种方法的一大优点是它可以处理进行 Audio 处理和生成,从1个通道的 128 x 384 转换为3个通道的 128 x 128 mel-spectrograms,并实现循环生成。我们在不同自己收集的数据集上进行了实际比较。
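The results mention reshaping 1-channel 128x384 mel-spectrograms into 3-channel 128x128 inputs for looped generation. One plausible reading of that transformation (splitting the time axis into three consecutive segments) is sketched below with librosa; the synthetic sine clip and the hop length are placeholders, and the paper may map channels differently.

```python
import numpy as np
import librosa

# Synthesize a clip in place of a real raw-audio file.
sr = 22050
t = np.linspace(0, 9.0, int(9.0 * sr), endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440 * t)

# 128-bin mel-spectrogram; hop_length chosen here only to land near 384 frames.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, hop_length=517)
mel_db = librosa.power_to_db(mel)[:, :384]                  # (128, 384), 1 "channel"

# Reshape the 1 x 128 x 384 representation into 3 x 128 x 128: each channel is
# a consecutive 128-frame time segment (one possible interpretation).
mel_3ch = mel_db.reshape(128, 3, 128).transpose(1, 0, 2)     # (3, 128, 128)
```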

LLM Cognitive Judgements Differ From Human

  • paper_url: http://arxiv.org/abs/2307.11787
  • repo_url: https://github.com/sotlampr/llm-cognitive-judgements
  • paper_authors: Sotiris Lamprinidis
  • for: 研究大语言模型(LLMs)的认知能力
  • methods: 使用GPT-3和ChatGPT模型完成有限数据 inductive reasoning任务
  • results: GPT-3和ChatGPT模型的认知判断不类似于人类的认知
    Abstract Large Language Models (LLMs) have lately been on the spotlight of researchers, businesses, and consumers alike. While the linguistic capabilities of such models have been studied extensively, there is growing interest in investigating them as cognitive subjects. In the present work I examine GPT-3 and ChatGPT capabilities on an limited-data inductive reasoning task from the cognitive science literature. The results suggest that these models' cognitive judgements are not human-like.
    摘要 大型语言模型(LLM)在研究人员、企业和消费者之间备受关注。虽然这些模型的语言能力已经得到了广泛的研究,但是有越来越多的人想研究它们作为认知主体。在 presente 作品中,我 исследова了 GPT-3 和 ChatGPT 在有限数据 inductive reasoning 任务上的能力。结果表明这些模型的认知判断不如人类的。

Investigating minimizing the training set fill distance in machine learning regression

  • paper_url: http://arxiv.org/abs/2307.10988
  • repo_url: None
  • paper_authors: Paolo Climaco, Jochen Garcke
  • for: 这篇论文的目的是提出一种采样方法,以最小化预测误差。
  • methods: 本文使用了一种基于采样的方法,通过从大量的未标注数据中采样小型训练集,以提高模型性能而降低计算成本。
  • results: 实验结果显示,这种采样方法可以对多种回归模型进行最佳化,并且较以往的采样方法有着明显的优势。
    Abstract Many machine learning regression methods leverage large datasets for training predictive models. However, using large datasets may not be feasible due to computational limitations or high labelling costs. Therefore, sampling small training sets from large pools of unlabelled data points is essential to maximize model performance while maintaining computational efficiency. In this work, we study a sampling approach aimed to minimize the fill distance of the selected set. We derive an upper bound for the maximum expected prediction error that linearly depends on the training set fill distance, conditional to the knowledge of data features. For empirical validation, we perform experiments using two regression models on two datasets. We empirically show that selecting a training set by aiming to minimize the fill distance, thereby minimizing the bound, significantly reduces the maximum prediction error of various regression models, outperforming existing sampling approaches by a large margin.
    摘要 很多机器学习回归方法利用大量数据进行训练预测模型。然而,使用大量数据可能不是可行的,因为计算限制或高标注成本。因此,从大量未标注数据点中采样小训练集是必要的,以最大化模型性能而减少计算成本。在这种情况下,我们研究了一种采样方法,以最小化选择集的填距。我们 derivates了一个上界,用于预测误差的最大值,其 conditional 于数据特征的知识。为了Empirical验证,我们在两个回归模型上进行了两个数据集的实验。我们发现,通过选择填距最小的训练集,最大化约束,并最小化 bound,可以大幅降低不同回归模型的最大预测误差,超过现有采样方法的表现。
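Greedy farthest-point selection is a standard way to drive down the fill distance of a training set; the sketch below applies it to a hypothetical pool of unlabeled feature vectors. The paper's exact selection algorithm may differ, but the quantity being reduced is the same.

```python
import numpy as np

def farthest_point_selection(X, m, seed=0):
    """Greedily pick m training points that approximately minimize the fill
    distance max_x min_{s in S} ||x - s|| over the pool X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    selected = [rng.integers(len(X))]
    dists = np.linalg.norm(X - X[selected[0]], axis=1)
    for _ in range(m - 1):
        nxt = int(dists.argmax())                # point farthest from current set
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(selected), dists.max()       # indices and resulting fill distance

pool = np.random.default_rng(1).uniform(size=(5000, 3))   # unlabeled feature pool
idx, fill = farthest_point_selection(pool, 100)
```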

MASR: Metadata Aware Speech Representation

  • paper_url: http://arxiv.org/abs/2307.10982
  • repo_url: None
  • paper_authors: Anjali Raj, Shikhar Bharadwaj, Sriram Ganapathy, Min Ma, Shikhar Vashishth
  • for: 提高演示语言模型的性能,使用多种外部知识来增强 metadata 信息的利用
  • methods: 提出了Metadata Aware Speech Representation learning(MASR)框架,可以结合任何 SSL 方法,并使用样本级对比性 Matrix 来增强表示
  • results: 在语言识别、语音识别和其他非语义任务中,MASR 表示具有显著的性能提升,并进行了详细的语言识别任务分析,以提供表示增强的原因
    Abstract In the recent years, speech representation learning is constructed primarily as a self-supervised learning (SSL) task, using the raw audio signal alone, while ignoring the side-information that is often available for a given speech recording. In this paper, we propose MASR, a Metadata Aware Speech Representation learning framework, which addresses the aforementioned limitations. MASR enables the inclusion of multiple external knowledge sources to enhance the utilization of meta-data information. The external knowledge sources are incorporated in the form of sample-level pair-wise similarity matrices that are useful in a hard-mining loss. A key advantage of the MASR framework is that it can be combined with any choice of SSL method. Using MASR representations, we perform evaluations on several downstream tasks such as language identification, speech recognition and other non-semantic tasks such as speaker and emotion recognition. In these experiments, we illustrate significant performance improvements for the MASR over other established benchmarks. We perform a detailed analysis on the language identification task to provide insights on how the proposed loss function enables the representations to separate closely related languages.

PATROL: Privacy-Oriented Pruning for Collaborative Inference Against Model Inversion Attacks

  • paper_url: http://arxiv.org/abs/2307.10981
  • repo_url: None
  • paper_authors: Shiwei Ding, Lan Zhang, Miao Pan, Xiaoyong Yuan
  • for: 提高边缘设备的隐私和安全性,推动可靠的分布式推理
  • methods: 利用隐私导向的剪辑技术,在边缘设备上部署更多层,以提高任务特定的特征提取和隐私保护
  • results: 提高了隐私和安全性,同时保持了可靠的分布式推理性能
    Abstract Collaborative inference has been a promising solution to enable resource-constrained edge devices to perform inference using state-of-the-art deep neural networks (DNNs). In collaborative inference, the edge device first feeds the input to a partial DNN locally and then uploads the intermediate result to the cloud to complete the inference. However, recent research indicates model inversion attacks (MIAs) can reconstruct input data from intermediate results, posing serious privacy concerns for collaborative inference. Existing perturbation and cryptography techniques are inefficient and unreliable in defending against MIAs while performing accurate inference. This paper provides a viable solution, named PATROL, which develops privacy-oriented pruning to balance privacy, efficiency, and utility of collaborative inference. PATROL takes advantage of the fact that later layers in a DNN can extract more task-specific features. Given limited local resources for collaborative inference, PATROL intends to deploy more layers at the edge based on pruning techniques to enforce task-specific features for inference and reduce task-irrelevant but sensitive features for privacy preservation. To achieve privacy-oriented pruning, PATROL introduces two key components: Lipschitz regularization and adversarial reconstruction training, which increase the reconstruction errors by reducing the stability of MIAs and enhance the target inference model by adversarial training, respectively.
    摘要 协同推理已经是一个有前途的解决方案,使得具有限制的边缘设备可以使用当前最佳深度学习模型(DNN)进行推理。在协同推理中,边缘设备首先将输入feed到本地部分DNN中,然后将中间结果上传到云端以完成推理。然而,最近的研究表明,模型反向攻击(MIA)可以从中间结果中重construct输入数据,对协同推理 pose serious privacy concerns。现有的干扰和加密技术是不可靠和不fficient的,无法防止 MIA 的攻击。这篇论文提供了一个可行的解决方案,名为PATROL,它通过在隐私、效率和可用性之间平衡privacy-oriented pruning来解决这个问题。PATROL利用了DNN的后 layers可以更好地提取任务特定的特征。在边缘设备具有限制的协同推理资源的情况下,PATROL将更多的层部署在边缘基础上,使用剪辑技术来强制实施任务特定的特征 для推理,同时减少任务无关但敏感的特征来保护隐私。为了实现隐私 oriented pruning,PATROL引入了两个关键 ком ponent:Lipschitz regularization和对抗重建训练。这两个 ком ponent 会增加MIA的重建错误,从而提高目标推理模型的性能,同时增强隐私保护。

Globally Normalising the Transducer for Streaming Speech Recognition

  • paper_url: http://arxiv.org/abs/2307.10975
  • repo_url: None
  • paper_authors: Rogier van Dalen
  • for: 这篇论文主要是关于如何解决流行式模型中的一个数学问题,以提高流行式模型在语音识别任务中的表现。
  • methods: 该论文提出了一种将全球 нормализация应用于流行式模型中,以解决流行式模型在部分输入序列时的数学问题。
  • results: 根据实验结果,在应用全球 нормализаation后,流行式模型的词错率下降了9-11%相对,相对于lookahead模式,流行式模式的表现减少了约一半的差距。
    Abstract The Transducer (e.g. RNN-Transducer or Conformer-Transducer) generates an output label sequence as it traverses the input sequence. It is straightforward to use in streaming mode, where it generates partial hypotheses before the complete input has been seen. This makes it popular in speech recognition. However, in streaming mode the Transducer has a mathematical flaw which, simply put, restricts the model's ability to change its mind. The fix is to replace local normalisation (e.g. a softmax) with global normalisation, but then the loss function becomes impossible to evaluate exactly. A recent paper proposes to solve this by approximating the model, severely degrading performance. Instead, this paper proposes to approximate the loss function, allowing global normalisation to apply to a state-of-the-art streaming model. Global normalisation reduces its word error rate by 9-11% relative, closing almost half the gap between streaming and lookahead mode.
    摘要 转换器(例如 RNN-转换器或 Conformer-转换器)在遍历输入序列时生成输出标签序列。它很容易用于流式模式,即在完整输入尚未全部接收之前就生成部分假设,因此在语音识别中很受欢迎。然而,在流式模式下,转换器存在一个数学缺陷,简单来说,限制了模型改变自己想法的能力。解决办法是用全局归一化取代局部归一化(例如 softmax),但这样损失函数就无法精确计算。一篇最近的论文提议通过近似模型来解决,但会严重降低性能。本论文则提议近似损失函数,使全局归一化可以应用于最先进的流式模型,将词错误率相对降低 9-11%,缩小了流式模式与 lookahead 模式之间近一半的差距。
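The local-versus-global normalization distinction can be seen on a toy tagging problem with history-dependent scores: a locally normalized model applies a softmax at every step, whereas a globally normalized one uses a single partition function over complete label sequences (enumerable only for tiny label spaces; the paper instead approximates the loss). The sketch below is purely illustrative and does not model transducer alignment lattices.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
T, V = 3, 2
# Toy transition scores s[t, prev, y]: score of label y at step t given the
# previous label (prev = 0 is used for the first step).
s = rng.normal(size=(T, V, V))

def seq_score(seq):
    prev, total = 0, 0.0
    for t, y in enumerate(seq):
        total += s[t, prev, y]
        prev = y
    return total

seqs = list(itertools.product(range(V), repeat=T))

# Locally normalised model: a softmax over labels at every step.
def p_local(seq):
    prev, p = 0, 1.0
    for t, y in enumerate(seq):
        p *= np.exp(s[t, prev, y]) / np.exp(s[t, prev]).sum()
        prev = y
    return p

# Globally normalised model: one partition function over complete sequences.
Z = sum(np.exp(seq_score(q)) for q in seqs)
p_global = {q: np.exp(seq_score(q)) / Z for q in seqs}

print(sum(p_local(q) for q in seqs))                       # both distributions sum to 1
print(max(abs(p_local(q) - p_global[q]) for q in seqs))    # but they generally differ
```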

eess.IV - 2023-07-21

Deep Reinforcement Learning Based System for Intraoperative Hyperspectral Video Autofocusing

  • paper_url: http://arxiv.org/abs/2307.11638
  • repo_url: None
  • paper_authors: Charlie Budd, Jianrong Qiu, Oscar MacCormac, Martin Huber, Christopher Mower, Mirek Janatka, Théo Trotouin, Jonathan Shapey, Mads S. Bergholt, Tom Vercauteren
  • for: 本研究旨在开发一种可靠、快速的术中高光谱成像(HSI)自动对焦系统,以便在需要精确组织区分的手术中实时使用。
  • methods: 该研究将可调焦液体镜头集成到视频高光谱成像外视镜中,并提出了基于深度强化学习的视频自动对焦方法。
  • results: 研究发现,基于深度强化学习的自动对焦方法可以提高高光谱成像系统的对焦精度,与传统对焦策略相比具有统计学显著($p<0.05$)的优势。此外,两名神经外科医生在盲法可用性测试中比较了不同的对焦策略,认为我们的新方法最为满意。
    Abstract Hyperspectral imaging (HSI) captures a greater level of spectral detail than traditional optical imaging, making it a potentially valuable intraoperative tool when precise tissue differentiation is essential. Hardware limitations of current optical systems used for handheld real-time video HSI result in a limited focal depth, thereby posing usability issues for integration of the technology into the operating room. This work integrates a focus-tunable liquid lens into a video HSI exoscope, and proposes novel video autofocusing methods based on deep reinforcement learning. A first-of-its-kind robotic focal-time scan was performed to create a realistic and reproducible testing dataset. We benchmarked our proposed autofocus algorithm against traditional policies, and found our novel approach to perform significantly ($p<0.05$) better than traditional techniques ($0.070\pm.098$ mean absolute focal error compared to $0.146\pm.148$). In addition, we performed a blinded usability trial by having two neurosurgeons compare the system with different autofocus policies, and found our novel approach to be the most favourable, making our system a desirable addition for intraoperative HSI.

Computational Image Formation

  • paper_url: http://arxiv.org/abs/2307.11635
  • repo_url: https://github.com/AnirudhaRamesh/15663-Computational-Photography-Assignment1
  • paper_authors: Stanley H. Chan
  • for: The paper is focused on the concept of computational image formation (CIF) and its applications in imaging through adverse weather conditions.
  • methods: The paper introduces the idea of using an approximate mapping $\mathcal{H}_{\theta}$ to simulate the forward degradation process in imaging, and discusses the attributes of a CIF simulator, including accuracy, speed, well-posedness, and differentiability.
  • results: The paper provides a detailed case study on imaging through atmospheric turbulence using CIF, and discusses other examples of CIF applications. The paper also shares thoughts on the future direction and recommendations for the community.
    Abstract At the pinnacle of computational imaging is the co-optimization of camera and algorithm. This, however, is not the only form of computational imaging. In problems such as imaging through adverse weather, the bigger challenge is how to accurately simulate the forward degradation process so that we can synthesize data to train reconstruction models and/or integrating the forward model as part of the reconstruction algorithm. This article introduces the concept of computational image formation (CIF). Compared to the standard inverse problems where the goal is to recover the latent image $\mathbf{x}$ from the observation $\mathbf{y} = \mathcal{G}(\mathbf{x})$, CIF shifts the focus to designing an approximate mapping $\mathcal{H}_{\theta}$ such that $\mathcal{H}_{\theta} \approx \mathcal{G}$ while giving a better image reconstruction result. The word ``computational'' highlights the fact that the image formation is now replaced by a numerical simulator. While matching nature remains an important goal, CIF pays even greater attention on strategically choosing an $\mathcal{H}_{\theta}$ so that the reconstruction performance is maximized. The goal of this article is to conceptualize the idea of CIF by elaborating on its meaning and implications. The first part of the article is a discussion on the four attributes of a CIF simulator: accurate enough to mimic $\mathcal{G}$, fast enough to be integrated as part of the reconstruction, providing a well-posed inverse problem when plugged into the reconstruction, and differentiable in the backpropagation sense. The second part of the article is a detailed case study based on imaging through atmospheric turbulence. The third part of the article is a collection of other examples that fall into the category of CIF. Finally, thoughts about the future direction and recommendations to the community are shared.
    摘要 在计算成像领域的尽头是合理化相机和算法的共同优化。然而,这并不是唯一的计算成像方式。在如影像受恶征天气等问题中,更大的挑战是如何准确地模拟前向干扰过程,以便可以合成数据来训练重建模型和/或将前向模型纳入重建算法中。这篇文章介绍了计算成像形成(CIF)的概念。与标准的逆问题where the goal is to recover the latent image $\mathbf{x}$ from the observation $\mathbf{y} = \mathcal{G}(\mathbf{x})$不同,CIF将注意力集中在设计一个approximate mapping $\mathcal{H}_{\theta}$,使其 approximate $\mathcal{G}$,同时提供更好的图像重建结果。“计算”一词 highlights the fact that the image formation is now replaced by a numerical simulator。而Matching nature remains an important goal,CIF pays even greater attention on strategically choosing an $\mathcal{H}_{\theta}$ so that the reconstruction performance is maximized。本文的目标是把CIF的概念进行详细说明和分析,包括其意义和影响。文中的第一部分是讨论CIF simulator的四个特征:准确地模拟 $\mathcal{G}$,快速 enough to be integrated as part of the reconstruction,提供一个准确的逆问题when plugged into the reconstruction,以及在反射卷积中可微分。第二部分是基于大气扩散的详细案例研究。第三部分是收集其他符合CIF的例子。最后,文章结束了未来方向和对社区的建议。

Cascaded multitask U-Net using topological loss for vessel segmentation and centerline extraction

  • paper_url: http://arxiv.org/abs/2307.11603
  • repo_url: None
  • paper_authors: Pierre Rougé, Nicolas Passat, Odyssée Merveille
  • for: 这paper的目的是提出一种基于深度学习的血管分割和中心线抽取方法,以提高血管疾病诊断工具的精度。
  • methods: 这paper使用了一种基于U-Net的方法,通过计算分割后的血管skeleton来提高分割结果的准确性。
  • results: 这paper的实验结果表明,使用U-Net计算血管skeleton可以提高分割结果的准确性,并且可以提供更加准确的血管中心线。
    Abstract Vessel segmentation and centerline extraction are two crucial preliminary tasks for many computer-aided diagnosis tools dealing with vascular diseases. Recently, deep-learning based methods have been widely applied to these tasks. However, classic deep-learning approaches struggle to capture the complex geometry and specific topology of vascular networks, which is of the utmost importance in most applications. To overcome these limitations, the clDice loss, a topological loss that focuses on the vessel centerlines, has been recently proposed. This loss requires computing, with a proposed soft-skeleton algorithm, the skeletons of both the ground truth and the predicted segmentation. However, the soft-skeleton algorithm provides suboptimal results on 3D images, which makes the clDice hardly suitable on 3D images. In this paper, we propose to replace the soft-skeleton algorithm by a U-Net which computes the vascular skeleton directly from the segmentation. We show that our method provides more accurate skeletons than the soft-skeleton algorithm. We then build upon this network a cascaded U-Net trained with the clDice loss to embed topological constraints during the segmentation. The resulting model is able to predict both the vessel segmentation and centerlines with a more accurate topology.
    摘要 船 Segmentation 和中心线抽取是许多计算机支持诊断工具处理血管疾病的两项重要前置任务。现在,深度学习基于方法广泛应用于这两项任务。然而, classical deep-learning 方法很难捕捉血管网络的复杂 геометри和特有的 topologic,这在大多数应用中是非常重要的。为了解决这些限制,recently proposed clDice loss 强调血管中心线的 topological loss。这个损失函数需要计算,使用我们提议的 soft-skeleton 算法,真实的血管skeleton 和预测 segmentation 的skeleton。然而,soft-skeleton 算法在 3D 图像上提供了不佳的结果,使 clDice hardly suitable for 3D images。在本文中,我们提议将 soft-skeleton 算法 replaced by a U-Net,从 segmentation 直接计算血管skeleton。我们示出了我们的方法可以提供更加准确的 skeletons。然后,我们在这个网络上建立了一个 cascaded U-Net,通过 clDice 损失函数 embedding topological constraints During the segmentation。结果是一个能够预测血管 segmentation 和中心线的更加准确 topology。
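For context, the clDice loss that the paper builds on combines soft skeletons of the prediction and the ground truth into a topology-aware score; the paper's contribution is to replace the iterative soft-skeleton step with a U-Net. A 2D PyTorch sketch of the original soft-skeleton/clDice formulation (expecting tensors of shape (N, 1, H, W) with values in [0, 1]) is given below; it is not the authors' code.

```python
import torch
import torch.nn.functional as F

def soft_erode(img):
    # Min-pooling implemented as -maxpool(-x); img has shape (N, 1, H, W).
    return -F.max_pool2d(-img, kernel_size=3, stride=1, padding=1)

def soft_dilate(img):
    return F.max_pool2d(img, kernel_size=3, stride=1, padding=1)

def soft_skeleton(img, iters=10):
    """Differentiable morphological skeleton (2D variant of the clDice soft skeleton)."""
    skel = F.relu(img - soft_dilate(soft_erode(img)))
    for _ in range(iters):
        img = soft_erode(img)
        delta = F.relu(img - soft_dilate(soft_erode(img)))
        skel = skel + F.relu(delta - skel * delta)
    return skel

def cl_dice_loss(pred, target, iters=10, eps=1e-6):
    """1 - clDice, where clDice is the harmonic mean of topology precision and
    topology sensitivity computed from the two soft skeletons."""
    skel_pred, skel_true = soft_skeleton(pred, iters), soft_skeleton(target, iters)
    tprec = ((skel_pred * target).sum() + eps) / (skel_pred.sum() + eps)
    tsens = ((skel_true * pred).sum() + eps) / (skel_true.sum() + eps)
    return 1.0 - 2.0 * tprec * tsens / (tprec + tsens)

# toy usage on a synthetic vessel-like probability map
pred = torch.rand(1, 1, 64, 64)
target = (torch.rand(1, 1, 64, 64) > 0.7).float()
print(cl_dice_loss(pred, target).item())
```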

CortexMorph: fast cortical thickness estimation via diffeomorphic registration using VoxelMorph

  • paper_url: http://arxiv.org/abs/2307.11567
  • repo_url: None
  • paper_authors: Richard McKinley, Christian Rummel
  • for: 这个论文是用于描述一种新的 cortical thickness estimation 方法,即 CortexMorph,以及其与深度学习基于的 segmentation 模型的结合。
  • methods: 该方法使用了不监督的深度学习来直接预测 deformation field,以便用 DiReCT 方法计算 cortical thickness。
  • results: 研究表明,通过结合 CortexMorph 和深度学习基于的 segmentation 模型,可以在秒钟内从 T1 束缚图像中计算区域化 cortical thickness,同时保持检测 cortical atrophy 的能力。
    Abstract The thickness of the cortical band is linked to various neurological and psychiatric conditions, and is often estimated through surface-based methods such as Freesurfer in MRI studies. The DiReCT method, which calculates cortical thickness using a diffeomorphic deformation of the gray-white matter interface towards the pial surface, offers an alternative to surface-based methods. Recent studies using a synthetic cortical thickness phantom have demonstrated that the combination of DiReCT and deep-learning-based segmentation is more sensitive to subvoxel cortical thinning than Freesurfer. While anatomical segmentation of a T1-weighted image now takes seconds, existing implementations of DiReCT rely on iterative image registration methods which can take up to an hour per volume. On the other hand, learning-based deformable image registration methods like VoxelMorph have been shown to be faster than classical methods while improving registration accuracy. This paper proposes CortexMorph, a new method that employs unsupervised deep learning to directly regress the deformation field needed for DiReCT. By combining CortexMorph with a deep-learning-based segmentation model, it is possible to estimate region-wise thickness in seconds from a T1-weighted image, while maintaining the ability to detect cortical atrophy. We validate this claim on the OASIS-3 dataset and the synthetic cortical thickness phantom of Rusak et al.
    摘要 cortical 带的厚度与多种神经内科和心理科学疾病相关,通常通过表面基本方法如Freesurfer在MRI研究中估算。DiReCT方法,它通过Diffomorphic deformation of gray-white matter interface towards the pial surface来计算 cortical thickness,为表面基本方法提供了一种alternative。最近的研究使用了Synthetic cortical thickness phantom表明,combined DiReCT and deep-learning-based segmentation more sensitive to subvoxel cortical thinning than Freesurfer。 whereas anatomical segmentation of a T1-weighted image now takes only seconds, existing implementations of DiReCT rely on iterative image registration methods that can take up to an hour per volume. On the other hand, learning-based deformable image registration methods like VoxelMorph have been shown to be faster than classical methods while improving registration accuracy. This paper proposes CortexMorph, a new method that employs unsupervised deep learning to directly regress the deformation field needed for DiReCT. By combining CortexMorph with a deep-learning-based segmentation model, it is possible to estimate region-wise thickness in seconds from a T1-weighted image, while maintaining the ability to detect cortical atrophy. We validate this claim on the OASIS-3 dataset and the synthetic cortical thickness phantom of Rusak et al.

FedAutoMRI: Federated Neural Architecture Search for MR Image Reconstruction

  • paper_url: http://arxiv.org/abs/2307.11538
  • repo_url: None
  • paper_authors: Ruoyou Wu, Cheng Li, Juan Zou, Shanshan Wang
  • for: 用于MR图像重建
  • methods: 使用分布式协同学习方法和不同数据分布稳定化方法
  • results: 实现了较好的表现,使用轻量级模型并且比经典联合学习方法具有更少的参数数量
    Abstract Centralized training methods have shown promising results in MR image reconstruction, but privacy concerns arise when gathering data from multiple institutions. Federated learning, a distributed collaborative training scheme, can utilize multi-center data without the need to transfer data between institutions. However, existing federated learning MR image reconstruction methods rely on manually designed models which have extensive parameters and suffer from performance degradation when facing heterogeneous data distributions. To this end, this paper proposes a novel FederAted neUral archiTecture search approach fOr MR Image reconstruction (FedAutoMRI). The proposed method utilizes differentiable architecture search to automatically find the optimal network architecture. In addition, an exponential moving average method is introduced to improve the robustness of the client model to address the data heterogeneity issue. To the best of our knowledge, this is the first work to use federated neural architecture search for MR image reconstruction. Experimental results demonstrate that our proposed FedAutoMRI can achieve promising performances while utilizing a lightweight model with only a small number of model parameters compared to the classical federated learning methods.
    摘要 中央化训练方法在MR图像重建中表现出了扎实的成果,但是收集数据从多家机构时,隐私问题就会出现。基于分布式合作训练的联邦学习(Federated Learning)可以利用多中心数据无需将数据传输到机构之间。然而,现有的联邦学习MR图像重建方法通常采用手动设计的模型,这些模型具有较多的参数,并且面临着数据不均衡问题时会导致性能下降。为此,这篇论文提出了一种新的FederAted neUral archiTecture search Approach(FedAutoMRI)。提议的方法使用可微分的建筑搜索自动找到最佳网络架构。此外,我们还引入了指数移动平均方法,以提高客户端模型的数据不均衡问题的Robustness。到目前为止,这是我们知道的第一篇使用联邦神经建筑搜索MR图像重建的论文。实验结果表明,我们的提议的FedAutoMRI可以实现扎实的性能,同时使用轻量级的模型,只有少量的模型参数,与传统的联邦学习方法相比,具有显著的优势。

UWAT-GAN: Fundus Fluorescein Angiography Synthesis via Ultra-wide-angle Transformation Multi-scale GAN

  • paper_url: http://arxiv.org/abs/2307.11530
  • repo_url: https://github.com/Tinysqua/UWAT-GAN
  • paper_authors: Zhaojie Fang, Zhanghao Chen, Pengxue Wei, Wangting Li, Shaochong Zhang, Ahmed Elazab, Gangyong Jia, Ruiquan Ge, Changmiao Wang
  • for: 本研究旨在提出一种新的条件生成对抗网络(UWAT-GAN),用于从ultra-wide-angle fundus photography(UWF-SLO)中生成高分辨率的 fluorescein angiography(UWF-FA)图像。
  • methods: 该模型使用多尺度生成器和融合模块贮取更好地抽取全局和局部信息,并使用注意力传输模块帮助解码器学习。此外,使用多个新的权重损失函数在不同的数据尺度进行超参数化训练。
  • results: 实验结果表明,UWAT-GAN比现有方法有更高的图像质量和更好的抽象能力。code可以在GitHub上找到:https://github.com/Tinysqua/UWAT-GAN。
    Abstract Fundus photography is an essential examination for clinical and differential diagnosis of fundus diseases. Recently, Ultra-Wide-angle Fundus (UWF) techniques, UWF Fluorescein Angiography (UWF-FA) and UWF Scanning Laser Ophthalmoscopy (UWF-SLO) have been gradually put into use. However, Fluorescein Angiography (FA) and UWF-FA require injecting sodium fluorescein which may have detrimental influences. To avoid negative impacts, cross-modality medical image generation algorithms have been proposed. Nevertheless, current methods in fundus imaging could not produce high-resolution images and are unable to capture tiny vascular lesion areas. This paper proposes a novel conditional generative adversarial network (UWAT-GAN) to synthesize UWF-FA from UWF-SLO. Using multi-scale generators and a fusion module patch to better extract global and local information, our model can generate high-resolution images. Moreover, an attention transmit module is proposed to help the decoder learn effectively. Besides, a supervised approach is used to train the network using multiple new weighted losses on different scales of data. Experiments on an in-house UWF image dataset demonstrate the superiority of the UWAT-GAN over the state-of-the-art methods. The source code is available at: https://github.com/Tinysqua/UWAT-GAN.
    摘要 血液照片是诊断和区分疾病的基本检查方法。最近,ultra-wide-anglefundus(UWF)技术,UWFfluoresceinangiography(UWF-FA)和UWF扫描镜观察(UWF-SLO)逐渐普及。但是,fluoresceinangiography(FA)和UWF-FA需要注射Na fluorescein,可能有不良影响。为了避免这些影响,多modal医学影像生成算法已经被提出。然而,目前的基于基准图像的检查方法无法生成高分辨率图像,也无法捕捉微小血管损伤区域。本文提出了一种新的冲激生成随机网络(UWAT-GAN),用于从UWF-SLO中生成UWF-FA。通过多级生成器和融合模块贴图,我们的模型可以生成高分辨率图像。此外,我们还提出了一种注意力传输模块,帮助解码器更好地学习。此外,我们采用了多种新的质量权重损失来训练网络。实验结果表明,UWAT-GAN比现有的方法更高效。代码可以在GitHub上找到:https://github.com/Tinysqua/UWAT-GAN。

Bone mineral density estimation from a plain X-ray image by learning decomposition into projections of bone-segmented computed tomography

  • paper_url: http://arxiv.org/abs/2307.11513
  • repo_url: None
  • paper_authors: Yi Gu, Yoshito Otake, Keisuke Uemura, Mazen Soufi, Masaki Takao, Hugues Talbot, Seiji Okada, Nobuhiko Sugano, Yoshinobu Sato
  • for: The paper aims to estimate bone mineral density (BMD) from a plain X-ray image for opportunistic screening of osteoporosis.
  • methods: The proposed method uses a novel approach that learns decomposition into projections of bone-segmented quantitative computed tomography (QCT) for BMD estimation under limited datasets.
  • results: The proposed method achieved high accuracy in BMD estimation, with Pearson correlation coefficients of 0.880 and 0.920 observed for the DXA-measured BMD and QCT-measured BMD estimation tasks, respectively. The root mean square of the coefficient of variation was 3.27 to 3.79% across four measurements with different poses.
    Abstract Osteoporosis is a prevalent bone disease that causes fractures in fragile bones, leading to a decline in daily living activities. Dual-energy X-ray absorptiometry (DXA) and quantitative computed tomography (QCT) are highly accurate for diagnosing osteoporosis; however, these modalities require special equipment and scan protocols. To frequently monitor bone health, low-cost, low-dose, and ubiquitously available diagnostic methods are highly anticipated. In this study, we aim to perform bone mineral density (BMD) estimation from a plain X-ray image for opportunistic screening, which is potentially useful for early diagnosis. Existing methods have used multi-stage approaches consisting of extraction of the region of interest and simple regression to estimate BMD, which require a large amount of training data. Therefore, we propose an efficient method that learns decomposition into projections of bone-segmented QCT for BMD estimation under limited datasets. The proposed method achieved high accuracy in BMD estimation, where Pearson correlation coefficients of 0.880 and 0.920 were observed for DXA-measured BMD and QCT-measured BMD estimation tasks, respectively, and the root mean square of the coefficient of variation values were 3.27 to 3.79% for four measurements with different poses. Furthermore, we conducted extensive validation experiments, including multi-pose, uncalibrated-CT, and compression experiments toward actual application in routine clinical practice.
    摘要 骨质疾病(osteoporosis)是一种非常普遍的骨疾病,可以导致脆弱骨骼的折损,从而导致日常生活活动下降。双能X射线吸收测定(DXA)和量子计算Tomography(QCT)是骨质疾病的诊断非常准确的方法,但这些方法需要特殊的设备和扫描协议。为了经常监测骨健康,低成本、低剂量、 universally available的诊断方法很需求。在这个研究中,我们想使用普通X射线图像来估算骨质密度(BMD),以便于早期诊断。现有的方法通常使用多个阶段的方法,包括提取区域兴趣和简单的回归来估算BMD,这些方法需要大量的训练数据。因此,我们提出了一种高效的方法,该方法可以通过分解为骨 segmentation QCT 的投影来估算BMD,并在有限的数据集下进行学习。我们的方法实现了高精度的BMD 估算,其中 DXA 测量的BMD 和 QCT 测量的BMD 估算任务中的归一化相关系数为0.880和0.920,分解系数的平均方差为3.27%-3.79%。此外,我们进行了广泛的验证实验,包括多个姿势、不同扫描方式、压缩实验等,以便在实际临床医学中应用。

MatSpectNet: Material Segmentation Network with Domain-Aware and Physically-Constrained Hyperspectral Reconstruction

  • paper_url: http://arxiv.org/abs/2307.11466
  • repo_url: https://github.com/heng-yuwen/matspectnet
  • paper_authors: Yuwen Heng, Yihong Wu, Jiawen Chen, Srinandan Dasmahapatra, Hansung Kim
  • for: 实现RGB图像中物质分 segmentation任务中的精准物质分类,由于场景中物质的外观变化很大,是一项挑战。
  • methods: 提出了一种新的模型——MatSpectNet,通过使用现有的RGB图像恢复 hyperspectral图像,并利用频谱恢复数据集中的频谱恢复能力进行适应化,以便在物质分类任务中提高物质分 segmentation的精度。
  • results: 对于LMD数据集和OpenSurfaces数据集,MatSpectNet实验表明,与最近一篇论文相比,MatSpectNet可以提高平均像素准确率1.60%,提高物种均准确率3.42%。
    Abstract Achieving accurate material segmentation for 3-channel RGB images is challenging due to the considerable variation in a material's appearance. Hyperspectral images, which are sets of spectral measurements sampled at multiple wavelengths, theoretically offer distinct information for material identification, as variations in intensity of electromagnetic radiation reflected by a surface depend on the material composition of a scene. However, existing hyperspectral datasets are impoverished regarding the number of images and material categories for the dense material segmentation task, and collecting and annotating hyperspectral images with a spectral camera is prohibitively expensive. To address this, we propose a new model, the MatSpectNet to segment materials with recovered hyperspectral images from RGB images. The network leverages the principles of colour perception in modern cameras to constrain the reconstructed hyperspectral images and employs the domain adaptation method to generalise the hyperspectral reconstruction capability from a spectral recovery dataset to material segmentation datasets. The reconstructed hyperspectral images are further filtered using learned response curves and enhanced with human perception. The performance of MatSpectNet is evaluated on the LMD dataset as well as the OpenSurfaces dataset. Our experiments demonstrate that MatSpectNet attains a 1.60% increase in average pixel accuracy and a 3.42% improvement in mean class accuracy compared with the most recent publication. The project code is attached to the supplementary material and will be published on GitHub.

BLISS: Interplanetary Exploration with Swarms of Low-Cost Spacecraft

  • paper_url: http://arxiv.org/abs/2307.11226
  • repo_url: None
  • paper_authors: Alexander N. Alvara, Lydia Lee, Emmanuel Sin, Nathan Lambert, Andrew J. Westphal, Kristofer S. J. Pister
  • for: 这个论文旨在探讨一种用微型技术实现的低成本、自动化的小天体飞船,用于快速、低成本的内太阳系探索。
  • methods: 这个论文使用了小型技术,包括微机电系统(MEMS)蠕动式致动器(inchworm actuators)和太阳帆,实现一种约10g的空间飞船。
  • results: 论文详细介绍了一种用于控制太阳帆的轨迹和低级 actuation控制,以及建议的载荷和计算机设计。 论文还 briefly Considered两个其他应用:从数十个金星家族彗星返回样本,以及遥感和拍摄遥行彗星。
    Abstract Leveraging advancements in micro-scale technology, we propose a fleet of autonomous, low-cost, small solar sails for interplanetary exploration. The Berkeley Low-cost Interplanetary Solar Sail (BLISS) project aims to utilize small-scale technologies to create a fleet of tiny interplanetary femto-spacecraft for rapid, low-cost exploration of the inner solar system. This paper describes the hardware required to build a nearly 10 g spacecraft using a 1 m$^2$ solar sail steered by micro-electromechanical systems (MEMS) inchworm actuators. The trajectory control to a NEO, here 101955 Bennu, is detailed along with the low-level actuation control of the solar sail and the specifications of proposed onboard communication and computation. Two other applications are also shortly considered: sample return from dozens of Jupiter-family comets and interstellar comet rendezvous and imaging. The paper concludes by discussing the fundamental scaling limits and future directions for steerable autonomous miniature solar sails with onboard custom computers and sensors.
    摘要 使用微型技术进行推动,我们提议一支自主、低成本、小型太阳帆船队进行 planetary exploration。 Berkeley Low-cost Interplanetary Solar Sail(BLISS)项目旨在利用小规模技术创造一支微型惯性空间飞船,用于快速、低成本地 explore 内太阳系。这篇文章描述了用于建立约10g空间飞船的硬件,包括1米²的太阳帆,由微型电子机械系统(MEMS)滚动器控制。文章还详细介绍了天体控制 trajectory 到 NEO 101955 Bennu,以及太阳帆的低级控制和船载通信和计算机的规格。此外,文章还 briefly 讨论了从数十个金星家族彗星返回样本和遥感柯梅丝 rendezvous 和拍摄。文章 conclude 了可控推进器的基本扩展限制和未来方向,包括自适应驱动器和船载特定计算机和感测器。

  • paper_url: http://arxiv.org/abs/2307.11273
  • repo_url: None
  • paper_authors: Vega-Hernandez, Mayrim, Galan-Garcia, Lidice, Perez-Hidalgo-Gato, Jhoanna, Ontivero-Ortega, Marlis, Garcia-Agustin, Daysi, Garcia-Reyes, Ronaldo, Bosch-Bayard, Jorge, Marinazzo, Daniele, Martinez-Montes, Eduardo, Valdes-Sosa, Pedro A
  • for: The paper aims to identify stable Electrophysiological Source Imaging (ESI) biomarkers associated with Gait Speed (GS) as a measure of functional decline in aging individuals.
  • methods: The authors use a combination of flexible sparse/smooth/non-negative models (NN-SLASSO) and the Stable Sparse Classifier method to estimate ESI and select relevant features, including activation ESI (aESI) and connectivity ESI (cESI) features.
  • results: The authors found that novel sparse aESI models outperformed traditional methods, and that combining aESI and cESI features improved the predictability of GS changes. The selected biomarkers were localized to orbitofrontal and temporal cortical regions.
    Abstract Objective: We seek stable Electrophysiological Source Imaging (ESI) biomarkers associated with Gait Speed (GS) as a measure of functional decline. Towards this end we determine the predictive value of ESI activation and connectivity patterns of resting-state EEG Theta rhythm on physical performance decline measured by a slowing GS in aging individuals. Methods: As potential biomarkers related to GS changes, we estimate ESI using flexible sparse/smooth/non-negative models (NN-SLASSO), from which activation ESI (aESI) and connectivity ESI (cESI) features are selected using the Stable Sparse Classifier method. Results and Conclusions: Novel sparse aESI models outperformed traditional methods such as the LORETA family. The models combining aESI and cESI features improved the predictability of GS changes. Selected biomarkers from activation/connectivity patterns were localized to orbitofrontal and temporal cortical regions. Significance: The proposed methodology contributes to understanding the activation and connectivity of ESI complex patterns related to GS, providing potential biomarker features for GS slowing. Given the known relationship between GS decline and cognitive impairment, this preliminary work suggests it might be applied to other, more complex measures of healthy and pathological aging. Importantly, it might allow an ESI-based evaluation of rehabilitation programs.
    摘要 目标:我们寻找与步态速度(GS)相关的稳定电生理源成像(ESI)生物标志物,以其作为功能衰退的度量。为此,我们确定静息态脑电Theta节律的ESI激活与连接模式对老年个体步态变慢所反映的身体机能下降的预测价值。方法:作为与GS变化相关的潜在生物标志物,我们使用灵活的稀疏/平滑/非负模型(NN-SLASSO)估计ESI,并利用稳定稀疏分类器方法从中选择激活ESI(aESI)和连接ESI(cESI)特征。结果与结论:新的稀疏aESI模型优于LORETA家族等传统方法;将aESI与cESI特征结合可提高对GS变化的预测能力。所选的激活/连接模式生物标志物定位于眶额叶和颞叶皮层区域。意义:所提出的方法有助于理解与GS相关的ESI激活与连接复杂模式,为步态变慢提供潜在的生物标志特征。鉴于GS下降与认知障碍之间的已知关系,这项初步工作可能推广到其他更复杂的健康与病理性衰老度量,并有望用于基于ESI的康复方案评估。

Treatment And Follow-Up Guidelines For Multiple Brain Metastases: A Systematic Review

  • paper_url: http://arxiv.org/abs/2307.11016
  • repo_url: None
  • paper_authors: Ana Sofia Santos, Victor Alves, José Soares, Matheus Silva, Crystian Saraiva
  • for: 这篇研究是为了探讨多发脑转移瘤的管理方法,以提高病人生活质量并保护神经认知功能。
  • methods: 这篇研究综述了使用立体定向放射外科(SRS)管理多发脑转移瘤的方法,并讨论了利用人工智能预测模型预测治疗后图像中新出现的脑转移瘤。
  • results: 研究发现这类方法可以帮助医疗专业人员更早地决定最佳治疗方案,并有助于提高病人的生活质量和神经认知保护。
    Abstract Brain metastases are a complication of primary cancer, representing the most common type of brain tumor in adults. The management of multiple brain metastases represents a clinical challenge worldwide in finding the optimal treatment for patients considering various individual aspects. Managing multiple metastases with stereotactic radiosurgery (SRS) is being increasingly used because of quality of life and neurocognitive preservation, which do not present such good outcomes when dealt with whole brain radiation therapy (WBRT). After treatment, analyzing the progression of the disease still represents a clinical issue, since it is difficult to determine a standard schedule for image acquisition. A solution could be applying artificial intelligence, namely predictive models to forecast the incidence of new metastases in post-treatment images. Although there aren't many works on this subject, this could potentially benefit medical professionals in early decision of the best treatment approaches.
    摘要 脑转移瘤是原发癌症的一种并发症,也是成人中最常见的脑肿瘤类型。多发脑转移瘤的管理是全球性的临床挑战:需要在考虑各种个体因素的前提下为病人寻找最优治疗方案。立体定向放射外科(SRS)在多发脑转移瘤管理中的应用日益增多,因为它在生活质量和神经认知保护方面优于全脑放射治疗(WBRT)。然而,治疗后对疾病进展的评估仍是临床难题,因为很难确定标准的影像随访时间表。一种可能的解决方案是应用人工智能,即利用预测模型预测治疗后图像中新发脑转移瘤的发生。尽管这方面的研究还不多,但它有望帮助医疗专业人员更早地确定最佳治疗方案。

Frequency-aware optical coherence tomography image super-resolution via conditional generative adversarial neural network

  • paper_url: http://arxiv.org/abs/2307.11130
  • repo_url: None
  • paper_authors: Xueshen Li, Zhenxing Dong, Hongshan Liu, Jennifer J. Kang-Mieler, Yuye Ling, Yu Gan
  • for: 提高基于光学相干断层扫描(OCT)图像的医学诊断和治疗能力,特别是在心血管和眼科领域。
  • methods: 将三个关键的频域模块(频率变换、频率跳跃连接和频率对齐)以及基于频率的损失函数集成到条件生成对抗网络(cGAN)中。
  • results: 在现有的冠状动脉OCT数据集上进行了大规模的定量研究,证明了所提框架优于现有的深度学习框架。此外,我们还将该框架应用于鱼类角膜图像和大鼠视网膜图像,证明了它的泛化能力。
    Abstract Optical coherence tomography (OCT) has stimulated a wide range of medical image-based diagnosis and treatment in fields such as cardiology and ophthalmology. Such applications can be further facilitated by deep learning-based super-resolution technology, which improves the capability of resolving morphological structures. However, existing deep learning-based method only focuses on spatial distribution and disregard frequency fidelity in image reconstruction, leading to a frequency bias. To overcome this limitation, we propose a frequency-aware super-resolution framework that integrates three critical frequency-based modules (i.e., frequency transformation, frequency skip connection, and frequency alignment) and frequency-based loss function into a conditional generative adversarial network (cGAN). We conducted a large-scale quantitative study from an existing coronary OCT dataset to demonstrate the superiority of our proposed framework over existing deep learning frameworks. In addition, we confirmed the generalizability of our framework by applying it to fish corneal images and rat retinal images, demonstrating its capability to super-resolve morphological details in eye imaging.
    摘要

Deep Spiking-UNet for Image Processing

  • paper_url: http://arxiv.org/abs/2307.10974
  • repo_url: https://github.com/snnresearch/spiking-unet
  • paper_authors: Hebei Li, Yueyi Zhang, Zhiwei Xiong, Zheng-jun Zha, Xiaoyan Sun
  • for: 这篇论文的目的是提出一种将脉冲神经网络(SNN)与U-Net结构相结合的图像处理方法。
  • methods: 论文引入多阈值脉冲神经元以提高信息传递效率,并采用基于预训练U-Net模型的转换与微调管道。
  • results: 实验结果表明,在图像分割和去噪任务上,Spiking-UNet可以达到与非脉冲网络相当的性能,并超过现有的SNN方法;与未微调的转换版Spiking-UNet相比,推理时间降低约90%。
    Abstract U-Net, known for its simple yet efficient architecture, is widely utilized for image processing tasks and is particularly suitable for deployment on neuromorphic chips. This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture. To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy. To address the issue of information loss, we introduce multi-threshold spiking neurons, which improve the efficiency of information transmission within the Spiking-UNet. For the training strategy, we adopt a conversion and fine-tuning pipeline that leverage pre-trained U-Net models. During the conversion process, significant variability in data distribution across different parts is observed when utilizing skip connections. Therefore, we propose a connection-wise normalization method to prevent inaccurate firing rates. Furthermore, we adopt a flow-based training method to fine-tune the converted models, reducing time steps while preserving performance. Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart, surpassing existing SNN methods. Compared with the converted Spiking-UNet without fine-tuning, our Spiking-UNet reduces inference time by approximately 90\%. This research broadens the application scope of SNNs in image processing and is expected to inspire further exploration in the field of neuromorphic engineering. The code for our Spiking-UNet implementation is available at https://github.com/SNNresearch/Spiking-UNet.
    摘要 U-Net以其简单而高效的架构著称,被广泛用于图像处理任务,特别适合部署在神经形态芯片上。这篇论文提出了Spiking-UNet这一新概念,将脉冲神经网络(SNN)的优势与U-Net架构相结合用于图像处理。为了实现高效的Spiking-UNet,需要解决两个主要挑战:保证信息以脉冲形式在网络中高保真地传播,以及制定有效的训练策略。针对信息丢失问题,我们引入多阈值脉冲神经元,提高了Spiking-UNet内部的信息传递效率。在训练策略上,我们采用基于预训练U-Net模型的转换与微调管道。在转换过程中,我们观察到使用跳跃连接时不同部分的数据分布存在显著差异,因此提出了按连接归一化的方法,以避免不准确的发放率。此外,我们采用基于流的训练方法对转换后的模型进行微调,在保持性能的同时减少时间步数。实验结果表明,在图像分割和去噪任务上,Spiking-UNet达到了与非脉冲版本相当的性能,并超过现有的SNN方法;与未微调的转换版Spiking-UNet相比,推理时间降低约90%。这项研究拓宽了SNN在图像处理中的应用范围,有望推动神经形态工程领域的进一步探索。Spiking-UNet的实现代码见 https://github.com/SNNresearch/Spiking-UNet 。

cs.SD - 2023-07-20

Transfer Learning and Bias Correction with Pre-trained Audio Embeddings

  • paper_url: http://arxiv.org/abs/2307.10834
  • repo_url: https://github.com/changhongw/audio-embedding-bias
  • paper_authors: Changhong Wang, Gaël Richard, Brian McFee
  • for: 这研究旨在调查预训练的音频表示中的偏见传递现象,以及这些偏见如何影响音乐信息检索任务中的器乐器识别。
  • methods: 本研究对三种不同的预训练表示(VGGish、OpenL3和YAMNet)进行了比较分析,以考察预训练音频表示在乐器识别任务中的特性。
  • results: 研究发现,不同的预训练表示在单一数据集内表现相当,但跨数据集的泛化能力不同;数据集标识和流派分布可能是偏差的来源。为减轻偏差的影响,研究提出并评估了后处理的缓解措施。
    Abstract Deep neural network models have become the dominant approach to a large variety of tasks within music information retrieval (MIR). These models generally require large amounts of (annotated) training data to achieve high accuracy. Because not all applications in MIR have sufficient quantities of training data, it is becoming increasingly common to transfer models across domains. This approach allows representations derived for one task to be applied to another, and can result in high accuracy with less stringent training data requirements for the downstream task. However, the properties of pre-trained audio embeddings are not fully understood. Specifically, and unlike traditionally engineered features, the representations extracted from pre-trained deep networks may embed and propagate biases from the model's training regime. This work investigates the phenomenon of bias propagation in the context of pre-trained audio representations for the task of instrument recognition. We first demonstrate that three different pre-trained representations (VGGish, OpenL3, and YAMNet) exhibit comparable performance when constrained to a single dataset, but differ in their ability to generalize across datasets (OpenMIC and IRMAS). We then investigate dataset identity and genre distribution as potential sources of bias. Finally, we propose and evaluate post-processing countermeasures to mitigate the effects of bias, and improve generalization across datasets.
    摘要 深度神经网络模型已成为音乐信息检索(MIR)中各类任务的主要方法。这些模型通常需要大量(带标注的)训练数据才能达到高精度。由于并非所有MIR应用都拥有足够的训练数据,跨领域迁移模型变得越来越普遍。这种方法允许将为某一任务学得的表示应用到另一任务,并能以更低的训练数据要求在下游任务上取得高精度。然而,预训练音频嵌入的性质尚未被充分理解。具体而言,与传统人工设计的特征不同,从预训练深度网络中提取的表示可能嵌入并传播模型训练过程中的偏差。本工作在乐器识别任务的背景下研究了预训练音频表示中的偏差传播现象。我们首先表明,三种不同的预训练表示(VGGish、OpenL3和YAMNet)在单一数据集内表现相当,但在跨数据集(OpenMIC与IRMAS)泛化能力上存在差异。随后,我们考察了数据集标识和流派分布这两个潜在的偏差来源。最后,我们提出并评估了后处理的缓解措施,以减轻偏差影响并改善跨数据集的泛化性能。

Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages

  • paper_url: http://arxiv.org/abs/2307.10814
  • repo_url: None
  • paper_authors: Ephrem Afele Retta, Richard Sutcliffe, Jabar Mahmood, Michael Abebe Berwo, Eiad Almekhlafi, Sajjad Ahmed Khan, Shehzad Ashraf Chaudhry, Mustafa Mhamed, Jun Feng
  • for: 本研究旨在探索在语言资源匮乏时进行跨语言和多语言语音情感识别(SER)的可行性。
  • methods: 研究使用AlexNet、VGGE(一种VGG架构的变体)和ResNet50三个分类器,并将所有数据集的标签映射为两个类别(正面和负面),以便直接比较不同语言之间的性能并组合多语言进行训练和测试。
  • results: 研究发现,以阿姆哈拉语为目标语言时,使用英语或德语作为源语言可取得最好的结果;使用两到三种非阿姆哈拉语语言共同训练,再在阿姆哈拉语上测试,效果优于仅用一种非阿姆哈拉语语言。总体而言,这些结果表明在资源匮乏时,跨语言和多语言训练是一种有效的策略。
    Abstract In a conventional Speech emotion recognition (SER) task, a classifier for a given language is trained on a pre-existing dataset for that same language. However, where training data for a language does not exist, data from other languages can be used instead. We experiment with cross-lingual and multilingual SER, working with Amharic, English, German and URDU. For Amharic, we use our own publicly-available Amharic Speech Emotion Dataset (ASED). For English, German and Urdu we use the existing RAVDESS, EMO-DB and URDU datasets. We followed previous research in mapping labels for all datasets to just two classes, positive and negative. Thus we can compare performance on different languages directly, and combine languages for training and testing. In Experiment 1, monolingual SER trials were carried out using three classifiers, AlexNet, VGGE (a proposed variant of VGG), and ResNet50. Results averaged for the three models were very similar for ASED and RAVDESS, suggesting that Amharic and English SER are equally difficult. Similarly, German SER is more difficult, and Urdu SER is easier. In Experiment 2, we trained on one language and tested on another, in both directions for each pair: Amharic<->German, Amharic<->English, and Amharic<->Urdu. Results with Amharic as target suggested that using English or German as source will give the best result. In Experiment 3, we trained on several non-Amharic languages and then tested on Amharic. The best accuracy obtained was several percent greater than the best accuracy in Experiment 2, suggesting that a better result can be obtained when using two or three non-Amharic languages for training than when using just one non-Amharic language. Overall, the results suggest that cross-lingual and multilingual training can be an effective strategy for training a SER classifier when resources for a language are scarce.
    摘要 在一个常规的语音情感识别(SER)任务中,一个分类器为某种语言进行训练,通常使用该语言的已有数据集。但是,当数据集不存在时,可以使用其他语言的数据。我们在阿姆哈里语、英语、德语和乌尔都语之间进行了交叉语言和多语言 SER 的实验。对于阿姆哈里语,我们使用了我们自己公开的阿姆哈里语语音情感数据集(ASED)。对于英语、德语和乌尔都语,我们使用了现有的 RAVDESS、EMO-DB 和 URDU 数据集。我们按照之前的研究进行了所有数据集的标签映射,将所有数据集的标签映射到只有两个类别:正面和负面。这样,我们可以直接对不同语言进行比较,并将不同语言结合在一起用于训练和测试。在实验一中,我们使用了三个模型:AlexNet、VGGE 和 ResNet50,进行了单语言 SER 测试。结果表明,在 ASED 和 RAVDESS 上,三个模型的平均结果几乎相同,表明阿姆哈里语和英语 SER 相当Difficult。类似地,德语 SER 更加Difficult,而乌尔都语 SER 相对更加容易。在实验二中,我们将一种语言作为源语言,将另一种语言作为目标语言,在两个方向上进行了每对测试。结果表明,当使用英语或德语作为源语言时,在 Amharic 作为目标语言时的最佳准确率比较高。在实验三中,我们使用了多种非阿姆哈里语言进行训练,然后在 Amharic 上进行测试。最佳准确率比较高,与实验二中的最佳准确率相比,表明可以通过使用两三种非阿姆哈里语言进行训练,而不是只使用一种非阿姆哈里语言,来获得更高的准确率。总的来说,结果表明,在语言资源稀缺时,可以使用交叉语言和多语言训练的策略来训练一个 SER 分类器,以提高其性能。
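
The two-class mapping and cross-corpus protocol described above can be illustrated with a short sketch. The label map, the feature loader, and the classifier below are illustrative assumptions rather than the authors' code; they only show how corpus-specific labels collapse to positive/negative so that a model trained on one language can be scored on another.

```python
# Sketch of the cross-corpus SER protocol: map every corpus to two classes
# (positive / negative), train on one language, test on another.
from typing import Dict, List, Tuple
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical mapping from fine-grained emotion labels to the two-class scheme.
BINARY_MAP: Dict[str, int] = {
    "happy": 1, "calm": 1, "neutral": 1,           # positive
    "angry": 0, "sad": 0, "fear": 0, "disgust": 0  # negative
}

def to_binary(labels: List[str]) -> np.ndarray:
    """Collapse corpus-specific emotion labels into positive(1)/negative(0)."""
    return np.array([BINARY_MAP[l] for l in labels])

def load_corpus(name: str) -> Tuple[np.ndarray, List[str]]:
    """Placeholder loader returning (features, label strings) for a corpus."""
    rng = np.random.default_rng(abs(hash(name)) % 2**32)
    X = rng.normal(size=(200, 40))                      # e.g. 40-dim acoustic features
    y = rng.choice(list(BINARY_MAP), size=200).tolist()
    return X, y

def cross_lingual_eval(source: str, target: str) -> float:
    """Train on the source-language corpus, test on the target-language corpus."""
    Xs, ys = load_corpus(source)
    Xt, yt = load_corpus(target)
    clf = LogisticRegression(max_iter=1000).fit(Xs, to_binary(ys))
    return accuracy_score(to_binary(yt), clf.predict(Xt))

for src, tgt in [("RAVDESS", "ASED"), ("EMO-DB", "ASED"), ("URDU", "ASED")]:
    print(f"{src} -> {tgt}: acc = {cross_lingual_eval(src, tgt):.3f}")
```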

Perceptual Quality Assessment of Omnidirectional Audio-visual Signals

  • paper_url: http://arxiv.org/abs/2307.10813
  • repo_url: None
  • paper_authors: Xilei Zhu, Huiyu Duan, Yuqin Cao, Yuxin Zhu, Yucheng Zhu, Jing Liu, Li Chen, Xiongkuo Min, Guangtao Zhai
  • for: 评估全向视频(ODV)的视听质量,以提高用户的体验质量(Quality of Experience,QoE)。
  • methods: 通过多模态融合策略将现有的单模态音频与视频质量评估模型相结合,实现全参考全向视听质量评估(OAVQA)。
  • results: 在构建的大规模全向视听质量评估数据集上验证了视听多模态融合方法的有效性,为全向视频QoE评估提供了新的基准。
    Abstract Omnidirectional videos (ODVs) play an increasingly important role in the application fields of medical, education, advertising, tourism, etc. Assessing the quality of ODVs is significant for service-providers to improve the user's Quality of Experience (QoE). However, most existing quality assessment studies for ODVs only focus on the visual distortions of videos, while ignoring that the overall QoE also depends on the accompanying audio signals. In this paper, we first establish a large-scale audio-visual quality assessment dataset for omnidirectional videos, which includes 375 distorted omnidirectional audio-visual (A/V) sequences generated from 15 high-quality pristine omnidirectional A/V contents, and the corresponding perceptual audio-visual quality scores. Then, we design three baseline methods for full-reference omnidirectional audio-visual quality assessment (OAVQA), which combine existing state-of-the-art single-mode audio and video QA models via multimodal fusion strategies. We validate the effectiveness of the A/V multimodal fusion method for OAVQA on our dataset, which provides a new benchmark for omnidirectional QoE evaluation. Our dataset is available at https://github.com/iamazxl/OAVQA.
    摘要 全向视频(ODV)在医疗、教育、广告、旅游等应用领域发挥着日益重要的作用,评估其质量对于服务提供者改善用户体验质量(QoE)十分重要。然而,现有的质量评估研究大多只关注视频的视觉失真,而忽略了整体QoE还取决于伴随的音频信号。本文首先为全向视频建立了一个大规模的视听质量评估数据集,其中包含由15个高质量原始全向视听内容生成的375个失真视听(A/V)序列,以及相应的主观视听质量分数。随后,我们设计了三种全参考全向视听质量评估(OAVQA)基线方法,通过多模态融合策略结合现有最先进的单模态音频与视频质量评估模型。我们在该数据集上验证了视听多模态融合方法的有效性,为全向QoE评估提供了新的基准。数据集地址:https://github.com/iamazxl/OAVQA 。
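
As a rough illustration of the kind of multimodal-fusion baseline described above, the sketch below combines single-mode audio and video quality estimates with a small fusion head. The branch networks, feature dimensions, and fusion head are assumptions for illustration, not the paper's actual OAVQA models.

```python
# Late-fusion sketch: two single-mode quality branches plus a small fusion head.
import torch
import torch.nn as nn

class LateFusionOAVQA(nn.Module):
    def __init__(self, a_dim: int = 128, v_dim: int = 256):
        super().__init__()
        self.audio_branch = nn.Sequential(nn.Linear(a_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.video_branch = nn.Sequential(nn.Linear(v_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        # Fusion head maps the two single-mode scores to one overall quality score.
        self.fusion = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))

    def forward(self, audio_feat: torch.Tensor, video_feat: torch.Tensor) -> torch.Tensor:
        qa = self.audio_branch(audio_feat)   # audio-only quality estimate
        qv = self.video_branch(video_feat)   # video-only quality estimate
        return self.fusion(torch.cat([qa, qv], dim=-1)).squeeze(-1)

model = LateFusionOAVQA()
score = model(torch.randn(4, 128), torch.randn(4, 256))  # batch of 4 A/V clips
print(score.shape)  # torch.Size([4])
```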

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition

  • paper_url: http://arxiv.org/abs/2307.10757
  • repo_url: https://github.com/happycolor/vesper
  • paper_authors: Weidong Chen, Xiaofen Xing, Peihao Chen, Xiangmin Xu
  • for: 本研究旨在将大规模预训练模型(PTM)适配到语音情感识别任务。PTM为通用人工智能带来了新的曙光,但它们是面向通用任务构建的,在特定任务上的性能仍可进一步提升;同时其庞大的规模也使实际应用受到限制。
  • methods: 本研究提出了针对特定任务优化大规模PTM、生成任务专用PTM的范式。具体而言,基于WavLM在语音数据集上预训练了一个情感专用的编码器Vesper。为提高对情感信息的敏感度,Vesper采用情感引导的掩码策略来确定需要掩盖的区域,并利用分层与跨层自监督来增强其捕获声学和语义表示的能力。
  • results: 实验结果表明,4层的Vesper优于12层的WavLM Base,而12层的Vesper优于24层的WavLM Large。
    Abstract This paper presents a paradigm that adapts general large-scale pretrained models (PTMs) to speech emotion recognition task. Although PTMs shed new light on artificial general intelligence, they are constructed with general tasks in mind, and thus, their efficacy for specific tasks can be further improved. Additionally, employing PTMs in practical applications can be challenging due to their considerable size. Above limitations spawn another research direction, namely, optimizing large-scale PTMs for specific tasks to generate task-specific PTMs that are both compact and effective. In this paper, we focus on the speech emotion recognition task and propose an improved emotion-specific pretrained encoder called Vesper. Vesper is pretrained on a speech dataset based on WavLM and takes into account emotional characteristics. To enhance sensitivity to emotional information, Vesper employs an emotion-guided masking strategy to identify the regions that need masking. Subsequently, Vesper employs hierarchical and cross-layer self-supervision to improve its ability to capture acoustic and semantic representations, both of which are crucial for emotion recognition. Experimental results on the IEMOCAP, MELD, and CREMA-D datasets demonstrate that Vesper with 4 layers outperforms WavLM Base with 12 layers, and the performance of Vesper with 12 layers surpasses that of WavLM Large with 24 layers.
    摘要 本文聚焦语音情感识别任务,提出了改进的情感专用预训练编码器Vesper。Vesper基于WavLM在语音数据集上进行预训练,并考虑了情感特性。为增强对情感信息的敏感度,Vesper采用情感引导的掩码策略来确定需要掩盖的区域,并利用分层与跨层自监督来提升其捕获声学和语义表示的能力,这两者对情感识别都至关重要。在IEMOCAP、MELD和CREMA-D数据集上的实验结果表明,4层的Vesper优于12层的WavLM Base,12层的Vesper优于24层的WavLM Large,说明Vesper相比现有PTM是一种更加高效且有效的语音情感识别方法。

PAS: Partial Additive Speech Data Augmentation Method for Noise Robust Speaker Verification

  • paper_url: http://arxiv.org/abs/2307.10628
  • repo_url: https://github.com/rst0070/Partial_Additive_Speech
  • paper_authors: Wonbin Kim, Hyun-seo Shin, Ju-ho Kim, Jungwoo Heo, Chan-yeong Lim, Ha-Jin Yu
  • for: 提高干扰环境下的 speaker verification(SV)系统的噪音抗性
  • methods: 提出了一种新的加性噪声数据增广方法:partial additive speech(PAS),用于训练在噪声环境下更加鲁棒的SV系统(示意代码见本条目末尾)。
  • results: 与传统的加性噪声增广相比,PAS在SE-ResNet34和ECAPA-TDNN上分别取得了4.64%和5.01%的EER相对改进,并通过分析注意力模块和可视化说话人嵌入验证了该方法的有效性。
    Abstract Background noise reduces speech intelligibility and quality, making speaker verification (SV) in noisy environments a challenging task. To improve the noise robustness of SV systems, additive noise data augmentation method has been commonly used. In this paper, we propose a new additive noise method, partial additive speech (PAS), which aims to train SV systems to be less affected by noisy environments. The experimental results demonstrate that PAS outperforms traditional additive noise in terms of equal error rates (EER), with relative improvements of 4.64% and 5.01% observed in SE-ResNet34 and ECAPA-TDNN. We also show the effectiveness of proposed method by analyzing attention modules and visualizing speaker embeddings.
    摘要 背景噪声会降低语音的可懂度和质量,使得噪声环境下的说话人验证(SV)成为一项具有挑战性的任务。为了提高SV系统的噪声鲁棒性,通常使用加性噪声数据增广方法。在这篇论文中,我们提出了一种新的加性噪声方法,即部分加性语音(PAS),旨在训练SV系统使其更少受噪声环境的影响。实验结果显示,PAS在等错误率(EER)上优于传统的加性噪声,在SE-ResNet34和ECAPA-TDNN上分别取得了4.64%和5.01%的相对提升。我们还通过分析注意力模块和可视化说话人嵌入来证明所提方法的有效性。
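
The following sketch illustrates the partial-additive idea behind PAS: noise is mixed into only a randomly chosen segment of the utterance rather than the whole waveform. Segment selection and SNR scaling here are assumptions for illustration; the authors' implementation is in the linked repository.

```python
# Partial additive noise: corrupt only one random segment of the waveform.
import numpy as np

def partial_additive_noise(speech: np.ndarray, noise: np.ndarray,
                           snr_db: float = 10.0, min_frac: float = 0.2,
                           max_frac: float = 0.8, rng=None) -> np.ndarray:
    rng = rng or np.random.default_rng()
    n = len(speech)
    seg_len = int(n * rng.uniform(min_frac, max_frac))   # length of corrupted region
    start = rng.integers(0, n - seg_len + 1)
    noise_seg = np.resize(noise, seg_len).astype(speech.dtype)  # tile/crop the noise
    # Scale noise to the requested SNR within the segment.
    p_speech = np.mean(speech[start:start + seg_len] ** 2) + 1e-12
    p_noise = np.mean(noise_seg ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    out = speech.copy()
    out[start:start + seg_len] += scale * noise_seg
    return out

sr = 16000
clean = np.random.randn(3 * sr).astype(np.float32)   # stand-in for a 3 s utterance
noise = np.random.randn(1 * sr).astype(np.float32)
augmented = partial_additive_noise(clean, noise, snr_db=5.0)
print(augmented.shape)
```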

Transsion TSUP’s speech recognition system for ASRU 2023 MADASR Challenge

  • paper_url: http://arxiv.org/abs/2307.11778
  • repo_url: None
  • paper_authors: Xiaoxiao Li, Gaosheng Zhang, An Zhu, Weiyong Li, Shuming Fang, Xiaoyue Yang, Jianchao Zhu
  • for: 这个论文是为了提出一种用于ASRU 2023 MADASR Challenge的语音识别系统,并且专注于适应低资源印度语言的ASR模型。
  • methods: 该系统使用Squeezeformer编码器和双向Transformer解码器,并采用CTC-Attention联合训练损失(示意见本条目末尾);此外,在TLG束搜索解码中还使用了外部KenLM语言模型。对于赛道3和4,系统采用预训练的IndicWhisper模型并在挑战数据集和公开数据集上进行微调。
  • results: 该方法在四个赛道上对孟加拉语分别取得24.17%、24.43%、15.97%和15.97%的词错误率(WER),对博杰普尔语分别取得19.61%、19.54%、15.48%和15.48%的WER。这些结果表明该方法的有效性。
    Abstract This paper presents a speech recognition system developed by the Transsion Speech Understanding Processing Team (TSUP) for the ASRU 2023 MADASR Challenge. The system focuses on adapting ASR models for low-resource Indian languages and covers all four tracks of the challenge. For tracks 1 and 2, the acoustic model utilized a squeezeformer encoder and bidirectional transformer decoder with joint CTC-Attention training loss. Additionally, an external KenLM language model was used during TLG beam search decoding. For tracks 3 and 4, pretrained IndicWhisper models were employed and finetuned on both the challenge dataset and publicly available datasets. The whisper beam search decoding was also modified to support an external KenLM language model, which enabled better utilization of the additional text provided by the challenge. The proposed method achieved word error rates (WER) of 24.17%, 24.43%, 15.97%, and 15.97% for Bengali language in the four tracks, and WER of 19.61%, 19.54%, 15.48%, and 15.48% for Bhojpuri language in the four tracks. These results demonstrate the effectiveness of the proposed method.
    摘要 本文介绍了Transsion语音理解处理团队(TSUP)为ASRU 2023 MADASR挑战赛开发的语音识别系统。该系统专注于为低资源印度语言适配ASR模型,覆盖挑战赛的全部四个赛道。在赛道1和2中,声学模型采用Squeezeformer编码器和双向Transformer解码器,并使用CTC-Attention联合训练损失;在TLG束搜索解码中还使用了外部KenLM语言模型。在赛道3和4中,采用预训练的IndicWhisper模型,并在挑战数据集和公开数据集上进行微调;同时修改了Whisper的束搜索解码以支持外部KenLM语言模型,从而更好地利用挑战赛提供的额外文本。所提方法对孟加拉语在四个赛道上的词错误率(WER)分别为24.17%、24.43%、15.97%和15.97%,对博杰普尔语分别为19.61%、19.54%、15.48%和15.48%,表明了该方法的有效性。
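
The joint CTC-Attention training loss used for tracks 1 and 2 is commonly written as a weighted sum L = w·L_CTC + (1−w)·L_attention; the sketch below shows that combination in PyTorch. Tensor shapes, the weight value, and the toy inputs are illustrative assumptions; the encoder and decoder themselves are omitted.

```python
# Joint CTC-Attention loss sketch: weighted sum of a CTC term and a
# cross-entropy term over decoder outputs.
import torch
import torch.nn.functional as F

def joint_ctc_attention_loss(enc_logits, dec_logits, targets,
                             input_lengths, target_lengths, ctc_weight=0.3):
    # CTC branch: enc_logits is (T, N, C) scores over the vocabulary.
    ctc = F.ctc_loss(enc_logits.log_softmax(-1), targets,
                     input_lengths, target_lengths, blank=0, zero_infinity=True)
    # Attention branch: dec_logits is (N, U, C) scores for each target position.
    att = F.cross_entropy(dec_logits.transpose(1, 2), targets)
    return ctc_weight * ctc + (1.0 - ctc_weight) * att

T, N, U, C = 50, 2, 8, 30
enc_logits = torch.randn(T, N, C)
dec_logits = torch.randn(N, U, C)
targets = torch.randint(1, C, (N, U))          # labels exclude the blank index 0
loss = joint_ctc_attention_loss(
    enc_logits, dec_logits, targets,
    input_lengths=torch.full((N,), T, dtype=torch.long),
    target_lengths=torch.full((N,), U, dtype=torch.long))
print(float(loss))
```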

cs.CV - 2023-07-20

Spinal nerve segmentation method and dataset construction in endoscopic surgical scenarios

  • paper_url: http://arxiv.org/abs/2307.10955
  • repo_url: https://github.com/zzzzzzpc/funet
  • paper_authors: Shaowu Peng, Pengcheng Zhao, Yongyu Ye, Junying Chen, Yunbing Chang, Xiaoqing Zheng
  • for: 该研究旨在提供一种实时分割方法,帮助外科医生在内窥镜手术中避免损伤脊神经。
  • methods: 该研究构建了一个精心标注的分割数据集,并基于该数据集提出了 Frame-Unet 模型,利用帧间信息和自注意力机制,实现了当前最佳性能。
  • results: 研究表明, Frame-Unet 模型在一个类似的息肉内窥镜视频数据集上表现出了良好的泛化能力,并且在实际内窥镜手术中具有优异的表现。
    Abstract Endoscopic surgery is currently an important treatment method in the field of spinal surgery and avoiding damage to the spinal nerves through video guidance is a key challenge. This paper presents the first real-time segmentation method for spinal nerves in endoscopic surgery, which provides crucial navigational information for surgeons. A finely annotated segmentation dataset of approximately 10,000 consec-utive frames recorded during surgery is constructed for the first time for this field, addressing the problem of semantic segmentation. Based on this dataset, we propose FUnet (Frame-Unet), which achieves state-of-the-art performance by utilizing inter-frame information and self-attention mechanisms. We also conduct extended exper-iments on a similar polyp endoscopy video dataset and show that the model has good generalization ability with advantageous performance. The dataset and code of this work are presented at: https://github.com/zzzzzzpc/FUnet .
    摘要 内窥镜手术是目前脊柱外科领域的重要治疗手段,在视频引导下避免损伤脊神经是其中的关键挑战。本文提出了内窥镜手术中脊神经的首个实时分割方法,为外科医生提供重要的导航信息。我们首次为该领域构建了一个精心标注的分割数据集,包含手术中连续录制的约10,000帧图像,用于解决语义分割问题。基于该数据集,我们提出了FUnet(Frame-Unet),通过利用帧间信息和自注意力机制达到了当前最佳性能。我们还在一个类似的息肉内窥镜视频数据集上进行了扩展实验,表明该模型具有良好的泛化能力和优势性能。本工作的数据集和代码见:https://github.com/zzzzzzpc/FUnet 。

Soft-tissue Driven Craniomaxillofacial Surgical Planning

  • paper_url: http://arxiv.org/abs/2307.10954
  • repo_url: None
  • paper_authors: Xi Fang, Daeseung Kim, Xuanang Xu, Tianshu Kuang, Nathan Lampen, Jungwook Lee, Hannah H. Deng, Jaime Gateno, Michael A. K. Liebschner, James J. Xia, Pingkun Yan
  • for: correction of facial deformities in CMF surgery
  • methods: soft-tissue driven framework that combines bony planner network and facial simulator network
  • results: improved accuracy and efficacy of surgical planning compared to conventional bone-driven approach
    Abstract In CMF surgery, the planning of bony movement to achieve a desired facial outcome is a challenging task. Current bone driven approaches focus on normalizing the bone with the expectation that the facial appearance will be corrected accordingly. However, due to the complex non-linear relationship between bony structure and facial soft-tissue, such bone-driven methods are insufficient to correct facial deformities. Despite efforts to simulate facial changes resulting from bony movement, surgical planning still relies on iterative revisions and educated guesses. To address these issues, we propose a soft-tissue driven framework that can automatically create and verify surgical plans. Our framework consists of a bony planner network that estimates the bony movements required to achieve the desired facial outcome and a facial simulator network that can simulate the possible facial changes resulting from the estimated bony movement plans. By combining these two models, we can verify and determine the final bony movement required for planning. The proposed framework was evaluated using a clinical dataset, and our experimental results demonstrate that the soft-tissue driven approach greatly improves the accuracy and efficacy of surgical planning when compared to the conventional bone-driven approach.
    摘要 在颅颌面(CMF)手术中,为达到期望的面部效果而规划骨骼移动是一项具有挑战性的任务。现有的骨骼驱动方法侧重于将骨骼复位到正常位置,并期望面部外观随之得到矫正。然而,由于骨骼结构与面部软组织之间复杂的非线性关系,这类骨骼驱动方法不足以矫正面部畸形。尽管已有工作尝试模拟骨骼移动引起的面部变化,手术规划仍依赖于反复修订和经验推测。为解决这些问题,我们提出了一种软组织驱动的框架,可自动生成并验证手术方案。该框架包括一个骨骼规划网络,用于估计达到期望面部效果所需的骨骼移动;以及一个面部模拟网络,用于模拟估计的骨骼移动方案可能引起的面部变化。将这两个模型结合,即可验证并确定最终的骨骼移动规划。我们使用临床数据集对该框架进行了评估,实验结果表明,与传统的骨骼驱动方法相比,软组织驱动方法显著提高了手术规划的准确性和有效性。

Improving Online Lane Graph Extraction by Object-Lane Clustering

  • paper_url: http://arxiv.org/abs/2307.10947
  • repo_url: https://github.com/ybarancan/object_lane
  • paper_authors: Yigit Baran Can, Alexander Liniger, Danda Pani Paudel, Luc Van Gool
  • for: 提高自动驾驶的本地场景理解精度
  • methods: 使用3D对象探测输出进行中心线分配和路径估计
  • results: 提高了本地路径估计精度,比前景方法有显著改进
    Abstract Autonomous driving requires accurate local scene understanding information. To this end, autonomous agents deploy object detection and online BEV lane graph extraction methods as a part of their perception stack. In this work, we propose an architecture and loss formulation to improve the accuracy of local lane graph estimates by using 3D object detection outputs. The proposed method learns to assign the objects to centerlines by considering the centerlines as cluster centers and the objects as data points to be assigned a probability distribution over the cluster centers. This training scheme ensures direct supervision on the relationship between lanes and objects, thus leading to better performance. The proposed method improves lane graph estimation substantially over state-of-the-art methods. The extensive ablations show that our method can achieve significant performance improvements by using the outputs of existing 3D object detection methods. Since our method uses the detection outputs rather than detection method intermediate representations, a single model of our method can use any detection method at test time.
    摘要 自动驾驶需要准确的局部场景理解信息。为此,自主代理将目标检测和在线BEV车道图提取方法作为其感知堆栈的一部分。在这项工作中,我们提出了一种架构和损失函数设计,利用3D目标检测的输出来提高局部车道图估计的准确性。所提方法将中心线视为聚类中心、将目标视为数据点,学习为每个目标分配一个关于各中心线的概率分布。这种训练方案对车道与目标之间的关系提供了直接监督,从而带来更好的性能。该方法在车道图估计上显著优于当前最佳方法。大量消融实验表明,利用现有3D目标检测方法的输出即可取得显著的性能提升;由于方法使用的是检测输出而非检测方法的中间表示,同一模型在测试时可以搭配任意检测方法使用。
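
A rough sketch of the clustering view described above: detected objects act as data points, lane centerlines act as cluster centers, and each object receives a probability distribution over centerlines. The softmax-over-distances assignment and the temperature below are illustrative assumptions, not the paper's exact formulation.

```python
# Soft assignment of detected objects to lane centerlines.
import torch

def point_to_polyline_dist(points: torch.Tensor, polyline: torch.Tensor) -> torch.Tensor:
    """Min distance from each 2D point (M,2) to the vertices of one polyline (P,2)."""
    return torch.cdist(points, polyline).min(dim=1).values

def assign_objects_to_centerlines(objects_xy: torch.Tensor,
                                  centerlines: list,
                                  temperature: float = 2.0) -> torch.Tensor:
    """Return an (M, K) soft-assignment matrix of objects over K centerlines."""
    dists = torch.stack([point_to_polyline_dist(objects_xy, cl) for cl in centerlines], dim=1)
    return torch.softmax(-dists / temperature, dim=1)

objects_xy = torch.tensor([[1.0, 0.5], [4.0, 3.2], [8.0, -0.1]])
centerlines = [torch.tensor([[0.0, 0.0], [10.0, 0.0]]),        # straight lane
               torch.tensor([[0.0, 3.0], [10.0, 3.0]])]        # parallel lane
print(assign_objects_to_centerlines(objects_xy, centerlines))
```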

OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios

  • paper_url: http://arxiv.org/abs/2307.10934
  • repo_url: None
  • paper_authors: Aditya Nalgunda Ganesh, Dhruval Pobbathi Badrinath, Harshith Mohan Kumar, Priya SS, Surabhi Narayan
  • for: 提升自主导航中的环境感知,这类方法通常使用自监督单目深度估计算法输出的视差图。
  • methods: 提出了一个名为OCTraN的Transformer架构,使用迭代注意力将2D图像特征转换为3D占用(occupancy)特征,并使用卷积与转置卷积高效地处理空间信息。
  • results: 提出了一个自监督训练管道,通过用增强的单目深度估计得到的伪真值标签替代LiDAR真值,使模型能够泛化到任意场景。
    Abstract Modern approaches for vision-centric environment perception for autonomous navigation make extensive use of self-supervised monocular depth estimation algorithms that output disparity maps. However, when this disparity map is projected onto 3D space, the errors in disparity are magnified, resulting in a depth estimation error that increases quadratically as the distance from the camera increases. Though Light Detection and Ranging (LiDAR) can solve this issue, it is expensive and not feasible for many applications. To address the challenge of accurate ranging with low-cost sensors, we propose, OCTraN, a transformer architecture that uses iterative-attention to convert 2D image features into 3D occupancy features and makes use of convolution and transpose convolution to efficiently operate on spatial information. We also develop a self-supervised training pipeline to generalize the model to any scene by eliminating the need for LiDAR ground truth by substituting it with pseudo-ground truth labels obtained from boosted monocular depth estimation.
    摘要 现代自主导航中的环境感知方法广泛使用自监督单目深度估计算法,其输出为视差图。但当视差图被投影到3D空间时,视差误差会被放大,导致深度估计误差随与相机距离的增大而呈二次增长。虽然激光雷达(LiDAR)可以解决这个问题,但其成本高昂,许多应用难以采用。为了用低成本传感器实现准确测距,我们提出了OCTraN,一种基于Transformer的架构,利用迭代注意力将2D图像特征转换为3D占用特征,并使用卷积和转置卷积高效地处理空间信息。我们还开发了一个自监督训练管道,用增强的单目深度估计得到的伪真值标签替代LiDAR真值,从而使模型能够泛化到任意场景。

Modeling 3D cardiac contraction and relaxation with point cloud deformation networks

  • paper_url: http://arxiv.org/abs/2307.10927
  • repo_url: None
  • paper_authors: Marcel Beetz, Abhirup Banerjee, Vicente Grau
  • for: 该研究旨在提出一种基于点云深度学习的新方法,用于模拟心脏功能的3D弹性变化。
  • methods: 该方法使用点云深度学习的最新进展,将心脏功能的3D点云表示转化为多级别的特征学习。
  • results: 研究人员对大量的UK Biobank数据集进行了测试,并发现了PCD-Net可以准确地预测心脏功能的3D弹性变化,并且可以更好地识别正常人群和急性心肺病(MI)患者之间的差异。
    Abstract Global single-valued biomarkers of cardiac function typically used in clinical practice, such as ejection fraction, provide limited insight on the true 3D cardiac deformation process and hence, limit the understanding of both healthy and pathological cardiac mechanics. In this work, we propose the Point Cloud Deformation Network (PCD-Net) as a novel geometric deep learning approach to model 3D cardiac contraction and relaxation between the extreme ends of the cardiac cycle. It employs the recent advances in point cloud-based deep learning into an encoder-decoder structure, in order to enable efficient multi-scale feature learning directly on multi-class 3D point cloud representations of the cardiac anatomy. We evaluate our approach on a large dataset of over 10,000 cases from the UK Biobank study and find average Chamfer distances between the predicted and ground truth anatomies below the pixel resolution of the underlying image acquisition. Furthermore, we observe similar clinical metrics between predicted and ground truth populations and show that the PCD-Net can successfully capture subpopulation-specific differences between normal subjects and myocardial infarction (MI) patients. We then demonstrate that the learned 3D deformation patterns outperform multiple clinical benchmarks by 13% and 7% in terms of area under the receiver operating characteristic curve for the tasks of prevalent MI detection and incident MI prediction and by 7% in terms of Harrell's concordance index for MI survival analysis.
    摘要 临床实践中常用的全局单值心功能生物标志物(如射血分数)对真实的3D心脏形变过程只能提供有限的信息,因此限制了对健康与病理心脏力学的理解。在这项工作中,我们提出了点云形变网络(PCD-Net),一种新的几何深度学习方法,用于建模心动周期两个极端时相之间的3D心脏收缩与舒张。它将点云深度学习的最新进展融入编码器-解码器结构,从而直接在多类别3D点云表示的心脏解剖上进行高效的多尺度特征学习。我们在UK Biobank研究的超过10,000例数据上评估了该方法,预测解剖与真值解剖之间的平均Chamfer距离低于底层图像采集的像素分辨率。此外,预测群体与真值群体的临床指标相近,并且PCD-Net能够成功捕获正常受试者与心肌梗死(MI)患者之间的亚群差异。我们进一步表明,所学的3D形变模式在患病MI检测和新发MI预测任务上的ROC曲线下面积分别比多个临床基准高13%和7%,在MI生存分析上的Harrell一致性指数高7%。

Confidence intervals for performance estimates in 3D medical image segmentation

  • paper_url: http://arxiv.org/abs/2307.10926
  • repo_url: https://github.com/rosanajurdi/SegVal_TMI
  • paper_authors: R. El Jurdi, G. Varoquaux, O. Colliot
  • for: 本研究探讨了医疗像素分割模型的实际评估方法,以及如何更正准确地计算测试集大小所需的样本数量。
  • methods: 本研究使用了nnU-net框架和医疗挑战赛 dataset,并使用了两个性能指标: dice精度和 Hausdorff 距离。
  • results: 研究发现,在不同的测试集大小和性能指标的标准差下,参数型 confidence interval 是一个可靠的估计方法,并且需要更少的样本数量来达到给定的精度水平。 Typically, 需要100-200个测试样本,而在更Difficult segmentation task中,可能需要更多的样本。
    Abstract Medical segmentation models are evaluated empirically. As such an evaluation is based on a limited set of example images, it is unavoidably noisy. Beyond a mean performance measure, reporting confidence intervals is thus crucial. However, this is rarely done in medical image segmentation. The width of the confidence interval depends on the test set size and on the spread of the performance measure (its standard-deviation across of the test set). For classification, many test images are needed to avoid wide confidence intervals. Segmentation, however, has not been studied, and it differs by the amount of information brought by a given test image. In this paper, we study the typical confidence intervals in medical image segmentation. We carry experiments on 3D image segmentation using the standard nnU-net framework, two datasets from the Medical Decathlon challenge and two performance measures: the Dice accuracy and the Hausdorff distance. We show that the parametric confidence intervals are reasonable approximations of the bootstrap estimates for varying test set sizes and spread of the performance metric. Importantly, we show that the test size needed to achieve a given precision is often much lower than for classification tasks. Typically, a 1% wide confidence interval requires about 100-200 test samples when the spread is low (standard-deviation around 3%). More difficult segmentation tasks may lead to higher spreads and require over 1000 samples.
    摘要 医学分割模型通常通过实证进行评估。由于这种评估基于有限的示例图像,不可避免地带有噪声,因此除了报告平均性能指标外,报告置信区间也至关重要;然而在医学图像分割中很少有人这样做。置信区间的宽度取决于测试集大小以及性能指标的离散程度(其在测试集上的标准差)。对于分类任务,需要大量测试图像才能避免过宽的置信区间;而分割任务尚未被研究,且每张测试图像所携带的信息量有所不同。在本文中,我们研究了医学图像分割中典型的置信区间。我们使用标准的nnU-net框架、Medical Decathlon挑战赛中的两个数据集以及两个性能指标(Dice精度和Hausdorff距离),在3D图像分割上开展实验。我们表明,在不同的测试集大小和性能指标离散程度下,参数化置信区间是自助法(bootstrap)估计的合理近似。重要的是,我们发现达到给定精度所需的测试集大小往往远低于分类任务:当离散程度较低(标准差约3%)时,1%宽的置信区间通常只需约100-200个测试样本;更困难的分割任务可能产生更高的离散程度,需要超过1000个样本。
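
The two interval constructions compared in the paper can be sketched in a few lines: a parametric (normal-approximation) confidence interval for the mean Dice score versus a percentile bootstrap interval. The Dice scores below are synthetic stand-ins for a real test set.

```python
# Parametric vs. bootstrap 95% confidence intervals for the mean Dice score.
import numpy as np

rng = np.random.default_rng(0)
dice = np.clip(rng.normal(loc=0.85, scale=0.03, size=150), 0, 1)  # 150 test cases

# Parametric 95% CI: mean +/- 1.96 * std / sqrt(n)
n = len(dice)
mean, sem = dice.mean(), dice.std(ddof=1) / np.sqrt(n)
param_ci = (mean - 1.96 * sem, mean + 1.96 * sem)

# Percentile bootstrap 95% CI over resampled test sets
boot_means = np.array([rng.choice(dice, size=n, replace=True).mean()
                       for _ in range(10_000)])
boot_ci = tuple(np.percentile(boot_means, [2.5, 97.5]))

print(f"mean Dice = {mean:.4f}")
print(f"parametric 95% CI: [{param_ci[0]:.4f}, {param_ci[1]:.4f}]")
print(f"bootstrap  95% CI: [{boot_ci[0]:.4f}, {boot_ci[1]:.4f}]")
```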

Intrinsic Appearance Decomposition Using Point Cloud Representation

  • paper_url: http://arxiv.org/abs/2307.10924
  • repo_url: https://github.com/xyxingx/PoInt-Net
  • paper_authors: Xiaoyan Xing, Konrad Groh, Sezer Karaoglu, Theo Gevers
  • for: 根据点云数据协同预测照明、质量和阴影,解决图像内部积分问题。
  • methods: 提议使用点云表示来解决图像内部积分问题,并使用Point Intrinsic Net(简称PoInt-Net)结合预测照明、光源方向和阴影。
  • results: 对多个数据集进行比较,PoInt-Net具有高精度、效率和稳定性。具体来说,它在多个纬度上超过了基于2D图像的方法,并且可以在任何大小的点云上培训和稳定地运行。
    Abstract Intrinsic decomposition is to infer the albedo and shading from the image. Since it is a heavily ill-posed problem, previous methods rely on prior assumptions from 2D images, however, the exploration of the data representation itself is limited. The point cloud is known as a rich format of scene representation, which naturally aligns the geometric information and the color information of an image. Our proposed method, Point Intrinsic Net, in short, PoInt-Net, jointly predicts the albedo, light source direction, and shading, using point cloud representation. Experiments reveal the benefits of PoInt-Net, in terms of accuracy, it outperforms 2D representation approaches on multiple metrics across datasets; in terms of efficiency, it trains on small-scale point clouds and performs stably on any-scale point clouds; in terms of robustness, it only trains on single object level dataset, and demonstrates reasonable generalization ability for unseen objects and scenes.
    摘要 本征分解旨在从图像中推断反照率(albedo)和明暗(shading)。由于这是一个高度病态的问题,以往的方法依赖于来自2D图像的先验假设,而对数据表示本身的探索十分有限。点云是一种信息丰富的场景表示形式,能够自然地将几何信息与图像的颜色信息对齐。我们提出的方法Point Intrinsic Net(简称PoInt-Net)利用点云表示,联合预测反照率、光源方向和明暗。实验表明了PoInt-Net的优势:在准确性方面,它在多个数据集的多项指标上优于基于2D表示的方法;在效率方面,它可在小规模点云上训练,并能稳定地处理任意规模的点云;在鲁棒性方面,它仅在单个物体级别的数据集上训练,便对未见过的物体和场景表现出合理的泛化能力。
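
A minimal sketch of the basic intrinsic-image relation underlying this task: the input image should be reproduced by the product of the predicted albedo and shading. The reconstruction loss below is a generic self-consistency term for illustration, not PoInt-Net's training objective.

```python
# Intrinsic decomposition self-consistency: image ≈ albedo * shading.
import torch
import torch.nn.functional as F

def reconstruction_loss(image: torch.Tensor,
                        albedo: torch.Tensor,
                        shading: torch.Tensor) -> torch.Tensor:
    """image, albedo: (N, 3, H, W); shading: (N, 1, H, W), broadcast over RGB."""
    reconstructed = albedo * shading
    return F.mse_loss(reconstructed, image)

N, H, W = 2, 64, 64
albedo = torch.rand(N, 3, H, W)
shading = torch.rand(N, 1, H, W)
image = albedo * shading                     # ideal case: loss should be ~0
print(float(reconstruction_loss(image, albedo, shading)))
```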

Language-based Action Concept Spaces Improve Video Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2307.10922
  • repo_url: None
  • paper_authors: Kanchana Ranasinghe, Michael Ryoo
  • for: 学习高效传递和稳定的视频 Representation
  • methods: 使用语言关联自我指导学习对图像 CLIP 模型进行适应
  • results: 在三个动作识别基准上提升了零样本(zero-shot)和线性探测(linear probing)性能
    Abstract Recent contrastive language image pre-training has led to learning highly transferable and robust image representations. However, adapting these models to video domains with minimal supervision remains an open problem. We explore a simple step in that direction, using language tied self-supervised learning to adapt an image CLIP model to the video domain. A backbone modified for temporal modeling is trained under self-distillation settings with train objectives operating in an action concept space. Feature vectors of various action concepts extracted from a language encoder using relevant textual prompts construct this space. We introduce two train objectives, concept distillation and concept alignment, that retain generality of original representations while enforcing relations between actions and their attributes. Our approach improves zero-shot and linear probing performance on three action recognition benchmarks.
    摘要 近期的对比式语言-图像预训练已能学到高度可迁移且鲁棒的图像表示。然而,如何在极少监督下将这些模型适配到视频领域仍是一个开放问题。我们朝这个方向探索了一个简单的步骤:利用与语言绑定的自监督学习,将图像CLIP模型适配到视频域。我们在自蒸馏设置下训练一个为时间建模而修改的骨干网络,其训练目标作用于动作概念空间中;该空间由语言编码器根据相关文本提示提取的各类动作概念特征向量构成。我们引入两个训练目标:概念蒸馏和概念对齐,在保持原始表示通用性的同时,强化动作与其属性之间的关系。我们的方法在三个动作识别基准上提升了零样本和线性探测性能。

Revisiting Fine-Tuning Strategies for Self-supervised Medical Imaging Analysis

  • paper_url: http://arxiv.org/abs/2307.10915
  • repo_url: None
  • paper_authors: Muhammad Osama Khan, Yi Fang
  • for: 本研究旨在探讨自监督学习(SSL)预训练知识如何被更有效地利用,以及微调的网络层数与性能之间的关系。
  • methods: 本研究建立了强大的对比式与恢复式SSL基线,并在多个预训练和微调数据集以及不同微调数据规模上进行了广泛的微调分析。
  • results: 研究发现,在四个不同的下游任务上,微调中间层比端到端微调更有效,且不同类型的SSL对应的最优微调层段不同。基于这些发现,研究提出了一种简单而有效的多SSL模型互补利用方法,进一步提升了自监督医学影像分析的性能。
    Abstract Despite the rapid progress in self-supervised learning (SSL), end-to-end fine-tuning still remains the dominant fine-tuning strategy for medical imaging analysis. However, it remains unclear whether this approach is truly optimal for effectively utilizing the pre-trained knowledge, especially considering the diverse categories of SSL that capture different types of features. In this paper, we first establish strong contrastive and restorative SSL baselines that outperform SOTA methods across four diverse downstream tasks. Building upon these strong baselines, we conduct an extensive fine-tuning analysis across multiple pre-training and fine-tuning datasets, as well as various fine-tuning dataset sizes. Contrary to the conventional wisdom of fine-tuning only the last few layers of a pre-trained network, we show that fine-tuning intermediate layers is more effective, with fine-tuning the second quarter (25-50%) of the network being optimal for contrastive SSL whereas fine-tuning the third quarter (50-75%) of the network being optimal for restorative SSL. Compared to the de-facto standard of end-to-end fine-tuning, our best fine-tuning strategy, which fine-tunes a shallower network consisting of the first three quarters (0-75%) of the pre-trained network, yields improvements of as much as 5.48%. Additionally, using these insights, we propose a simple yet effective method to leverage the complementary strengths of multiple SSL models, resulting in enhancements of up to 3.57% compared to using the best model alone. Hence, our fine-tuning strategies not only enhance the performance of individual SSL models, but also enable effective utilization of the complementary strengths offered by multiple SSL models, leading to significant improvements in self-supervised medical imaging analysis.
    摘要 尽管自监督学习(SSL)进展迅速,端到端微调仍是医学影像分析中最主要的微调策略。然而,考虑到不同类别的SSL捕获的特征类型各异,这种方式是否能真正有效地利用预训练知识尚不清楚。在本文中,我们首先建立了强大的对比式与恢复式SSL基线,在四个不同的下游任务上超越了当前最佳方法。在这些基线之上,我们对多个预训练与微调数据集以及不同的微调数据规模进行了广泛的微调分析。与只微调预训练网络最后几层的传统做法相反,我们发现微调中间层更为有效:对比式SSL的最优微调区间是网络的第二个四分之一(25-50%),而恢复式SSL的最优区间是第三个四分之一(50-75%)。与事实上的标准做法端到端微调相比,我们的最佳微调策略(微调由预训练网络前四分之三(0-75%)构成的较浅网络)带来了最高5.48%的提升。此外,基于这些发现,我们提出了一种简单而有效的方法来利用多个SSL模型的互补优势,相比单独使用最佳模型可再提升最高3.57%。因此,我们的微调策略不仅提升了单个SSL模型的性能,还能有效利用多个SSL模型的互补优势,为自监督医学影像分析带来显著改进。
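
The layer-fraction fine-tuning strategies compared above can be sketched by toggling requires_grad on encoder blocks according to their relative depth (e.g. the 25-50% quarter, or the first 0-75% of the network). The toy encoder below stands in for a pre-trained SSL backbone; it is an illustration of the idea, not the paper's training code.

```python
# Freeze or unfreeze encoder blocks by their relative depth fraction.
import torch.nn as nn

def set_trainable_fraction(blocks: nn.ModuleList, lo: float, hi: float) -> None:
    """Make only blocks whose relative depth falls in [lo, hi) trainable."""
    n = len(blocks)
    for i, block in enumerate(blocks):
        depth = i / n
        trainable = lo <= depth < hi
        for p in block.parameters():
            p.requires_grad = trainable

encoder = nn.ModuleList([nn.Linear(32, 32) for _ in range(12)])  # 12 "layers"

set_trainable_fraction(encoder, 0.25, 0.50)   # second quarter (contrastive SSL)
print([i for i, b in enumerate(encoder) if next(b.parameters()).requires_grad])

set_trainable_fraction(encoder, 0.00, 0.75)   # shallower 0-75% network variant
print([i for i, b in enumerate(encoder) if next(b.parameters()).requires_grad])
```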

Diffusion Sampling with Momentum for Mitigating Divergence Artifacts

  • paper_url: http://arxiv.org/abs/2307.11118
  • repo_url: https://github.com/sWizad/momentum-diffusion
  • paper_authors: Suttisak Wizadwongsa, Worameth Chinchuthakun, Pramook Khungurn, Amit Raj, Supasorn Suwajanakorn
  • for: 缓解扩散模型采样中的发散伪影,从而在低采样步数下加速采样。
  • methods: 将扩散采样重新表述为ODE/SDE并引入高阶数值方法;针对这些方法在低步数下常产生发散伪影的问题,分析其小稳定域这一可能成因,并提出两种新技术加以解决。
  • results: 第一,将优化中著名的重球(Heavy Ball)动量引入现有的扩散数值方法以扩大其稳定域,并证明所得方法具有一阶收敛性;第二,提出一种新的高阶方法:广义重球(GHVB),可在精度与伪影抑制之间灵活权衡。实验表明,这些技术能有效减少伪影并提高图像质量,在像素空间和隐空间扩散模型的低步数采样上均超越当前最先进的扩散求解器。
    Abstract Despite the remarkable success of diffusion models in image generation, slow sampling remains a persistent issue. To accelerate the sampling process, prior studies have reformulated diffusion sampling as an ODE/SDE and introduced higher-order numerical methods. However, these methods often produce divergence artifacts, especially with a low number of sampling steps, which limits the achievable acceleration. In this paper, we investigate the potential causes of these artifacts and suggest that the small stability regions of these methods could be the principal cause. To address this issue, we propose two novel techniques. The first technique involves the incorporation of Heavy Ball (HB) momentum, a well-known technique for improving optimization, into existing diffusion numerical methods to expand their stability regions. We also prove that the resulting methods have first-order convergence. The second technique, called Generalized Heavy Ball (GHVB), constructs a new high-order method that offers a variable trade-off between accuracy and artifact suppression. Experimental results show that our techniques are highly effective in reducing artifacts and improving image quality, surpassing state-of-the-art diffusion solvers on both pixel-based and latent-based diffusion models for low-step sampling. Our research provides novel insights into the design of numerical methods for future diffusion work.
    摘要 尽管扩散模型在图像生成中取得了显著成功,采样速度慢仍是一个长期存在的问题。为了加速采样过程,已有研究将扩散采样重新表述为ODE/SDE并引入高阶数值方法。然而,这些方法常常产生发散伪影,尤其是在采样步数较少时,从而限制了可实现的加速。在本文中,我们研究了这些伪影的可能成因,认为这些方法的小稳定域可能是主要原因。为解决这一问题,我们提出了两种新技术。第一种技术将优化领域著名的重球(HB)动量引入现有的扩散数值方法,以扩大其稳定域;我们还证明了所得方法具有一阶收敛性。第二种技术称为广义重球(GHVB),构建了一种新的高阶方法,可在精度与伪影抑制之间灵活权衡。实验结果表明,我们的技术能有效减少伪影并提高图像质量,在像素空间与隐空间扩散模型的低步数采样上均超越当前最先进的扩散求解器。我们的研究为未来扩散工作中数值方法的设计提供了新的见解。
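
A minimal sketch of Polyak/Heavy Ball momentum applied to a generic first-order sampler update, the mechanism the paper injects into diffusion numerical methods. The toy ODE and the specific update form (v ← βv + d, x ← x + h·v) are illustrative assumptions, not the paper's exact scheme or a diffusion model.

```python
# Heavy Ball momentum on a generic Euler-style ODE sampler step.
import numpy as np

def drift(x: np.ndarray, t: float) -> np.ndarray:
    return -x  # toy probability-flow-style ODE dx/dt = -x

def heavy_ball_sampler(x0: np.ndarray, t0: float, t1: float,
                       steps: int, beta: float = 0.4) -> np.ndarray:
    h = (t1 - t0) / steps
    x, v = x0.copy(), np.zeros_like(x0)
    for i in range(steps):
        t = t0 + i * h
        v = beta * v + drift(x, t)      # momentum accumulates past directions
        x = x + h * v                   # update along the averaged direction
    return x

x0 = np.array([1.0, -2.0, 0.5])
print(heavy_ball_sampler(x0, 0.0, 1.0, steps=8, beta=0.0))   # plain Euler
print(heavy_ball_sampler(x0, 0.0, 1.0, steps=8, beta=0.4))   # with HB momentum
```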

WeakPolyp: You Only Look Bounding Box for Polyp Segmentation

  • paper_url: http://arxiv.org/abs/2307.10912
  • repo_url: https://github.com/weijun88/weakpolyp
  • paper_authors: Jun Wei, Yiwen Hu, Shuguang Cui, S. Kevin Zhou, Zhen Li
  • for: 这篇论文的目的是提出一种仅基于边界框(bounding box)标注的弱监督息肉分割模型(即 WeakPolyp),以降低标注成本。
  • methods: 该模型使用 mask-to-box(M2B)变换来减少粗糙标注带来的干扰(示意见本条目末尾),并使用尺度一致性(scale consistency,SC)损失提供稠密监督。
  • results: 实验表明,所提出的 WeakPolyp 模型无需任何掩码标注即可达到与完全监督模型相当的性能。
    Abstract Limited by expensive pixel-level labels, polyp segmentation models are plagued by data shortage and suffer from impaired generalization. In contrast, polyp bounding box annotations are much cheaper and more accessible. Thus, to reduce labeling cost, we propose to learn a weakly supervised polyp segmentation model (i.e., WeakPolyp) completely based on bounding box annotations. However, coarse bounding boxes contain too much noise. To avoid interference, we introduce the mask-to-box (M2B) transformation. By supervising the outer box mask of the prediction instead of the prediction itself, M2B greatly mitigates the mismatch between the coarse label and the precise prediction. But, M2B only provides sparse supervision, leading to non-unique predictions. Therefore, we further propose a scale consistency (SC) loss for dense supervision. By explicitly aligning predictions across the same image at different scales, the SC loss largely reduces the variation of predictions. Note that our WeakPolyp is a plug-and-play model, which can be easily ported to other appealing backbones. Besides, the proposed modules are only used during training, bringing no computation cost to inference. Extensive experiments demonstrate the effectiveness of our proposed WeakPolyp, which surprisingly achieves a comparable performance with a fully supervised model, requiring no mask annotations at all.
    摘要 受限于昂贵的像素级标注,息肉分割模型面临数据短缺问题,泛化能力受损。相比之下,息肉的边界框标注更便宜、更易获取。因此,为降低标注成本,我们提出完全基于边界框标注来学习一个弱监督息肉分割模型(即 WeakPolyp)。然而,粗糙的边界框包含过多噪声。为避免干扰,我们引入 mask-to-box(M2B)变换:通过监督预测的外接框掩码而非预测本身,M2B 大大缓解了粗糙标签与精确预测之间的不匹配。但 M2B 只提供稀疏监督,会导致预测不唯一。为此,我们进一步提出尺度一致性(SC)损失以提供稠密监督:通过显式对齐同一图像在不同尺度下的预测,SC 损失显著减少了预测的变化。值得注意的是,WeakPolyp 是一个即插即用的模型,可以轻松移植到其他骨干网络上;所提模块仅在训练时使用,不给推理带来任何计算开销。大量实验证明了所提 WeakPolyp 的有效性,它在完全不使用掩码标注的情况下,取得了与完全监督模型相当的性能。
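
One way to read the mask-to-box (M2B) idea is sketched below: a predicted mask is projected onto its row and column extents to form an outer box mask, which can then be supervised with box-level labels. Thresholding and the projection details are simplifying assumptions; see the linked repository for the authors' implementation.

```python
# Mask-to-box style transformation: project a predicted mask to its outer box mask.
import torch

def mask_to_box_mask(pred_mask: torch.Tensor, thr: float = 0.5) -> torch.Tensor:
    """pred_mask: (N, H, W) probabilities -> (N, H, W) binary outer-box masks."""
    binary = (pred_mask > thr).float()
    rows = binary.amax(dim=2, keepdim=True)   # (N, H, 1): rows touched by the mask
    cols = binary.amax(dim=1, keepdim=True)   # (N, 1, W): columns touched by the mask
    return rows * cols                        # outer product = tight bounding-box mask

pred = torch.zeros(1, 8, 8)
pred[0, 2:5, 3:7] = 0.9                       # fake polyp prediction
print(mask_to_box_mask(pred)[0].int())
```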

Variational Point Encoding Deformation for Dental Modeling

  • paper_url: http://arxiv.org/abs/2307.10895
  • repo_url: None
  • paper_authors: Johan Ziruo Ye, Thomas Ørkild, Peter Lempel Søndergaard, Søren Hauberg
  • for: 本研究旨在鼓励进一步的数字牙科研究,发布新的大量牙齿数据集,并提出Variational FoldingNet(VF-Net)模型,用于掌握牙齿点云表示的概率学学习。
  • methods: 本研究提出Variational FoldingNet(VF-Net),它扩展了FoldingNet,以实现牙齿点云表示的概率学习。VF-Net通过引入合适的编码器替代对Chamfer距离的显式最小化,解决了Chamfer距离缺乏归一化分布对应、难以用于概率模型的问题,既提高了计算效率,也简化了概率扩展。
  • results: 实验结果表明,VF-Net在牙齿扫描重建和外推方面优于现有模型,其隐表示也表现出良好的鲁棒性。这些结果表明VF-Net是一种有效且可靠的点云重建与分析方法。
    Abstract Digital dentistry has made significant advancements in recent years, yet numerous challenges remain to be addressed. In this study, we release a new extensive dataset of tooth meshes to encourage further research. Additionally, we propose Variational FoldingNet (VF-Net), which extends FoldingNet to enable probabilistic learning of point cloud representations. A key challenge in existing latent variable models for point clouds is the lack of a 1-to-1 mapping between input points and output points. Instead, they must rely on optimizing Chamfer distances, a metric that does not have a normalized distributional counterpart, preventing its usage in probabilistic models. We demonstrate that explicit minimization of Chamfer distances can be replaced by a suitable encoder, which allows us to increase computational efficiency while simplifying the probabilistic extension. Our experimental findings present empirical evidence demonstrating the superior performance of VF-Net over existing models in terms of dental scan reconstruction and extrapolation. Additionally, our investigation highlights the robustness of VF-Net's latent representations. These results underscore the promising prospects of VF-Net as an effective and reliable method for point cloud reconstruction and analysis.
    摘要 数字牙科近年来取得了显著进展,但仍有许多挑战有待解决。在本研究中,我们发布了一个新的大规模牙齿网格数据集,以推动进一步研究。此外,我们提出了Variational FoldingNet(VF-Net),它扩展了FoldingNet,以实现点云表示的概率学习。现有点云隐变量模型的一个关键挑战是输入点与输出点之间缺乏一一映射,只能依赖于优化Chamfer距离;而该度量没有归一化的分布对应,因而无法用于概率模型。我们证明,对Chamfer距离的显式最小化可以被合适的编码器取代,从而在简化概率扩展的同时提高计算效率。实验结果表明,VF-Net在牙齿扫描重建与外推方面优于现有模型,其隐表示也具有良好的鲁棒性。这些结果显示了VF-Net作为点云重建与分析方法的有效性和可靠性。
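
For reference, the sketch below computes the symmetric (squared) Chamfer distance between two point clouds, the metric whose explicit minimization VF-Net replaces with a suitable encoder. The sampled clouds are synthetic stand-ins for tooth scans.

```python
# Symmetric squared Chamfer distance between two point clouds.
import torch

def chamfer_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """a: (N, 3), b: (M, 3) point clouds -> scalar symmetric Chamfer distance."""
    d = torch.cdist(a, b) ** 2                # (N, M) pairwise squared distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

a = torch.rand(1024, 3)     # e.g. a predicted tooth mesh sampled as points
b = torch.rand(1024, 3)     # the ground-truth scan sampled as points
print(float(chamfer_distance(a, b)))
```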

Human Motion Generation: A Survey

  • paper_url: http://arxiv.org/abs/2307.10894
  • repo_url: None
  • paper_authors: Wentao Zhu, Xiaoxuan Ma, Dongwoo Ro, Hai Ci, Jinlu Zhang, Jiaxin Shi, Feng Gao, Qi Tian, Yizhou Wang
  • for: 这篇论文旨在为人体动作生成技术提供一个全面的文献综述,以便为研究人员提供一个快速入门的机会,并鼓励新的研究方向。
  • methods: 本文提出的方法主要包括文本、音频和场景等条件征文生成人体动作的三大类方法,具体来说是:文本征文生成人体动作、音频征文生成人体动作和场景征文生成人体动作。
  • results: 本文的结果主要表明,人体动作生成领域在过去几年内取得了重要进步,但仍然存在许多挑战,如人体动作的复杂性和条件征文的隐式关系。
    Abstract Human motion generation aims to generate natural human pose sequences and shows immense potential for real-world applications. Substantial progress has been made recently in motion data collection technologies and generation methods, laying the foundation for increasing interest in human motion generation. Most research within this field focuses on generating human motions based on conditional signals, such as text, audio, and scene contexts. While significant advancements have been made in recent years, the task continues to pose challenges due to the intricate nature of human motion and its implicit relationship with conditional signals. In this survey, we present a comprehensive literature review of human motion generation, which, to the best of our knowledge, is the first of its kind in this field. We begin by introducing the background of human motion and generative models, followed by an examination of representative methods for three mainstream sub-tasks: text-conditioned, audio-conditioned, and scene-conditioned human motion generation. Additionally, we provide an overview of common datasets and evaluation metrics. Lastly, we discuss open problems and outline potential future research directions. We hope that this survey could provide the community with a comprehensive glimpse of this rapidly evolving field and inspire novel ideas that address the outstanding challenges.
    摘要 人体动作生成目标是生成自然的人体姿势序列,具有很大的应用前途。最近几年来,人体动作数据采集技术和生成方法的进步非常大,为人体动作生成领域的兴趣培育提供了良好的基础。大多数研究在这个领域都是基于决定信号,如文本、音频和场景上下文,来生成人体动作。虽然在过去几年内有了 significative 的进步,但是这个任务仍然存在许多挑战,主要是因为人体动作的复杂性和决定信号的间接关系。在这篇评论中,我们提供了人体动作生成领域的全面的文献综述,到目前为止没有任何一篇。我们开始于人体动作和生成模型的背景,然后探讨了三个主流子任务的表现:文本决定、音频决定和场景决定的人体动作生成。此外,我们还提供了常用的数据集和评价指标的概述。最后,我们讨论了现有的问题和未来研究方向。我们希望这篇评论可以为这个领域提供一个全面的视图,并鼓励社区提出新的想法,以解决现有的挑战。

Risk-optimized Outlier Removal for Robust Point Cloud Classification

  • paper_url: http://arxiv.org/abs/2307.10875
  • repo_url: None
  • paper_authors: Xinke Li, Junchi Lu
  • for: 提高点云深度模型的可靠性和安全性,抗击意外或自然发生的点云噪声。
  • methods: 提出了一种新的点云异常点除掉方法 called PointCVaR,可以让标准训练的模型消除额外的异常点并重新还原数据。该方法基于对每个点的影响分析,以及Conditional Value at Risk(CVaR)作为目标函数来优化过滤高风险点的过程。
  • results: 在不同的去噪分类实验中,PointCVaR方法能够达到87%的防御效果,从而提高点云分类的精度和可靠性。
    Abstract The popularity of point cloud deep models for safety-critical purposes has increased, but the reliability and security of these models can be compromised by intentional or naturally occurring point cloud noise. To combat this issue, we present a novel point cloud outlier removal method called PointCVaR, which empowers standard-trained models to eliminate additional outliers and restore the data. Our approach begins by conducting attribution analysis to determine the influence of each point on the model output, which we refer to as point risk. We then optimize the process of filtering high-risk points using Conditional Value at Risk (CVaR) as the objective. The rationale for this approach is based on the observation that noise points in point clouds tend to cluster in the tail of the risk distribution, with a low frequency but a high level of risk, resulting in significant interference with classification results. Despite requiring no additional training effort, our method produces exceptional results in various removal-and-classification experiments for noisy point clouds, which are corrupted by random noise, adversarial noise, and backdoor trigger noise. Impressively, it achieves 87% accuracy in defense against the backdoor attack by removing triggers. Overall, the proposed PointCVaR effectively eliminates noise points and enhances point cloud classification, making it a promising plug-in module for various models in different scenarios.
    摘要 面向安全关键应用的点云深度模型日益普及,但有意或自然产生的点云噪声可能损害这些模型的可靠性与安全性。为应对这一问题,我们提出了一种新的点云离群点去除方法PointCVaR,使经过标准训练的模型能够剔除额外的离群点并恢复数据。我们的方法首先进行归因分析,以确定每个点对模型输出的影响,我们称之为点风险;随后以条件风险价值(CVaR)为目标,优化对高风险点的过滤过程。其依据在于,点云中的噪声点往往聚集在风险分布的尾部,出现频率低但风险高,会对分类结果造成显著干扰。该方法无需额外训练,却在多种针对含噪点云(随机噪声、对抗噪声和后门触发噪声)的去除与分类实验中取得了出色的结果,其中通过去除触发器在防御后门攻击时达到了87%的准确率。总体而言,PointCVaR能有效剔除噪声点并提升点云分类性能,是一个可用于不同场景下多种模型的有前景的即插即用模块。
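
A small sketch of the CVaR-style filtering idea: given per-point risk scores, CVaR at level α is the mean risk of the worst (1−α) fraction of points, and those tail points are the removal candidates. The random risk scores below stand in for the attribution-based point risks.

```python
# CVaR over per-point risk scores and tail-based point filtering.
import numpy as np

def cvar(risk: np.ndarray, alpha: float = 0.95) -> float:
    """Conditional Value at Risk: mean of the top (1 - alpha) tail of risk."""
    var = np.quantile(risk, alpha)            # Value at Risk (the alpha-quantile)
    tail = risk[risk >= var]
    return float(tail.mean())

def filter_high_risk_points(points: np.ndarray, risk: np.ndarray,
                            alpha: float = 0.95) -> np.ndarray:
    """Drop points whose risk lies in the (1 - alpha) tail."""
    keep = risk < np.quantile(risk, alpha)
    return points[keep]

rng = np.random.default_rng(1)
points = rng.normal(size=(2048, 3))           # a point cloud
risk = rng.exponential(scale=1.0, size=2048)  # stand-in attribution/risk scores
print("CVaR@0.95 =", round(cvar(risk), 3))
print("kept points:", filter_high_risk_points(points, risk).shape)
```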

Conservative Estimation of Perception Relevance of Dynamic Objects for Safe Trajectories in Automotive Scenarios

  • paper_url: http://arxiv.org/abs/2307.10873
  • repo_url: None
  • paper_authors: Ken Mori, Kai Storms, Steven Peters
  • for: This work addresses a core challenge in testing automated driving systems: defining clear requirements, in particular a usable notion of perception relevance, together with suitable test methods.
  • methods: The collision-safety use case on highways is decomposed into functional scenarios, the possible actions of the ego vehicle and other dynamic objects are formalized as equations, and traffic rules constrain this action set to derive relevance criteria.
  • results: The approach yields a conservative estimate of which dynamic objects are relevant for perception and must be included in a complete evaluation; a visualization on examples from the highD dataset shows the plausibility of the results.
    Abstract Having efficient testing strategies is a core challenge that needs to be overcome for the release of automated driving. This necessitates clear requirements as well as suitable methods for testing. In this work, the requirements for perception modules are considered with respect to relevance. The concept of relevance currently remains insufficiently defined and specified. In this paper, we propose a novel methodology to overcome this challenge by exemplary application to collision safety in the highway domain. Using this general system and use case specification, a corresponding concept for relevance is derived. Irrelevant objects are thus defined as objects which do not limit the set of safe actions available to the ego vehicle under consideration of all uncertainties. As an initial step, the use case is decomposed into functional scenarios with respect to collision relevance. For each functional scenario, possible actions of both the ego vehicle and any other dynamic object are formalized as equations. This set of possible actions is constrained by traffic rules, yielding relevance criteria. As a result, we present a conservative estimation which dynamic objects are relevant for perception and need to be considered for a complete evaluation. The estimation provides requirements which are applicable for offline testing and validation of perception components. A visualization is presented for examples from the highD dataset, showing the plausibility of the results. Finally, a possibility for a future validation of the presented relevance concept is outlined.
    摘要 有效的测试策略是自动驾驶发布的核心挑战之一。这需要清晰的需求以及适合的测试方法。在这项工作中,我们考虑了感知模块的需求,尤其是 relevance 的问题。目前 relevance 的概念尚未充分定义和规定。在这篇论文中,我们提出了一种新的方法来解决这个挑战,并以高速公路领域的碰撞安全为例进行说明。基于这种总体系统和用例规范,我们推导出了相应的 relevance 概念:不相关的对象被定义为在考虑所有不确定性的情况下不会限制本车(ego vehicle)可选安全行动集合的对象。作为第一步,我们将用例按碰撞相关性分解为功能场景,并将每个场景中本车和其他动态对象的可能行动形式化为方程。这些可能的行动集合受交通规则约束,从而得到 relevance 判据。由此我们给出了一个保守估计,指明哪些动态对象与感知相关、需要纳入完整评估;该估计提供了适用于感知组件离线测试和验证的需求。我们还给出了 highD 数据集示例的可视化结果,说明了结果的合理性。最后,我们概述了将来验证该 relevance 概念的一种可能途径。

BlendFace: Re-designing Identity Encoders for Face-Swapping

  • paper_url: http://arxiv.org/abs/2307.10854
  • repo_url: https://github.com/mapooon/blendface
  • paper_authors: Kaede Shiohara, Xingchao Yang, Takafumi Taketomi
  • for: The goal is to disentangle identity from attributes in face-swapping, minimizing the attribute biases that identity encoders inherit from face-recognition pretraining.
  • methods: BlendFace, a novel identity encoder trained on blended images whose attributes are replaced with those of another person, feeds disentangled identity features into the generator and serves as an identity loss to guide it.
  • results: Experiments show that BlendFace improves identity-attribute disentanglement in face-swapping models while maintaining quantitative performance comparable to previous methods.
    Abstract The great advancements of generative adversarial networks and face recognition models in computer vision have made it possible to swap identities on images from single sources. Although a lot of studies seems to have proposed almost satisfactory solutions, we notice previous methods still suffer from an identity-attribute entanglement that causes undesired attributes swapping because widely used identity encoders, eg, ArcFace, have some crucial attribute biases owing to their pretraining on face recognition tasks. To address this issue, we design BlendFace, a novel identity encoder for face-swapping. The key idea behind BlendFace is training face recognition models on blended images whose attributes are replaced with those of another mitigates inter-personal biases such as hairsyles. BlendFace feeds disentangled identity features into generators and guides generators properly as an identity loss function. Extensive experiments demonstrate that BlendFace improves the identity-attribute disentanglement in face-swapping models, maintaining a comparable quantitative performance to previous methods.

Exploring Effective Priors and Efficient Models for Weakly-Supervised Change Detection

  • paper_url: http://arxiv.org/abs/2307.10853
  • repo_url: https://github.com/zhenghuizhao/transwcd
  • paper_authors: Zhenghui Zhao, Lixiang Ru, Chen Wu
  • for: This paper aims to improve weakly-supervised change detection (WSCD), in particular to resolve the change missing and change fabricating problems.
  • methods: Two components are proposed: a Dilated Prior (DP) decoder, which decodes samples with a changed image-level label and assigns an all-unchanged pixel-level label to samples labeled unchanged, and a Label Gated (LG) constraint, derived from the correspondence between change representations and image-level labels, which penalizes the model when it mispredicts the change status.
  • results: Integrating the DP decoder and LG constraint into the transformer-based TransWCD yields TransWCD-DL; the two models achieve +6.33% and +9.55% F1 score improvements on the WHU-CD dataset, exceeding several fully-supervised change detection (FSCD) competitors.
    Abstract Weakly-supervised change detection (WSCD) aims to detect pixel-level changes with only image-level annotations. Owing to its label efficiency, WSCD is drawing increasing attention recently. However, current WSCD methods often encounter the challenge of change missing and fabricating, i.e., the inconsistency between image-level annotations and pixel-level predictions. Specifically, change missing refer to the situation that the WSCD model fails to predict any changed pixels, even though the image-level label indicates changed, and vice versa for change fabricating. To address this challenge, in this work, we leverage global-scale and local-scale priors in WSCD and propose two components: a Dilated Prior (DP) decoder and a Label Gated (LG) constraint. The DP decoder decodes samples with the changed image-level label, skips samples with the unchanged label, and replaces them with an all-unchanged pixel-level label. The LG constraint is derived from the correspondence between changed representations and image-level labels, penalizing the model when it mispredicts the change status. Additionally, we develop TransWCD, a simple yet powerful transformer-based model, showcasing the potential of weakly-supervised learning in change detection. By integrating the DP decoder and LG constraint into TransWCD, we form TransWCD-DL. Our proposed TransWCD and TransWCD-DL achieve significant +6.33% and +9.55% F1 score improvements over the state-of-the-art methods on the WHU-CD dataset, respectively. Some performance metrics even exceed several fully-supervised change detection (FSCD) competitors. Code will be available at https://github.com/zhenghuizhao/TransWCD.
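To make the label-gating idea concrete, here is a minimal sketch of an image-level consistency penalty in the spirit of the LG constraint: predicted change mass is suppressed for images labelled unchanged and encouraged for images labelled changed. The exact LG formulation and the DP decoder are described in the paper; this simplified loss and its names are assumptions for illustration only.

```python
import torch

def label_gated_constraint(change_prob, image_label):
    """A generic image-level consistency penalty in the spirit of the LG constraint.

    change_prob:  (B, H, W) per-pixel probability of "changed" from the WSCD head.
    image_label:  (B,) image-level labels, 1 = changed, 0 = unchanged.

    Images labelled unchanged are penalised for any predicted change mass
    (discourages change fabricating); images labelled changed are penalised when
    no pixel is confidently changed (discourages change missing).
    """
    mean_change = change_prob.flatten(1).mean(dim=1)   # overall change mass per image
    max_change = change_prob.flatten(1).amax(dim=1)    # strongest change response per image
    unchanged = (image_label == 0).float()
    changed = (image_label == 1).float()
    loss = unchanged * mean_change + changed * (1.0 - max_change)
    return loss.mean()

# Toy check: a "changed" image with a confident blob vs. an "unchanged" one.
probs = torch.zeros(2, 64, 64)
probs[0, 20:30, 20:30] = 0.9
labels = torch.tensor([1, 0])
print(label_gated_constraint(probs, labels))   # small value -> consistent predictions
```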

Self-paced Weight Consolidation for Continual Learning

  • paper_url: http://arxiv.org/abs/2307.10845
  • repo_url: https://github.com/congwei45/spWC
  • paper_authors: Wei Cong, Yang Cong, Gan Sun, Yuyang Liu, Jiahua Dong
  • for: This paper proposes a robust continual learning framework that improves both performance and efficiency in sequential task learning.
  • methods: A self-paced Weight Consolidation (spWC) framework evaluates the discriminative contributions of previous tasks, ranking them by difficulty so that knowledge from harder past tasks is selectively preserved, which mitigates catastrophic forgetting at lower computational cost.
  • results: Experiments show that the proposed spWC framework outperforms other popular continual learning algorithms in both performance and efficiency.
    Abstract Continual learning algorithms which keep the parameters of new tasks close to that of previous tasks, are popular in preventing catastrophic forgetting in sequential task learning settings. However, 1) the performance for the new continual learner will be degraded without distinguishing the contributions of previously learned tasks; 2) the computational cost will be greatly increased with the number of tasks, since most existing algorithms need to regularize all previous tasks when learning new tasks. To address the above challenges, we propose a self-paced Weight Consolidation (spWC) framework to attain robust continual learning via evaluating the discriminative contributions of previous tasks. To be specific, we develop a self-paced regularization to reflect the priorities of past tasks via measuring difficulty based on key performance indicator (i.e., accuracy). When encountering a new task, all previous tasks are sorted from "difficult" to "easy" based on the priorities. Then the parameters of the new continual learner will be learned via selectively maintaining the knowledge amongst more difficult past tasks, which could well overcome catastrophic forgetting with less computational cost. We adopt an alternative convex search to iteratively update the model parameters and priority weights in the bi-convex formulation. The proposed spWC framework is plug-and-play, which is applicable to most continual learning algorithms (e.g., EWC, MAS and RCIL) in different directions (e.g., classification and segmentation). Experimental results on several public benchmark datasets demonstrate that our proposed framework can effectively improve performance when compared with other popular continual learning algorithms.
    摘要 continuous learning算法,即保持新任务参数与前一个任务参数相似,在顺序任务学习Setting中广泛应用。然而,1)新的 Contemporary learner表现将受到前一个任务的贡献分化的影响; 2)随着任务数量的增加,大多数现有算法需要对所有前一个任务进行REGULARIZATION,从而提高计算成本。为了解决以上挑战,我们提出了自适应权重卷积(spWC)框架,以实现 Robust continual learning via 评估前一个任务的推导贡献。具体来说,我们开发了一种自适应REGULARIZATION,通过测量难度基于关键性能指标(即准确率)来衡量过去任务的优先级。在接触新任务时,所有前一个任务会被排序为"difficult"到"easy"的顺序,根据优先级。然后,新的 Contemporary learner将通过保留更难的过去任务的知识来胜任 catastrophic forgetting,而且可以降低计算成本。我们采用了一种alternative convex search来逐步更新模型参数和优先级权重在bi-convex表示形式中。我们的spWC框架是可插入的,可以应用到大多数 Contemporary learning算法(例如EWC、MAS和RCIL)以及不同的方向(例如分类和分割)。实验结果表明,我们的提出的框架可以在多个公共 benchmark dataset上提高性能,与其他流行的 Contemporary learning算法相比。
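A minimal sketch of the self-paced idea, assuming task difficulty is measured by accuracy on previously seen tasks and that an EWC-style quadratic penalty is used for consolidation; the softmax weighting, data structures, and function names are illustrative stand-ins, not the paper's exact bi-convex formulation.

```python
import torch

def self_paced_priorities(task_accuracies, temperature=1.0):
    """Turn per-task accuracies into self-paced regularisation priorities.

    Lower accuracy = "more difficult" = higher priority, so harder old tasks get
    stronger protection against forgetting. The softmax temperature controls how
    sharply the weights concentrate on the hardest tasks.
    """
    acc = torch.tensor(task_accuracies, dtype=torch.float32)
    difficulty = 1.0 - acc                       # key performance indicator -> difficulty
    return torch.softmax(difficulty / temperature, dim=0)

def consolidation_penalty(params, old_params, fisher, priorities, task_ids):
    """EWC-style quadratic penalty, re-weighted by the self-paced priorities.

    params:     dict name -> current parameter tensor.
    old_params: dict task_id -> dict of snapshotted parameters.
    fisher:     dict task_id -> dict of parameter importance estimates.
    """
    loss = torch.zeros(())
    for t, w in zip(task_ids, priorities):
        for name, p in params.items():
            loss = loss + w * (fisher[t][name] * (p - old_params[t][name]) ** 2).sum()
    return loss

weights = self_paced_priorities([0.62, 0.85, 0.91])   # hardest old task gets most weight
print(weights)
```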

Global Precipitation Nowcasting of Integrated Multi-satellitE Retrievals for GPM: A U-Net Convolutional LSTM Architecture

  • paper_url: http://arxiv.org/abs/2307.10843
  • repo_url: https://github.com/reyhaneh-92/genesis_nowcast
  • paper_authors: Reyhaneh Rahimi, Ardeshir Ebtehaj, Ali Behrangi, Jackson Tan
  • for: This paper studies a deep learning architecture for near-global precipitation nowcasting every 30 minutes with a 4-hour lead time.
  • methods: The architecture fuses a U-Net with a convolutional LSTM and is trained on IMERG together with key precipitation drivers from GFS; the impact of different training losses, including mean-squared error (regression) and focal loss (classification), on nowcast quality is examined.
  • results: The regression network captures light precipitation (below 1.6 mm/hr) well, while the classification network is better for precipitation extremes (>8 mm/hr) in terms of the critical success index (CSI), and its predicted class probabilities are closer to IMERG by the Wasserstein distance. Including physical variables improves nowcasting, especially at longer lead times. A multi-scale fractions skill score (FSS) analysis with IMERG as reference shows the nowcasting machine remains skillful (FSS > 0.5) at 10 km resolution versus 50 km for GFS; for rates above 4 mm/hr, only the classification network remains FSS-skillful on scales greater than 50 km within a 2-hour lead time.
    Abstract This paper presents a deep learning architecture for nowcasting of precipitation almost globally every 30 min with a 4-hour lead time. The architecture fuses a U-Net and a convolutional long short-term memory (LSTM) neural network and is trained using data from the Integrated MultisatellitE Retrievals for GPM (IMERG) and a few key precipitation drivers from the Global Forecast System (GFS). The impacts of different training loss functions, including the mean-squared error (regression) and the focal-loss (classification), on the quality of precipitation nowcasts are studied. The results indicate that the regression network performs well in capturing light precipitation (below 1.6 mm/hr), but the classification network can outperform the regression network for nowcasting of precipitation extremes (>8 mm/hr), in terms of the critical success index (CSI).. Using the Wasserstein distance, it is shown that the predicted precipitation by the classification network has a closer class probability distribution to the IMERG than the regression network. It is uncovered that the inclusion of the physical variables can improve precipitation nowcasting, especially at longer lead times in both networks. Taking IMERG as a relative reference, a multi-scale analysis in terms of fractions skill score (FSS), shows that the nowcasting machine remains skillful (FSS > 0.5) at the resolution of 10 km compared to 50 km for GFS. For precipitation rates greater than 4~mm/hr, only the classification network remains FSS-skillful on scales greater than 50 km within a 2-hour lead time.
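The fractions skill score used in the multi-scale evaluation is a standard neighbourhood verification metric; a minimal implementation is sketched below. The toy fields, threshold, and window size are placeholders, not the paper's actual data or configuration.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fractions_skill_score(forecast, observed, threshold, window):
    """Fractions Skill Score (FSS) for one rain-rate threshold and window size.

    forecast, observed: 2-D rain-rate fields on the same grid (e.g., mm/hr).
    threshold: exceedance threshold, e.g. 4.0 mm/hr.
    window: neighbourhood size in grid cells.

    FSS = 1 - mean((Pf - Po)^2) / (mean(Pf^2) + mean(Po^2)), where Pf and Po are
    the neighbourhood fractions of threshold exceedance. FSS > 0.5 is the usual
    "skillful" cut-off quoted in the abstract.
    """
    bf = (forecast >= threshold).astype(float)
    bo = (observed >= threshold).astype(float)
    pf = uniform_filter(bf, size=window, mode="constant")
    po = uniform_filter(bo, size=window, mode="constant")
    num = np.mean((pf - po) ** 2)
    den = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - num / den if den > 0 else np.nan

rng = np.random.default_rng(1)
fcst = rng.gamma(0.5, 2.0, size=(200, 200))
obs = np.roll(fcst, 3, axis=0)                 # shifted copy stands in for IMERG
print(round(fractions_skill_score(fcst, obs, threshold=4.0, window=5), 3))
```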

Label Calibration for Semantic Segmentation Under Domain Shift

  • paper_url: http://arxiv.org/abs/2307.10842
  • repo_url: None
  • paper_authors: Ondrej Bohdal, Da Li, Timothy Hospedales
  • for: This paper aims to improve the performance of semantic segmentation models on data from new domains.
  • methods: A pre-trained model is adapted to unlabelled target-domain data by computing soft-label prototypes under the domain shift and predicting according to the prototype closest to the vector of predicted class probabilities; the procedure is fast and nearly free in terms of computational resources.
  • results: The label calibration yields considerable performance improvements on the practical synthetic-to-real semantic segmentation problem.
    Abstract Performance of a pre-trained semantic segmentation model is likely to substantially decrease on data from a new domain. We show a pre-trained model can be adapted to unlabelled target domain data by calculating soft-label prototypes under the domain shift and making predictions according to the prototype closest to the vector with predicted class probabilities. The proposed adaptation procedure is fast, comes almost for free in terms of computational resources and leads to considerable performance improvements. We demonstrate the benefits of such label calibration on the highly-practical synthetic-to-real semantic segmentation problem.
    摘要 “一个预训练的具有 semantics 的标注排序模型在新领域的数据上可能会受到重大的性能下降。我们展示了一个预训练模型可以通过计算域转换下的软标签原型,并根据预测的分类概率来做预测。我们的适应程序快速、Computational资源几乎没有成本,并带来了显著的性能改善。我们在实际上实现了这种标签整合的实用性。”

Parse and Recall: Towards Accurate Lung Nodule Malignancy Prediction like Radiologists

  • paper_url: http://arxiv.org/abs/2307.10824
  • repo_url: None
  • paper_authors: Jianpeng Zhang, Xianghua Ye, Jianfeng Zhang, Yuxing Tang, Minfeng Xu, Jianfei Guo, Xin Chen, Zaiyi Liu, Jingren Zhou, Le Lu, Ling Zhang
  • for: This work aims to improve the accuracy of early lung cancer screening in order to improve survival outcomes.
  • methods: A radiologist-inspired method simulates the diagnostic process with a context parsing module and a prototype recalling module.
  • results: Experiments show advanced screening performance in both low-dose and noncontrast CT scenarios.
    Abstract Lung cancer is a leading cause of death worldwide and early screening is critical for improving survival outcomes. In clinical practice, the contextual structure of nodules and the accumulated experience of radiologists are the two core elements related to the accuracy of identification of benign and malignant nodules. Contextual information provides comprehensive information about nodules such as location, shape, and peripheral vessels, and experienced radiologists can search for clues from previous cases as a reference to enrich the basis of decision-making. In this paper, we propose a radiologist-inspired method to simulate the diagnostic process of radiologists, which is composed of context parsing and prototype recalling modules. The context parsing module first segments the context structure of nodules and then aggregates contextual information for a more comprehensive understanding of the nodule. The prototype recalling module utilizes prototype-based learning to condense previously learned cases as prototypes for comparative analysis, which is updated online in a momentum way during training. Building on the two modules, our method leverages both the intrinsic characteristics of the nodules and the external knowledge accumulated from other nodules to achieve a sound diagnosis. To meet the needs of both low-dose and noncontrast screening, we collect a large-scale dataset of 12,852 and 4,029 nodules from low-dose and noncontrast CTs respectively, each with pathology- or follow-up-confirmed labels. Experiments on several datasets demonstrate that our method achieves advanced screening performance on both low-dose and noncontrast scenarios.
    摘要 肺癌是全球最主要的死亡原因之一, early screening 是提高存活率的关键。在临床实践中, nodule 的上下文结构和 radiologist 的总结经验是识别benign和malignant nodule的两个核心元素。上下文信息提供了 nodule 的位置、形状和周围血管等信息,经验丰富的 radiologist 可以从前次案例中搜寻相似之处作为参考,以扩展基于决策的基础。在这篇论文中,我们提出了一种基于 radiologist 的方法,它包括上下文解析和原型回忆模块。上下文解析模块首先将 nodule 的上下文结构分解,然后将上下文信息聚合以更全面地理解 nodule。原型回忆模块利用原型学习来压缩先前学习的案例,并在线更新。基于这两个模块,我们的方法可以充分利用 nodule 的内在特征和从其他 nodule 积累的外部知识,以实现准确的诊断。为了满足低剂量和不contrast扫描的需求,我们收集了12,852和4,029个 nodule 的低剂量和不contrast CT 数据,每个数据都有 pathology-或 follow-up-确认的标签。实验表明,我们的方法在低剂量和不contrast扫描场景中具有高级别的检测性能。
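The prototype recalling module is described as condensing previously learned cases into prototypes that are updated online in a momentum way; a generic momentum prototype update of that kind is sketched below. The two-class setup, feature dimension, and function names are assumptions for illustration.

```python
import torch

@torch.no_grad()
def update_prototypes(prototypes, features, labels, momentum=0.9):
    """Momentum update of class-wise prototypes from a batch of nodule features.

    prototypes: (C, D) tensor, one prototype per class (e.g., benign / malignant).
    features:   (B, D) embeddings of the current batch.
    labels:     (B,) integer class labels.

    Each prototype drifts slowly toward the mean feature of its class, which is a
    common way to maintain an online "memory" of previously seen cases.
    """
    for c in range(prototypes.size(0)):
        mask = labels == c
        if mask.any():
            class_mean = features[mask].mean(dim=0)
            prototypes[c] = momentum * prototypes[c] + (1.0 - momentum) * class_mean
    return prototypes

protos = torch.zeros(2, 128)
feats = torch.randn(16, 128)
labs = torch.randint(0, 2, (16,))
protos = update_prototypes(protos, feats, labs)
print(protos.shape)
```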

Gradient-Semantic Compensation for Incremental Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2307.10822
  • repo_url: None
  • paper_authors: Wei Cong, Yang Cong, Jiahua Dong, Gan Sun, Henghui Ding
  • for: The goal is an incremental semantic segmentation method that continually learns newly arriving classes while addressing catastrophic forgetting and background shift.
  • methods: A Gradient-Semantic Compensation (GSC) model tackles the problem from both gradient and semantic perspectives: a step-aware gradient compensation balances the forgetting paces of previously seen classes by re-weighting gradient back-propagation, a soft-sharp semantic relation distillation distills consistent inter-class semantic relations via soft labels, and a prototypical pseudo re-labeling provides strong semantic guidance against background shift.
  • results: Extensive experiments on three public datasets, Pascal VOC 2012, ADE20K, and Cityscapes, demonstrate the effectiveness of the proposed GSC model.
    Abstract Incremental semantic segmentation aims to continually learn the segmentation of new coming classes without accessing the training data of previously learned classes. However, most current methods fail to address catastrophic forgetting and background shift since they 1) treat all previous classes equally without considering different forgetting paces caused by imbalanced gradient back-propagation; 2) lack strong semantic guidance between classes. To tackle the above challenges, in this paper, we propose a Gradient-Semantic Compensation (GSC) model, which surmounts incremental semantic segmentation from both gradient and semantic perspectives. Specifically, to address catastrophic forgetting from the gradient aspect, we develop a step-aware gradient compensation that can balance forgetting paces of previously seen classes via re-weighting gradient backpropagation. Meanwhile, we propose a soft-sharp semantic relation distillation to distill consistent inter-class semantic relations via soft labels for alleviating catastrophic forgetting from the semantic aspect. In addition, we develop a prototypical pseudo re-labeling that provides strong semantic guidance to mitigate background shift. It produces high-quality pseudo labels for old classes in the background by measuring distances between pixels and class-wise prototypes. Extensive experiments on three public datasets, i.e., Pascal VOC 2012, ADE20K, and Cityscapes, demonstrate the effectiveness of our proposed GSC model.
    摘要 增量semantic segmentation的目标是不断学习新到来的类型,而不需要访问之前学习过的数据。然而,现有的方法通常无法解决崩溃性忘记和背景变化的问题,这是因为它们:1)对之前学习过的类型很平等,不考虑不同的忘记速度,由于权重反propagation而导致的忘记速度不平等; 2)缺乏强的semantic指导。为了解决这些挑战,在这篇论文中,我们提出了一种Gradient-Semantic Compensation(GSC)模型,可以从gradient和semantic两个角度上解决增量semantic segmentation问题。具体来说,为了解决崩溃性忘记的gradient方面问题,我们开发了一种步骤意识度补偿,可以通过重新分配权重反propagation来填充忘记的步骤。同时,我们提出了一种软锐semantic关系练习,可以通过软标签来练习consistent的inter-class semantic关系,从而解决崩溃性忘记的semantic方面问题。此外,我们开发了一种prototypical pseudo re-labeling,可以提供强的semantic指导,来mitigate background shift。它可以生成高质量的pseudo标签,以便更好地减少背景shift。extensive的实验表明,我们提出的GSC模型在Pascal VOC 2012、ADE20K和Cityscapes三个公共数据集上具有极高的效果。

BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

  • paper_url: http://arxiv.org/abs/2307.10816
  • repo_url: https://github.com/showlab/boxdiff
  • paper_authors: Jinheng Xie, Yuexiang Li, Yawen Huang, Haozhe Liu, Wentian Zhang, Yefeng Zheng, Mike Zheng Shou
  • for: This paper proposes a training-free method to control the objects and contexts in images synthesized by diffusion models so that they adhere to user-given spatial conditions such as boxes or scribbles.
  • methods: Three spatial constraints, Inner-Box, Outer-Box, and Corner constraints, are integrated into the denoising step of diffusion models to control the position and size of objects, requiring no additional training or annotated layout data.
  • results: Experiments show the constraints can control what and where to present in the images while retaining the diffusion model's high fidelity and diverse concept coverage.
    Abstract Recent text-to-image diffusion models have demonstrated an astonishing capacity to generate high-quality images. However, researchers mainly studied the way of synthesizing images with only text prompts. While some works have explored using other modalities as conditions, considerable paired data, e.g., box/mask-image pairs, and fine-tuning time are required for nurturing models. As such paired data is time-consuming and labor-intensive to acquire and restricted to a closed set, this potentially becomes the bottleneck for applications in an open world. This paper focuses on the simplest form of user-provided conditions, e.g., box or scribble. To mitigate the aforementioned problem, we propose a training-free method to control objects and contexts in the synthesized images adhering to the given spatial conditions. Specifically, three spatial constraints, i.e., Inner-Box, Outer-Box, and Corner Constraints, are designed and seamlessly integrated into the denoising step of diffusion models, requiring no additional training and massive annotated layout data. Extensive experimental results demonstrate that the proposed constraints can control what and where to present in the images while retaining the ability of Diffusion models to synthesize with high fidelity and diverse concept coverage. The code is publicly available at https://github.com/showlab/BoxDiff.
    摘要 近期文本到图像扩散模型已经表现出了惊人的图像质量生成能力。然而,研究人员主要研究了使用文本提示来生成图像。而一些工作已经探索了使用其他modalities作为条件,但需要大量的配售数据,例如箱/面积图像对,以及微调时间。由于这些配售数据具有时间和劳动力的限制,这可能会成为应用在开放世界中的瓶颈。这篇论文将关注最简单的用户提供的条件,例如箱或涂抹。为了解决上述问题,我们提议一种没有需要训练的方法,以控制生成图像中的对象和上下文,遵循给定的空间条件。特别是,我们定义了三个空间约束,即内部箱、外部箱和角度约束,并将其集成到扩散模型的净化步骤中,无需额外训练和大量注释的布局数据。广泛的实验结果表明,我们的约束可以控制生成图像中的对象和上下文,同时保持扩散模型的高精度和多样性涵盖。代码可以在 GitHub 上获取:https://github.com/showlab/BoxDiff。
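To illustrate how a box condition can steer generation at sampling time, the sketch below scores one text token's cross-attention map against a user box, roughly in the spirit of the Inner-Box and Outer-Box constraints. The actual BoxDiff constraints (including the Corner constraint, the use of top-k attention values, and how the score is applied to the latent during denoising) differ in detail, so treat this as a hedged simplification with illustrative names.

```python
import torch

def box_constraints(attn_map, box_mask):
    """Inner-/Outer-box style penalties on a cross-attention map.

    attn_map: (H, W) cross-attention response of one text token, values in [0, 1].
    box_mask: (H, W) binary mask of the user-provided box for that token.

    Inner-Box: the token should respond strongly somewhere inside the box.
    Outer-Box: the token should respond weakly everywhere outside the box.
    The resulting scalar could steer the noisy latent via a gradient step at each
    denoising iteration, without any training.
    """
    inside = attn_map[box_mask.bool()]
    outside = attn_map[~box_mask.bool()]
    inner_loss = 1.0 - inside.max()   # want at least one strong response inside
    outer_loss = outside.mean()       # want little response outside
    return inner_loss + outer_loss

attn = torch.rand(32, 32)
mask = torch.zeros(32, 32)
mask[8:20, 8:20] = 1.0
print(box_constraints(attn, mask))
```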

A novel integrated method of detection-grasping for specific object based on the box coordinate matching

  • paper_url: http://arxiv.org/abs/2307.11783
  • repo_url: None
  • paper_authors: Zongmin Liu, Jirui Wang, Jie Li, Zufeng Li, Kai Ren, Peng Shi
  • for: The aim is to improve service robots' ability to grasp specific objects, so as to better care for the elderly and disabled.
  • methods: A detection-grasping integrated method based on box coordinate matching (DG-BCM) is proposed: the SOLOv2 instance segmentation model is improved with channel and spatial attention modules, and ASPP and CAM are added to the GR-CNN model to optimize grasp estimation.
  • results: Object detection and grasp estimation are verified separately, and grasping tasks for several specific objects on a simulation platform demonstrate the feasibility and effectiveness of the DG-BCM algorithm.
    Abstract To better care for the elderly and disabled, it is essential for service robots to have an effective fusion method of object detection and grasp estimation. However, limited research has been observed on the combination of object detection and grasp estimation. To overcome this technical difficulty, a novel integrated method of detection-grasping for specific object based on the box coordinate matching is proposed in this paper. Firstly, the SOLOv2 instance segmentation model is improved by adding channel attention module (CAM) and spatial attention module (SAM). Then, the atrous spatial pyramid pooling (ASPP) and CAM are added to the generative residual convolutional neural network (GR-CNN) model to optimize grasp estimation. Furthermore, a detection-grasping integrated algorithm based on box coordinate matching (DG-BCM) is proposed to obtain the fusion model of object detection and grasp estimation. For verification, experiments on object detection and grasp estimation are conducted separately to verify the superiority of improved models. Additionally, grasping tasks for several specific objects are implemented on a simulation platform, demonstrating the feasibility and effectiveness of DG-BCM algorithm proposed in this paper.
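The core of box coordinate matching is associating a detected object box with the most overlapping grasp candidate; a minimal IoU-based matcher is sketched below. The box format, the toy coordinates, and the use of plain IoU are assumptions, not the paper's exact matching rule.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_grasp_to_detection(det_box, grasp_boxes):
    """Pick the grasp candidate whose box best overlaps the detected target object."""
    scores = np.array([iou(det_box, g) for g in grasp_boxes])
    return int(scores.argmax()), float(scores.max())

target = (100, 80, 220, 200)   # detected target object, hypothetical coordinates
candidates = [(90, 70, 150, 150), (160, 60, 240, 210), (300, 300, 340, 340)]
idx, overlap = match_grasp_to_detection(target, candidates)
print(idx, round(overlap, 3))
```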

Perceptual Quality Assessment of Omnidirectional Audio-visual Signals

  • paper_url: http://arxiv.org/abs/2307.10813
  • repo_url: None
  • paper_authors: Xilei Zhu, Huiyu Duan, Yuqin Cao, Yuxin Zhu, Yucheng Zhu, Jing Liu, Li Chen, Xiongkuo Min, Guangtao Zhai
  • for: This work aims to assess the quality of omnidirectional videos (ODVs) in order to improve users' Quality of Experience (QoE).
  • methods: Three baseline methods for full-reference omnidirectional audio-visual quality assessment (OAVQA) are designed by combining existing state-of-the-art single-mode audio and video quality assessment models via multimodal fusion strategies.
  • results: The effectiveness of the A/V multimodal fusion methods is validated on a newly built large-scale audio-visual quality assessment dataset, providing a new benchmark for omnidirectional QoE evaluation.
    Abstract Omnidirectional videos (ODVs) play an increasingly important role in the application fields of medical, education, advertising, tourism, etc. Assessing the quality of ODVs is significant for service-providers to improve the user's Quality of Experience (QoE). However, most existing quality assessment studies for ODVs only focus on the visual distortions of videos, while ignoring that the overall QoE also depends on the accompanying audio signals. In this paper, we first establish a large-scale audio-visual quality assessment dataset for omnidirectional videos, which includes 375 distorted omnidirectional audio-visual (A/V) sequences generated from 15 high-quality pristine omnidirectional A/V contents, and the corresponding perceptual audio-visual quality scores. Then, we design three baseline methods for full-reference omnidirectional audio-visual quality assessment (OAVQA), which combine existing state-of-the-art single-mode audio and video QA models via multimodal fusion strategies. We validate the effectiveness of the A/V multimodal fusion method for OAVQA on our dataset, which provides a new benchmark for omnidirectional QoE evaluation. Our dataset is available at https://github.com/iamazxl/OAVQA.
    摘要 “全方位视频”(ODV)在医疗、教育、广告、旅游等领域的应用越来越重要。评估ODV的质量非常重要,以提高用户的“用户体验质量”(QoE)。然而,现有的大多数质量评估研究只关注视频中的视觉扭曲,而忽略了音频信号对总体质量的影响。在本文中,我们首先建立了大规模的音频视频质量评估数据集,包括375个扭曲的全方位音频视频(A/V)序列,这些序列来自于15个高质量的原始全方位A/V内容。然后,我们设计了三种基线方法 для全方位音频视频质量评估(OAVQA),这些方法将现有单模式音频和视频质量评估模型通过多模式融合策略相结合。我们验证了这种A/V多模式融合方法的有效性,并提供了一个新的全方位QoE评估标准。我们的数据集可以在GitHub上获取:https://github.com/iamazxl/OAVQA。

HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces

  • paper_url: http://arxiv.org/abs/2307.10797
  • repo_url: https://github.com/stelabou/hyperreenact
  • paper_authors: Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos
  • for: This work proposes a neural face reenactment method built on a StyleGAN2 generator that produces realistic talking-head images of a source identity driven by a target facial pose.
  • methods: Real images are first inverted into the latent space of a pretrained StyleGAN2 generator, and a hypernetwork then performs (i) refinement of the source identity characteristics and (ii) facial pose re-targeting, eliminating the dependence on external editing methods that typically produce artifacts.
  • results: The method operates in the one-shot setting and allows cross-subject reenactment without subject-specific fine-tuning; on the VoxCeleb1 and VoxCeleb2 benchmarks it outperforms state-of-the-art techniques in producing artifact-free images and shows remarkable robustness even under extreme head pose changes.
    Abstract In this paper, we present our method for neural face reenactment, called HyperReenact, that aims to generate realistic talking head images of a source identity, driven by a target facial pose. Existing state-of-the-art face reenactment methods train controllable generative models that learn to synthesize realistic facial images, yet producing reenacted faces that are prone to significant visual artifacts, especially under the challenging condition of extreme head pose changes, or requiring expensive few-shot fine-tuning to better preserve the source identity characteristics. We propose to address these limitations by leveraging the photorealistic generation ability and the disentangled properties of a pretrained StyleGAN2 generator, by first inverting the real images into its latent space and then using a hypernetwork to perform: (i) refinement of the source identity characteristics and (ii) facial pose re-targeting, eliminating this way the dependence on external editing methods that typically produce artifacts. Our method operates under the one-shot setting (i.e., using a single source frame) and allows for cross-subject reenactment, without requiring any subject-specific fine-tuning. We compare our method both quantitatively and qualitatively against several state-of-the-art techniques on the standard benchmarks of VoxCeleb1 and VoxCeleb2, demonstrating the superiority of our approach in producing artifact-free images, exhibiting remarkable robustness even under extreme head pose changes. We make the code and the pretrained models publicly available at: https://github.com/StelaBou/HyperReenact .
    摘要 在这篇论文中,我们提出了一种名为HyperReenact的神经网络 talking head 生成方法,旨在生成源人脸的真实对话图像,驱动目标脸部姿势。现有的状态对抗技术学习生成实际的脸部图像, yet 生成reenacted faces 容易出现显著的视觉缺陷,特别是在极端的头部姿势变化情况下或需要贵重的一些shot fine-tuning 来更好地保留源人脸特征。我们提议使用 pretrained StyleGAN2 生成器的 photorealistic 生成能力和分离性特性,首先将实际图像转换为其latent space,然后使用 hypernetwork 进行:(i)源人脸特征的精度调整和(ii)脸部姿势重定向,从而消除了对外部编辑方法的依赖,从而消除了artefacts。我们的方法在一shot Setting(即使用单个源帧)下运行,并允许跨主体reenactment,无需任何主体特定的调整。我们在VoxCeleb1和VoxCeleb2标准 bencmarks 上对我们的方法进行了量化和 каче度上的比较,demonstrating 我们的方法在生成无artefacts的图像,在极端头部姿势变化情况下表现出了很好的Robustness。我们在https://github.com/StelaBou/HyperReenact 上公开了代码和预训练模型。

Behavioral Analysis of Vision-and-Language Navigation Agents

  • paper_url: http://arxiv.org/abs/2307.10790
  • repo_url: https://github.com/yoark/vln-behave
  • paper_authors: Zijiao Yang, Arjun Majumdar, Stefan Lee
  • for: This study examines how Vision-and-Language Navigation (VLN) agents ground instructions to actions based on their surroundings.
  • methods: The approach generates skill-specific interventions and measures the resulting changes in agent predictions.
  • results: Biases from training have lasting effects on agent behavior, existing models can ground simple referring expressions, and across models the skill-specific scores correlate with overall VLN task performance.
    Abstract To be successful, Vision-and-Language Navigation (VLN) agents must be able to ground instructions to actions based on their surroundings. In this work, we develop a methodology to study agent behavior on a skill-specific basis -- examining how well existing agents ground instructions about stopping, turning, and moving towards specified objects or rooms. Our approach is based on generating skill-specific interventions and measuring changes in agent predictions. We present a detailed case study analyzing the behavior of a recent agent and then compare multiple agents in terms of skill-specific competency scores. This analysis suggests that biases from training have lasting effects on agent behavior and that existing models are able to ground simple referring expressions. Our comparisons between models show that skill-specific scores correlate with improvements in overall VLN task performance.
    摘要 为实现成功,视觉语言导航(VLN)机器人必须能够将指令与它所处环境相连。在这项工作中,我们开发了一种方法来研究机器人行为的技能特性基础,包括评估现有机器人如何根据指令在具体的对象或房间中进行行为。我们的方法包括生成技能特定的干预和评估机器人预测的变化。我们在一个详细的案例研究中分析了一个最新的机器人的行为,然后比较多个机器人的技能特定能力分数。我们的发现表明,训练时的偏见会对机器人行为产生持续的影响,并且现有的模型能够基于简单的引用表达ground。我们在不同模型之间进行比较时发现,技能特定分数与整体VLN任务性能的提高相对耦合。

Feed-Forward Source-Free Domain Adaptation via Class Prototypes

  • paper_url: http://arxiv.org/abs/2307.10787
  • repo_url: None
  • paper_authors: Ondrej Bohdal, Da Li, Timothy Hospedales
  • for: This paper presents a simple feed-forward approach that challenges the need for back-propagation-based source-free domain adaptation.
  • methods: Class prototypes are computed under the domain shift using a pre-trained model, and predictions are made according to the nearest prototype.
  • results: The approach achieves strong accuracy improvements over the pre-trained model while requiring only a small fraction of the time of existing domain adaptation methods.
    Abstract Source-free domain adaptation has become popular because of its practical usefulness and no need to access source data. However, the adaptation process still takes a considerable amount of time and is predominantly based on optimization that relies on back-propagation. In this work we present a simple feed-forward approach that challenges the need for back-propagation based adaptation. Our approach is based on computing prototypes of classes under the domain shift using a pre-trained model. It achieves strong improvements in accuracy compared to the pre-trained model and requires only a small fraction of time of existing domain adaptation methods.
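A minimal sketch of feed-forward adaptation with class prototypes, assuming access to the pre-trained model's features and logits on unlabelled target data: prototypes are soft-label-weighted feature means, and each sample is re-assigned to the nearest prototype by cosine similarity. The names and the cosine choice are illustrative, not the authors' exact procedure.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def adapt_and_predict(features, logits):
    """Feed-forward adaptation via class prototypes under domain shift.

    features: (N, D) penultimate-layer features of unlabelled target samples.
    logits:   (N, C) source-model predictions for the same samples.

    Prototypes are soft-label-weighted feature means; each sample is then
    re-assigned to the class whose prototype its feature is closest to.
    No gradients or back-propagation are involved.
    """
    probs = logits.softmax(dim=1)                         # (N, C) soft pseudo-labels
    protos = probs.t() @ features                         # (C, D) weighted feature sums
    protos = protos / probs.sum(dim=0, keepdim=True).t().clamp(min=1e-8)
    sims = F.normalize(features, dim=1) @ F.normalize(protos, dim=1).t()
    return sims.argmax(dim=1)                             # calibrated predictions

feats = torch.randn(512, 64)
logit = torch.randn(512, 10)
print(adapt_and_predict(feats, logit).shape)              # torch.Size([512])
```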

SMURF: Spatial Multi-Representation Fusion for 3D Object Detection with 4D Imaging Radar

  • paper_url: http://arxiv.org/abs/2307.10784
  • repo_url: None
  • paper_authors: Jianan Liu, Qiuchi Zhao, Weiyi Xiong, Tao Huang, Qing-Long Han, Bing Zhu
  • for: This paper is written for the purpose of addressing the challenges of 3D object detection using 4D millimeter wave (mmWave) radar technology, specifically the issues of sparsity and noise in radar point cloud data.
  • methods: The paper proposes a novel approach called spatial multi-representation fusion (SMURF) that leverages multiple representations of radar detection points to improve the accuracy of 3D object detection. SMURF uses pillarization and density features of a multi-dimensional Gaussian mixture distribution through kernel density estimation (KDE) to mitigate measurement inaccuracy and alleviate point cloud sparsity.
  • results: The paper demonstrates the effectiveness and generalization ability of SMURF through experimental evaluations on two datasets, View-of-Delft (VoD) and TJ4DRadSet. SMURF outperforms recently proposed 4D imaging radar-based single-representation models and achieves comparable performance to the state-of-the-art 4D imaging radar and camera fusion-based method, with an increase of 1.22% in the mean average precision on bird’s-eye view of TJ4DRadSet dataset and 1.32% in the 3D mean average precision on the entire annotated area of VoD dataset. Additionally, SMURF has impressive inference time, with the inference time no more than 0.05 seconds for most scans on both datasets.
    Abstract The 4D Millimeter wave (mmWave) radar is a promising technology for vehicle sensing due to its cost-effectiveness and operability in adverse weather conditions. However, the adoption of this technology has been hindered by sparsity and noise issues in radar point cloud data. This paper introduces spatial multi-representation fusion (SMURF), a novel approach to 3D object detection using a single 4D imaging radar. SMURF leverages multiple representations of radar detection points, including pillarization and density features of a multi-dimensional Gaussian mixture distribution through kernel density estimation (KDE). KDE effectively mitigates measurement inaccuracy caused by limited angular resolution and multi-path propagation of radar signals. Additionally, KDE helps alleviate point cloud sparsity by capturing density features. Experimental evaluations on View-of-Delft (VoD) and TJ4DRadSet datasets demonstrate the effectiveness and generalization ability of SMURF, outperforming recently proposed 4D imaging radar-based single-representation models. Moreover, while using 4D imaging radar only, SMURF still achieves comparable performance to the state-of-the-art 4D imaging radar and camera fusion-based method, with an increase of 1.22% in the mean average precision on bird's-eye view of TJ4DRadSet dataset and 1.32% in the 3D mean average precision on the entire annotated area of VoD dataset. Our proposed method demonstrates impressive inference time and addresses the challenges of real-time detection, with the inference time no more than 0.05 seconds for most scans on both datasets. This research highlights the benefits of 4D mmWave radar and is a strong benchmark for subsequent works regarding 3D object detection with 4D imaging radar.
    摘要 四维毫波雷达(4D mmWave radar)是一种有前途的技术,因其成本优化和气候不良情况下的可操作性。然而,这种技术的采用受到了雷达点云数据中稀疏和噪声问题的妨碍。本文介绍了空间多表示融合(SMURF),一种基于单个4D探测雷达的3D对象检测方法。SMURF利用了雷达检测点的多个表示,包括柱状化和密度特征,通过核密度估计(KDE)来减少雷达信号的测量不准确性,以及减少点云稀疏性。此外,KDE还有助于减少点云稀疏性。实验评估结果表明,SMURF在View-of-Delft(VoD)和TJ4DRadSet数据集上表现出色,超过了最近提出的4D探测雷达基于单表示模型。此外,只使用4D探测雷达,SMURF仍能与状态空间的4D探测雷达和摄像头融合基于方法相当,提高了TJ4DRadSet数据集的鸟瞰视图中的均值精度提升1.22%和整个注释区域中的3D均值精度提升1.32%。我们的提议方法具有快速的推理时间,在大多数扫描中的推理时间不超过0.05秒。这些研究表明4D毫波雷达的优势,并提供了后续相关4D探测雷达3D对象检测方法的参考。
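One of SMURF's representations is a per-point density feature obtained through kernel density estimation over the radar detections; a generic version of that computation is sketched below using a Gaussian KDE. The toy cluster/clutter data, the function names, and the use of scipy's default bandwidth are assumptions for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_density_features(points, bandwidth=None):
    """Per-point density of a sparse, noisy radar detection cloud via Gaussian KDE.

    points: (N, 3) radar detection points (x, y, z).
    Returns an (N,) density value per point that can be appended to the point
    features before pillarization, so downstream layers see how well-supported
    each detection is. Bandwidth follows scipy's default (Scott's rule) unless a
    scalar factor is supplied.
    """
    kde = gaussian_kde(points.T, bw_method=bandwidth)
    return kde(points.T)

rng = np.random.default_rng(0)
cluster = rng.normal(loc=[10.0, 2.0, 0.5], scale=0.3, size=(40, 3))   # a real object
clutter = rng.uniform(low=-5, high=30, size=(20, 3))                  # multipath ghosts
pts = np.vstack([cluster, clutter])
dens = kde_density_features(pts)
print(dens[:40].mean() > dens[40:].mean())   # True: clustered points are denser
```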

See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data

  • paper_url: http://arxiv.org/abs/2307.10782
  • repo_url: https://github.com/4DVLab/See_More_Know_More
  • paper_authors: Yuhang Lu, Qi Jiang, Runnan Chen, Yuenan Hou, Xinge Zhu, Yuexin Ma
  • for: This work proposes a multi-modal zero-shot learning method that enables deep models to recognize objects in point clouds that were unseen during training.
  • methods: Visual features are aligned with semantic features obtained from word embeddings under the supervision of seen classes, and the rich appearance information of images is exploited as a natural complement to the textureless point cloud for more accurate visual-semantic alignment.
  • results: Extensive experiments on two popular benchmarks, SemanticKITTI and nuScenes, show average improvements of 52% and 49% in unseen-class mIoU over current state-of-the-art methods.
    Abstract Zero-shot point cloud segmentation aims to make deep models capable of recognizing novel objects in point cloud that are unseen in the training phase. Recent trends favor the pipeline which transfers knowledge from seen classes with labels to unseen classes without labels. They typically align visual features with semantic features obtained from word embedding by the supervision of seen classes' annotations. However, point cloud contains limited information to fully match with semantic features. In fact, the rich appearance information of images is a natural complement to the textureless point cloud, which is not well explored in previous literature. Motivated by this, we propose a novel multi-modal zero-shot learning method to better utilize the complementary information of point clouds and images for more accurate visual-semantic alignment. Extensive experiments are performed in two popular benchmarks, i.e., SemanticKITTI and nuScenes, and our method outperforms current SOTA methods with 52% and 49% improvement on average for unseen class mIoU, respectively.
    摘要 zero-shot 点云 segmentation 目标是让深度模型能够在训练阶段未看过的 novel 对象上进行识别。 current trends 倾向于使用 seen class 的标签来传递知识到 unseen class 上。它们通常将视觉特征与 semantic feature 进行对齐,其中 semantic feature 通常来自于 word embedding 的监督。然而,点云具有有限的信息,不能完全与 semantic feature 匹配。实际上,图像中的丰富的外观信息是点云的自然补充,而 previous literature 中未得到充分利用。 motivated by this, we propose 一种 novel multi-modal zero-shot learning 方法,可以更好地利用点云和图像之间的协同信息,以实现更高的视觉semantic alignment。我们在 SemanticKITTI 和 nuScenes 两个 популяр的 benchmark 上进行了广泛的实验,并比前一代 SOTA 方法提高了52%和49%的均值提升,对 unseen class mIoU 的识别率。

Learned Thresholds Token Merging and Pruning for Vision Transformers

  • paper_url: http://arxiv.org/abs/2307.10780
  • repo_url: https://github.com/mxbonn/ltmp
  • paper_authors: Maxim Bonnaerens, Joni Dambre
  • for: This paper proposes Learned Thresholds Token Merging and Pruning (LTMP) to improve the computational efficiency of vision transformers on image recognition tasks.
  • methods: Learned threshold masking modules dynamically determine which tokens to merge and which to prune.
  • results: Extensive experiments show that LTMP achieves state-of-the-art accuracy across reduction rates on ImageNet classification while requiring only a single fine-tuning epoch, an order of magnitude faster than previous methods.
    Abstract Vision transformers have demonstrated remarkable success in a wide range of computer vision tasks over the last years. However, their high computational costs remain a significant barrier to their practical deployment. In particular, the complexity of transformer models is quadratic with respect to the number of input tokens. Therefore techniques that reduce the number of input tokens that need to be processed have been proposed. This paper introduces Learned Thresholds token Merging and Pruning (LTMP), a novel approach that leverages the strengths of both token merging and token pruning. LTMP uses learned threshold masking modules that dynamically determine which tokens to merge and which to prune. We demonstrate our approach with extensive experiments on vision transformers on the ImageNet classification task. Our results demonstrate that LTMP achieves state-of-the-art accuracy across reduction rates while requiring only a single fine-tuning epoch, which is an order of magnitude faster than previous methods. Code is available at https://github.com/Mxbonn/ltmp .
    摘要 “视觉转换器在过去几年内表现出了惊人的成功,但它们的计算成本仍然是实际应用的重要障碍。特别是转换器模型的复杂度与输入符号数量直接相关。因此,减少输入符号数量的技术得到了关注。这篇论文介绍了学习梯度掩码层(LTMP),一种新的方法,它利用了token合并和token剪辑的优势。LTMP使用学习的阈值掩码模块,动态确定需要合并和剪辑哪些符号。我们通过对视觉转换器进行了详细的实验,在ImageNet分类任务上达到了最佳性能水平,同时只需要一个精度调整 epoch,这比之前的方法快了一个数量级。代码可以在https://github.com/Mxbonn/ltmp 上获取。”
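A minimal sketch of a learned-threshold masking module: token importance (here, mean attention received, a common proxy) is compared against a learnable threshold through a sigmoid, giving a soft differentiable mask during training and a hard drop at inference. LTMP learns separate thresholds for merging and pruning per layer; this sketch collapses them into a single pruning threshold, and all names are illustrative rather than the authors' code.

```python
import torch
import torch.nn as nn

class LearnedThresholdMask(nn.Module):
    """Keep only tokens whose importance exceeds a learned threshold."""

    def __init__(self, temperature: float = 10.0):
        super().__init__()
        self.threshold = nn.Parameter(torch.tensor(0.0))
        self.temperature = temperature

    def forward(self, tokens, attn):
        # tokens: (B, N, D); attn: (B, heads, N, N) attention weights.
        importance = attn.mean(dim=(1, 2))                      # (B, N): attention received
        soft_mask = torch.sigmoid(self.temperature * (importance - self.threshold))
        if self.training:
            return tokens * soft_mask.unsqueeze(-1), soft_mask  # soft masking, gradients flow
        keep = soft_mask > 0.5                                  # hard drop at inference
        return tokens * keep.unsqueeze(-1), keep

m = LearnedThresholdMask()
x = torch.randn(2, 197, 384)
a = torch.softmax(torch.randn(2, 6, 197, 197), dim=-1)
out, mask = m(x, a)
print(out.shape, mask.shape)
```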

Urban Radiance Field Representation with Deformable Neural Mesh Primitives

  • paper_url: http://arxiv.org/abs/2307.10776
  • repo_url: None
  • paper_authors: Fan Lu, Yan Xu, Guang Chen, Hongsheng Li, Kwan-Yee Lin, Changjun Jiang
  • for: The goal is high-quality representation and synthesis of urban-scale scenes.
  • methods: The entire scene is parameterized with Deformable Neural Mesh Primitives (DNMP), a flexible and compact neural variant of the classic mesh representation; the shape of each primitive is decoded from a low-dimensional latent space to constrain the degrees of freedom, and rendering colors are decoded from the vertex features by a view-dependent MLP.
  • results: The method achieves high-quality novel view synthesis with low computational cost and fast rendering (2.07 ms per 1k pixels); a lightweight version runs 33x faster than vanilla NeRFs and is comparable to the highly-optimized Instant-NGP (0.61 vs 0.71 ms per 1k pixels).
    Abstract Neural Radiance Fields (NeRFs) have achieved great success in the past few years. However, most current methods still require intensive resources due to ray marching-based rendering. To construct urban-level radiance fields efficiently, we design Deformable Neural Mesh Primitive~(DNMP), and propose to parameterize the entire scene with such primitives. The DNMP is a flexible and compact neural variant of classic mesh representation, which enjoys both the efficiency of rasterization-based rendering and the powerful neural representation capability for photo-realistic image synthesis. Specifically, a DNMP consists of a set of connected deformable mesh vertices with paired vertex features to parameterize the geometry and radiance information of a local area. To constrain the degree of freedom for optimization and lower the storage budgets, we enforce the shape of each primitive to be decoded from a relatively low-dimensional latent space. The rendering colors are decoded from the vertex features (interpolated with rasterization) by a view-dependent MLP. The DNMP provides a new paradigm for urban-level scene representation with appealing properties: $(1)$ High-quality rendering. Our method achieves leading performance for novel view synthesis in urban scenarios. $(2)$ Low computational costs. Our representation enables fast rendering (2.07ms/1k pixels) and low peak memory usage (110MB/1k pixels). We also present a lightweight version that can run 33$\times$ faster than vanilla NeRFs, and comparable to the highly-optimized Instant-NGP (0.61 vs 0.71ms/1k pixels). Project page: \href{https://dnmp.github.io/}{https://dnmp.github.io/}.
    摘要 neural radiance fields (NeRFs) 在过去几年内取得了很大的成功。然而,大多数当前方法仍然需要劳动密集的资源,即因为光柱 marching 的渲染。为了高效地构建城市级别的采集场景,我们设计了弹性神经网络模型~(DNMP),并提议使用这些模型来参数化整个场景。DNMP 是一种灵活和占用空间的神经变体,它拥有优化的效率和神经表示能力,可以实现高质量的图像合成。具体来说,一个 DNMP 由一组连接的弹性网格顶点组成,每个顶点具有对应的颜色特征,用于参数化场景的几何和采集信息。为了限制优化的自由度和储存占用量,我们在低维度的干扰空间中解码了每个 primitives 的形状。渲染颜色通过视角依赖的多层感知网络(MLP)进行解码。DNMP 提供了一种新的城市级场景表示方式,具有以下优点: $(1)$ 高质量渲染。我们的方法在城市场景中的新视图合成中实现了领先的性能。 $(2)$ 低计算成本。我们的表示方式可以快速渲染(2.07ms/1k像素),并且储存占用量很低(110MB/1k像素)。我们还提供了一个轻量级版本,可以在 vanilla NeRFs 的 33 倍快速运行,并且与高度优化的 Instant-NGP 相当(0.61 vs 0.71ms/1k像素)。项目页面:

LBL: Logarithmic Barrier Loss Function for One-class Classification

  • paper_url: http://arxiv.org/abs/2307.10753
  • repo_url: https://github.com/ml-hdu/lbl_lblsig
  • paper_authors: Tianlei Wang, Dekang Liu, Wandong Zhang, Jiuwen Cao
  • for: This work proposes new one-class classification (OCC) loss functions for deep learning.
  • methods: A logarithmic barrier based OCC loss (LBL) is proposed, together with a modified variant (LBLSig) that introduces a unilateral relaxation Sigmoid function for smoother optimization.
  • results: Experiments show that the proposed LBL and LBLSig are more stable and effective than previous OCC losses across different network structures.
    Abstract One-class classification (OCC) aims to train a classifier only with the target class data and attracts great attention for its strong applicability in real-world application. Despite a lot of advances have been made in OCC, it still lacks the effective OCC loss functions for deep learning. In this paper, a novel logarithmic barrier function based OCC loss (LBL) that assigns large gradients to the margin samples and thus derives more compact hypersphere, is first proposed by approximating the OCC objective smoothly. But the optimization of LBL may be instability especially when samples lie on the boundary leading to the infinity loss. To address this issue, then, a unilateral relaxation Sigmoid function is introduced into LBL and a novel OCC loss named LBLSig is proposed. The LBLSig can be seen as the fusion of the mean square error (MSE) and the cross entropy (CE) and the optimization of LBLSig is smoother owing to the unilateral relaxation Sigmoid function. The effectiveness of the proposed LBL and LBLSig is experimentally demonstrated in comparisons with several state-of-the-art OCC algorithms on different network structures. The source code can be found at https://github.com/ML-HDU/LBL_LBLSig.
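The abstract describes a log-barrier loss that assigns large gradients to margin samples and can blow up at the boundary, plus a sigmoid-relaxed variant that smooths the optimization; a generic hypersphere version of both ideas is sketched below. The center, radius, scale, and exact functional forms are assumptions, not the paper's equations.

```python
import torch

def log_barrier_occ_loss(features, center, radius=1.0, eps=1e-6):
    """A generic logarithmic-barrier one-class loss on a hypersphere.

    features: (B, D) embeddings of target-class samples; center: (D,) sphere centre.
    Samples are pushed to satisfy ||f(x) - c||^2 < R^2; the -log barrier gives
    margin samples very large gradients. Samples at or beyond the boundary make
    the barrier blow up, which is the instability the sigmoid-relaxed variant
    (LBLSig) is meant to address. A sketch of the idea, not the exact LBL loss.
    """
    sq_dist = ((features - center) ** 2).sum(dim=1)
    slack = (1.0 - sq_dist / radius ** 2).clamp(min=eps)   # > eps only inside the sphere
    return -torch.log(slack).mean()

def sigmoid_relaxed_occ_loss(features, center, radius=1.0, scale=4.0):
    """Smoother variant: a sigmoid of the signed squared distance to the boundary,
    loosely in the spirit of LBLSig, so boundary samples no longer yield infinite loss."""
    sq_dist = ((features - center) ** 2).sum(dim=1)
    return torch.sigmoid(scale * (sq_dist - radius ** 2)).mean()

f = torch.randn(32, 16) * 0.2
c = torch.zeros(16)
print(log_barrier_occ_loss(f, c).item(), sigmoid_relaxed_occ_loss(f, c).item())
```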

EdgeAL: An Edge Estimation Based Active Learning Approach for OCT Segmentation

  • paper_url: http://arxiv.org/abs/2307.10745
  • repo_url: https://github.com/mak-ta-reque/edgeal
  • paper_authors: Md Abdul Kadir, Hasan Md Tusfiqur Alam, Daniel Sonntag
  • for: This paper proposes an edge-estimation-based active learning algorithm for training models when annotated data is limited.
  • methods: The divergence and entropy of model predictions across the edges of unseen images quantify uncertainty, which is then used to select superpixels for annotation.
  • results: On multi-class Optical Coherence Tomography (OCT) segmentation, the method achieves a 99% Dice score while reducing the annotation cost to 12%, 2.3%, and 3% on three publicly available datasets (Duke, AROI, and UMN).
    Abstract Active learning algorithms have become increasingly popular for training models with limited data. However, selecting data for annotation remains a challenging problem due to the limited information available on unseen data. To address this issue, we propose EdgeAL, which utilizes the edge information of unseen images as {\it a priori} information for measuring uncertainty. The uncertainty is quantified by analyzing the divergence and entropy in model predictions across edges. This measure is then used to select superpixels for annotation. We demonstrate the effectiveness of EdgeAL on multi-class Optical Coherence Tomography (OCT) segmentation tasks, where we achieved a 99% dice score while reducing the annotation label cost to 12%, 2.3%, and 3%, respectively, on three publicly available datasets (Duke, AROI, and UMN). The source code is available at \url{https://github.com/Mak-Ta-Reque/EdgeAL}
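A rough sketch of edge-weighted uncertainty scoring for superpixel selection: pixel entropy from the model's softmax is modulated by an edge map of the prediction and averaged per superpixel, and the highest-scoring superpixels would be sent for annotation. EdgeAL's actual measure analyses divergence and entropy across edges; the gradient-based edge map, scoring rule, and names here are simplifications for illustration.

```python
import numpy as np

def edge_uncertainty_scores(probs, superpixels):
    """Rank superpixels of an unlabelled OCT scan by edge-weighted uncertainty.

    probs:       (C, H, W) softmax output of the current segmentation model.
    superpixels: (H, W) integer superpixel labels for the same image.
    """
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=0)          # (H, W) uncertainty
    gy, gx = np.gradient(probs.argmax(axis=0).astype(float))
    edges = np.hypot(gx, gy)                                        # crude edge map
    score_map = entropy * (1.0 + edges)
    ids = np.unique(superpixels)
    scores = np.array([score_map[superpixels == i].mean() for i in ids])
    order = np.argsort(-scores)
    return ids[order], scores[order]

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 64, 64))
p = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
sp = (np.arange(64)[:, None] // 16) * 4 + (np.arange(64)[None, :] // 16)  # 4x4 superpixel grid
ranked_ids, ranked_scores = edge_uncertainty_scores(p, sp)
print(ranked_ids[:3], ranked_scores[:3].round(3))
```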

Comparison between transformers and convolutional models for fine-grained classification of insects

  • paper_url: http://arxiv.org/abs/2307.11112
  • repo_url: None
  • paper_authors: Rita Pucci, Vincent J. Kalkman, Dan Stowell
  • for: 本研究主要针对 insecta 分类问题,尤其是 Odonata 和 Coleoptera 等目的分类。
  • methods: 本研究使用了深度学习算法,主要比较了 transformer 层和 convolutional 层两种结构的性能。
  • results: 研究结果显示,混合模型在准确率方面表现最佳,而全 transformer 模型在推理速度方面表现最佳,并且对样本不足更为稳健。
    Abstract Fine-grained classification is challenging due to the difficulty of finding discriminatory features. This problem is exacerbated when applied to identifying species within the same taxonomical class. This is because species are often sharing morphological characteristics that make them difficult to differentiate. We consider the taxonomical class of Insecta. The identification of insects is essential in biodiversity monitoring as they are one of the inhabitants at the base of many ecosystems. Citizen science is doing brilliant work of collecting images of insects in the wild giving the possibility to experts to create improved distribution maps in all countries. We have billions of images that need to be automatically classified and deep neural network algorithms are one of the main techniques explored for fine-grained tasks. At the SOTA, the field of deep learning algorithms is extremely fruitful, so how to identify the algorithm to use? We focus on Odonata and Coleoptera orders, and we propose an initial comparative study to analyse the two best-known layer structures for computer vision: transformer and convolutional layers. We compare the performance of T2TViT, a fully transformer-base, EfficientNet, a fully convolutional-base, and ViTAE, a hybrid. We analyse the performance of the three models in identical conditions evaluating the performance per species, per morph together with sex, the inference time, and the overall performance with unbalanced datasets of images from smartphones. Although we observe high performances with all three families of models, our analysis shows that the hybrid model outperforms the fully convolutional-base and fully transformer-base models on accuracy performance and the fully transformer-base model outperforms the others on inference speed and, these prove the transformer to be robust to the shortage of samples and to be faster at inference time.
    摘要 细粒度分类因难以找到判别性特征而具有挑战性。当需要识别同一分类学类群内的物种时,这一问题更为突出,因为物种之间往往共享形态特征,难以区分。我们研究昆虫纲(Insecta)的分类。昆虫识别对生物多样性监测至关重要,因为昆虫是许多生态系统底层的成员之一。公民科学在野外采集了大量昆虫图像,使专家得以为各国绘制更完善的分布图。我们有数十亿张图像需要自动分类,而深度神经网络算法是细粒度任务中被探索的主要技术之一。当前深度学习领域成果极为丰富,那么该如何选择合适的算法?我们聚焦于蜻蜓目(Odonata)和鞘翅目(Coleoptera),提出了一项初步的比较研究,以分析计算机视觉中两种最著名的层结构:transformer 层和卷积层。我们比较了 T2TViT(纯 transformer)、EfficientNet(纯卷积)和 ViTAE(混合结构)的性能,在相同条件下评估各模型按物种、按形态与性别的表现、推理时间,以及在来自智能手机的不平衡图像数据集上的整体表现。尽管三类模型都取得了较高的性能,我们的分析表明:混合模型在准确率上优于纯卷积与纯 transformer 模型,而纯 transformer 模型在推理速度上优于其他模型;这证明 transformer 对样本不足具有鲁棒性,且推理速度更快。

TwinLiteNet: An Efficient and Lightweight Model for Driveable Area and Lane Segmentation in Self-Driving Cars

  • paper_url: http://arxiv.org/abs/2307.10705
  • repo_url: https://github.com/chequanghuy/TwinLiteNet
  • paper_authors: Quang Huy Che, Dinh Phuc Nguyen, Minh Quan Pham, Duc Khai Lam
  • for: 本研究旨在提出一种轻量级的模型,用于自动驾驶车辆中的驾驶区域和车道线分割。
  • methods: 该模型使用了TwinLiteNet,一种低计算成本的模型,实现了高精度和高效性的分割结果。
  • results: 实验结果表明,TwinLiteNet在BDD100K数据集上的可行驶区域任务取得91.3%的mIoU,与现代模型相当,而参数量仅约40万(0.4M),并在GPU RTX A5000上达到415 FPS的速度。此外,TwinLiteNet可以在计算能力有限的嵌入式设备上实时运行,特别是在Jetson Xavier NX上达到60 FPS,使其成为自动驾驶车辆中理想的解决方案。
    Abstract Semantic segmentation is a common task in autonomous driving to understand the surrounding environment. Driveable Area Segmentation and Lane Detection are particularly important for safe and efficient navigation on the road. However, original semantic segmentation models are computationally expensive and require high-end hardware, which is not feasible for embedded systems in autonomous vehicles. This paper proposes a lightweight model for the driveable area and lane line segmentation. TwinLiteNet is designed cheaply but achieves accurate and efficient segmentation results. We evaluate TwinLiteNet on the BDD100K dataset and compare it with modern models. Experimental results show that our TwinLiteNet performs similarly to existing approaches, requiring significantly fewer computational resources. Specifically, TwinLiteNet achieves a mIoU score of 91.3% for the Drivable Area task and 31.08% IoU for the Lane Detection task with only 0.4 million parameters and achieves 415 FPS on GPU RTX A5000. Furthermore, TwinLiteNet can run in real-time on embedded devices with limited computing power, especially since it achieves 60FPS on Jetson Xavier NX, making it an ideal solution for self-driving vehicles. Code is available: url{https://github.com/chequanghuy/TwinLiteNet}.
    摘要 语义分割是自动驾驶中用于理解周围环境的常见任务,其中可行驶区域分割与车道线检测对道路上安全高效的导航尤为重要。然而,原有的语义分割模型计算开销大,需要高端硬件,这对自动驾驶车辆中的嵌入式系统而言并不可行。本文提出了一种用于可行驶区域与车道线分割的轻量级模型。TwinLiteNet设计成本低廉,却能取得准确且高效的分割结果。我们在BDD100K数据集上评估TwinLiteNet,并与现代模型进行比较。实验结果表明,TwinLiteNet的性能与现有方法相当,而所需计算资源显著更少。具体而言,TwinLiteNet在可行驶区域任务上取得91.3%的mIoU,在车道检测任务上取得31.08%的IoU,参数量仅约40万(0.4M),并在GPU RTX A5000上达到415 FPS。此外,TwinLiteNet可以在计算能力有限的嵌入式设备上实时运行,尤其是在Jetson Xavier NX上达到60 FPS,使其成为自动驾驶车辆的理想解决方案。代码可在以下链接获取:https://github.com/chequanghuy/TwinLiteNet。

Reverse Knowledge Distillation: Training a Large Model using a Small One for Retinal Image Matching on Limited Data

  • paper_url: http://arxiv.org/abs/2307.10698
  • repo_url: https://github.com/SaharAlmahfouzNasser/MeDAL-Retina
  • paper_authors: Sahar Almahfouz Nasser, Nihar Gupte, Amit Sethi
  • for: 该论文旨在提出一种基于反向知识蒸馏的方法,用于在数据有限的情况下训练大型模型,并避免过拟合。
  • methods: 该方法先改进一种基于CNN的半监督模型(SuperRetina),再以该较轻量的CNN模型为教师,训练一种基于视觉Transformer编码器、计算量更大的模型。这种反向知识蒸馏出人意料地进一步提升了模型的泛化能力。
  • results: 实验结果表明,相比直接训练模型去匹配最终输出,在表示空间中进行高维拟合可以避免过拟合。此外,该论文还发布了一个公开数据集,用于推动视网膜图像关键点检测与匹配算法的研究。
    Abstract Retinal image matching plays a crucial role in monitoring disease progression and treatment response. However, datasets with matched keypoints between temporally separated pairs of images are not available in abundance to train transformer-based model. We propose a novel approach based on reverse knowledge distillation to train large models with limited data while preventing overfitting. Firstly, we propose architectural modifications to a CNN-based semi-supervised method called SuperRetina that help us improve its results on a publicly available dataset. Then, we train a computationally heavier model based on a vision transformer encoder using the lighter CNN-based model, which is counter-intuitive in the field knowledge-distillation research where training lighter models based on heavier ones is the norm. Surprisingly, such reverse knowledge distillation improves generalization even further. Our experiments suggest that high-dimensional fitting in representation space may prevent overfitting unlike training directly to match the final output. We also provide a public dataset with annotations for retinal image keypoint detection and matching to help the research community develop algorithms for retinal image applications.
    摘要 视网膜图像匹配在监测疾病进展和治疗反应方面起着关键作用。然而,带有时间间隔图像对之间匹配关键点标注的数据集并不充足,难以用于训练基于Transformer的模型。我们提出了一种基于反向知识蒸馏的新方法,可在数据有限的情况下训练大型模型并防止过拟合。首先,我们对一种基于CNN的半监督方法SuperRetina进行结构改进,使其在公开数据集上取得更好的结果;然后,我们以这个较轻量的CNN模型为教师,训练一个基于视觉Transformer编码器、计算量更大的模型。这与知识蒸馏研究中通常"以较重的模型指导较轻模型"的做法相反。令人惊讶的是,这种反向知识蒸馏进一步提升了泛化能力。我们的实验表明,与直接拟合最终输出相比,在表示空间中进行高维拟合或许能够避免过拟合。我们还公开了一个带有视网膜图像关键点检测与匹配标注的数据集,以帮助研究社区开发面向视网膜图像应用的算法。
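To make the reverse-distillation idea above concrete, here is a minimal PyTorch-style sketch in which a frozen, lighter CNN teacher supervises a heavier ViT-based student in descriptor space. The cosine-style loss, the normalization, and the function names are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseDistillationLoss(nn.Module):
    """Match the heavy student's dense descriptors to the light teacher's in representation space."""
    def forward(self, student_desc, teacher_desc):
        # L2-normalize per location so the loss compares directions rather than magnitudes.
        s = F.normalize(student_desc, dim=1)
        t = F.normalize(teacher_desc, dim=1)
        return (1.0 - (s * t).sum(dim=1)).mean()   # cosine-style distillation loss

def train_step(student, teacher, images, optimizer, criterion=ReverseDistillationLoss()):
    """student: heavier ViT-based model being trained; teacher: frozen, lighter CNN."""
    teacher.eval()
    with torch.no_grad():
        teacher_desc = teacher(images)             # (B, C, H, W) descriptor map
    student_desc = student(images)
    loss = criterion(student_desc, teacher_desc)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```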

SqueezerFaceNet: Reducing a Small Face Recognition CNN Even More Via Filter Pruning

  • paper_url: http://arxiv.org/abs/2307.10697
  • repo_url: None
  • paper_authors: Fernando Alonso-Fernandez, Kevin Hernandez-Diaz, Jose Maria Buades Rubio, Josef Bigun
  • for: 这个研究旨在提供一个轻量级的人脸识别网络,以满足现代移动设备上的各种数位服务需求。
  • methods: 本研究采用一种基于Taylor分数的网络剪枝方法,迭代移除重要性较低的滤波器(filter),以减小网络规模。
  • results: 研究发现,这种网络剪枝方法可以将原本约1.24M参数的网络进一步缩减最多40%,而不会对人脸识别性能造成明显影响。
    Abstract The widespread use of mobile devices for various digital services has created a need for reliable and real-time person authentication. In this context, facial recognition technologies have emerged as a dependable method for verifying users due to the prevalence of cameras in mobile devices and their integration into everyday applications. The rapid advancement of deep Convolutional Neural Networks (CNNs) has led to numerous face verification architectures. However, these models are often large and impractical for mobile applications, reaching sizes of hundreds of megabytes with millions of parameters. We address this issue by developing SqueezerFaceNet, a light face recognition network which less than 1M parameters. This is achieved by applying a network pruning method based on Taylor scores, where filters with small importance scores are removed iteratively. Starting from an already small network (of 1.24M) based on SqueezeNet, we show that it can be further reduced (up to 40%) without an appreciable loss in performance. To the best of our knowledge, we are the first to evaluate network pruning methods for the task of face recognition.
    摘要 广泛的移动设备用于多种数字服务的使用,导致了可靠和实时人身认证的需求。在这种情况下,人脸识别技术作为可靠的用户验证方法得到了广泛的应用。由于移动设备内置了摄像头,并且与日常应用程序集成,人脸识别技术成为了可靠的用户验证方法。然而,这些模型往往很大,不适合移动应用程序,其参数数量可以达到百万个,占用内存空间很大。我们解决这个问题,通过开发一个轻量级的人脸识别网络——SqueezerFaceNet,其参数数量低于100万。我们采用基于Taylor分数的网络剪辑方法,通过逐步移除无关紧要的滤波器来实现这一目标。我们开始于一个已经很小的网络(1.24M),基于SqueezeNet,并证明可以进一步减少(达到40%),无需感知性的性能下降。到目前为止,我们是第一个评估网络剪辑方法的人脸识别任务。
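The Taylor-score criterion mentioned above can be sketched as follows: after a backward pass, each output filter is scored by the magnitude of the first-order Taylor term |sum(w * dL/dw)| and the lowest-scoring filters are pruned. This is a minimal PyTorch sketch of that generic criterion; the paper's exact scoring, iteration schedule, and fine-tuning are not reproduced.

```python
import torch
import torch.nn as nn

def taylor_filter_scores(conv: nn.Conv2d) -> torch.Tensor:
    """First-order Taylor importance per output filter: |sum_i w_i * dL/dw_i|.
    Assumes loss.backward() has already populated conv.weight.grad."""
    w, g = conv.weight, conv.weight.grad
    return (w * g).sum(dim=(1, 2, 3)).abs()        # one score per output channel

def prune_lowest_filters(conv: nn.Conv2d, ratio: float = 0.1) -> torch.Tensor:
    """Zero out the least important filters (soft pruning sketch; a real pipeline
    would rebuild the layer with fewer channels and adjust the following layer)."""
    scores = taylor_filter_scores(conv)
    k = max(1, int(ratio * scores.numel()))
    idx = torch.argsort(scores)[:k]
    with torch.no_grad():
        conv.weight[idx] = 0.0
        if conv.bias is not None:
            conv.bias[idx] = 0.0
    return idx
```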

SLPD: Slide-level Prototypical Distillation for WSIs

  • paper_url: http://arxiv.org/abs/2307.10696
  • repo_url: https://github.com/carboxy/slpd
  • paper_authors: Zhimiao Yu, Tiancheng Lin, Yi Xu
  • for: This paper targets the problem of improving feature representation ability for whole slide pathological images (WSIs), with a focus on slide-level downstream tasks such as subtyping, grading, and staging.
  • methods: This paper proposes a new method called Slide-Level Prototypical Distillation (SLPD), which explores intra- and inter-slide semantic structures for context modeling on WSIs. The method iteratively performs intra-slide clustering to yield prototypes and assigns regions to their corresponding prototypes.
  • results: This paper achieved state-of-the-art results on multiple slide-level benchmarks, demonstrating that representation learning of semantic structures of slides can make a suitable proxy task for WSI analysis.
    Abstract Improving the feature representation ability is the foundation of many whole slide pathological image (WSIs) tasks. Recent works have achieved great success in pathological-specific self-supervised learning (SSL). However, most of them only focus on learning patch-level representations, thus there is still a gap between pretext and slide-level downstream tasks, e.g., subtyping, grading and staging. Aiming towards slide-level representations, we propose Slide-Level Prototypical Distillation (SLPD) to explore intra- and inter-slide semantic structures for context modeling on WSIs. Specifically, we iteratively perform intra-slide clustering for the regions (4096x4096 patches) within each WSI to yield the prototypes and encourage the region representations to be closer to the assigned prototypes. By representing each slide with its prototypes, we further select similar slides by the set distance of prototypes and assign the regions by cross-slide prototypes for distillation. SLPD achieves state-of-the-art results on multiple slide-level benchmarks and demonstrates that representation learning of semantic structures of slides can make a suitable proxy task for WSI analysis. Code will be available at https://github.com/Carboxy/SLPD.
    摘要 提升特征表示能力是许多全切片病理图像(WSI)任务的基础。近期工作在面向病理的自监督学习(SSL)方面取得了很大成功。然而,其中大多数只关注学习图块级表示,因此在代理任务与切片级下游任务(如亚型分类、分级和分期)之间仍存在差距。面向切片级表示,我们提出了切片级原型蒸馏(SLPD),以探索切片内与切片间的语义结构,用于WSI的上下文建模。具体而言,我们对每张WSI内的区域(4096x4096图块)迭代进行切片内聚类以得到原型,并促使区域表示靠近其所分配的原型。通过用原型表示每张切片,我们进一步依据原型集合之间的距离选择相似切片,并利用跨切片原型为区域分配蒸馏目标。SLPD在多个切片级基准上取得了最先进的结果,表明对切片语义结构的表示学习可以作为WSI分析的合适代理任务。代码将发布于 https://github.com/Carboxy/SLPD。
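To illustrate the intra-slide prototype step, the sketch below clusters one slide's region embeddings into prototypes and pulls each region toward its assigned prototype. The tiny k-means routine, prototype count, and temperature are assumptions for illustration; the full SLPD objective (including cross-slide prototype distillation) is not reproduced here.

```python
import torch
import torch.nn.functional as F

def kmeans(x, k, iters=10):
    """Tiny k-means over region embeddings x: (N, D); returns prototypes (k, D) and assignments (N,)."""
    protos = x[torch.randperm(x.size(0))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(x, protos).argmin(dim=1)
        for j in range(k):
            if (assign == j).any():
                protos[j] = x[assign == j].mean(dim=0)
    return protos, assign

def prototype_pull_loss(region_emb, num_prototypes=8, temperature=0.1):
    """Pull each region embedding toward its assigned intra-slide prototype
    (a contrastive-style proxy for the slide-level objective)."""
    z = F.normalize(region_emb, dim=1)
    protos, assign = kmeans(z.detach(), num_prototypes)   # cluster without backprop
    protos = F.normalize(protos, dim=1)
    logits = z @ protos.t() / temperature                  # (N, k) similarities
    return F.cross_entropy(logits, assign)
```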

Self2Self+: Single-Image Denoising with Self-Supervised Learning and Image Quality Assessment Loss

  • paper_url: http://arxiv.org/abs/2307.10695
  • repo_url: https://github.com/JK-the-Ko/Self2SelfPlus
  • paper_authors: Jaekyun Ko, Sanghwan Lee
  • for: 提高单图像的噪声去除性能
  • methods: 采用单图像自监督学习,使用门控卷积进行特征提取,并以无参考图像质量评估指导训练过程
  • results: 在合成数据集和真实数据集上均达到了最先进的去噪性能,突显了该方法在多种噪声去除任务中的有效性与实用性。
    Abstract Recently, denoising methods based on supervised learning have exhibited promising performance. However, their reliance on external datasets containing noisy-clean image pairs restricts their applicability. To address this limitation, researchers have focused on training denoising networks using solely a set of noisy inputs. To improve the feasibility of denoising procedures, in this study, we proposed a single-image self-supervised learning method in which only the noisy input image is used for network training. Gated convolution was used for feature extraction and no-reference image quality assessment was used for guiding the training process. Moreover, the proposed method sampled instances from the input image dataset using Bernoulli sampling with a certain dropout rate for training. The corresponding result was produced by averaging the generated predictions from various instances of the trained network with dropouts. The experimental results indicated that the proposed method achieved state-of-the-art denoising performance on both synthetic and real-world datasets. This highlights the effectiveness and practicality of our method as a potential solution for various noise removal tasks.
    摘要 近期,基于监督学习的去噪方法表现出色。然而,它们依赖于包含噪声-干净图像对的外部数据集,这限制了其适用性。为解决这一限制,研究者转向仅使用一组噪声输入来训练去噪网络。为提升去噪流程的可行性,本研究提出了一种单图像自监督学习方法,仅使用噪声输入图像进行网络训练。该方法使用门控卷积进行特征提取,并以无参考图像质量评估指导训练过程。此外,该方法在训练时以一定的丢弃率对输入图像进行伯努利采样得到多个实例;最终结果由带dropout的已训练网络在不同实例上生成的预测取平均得到。实验结果表明,所提方法在合成与真实数据集上均达到了最先进的去噪性能,突显了该方法作为多种噪声去除任务潜在解决方案的有效性与实用性。
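A minimal sketch of the Bernoulli-sampling and dropout-averaging scheme described above: training hides a random pixel subset of the noisy image and reconstructs it, and inference averages many stochastic forward passes of the dropout-equipped network. The masking rate, loss form, and sample count are assumptions; the gated convolutions and the IQA-guided loss are not shown.

```python
import torch

def bernoulli_masked_step(model, noisy, optimizer, p_drop=0.3):
    """One self-supervised step: hide a random subset of pixels and train the
    network (kept in train mode so dropout stays stochastic) to reconstruct them."""
    mask = torch.bernoulli(torch.full_like(noisy, 1.0 - p_drop))   # 1 = kept, 0 = hidden
    pred = model(noisy * mask)
    loss = (((pred - noisy) ** 2) * (1.0 - mask)).sum() / (1.0 - mask).sum().clamp(min=1.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def denoise(model, noisy, n_samples=50):
    """Average many stochastic predictions (dropout left active) to form the final image."""
    model.train()   # keep dropout on, mirroring the averaging described in the abstract
    return torch.stack([model(noisy) for _ in range(n_samples)]).mean(dim=0)
```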

Pre-train, Adapt and Detect: Multi-Task Adapter Tuning for Camouflaged Object Detection

  • paper_url: http://arxiv.org/abs/2307.10685
  • repo_url: None
  • paper_authors: Yinghui Xing, Dexuan Kong, Shizhou Zhang, Geng Chen, Lingyan Ran, Peng Wang, Yanning Zhang
  • For: This paper is written for detecting camouflaged objects in images, which is a challenging task due to the similar patterns between the objects and the background.
  • Methods: The paper proposes a novel "pre-train, adapt and detect" paradigm, which uses a large pre-trained model to transfer knowledge from multi-modal data, and a lightweight parallel adapter to adjust the features for the downstream COD task.
  • Results: The paper shows that the proposed method outperforms existing state-of-the-art COD models by large margins on four challenging benchmark datasets, and also demonstrates the effectiveness of a multi-task learning scheme for improving the generalization ability of the model.
    Abstract Camouflaged object detection (COD), aiming to segment camouflaged objects which exhibit similar patterns with the background, is a challenging task. Most existing works are dedicated to establishing specialized modules to identify camouflaged objects with complete and fine details, while the boundary can not be well located for the lack of object-related semantics. In this paper, we propose a novel ``pre-train, adapt and detect" paradigm to detect camouflaged objects. By introducing a large pre-trained model, abundant knowledge learned from massive multi-modal data can be directly transferred to COD. A lightweight parallel adapter is inserted to adjust the features suitable for the downstream COD task. Extensive experiments on four challenging benchmark datasets demonstrate that our method outperforms existing state-of-the-art COD models by large margins. Moreover, we design a multi-task learning scheme for tuning the adapter to exploit the shareable knowledge across different semantic classes. Comprehensive experimental results showed that the generalization ability of our model can be substantially improved with multi-task adapter initialization on source tasks and multi-task adaptation on target tasks.
    摘要 伪装目标检测(COD)旨在分割与背景具有相似模式的伪装目标,是一项具有挑战性的任务。现有工作大多致力于设计专门模块,以完整而精细地识别伪装目标,但由于缺乏与目标相关的语义信息,目标边界难以被准确定位。在本文中,我们提出了一种新的"预训练、适配、检测"范式来检测伪装目标。通过引入大型预训练模型,可以将从海量多模态数据中学到的丰富知识直接迁移到COD任务。我们插入一个轻量级并行适配器,以调整特征使其适合下游COD任务。在四个具有挑战性的基准数据集上的大量实验表明,我们的方法以较大优势超越了现有最先进的COD模型。此外,我们设计了一种多任务学习方案来调整适配器,以利用不同语义类别之间可共享的知识。全面的实验结果表明,通过在源任务上进行多任务适配器初始化并在目标任务上进行多任务适配,模型的泛化能力可得到显著提升。
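The "lightweight parallel adapter" can be illustrated with a generic bottleneck branch attached alongside a frozen pre-trained block, so only the adapter (and task head) is tuned for the downstream COD task. Module names, the bottleneck width, and the zero-initialized up-projection are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ParallelAdapter(nn.Module):
    """Lightweight bottleneck branch; starts as an identity thanks to the zero-initialized up-projection."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return self.up(self.act(self.down(x)))

class AdaptedBlock(nn.Module):
    """Frozen pre-trained block with a parallel adapter: y = block(x) + adapter(x)."""
    def __init__(self, block, dim):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False                # keep the pre-trained weights frozen
        self.adapter = ParallelAdapter(dim)

    def forward(self, x):
        return self.block(x) + self.adapter(x)
```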

Deep learning for classification of noisy QR codes

  • paper_url: http://arxiv.org/abs/2307.10677
  • repo_url: None
  • paper_authors: Rebecca Leygonie, Sylvain Lobry, ), Laurent Wendling (LIPADE)
  • for: 理解深度学习模型在抽象图像分类中的限制
  • methods: 基于读取健康通行证所得的信息生成QR码图像,在其上训练图像分类模型,并在存在噪声的情况下与经典(确定性)解码方法进行对比
  • results: 研究结果表明,基于深度学习的模型对于理解此类抽象图像是有意义的。
    Abstract We wish to define the limits of a classical classification model based on deep learning when applied to abstract images, which do not represent visually identifiable objects.QR codes (Quick Response codes) fall into this category of abstract images: one bit corresponding to one encoded character, QR codes were not designed to be decoded manually. To understand the limitations of a deep learning-based model for abstract image classification, we train an image classification model on QR codes generated from information obtained when reading a health pass. We compare a classification model with a classical (deterministic) decoding method in the presence of noise. This study allows us to conclude that a model based on deep learning can be relevant for the understanding of abstract images.
    摘要 我们想定义深度学习模型在抽象图像分类中的限制,当应用于不可识别目的的图像。QR codes(快速回应码)是这种抽象图像的一个例子:每个位元代表一个编码字符,QR codes不是设计来手动读取。通过对读取健康证书中的信息生成QR codes进行分类模型训练,我们可以评估深度学习模型在抽象图像分类中的限制,并与传统(决定的)解码方法进行比较,以评估模型在噪声存在的情况下的性能。这个研究可以帮助我们了解深度学习模型在抽象图像分类中的可行性。

Efficient Unified Demosaicing for Bayer and Non-Bayer Patterned Image Sensors

  • paper_url: http://arxiv.org/abs/2307.10667
  • repo_url: None
  • paper_authors: Haechang Lee, Dongwon Park, Wongi Jeong, Kijeong Kim, Hyunwoo Je, Dongil Ryu, Se Young Chun
  • for: 本研究旨在提出一种统一且高效的去马赛克(demosaicing)方法,可同时应用于传统Bayer与多种非Bayer颜色滤波阵列(CFA)在不同工作模式下的RAW数据。
  • methods: 本研究提出基于知识学习的自适应模式去马赛克模型KLAP,对每种CFA仅使用网络中约1%的CFA自适应关键滤波器,即可有效处理各种CFA;在推理阶段进一步引入元学习(KLAP-M),以消除真实RAW图像中未知的传感器通用伪影。
  • results: 实验结果表明,KLAP与KLAP-M在Bayer与非Bayer CFA的合成及真实RAW数据上均达到了最先进的去马赛克性能。
    Abstract As the physical size of recent CMOS image sensors (CIS) gets smaller, the latest mobile cameras are adopting unique non-Bayer color filter array (CFA) patterns (e.g., Quad, Nona, QxQ), which consist of homogeneous color units with adjacent pixels. These non-Bayer sensors are superior to conventional Bayer CFA thanks to their changeable pixel-bin sizes for different light conditions but may introduce visual artifacts during demosaicing due to their inherent pixel pattern structures and sensor hardware characteristics. Previous demosaicing methods have primarily focused on Bayer CFA, necessitating distinct reconstruction methods for non-Bayer patterned CIS with various CFA modes under different lighting conditions. In this work, we propose an efficient unified demosaicing method that can be applied to both conventional Bayer RAW and various non-Bayer CFAs' RAW data in different operation modes. Our Knowledge Learning-based demosaicing model for Adaptive Patterns, namely KLAP, utilizes CFA-adaptive filters for only 1% key filters in the network for each CFA, but still manages to effectively demosaic all the CFAs, yielding comparable performance to the large-scale models. Furthermore, by employing meta-learning during inference (KLAP-M), our model is able to eliminate unknown sensor-generic artifacts in real RAW data, effectively bridging the gap between synthetic images and real sensor RAW. Our KLAP and KLAP-M methods achieved state-of-the-art demosaicing performance in both synthetic and real RAW data of Bayer and non-Bayer CFAs.

Lighting up NeRF via Unsupervised Decomposition and Enhancement

  • paper_url: http://arxiv.org/abs/2307.10664
  • repo_url: https://github.com/onpix/LLNeRF
  • paper_authors: Haoyuan Wang, Xiaogang Xu, Ke Xu, Rynson WH. Lau
  • for: 提高低光照场景中图像的质量和细节,并在不需要训练数据的情况下生成高质量的新视图图像。
  • methods: 提出了一种新的方法,即低光照频谱场景(LLNeRF),通过分解辐射场learning来增强场景表示,减少噪声和修正扭曲的颜色,同时与NeRF优化过程结合使用。
  • results: 实验表明,提出的方法可以很好地提高低光照图像的质量和细节,并生成高质量的新视图图像,与现有的低光照增强方法和NeRF方法相比,表现更好。
    Abstract Neural Radiance Field (NeRF) is a promising approach for synthesizing novel views, given a set of images and the corresponding camera poses of a scene. However, images photographed from a low-light scene can hardly be used to train a NeRF model to produce high-quality results, due to their low pixel intensities, heavy noise, and color distortion. Combining existing low-light image enhancement methods with NeRF methods also does not work well due to the view inconsistency caused by the individual 2D enhancement process. In this paper, we propose a novel approach, called Low-Light NeRF (or LLNeRF), to enhance the scene representation and synthesize normal-light novel views directly from sRGB low-light images in an unsupervised manner. The core of our approach is a decomposition of radiance field learning, which allows us to enhance the illumination, reduce noise and correct the distorted colors jointly with the NeRF optimization process. Our method is able to produce novel view images with proper lighting and vivid colors and details, given a collection of camera-finished low dynamic range (8-bits/channel) images from a low-light scene. Experiments demonstrate that our method outperforms existing low-light enhancement methods and NeRF methods.
    摘要 神经辐射场(NeRF)是一种有前景的方法,可在给定一组场景图像及其对应相机位姿的情况下合成新视图。然而,在低光照场景中拍摄的图像像素强度低、噪声严重且颜色失真,难以用于训练NeRF模型以生成高质量结果。将现有的低光照图像增强方法与NeRF方法简单组合同样效果不佳,因为逐张的二维增强过程会导致视图间不一致。在本文中,我们提出了一种名为低光照NeRF(LLNeRF)的新方法,以无监督方式直接从sRGB低光照图像增强场景表示并合成正常光照的新视图。我们方法的核心是对辐射场学习的分解,使我们能够在NeRF优化过程中联合增强光照、降低噪声并校正失真的颜色。给定一组来自低光照场景、由相机输出的低动态范围(每通道8位)图像,我们的方法能够生成光照合适、颜色与细节生动的新视图图像。实验表明,我们的方法优于现有的低光照增强方法和NeRF方法。

RetouchingFFHQ: A Large-scale Dataset for Fine-grained Face Retouching Detection

  • paper_url: http://arxiv.org/abs/2307.10642
  • repo_url: None
  • paper_authors: Qichao Ying, Jiaxin Liu, Sheng Li, Haisheng Xu, Zhenxing Qian, Xinpeng Zhang
  • for: 本研究旨在提高涂抹技术的进步,解决短视频平台上广泛使用面部涂抹过滤器后,面部真实性和广告宣传的问题。
  • methods: 本研究使用了大规模和细化的面部涂抹数据集RetouchingFFHQ,包含超过50万条条件涂抹图像。RetouchingFFHQ与之前的数据集不同,拥有大规模、高质量、细化和定制等特点。
  • results: 研究通过提出多种面部涂抹类型和不同的涂抹水平,将面部涂抹检测升级为细化、多种涂抹类型和多个涂抹水平的估计问题。此外,提出了一种多级别注意模块(MAM),用于增强CNN背景学习的跨比例表示学习。实验表明,使用不同的基elines以及我们提posed方法在RetouchingFFHQ上得到了良好的性能。
    Abstract The widespread use of face retouching filters on short-video platforms has raised concerns about the authenticity of digital appearances and the impact of deceptive advertising. To address these issues, there is a pressing need to develop advanced face retouching techniques. However, the lack of large-scale and fine-grained face retouching datasets has been a major obstacle to progress in this field. In this paper, we introduce RetouchingFFHQ, a large-scale and fine-grained face retouching dataset that contains over half a million conditionally-retouched images. RetouchingFFHQ stands out from previous datasets due to its large scale, high quality, fine-grainedness, and customization. By including four typical types of face retouching operations and different retouching levels, we extend the binary face retouching detection into a fine-grained, multi-retouching type, and multi-retouching level estimation problem. Additionally, we propose a Multi-granularity Attention Module (MAM) as a plugin for CNN backbones for enhanced cross-scale representation learning. Extensive experiments using different baselines as well as our proposed method on RetouchingFFHQ show decent performance on face retouching detection. With the proposed new dataset, we believe there is great potential for future work to tackle the challenging problem of real-world fine-grained face retouching detection.
    摘要 广泛使用短视频平台上的面部修饰矩阵已引起了数字出现的真实性和滥购广告的问题。为解决这些问题,需要开发高级的面部修饰技术。然而,缺乏大规模和细化的面部修饰数据集是该领域进步的主要障碍。在这篇论文中,我们介绍了RetouchingFFHQ数据集,该数据集包含超过50万条条件修饰的图像。RetouchingFFHQ与之前的数据集不同,具有大规模、高质量、细化和定制特点。我们在数据集中包含了四种常见的面部修饰操作和不同的修饰水平,从而将binary face retouching detection扩展到细化、多种修饰类型和多个修饰水平的估计问题。此外,我们提出了一种多级别注意模块(MAM),用于增强CNN背景中的横跨级别表示学习。经验表明,使用不同的基准和我们提出的方法在RetouchingFFHQ上进行了良好的表达学习。我们认为,随着RetouchingFFHQ数据集的出现,将来的工作具有很大的潜力来解决现实世界中的细化面部修饰检测问题。

Quantized Feature Distillation for Network Quantization

  • paper_url: http://arxiv.org/abs/2307.10638
  • repo_url: None
  • paper_authors: Ke Zhu, Yin-Yin He, Jianxin Wu
  • For: The paper aims to accelerate and trim full-precision neural network models using low bit approximations, and proposes a novel and highly effective quantization aware training (QAT) method called quantized feature distillation (QFD).* Methods: QFD trains a quantized representation as the teacher, then quantizes the network using knowledge distillation (KD). The method is more flexible and effective than previous quantization methods, and is applied to image classification, object detection, and image segmentation tasks.* Results: QFD surpasses existing methods by a noticeable margin on not only image classification but also object detection and segmentation tasks, and is the first method to quantize vision transformers for object detection and image segmentation tasks.
    Abstract Neural network quantization aims to accelerate and trim full-precision neural network models by using low bit approximations. Methods adopting the quantization aware training (QAT) paradigm have recently seen a rapid growth, but are often conceptually complicated. This paper proposes a novel and highly effective QAT method, quantized feature distillation (QFD). QFD first trains a quantized (or binarized) representation as the teacher, then quantize the network using knowledge distillation (KD). Quantitative results show that QFD is more flexible and effective (i.e., quantization friendly) than previous quantization methods. QFD surpasses existing methods by a noticeable margin on not only image classification but also object detection, albeit being much simpler. Furthermore, QFD quantizes ViT and Swin-Transformer on MS-COCO detection and segmentation, which verifies its potential in real world deployment. To the best of our knowledge, this is the first time that vision transformers have been quantized in object detection and image segmentation tasks.
    摘要
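As a hedged sketch of the "quantize, then distill" idea summarized above: features are fake-quantized with a straight-through estimator, and the student is trained with a task loss plus a term matching its quantized features to a previously-trained quantized teacher. The bit-width, MSE form, and weighting are assumptions; QFD's actual formulation is given in the paper.

```python
import torch
import torch.nn.functional as F

def fake_quantize(x, num_bits=4):
    """Uniform fake quantization with a straight-through estimator:
    the forward pass is quantized, gradients pass through unchanged."""
    qmax = 2 ** num_bits - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.round(x / scale).clamp(-qmax, qmax) * scale
    return x + (q - x).detach()   # STE trick

def qfd_style_loss(student_feat, teacher_quant_feat, student_logits, labels,
                   num_bits=4, alpha=1.0):
    """Task loss plus a distillation term against the quantized teacher's features."""
    s_q = fake_quantize(student_feat, num_bits)
    distill = F.mse_loss(s_q, teacher_quant_feat.detach())
    task = F.cross_entropy(student_logits, labels)
    return task + alpha * distill
```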

Learning and Evaluating Human Preferences for Conversational Head Generation

  • paper_url: http://arxiv.org/abs/2307.10636
  • repo_url: https://github.com/dc3ea9f/PreferenceScore
  • paper_authors: Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei
  • for: 这个论文是为了提出一种新的评价指标,帮助提高对话头视频生成算法和系统的发展。
  • methods: 该论文使用了学习基于的评价指标,名为偏好分数(Preference Score,PS),可以quantitative evaluations across different dimensions,而不需要人工标注。
  • results: 实验结果表明,PS 可以准确地反映人类偏好,同时也具有robustness和通用性,可以应用于未看到的数据集,这使得它成为一个有价值的工具,推动对话头生成的进步。
    Abstract A reliable and comprehensive evaluation metric that aligns with manual preference assessments is crucial for conversational head video synthesis methods development. Existing quantitative evaluations often fail to capture the full complexity of human preference, as they only consider limited evaluation dimensions. Qualitative evaluations and user studies offer a solution but are time-consuming and labor-intensive. This limitation hinders the advancement of conversational head generation algorithms and systems. In this paper, we propose a novel learning-based evaluation metric named Preference Score (PS) for fitting human preference according to the quantitative evaluations across different dimensions. PS can serve as a quantitative evaluation without the need for human annotation. Experimental results validate the superiority of Preference Score in aligning with human perception, and also demonstrate robustness and generalizability to unseen data, making it a valuable tool for advancing conversation head generation. We expect this metric could facilitate new advances in conversational head generation. Project Page: https://https://github.com/dc3ea9f/PreferenceScore.
    摘要 一个与人工偏好评估相一致、可靠且全面的评价指标,对对话头视频生成方法的发展至关重要。现有的量化评价通常只考虑有限的评价维度,无法捕捉人类偏好的全部复杂性;定性评价和用户研究虽能弥补这一不足,但耗时费力。这一局限阻碍了对话头生成算法和系统的进步。在本文中,我们提出了一种基于学习的评价指标,称为偏好分数(Preference Score,PS),根据不同维度的量化评价来拟合人类偏好。PS可作为一种无需人工标注的量化评价。实验结果验证了PS在与人类感知一致性方面的优越性,并展示了其对未见数据的稳健性和泛化能力,使其成为推动对话头生成发展的有价值工具。我们期望该指标能够促进对话头生成领域的新进展。项目页面:https://github.com/dc3ea9f/PreferenceScore。

Parallelization of a new embedded application for automatic meteor detection

  • paper_url: http://arxiv.org/abs/2307.10632
  • repo_url: None
  • paper_authors: Mathuran Kandeepan, Clara Ciocan, Adrien Cassagne, Lionel Lacassagne
  • for: 这篇论文主要是为了描述一种新的计算机视觉应用程序的并行化方法。该系统可以自动检测陨星从不稳定的摄像头和噪声视频序列中。该应用程序设计用于天气球或空中观测活动,因此最终目标是一个低功耗系统在板(< 10 瓦特),软件需要计算流程图像频率(> 25 帧每秒)。
  • methods: 为了实现这些目标,首先将应用程序拆分成任务图,然后应用不同的并行化技术。实验结果表明并行化方法的效率。例如,在raspberry Pi 4 和一个高清视频序列上,处理链达到 42 帧每秒,而只消耗 6 瓦特的电力。
  • results: 实验结果表明,使用这些并行化方法可以大幅提高处理速度和能效性。例如,在raspberry Pi 4 和一个高清视频序列上,处理链达到 42 帧每秒,而只消耗 6 瓦特的电力。
    Abstract This article presents the methods used to parallelize a new computer vision application. The system is able to automatically detect meteor from non-stabilized cameras and noisy video sequences. The application is designed to be embedded in weather balloons or for airborne observation campaigns. Thus, the final target is a low power system-on-chip (< 10 Watts) while the software needs to compute a stream of frames in real-time (> 25 frames per second). For this, first the application is split in a tasks graph, then different parallelization techniques are applied. Experiment results demonstrate the efficiency of the parallelization methods. For instance, on the Raspberry Pi 4 and on a HD video sequence, the processing chain reaches 42 frames per second while it only consumes 6 Watts.
    摘要

Learning Discriminative Visual-Text Representation for Polyp Re-Identification

  • paper_url: http://arxiv.org/abs/2307.10625
  • repo_url: https://github.com/jeremyxsc/vt-reid
  • paper_authors: Suncheng Xiang, Cang Liu, Sijia Du, Dahong Qian
  • for: 这个研究旨在提高colonoscopic polyp re-identification的精度,以帮助预防和治疗colonrectal cancer。
  • methods: 本研究提出VT-ReID训练方法,通过高层语义信息的交互来丰富视觉表示学习,并引入来自文本数据的先验知识(结合对比学习的聚类机制)来提升识别效果。
  • results: 实验结果表明,本方法相比现有的最先进方法具有明显优势,并能更好地泛化到新的应用场景。
    Abstract Colonoscopic Polyp Re-Identification aims to match a specific polyp in a large gallery with different cameras and views, which plays a key role for the prevention and treatment of colorectal cancer in the computer-aided diagnosis. However, traditional methods mainly focus on the visual representation learning, while neglect to explore the potential of semantic features during training, which may easily leads to poor generalization capability when adapted the pretrained model into the new scenarios. To relieve this dilemma, we propose a simple but effective training method named VT-ReID, which can remarkably enrich the representation of polyp videos with the interchange of high-level semantic information. Moreover, we elaborately design a novel clustering mechanism to introduce prior knowledge from textual data, which leverages contrastive learning to promote better separation from abundant unlabeled text data. To the best of our knowledge, this is the first attempt to employ the visual-text feature with clustering mechanism for the colonoscopic polyp re-identification. Empirical results show that our method significantly outperforms current state-of-the art methods with a clear margin.
    摘要 结肠镜息肉重识别旨在将特定息肉与来自不同摄像头和视角的大型图库进行匹配,这在计算机辅助诊断中对结直肠癌的预防与治疗具有关键作用。然而,传统方法主要关注视觉表示学习,而忽视了在训练中挖掘语义特征的潜力,这容易导致预训练模型迁移到新场景时泛化能力较差。为缓解这一困境,我们提出了一种简单而有效的训练方法VT-ReID,它通过高层语义信息的交互,显著丰富了息肉视频的表示。此外,我们精心设计了一种新的聚类机制,从文本数据中引入先验知识,并利用对比学习促进对大量无标注文本数据的更好区分。据我们所知,这是首次将视觉-文本特征与聚类机制用于结肠镜息肉重识别。实验结果表明,我们的方法以明显优势超过了当前最先进的方法。

Joint Skeletal and Semantic Embedding Loss for Micro-gesture Classification

  • paper_url: http://arxiv.org/abs/2307.10624
  • repo_url: https://github.com/VUT-HFUT/MiGA2023_Track1
  • paper_authors: Kun Li, Dan Guo, Guoliang Chen, Xinge Peng, Meng Wang
  • for: Micros-gesture Classification Challenge at IJCAI 2023
  • methods: 3D-CNNs-based micro-gesture recognition network with skeletal and semantic embedding loss
  • results: Ranked 1st in the challenge with Top-1 accuracy 1.10% higher than the second-place team
    Abstract In this paper, we briefly introduce the solution of our team HFUT-VUT for the Micros-gesture Classification in the MiGA challenge at IJCAI 2023. The micro-gesture classification task aims at recognizing the action category of a given video based on the skeleton data. For this task, we propose a 3D-CNNs-based micro-gesture recognition network, which incorporates a skeletal and semantic embedding loss to improve action classification performance. Finally, we rank 1st in the Micro-gesture Classification Challenge, surpassing the second-place team in terms of Top-1 accuracy by 1.10%.
    摘要 在这篇论文中,我们简要介绍我们团队HFUT-VUT在IJCAI 2023年的MiGA挑战中的解决方案。微表达识别任务的目标是根据视频中的skeleton数据识别动作类别。为此,我们提议一种基于3D-CNNs的微表达识别网络,该网络包含skeletal和semantic嵌入损失以提高动作分类性能。最后,我们在微表达分类挑战中名列第一,比第二名组织的Top-1准确率高出1.10%。

Quaternion tensor ring decomposition and application for color image inpainting

  • paper_url: http://arxiv.org/abs/2307.10620
  • repo_url: None
  • paper_authors: Jifei Miao, Kit Ian Kou
  • for: color image inpainting
  • methods: quaternion tensor ring (QTR) decomposition, low-rank quaternion tensor completion (LRQTC) model
  • results: highly competitive performance in color image inpainting tasks
    Abstract In recent years, tensor networks have emerged as powerful tools for solving large-scale optimization problems. One of the most promising tensor networks is the tensor ring (TR) decomposition, which achieves circular dimensional permutation invariance in the model through the utilization of the trace operation and equitable treatment of the latent cores. On the other hand, more recently, quaternions have gained significant attention and have been widely utilized in color image processing tasks due to their effectiveness in encoding color pixels. Therefore, in this paper, we propose the quaternion tensor ring (QTR) decomposition, which inherits the powerful and generalized representation abilities of the TR decomposition while leveraging the advantages of quaternions for color pixel representation. In addition to providing the definition of QTR decomposition and an algorithm for learning the QTR format, this paper also proposes a low-rank quaternion tensor completion (LRQTC) model and its algorithm for color image inpainting based on the QTR decomposition. Finally, extensive experiments on color image inpainting demonstrate that the proposed QTLRC method is highly competitive.
    摘要 近年来,张量网络已成为求解大规模优化问题的有力工具。其中最有前景的张量网络之一是张量环(TR)分解,它通过利用迹运算并对潜在核心进行对等处理,使模型具有循环维度置换不变性。另一方面,四元数近来受到广泛关注,并因其能有效编码彩色像素而被广泛用于彩色图像处理任务。因此,本文提出了四元数张量环(QTR)分解,它继承了TR分解强大且泛化的表示能力,同时利用四元数在彩色像素表示上的优势。除了给出QTR分解的定义及学习QTR格式的算法外,本文还基于QTR分解提出了低秩四元数张量补全(LRQTC)模型及其算法,用于彩色图像修复。最后,大量彩色图像修复实验表明,所提方法具有很强的竞争力。
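For reference, the standard (real-valued) tensor ring decomposition that QTR builds on can be written as follows; the quaternion version replaces the real cores with quaternion-valued cores and quaternion matrix products, with the precise definition given in the paper.

```latex
% Tensor ring (TR) decomposition of an N-th order tensor \mathcal{X}:
% each element is the trace of a product of lateral core slices, and the
% trace makes the model invariant to circular permutation of the dimensions.
\[
  \mathcal{X}(i_1, i_2, \dots, i_N)
  = \operatorname{Tr}\!\left( \mathbf{G}^{(1)}(i_1)\,\mathbf{G}^{(2)}(i_2)\cdots\mathbf{G}^{(N)}(i_N) \right),
  \qquad
  \mathbf{G}^{(n)}(i_n) \in \mathbb{R}^{R_n \times R_{n+1}},\quad R_{N+1} = R_1 .
\]
```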

Hybrid Feature Embedding For Automatic Building Outline Extraction

  • paper_url: http://arxiv.org/abs/2307.10609
  • repo_url: None
  • paper_authors: Weihang Ran, Wei Yuan, Xiaodan Shi, Zipei Fan, Ryosuke Shibasaki
  • for: 本研究旨在提高从高分辨 Aerial 图像中提取的建筑 outline 精度,以便应用于变化探测和灾害评估等领域。
  • methods: 本研究提出了一种基于 CNN 和 Transformer 模型的 actice contour 模型,并采用了 triple-branch decoder 结构来处理不同的特征生成。
  • results: 实验结果显示,我们的模型在两个 dataset 上的测试结果均高于基线模型,达到了 Vaihingen 的 91.1% mIoU 和 Bing huts 的 83.8%。
    Abstract Building outline extracted from high-resolution aerial images can be used in various application fields such as change detection and disaster assessment. However, traditional CNN model cannot recognize contours very precisely from original images. In this paper, we proposed a CNN and Transformer based model together with active contour model to deal with this problem. We also designed a triple-branch decoder structure to handle different features generated by encoder. Experiment results show that our model outperforms other baseline model on two datasets, achieving 91.1% mIoU on Vaihingen and 83.8% on Bing huts.
    摘要 从高分辨率航空影像中提取的建筑物轮廓可用于变化检测和灾害评估等多种应用领域。然而,传统的CNN模型无法从原始图像中非常精确地识别轮廓。在本文中,我们提出了一种结合CNN与Transformer的模型,并与主动轮廓模型(active contour model)配合使用。我们还设计了一种三分支解码器结构,以处理编码器生成的不同特征。实验结果表明,我们的模型在两个数据集上均优于其他基线模型,在Vaihingen上达到91.1%的mIoU,在Bing huts上达到83.8%。

Physics-Driven Turbulence Image Restoration with Stochastic Refinement

  • paper_url: http://arxiv.org/abs/2307.10603
  • repo_url: https://github.com/vita-group/pirn
  • paper_authors: Ajay Jaiswal, Xingguang Zhang, Stanley H. Chan, Zhangyang Wang
  • For: This paper aims to improve the restoration of images degraded by atmospheric turbulence in long-range optical imaging systems.
  • Methods: The proposed method, called Physics-integrated Restoration Network (PiRN), incorporates physics-based simulation tools into the training process to help the network disentangle the stochasticity from the degradation and the underlying image. Additionally, PiRN with Stochastic Refinement (PiRN-SR) is introduced to boost perceptual quality.
  • Results: The proposed method improves the generalization to real-world unknown turbulence conditions and provides state-of-the-art restoration in both pixel-wise accuracy and perceptual quality.
    Abstract Image distortion by atmospheric turbulence is a stochastic degradation, which is a critical problem in long-range optical imaging systems. A number of research has been conducted during the past decades, including model-based and emerging deep-learning solutions with the help of synthetic data. Although fast and physics-grounded simulation tools have been introduced to help the deep-learning models adapt to real-world turbulence conditions recently, the training of such models only relies on the synthetic data and ground truth pairs. This paper proposes the Physics-integrated Restoration Network (PiRN) to bring the physics-based simulator directly into the training process to help the network to disentangle the stochasticity from the degradation and the underlying image. Furthermore, to overcome the ``average effect" introduced by deterministic models and the domain gap between the synthetic and real-world degradation, we further introduce PiRN with Stochastic Refinement (PiRN-SR) to boost its perceptual quality. Overall, our PiRN and PiRN-SR improve the generalization to real-world unknown turbulence conditions and provide a state-of-the-art restoration in both pixel-wise accuracy and perceptual quality. Our codes are available at \url{https://github.com/VITA-Group/PiRN}.
    摘要 大气湍流造成的图像畸变是一种随机退化,是远距离光学成像系统中的关键问题。过去几十年中已开展了大量研究,包括基于模型的方法以及借助合成数据的新兴深度学习方案。尽管近期已引入快速且具有物理依据的仿真工具,帮助深度学习模型适应真实世界的湍流条件,但此类模型的训练仍仅依赖合成数据与真值图像对。本文提出了物理融合复原网络(PiRN),将基于物理的仿真器直接引入训练过程,帮助网络将随机性与退化及底层图像解耦。此外,为克服确定性模型带来的"平均效应"以及合成退化与真实退化之间的域差,我们进一步提出了带随机细化的PiRN(PiRN-SR),以提升感知质量。总体而言,我们的PiRN与PiRN-SR提升了对真实世界未知湍流条件的泛化能力,并在逐像素精度和感知质量上均提供了最先进的复原效果。代码见 https://github.com/VITA-Group/PiRN。

Flatness-Aware Minimization for Domain Generalization

  • paper_url: http://arxiv.org/abs/2307.11108
  • repo_url: None
  • paper_authors: Xingxuan Zhang, Renzhe Xu, Han Yu, Yancheng Dong, Pengfei Tian, Peng Cu
  • for: 这个研究旨在探讨预测分布shift下的模型学习 Robustness,具体来说是选择适当的优化器以提高预测性能。
  • methods: 本研究提出了一种新的方法,即 Flatness-Aware Minimization for Domain Generalization (FAD),可以快速且高效地优化零项和一项的平坦性同时,以提高预测性能。
  • results: 实验结果显示,FAD在多个预测分布shift datasets上具有较高的性能,并且可以更好地找到平坦的极小值。
    Abstract Domain generalization (DG) seeks to learn robust models that generalize well under unknown distribution shifts. As a critical aspect of DG, optimizer selection has not been explored in depth. Currently, most DG methods follow the widely used benchmark, DomainBed, and utilize Adam as the default optimizer for all datasets. However, we reveal that Adam is not necessarily the optimal choice for the majority of current DG methods and datasets. Based on the perspective of loss landscape flatness, we propose a novel approach, Flatness-Aware Minimization for Domain Generalization (FAD), which can efficiently optimize both zeroth-order and first-order flatness simultaneously for DG. We provide theoretical analyses of the FAD's out-of-distribution (OOD) generalization error and convergence. Our experimental results demonstrate the superiority of FAD on various DG datasets. Additionally, we confirm that FAD is capable of discovering flatter optima in comparison to other zeroth-order and first-order flatness-aware optimization methods.
    摘要 领域泛化(DG)旨在学习在未知分布偏移下仍能良好泛化的鲁棒模型。作为DG的一个关键环节,优化器的选择尚未得到深入研究。目前,大多数DG方法遵循广泛使用的基准DomainBed,并对所有数据集默认使用Adam优化器。然而,我们发现对于大多数现有的DG方法和数据集,Adam并不一定是最优选择。基于损失地形平坦性的视角,我们提出了一种新方法,即面向领域泛化的平坦性感知最小化(FAD),它能够高效地同时优化零阶与一阶平坦性。我们对FAD的分布外(OOD)泛化误差与收敛性进行了理论分析。实验结果表明FAD在多种DG数据集上的优越性。此外,我们还证实,与其他零阶与一阶平坦性感知优化方法相比,FAD能够找到更平坦的极小值。
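The abstract does not spell out FAD's update rule, so the sketch below instead shows a standard Sharpness-Aware Minimization (SAM) step, a widely used zeroth-order flatness-aware optimizer, to illustrate what optimizing for flat minima looks like in practice. The perturbation radius rho and the two-pass loop are generic SAM, not the paper's FAD.

```python
import torch

def sam_step(model, loss_fn, data, target, optimizer, rho=0.05):
    """One generic SAM step: ascend to a nearby worst-case point, take the gradient
    there, then apply it at the original weights."""
    optimizer.zero_grad()
    loss = loss_fn(model(data), target)            # 1) gradient at the current point
    loss.backward()
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm() for p in model.parameters() if p.grad is not None]))
    eps = []
    with torch.no_grad():                          # 2) perturb toward the worst case
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    optimizer.zero_grad()
    loss_fn(model(data), target).backward()        # 3) gradient at the perturbed point
    with torch.no_grad():                          # 4) undo the perturbation, then step
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```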

SCA-PVNet: Self-and-Cross Attention Based Aggregation of Point Cloud and Multi-View for 3D Object Retrieval

  • paper_url: http://arxiv.org/abs/2307.10601
  • repo_url: None
  • paper_authors: Dongyun Lin, Yi Cheng, Aiyuan Guo, Shangbo Mao, Yiqun Li
  • for: 3D object retrieval
  • methods: self-and-cross attention based aggregation of point cloud and multi-view images (SCA-PVNet); deep features extracted from point clouds and multi-view images; two types of feature aggregation modules: In-Modality Aggregation Module (IMAM) and Cross-Modality Aggregation Module (CMAM)
  • results: superiority of the proposed SCA-PVNet over state-of-the-art methods; extensive experiments and analysis conducted on three datasets, ranging from small to large scale.
    摘要 为解决三维物体检索问题,研究者已投入大量精力,为以单一模态(如体素、点云或多视图图像)表示的三维物体生成高判别力的描述符。利用三维物体多模态表示之间的互补信息,有望进一步提升检索性能。然而,多模态三维物体检索在大规模数据集上的研究与分析仍然很少。在本文中,我们提出了基于自注意力与交叉注意力的点云与多视图图像聚合网络(SCA-PVNet)用于三维物体检索。基于从点云和多视图图像中提取的深度特征,我们设计了两类特征聚合模块:模态内聚合模块(IMAM)与跨模态聚合模块(CMAM),以实现有效的特征融合。IMAM利用自注意力机制聚合多视图特征,而CMAM利用交叉注意力机制使点云特征与多视图特征交互。将两个模块聚合后的特征拼接,即可得到用于检索的三维物体最终描述符。我们在从小规模到大规模的三个数据集上进行了大量实验与分析,以展示所提SCA-PVNet相对最先进方法的优越性。
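The cross-modality aggregation (CMAM) described above can be illustrated with a standard cross-attention block in which point-cloud tokens query multi-view image tokens. The embedding dimension, head count, and residual/LayerNorm arrangement are assumptions for the sketch, not the paper's exact module.

```python
import torch
import torch.nn as nn

class CrossModalityAggregation(nn.Module):
    """Point-cloud tokens attend to multi-view image tokens (CMAM-style sketch)."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, pc_tokens, view_tokens):
        # pc_tokens: (B, Np, D) point-cloud features; view_tokens: (B, Nv, D) per-view features
        fused, _ = self.attn(query=pc_tokens, key=view_tokens, value=view_tokens)
        return self.norm(pc_tokens + fused)        # residual fusion

# A final object descriptor could then concatenate pooled features from both modalities, e.g.
#   desc = torch.cat([fused_pc.mean(dim=1), view_tokens.mean(dim=1)], dim=-1)
```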

Event Blob Tracking: An Asynchronous Real-Time Algorithm

  • paper_url: http://arxiv.org/abs/2307.10593
  • repo_url: https://github.com/ziweiwwang/event-blob-tracking
  • paper_authors: Ziwei Wang, Timothy Molloy, Pieter van Goor, Robert Mahony
  • for: 这 paper 是为了跟踪 fast-moving objects 的 tracking 而设计的。
  • methods: 这 paper 使用了 raw events 的 asynchronous 处理,并使用了一种新的 algorithm 来跟踪 event blobs。
  • results: 这 paper 实现了高精度的 tracking 和 event blob shape estimation,even under challenging lighting conditions 和高速运动情况下。
    Abstract Event-based cameras have become increasingly popular for tracking fast-moving objects due to their high temporal resolution, low latency, and high dynamic range. In this paper, we propose a novel algorithm for tracking event blobs using raw events asynchronously in real time. We introduce the concept of an event blob as a spatio-temporal likelihood of event occurrence where the conditional spatial likelihood is blob-like. Many real-world objects generate event blob data, for example, flickering LEDs such as car headlights or any small foreground object moving against a static or slowly varying background. The proposed algorithm uses a nearest neighbour classifier with a dynamic threshold criteria for data association coupled with a Kalman filter to track the event blob state. Our algorithm achieves highly accurate tracking and event blob shape estimation even under challenging lighting conditions and high-speed motions. The microsecond time resolution achieved means that the filter output can be used to derive secondary information such as time-to-contact or range estimation, that will enable applications to real-world problems such as collision avoidance in autonomous driving.
    摘要 事件相机因其高时间分辨率、低延迟和高动态范围,在跟踪快速运动目标方面越来越受欢迎。在本文中,我们提出了一种新算法,以异步方式实时利用原始事件来跟踪事件斑块(event blob)。我们将事件斑块定义为事件发生的时空似然,其条件空间似然呈斑块状。许多真实世界中的物体都会产生事件斑块数据,例如闪烁的LED(如车头灯),或任何在静止或缓慢变化背景前移动的小型前景物体。所提算法使用带动态阈值准则的最近邻分类器进行数据关联,并结合卡尔曼滤波器来跟踪事件斑块状态。即使在具有挑战性的光照条件和高速运动下,我们的算法也能实现高精度的跟踪与事件斑块形状估计。微秒级的时间分辨率意味着滤波器输出可用于推导二次信息,如接触时间或距离估计,从而支撑自动驾驶中的避撞等真实世界应用。
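A minimal sketch of the tracking core described above: a constant-velocity Kalman filter on the blob centroid, with a nearest-neighbour-style distance gate for data association. The state layout, noise covariances, and gate threshold are illustrative assumptions rather than the paper's tuned values.

```python
import numpy as np

class BlobKalman:
    """Constant-velocity Kalman filter over an event-blob centroid state [x, y, vx, vy]."""
    def __init__(self, x, y, dt=1e-3, q=1.0, r=2.0):
        self.s = np.array([x, y, 0.0, 0.0])
        self.P = np.eye(4) * 10.0                                   # state covariance
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * q                                      # process noise
        self.R = np.eye(2) * r                                      # measurement noise

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]

    def update(self, z, gate=5.0):
        """Nearest-neighbour style gating: ignore measurements too far from the prediction."""
        z = np.asarray(z, dtype=float)
        if np.linalg.norm(z - self.s[:2]) > gate:
            return False
        y = z - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return True
```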

Reference-based Painterly Inpainting via Diffusion: Crossing the Wild Reference Domain Gap

  • paper_url: http://arxiv.org/abs/2307.10584
  • repo_url: None
  • paper_authors: Dejia Xu, Xingqian Xu, Wenyan Cong, Humphrey Shi, Zhangyang Wang
  • for: 这篇论文旨在提出一种新的图像填充方法,可以将新的对象Insert到艺术图像中,以实现创新的画廊效果。
  • methods: 该方法基于一种新的扩散框架,称为RefPaint,它可以处理大量域之差的参考图像,并使用一种新的掩码融合机制和一个梯形分支来实现填充mask。
  • results: 实验结果表明,RefPaint方法可以生成较好的结果,而且比现有的方法更有创造力和灵活性。
    Abstract Have you ever imagined how it would look if we placed new objects into paintings? For example, what would it look like if we placed a basketball into Claude Monet's ``Water Lilies, Evening Effect''? We propose Reference-based Painterly Inpainting, a novel task that crosses the wild reference domain gap and implants novel objects into artworks. Although previous works have examined reference-based inpainting, they are not designed for large domain discrepancies between the target and the reference, such as inpainting an artistic image using a photorealistic reference. This paper proposes a novel diffusion framework, dubbed RefPaint, to ``inpaint more wildly'' by taking such references with large domain gaps. Built with an image-conditioned diffusion model, we introduce a ladder-side branch and a masked fusion mechanism to work with the inpainting mask. By decomposing the CLIP image embeddings at inference time, one can manipulate the strength of semantic and style information with ease. Experiments demonstrate that our proposed RefPaint framework produces significantly better results than existing methods. Our method enables creative painterly image inpainting with reference objects that would otherwise be difficult to achieve. Project page: https://vita-group.github.io/RefPaint/
    摘要 你是否想象过,如果在画作中放入新的物体会是什么样子?例如,把一个篮球放进克劳德·莫奈的"Water Lilies, Evening Effect"(《睡莲》,夜间效果)中会是什么效果?我们提出了基于参考的绘画风格修复(Reference-based Painterly Inpainting)这一新任务,它需要跨越巨大的参考域差距,将新物体植入艺术作品之中。尽管已有工作研究过基于参考的图像修复,但它们并未针对目标与参考之间的大域差而设计,例如用照片级真实感的参考图像去修复艺术图像。本文提出了一种新的扩散框架RefPaint,使模型能够利用具有大域差的参考图像进行"更狂野的修复"。我们在图像条件扩散模型的基础上,引入梯侧分支(ladder-side branch)和掩码融合机制来配合修复掩码。通过在推理时分解CLIP图像嵌入,可以方便地调节语义信息与风格信息的强度。实验表明,我们提出的RefPaint框架取得了显著优于现有方法的结果。我们的方法使得以参考物体进行创造性的绘画风格图像修复成为可能,而这在以往是难以实现的。项目页面:https://vita-group.github.io/RefPaint/

No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection

  • paper_url: http://arxiv.org/abs/2307.10567
  • repo_url: None
  • paper_authors: Qi Zhang, Sipeng Zheng, Qin Jin
  • for: 本研究旨在提高视频语言查询中的时间间隔检索精度,尤其是在低含义噪声比例(SNR)下。
  • methods: 本文提出了一种简单易用的视频语言查询模型,包括多Scale邻域注意和缩进边界检测两个核心模块。
  • results: 与先前的研究相比,本模型在不同的TVG标准准则上实现了竞争力的性能,同时具有更快的推理速度和轻量级的模型参数。
    Abstract Temporal video grounding (TVG) aims to retrieve the time interval of a language query from an untrimmed video. A significant challenge in TVG is the low "Semantic Noise Ratio (SNR)", which results in worse performance with lower SNR. Prior works have addressed this challenge using sophisticated techniques. In this paper, we propose a no-frills TVG model that consists of two core modules, namely multi-scale neighboring attention and zoom-in boundary detection. The multi-scale neighboring attention restricts each video token to only aggregate visual contexts from its neighbor, enabling the extraction of the most distinguishing information with multi-scale feature hierarchies from high-ratio noises. The zoom-in boundary detection then focuses on local-wise discrimination of the selected top candidates for fine-grained grounding adjustment. With an end-to-end training strategy, our model achieves competitive performance on different TVG benchmarks, while also having the advantage of faster inference speed and lighter model parameters, thanks to its lightweight architecture.
    摘要 多尺度邻域注意力限制每个视频token仅聚合其邻域的视觉上下文,从而能够在高比例噪声中借助多尺度特征层级提取最具区分性的信息;放大式边界检测则对所选出的最优候选进行局部判别,以进行细粒度的定位调整。借助端到端训练策略,我们的模型在不同的TVG基准上取得了有竞争力的性能,同时得益于其轻量级架构,还具有更快的推理速度和更少的模型参数。

Interactive Segmentation for Diverse Gesture Types Without Context

  • paper_url: http://arxiv.org/abs/2307.10518
  • repo_url: None
  • paper_authors: Josh Myers-Dean, Yifei Fan, Brian Price, Wilson Chan, Danna Gurari
  • for: 这个论文旨在解决现有方法的限制,它们只支持单一的手势类型(如点击或涂抹),或者需要知道使用哪种手势类型。
  • methods: 这篇论文提出了一种简化的交互分割任务,只需要用户标注图像,无需指定手势类型。他们还引入了首个支持多种手势类型的交互分割数据集,以及一新的评价指标。
  • results: 研究人员对多种交互分割算法进行了分析,包括他们修改后的算法。虽然总体表现良好,但还有一些需要进一步改进的地方。研究人员将数据集公开发布在 GitHub 上(https://github.com/joshmyersdean/dig),以便进一步扩展这项工作。
    Abstract Interactive segmentation entails a human marking an image to guide how a model either creates or edits a segmentation. Our work addresses limitations of existing methods: they either only support one gesture type for marking an image (e.g., either clicks or scribbles) or require knowledge of the gesture type being employed, and require specifying whether marked regions should be included versus excluded in the final segmentation. We instead propose a simplified interactive segmentation task where a user only must mark an image, where the input can be of any gesture type without specifying the gesture type. We support this new task by introducing the first interactive segmentation dataset with multiple gesture types as well as a new evaluation metric capable of holistically evaluating interactive segmentation algorithms. We then analyze numerous interactive segmentation algorithms, including ones adapted for our novel task. While we observe promising performance overall, we also highlight areas for future improvement. To facilitate further extensions of this work, we publicly share our new dataset at https://github.com/joshmyersdean/dig.
    摘要 交互式分割是指由人在图像上进行标记,以指导模型创建或编辑分割结果。我们的工作解决了现有方法的局限:它们要么只支持一种标记手势类型(例如只支持点击或只支持涂抹),要么需要已知所使用的手势类型,并且需要指定被标记区域应被包含还是被排除在最终分割之外。我们转而提出一种简化的交互式分割任务,用户只需标记图像即可,输入可以是任意手势类型且无需指明手势类型。为支持这一新任务,我们发布了首个包含多种手势类型的交互式分割数据集,以及一种能够整体评估交互式分割算法的新评价指标。随后我们分析了多种交互式分割算法,包括针对我们新任务改造的算法。虽然整体表现令人鼓舞,但我们也指出了未来有待改进之处。为便于这项工作的进一步扩展,我们在 https://github.com/joshmyersdean/dig 公开了新数据集。

FedSoup: Improving Generalization and Personalization in Federated Learning via Selective Model Interpolation

  • paper_url: http://arxiv.org/abs/2307.10507
  • repo_url: None
  • paper_authors: Minghui Chen, Meirui Jiang, Qi Dou, Zehua Wang, Xiaoxiao Li
  • for: 这篇论文旨在解决跨存储层 Federated Learning(FL)中模型在不同数据中心的数据分布下的问题。
  • methods: 该论文提出了一种新的 federated model soup 方法(即选择式模型参数 interpolating)来优化本地和全局性能的负面选择。在联合训练阶段,每个客户端都维护自己的全局模型池,并在本地和全局模型之间进行模型参数 interpolating,以避免过拟合和寻找平坦的最优点。
  • results: 该论文在透视和病理图像分类任务上进行了评估,并达到了显著改善的对于不同数据分布的泛化性能。代码可以在 https://github.com/ubc-tea/FedSoup 上获取。
    Abstract Cross-silo federated learning (FL) enables the development of machine learning models on datasets distributed across data centers such as hospitals and clinical research laboratories. However, recent research has found that current FL algorithms face a trade-off between local and global performance when confronted with distribution shifts. Specifically, personalized FL methods have a tendency to overfit to local data, leading to a sharp valley in the local model and inhibiting its ability to generalize to out-of-distribution data. In this paper, we propose a novel federated model soup method (i.e., selective interpolation of model parameters) to optimize the trade-off between local and global performance. Specifically, during the federated training phase, each client maintains its own global model pool by monitoring the performance of the interpolated model between the local and global models. This allows us to alleviate overfitting and seek flat minima, which can significantly improve the model's generalization performance. We evaluate our method on retinal and pathological image classification tasks, and our proposed method achieves significant improvements for out-of-distribution generalization. Our code is available at https://github.com/ubc-tea/FedSoup.
    摘要 跨孤岛(cross-silo)联邦学习(FL)可以在分布于医院和临床研究实验室等数据中心的数据集上开发机器学习模型。然而,最新研究发现,当面临分布变化时,当前的FL算法存在本地和全局性能之间的权衡。具体来说,个性化FL方法容易过拟合本地数据,导致本地模型陷入尖锐极小值,阻碍其对分布外数据的泛化表现。在这篇论文中,我们提出了一种新的联邦模型汤方法(即选择性的模型参数插值)来优化本地和全局性能之间的权衡。具体来说,在联邦训练阶段,每个客户端都维护自己的全局模型池,并监测本地模型与全局模型之间插值模型的性能。这样可以减轻过拟合、寻找平坦极小值,从而显著提高模型的泛化性能。我们在视网膜和病理图像分类任务上评估了我们的方法,并确认其在分布外泛化性能上具有显著提高。我们的代码可以在https://github.com/ubc-tea/FedSoup中找到。
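A hedged sketch of the selective weight-interpolation ("model soup") idea: each client mixes its local weights with the global weights and keeps the mixture only if it validates better. The candidate interpolation ratios and the selection rule are assumptions, not the paper's exact procedure.

```python
import copy
import torch

@torch.no_grad()
def interpolate_state(local_sd, global_sd, alpha):
    return {k: alpha * local_sd[k] + (1 - alpha) * global_sd[k] for k in local_sd}

def maybe_update_soup(model, local_sd, global_sd, val_fn, alphas=(0.25, 0.5, 0.75)):
    """Keep the interpolated weights only when they score better on held-out data."""
    best_sd, best_score = copy.deepcopy(local_sd), val_fn(local_sd)
    for a in alphas:
        cand = interpolate_state(local_sd, global_sd, a)
        score = val_fn(cand)
        if score > best_score:
            best_sd, best_score = cand, score
    model.load_state_dict(best_sd)
    return best_score

# Toy usage with a dummy validation function standing in for real held-out accuracy.
net = torch.nn.Linear(4, 2)
local_sd = copy.deepcopy(net.state_dict())
global_sd = {k: torch.zeros_like(v) for k, v in local_sd.items()}
print(maybe_update_soup(net, local_sd, global_sd,
                        val_fn=lambda sd: float(sum(v.abs().sum() for v in sd.values()))))
```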

Is Grad-CAM Explainable in Medical Images?

  • paper_url: http://arxiv.org/abs/2307.10506
  • repo_url: None
  • paper_authors: Subhashis Suara, Aayush Jha, Pratik Sinha, Arif Ahmed Sekh
  • for: This paper is written for the field of artificial intelligence (AI) and medical imaging, specifically to explore the principles of Explainable Deep Learning and its relevance to medical imaging.
  • methods: The paper discusses various explainability techniques, including Grad-CAM, and their limitations in medical imaging applications.
  • results: The findings highlight the potential of Explainable Deep Learning and Grad-CAM in improving the accuracy and interpretability of deep learning models in medical imaging.
  • for: 这篇论文是为人工智能(AI)和医学成像领域写的,具体来说是探讨Explainable Deep Learning的原理和它在医学成像中的应用。
  • methods: 论文讨论了各种解释技术,包括Grad-CAM,以及它们在医学成像应用中的局限性。
  • results: 结论显示Explainable Deep Learning和Grad-CAM在医学成像中可以提高深度学习模型的准确性和可读性。
    Abstract Explainable Deep Learning has gained significant attention in the field of artificial intelligence (AI), particularly in domains such as medical imaging, where accurate and interpretable machine learning models are crucial for effective diagnosis and treatment planning. Grad-CAM is a baseline that highlights the most critical regions of an image used in a deep learning model's decision-making process, increasing interpretability and trust in the results. It is applied in many computer vision (CV) tasks such as classification and explanation. This study explores the principles of Explainable Deep Learning and its relevance to medical imaging, discusses various explainability techniques and their limitations, and examines medical imaging applications of Grad-CAM. The findings highlight the potential of Explainable Deep Learning and Grad-CAM in improving the accuracy and interpretability of deep learning models in medical imaging. The code is available in (will be available).
    摘要 可解释深度学习(Explainable Deep Learning)在人工智能(AI)领域获得了很大的关注,特别是在医疗影像领域,因为准确且可解释的机器学习模型对于有效的诊断和治疗规划至关重要。Grad-CAM 是一种基准方法,能够突出深度学习模型决策过程中使用的最关键影像区域,从而提高结果的可解释性和可信度。它已应用于许多计算机视觉(CV)任务,如分类和解释。本研究探讨了可解释深度学习的原理及其与医疗影像的关联,讨论了各种解释技术及其局限性,并考察了 Grad-CAM 在医疗影像中的应用。研究结果显示了可解释深度学习和 Grad-CAM 在提高医疗影像深度学习模型准确性和可解释性方面的潜力。代码将在(将会公开)。
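For reference, a minimal Grad-CAM computation in the standard formulation (not this paper's code): channel weights come from the spatially averaged gradients of the class score, and the ReLU of the weighted feature-map sum gives the heat map. It assumes a recent torchvision; `weights=None` is used only to keep the sketch offline.

```python
import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.resnet18(weights=None).eval()   # load trained weights in practice
feats, grads = {}, {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))
model.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224)
score = model(x)[0].max()                      # score of the top class
score.backward()

w = grads["a"].mean(dim=(2, 3), keepdim=True)  # channel weights from averaged gradients
cam = F.relu((w * feats["a"]).sum(dim=1))      # coarse heat map over the last feature grid
cam = F.interpolate(cam[None], size=x.shape[-2:], mode="bilinear")[0]
print(cam.shape)  # torch.Size([1, 224, 224])
```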

Identifying Interpretable Subspaces in Image Representations

  • paper_url: http://arxiv.org/abs/2307.10504
  • repo_url: None
  • paper_authors: Neha Kalibhat, Shweta Bhardwaj, Bayan Bruss, Hamed Firooz, Maziar Sanjabi, Soheil Feizi
  • for: 本文提出了一种自动Feature Explanation使用对比概念(FALCON)解释图像表示的解释性框架,用于解释图像表示中的特征。
  • methods: 本文使用了一个大型captioningdataset(如LAION-400m)和一个预训练的视觉语言模型(如CLIP)来captioning高活动图像,并对每个单词进行 scoring和排名,从而获得了一小number of共同、人类可理解的概念,可以准确地描述目标特征。此外,本文还应用了对比解释,使用低活动(counterfactual)图像来消除幻象概念。
  • results: 本文发现,许多现有的方法只能解释特征独立地,而不能够解释图像表示中的大部分空间。然而,通过使用FALCON,我们发现,在更大的空间中,特征可以更好地被解释,并且可以通过高级 scoring概念来描述。此外,本文还提出了一种将概念从一个可解释的表示空间传递到另一个未知表示空间的技术。
    Abstract We propose Automatic Feature Explanation using Contrasting Concepts (FALCON), an interpretability framework to explain features of image representations. For a target feature, FALCON captions its highly activating cropped images using a large captioning dataset (like LAION-400m) and a pre-trained vision-language model like CLIP. Each word among the captions is scored and ranked leading to a small number of shared, human-understandable concepts that closely describe the target feature. FALCON also applies contrastive interpretation using lowly activating (counterfactual) images, to eliminate spurious concepts. Although many existing approaches interpret features independently, we observe in state-of-the-art self-supervised and supervised models, that less than 20% of the representation space can be explained by individual features. We show that features in larger spaces become more interpretable when studied in groups and can be explained with high-order scoring concepts through FALCON. We discuss how extracted concepts can be used to explain and debug failures in downstream tasks. Finally, we present a technique to transfer concepts from one (explainable) representation space to another unseen representation space by learning a simple linear transformation.
    摘要 我们提出了使用对比概念的自动特征解释(FALCON)框架,用于解释图像表示中的特征。对于目标特征,FALCON使用大型图像描述数据集(如LAION-400m)和预训练的视觉语言模型(如CLIP)为其高激活的裁剪图像生成描述。对描述中的每个单词进行打分和排名,得到少数共享、人类可理解的概念,准确地描述目标特征。此外,FALCON还应用了对比解释,使用低激活(counterfactual)图像来消除幻象概念。许多现有方法独立地解释单个特征,但我们观察到,在最先进的自监督和监督模型中,仅有少于20%的表示空间可以由单个特征解释。我们显示,在更大的表示空间中,成组研究的特征更具可解释性,可以通过FALCON中的高阶打分概念来解释。我们还讨论了如何使用提取的概念来解释和调试下游任务中的失败。最后,我们介绍了一种通过学习简单线性变换,将概念从一个可解释的表示空间迁移到另一个未知表示空间的技术。
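A toy sketch of the concept-scoring step as described: words pooled from captions of highly activating crops are scored against both activating and counterfactual image embeddings, and the top contrastively scored words are kept. The `embed_text` stand-in and the simple difference score are assumptions in place of the CLIP-based scoring the paper uses.

```python
import numpy as np

def score_concepts(words, high_embs, low_embs, embed_text):
    """Rank candidate words by similarity to activating crops minus counterfactual crops."""
    scores = {}
    for w in set(words):
        t = embed_text(w)                      # (d,) unit vector from a CLIP-like text encoder
        scores[w] = float(np.mean(high_embs @ t) - np.mean(low_embs @ t))
    return sorted(scores.items(), key=lambda kv: -kv[1])

rng = np.random.default_rng(0)
embed_text = lambda w: (v := rng.normal(size=16)) / np.linalg.norm(v)   # placeholder encoder
high, low = rng.normal(size=(5, 16)), rng.normal(size=(5, 16))
print(score_concepts(["striped", "dog", "grass"], high, low, embed_text)[:2])
```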

Eye Disease Classification Using Deep Learning Techniques

  • paper_url: http://arxiv.org/abs/2307.10501
  • repo_url: https://github.com/akbarreis/Classification-eyes-disease
  • paper_authors: Tareq Babaqi, Manar Jaradat, Ayse Erdem Yildirim, Saif H. Al-Nimer, Daehan Won
  • for: Early detection and diagnosis of eye diseases to prevent vision loss or blindness.
  • methods: Utilized Convolutional Neural Networks (CNN) and transfer learning for multi-class classification.
  • results: Achieved high accuracy of 94%, outperforming traditional CNN at 84%.
    Abstract Eye is the essential sense organ for vision function. Due to the fact that certain eye disorders might result in vision loss, it is essential to diagnose and treat eye diseases early on. By identifying common eye illnesses and performing an eye check, eye care providers can safeguard patients against vision loss or blindness. Convolutional neural networks (CNN) and transfer learning were employed in this study to discriminate between a normal eye and one with diabetic retinopathy, cataract, or glaucoma disease. Using transfer learning for multi-class classification, high accuracy was achieved at 94% while the traditional CNN achieved 84% rate.
    摘要 眼是视觉功能的关键感觉器官。由于某些眼病可能导致视力损失,因此早期诊断和治疗眼病非常重要。通过识别常见眼病并进行眼部检查,眼科医生可以保护患者免于视力损失或失明。本研究使用卷积神经网络(CNN)和迁移学习来区分正常眼与糖尿病视网膜病变、白内障或青光眼。使用迁移学习进行多类分类,达到了94%的高准确率,而传统CNN为84%。
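A generic transfer-learning setup of the kind the abstract describes, with a frozen ImageNet backbone and a new 4-way head for the normal, diabetic retinopathy, cataract, and glaucoma classes. The backbone choice and hyper-parameters are illustrative; `weights=None` is used here only to keep the sketch offline.

```python
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50(weights=None)   # load ImageNet weights in practice
for p in model.parameters():
    p.requires_grad = False                          # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 4)        # normal / DR / cataract / glaucoma

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 4, (8,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
print(float(loss))
```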

Mining Conditional Part Semantics with Occluded Extrapolation for Human-Object Interaction Detection

  • paper_url: http://arxiv.org/abs/2307.10499
  • repo_url: None
  • paper_authors: Guangzhi Wang, Yangyang Guo, Mohan Kankanhalli
  • for: The paper focuses on human-object interaction detection, which is crucial for human-centric scene understanding and has various applications.
  • methods: The proposed method uses a Part Semantic Network (PSN) with a Conditional Part Attention (CPA) mechanism to automatically focus on the most informative human parts conditioned on the involved object, generating more semantically meaningful features for interaction recognition. Additionally, the Occluded Part Extrapolation (OPE) strategy is proposed to facilitate interaction recognition under occluded scenarios.
  • results: The proposed method consistently outperforms prior approaches on the V-COCO and HICO-DET datasets without external data or extra annotations.
    Abstract Human-Object Interaction Detection is a crucial aspect of human-centric scene understanding, with important applications in various domains. Despite recent progress in this field, recognizing subtle and detailed interactions remains challenging. Existing methods try to use human-related clues to alleviate the difficulty, but rely heavily on external annotations or knowledge, limiting their practical applicability in real-world scenarios. In this work, we propose a novel Part Semantic Network (PSN) to solve this problem. The core of PSN is a Conditional Part Attention (CPA) mechanism, where human features are taken as keys and values, and the object feature is used as query for the computation in a cross-attention mechanism. In this way, our model learns to automatically focus on the most informative human parts conditioned on the involved object, generating more semantically meaningful features for interaction recognition. Additionally, we propose an Occluded Part Extrapolation (OPE) strategy to facilitate interaction recognition under occluded scenarios, which teaches the model to extrapolate detailed features from partially occluded ones. Our method consistently outperforms prior approaches on the V-COCO and HICO-DET datasets, without external data or extra annotations. Additional ablation studies validate the effectiveness of each component of our proposed method.
    摘要 人机物交互检测是人本场景理解的关键方面,在多个领域具有重要应用。尽管该领域近期取得了一些进展,但识别细微、细致的交互仍然是挑战。现有方法尝试利用与人相关的线索来缓解这一困难,但严重依赖外部标注或知识,限制了其在真实场景中的实用性。在本工作中,我们提出了一种新的部件语义网络(PSN)来解决这个问题。PSN的核心是条件部件注意力(CPA)机制,其中人体特征作为键和值,物体特征作为查询参与交叉注意力计算。这样,我们的模型能够根据所涉及的物体自动关注最有信息量的人体部件,从而生成语义上更有意义的交互特征。此外,我们还提出了遮挡部件外推(OPE)策略,以便在遮挡场景下进行交互识别,即教会模型从部分被遮挡的特征中外推细节特征。我们的方法在V-COCO和HICO-DET数据集上持续超越先前方法,且无需外部数据或额外标注。额外的消融实验验证了所提方法各个组件的有效性。
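The Conditional Part Attention mechanism, as summarized above, is a cross-attention in which the object feature queries human-part keys and values. The sketch below shows that wiring with illustrative dimensions; it is not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConditionalPartAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, object_feat, part_feats):
        # object_feat: (B, 1, D) query; part_feats: (B, P, D) keys and values
        fused, weights = self.attn(object_feat, part_feats, part_feats)
        return fused.squeeze(1), weights     # object-conditioned feature + attention over parts

cpa = ConditionalPartAttention()
obj, parts = torch.randn(2, 1, 256), torch.randn(2, 6, 256)
feat, w = cpa(obj, parts)
print(feat.shape, w.shape)  # torch.Size([2, 256]) torch.Size([2, 1, 6])
```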

Novel Batch Active Learning Approach and Its Application to Synthetic Aperture Radar Datasets

  • paper_url: http://arxiv.org/abs/2307.10495
  • repo_url: https://github.com/chapman20j/sar_bal
  • paper_authors: James Chapman, Bohan Chen, Zheng Tan, Jeff Calder, Kevin Miller, Andrea L. Bertozzi
  • for: This paper is written for improving the performance of machine learning methods on synthetic aperture radar (SAR) data using active learning techniques.
  • methods: The paper proposes a novel, two-part approach for batch active learning, which includes Dijkstra’s Annulus Core-Set (DAC) for core-set generation and LocalMax for batch sampling.
  • results: The proposed approach achieves nearly identical accuracy as sequential active learning but is more efficient, proportional to the batch size. The paper also demonstrates the effectiveness of the approach on classifying the FUSAR-Ship and OpenSARShip datasets, outperforming state-of-the-art CNN-based methods.
  • for: 这篇论文是为了通过活动学习技术提高机器学习方法对 Synthetic Aperture Radar(SAR)数据的性能。
  • methods: 论文提出了一种新的、两部分的批处理活动学习方法,包括 Dijkstra的 Annulus Core-Set(DAC)和 LocalMax。
  • results: 该方法可以达到与顺序主动学习几乎相同的准确性,但效率更高,提升与批大小成正比。论文还使用迁移学习特征嵌入、图学习、DAC 和 LocalMax 对 FUSAR-Ship 和 OpenSARShip 数据集进行了分类,超越了现有的 CNN 方法。
    Abstract Active learning improves the performance of machine learning methods by judiciously selecting a limited number of unlabeled data points to query for labels, with the aim of maximally improving the underlying classifier's performance. Recent gains have been made using sequential active learning for synthetic aperture radar (SAR) data arXiv:2204.00005. In each iteration, sequential active learning selects a query set of size one while batch active learning selects a query set of multiple datapoints. While batch active learning methods exhibit greater efficiency, the challenge lies in maintaining model accuracy relative to sequential active learning methods. We developed a novel, two-part approach for batch active learning: Dijkstra's Annulus Core-Set (DAC) for core-set generation and LocalMax for batch sampling. The batch active learning process that combines DAC and LocalMax achieves nearly identical accuracy as sequential active learning but is more efficient, proportional to the batch size. As an application, a pipeline is built based on transfer learning feature embedding, graph learning, DAC, and LocalMax to classify the FUSAR-Ship and OpenSARShip datasets. Our pipeline outperforms the state-of-the-art CNN-based methods.
    摘要 主动学习通过审慎地选择有限数量的无标注数据点进行标注查询,以最大程度地提高底层分类器的性能。最近,顺序主动学习在合成孔径雷达(SAR)数据上取得了进展(arXiv:2204.00005)。在每一轮中,顺序主动学习选择大小为一的查询集,而批量主动学习选择包含多个数据点的查询集。虽然批量主动学习方法效率更高,但其挑战在于保持与顺序主动学习相当的模型准确性。我们提出了一种新的、两部分的批量主动学习方法:用于核心集生成的 Dijkstra 环形核心集(DAC)和用于批量采样的 LocalMax。将 DAC 与 LocalMax 结合的批量主动学习过程可以达到与顺序主动学习几乎相同的准确性,而效率与批量大小成正比。作为应用,我们基于迁移学习特征嵌入、图学习、DAC 和 LocalMax 构建了一条管道,用于对 FUSAR-Ship 和 OpenSARShip 数据集进行分类。我们的管道超越了基于卷积神经网络(CNN)的最先进方法。
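A hedged sketch of LocalMax-style batch sampling: unlabeled points whose acquisition value is maximal within their k-NN neighborhood are selected together, giving a spread-out batch in a single pass. The graph construction and tie-breaking here are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_max_batch(features, acquisition, k=8, batch_size=5):
    """Select unlabeled points whose acquisition value is maximal in their k-NN neighborhood."""
    nbrs = NearestNeighbors(n_neighbors=k).fit(features)
    neighbor_idx = nbrs.kneighbors(features, return_distance=False)
    is_local_max = np.array([acquisition[i] >= acquisition[nb].max()
                             for i, nb in enumerate(neighbor_idx)])
    candidates = np.where(is_local_max)[0]
    return candidates[np.argsort(-acquisition[candidates])][:batch_size]

rng = np.random.default_rng(0)
X, acq = rng.normal(size=(200, 16)), rng.random(200)
print(local_max_batch(X, acq))   # indices of one spread-out query batch
```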

Confidence Estimation Using Unlabeled Data

  • paper_url: http://arxiv.org/abs/2307.10440
  • repo_url: https://github.com/topoxlab/consistency-ranking-loss
  • paper_authors: Chen Li, Xiaoling Hu, Chao Chen
  • for: 本研究旨在提出一种 semi-supervised 环境下的信任估计方法,以便更好地在实际应用中使用深度神经网络。
  • methods: 我们提出了一种基于训练过程中的预测一致性的信任估计方法,使用了训练一致性作为代理函数,并提出了一种一致性排名损失函数。
  • results: 在图像分类和分割任务上,我们的方法在信任估计中实现了最先进的性能,并且在下游主动学习任务中表现良好。
    Abstract Overconfidence is a common issue for deep neural networks, limiting their deployment in real-world applications. To better estimate confidence, existing methods mostly focus on fully-supervised scenarios and rely on training labels. In this paper, we propose the first confidence estimation method for a semi-supervised setting, when most training labels are unavailable. We stipulate that even with limited training labels, we can still reasonably approximate the confidence of model on unlabeled samples by inspecting the prediction consistency through the training process. We use training consistency as a surrogate function and propose a consistency ranking loss for confidence estimation. On both image classification and segmentation tasks, our method achieves state-of-the-art performances in confidence estimation. Furthermore, we show the benefit of the proposed method through a downstream active learning task. The code is available at https://github.com/TopoXLab/consistency-ranking-loss
    摘要 过度自信是深度神经网络的常见问题,限制了它们在实际应用中的部署。现有方法主要集中在完全监督的场景下,依赖训练标签来估算自信。在这篇论文中,我们提出了首个在半监督设置下进行自信估算的方法。我们认为,即使训练标签有限,我们仍可以通过训练过程中的预测一致性来合理地估算模型对无标签样本的自信。我们使用训练一致性作为代理函数,并提出了一种一致性排名损失来进行自信估算。在图像分类和分割任务上,我们的方法实现了最先进的性能。此外,我们还通过下游主动学习任务证明了所提方法的优点。代码可以在 https://github.com/TopoXLab/consistency-ranking-loss 上获取。
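One way to realize a consistency-ranking objective in the spirit of the abstract is a pairwise margin ranking loss that pushes predicted confidence to follow the ordering of training consistency. This is a generic formulation for illustration, not the authors' exact loss.

```python
import torch
import torch.nn.functional as F

def consistency_ranking_loss(confidence, consistency, margin=0.0):
    """confidence, consistency: (N,) tensors; penalize pairs whose confidence order disagrees."""
    ci, cj = confidence[:, None], confidence[None, :]
    ti, tj = consistency[:, None], consistency[None, :]
    target = torch.sign(ti - tj)                    # +1 where i should be more confident than j
    pair_loss = F.relu(margin - target * (ci - cj))
    return pair_loss[target != 0].mean()

conf = torch.tensor([0.9, 0.2, 0.6], requires_grad=True)
cons = torch.tensor([0.8, 0.9, 0.3])
loss = consistency_ranking_loss(conf, cons, margin=0.1)
loss.backward()
print(float(loss))
```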

POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities

  • paper_url: http://arxiv.org/abs/2307.10387
  • repo_url: None
  • paper_authors: Rui Wang, Sophokles Ktistakis, Siwei Zhang, Mirko Meboldt, Quentin Lohmeyer
  • for: POV-Surgery is a large-scale, synthetic, egocentric dataset for pose estimation of hands with surgical gloves and three orthopedic surgical instruments.
  • methods: The dataset consists of high-resolution RGB-D video streams with activity annotations, accurate 3D and 2D annotations for hand-object pose, and 2D hand-object segmentation masks.
  • results: Current state-of-the-art methods are fine-tuned on the dataset and shown, through extensive evaluations, to generalize to real-life cases with surgical gloves and tools.
  • for: POV-Surgery 是一个大规模的合成 egocentric 数据集,用于估计佩戴不同手术手套的手部以及三种骨科手术器械的姿态。
  • methods: 数据集包括高分辨率 RGB-D 视频流,含有活动标注、精确的 3D 和 2D 手-物体姿态标注,以及 2D 手-物体分割掩码。
  • results: 当前 SOTA 方法在该数据集上进行了微调,并通过在佩戴手术手套和器械的真实场景中的广泛评估展示了其泛化能力。
    Abstract The surgical usage of Mixed Reality (MR) has received growing attention in areas such as surgical navigation systems, skill assessment, and robot-assisted surgeries. For such applications, pose estimation for hand and surgical instruments from an egocentric perspective is a fundamental task and has been studied extensively in the computer vision field in recent years. However, the development of this field has been impeded by a lack of datasets, especially in the surgical field, where bloody gloves and reflective metallic tools make it hard to obtain 3D pose annotations for hands and objects using conventional methods. To address this issue, we propose POV-Surgery, a large-scale, synthetic, egocentric dataset focusing on pose estimation for hands with different surgical gloves and three orthopedic surgical instruments, namely scalpel, friem, and diskplacer. Our dataset consists of 53 sequences and 88,329 frames, featuring high-resolution RGB-D video streams with activity annotations, accurate 3D and 2D annotations for hand-object pose, and 2D hand-object segmentation masks. We fine-tune the current SOTA methods on POV-Surgery and further show the generalizability when applying to real-life cases with surgical gloves and tools by extensive evaluations. The code and the dataset are publicly available at batfacewayne.github.io/POV_Surgery_io/.
    摘要 混合现实(MR)在外科导航系统、技能评估和机器人辅助手术等领域受到越来越多的关注。在这些应用中,从第一人称视角估计手部和手术器械的姿态是一项基础任务,近年来在计算机视觉领域得到了广泛研究。然而,该领域的发展受限于数据集的缺乏,尤其是在外科领域:带血的手套和反光的金属器械使得难以用传统方法获得手和物体的 3D 姿态标注。为解决这一问题,我们提出了 POV-Surgery,一个大规模、合成的第一人称数据集,关注佩戴不同手术手套的手部以及三种骨科手术器械(scalpel、friem 和 diskplacer)的姿态估计。数据集包含 53 个序列和 88,329 帧,提供高分辨率 RGB-D 视频流、活动标注、精确的手-物体 3D 和 2D 姿态标注以及 2D 手-物体分割掩码。我们在 POV-Surgery 上微调了当前最先进的方法,并通过大量评估展示了其在佩戴手术手套和器械的真实场景中的泛化能力。代码和数据集公开于 batfacewayne.github.io/POV_Surgery_io/。

TokenFlow: Consistent Diffusion Features for Consistent Video Editing

  • paper_url: http://arxiv.org/abs/2307.10373
  • repo_url: https://github.com/omerbt/TokenFlow
  • paper_authors: Michal Geyer, Omer Bar-Tal, Shai Bagon, Tali Dekel
  • for: 这个研究旨在提高视频编辑的质量和用户对生成内容的控制,使用文本到图像扩散模型。
  • methods: 这个方法基于干扰特征的扩散特征空间的一致性来保证生成的视频质量和原始视频的空间布局和动作一致性。
  • results: 这个方法可以在多种实际视频上达到当今最佳的编辑效果,无需训练或调整。 Webpage: https://diffusion-tokenflow.github.io/
    Abstract The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-the-art video models are still lagging behind image models in terms of visual quality and user control over the generated content. In this work, we present a framework that harnesses the power of a text-to-image diffusion model for the task of text-driven video editing. Specifically, given a source video and a target text-prompt, our method generates a high-quality video that adheres to the target text, while preserving the spatial layout and motion of the input video. Our method is based on a key observation that consistency in the edited video can be obtained by enforcing consistency in the diffusion feature space. We achieve this by explicitly propagating diffusion features based on inter-frame correspondences, readily available in the model. Thus, our framework does not require any training or fine-tuning, and can work in conjunction with any off-the-shelf text-to-image editing method. We demonstrate state-of-the-art editing results on a variety of real-world videos. Webpage: https://diffusion-tokenflow.github.io/
    摘要 生成式AI的革命最近已扩展到视频领域。然而,当前最先进的视频模型在视觉质量和用户对生成内容的控制方面仍落后于图像模型。在这项工作中,我们提出了一种利用文本到图像扩散模型进行文本驱动视频编辑的框架。具体来说,给定一个源视频和目标文本提示,我们的方法可以生成符合目标文本的高质量视频,同时保持输入视频的空间布局和运动。我们的方法基于一个关键观察:通过在扩散特征空间中强制一致性,可以获得编辑视频的一致性。我们通过基于帧间对应关系(在模型中即可获得)显式地传播扩散特征来实现这一点。因此,我们的框架不需要任何训练或微调,并可以与任何现成的文本到图像编辑方法结合使用。我们在多种实际视频上展示了最先进的编辑效果。网页:https://diffusion-tokenflow.github.io/
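A toy sketch of the propagation idea: the keyframe is edited, and its edited diffusion features are copied to other frames through nearest-neighbour correspondences computed on the original features, keeping the edit consistent across time. Shapes and the matching rule are simplified assumptions.

```python
import torch

def propagate_features(src_orig, src_edit, tgt_orig):
    """src_orig/src_edit: (N, D) keyframe tokens before/after editing; tgt_orig: (M, D)."""
    sim = tgt_orig @ src_orig.T          # correspondence scores against the original keyframe
    nearest = sim.argmax(dim=1)          # best-matching keyframe token for each target token
    return src_edit[nearest]             # the target frame inherits the edited features

src_orig, src_edit = torch.randn(64, 32), torch.randn(64, 32)
tgt_orig = torch.randn(80, 32)
print(propagate_features(src_orig, src_edit, tgt_orig).shape)  # torch.Size([80, 32])
```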

DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering

  • paper_url: http://arxiv.org/abs/2307.10173
  • repo_url: https://github.com/DNA-Rendering/DNA-Rendering
  • paper_authors: Wei Cheng, Ruixiang Chen, Wanqi Yin, Siming Fan, Keyu Chen, Honglin He, Huiwen Luo, Zhongang Cai, Jingbo Wang, Yang Gao, Zhengming Yu, Zhengyu Lin, Daxuan Ren, Lei Yang, Ziwei Liu, Chen Change Loy, Chen Qian, Wayne Wu, Dahua Lin, Bo Dai, Kwan-Yee Lin
  • for: 这个论文是为了提供一个大规模、高精度的人性化渲染数据集,以便进一步推动计算机视觉和计算机图形学领域的研究。
  • methods: 这个论文使用了一种大规模、高精度的人性化渲染数据集,并提供了丰富的人体特征数据,如人体keypoint、背景掩蔽图、SMPLX模型、衣物/配饰材料、多视图图像和视频等。这些数据资源有助于提高当前 методы的精度在下游渲染任务中。
  • results: 这个论文提供了一个大规模的、全面的人性化渲染数据集,包括1500名人类参与者、5000个运动序列和67.5万帧数据量。此外,论文还提供了一个专业的多视图捕捉系统,以获取高质量的数据资源用于任务训练和评估。
    Abstract Realistic human-centric rendering plays a key role in both computer vision and computer graphics. Rapid progress has been made in the algorithm aspect over the years, yet existing human-centric rendering datasets and benchmarks are rather impoverished in terms of diversity, which are crucial for rendering effect. Researchers are usually constrained to explore and evaluate a small set of rendering problems on current datasets, while real-world applications require methods to be robust across different scenarios. In this work, we present DNA-Rendering, a large-scale, high-fidelity repository of human performance data for neural actor rendering. DNA-Rendering presents several alluring attributes. First, our dataset contains over 1500 human subjects, 5000 motion sequences, and 67.5M frames' data volume. Second, we provide rich assets for each subject -- 2D/3D human body keypoints, foreground masks, SMPLX models, cloth/accessory materials, multi-view images, and videos. These assets boost the current method's accuracy on downstream rendering tasks. Third, we construct a professional multi-view system to capture data, which contains 60 synchronous cameras with max 4096 x 3000 resolution, 15 fps speed, and stern camera calibration steps, ensuring high-quality resources for task training and evaluation. Along with the dataset, we provide a large-scale and quantitative benchmark in full-scale, with multiple tasks to evaluate the existing progress of novel view synthesis, novel pose animation synthesis, and novel identity rendering methods. In this manuscript, we describe our DNA-Rendering effort as a revealing of new observations, challenges, and future directions to human-centric rendering. The dataset, code, and benchmarks will be publicly available at https://dna-rendering.github.io/
    摘要 现代人像渲染在计算机视觉和计算机图形中扮演着关键性角色。随着时间的推移,算法方面的进步很快,但现有的人像渲染数据集和标准却很缺乏多样性,这些多样性是渲染效果的关键因素。研究人员通常只能在当前数据集上进行有限的探索和评估,而实际应用中需要的方法应该能够在不同的场景下展现出Robust性。在这项工作中,我们提出了DNA-Rendering,一个大规模、高精度的人像渲染数据集。DNA-Rendering具有以下吸引人的特点:首先,我们的数据集包含1500名人类素材,5000个动作序列,67.5万帧数据量。其次,我们为每名素材提供了丰富的资源,包括2D/3D人体关键点、背景掩蔽、SMPLX模型、衣物材料、多视图图像和视频。这些资源可以提高当前方法在下游渲染任务上的准确率。第三,我们构建了专业多视图捕捉系统,包括60个同步相机,最高分辨率为4096 x 3000,帧率为15,相机准备过程严格,以保证高质量的资源 для任务训练和评估。同时,我们还提供了全面的大规模量表标准,包括多个任务来评估现有的新视角合成、新姿势动画合成和新人脸渲染方法的进步。在这篇论文中,我们描述了DNA-Rendering的努力,并揭示了新观察、挑战和未来方向,以便人像渲染领域的发展。数据集、代码和标准将在https://dna-rendering.github.io/上公开。

Adversarial Latent Autoencoder with Self-Attention for Structural Image Synthesis

  • paper_url: http://arxiv.org/abs/2307.10166
  • repo_url: None
  • paper_authors: Jiajie Fan, Laure Vuaille, Hao Wang, Thomas Bäck
    for:SA-ALAE is proposed to facilitate industrial engineering processes, particularly in generating feasible design images of complex engineering parts.methods:SA-ALAE employs a novel Self-Attention Adversarial Latent Autoencoder architecture, which combines the strengths of adversarial training and latent space control to generate high-quality design images.results:SA-ALAE is demonstrated to generate engineering blueprints in a real automotive design task, showcasing its potential in efficient industrial design exploration and novel variant generation.
    Abstract Generative Engineering Design approaches driven by Deep Generative Models (DGM) have been proposed to facilitate industrial engineering processes. In such processes, designs often come in the form of images, such as blueprints, engineering drawings, and CAD models depending on the level of detail. DGMs have been successfully employed for synthesis of natural images, e.g., displaying animals, human faces and landscapes. However, industrial design images are fundamentally different from natural scenes in that they contain rich structural patterns and long-range dependencies, which are challenging for convolution-based DGMs to generate. Moreover, DGM-driven generation process is typically triggered based on random noisy inputs, which outputs unpredictable samples and thus cannot perform an efficient industrial design exploration. We tackle these challenges by proposing a novel model Self-Attention Adversarial Latent Autoencoder (SA-ALAE), which allows generating feasible design images of complex engineering parts. With SA-ALAE, users can not only explore novel variants of an existing design, but also control the generation process by operating in latent space. The potential of SA-ALAE is shown by generating engineering blueprints in a real automotive design task.
    摘要 由深度生成模型(DGM)驱动的生成式工程设计方法被提出,以促进工业工程设计过程。在这些过程中,设计通常表示为图像,例如蓝图、工程图纸和CAD模型,具体形式取决于细节程度。DGM已成功应用于自然图像的合成,如动物、人脸和风景。然而,工程设计图像与自然场景有本质不同,它们包含丰富的结构模式和长距离依赖关系,这使得基于卷积的DGM难以生成。此外,DGM驱动的生成过程通常由随机噪声输入触发,输出的样本不可预测,因此无法实现高效的工程设计探索。我们通过提出一种新模型,即Self-Attention Adversarial Latent Autoencoder(SA-ALAE),来解决这些挑战。SA-ALAE可以生成复杂工程部件的可行设计图像。借助SA-ALAE,用户不仅可以探索现有设计的新变体,还可以通过在潜在空间中操作来控制生成过程。在一个真实的汽车设计任务中,SA-ALAE通过生成工程蓝图展示了其潜力。

Improving Multimodal Datasets with Image Captioning

  • paper_url: http://arxiv.org/abs/2307.10350
  • repo_url: None
  • paper_authors: Thao Nguyen, Samir Yitzhak Gadre, Gabriel Ilharco, Sewoong Oh, Ludwig Schmidt
  • for: 增强大型视觉语言模型的成功,需要大量的网络数据集。但是,原始网络数据具有噪音,现有的噪音筛选方法可能会导致数据多样性减少。本文强调caption质量作为噪音的一个主要来源,并研究如何使用生成的caption提高网络抽取的数据点 Utility。
  • methods: 通过不同的混合策略,将原始描述(raw caption)和生成描述混合使用,以提高ImageNet和38个任务的性能。在128M图像文本对的候选池中,我们的最佳方法比DataComp基准中提出的最佳策略在ImageNet上提高2%,在38个任务上平均提高4%。我们的最佳方法还在Flickr和MS-COCO检索中比之前的最佳策略提升2倍。
  • results: 我们的实验表明,使用生成的caption可以提高多模态训练的性能。在不同的图像描述模型中,我们还发现了标准图像描述 benchmarks(如NoCaps CIDEr)中模型的性能不是一个可靠的指标,用于衡量生成的caption的用于多模态训练的效果。最后,我们在DataComp的大规模环境(1.28B图像文本对)中进行了实验,并提供了生成的caption在噪音下的局限性,以及图像淘汰的重要性,随着训练数据量的增加。
    Abstract Massive web datasets play a key role in the success of large vision-language models like CLIP and Flamingo. However, the raw web data is noisy, and existing filtering methods to reduce noise often come at the expense of data diversity. Our work focuses on caption quality as one major source of noise, and studies how generated captions can increase the utility of web-scraped datapoints with nondescript text. Through exploring different mixing strategies for raw and generated captions, we outperform the best filtering method proposed by the DataComp benchmark by 2% on ImageNet and 4% on average across 38 tasks, given a candidate pool of 128M image-text pairs. Our best approach is also 2x better at Flickr and MS-COCO retrieval. We then analyze what makes synthetic captions an effective source of text supervision. In experimenting with different image captioning models, we also demonstrate that the performance of a model on standard image captioning benchmarks (e.g., NoCaps CIDEr) is not a reliable indicator of the utility of the captions it generates for multimodal training. Finally, our experiments with using generated captions at DataComp's large scale (1.28B image-text pairs) offer insights into the limitations of synthetic text, as well as the importance of image curation with increasing training data quantity.
    摘要 庞大的网络数据集对大型视觉语言模型如CLIP和Flamingo的成功起到关键作用。然而,原始网络数据充满噪音,现有的过滤方法通常会导致数据多样性减少。我们的工作将注重caption质量作为噪音的一个主要来源,研究如何使用生成的caption提高网络抓取得分点中的不明文案。通过不同的混合策略来Raw和生成的caption,我们超过了DataComp benchmark中提出的最佳过滤方法,在ImageNet和38个任务中的平均提高2%和4%,给定候选池中的128M图像文本对。我们的最佳方法还在Flickr和MS-COCO检索中提高2倍。然后,我们分析了生成caption的有效性为多模态训练的文本指导。在不同的图像captioning模型中,我们也示出了标准图像captioning benchmark(例如NoCaps CIDEr)中模型的性能不是多模态训练中captions生成的性能的可靠指标。最后,我们在DataComp的大规模环境(1.28B图像文本对)中进行实验,发现生成的caption在大量训练数据量下有限制,同时图像筛选也是非常重要的。

Drone navigation and license place detection for vehicle location in indoor spaces

  • paper_url: http://arxiv.org/abs/2307.10165
  • repo_url: None
  • paper_authors: Moa Arvidsson, Sithichot Sawirot, Cristofer Englund, Fernando Alonso-Fernandez, Martin Torstensson, Boris Duran
  • for: 这个研究旨在创建一个基于纳米无人机的解决方案,可以在停车场中探测车辆的位置。
  • methods: 这个方案使用了墙跟踪算法和一个用于检测车牌号的卷积神经网络(CNN)。所有计算都在机器上实时进行,并将位置和探测到的图像发送到主机。
  • results: 在八个测试案例中,这个解决方案能够成功遍历多列车辆,并实现了实时的位置探测。
    Abstract Millions of vehicles are transported every year, tightly parked in vessels or boats. To reduce the risks of associated safety issues like fires, knowing the location of vehicles is essential, since different vehicles may need different mitigation measures, e.g. electric cars. This work is aimed at creating a solution based on a nano-drone that navigates across rows of parked vehicles and detects their license plates. We do so via a wall-following algorithm, and a CNN trained to detect license plates. All computations are done in real-time on the drone, which just sends position and detected images that allow the creation of a 2D map with the position of the plates. Our solution is capable of reading all plates across eight test cases (with several rows of plates, different drone speeds, or low light) by aggregation of measurements across several drone journeys.
    摘要 每年有数百万辆车辆被运输,紧密地停放在船舶中。为降低火灾等相关安全风险,了解车辆的位置至关重要,因为不同的车辆可能需要不同的处置措施,比如电动车。这项工作旨在创建一种基于纳米无人机的解决方案,该无人机沿着停放车辆的行列飞行并检测车牌。我们使用了墙面跟踪算法和一个训练用于检测车牌的卷积神经网络(CNN)。所有计算都在无人机上实时进行,无人机只需发送位置和检测到的图像,以便构建包含车牌位置的2D地图。通过汇总多次飞行的测量结果,我们的解决方案能够在八个测试案例中(包括多排车牌、不同飞行速度或低光照)读取所有车牌。

FABRIC: Personalizing Diffusion Models with Iterative Feedback

  • paper_url: http://arxiv.org/abs/2307.10159
  • repo_url: https://github.com/sd-fabric/fabric
  • paper_authors: Dimitri von Rütte, Elisabetta Fedele, Jonathan Thomm, Lukas Wolf
  • for: 这个研究旨在探讨diffusion-based文本到图像模型中如何integrate人类反馈,以提高用户体验和输出质量。
  • methods: 该研究提出了一种无需训练的方法,可以应用于多种流行的扩散模型,利用扩散过程中的自注意层来condition diffusion过程,并通过多轮反馈来逐渐改进生成结果。
  • results: 研究表明,通过多轮反馈,生成结果可以得到改进,并且可以适应用户的个性化需求。
    Abstract In an era where visual content generation is increasingly driven by machine learning, the integration of human feedback into generative models presents significant opportunities for enhancing user experience and output quality. This study explores strategies for incorporating iterative human feedback into the generative process of diffusion-based text-to-image models. We propose FABRIC, a training-free approach applicable to a wide range of popular diffusion models, which exploits the self-attention layer present in the most widely used architectures to condition the diffusion process on a set of feedback images. To ensure a rigorous assessment of our approach, we introduce a comprehensive evaluation methodology, offering a robust mechanism to quantify the performance of generative visual models that integrate human feedback. We show that generation results improve over multiple rounds of iterative feedback through exhaustive analysis, implicitly optimizing arbitrary user preferences. The potential applications of these findings extend to fields such as personalized content creation and customization.
    摘要 在机器学习驱动的视觉内容生成领域中,人类反馈的集成到生成模型中具有重要的可能性,以提高用户体验和输出质量。本研究探讨了把反馈图像纳入扩散型文本到图像模型的生成过程中的策略。我们提出了一种无需训练的方法,可以应用于广泛使用的扩散模型,利用最常用的架构中的自注意层来控制扩散过程,并通过一组反馈图像来进行条件。为了有系统地评估我们的方法,我们提出了一种完整的评估方法,可以准确评估生成视觉模型中的人类反馈。我们通过对多轮反馈的分析表明,通过多轮反馈,生成结果会得到改进,并在用户的Preferences中进行隐式优化。这些发现的应用领域包括个性化内容创作和自定义。

Leveraging Visemes for Better Visual Speech Representation and Lip Reading

  • paper_url: http://arxiv.org/abs/2307.10157
  • repo_url: None
  • paper_authors: Javad Peymanfard, Vahid Saeedi, Mohammad Reza Mohammadi, Hossein Zeinali, Nasser Mozayani
  • for: lip reading task, 包括 speech recognition、human-computer interaction 和安全系统等
  • methods: 使用visemes(lip shape groups)提取更有特征和可靠的视频特征,以便进行更高精度的lip reading
  • results: 在word-level和 sentence-level lip reading任务以及Arman-AV数据集上的audiovisual speech recognition任务中,提出的方法都能够超越当前状态的方法,并且实现了9.1%的lip-reading word error rate(WER)下降。
    Abstract Lip reading is a challenging task that has many potential applications in speech recognition, human-computer interaction, and security systems. However, existing lip reading systems often suffer from low accuracy due to the limitations of video features. In this paper, we propose a novel approach that leverages visemes, which are groups of phonetically similar lip shapes, to extract more discriminative and robust video features for lip reading. We evaluate our approach on various tasks, including word-level and sentence-level lip reading, and audiovisual speech recognition using the Arman-AV dataset, a largescale Persian corpus. Our experimental results show that our viseme based approach consistently outperforms the state-of-theart methods in all these tasks. The proposed method reduces the lip-reading word error rate (WER) by 9.1% relative to the best previous method.
    摘要 唇读是一项具有广泛应用前景的任务,可用于语音识别、人机交互和安全系统等。然而,现有的唇读系统常因视频特征的局限而准确率较低。本文提出了一种新方法,利用 viseme(即发音相近的唇形组合)来提取更具判别性和鲁棒性的视频特征用于唇读。我们在多个任务上进行了评估,包括单词级和句子级唇读,以及基于大规模波斯语语料 Arman-AV 数据集的视听语音识别。实验结果表明,我们基于 viseme 的方法在所有这些任务中均优于现有最佳方法,可将唇读的词错误率(WER)相对降低 9.1%。
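The viseme idea can be illustrated with a small phoneme-to-viseme table: sounds that look identical on the lips collapse to one class, giving fewer but more visually separable targets. The grouping below is a common English example, not the Persian mapping used in the paper.

```python
# Phonetically distinct units that share a lip shape map to the same viseme class.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "k": "velar", "g": "velar",
}

def to_viseme_sequence(phonemes):
    return [PHONEME_TO_VISEME.get(p, "other") for p in phonemes]

print(to_viseme_sequence(["p", "a", "b", "f"]))  # ['bilabial', 'other', 'bilabial', 'labiodental']
```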

An Improved NeuMIP with Better Accuracy

  • paper_url: http://arxiv.org/abs/2307.10135
  • repo_url: None
  • paper_authors: Bowen Xue, Shuang Zhao, Henrik Wann Jensen, Zahra Montazeri
  • for: 增强 neural reflectance 模型的精度和细节表现,特别是对高光散射材料的处理。
  • methods: 受 NeRF 启发,将输入数据编码到频率空间,并在多个阶段使用基于梯度的损失函数来提高网络性能。
  • results: 通过多种synthetic和实际示例,证明了方法的有效性和精度,特别是对高光散射材料的处理。
    Abstract Neural reflectance models are capable of accurately reproducing the spatially-varying appearance of many real-world materials at different scales. However, existing methods have difficulties handling highly glossy materials. To address this problem, we introduce a new neural reflectance model which, compared with existing methods, better preserves not only specular highlights but also fine-grained details. To this end, we enhance the neural network performance by encoding input data to frequency space, inspired by NeRF, to better preserve the details. Furthermore, we introduce a gradient-based loss and employ it in multiple stages, adaptive to the progress of the learning phase. Lastly, we utilize an optional extension to the decoder network using the Inception module for more accurate yet costly performance. We demonstrate the effectiveness of our method using a variety of synthetic and real examples.
    摘要 神经反射模型可以准确地复现多种真实世界材料随空间变化的外观。然而,现有方法在处理高光泽材料时存在困难。为解决这个问题,我们提出了一种新的神经反射模型,与现有方法相比,不仅能更好地保留高光,还能保留细粒度细节。为此,受 NeRF 启发,我们将输入数据编码到频率空间,以更好地保留细节。此外,我们引入了基于梯度的损失函数,并在多个阶段根据学习进度自适应地使用。最后,我们可选地使用带 Inception 模块的解码器网络扩展,以获得更高精度但计算开销更大的性能。我们通过多种合成和真实示例展示了方法的有效性。
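A minimal sketch of the NeRF-style frequency encoding the abstract refers to: each input coordinate is mapped to sines and cosines at increasing frequencies so the network can represent fine detail. Band count and input ranges are illustrative.

```python
import torch

def frequency_encode(x, num_bands=6):
    """x: (..., D) inputs, e.g. UV coordinates in [0, 1]; returns (..., D * 2 * num_bands)."""
    freqs = (2.0 ** torch.arange(num_bands)) * torch.pi     # pi, 2*pi, 4*pi, ...
    angles = x[..., None] * freqs                            # (..., D, num_bands)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)    # (..., D, 2 * num_bands)
    return enc.flatten(start_dim=-2)

uv = torch.rand(4, 2)                 # e.g. texture coordinates
print(frequency_encode(uv).shape)     # torch.Size([4, 24])
```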

General vs. Long-Tailed Age Estimation: An Approach to Kill Two Birds with One Stone

  • paper_url: http://arxiv.org/abs/2307.10129
  • repo_url: None
  • paper_authors: Zenghao Bao, Zichang Tan, Jun Li, Jun Wan, Xibo Ma, Zhen Lei
  • for: 这个研究的目的是提出一个简单、有效、柔软的训练方法,以提高面部年龄估计的性能,同时能够优化头篇和尾篇类别的性能。
  • methods: 这个方法基于一个简单的内部组合函数,并运用了一个新的训练方法,名为GLAE。GLAE的训练方法包括一个内部组合函数和一个杜尼卡广泛函数。
  • results: 这个研究的结果显示,GLAE在Morph II上的MAE和CMAE分别为1.14年和1.27年,与之前最好的方法相比,MAE下降了34%,并且MAE接近1年。此外,GLAE在其他年龄评估数据集上也具有优秀的性能。
    Abstract Facial age estimation has received a lot of attention for its diverse application scenarios. Most existing studies treat each sample equally and aim to reduce the average estimation error for the entire dataset, which can be summarized as General Age Estimation. However, due to the long-tailed distribution prevalent in the dataset, treating all samples equally will inevitably bias the model toward the head classes (usually the adult with a majority of samples). Driven by this, some works suggest that each class should be treated equally to improve performance in tail classes (with a minority of samples), which can be summarized as Long-tailed Age Estimation. However, Long-tailed Age Estimation usually faces a performance trade-off, i.e., achieving improvement in tail classes by sacrificing the head classes. In this paper, our goal is to design a unified framework to perform well on both tasks, killing two birds with one stone. To this end, we propose a simple, effective, and flexible training paradigm named GLAE, which is two-fold. Our GLAE provides a surprising improvement on Morph II, reaching the lowest MAE and CMAE of 1.14 and 1.27 years, respectively. Compared to the previous best method, MAE dropped by up to 34%, which is an unprecedented improvement, and for the first time, MAE is close to 1 year old. Extensive experiments on other age benchmark datasets, including CACD, MIVIA, and Chalearn LAP 2015, also indicate that GLAE outperforms the state-of-the-art approaches significantly.

Two Approaches to Supervised Image Segmentation

  • paper_url: http://arxiv.org/abs/2307.10123
  • repo_url: https://github.com/USTCPCS/CVPR2018_attention
  • paper_authors: Alexandre Benatti, Luciano da F. Costa
  • for: 图像分割 (image segmentation)
  • methods: 深度学习 (deep learning) 和多集 neurons 方法 (multiset neurons method)
  • results: 更高的准确率 (higher accuracy) with less computational resources using the multiset neurons method compared to deep learning.
    Abstract Though performed almost effortlessly by humans, segmenting 2D gray-scale or color images into respective regions of interest (e.g.~background, objects, or portions of objects) constitutes one of the greatest challenges in science and technology as a consequence of several effects including dimensionality reduction(3D to 2D), noise, reflections, shades, and occlusions, among many other possibilities. While a large number of interesting related approaches have been suggested along the last decades, it was mainly thanks to the recent development of deep learning that more effective and general solutions have been obtained, currently constituting the basic comparison reference for this type of operation. Also developed recently, a multiset-based methodology has been described that is capable of encouraging image segmentation performance combining spatial accuracy, stability, and robustness while requiring little computational resources (hardware and/or training and recognition time). The interesting features of the multiset neurons methodology mostly follow from the enhanced selectivity and sensitivity, as well as good robustness to data perturbations and outliers, allowed by the coincidence similarity index on which the multiset approach to supervised image segmentation is founded. After describing the deep learning and multiset neurons approaches, the present work develops comparison experiments between them which are primarily aimed at illustrating their respective main interesting features when applied to the adopted specific type of data and parameter configurations. While the deep learning approach confirmed its potential for performing image segmentation, the alternative multiset methodology allowed for enhanced accuracy while requiring little computational resources.
    摘要 尽管人类几乎可以毫不费力地将二维灰度或彩色图像分割为各个感兴趣的区域(例如背景、物体或物体的部分),但由于维度约简(3D到2D)、噪声、反射、阴影和遮挡等多种因素,这仍是科学和技术中的一大挑战。过去几十年中提出了许多有趣的相关方法,但主要得益于近期深度学习的发展,才获得了更有效、更通用的解决方案,目前已成为这类操作的基本比较基准。此外,最近提出的一种基于多重集(multiset)的方法能够在兼顾空间准确性、稳定性和鲁棒性的同时提升图像分割性能,且所需计算资源(硬件及训练和识别时间)很少。多重集神经元方法的优点主要来自其所基于的重合相似度指数带来的更强选择性和敏感度,以及对数据扰动和离群点的良好鲁棒性。在介绍深度学习和多重集神经元方法之后,本文对二者进行了比较实验,以展示它们在所采用的特定数据类型和参数配置下各自的主要特点。深度学习方法证实了其在图像分割方面的潜力,而多重集方法则在所需计算资源很少的情况下取得了更高的准确率。

Boundary-Refined Prototype Generation: A General End-to-End Paradigm for Semi-Supervised Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2307.10097
  • repo_url: https://github.com/djh-dzxw/BRPG
  • paper_authors: Junhao Dong, Zhu Meng, Delong Liu, Zhicheng Zhao, Fei Su
  • for: 本研究主要针对 semi-supervised semantic segmentation 领域,提出了一种基于 Prototype-based Classification 的新方法,以提高现有方法的性能。
  • methods: 本方法使用了一种名为 boundary-refined prototype generation (BRPG) 的新方法,通过将高信任特征和低信任特征分别归一类,生成更加接近类划的原型。此外,还提出了一种 adaptive prototype optimization 策略,以适应分布在不同类划上的特征分布。
  • results: 在 PASCAL VOC 2012 和 Cityscapes datasets 上进行了广泛的实验,显示了该方法的优越性和扩展性,比对当前状态的方法更高效。
    Abstract Prototype-based classification is a classical method in machine learning, and recently it has achieved remarkable success in semi-supervised semantic segmentation. However, the current approach isolates the prototype initialization process from the main training framework, which appears to be unnecessary. Furthermore, while the direct use of K-Means algorithm for prototype generation has considered rich intra-class variance, it may not be the optimal solution for the classification task. To tackle these problems, we propose a novel boundary-refined prototype generation (BRPG) method, which is incorporated into the whole training framework. Specifically, our approach samples and clusters high- and low-confidence features separately based on a confidence threshold, aiming to generate prototypes closer to the class boundaries. Moreover, an adaptive prototype optimization strategy is introduced to make prototype augmentation for categories with scattered feature distributions. Extensive experiments on the PASCAL VOC 2012 and Cityscapes datasets demonstrate the superiority and scalability of the proposed method, outperforming the current state-of-the-art approaches. The code is available at xxxxxxxxxxxxxx.
    摘要 基于原型的分类是一种经典的机器学习方法,最近在半监督语义分割中取得了显著成果。然而,现有方法将原型初始化过程与主训练框架分离,这似乎并无必要。此外,直接使用K-Means算法生成原型虽然考虑了丰富的类内差异,但可能并不是分类任务的最佳解决方案。为了解决这些问题,我们提出了一种新的边界精化原型生成(BRPG)方法,并将其集成到整个训练框架中。具体来说,我们根据置信度阈值将高置信度和低置信度特征分别采样和聚类,以生成更接近类边界的原型。此外,我们还引入了一种自适应原型优化策略,针对特征分布分散的类别进行原型扩充。我们在PASCAL VOC 2012和Cityscapes数据集上进行了广泛的实验,证明了所提方法的优越性和可扩展性,超过了当前最先进的方法。代码可以在xxxxxxxxxxxxx上找到。
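A hedged sketch of the prototype-generation step as described: features are split by a confidence threshold, the high- and low-confidence groups are clustered separately, and the low-confidence centres act as prototypes closer to the class boundary. The clustering choices are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def boundary_refined_prototypes(features, confidence, thr=0.8, k=4):
    """Cluster high- and low-confidence features separately and pool the cluster centres."""
    groups = [features[confidence >= thr], features[confidence < thr]]
    protos = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(g).cluster_centers_
              for g in groups if len(g) >= k]
    return np.concatenate(protos, axis=0)

rng = np.random.default_rng(0)
feats, conf = rng.normal(size=(500, 32)), rng.random(500)
print(boundary_refined_prototypes(feats, conf).shape)  # (8, 32)
```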

Make-A-Volume: Leveraging Latent Diffusion Models for Cross-Modality 3D Brain MRI Synthesis

  • paper_url: http://arxiv.org/abs/2307.10094
  • repo_url: None
  • paper_authors: Lingting Zhu, Zeyue Xue, Zhenchao Jin, Xian Liu, Jingzhen He, Ziwei Liu, Lequan Yu
    for:这 paper 的目的是提出一种新的方法 для医学影像合成,使得可以更好地处理不同模态的医学影像。methods:这 paper 使用了 diffusion-based 框架,并使用了 latent diffusion 模型来学习 slice-wise 映射。results:这 paper 的实验结果表明,其方法可以更好地Synthesize 3D 医学影像,并且可以保持volumetric consistency。
    Abstract Cross-modality medical image synthesis is a critical topic and has the potential to facilitate numerous applications in the medical imaging field. Despite recent successes in deep-learning-based generative models, most current medical image synthesis methods rely on generative adversarial networks and suffer from notorious mode collapse and unstable training. Moreover, the 2D backbone-driven approaches would easily result in volumetric inconsistency, while 3D backbones are challenging and impractical due to the tremendous memory cost and training difficulty. In this paper, we introduce a new paradigm for volumetric medical data synthesis by leveraging 2D backbones and present a diffusion-based framework, Make-A-Volume, for cross-modality 3D medical image synthesis. To learn the cross-modality slice-wise mapping, we employ a latent diffusion model and learn a low-dimensional latent space, resulting in high computational efficiency. To enable the 3D image synthesis and mitigate volumetric inconsistency, we further insert a series of volumetric layers in the 2D slice-mapping model and fine-tune them with paired 3D data. This paradigm extends the 2D image diffusion model to a volumetric version with a slightly increasing number of parameters and computation, offering a principled solution for generic cross-modality 3D medical image synthesis. We showcase the effectiveness of our Make-A-Volume framework on an in-house SWI-MRA brain MRI dataset and a public T1-T2 brain MRI dataset. Experimental results demonstrate that our framework achieves superior synthesis results with volumetric consistency.
    摘要 跨模态医学影像合成是一个关键课题,在医学影像领域具有广阔的应用前景。尽管基于深度学习的生成模型近来取得了成功,但目前大多数医学影像合成方法依赖生成对抗网络,存在众所周知的模式坍塌和训练不稳定问题。此外,基于2D骨干的方法容易导致体积不一致,而3D骨干由于巨大的显存开销和训练难度而难以实用。在这篇论文中,我们提出了一种新的思路,利用2D骨干来实现三维医学数据合成,并提出了一个基于扩散模型的框架,称之为Make-A-Volume。为了学习跨模态的逐层(slice-wise)映射,我们采用了潜在扩散模型,学习一个低维度的潜在空间,从而实现高效计算。为了实现3D图像合成并缓解体积不一致,我们进一步在2D逐层映射模型中插入了一系列体积层,并使用成对的3D数据对其进行微调。这种思路将2D图像扩散模型扩展到三维版本,仅略微增加参数量和计算量,为通用的跨模态3D医学影像合成提供了一个有原则的解决方案。我们在一个自有的SWI-MRA脑MRI数据集和一个公共的T1-T2脑MRI数据集上展示了Make-A-Volume框架的效果,实验结果表明我们的框架能够在保持体积一致性的同时取得更优的合成结果。

cs.AI - 2023-07-20

Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

  • paper_url: http://arxiv.org/abs/2307.11128
  • repo_url: None
  • paper_authors: Vasileios Leon, Muhammad Abdullah Hanif, Giorgos Armeniakos, Xun Jiao, Muhammad Shafique, Kiamal Pekmestzi, Dimitrios Soudris
  • for: 本文旨在探讨近似计算的应用和技术,以提高计算系统的能效性和性能。
  • methods: 本文使用了多种应用特定和体系结构的近似技术,包括系统下的硬件和软件近似技术。
  • results: 本文对近似计算的应用谱和技术进行了全面的检讨和分析,并提出了未来研究的挑战和方向。
    Abstract The challenging deployment of compute-intensive applications from domains such Artificial Intelligence (AI) and Digital Signal Processing (DSP), forces the community of computing systems to explore new design approaches. Approximate Computing appears as an emerging solution, allowing to tune the quality of results in the design of a system in order to improve the energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors/accelerators & systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions.
    摘要 来自人工智能(AI)和数字信号处理(DSP)等领域的计算密集型应用的部署颇具挑战,迫使计算系统界探索新的设计方法。近似计算作为一种新兴解决方案出现,允许在系统设计中调节结果质量,以提高能效性和/或性能。这一根本性的范式转变在学术界和业界都引起了广泛关注,并在不同的设计层(从系统到集成电路)催生了大量关于近似技术和方法学的研究。鉴于近似计算在过去十年间的广泛吸引力,我们进行了一项分为两部分的综述,涵盖关键方面(如术语和应用),并回顾传统计算栈各层的最新近似技术。在综述的第二部分中,我们分类并详细介绍了面向特定应用和体系结构层面的近似技术,二者均针对资源高效的处理器/加速器及系统的设计。此外,我们还对近似计算的应用谱进行了详细分析,并讨论了开放的挑战和未来方向。

PE-YOLO: Pyramid Enhancement Network for Dark Object Detection

  • paper_url: http://arxiv.org/abs/2307.10953
  • repo_url: https://github.com/xiangchenyin/pe-yolo
  • paper_authors: Xiangchen Yin, Zhenda Yu, Zetao Fei, Wenjun Lv, Xin Gao
  • for: 提高黑暗环境下物体检测的精度和效率
  • methods: 使用 Laplacian Pyramid decomposes 图像,并提出了细节处理模块 (DPM) 和低频增强筛 (LEF) 来增强图像细节和低频semantics,并采用端到端结合训练方法和normal detection loss 进行训练。
  • results: 在 ExDark 数据集上进行测试,PE-YOLO 在黑暗环境下物体检测中达到了 78.0% 的 mAP 和 53.6 的 FPS,较其他黑暗检测器和低光照增强模型更高。
    Abstract Current object detection models have achieved good results on many benchmark datasets, detecting objects in dark conditions remains a large challenge. To address this issue, we propose a pyramid enhanced network (PENet) and joint it with YOLOv3 to build a dark object detection framework named PE-YOLO. Firstly, PENet decomposes the image into four components of different resolutions using the Laplacian pyramid. Specifically we propose a detail processing module (DPM) to enhance the detail of images, which consists of context branch and edge branch. In addition, we propose a low-frequency enhancement filter (LEF) to capture low-frequency semantics and prevent high-frequency noise. PE-YOLO adopts an end-to-end joint training approach and only uses normal detection loss to simplify the training process. We conduct experiments on the low-light object detection dataset ExDark to demonstrate the effectiveness of ours. The results indicate that compared with other dark detectors and low-light enhancement models, PE-YOLO achieves the advanced results, achieving 78.0% in mAP and 53.6 in FPS, respectively, which can adapt to object detection under different low-light conditions. The code is available at https://github.com/XiangchenYin/PE-YOLO.
    摘要 当前的目标检测模型在许多标准数据集上已经取得了良好的结果,但在黑暗环境中检测目标仍然是一大挑战。为解决这个问题,我们提出了金字塔增强网络(PENet),并与YOLOv3结合,构建了一个用于黑暗目标检测的框架,称为PE-YOLO。首先,PENet使用拉普拉斯金字塔将图像分解为四个不同分辨率的分量。我们提出了一个细节处理模块(DPM),用于增强图像的细节,该模块包括上下文分支和边缘分支。此外,我们还提出了一个低频增强滤波器(LEF),用于捕捉低频语义并抑制高频噪声。PE-YOLO采用端到端联合训练方式,并只使用常规检测损失以简化训练过程。我们在低光照目标检测数据集 ExDark 上进行了实验,结果表明,与其他黑暗检测器和低光照增强模型相比,PE-YOLO 取得了领先的结果,分别达到 78.0% mAP 和 53.6 FPS,能够适应不同低光照条件下的目标检测。代码可以在 GitHub 上找到:https://github.com/XiangchenYin/PE-YOLO。
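The Laplacian-pyramid decomposition PE-YOLO applies before its DPM and LEF modules can be sketched in a few lines with OpenCV; the number of levels and the image size below are examples only.

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels=4):
    gaussian = [img.astype(np.float32)]
    for _ in range(levels - 1):
        gaussian.append(cv2.pyrDown(gaussian[-1]))
    pyramid = []
    for i in range(levels - 1):
        up = cv2.pyrUp(gaussian[i + 1], dstsize=gaussian[i].shape[1::-1])
        pyramid.append(gaussian[i] - up)      # band-pass detail at this scale
    pyramid.append(gaussian[-1])              # low-frequency residual
    return pyramid

img = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
print([p.shape for p in laplacian_pyramid(img)])
```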

Proxy Anchor-based Unsupervised Learning for Continuous Generalized Category Discovery

  • paper_url: http://arxiv.org/abs/2307.10943
  • repo_url: None
  • paper_authors: Hyungmin Kim, Sungho Suh, Daehwan Kim, Daun Jeong, Hansang Cho, Junmo Kim
  • for: 提出了一种无监督的类增长学习方法,用于在不具备先知知识的情况下,精确地找到 novel 类别。
  • methods: 方法基于 feature extractor 和 proxy anchors,首先在已知集上精度地调整 feature extractor,然后将样本分为 old 和 novel 类别,并在无监督集上进行 clustering。同时,使用 proxy anchors-based exemplar 来避免 catastrophic forgetting。
  • results: 实验结果表明,提出的方法在细化 datasets 上下适用于实际情况,并且超过了当前状态的方法。
    Abstract Recent advances in deep learning have significantly improved the performance of various computer vision applications. However, discovering novel categories in an incremental learning scenario remains a challenging problem due to the lack of prior knowledge about the number and nature of new categories. Existing methods for novel category discovery are limited by their reliance on labeled datasets and prior knowledge about the number of novel categories and the proportion of novel samples in the batch. To address the limitations and more accurately reflect real-world scenarios, in this paper, we propose a novel unsupervised class incremental learning approach for discovering novel categories on unlabeled sets without prior knowledge. The proposed method fine-tunes the feature extractor and proxy anchors on labeled sets, then splits samples into old and novel categories and clusters on the unlabeled dataset. Furthermore, the proxy anchors-based exemplar generates representative category vectors to mitigate catastrophic forgetting. Experimental results demonstrate that our proposed approach outperforms the state-of-the-art methods on fine-grained datasets under real-world scenarios.

PASTA: Pretrained Action-State Transformer Agents

  • paper_url: http://arxiv.org/abs/2307.10936
  • repo_url: None
  • paper_authors: Raphael Boige, Yannis Flet-Berliac, Arthur Flajolet, Guillaume Richard, Thomas Pierrot
  • for: 这篇论文旨在研究基于自监督学习的强化学习预训练方法，并提出一种基于 transformer 的模型 PASTA，用于解决多种下游任务。
  • methods: 这篇论文使用了一种统一的方法论，包括采用下一个 token 预测等基本的预训练目标，并在多个领域同时训练模型。
  • results: 这个论文的研究显示,使用PASTA模型可以在多种下游任务上达到优秀的性能,并且可以在不同的领域中进行参数效率的微调。
    Abstract Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pre-training transformer models on vast amounts of unlabeled data, serving as a starting point for efficiently solving downstream tasks. In the realm of reinforcement learning, researchers have recently adapted these approaches by developing models pre-trained on expert trajectories, enabling them to address a wide range of tasks, from robotics to recommendation systems. However, existing methods mostly rely on intricate pre-training objectives tailored to specific downstream applications. This paper presents a comprehensive investigation of models we refer to as Pretrained Action-State Transformer Agents (PASTA). Our study uses a unified methodology and covers an extensive set of general downstream tasks including behavioral cloning, offline RL, sensor failure robustness, and dynamics change adaptation. Our goal is to systematically compare various design choices and provide valuable insights to practitioners for building robust models. Key highlights of our study include tokenization at the action and state component level, using fundamental pre-training objectives like next token prediction, training models across diverse domains simultaneously, and using parameter efficient fine-tuning (PEFT). The developed models in our study contain fewer than 10 million parameters and the application of PEFT enables fine-tuning of fewer than 10,000 parameters during downstream adaptation, allowing a broad community to use these models and reproduce our experiments. We hope that this study will encourage further research into the use of transformers with first-principles design choices to represent RL trajectories and contribute to robust policy learning.
    摘要 自监督学习在自然语言处理、视觉和生物学等多个计算领域引发了一场革命性的范式转变。近期的方法通常在大量无标注数据上预训练 Transformer 模型，作为高效解决下游任务的起点。在强化学习领域，研究人员也采用了这类方法，开发了在专家轨迹上预训练的模型，用以解决从机器人到推荐系统等广泛的任务。然而，现有方法大多依赖为特定下游应用量身定制的复杂预训练目标。本文对我们称为 Pretrained Action-State Transformer Agents（PASTA）的模型进行了全面研究。我们采用统一的方法论，涵盖行为克隆、离线强化学习、传感器失效鲁棒性和动力学变化适应等广泛的通用下游任务，目标是系统地比较各种设计选择，为构建鲁棒模型的实践者提供有价值的见解。研究的关键亮点包括：在动作和状态分量级别进行 token 化、使用下一个 token 预测等基本预训练目标、在多个领域同时训练模型，以及使用参数高效微调（PEFT）。我们研究中的模型参数量不足 1000 万，并且借助 PEFT，下游适配时仅需微调不到 1 万个参数，使广泛的社区能够使用这些模型并复现我们的实验。我们希望这项研究能够鼓励更多关于用第一性原理的设计选择、以 Transformer 表示强化学习轨迹的研究，并为鲁棒策略学习做出贡献。
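
下面是一个极简的 numpy 示意，说明摘要中“在动作和状态分量级别进行 token 化并做下一个 token 预测”的大致思路（分箱数、词表划分与交错顺序均为本文之外的假设，并非论文的实际 tokenizer）。

```python
import numpy as np

def tokenize_components(traj_states, traj_actions, num_bins=64, low=-1.0, high=1.0):
    """把每个状态/动作分量离散化为 token（分量级 tokenization 示意）。
    traj_states: (T, Ds)；traj_actions: (T, Da)；假设取值已归一化到 [low, high]。"""
    def to_tokens(x, offset):
        bins = np.clip(((x - low) / (high - low) * num_bins).astype(int), 0, num_bins - 1)
        return bins + offset                      # 状态与动作 token 使用不同的词表区间
    state_tok = to_tokens(traj_states, offset=0)
    action_tok = to_tokens(traj_actions, offset=num_bins)
    # 按时间步交错拼接: [s_0 各分量..., a_0 各分量..., s_1 各分量..., ...]
    seq = np.concatenate([np.concatenate([s, a]) for s, a in zip(state_tok, action_tok)])
    inputs, targets = seq[:-1], seq[1:]           # 下一 token 预测的输入/目标
    return inputs, targets

if __name__ == "__main__":
    T, Ds, Da = 5, 3, 2
    s = np.random.uniform(-1, 1, (T, Ds)); a = np.random.uniform(-1, 1, (T, Da))
    x, y = tokenize_components(s, a)
    print(x.shape, y.shape)                       # (24,) (24,)，序列共 T*(Ds+Da)=25 个 token
```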

Identical and Fraternal Twins: Fine-Grained Semantic Contrastive Learning of Sentence Representations

  • paper_url: http://arxiv.org/abs/2307.10932
  • repo_url: None
  • paper_authors: Qingfa Xiao, Shuangyin Li, Lei Chen
  • for: 提高不监督学习的句子表示学习效果
  • methods: 提出一种新的同类双胞胎学习框架(IFTCL),能够同时适应不同的增强技术生成的正例对
  • results: IFTCL在九个语义文本相似性任务中表现出色,比前方法更高效和有效。
    Abstract The enhancement of unsupervised learning of sentence representations has been significantly achieved by the utility of contrastive learning. This approach clusters the augmented positive instance with the anchor instance to create a desired embedding space. However, relying solely on the contrastive objective can result in sub-optimal outcomes due to its inability to differentiate subtle semantic variations between positive pairs. Specifically, common data augmentation techniques frequently introduce semantic distortion, leading to a semantic margin between the positive pair. While the InfoNCE loss function overlooks the semantic margin and prioritizes similarity maximization between positive pairs during training, leading to the insensitive semantic comprehension ability of the trained model. In this paper, we introduce a novel Identical and Fraternal Twins of Contrastive Learning (named IFTCL) framework, capable of simultaneously adapting to various positive pairs generated by different augmentation techniques. We propose a \textit{Twins Loss} to preserve the innate margin during training and promote the potential of data enhancement in order to overcome the sub-optimal issue. We also present proof-of-concept experiments combined with the contrastive objective to prove the validity of the proposed Twins Loss. Furthermore, we propose a hippocampus queue mechanism to restore and reuse the negative instances without additional calculation, which further enhances the efficiency and performance of the IFCL. We verify the IFCL framework on nine semantic textual similarity tasks with both English and Chinese datasets, and the experimental results show that IFCL outperforms state-of-the-art methods.
    摘要 借助对比学习，无监督句子表示学习取得了显著进展。这类方法将增强得到的正例与锚点样本聚到一起，以构建理想的嵌入空间。然而，仅依赖对比目标无法区分正例对之间细微的语义差异，容易导致次优的结果。具体而言，常用的数据增强技术往往会引入语义失真，使正例对之间产生语义间隔；而 InfoNCE 损失忽略了这一语义间隔，在训练中一味最大化正例对之间的相似度，导致训练出的模型语义理解能力不敏感。在本文中，我们提出了一个名为 Identical and Fraternal Twins of Contrastive Learning（IFTCL）的新框架，能够同时适应由不同增强技术生成的多种正例对。我们提出了一种 Twins Loss，在训练中保留固有的语义间隔，并充分发挥数据增强的潜力，以克服次优问题；我们还给出了结合对比目标的概念验证实验来证明 Twins Loss 的有效性。此外，我们提出了一种 hippocampus queue 机制，可在不增加额外计算的情况下恢复并重用负例，进一步提升效率与性能。我们在九个语义文本相似性任务（包含英文与中文数据集）上验证了 IFTCL 框架，实验结果表明其优于当前最先进的方法。
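
作为背景，下面给出句子表示对比学习中标准 InfoNCE 损失的一个 PyTorch 示意（论文提出的 Twins Loss 是在此基础上保留语义间隔的扩展，此处不做复现；温度等取值仅为示例）。

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.05):
    """标准 InfoNCE 损失示意：同一批内其他样本充当负例。
    anchor, positive: (B, D)，由两种增强得到的句子表示（正例对）。"""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature              # (B, B) 余弦相似度 / 温度
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)        # 对角线位置为正例
```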

MediaGPT : A Large Language Model For Chinese Media

  • paper_url: http://arxiv.org/abs/2307.10930
  • repo_url: None
  • paper_authors: Zhonghao Wang, Zijia Lu, Bo Jin, Haiying Deng
  • for: 这个论文主要针对中文媒体领域的语言模型(LLMs)进行研究,旨在开发一个专门为中文媒体领域设计的语言模型(MediaGPT)。
  • methods: 这篇论文使用了多种任务类型和域定义提示类型,并在这些任务和提示类型的基础上训练了MediaGPT模型。
  • results: 根据专家评估和强模型评估,这篇论文证明了MediaGPT模型在多种中文媒体领域任务上表现出色,并证明了域数据和域定义提示类型对于建立有效的域特定LLM的重要性。
    Abstract Large language models (LLMs) have shown remarkable capabilities in generating high-quality text and making predictions based on large amounts of data, including the media domain. However, in practical applications, the differences between the media's use cases and the general-purpose applications of LLMs have become increasingly apparent, especially Chinese. This paper examines the unique characteristics of media-domain-specific LLMs compared to general LLMs, designed a diverse set of task instruction types to cater the specific requirements of the domain and constructed unique datasets that are tailored to the media domain. Based on these, we proposed MediaGPT, a domain-specific LLM for the Chinese media domain, training by domain-specific data and experts SFT data. By performing human experts evaluation and strong model evaluation on a validation set, this paper demonstrated that MediaGPT outperforms mainstream models on various Chinese media domain tasks and verifies the importance of domain data and domain-defined prompt types for building an effective domain-specific LLM.
    摘要 大型语言模型（LLM）已经展示了基于大量数据生成高质量文本和进行预测的出色能力，媒体领域也不例外。然而，在实际应用中，媒体场景与 LLM 的通用应用场景之间的差异日益明显，中文尤其如此。本文研究了媒体领域专用 LLM 相对于通用 LLM 的独特之处，设计了一组满足该领域特定需求的多样化任务指令类型，并构建了面向媒体领域的专属数据集。在此基础上，我们提出了面向中文媒体领域的领域专用模型 MediaGPT，使用领域数据和专家 SFT 数据进行训练。通过在验证集上进行人类专家评估和强模型评估，本文证明了 MediaGPT 在多种中文媒体领域任务上优于主流模型，并验证了领域数据和领域定义的提示类型对构建有效的领域专用 LLM 的重要性。

FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets

  • paper_url: http://arxiv.org/abs/2307.10928
  • repo_url: https://github.com/kaistai/flask
  • paper_authors: Seonghyeon Ye, Doyoung Kim, Sungdong Kim, Hyeonbin Hwang, Seungone Kim, Yongrae Jo, James Thorne, Juho Kim, Minjoon Seo
  • for: 本研究旨在提供一种细化评估语言模型(LLMs)的方法,以更好地评估LLMs的表现,特别是在多个技能的组合下。
  • methods: 本研究使用的方法包括定义12种细粒度技能，并为每个用户指令实例标注目标领域和难度水平、分配相应的技能。此外，研究还使用了基于模型和基于人类的评估方法。
  • results: 研究发现，使用FLASK评估协议可以更好地评估LLMs的表现，并且可以分析模型在特定技能、领域和难度水平上的表现。此外，研究还发现基于模型的评估与基于人类的评估结果高度相关。
    Abstract Evaluation of Large Language Models (LLMs) is challenging because aligning to human values requires the composition of multiple skills and the required set of skills varies depending on the instruction. Recent studies have evaluated the performance of LLMs in two ways, (1) automatic evaluation on several independent benchmarks and (2) human or machined-based evaluation giving an overall score to the response. However, both settings are coarse-grained evaluations, not considering the nature of user instructions that require instance-wise skill composition, which limits the interpretation of the true capabilities of LLMs. In this paper, we introduce FLASK (Fine-grained Language Model Evaluation based on Alignment SKill Sets), a fine-grained evaluation protocol that can be used for both model-based and human-based evaluation which decomposes coarse-level scoring to an instance-wise skill set-level. Specifically, we define 12 fine-grained skills needed for LLMs to follow open-ended user instructions and construct an evaluation set by allocating a set of skills for each instance. Additionally, by annotating the target domains and difficulty level for each instance, FLASK provides a holistic view with a comprehensive analysis of a model's performance depending on skill, domain, and difficulty. Through using FLASK, we compare multiple open-sourced and proprietary LLMs and observe highly-correlated findings between model-based and human-based evaluations. FLASK enables developers to more accurately measure the model performance and how it can be improved by analyzing factors that make LLMs proficient in particular skills. For practitioners, FLASK can be used to recommend suitable models for particular situations through comprehensive comparison among various LLMs. We release the evaluation data and code implementation at https://github.com/kaistAI/FLASK.
    摘要 评估大型自然语言模型（LLM）具有挑战性，因为与人类价值观对齐需要多种技能的组合，并且所需的技能集合会随指令而变化。现有研究通过两种方法评估LLM的性能：一是在多个独立的标准基准上进行自动评估，二是由人或模型给回复打出整体分数。然而，这两种设置都是粗粒度的评估，没有考虑用户指令需要实例级技能组合的特性，这限制了对LLM真实能力的解读。在这篇论文中，我们提出了FLASK（基于对齐技能集的细粒度语言模型评估）协议，可用于基于模型和基于人类的评估，它将粗粒度打分分解到实例级的技能集层面。具体来说，我们定义了LLM遵循开放式用户指令所需的12种细粒度技能，并通过为每个实例分配一组技能来构建评估集。此外，我们还对每个实例标注了目标领域和难度水平，从而能够按技能、领域和难度对模型性能进行全面分析。通过使用FLASK，我们对多种开源和商业LLM进行了比较，并观察到基于模型的评估与基于人类的评估结果高度相关。FLASK可以帮助开发者更准确地评估模型性能，并通过分析使LLM在特定技能上表现突出的因素来加以改进。对于实践者，FLASK可以通过对多种LLM进行全面比较，为特定场景推荐合适的模型。我们在 https://github.com/kaistAI/FLASK 上发布了评估数据和代码实现。
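
下面用一个简短的 Python 片段示意 FLASK 式的实例级技能标注及其聚合方式（其中字段名与取值均为假设，仅用来说明把粗粒度打分分解到技能层面的形式，并非官方数据格式）。

```python
# 一个 FLASK 风格的评测实例示意（字段名与取值均为假设）
flask_instance = {
    "instruction": "解释为什么夏季北半球白昼更长，并给出一个简单的几何论证。",
    "domain": "natural_science",          # 目标领域标注
    "difficulty": 3,                      # 难度等级（例如 1-5）
    "skills": ["logical_reasoning", "factual_knowledge", "conciseness"],
    "reference_answer": "...",
}

def skill_level_scores(per_instance_scores):
    """把逐实例、逐技能的打分聚合成技能级平均分（粗粒度总分之外的细粒度视角）。"""
    agg = {}
    for inst in per_instance_scores:                  # 形如 [{"skills": {技能: 分数}}, ...]
        for skill, score in inst["skills"].items():
            agg.setdefault(skill, []).append(score)
    return {s: sum(v) / len(v) for s, v in agg.items()}

if __name__ == "__main__":
    scores = [{"skills": {"logical_reasoning": 4, "conciseness": 3}},
              {"skills": {"logical_reasoning": 5, "factual_knowledge": 2}}]
    print(skill_level_scores(scores))
```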

Syntactic vs Semantic Linear Abstraction and Refinement of Neural Networks

  • paper_url: http://arxiv.org/abs/2307.10891
  • repo_url: https://github.com/cxlvinchau/linna
  • paper_authors: Calvin Chau, Jan Křetínský, Stefanie Mohr
  • for: 提高神经网络的可扩展性
  • methods: 使用抽象技术进行神经网络的减少
  • results: 提供了一个更灵活的抽象框架，并通过实验证明了它的效果
    Abstract Abstraction is a key verification technique to improve scalability. However, its use for neural networks is so far extremely limited. Previous approaches for abstracting classification networks replace several neurons with one of them that is similar enough. We can classify the similarity as defined either syntactically (using quantities on the connections between neurons) or semantically (on the activation values of neurons for various inputs). Unfortunately, the previous approaches only achieve moderate reductions, when implemented at all. In this work, we provide a more flexible framework where a neuron can be replaced with a linear combination of other neurons, improving the reduction. We apply this approach both on syntactic and semantic abstractions, and implement and evaluate them experimentally. Further, we introduce a refinement method for our abstractions, allowing for finding a better balance between reduction and precision.
    摘要 抽象是一种提升可扩展性的关键验证技术。然而，它在神经网络上的应用目前仍非常有限。先前对分类网络进行抽象的方法，是把若干足够相似的神经元替换为其中一个；这种相似性既可以按语法定义（基于神经元之间连接上的数值），也可以按语义定义（基于神经元在各种输入下的激活值）。遗憾的是，这些方法即便得到实现，也只能带来有限的压缩。在本工作中，我们提供了一个更灵活的框架：允许用其他神经元的线性组合来替换某个神经元，从而提高压缩率。我们把这种方法同时应用于语法抽象和语义抽象，并对其进行了实现与实验评估。此外，我们还为所得抽象引入了一种细化（refinement）方法，以便在压缩率与精度之间取得更好的平衡。
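
下面给出一个基于 numpy 的极简示意，说明摘要中“用其他神经元的线性组合替换某个神经元”的语义抽象思路（最小二乘拟合与权重折叠方式为常见做法，并非论文的实现细节）。

```python
import numpy as np

def replace_neuron_with_combination(acts, i, keep_idx):
    """语义抽象的极简示意：用保留神经元激活的线性组合近似被删除的神经元 i。
    acts: (N, H)，某一隐藏层在 N 个样本输入上的激活值。
    返回系数 c，使 acts[:, i] ≈ acts[:, keep_idx] @ c。"""
    A = acts[:, keep_idx]                        # (N, K)
    b = acts[:, i]                               # (N,)
    c, *_ = np.linalg.lstsq(A, b, rcond=None)    # 最小二乘拟合组合系数
    return c

def fold_into_next_layer(W_next, i, keep_idx, c):
    """把被删神经元对下一层的贡献折叠进保留神经元的权重中（网络随之变小）。"""
    W = np.delete(W_next.copy(), i, axis=1)      # 删除第 i 列
    contrib = np.outer(W_next[:, i], c)          # (out, K) 被删列的贡献
    new_keep = [k if k < i else k - 1 for k in keep_idx]   # 删除后保留列的新位置
    W[:, new_keep] += contrib
    return W

if __name__ == "__main__":
    acts = np.random.randn(200, 8)
    keep = [0, 1, 2, 4, 5, 6, 7]                 # 假设删除第 3 个神经元
    c = replace_neuron_with_combination(acts, 3, keep)
    W2 = fold_into_next_layer(np.random.randn(4, 8), 3, keep, c)
    print(c.shape, W2.shape)                     # (7,) (4, 7)
```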

Divide & Bind Your Attention for Improved Generative Semantic Nursing

  • paper_url: http://arxiv.org/abs/2307.10864
  • repo_url: None
  • paper_authors: Yumeng Li, Margret Keuper, Dan Zhang, Anna Khoreva
  • for: 提高文本到图像生成模型的可靠性和精度,特别是处理复杂的提示语时。
  • methods: 提出了两个新的损失函数:novel attendance loss和binding loss,用于提高生成的对象的特征匹配和精度。
  • results: 在多个评估标准上显示出优于现有方法的性能,能够准确地生成desired对象并改进特征匹配。
    Abstract Emerging large-scale text-to-image generative models, e.g., Stable Diffusion (SD), have exhibited overwhelming results with high fidelity. Despite the magnificent progress, current state-of-the-art models still struggle to generate images fully adhering to the input prompt. Prior work, Attend & Excite, has introduced the concept of Generative Semantic Nursing (GSN), aiming to optimize cross-attention during inference time to better incorporate the semantics. It demonstrates promising results in generating simple prompts, e.g., ``a cat and a dog''. However, its efficacy declines when dealing with more complex prompts, and it does not explicitly address the problem of improper attribute binding. To address the challenges posed by complex prompts or scenarios involving multiple entities and to achieve improved attribute binding, we propose Divide & Bind. We introduce two novel loss objectives for GSN: a novel attendance loss and a binding loss. Our approach stands out in its ability to faithfully synthesize desired objects with improved attribute alignment from complex prompts and exhibits superior performance across multiple evaluation benchmarks. More videos and updates can be found on the project page \url{https://sites.google.com/view/divide-and-bind}.
    摘要 新兴的大规模文本到图像生成模型，例如 Stable Diffusion（SD），已经展现出高保真度的惊人效果。尽管进展显著，当前最先进的模型在生成完全符合输入提示的图像方面仍有困难。先前的工作 Attend & Excite 提出了生成式语义护理（Generative Semantic Nursing，GSN）的概念，旨在推理阶段优化交叉注意力，以更好地融入语义信息。它在生成诸如 "a cat and a dog" 这样的简单提示时表现出不错的效果；然而在处理更复杂的提示时，其效果会下降，并且没有显式解决属性绑定不当的问题。为了应对复杂提示或涉及多个实体的场景带来的挑战，并实现更好的属性绑定，我们提出了 Divide & Bind。我们为 GSN 引入了两个新的损失目标：一个新颖的出席（attendance）损失和一个绑定（binding）损失。我们的方法能够从复杂提示中忠实地合成所需对象并改进属性对齐，并在多个评估基准上表现出更优的性能。更多视频与更新可见项目页面：https://sites.google.com/view/divide-and-bind。

Goal-Conditioned Reinforcement Learning with Disentanglement-based Reachability Planning

  • paper_url: http://arxiv.org/abs/2307.10846
  • repo_url: None
  • paper_authors: Zhifeng Qian, Mingyu You, Hongjun Zhou, Xuanhui Xu, Bin He
  • for: 解决高维状态空间中的远程目标问题,提高目标Conditioned Reinforcement Learning(GCRL)的效率和性能。
  • methods: 提出了一种将目标条件强化学习（GCRL）与基于解耦的可达性规划（REPlan）相结合的算法，包括一个解耦表示模块（DRM）和一个时序可达性判别模块（REM），以解决高维状态空间中的远程目标问题。
  • results: 在三个基于视觉的仿真任务和一个真实世界任务中，与先前的方法相比，我们的 REPlan 显示出明显的性能优势。
    Abstract Goal-Conditioned Reinforcement Learning (GCRL) can enable agents to spontaneously set diverse goals to learn a set of skills. Despite the excellent works proposed in various fields, reaching distant goals in temporally extended tasks remains a challenge for GCRL. Current works tackled this problem by leveraging planning algorithms to plan intermediate subgoals to augment GCRL. Their methods need two crucial requirements: (i) a state representation space to search valid subgoals, and (ii) a distance function to measure the reachability of subgoals. However, they struggle to scale to high-dimensional state space due to their non-compact representations. Moreover, they cannot collect high-quality training data through standard GC policies, which results in an inaccurate distance function. Both affect the efficiency and performance of planning and policy learning. In the paper, we propose a goal-conditioned RL algorithm combined with Disentanglement-based Reachability Planning (REPlan) to solve temporally extended tasks. In REPlan, a Disentangled Representation Module (DRM) is proposed to learn compact representations which disentangle robot poses and object positions from high-dimensional observations in a self-supervised manner. A simple REachability discrimination Module (REM) is also designed to determine the temporal distance of subgoals. Moreover, REM computes intrinsic bonuses to encourage the collection of novel states for training. We evaluate our REPlan in three vision-based simulation tasks and one real-world task. The experiments demonstrate that our REPlan significantly outperforms the prior state-of-the-art methods in solving temporally extended tasks.
    摘要 目标条件强化学习（GCRL）可以让智能体自发设定多种目标，从而学习一组技能。然而，在时间跨度较长的任务中到达遥远的目标，对 GCRL 来说仍是一个挑战。现有工作通过规划算法来规划中间子目标，以增强 GCRL。这些方法需要两个关键条件：（i）一个用于搜索有效子目标的状态表示空间，以及（ii）一个衡量子目标可达性的距离函数。然而，由于表示不够紧凑，它们难以扩展到高维状态空间；而且它们无法通过标准的目标条件策略收集高质量的训练数据，导致距离函数不准确。这两点都会影响规划和策略学习的效率与性能。在本文中，我们提出一种目标条件强化学习算法，并与基于解耦的可达性规划（REPlan）相结合，以解决时间扩展任务。在 REPlan 中，我们提出了一个解耦表示模块（DRM），以自监督方式从高维观测中学习紧凑表示，并把机器人位姿与物体位置解耦开来。我们还设计了一个简单的可达性判别模块（REM），用于确定子目标的时间距离。此外，REM 还计算内在奖励，以鼓励收集新颖状态用于训练。我们在三个基于视觉的仿真任务和一个真实世界任务中评估了 REPlan。实验结果表明，我们的 REPlan 在解决时间扩展任务方面显著优于先前的最先进方法。

Modifications of the Miller definition of contrastive (counterfactual) explanations

  • paper_url: http://arxiv.org/abs/2307.10832
  • repo_url: None
  • paper_authors: Kevin McAreavey, Weiru Liu
  • for: 本研究旨在探讨基于 Halpern-Pearl（HP）因果与（非对比）解释定义的对比（反事实）解释，以及 Miller 在原始 HP 定义基础上给出的定义。
  • methods: 本研究使用了修改后的 HP 定义和 Borner 定义，并在此基础上提出了两种改进后的定义变体。
  • results: 本研究显示，Miller 定义继承了原始 HP 定义的问题，而我们提出的两种改进后的定义可以解决这些问题，同时保留 Miller 定义的精神。
    Abstract Miller recently proposed a definition of contrastive (counterfactual) explanations based on the well-known Halpern-Pearl (HP) definitions of causes and (non-contrastive) explanations. Crucially, the Miller definition was based on the original HP definition of explanations, but this has since been modified by Halpern; presumably because the original yields counterintuitive results in many standard examples. More recently Borner has proposed a third definition, observing that this modified HP definition may also yield counterintuitive results. In this paper we show that the Miller definition inherits issues found in the original HP definition. We address these issues by proposing two improved variants based on the more robust modified HP and Borner definitions. We analyse our new definitions and show that they retain the spirit of the Miller definition where all three variants satisfy an alternative unified definition that is modular with respect to an underlying definition of non-contrastive explanations. To the best of our knowledge this paper also provides the first explicit comparison between the original and modified HP definitions.
    摘要 Miller 最近基于著名的 Halpern-Pearl（HP）因果与（非对比）解释定义，提出了一种对比（反事实）解释的定义。关键在于，Miller 的定义基于原始的 HP 解释定义，而该定义后来已被 Halpern 修改，大概是因为原始定义在许多标准例子中会产生违反直觉的结果。更近期，Borner 提出了第三种定义，并指出修改后的 HP 定义同样可能产生违反直觉的结果。在这篇论文中，我们证明 Miller 的定义继承了原始 HP 定义中的问题。我们通过提出两个分别基于更稳健的修改版 HP 定义和 Borner 定义的改进变体来解决这些问题。我们分析了新的定义，并证明它们保留了 Miller 定义的精神：三个变体都满足一个替代的统一定义，该定义相对于底层的非对比解释定义是模块化的。据我们所知，本文还首次对原始 HP 定义和修改后的 HP 定义进行了明确比较。

What, Indeed, is an Achievable Provable Guarantee for Learning-Enabled Safety Critical Systems

  • paper_url: http://arxiv.org/abs/2307.11784
  • repo_url: None
  • paper_authors: Saddek Bensalem, Chih-Hong Cheng, Wei Huang, Xiaowei Huang, Changshun Wu, Xingyu Zhao
  • for: 本研究旨在解决机器学习技术在安全关键领域中能够可靠地应用的挑战。
  • methods: 本研究提出了一种两步验证方法，以实现可证明的统计保证。
  • results: 研究发现现有的方法无法实际实现可证明的保证，因此提出了一种新的验证方法以实现这一目标。
    Abstract Machine learning has made remarkable advancements, but confidently utilising learning-enabled components in safety-critical domains still poses challenges. Among the challenges, it is known that a rigorous, yet practical, way of achieving safety guarantees is one of the most prominent. In this paper, we first discuss the engineering and research challenges associated with the design and verification of such systems. Then, based on the observation that existing works cannot actually achieve provable guarantees, we promote a two-step verification method for the ultimate achievement of provable statistical guarantees.
    摘要 机器学习技术已经取得了很大的进步，但在安全关键领域使用学习启用的组件仍然存在挑战。其中一个最大的挑战是实现安全保证的方法。在这篇论文中，我们首先讨论了在设计和验证这些系统方面的工程和研究挑战。然后，根据现有的工作不能实现可证明的保证的观察，我们提出了一种两步验证方法以实现可证明的统计保证。

On Combining Expert Demonstrations in Imitation Learning via Optimal Transport

  • paper_url: http://arxiv.org/abs/2307.10810
  • repo_url: https://github.com/ilanasebag/Sliced-MMOT-Imitation-Learning
  • paper_authors: Ilana Sebag, Samuel Cohen, Marc Peter Deisenroth
  • for: 本研究旨在解决多个专家示范的组合问题，以便智能代理可以吸取多个专家的知识并学习多种不同的状态轨迹。
  • methods: 本研究使用多边际最优运输（multi-marginal optimal transport）距离来组合多个专家示范，在最优运输意义下得到更合理的示范几何平均。
  • results: 研究表明，该方法可以更好地组合多个且多样的专家示范；在 OpenAI Gym 控制环境中的效率分析显示，简单拼接轨迹的标准方法并非总是最优。
    Abstract Imitation learning (IL) seeks to teach agents specific tasks through expert demonstrations. One of the key approaches to IL is to define a distance between agent and expert and to find an agent policy that minimizes that distance. Optimal transport methods have been widely used in imitation learning as they provide ways to measure meaningful distances between agent and expert trajectories. However, the problem of how to optimally combine multiple expert demonstrations has not been widely studied. The standard method is to simply concatenate state (-action) trajectories, which is problematic when trajectories are multi-modal. We propose an alternative method that uses a multi-marginal optimal transport distance and enables the combination of multiple and diverse state-trajectories in the OT sense, providing a more sensible geometric average of the demonstrations. Our approach enables an agent to learn from several experts, and its efficiency is analyzed on OpenAI Gym control environments and demonstrates that the standard method is not always optimal.
    摘要 模仿学习（IL）的目标是通过专家示范来教会智能体完成特定任务。IL 的一种关键思路是定义智能体与专家之间的距离，并寻找使该距离最小化的智能体策略。最优运输方法在模仿学习中被广泛使用，因为它们提供了衡量智能体与专家轨迹之间有意义距离的手段。然而，如何最优地组合多个专家示范的问题尚未得到广泛研究。标准做法是直接拼接状态（-动作）轨迹，但当轨迹呈多模态时会出现问题。我们提出了另一种方法：使用多边际最优运输距离，在最优运输意义下组合多个且多样的状态轨迹，从而得到更合理的示范几何平均。我们的方法使智能体能够从多位专家学习；我们在 OpenAI Gym 控制环境中分析了其效率，结果表明标准方法并非总是最优。
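
作为参考，下面给出一个基于 numpy 的切片 Wasserstein 距离示意，用来度量智能体轨迹与单个专家轨迹之间的差异（论文处理的是多位专家的多边际最优运输组合，此处仅示意更简单的两分布情形；等长轨迹与投影数量均为本示例的假设）。

```python
import numpy as np

def sliced_wasserstein(agent_states, expert_states, num_projections=50, seed=0):
    """切片 Wasserstein 距离示意：比较智能体与专家状态轨迹的经验分布。
    agent_states, expert_states: (T, D)，此处假设两条轨迹长度相同。"""
    rng = np.random.default_rng(seed)
    d = agent_states.shape[1]
    dirs = rng.normal(size=(num_projections, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # 随机单位方向
    total = 0.0
    for u in dirs:
        a = np.sort(agent_states @ u)                     # 一维投影后排序
        e = np.sort(expert_states @ u)
        total += np.mean(np.abs(a - e))                   # 一维 W1 距离
    return total / num_projections

if __name__ == "__main__":
    agent = np.random.randn(100, 4)
    expert = np.random.randn(100, 4) + 0.5
    print(sliced_wasserstein(agent, expert))
```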

Communication-Efficient Split Learning via Adaptive Feature-Wise Compression

  • paper_url: http://arxiv.org/abs/2307.10805
  • repo_url: None
  • paper_authors: Yongjeong Oh, Jaeho Lee, Christopher G. Brinton, Yo-Seb Jeon
  • for: 提高分割学习（SL）中的通信效率，降低训练过程中传输中间特征向量和梯度向量所产生的通信开销。
  • methods: 提出了一种新的通信高效 SL 框架 SplitFC，利用矩阵各列呈现的不同离散程度来压缩通信。该框架包含两种压缩策略：一是基于标准差的自适应按特征丢弃，并依据链式法则同时丢弃与被丢弃特征相对应的梯度向量；二是基于取值范围的自适应按特征量化，其最优量化级数以闭式表达式给出。
  • results: 在 MNIST、CIFAR-10 和 CelebA 数据集上的实验表明，与最先进的 SL 框架相比，SplitFC 的分类精度提升超过 5.6%，同时与不压缩的原始 SL 框架相比，通信开销减少 320 倍。
    Abstract This paper proposes a novel communication-efficient split learning (SL) framework, named SplitFC, which reduces the communication overhead required for transmitting intermediate feature and gradient vectors during the SL training process. The key idea of SplitFC is to leverage different dispersion degrees exhibited in the columns of the matrices. SplitFC incorporates two compression strategies: (i) adaptive feature-wise dropout and (ii) adaptive feature-wise quantization. In the first strategy, the intermediate feature vectors are dropped with adaptive dropout probabilities determined based on the standard deviation of these vectors. Then, by the chain rule, the intermediate gradient vectors associated with the dropped feature vectors are also dropped. In the second strategy, the non-dropped intermediate feature and gradient vectors are quantized using adaptive quantization levels determined based on the ranges of the vectors. To minimize the quantization error, the optimal quantization levels of this strategy are derived in a closed-form expression. Simulation results on the MNIST, CIFAR-10, and CelebA datasets demonstrate that SplitFC provides more than a 5.6% increase in classification accuracy compared to state-of-the-art SL frameworks, while they require 320 times less communication overhead compared to the vanilla SL framework without compression.
    摘要 本文提出了一种新颖的通信高效分割学习（SL）框架 SplitFC，用于降低 SL 训练过程中传输中间特征向量和梯度向量所需的通信开销。SplitFC 的核心思想是利用矩阵各列所呈现的不同离散程度，并包含两种压缩策略：（i）自适应的按特征丢弃：依据中间特征向量的标准差确定自适应丢弃概率，并根据链式法则，同时丢弃与被丢弃特征向量相关联的中间梯度向量；（ii）自适应的按特征量化：未被丢弃的中间特征向量和梯度向量依据其取值范围确定自适应的量化级数，并为使量化误差最小化，以闭式表达式推导出最优量化级数。在 MNIST、CIFAR-10 和 CelebA 数据集上的仿真结果表明，与最先进的 SL 框架相比，SplitFC 的分类准确率提升超过 5.6%，同时相比不采用压缩的原始 SL 框架，通信开销降低至其 1/320。
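
下面用 numpy 给出这两种压缩策略的极简示意（丢弃概率的映射方式、位宽与阈值均为示例假设；论文中的最优量化级数另有闭式解，此处不做复现）。

```python
import numpy as np

def adaptive_feature_dropout(features, base_keep=0.5, rng=None):
    """按列标准差自适应地丢弃中间特征（示意；std 到保留概率的映射为假设）。
    features: (B, F)，标准差越小的特征列被丢弃的概率越高。"""
    rng = rng or np.random.default_rng(0)
    std = features.std(axis=0)
    keep_prob = base_keep + (1 - base_keep) * std / (std.max() + 1e-8)
    mask = rng.random(features.shape[1]) < keep_prob
    return features * mask, mask          # 被丢弃列对应的梯度列同样丢弃

def adaptive_quantize(x, bits_small=4, bits_large=8, range_threshold=1.0):
    """按取值范围自适应选择量化级数的示意（阈值与位宽均为示例假设）。"""
    lo, hi = x.min(), x.max()
    bits = bits_large if (hi - lo) > range_threshold else bits_small
    levels = 2 ** bits
    q = np.round((x - lo) / (hi - lo + 1e-8) * (levels - 1))
    return q * (hi - lo) / (levels - 1) + lo, bits

if __name__ == "__main__":
    feats = np.random.randn(32, 16)
    kept, mask = adaptive_feature_dropout(feats)
    deq, used_bits = adaptive_quantize(kept[:, mask])
    print(mask.sum(), used_bits)
```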

Meta-Transformer: A Unified Framework for Multimodal Learning

  • paper_url: http://arxiv.org/abs/2307.10802
  • repo_url: https://github.com/invictus717/MetaTransformer
  • paper_authors: Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue
  • for: 本研究旨在提出一种能够处理多种模态数据的框架，且不需要任何配对的多模态训练数据。
  • methods: 该框架基于一个冻结的编码器实现多模态感知：来自不同模态的原始输入数据被映射到共享的 token 空间中，再由参数冻结的编码器提取输入数据的高层语义特征。
  • results: 实验表明，Meta-Transformer 可以处理各种任务，包括基础感知（文本、图像、点云、音频、视频）、实际应用（X 射线、红外、高光谱、IMU）以及数据挖掘（图、表格、时间序列）。
    Abstract Multimodal learning aims to build models that can process and relate information from multiple modalities. Despite years of development in this field, it still remains challenging to design a unified network for processing various modalities ($\textit{e.g.}$ natural language, 2D images, 3D point clouds, audio, video, time series, tabular data) due to the inherent gaps among them. In this work, we propose a framework, named Meta-Transformer, that leverages a $\textbf{frozen}$ encoder to perform multimodal perception without any paired multimodal training data. In Meta-Transformer, the raw input data from various modalities are mapped into a shared token space, allowing a subsequent encoder with frozen parameters to extract high-level semantic features of the input data. Composed of three main components: a unified data tokenizer, a modality-shared encoder, and task-specific heads for downstream tasks, Meta-Transformer is the first framework to perform unified learning across 12 modalities with unpaired data. Experiments on different benchmarks reveal that Meta-Transformer can handle a wide range of tasks including fundamental perception (text, image, point cloud, audio, video), practical application (X-Ray, infrared, hyperspectral, and IMU), and data mining (graph, tabular, and time-series). Meta-Transformer indicates a promising future for developing unified multimodal intelligence with transformers. Code will be available at https://github.com/invictus717/MetaTransformer
    摘要 多模态学习的目标是建立能够处理并关联多种模态信息的模型。尽管该领域已经发展多年，但由于各模态之间存在固有差距，设计一个能处理多种模态（如自然语言、2D图像、3D点云、音频、视频、时间序列、表格数据）的统一网络仍然十分困难。在这项工作中，我们提出了一个名为Meta-Transformer的框架，它利用一个参数冻结的编码器来实现多模态感知，无需任何配对的多模态训练数据。在Meta-Transformer中，来自不同模态的原始输入数据被映射到共享的token空间中，随后由参数冻结的编码器提取输入数据的高层语义特征。Meta-Transformer框架由三个主要部分组成：统一的数据tokenizer、模态共享的编码器，以及面向下游任务的任务特定头，它是首个能够在12种模态上用非配对数据进行统一学习的框架。在不同基准上的实验表明，Meta-Transformer可以处理广泛的任务，包括基础感知（文本、图像、点云、音频、视频）、实际应用（X射线、红外、高光谱和IMU）和数据挖掘（图、表格和时间序列）。Meta-Transformer预示着用Transformer发展统一多模态智能的美好前景。代码将在 GitHub 上提供。
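
下面给出一个 PyTorch 骨架示意，展示“各模态 tokenizer → 冻结的共享编码器 → 任务特定头”的整体结构（其中的维度、模态集合与网络规模均为假设，并非官方实现）。

```python
import torch
import torch.nn as nn

class MetaStyleModel(nn.Module):
    """Meta-Transformer 思路的骨架示意：各模态 tokenizer -> 共享冻结编码器 -> 任务头。"""
    def __init__(self, dim=256, num_classes=10):
        super().__init__()
        self.tokenizers = nn.ModuleDict({
            "image": nn.Linear(768, dim),        # 假设图像 patch 特征为 768 维
            "text":  nn.Embedding(30000, dim),   # 假设词表大小 30000
            "audio": nn.Linear(128, dim),        # 假设音频帧特征为 128 维
        })
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        for p in self.encoder.parameters():
            p.requires_grad = False              # 冻结共享编码器
        self.head = nn.Linear(dim, num_classes)  # 任务特定头（可训练）

    def forward(self, x, modality):
        tokens = self.tokenizers[modality](x)       # 映射到共享 token 空间
        feats = self.encoder(tokens).mean(dim=1)    # 提取高层语义特征
        return self.head(feats)

if __name__ == "__main__":
    model = MetaStyleModel()
    img_tokens = torch.randn(2, 16, 768)            # (batch, tokens, feat)
    print(model(img_tokens, "image").shape)         # torch.Size([2, 10])
```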

Optimizing PatchCore for Few/many-shot Anomaly Detection

  • paper_url: http://arxiv.org/abs/2307.10792
  • repo_url: https://github.com/scortexio/patchcore-few-shot
  • paper_authors: João Santos, Triet Tran, Oliver Rippel
  • for: 本研究探讨了使用少量样本进行异常检测（AD）的新趋势，并评估了现有的全样本（full-shot）AD 算法在少样本情况下的性能。
  • methods: 本研究使用了 PatchCore 算法，即当前最先进的全样本 AD／异常分割（AS）算法。研究人员对 PatchCore 的多种超参数进行了优化，并将已知可提升少样本监督学习的技术迁移到 AD 领域。
  • results: 实验结果表明，对 PatchCore 的超参数优化可以带来显著的性能提升，而图像级数据增强可能但不一定提升性能。基于这些发现，研究人员在 VisA 数据集上取得了少样本 AD 的新最高水平。此外，研究人员还指出，探索具有强归纳偏置的特征提取器是（少样本）AD/AS 的一个潜在研究方向。
    Abstract Few-shot anomaly detection (AD) is an emerging sub-field of general AD, and tries to distinguish between normal and anomalous data using only few selected samples. While newly proposed few-shot AD methods do compare against pre-existing algorithms developed for the full-shot domain as baselines, they do not dedicatedly optimize them for the few-shot setting. It thus remains unclear if the performance of such pre-existing algorithms can be further improved. We address said question in this work. Specifically, we present a study on the AD/anomaly segmentation (AS) performance of PatchCore, the current state-of-the-art full-shot AD/AS algorithm, in both the few-shot and the many-shot settings. We hypothesize that further performance improvements can be realized by (I) optimizing its various hyperparameters, and by (II) transferring techniques known to improve few-shot supervised learning to the AD domain. Exhaustive experiments on the public VisA and MVTec AD datasets reveal that (I) significant performance improvements can be realized by optimizing hyperparameters such as the underlying feature extractor, and that (II) image-level augmentations can, but are not guaranteed, to improve performance. Based on these findings, we achieve a new state of the art in few-shot AD on VisA, further demonstrating the merit of adapting pre-existing AD/AS methods to the few-shot setting. Last, we identify the investigation of feature extractors with a strong inductive bias as a potential future research direction for (few-shot) AD/AS.
    摘要 少样本异常检测（AD）是通用 AD 的一个新兴子领域，旨在仅利用少量选定样本来区分正常与异常数据。新近提出的少样本 AD 方法虽然会把为全样本（full-shot）场景开发的已有算法作为基线进行比较，却没有专门针对少样本设置对这些算法进行优化，因此这类已有算法的性能能否进一步提升仍不清楚。本工作回答了这一问题。具体而言，我们研究了当前最先进的全样本 AD／异常分割（AS）算法 PatchCore 在少样本与多样本设置下的 AD/AS 性能。我们假设可以通过（I）优化其各种超参数，以及（II）把已知可提升少样本监督学习的技术迁移到 AD 领域，来进一步提升性能。在公开的 VisA 和 MVTec AD 数据集上进行的大量实验表明：（I）通过优化诸如底层特征提取器等超参数可以显著提升性能；（II）图像级数据增强可能但不一定提升性能。基于这些发现，我们在 VisA 上取得了少样本 AD 的新最高水平，进一步证明了把已有 AD/AS 方法适配到少样本设置的价值。最后，我们认为研究具有强归纳偏置的特征提取器是（少样本）AD/AS 未来的一个潜在研究方向。
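
作为背景，下面给出 PatchCore 式“记忆库 + 最近邻打分”的一个极简 PyTorch 示意（原方法使用贪心 coreset 子采样与重加权，此处以随机子采样代替，采样比例等数值仅为示例）。

```python
import torch

def build_memory_bank(normal_patch_feats, coreset_ratio=0.1, seed=0):
    """从正常样本的 patch 特征中构建记忆库（用随机子采样近似 coreset，仅作示意）。"""
    g = torch.Generator().manual_seed(seed)
    n = normal_patch_feats.size(0)
    idx = torch.randperm(n, generator=g)[: max(1, int(n * coreset_ratio))]
    return normal_patch_feats[idx]

def anomaly_scores(test_patch_feats, memory_bank):
    """每个测试 patch 到记忆库最近邻的距离即其异常分数。"""
    d = torch.cdist(test_patch_feats, memory_bank)     # (Nt, Nm)
    per_patch = d.min(dim=1).values                     # patch 级分数（可用于异常分割）
    return per_patch, per_patch.max()                   # 图像级分数取最大 patch 分数

if __name__ == "__main__":
    normal = torch.randn(1000, 128)                     # 正常样本的 patch 特征（占位）
    bank = build_memory_bank(normal)
    test = torch.randn(300, 128)
    patch_scores, image_score = anomaly_scores(test, bank)
    print(bank.shape, patch_scores.shape, float(image_score))
```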

Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory

  • paper_url: http://arxiv.org/abs/2307.10768
  • repo_url: https://github.com/zhanglab-deepneurocoglab/worm
  • paper_authors: Ankur Sikarwar, Mengmi Zhang
  • for: 本研究开发了一个全面的工作记忆(Working Memory,WM) benchmark dataset,用于评估人工智能(AI)WM模型的效果。
  • methods: 研究使用了10个任务和100万次试验，评估了4种功能、3种领域和11种行为与神经特征。同时，研究还纳入了人类行为基准，作为比较的上限。
  • results: 研究发现，AI模型在某些方面与大脑的工作记忆相似，如 primacy 和 recency 效应，以及针对不同领域和功能的神经簇与相关性。然而，研究也发现现有模型存在一些局限，无法完全模拟人类行为。这个数据集将成为认知心理学、神经科学和AI社区的宝贵资源，用于比较和改进WM模型、研究WM的神经基础，并开发具有类人能力的WM模型。
    Abstract Working memory (WM), a fundamental cognitive process facilitating the temporary storage, integration, manipulation, and retrieval of information, plays a vital role in reasoning and decision-making tasks. Robust benchmark datasets that capture the multifaceted nature of WM are crucial for the effective development and evaluation of AI WM models. Here, we introduce a comprehensive Working Memory (WorM) benchmark dataset for this purpose. WorM comprises 10 tasks and a total of 1 million trials, assessing 4 functionalities, 3 domains, and 11 behavioral and neural characteristics of WM. We jointly trained and tested state-of-the-art recurrent neural networks and transformers on all these tasks. We also include human behavioral benchmarks as an upper bound for comparison. Our results suggest that AI models replicate some characteristics of WM in the brain, most notably primacy and recency effects, and neural clusters and correlates specialized for different domains and functionalities of WM. In the experiments, we also reveal some limitations in existing models to approximate human behavior. This dataset serves as a valuable resource for communities in cognitive psychology, neuroscience, and AI, offering a standardized framework to compare and enhance WM models, investigate WM's neural underpinnings, and develop WM models with human-like capabilities. Our source code and data are available at https://github.com/ZhangLab-DeepNeuroCogLab/WorM.
    摘要 工作记忆（WM）是一种基本的认知过程，负责信息的短时存储、整合、操作和提取，在推理和决策任务中起着重要作用。要有效地开发和评估人工智能的WM模型，需要能够反映WM多面性的稳健基准数据集。在这里，我们介绍了一个全面的Working Memory（WorM）基准数据集。WorM包含10项任务、总共100万次试验，评估WM的4种功能、3种领域以及11种行为与神经特征。我们在所有这些任务上联合训练和测试了当前最先进的循环神经网络和Transformer模型，并纳入人类行为基准作为比较的上限。我们的结果表明，AI模型复现了大脑WM的部分特征，最显著的是 primacy 和 recency 效应，以及针对WM不同领域和功能的专门化神经簇与相关性。在实验中，我们也揭示了现有模型在逼近人类行为方面的一些局限。这个数据集将成为认知心理学、神经科学和人工智能领域的宝贵资源，提供一个标准化的框架，用于比较和改进WM模型、研究WM的神经基础，并开发具有类人能力的WM模型。我们的源代码和数据可以在 https://github.com/ZhangLab-DeepNeuroCogLab/WorM 上获取。

Actor-agnostic Multi-label Action Recognition with Multi-modal Query

  • paper_url: http://arxiv.org/abs/2307.10763
  • repo_url: https://github.com/mondalanindya/msqnet
  • paper_authors: Anindya Mondal, Sauradip Nag, Joaquin M Prada, Xiatian Zhu, Anjan Dutta
  • for: 提高多种演员(包括人类和动物)动作识别的灵活性和性能。
  • methods: 提出了一种新的多模态多标签动作识别方法,基于 transformer 框架和文本特征,不需要actor pose estimation,可以更好地利用视觉和文本特征来表示动作类别。
  • results: 在五个公共数据集上进行了广泛的实验,与之前的actor-specificalternatives相比,MSQNet在人类和动物单标和多标动作识别任务上表现出了50%的提高。
    Abstract Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms the prior arts of actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%. Code will be released at https://github.com/mondalanindya/MSQNet.
    摘要 现有的动作识别方法通常是“演员特定”的，因为不同类型的演员在拓扑结构和外观上存在固有差异。这需要针对特定演员进行姿态估计（例如人类与动物不同），导致模型设计复杂、维护成本高昂。此外，它们通常只学习视觉模态并进行单标签分类，而忽略了其他可用的信息来源（例如类别名称文本）以及多个动作同时发生的情形。为了解决这些限制，我们提出了一种新方法，即“与演员无关的多模态多标签动作识别”，为包括人类和动物在内的各类演员提供统一的解决方案。我们进一步在基于 Transformer 的目标检测框架（如 DETR）中提出了 Multi-modal Semantic Query Network（MSQNet）模型，其特点是利用视觉与文本两种模态来更好地表示动作类别。消除演员特定的模型设计是其关键优势，因为这完全免除了演员姿态估计的需要。在五个公开基准上的大量实验表明，MSQNet 在人类和动物的单标签与多标签动作识别任务上，相比以往的演员特定方法最高提升可达 50%。代码将在 https://github.com/mondalanindya/MSQNet 发布。

Exploring Perspectives on the Impact of Artificial Intelligence on the Creativity of Knowledge Work: Beyond Mechanised Plagiarism and Stochastic Parrots

  • paper_url: http://arxiv.org/abs/2307.10751
  • repo_url: None
  • paper_authors: Advait Sarkar
  • for: 这篇论文旨在探讨人工智能（AI）如何影响知识工作中的创造力和归功问题。
  • methods: 本文使用文学批评、艺术史和版权法等领域的例子，展示创造力和原创性是过程、作者或观众的属性，而非对象的一种可标注或信息论意义上的特征。
  • results: 根据这些例子可以看到，AI 将知识工作从物质生产转向批判性整合，从而更应认可用户的创作与策展声音。这篇论文旨在开启一场关于在使用 AI 时更细致地看待创造力与归功问题的对话。
    Abstract Artificial Intelligence (AI), and in particular generative models, are transformative tools for knowledge work. They problematise notions of creativity, originality, plagiarism, the attribution of credit, and copyright ownership. Critics of generative models emphasise the reliance on large amounts of training data, and view the output of these models as no more than randomised plagiarism, remix, or collage of the source data. On these grounds, many have argued for stronger regulations on the deployment, use, and attribution of the output of these models. However, these issues are not new or unique to artificial intelligence. In this position paper, using examples from literary criticism, the history of art, and copyright law, I show how creativity and originality resist definition as a notatable or information-theoretic property of an object, and instead can be seen as the property of a process, an author, or a viewer. Further alternative views hold that all creative work is essentially reuse (mostly without attribution), or that randomness itself can be creative. I suggest that creativity is ultimately defined by communities of creators and receivers, and the deemed sources of creativity in a workflow often depend on which parts of the workflow can be automated. Using examples from recent studies of AI in creative knowledge work, I suggest that AI shifts knowledge work from material production to critical integration. This position paper aims to begin a conversation around a more nuanced approach to the problems of creativity and credit assignment for generative models, one which more fully recognises the importance of the creative and curatorial voice of the users of these models and moves away from simpler notational or information-theoretic views.
    摘要 人工智能（AI），尤其是生成模型，是知识工作中具有变革性的工具。它们使创造力、原创性、抄袭、功劳归属和版权所有权等概念变得更加复杂。对生成模型的批评者强调其对大量训练数据的依赖，并认为这些模型的输出不过是对源数据的随机抄袭、混搭或拼贴。基于这些理由，许多人主张对这类模型输出的部署、使用和归属实施更严格的监管。但是，这些问题并不是人工智能所独有或新出现的。在这篇立场论文中，我借助文学批评、艺术史和版权法的例子，说明创造力和原创性难以被定义为对象的一种可标注或信息论意义上的属性，而更应被视为过程、作者或观众的属性。另一些观点则认为，所有创作本质上都是（大多未注明出处的）重用，或者随机性本身就可以是创造性的。我认为创造力最终由创作者和接收者的社区来界定，而工作流程中被认定的创造力来源往往取决于流程中哪些部分可以被自动化。借助近期关于 AI 在创造性知识工作中的研究例子，我提出 AI 正在把知识工作从物质生产转向批判性整合。这篇立场论文的目标是开启一场对话，以更细致的方式看待生成模型的创造力与归功问题，更充分地承认模型使用者的创作与策展声音，并摆脱过于简单的标注式或信息论式观点。

Fairness-Aware Client Selection for Federated Learning

  • paper_url: http://arxiv.org/abs/2307.10738
  • repo_url: None
  • paper_authors: Yuxin Shi, Zelei Liu, Zhuan Shi, Han Yu
  • for: 这个论文的目的是解决 Federated Learning (FL) 中 client 选择问题,以实现性能和公平性的平衡。
  • methods: 该方法基于 Lyapunov 优化,通过考虑客户端的声誉、参与 FL 任务的时间和对模型性能的贡献,动态调整客户端选择概率。不使用阈值基于声誉筛选,因此允许客户端在感知性能不佳时重新恢复声誉。
  • results: 对实际 multimedia 数据集进行了广泛的实验,结果显示,FairFedCS 比最佳现有方法平均提高了19.6%的公平性和0.73%的测试准确率。
    Abstract Federated learning (FL) has enabled multiple data owners (a.k.a. FL clients) to train machine learning models collaboratively without revealing private data. Since the FL server can only engage a limited number of clients in each training round, FL client selection has become an important research problem. Existing approaches generally focus on either enhancing FL model performance or enhancing the fair treatment of FL clients. The problem of balancing performance and fairness considerations when selecting FL clients remains open. To address this problem, we propose the Fairness-aware Federated Client Selection (FairFedCS) approach. Based on Lyapunov optimization, it dynamically adjusts FL clients' selection probabilities by jointly considering their reputations, times of participation in FL tasks and contributions to the resulting model performance. By not using threshold-based reputation filtering, it provides FL clients with opportunities to redeem their reputations after a perceived poor performance, thereby further enhancing fair client treatment. Extensive experiments based on real-world multimedia datasets show that FairFedCS achieves 19.6% higher fairness and 0.73% higher test accuracy on average than the best-performing state-of-the-art approach.
    摘要 联邦学习（FL）使多个数据所有者（即 FL 客户端）能够在不泄露私有数据的情况下协同训练机器学习模型。由于 FL 服务器在每一轮训练中只能选择有限数量的客户端，FL 客户端选择成为一个重要的研究问题。现有方法通常只专注于提升 FL 模型性能，或只专注于对 FL 客户端的公平对待；如何在选择客户端时兼顾性能与公平，仍是一个尚未解决的开放问题。为此，我们提出了 Fairness-aware Federated Client Selection（FairFedCS）方法。该方法基于 Lyapunov 优化，综合考虑客户端的声誉、参与 FL 任务的次数以及对最终模型性能的贡献，动态调整客户端被选中的概率。由于不采用基于阈值的声誉过滤，它为 FL 客户端提供了在表现不佳之后重新恢复声誉的机会，从而进一步增强对客户端的公平对待。基于真实世界多媒体数据集的大量实验表明，FairFedCS 的公平性平均比表现最好的最新方法高 19.6%，测试准确率平均高 0.73%。

LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?

  • paper_url: http://arxiv.org/abs/2307.10719
  • repo_url: None
  • paper_authors: David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, Vardan Papyan
  • for: 本研究旨在探讨现有防御机制对大语言模型(LLM)的潜在危害的效果。
  • methods: 本研究使用理论分析表明现有的语义审核方法存在理论上的限制,并且攻击者可以通过重构允许的输出来重建禁止的输出。
  • results: 研究结果表明，semantic censorship 是一个不可判定的问题，LLMs 的可编程性和指令遵循能力使得潜在危害的输出难以被彻底审查。此外，攻击者可以通过组合被允许的输出来重构被禁止的输出，从而降低现有防御机制的效果。
    Abstract Large language models (LLMs) have exhibited impressive capabilities in comprehending complex instructions. However, their blind adherence to provided instructions has led to concerns regarding risks of malicious use. Existing defence mechanisms, such as model fine-tuning or output censorship using LLMs, have proven to be fallible, as LLMs can still generate problematic responses. Commonly employed censorship approaches treat the issue as a machine learning problem and rely on another LM to detect undesirable content in LLM outputs. In this paper, we present the theoretical limitations of such semantic censorship approaches. Specifically, we demonstrate that semantic censorship can be perceived as an undecidable problem, highlighting the inherent challenges in censorship that arise due to LLMs' programmatic and instruction-following capabilities. Furthermore, we argue that the challenges extend beyond semantic censorship, as knowledgeable attackers can reconstruct impermissible outputs from a collection of permissible ones. As a result, we propose that the problem of censorship needs to be reevaluated; it should be treated as a security problem which warrants the adaptation of security-based approaches to mitigate potential risks.
    摘要 大型语言模型（LLM）在理解复杂指令方面展现出了令人印象深刻的能力。然而，它们对所给指令的盲目遵从引发了对恶意使用风险的担忧。现有的防御机制（例如模型微调或利用 LLM 进行输出审查）已被证明并不可靠，因为 LLM 仍然可能生成有问题的回复。常用的审查方法把该问题视为一个机器学习问题，依赖另一个语言模型来检测 LLM 输出中的不当内容。本文阐述了此类语义审查方法的理论局限性。具体而言，我们证明语义审查可以被视为一个不可判定问题，这凸显了由 LLM 的可编程性和指令遵循能力带来的固有审查难题。此外，我们认为这些挑战不止于语义审查：具备相关知识的攻击者可以从一组被允许的输出中重构出被禁止的输出。因此，我们建议重新审视审查问题：应将其视为一个安全问题，并采用基于安全的方法来缓解潜在风险。

Introducing Risk Shadowing For Decisive and Comfortable Behavior Planning

  • paper_url: http://arxiv.org/abs/2307.10714
  • repo_url: None
  • paper_authors: Tim Puphal, Julian Eggert
  • for: 本研究旨在解决城市驾驶中的群体互动问题,现有的自驾车行为规划器通常是对每个单个agent-to-agent互动 separately进行成本函数来找到最优的行为方案,以避免与其他agent相撞。
  • methods: 本研究提出了风险附层(risk shadowing)方法,可以超越单个互动,通过分析三个agent之间的群体互动来更好地理解情况。具体来说,提出的方法可以确定ego agent中的第一个其他agent不需要在行为规划器中考虑,因为这个第一个其他agent无法到达ego agent的路径由第二个其他agent阻挡。
  • results: 在实验中,使用风险附层作为行为规划器的上游筛选器可以规划出更加决策和舒适的驾驶策略,保证安全性。这种方法的可用性在不同的交叉口enario和长途驾驶中被证明。
    Abstract We consider the problem of group interactions in urban driving. State-of-the-art behavior planners for self-driving cars mostly consider each single agent-to-agent interaction separately in a cost function in order to find an optimal behavior for the ego agent, such as not colliding with any of the other agents. In this paper, we develop risk shadowing, a situation understanding method that allows us to go beyond single interactions by analyzing group interactions between three agents. Concretely, the presented method can find out which first other agent does not need to be considered in the behavior planner of an ego agent, because this first other agent cannot reach the ego agent due to a second other agent obstructing its way. In experiments, we show that using risk shadowing as an upstream filter module for a behavior planner allows to plan more decisive and comfortable driving strategies than state of the art, given that safety is ensured in these cases. The usability of the approach is demonstrated for different intersection scenarios and longitudinal driving.
    摘要 我们考虑城市驾驶中的群体交互问题。当前用于自动驾驶车辆的最佳行为规划器通常在成本函数中分别考虑每一对 agent 之间的交互，以便为 ego agent 找到最优行为，例如不与任何其他 agent 相撞。在这篇论文中，我们提出了风险阴影（risk shadowing）这一场景理解方法，通过分析三个 agent 之间的群体交互，使我们能够超越单对交互。具体来说，该方法可以找出 ego agent 的行为规划器中无需考虑的第一个其他 agent，因为第二个其他 agent 阻挡了其路径，使其无法到达 ego agent。在实验中，我们表明，在保证安全的前提下，将风险阴影用作行为规划器的上游筛选模块，可以规划出比现有方法更加果断、更舒适的驾驶策略。我们在不同的交叉路口场景和纵向驾驶中验证了这种方法的可用性。

Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV

  • paper_url: http://arxiv.org/abs/2307.10713
  • repo_url: https://github.com/jspenmar/slowtv_monodepth
  • paper_authors: Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden
  • for: 这篇论文旨在将自监督单目深度估计（SS-MDE）扩展到海量数据，并解决现有方法局限于自动驾驶领域的问题。
  • methods: 论文提出了一个从 YouTube 上整理的大规模 SlowTV 数据集，包含来自世界各地不同环境（季节性徒步、风景驾驶、潜水等）的 170 万张图像；利用该数据集训练的 SS-MDE 模型能够零样本泛化到大量室内/室外数据集。论文还提出了一系列最佳实践，以进一步提升性能和零样本泛化能力。
  • results: 所得模型优于所有现有的自监督方法，并在使用更高效架构的同时缩小了与有监督 SoTA 的差距。
    Abstract Self-supervised monocular depth estimation (SS-MDE) has the potential to scale to vast quantities of data. Unfortunately, existing approaches limit themselves to the automotive domain, resulting in models incapable of generalizing to complex environments such as natural or indoor settings. To address this, we propose a large-scale SlowTV dataset curated from YouTube, containing an order of magnitude more data than existing automotive datasets. SlowTV contains 1.7M images from a rich diversity of environments, such as worldwide seasonal hiking, scenic driving and scuba diving. Using this dataset, we train an SS-MDE model that provides zero-shot generalization to a large collection of indoor/outdoor datasets. The resulting model outperforms all existing SSL approaches and closes the gap on supervised SoTA, despite using a more efficient architecture. We additionally introduce a collection of best-practices to further maximize performance and zero-shot generalization. This includes 1) aspect ratio augmentation, 2) camera intrinsic estimation, 3) support frame randomization and 4) flexible motion estimation. Code is available at https://github.com/jspenmar/slowtv_monodepth.
    摘要 自监督单目深度估计（SS-MDE）有潜力扩展到海量数据。然而，现有的方法局限于自动驾驶领域，导致模型无法泛化到自然环境或室内环境等复杂场景。为此，我们提出了一个从 YouTube 整理的大规模 SlowTV 数据集，其数据量比现有的自动驾驶数据集高出一个数量级。SlowTV 包含 170 万张图像，涵盖世界各地不同季节的徒步、风景驾驶和潜水等丰富多样的环境。使用这个数据集，我们训练了一个 SS-MDE 模型，可以零样本泛化到大量的室内/室外数据集。该模型优于所有现有的自监督方法，并在使用更高效架构的同时缩小了与有监督 SoTA 的差距。此外，我们还介绍了一系列进一步提升性能和零样本泛化能力的最佳实践，包括：1）长宽比增强；2）相机内参估计；3）支持帧随机化；4）灵活的运动估计。代码可以在 https://github.com/jspenmar/slowtv_monodepth 上获取。

AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models

  • paper_url: http://arxiv.org/abs/2307.10711
  • repo_url: None
  • paper_authors: Jiachun Pan, Jun Hao Liew, Vincent Y. F. Tan, Jiashi Feng, Hanshu Yan
  • for: 这 paper 目的是解决 diffusion probabilistic models (DPMs) 的自适应 Customization 问题,只使用用户提供的 differentiable metric 作为指导。
  • methods: 这 paper 使用了一种新的方法 called AdjointDPM,它首先使用泛化模型生成新样本,然后使用逆变函数感知方法来归因损失到模型参数(包括 conditioning signals、网络参数和初始噪声)。
  • results: 这 paper 在三个有趣的任务上 demonstrate 了 AdjointDPM 的效果:将视觉特效转换为标识符文本嵌入,finetune DPMs для特定类型的风格化,以及优化初始噪声来生成安全审核中的敌意样本。
    Abstract Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models (DPMs) with user-provided concepts. This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents. Since the sampling procedure of DPMs involves recursive calls to the denoising UNet, na\"ive gradient backpropagation requires storing the intermediate states of all iterations, resulting in extremely high memory consumption. To overcome this issue, we propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs. It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters (including conditioning signals, network weights, and initial noises) by solving another augmented ODE. To reduce numerical errors in both the forward generation and gradient backpropagation processes, we further reparameterize the probability-flow ODE and augmented ODE as simple non-stiff ODEs using exponential integration. Finally, we demonstrate the effectiveness of AdjointDPM on three interesting tasks: converting visual effects into identification text embeddings, finetuning DPMs for specific types of stylization, and optimizing initial noise to generate adversarial samples for security auditing.
    摘要 现有的定制方法需要访问多个参考示例，才能使预训练的扩散概率模型（DPM）与用户提供的概念对齐。本文旨在解决仅有定义在生成内容上的可微分度量作为监督时的 DPM 定制难题。由于 DPM 的采样过程需要对去噪 UNet 进行递归调用，朴素的梯度反向传播需要保存所有迭代的中间状态，导致内存消耗极高。为解决这个问题，我们提出了一种新方法 AdjointDPM：它首先通过求解相应的概率流 ODE 从扩散模型生成新样本，然后利用伴随灵敏度方法，通过求解另一个增广 ODE，将损失的梯度反向传播到模型参数（包括条件信号、网络权重和初始噪声）。为了减少前向生成和梯度反向传播过程中的数值误差，我们进一步利用指数积分，将概率流 ODE 和增广 ODE 重参数化为简单的非刚性 ODE。最后，我们在三个有趣的任务上展示了 AdjointDPM 的有效性：将视觉特效转换为标识文本嵌入、针对特定类型的风格化微调 DPM，以及优化初始噪声以生成用于安全审计的对抗样本。
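
下面给出一个利用 torchdiffeq 的 odeint_adjoint 对 ODE 解进行伴随法反向传播的玩具示意（假设环境中已安装 torchdiffeq；这里用一个小网络代替真实的概率流漂移项，损失也只是占位，并非论文的具体实现）。

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint   # 假设已安装 torchdiffeq

class ToyProbFlow(nn.Module):
    """概率流 ODE 的玩具替身：真实场景中漂移项由预训练扩散模型的去噪网络给出。"""
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, t, x):
        t_col = t * torch.ones(x.size(0), 1)            # 把标量时间拼接到状态上
        return self.net(torch.cat([x, t_col], dim=1))

flow = ToyProbFlow()
x0 = torch.randn(8, 2, requires_grad=True)              # 初始噪声（可作为被优化的量）
ts = torch.linspace(1.0, 0.0, 10)                       # 从噪声积分到数据

# 伴随灵敏度法：反向传播时通过求解一个增广 ODE 得到梯度，
# 无需存储前向积分的全部中间状态，从而大幅降低显存占用
xT = odeint_adjoint(flow, x0, ts, method="dopri5")[-1]
loss = (xT ** 2).mean()                                  # 占位的可微指标，实际中换成用户定义的损失
loss.backward()
print(x0.grad.shape, any(p.grad is not None for p in flow.parameters()))
```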

Towards an architectural framework for intelligent virtual agents using probabilistic programming

  • paper_url: http://arxiv.org/abs/2307.10693
  • repo_url: None
  • paper_authors: Anton Andreev, Grégoire Cattan
  • for: ECAs (embodied conversational agents)
  • methods: probabilistic programming, Bayesian networks, and distributions
  • results: more natural behavior, adaptation to user preferences, and evolving internal states over time
    Abstract We present a new framework called KorraAI for conceiving and building embodied conversational agents (ECAs). Our framework models ECAs' behavior considering contextual information, for example, about environment and interaction time, and uncertain information provided by the human interaction partner. Moreover, agents built with KorraAI can show proactive behavior, as they can initiate interactions with human partners. For these purposes, KorraAI exploits probabilistic programming. Probabilistic models in KorraAI are used to model its behavior and interactions with the user. They enable adaptation to the user's preferences and a certain degree of indeterminism in the ECAs to achieve more natural behavior. Human-like internal states, such as moods, preferences, and emotions (e.g., surprise), can be modeled in KorraAI with distributions and Bayesian networks. These models can evolve over time, even without interaction with the user. ECA models are implemented as plugins and share a common interface. This enables ECA designers to focus more on the character they are modeling and less on the technical details, as well as to store and exchange ECA models. Several applications of KorraAI ECAs are possible, such as virtual sales agents, customer service agents, virtual companions, entertainers, or tutors.
    摘要 我们提出了一个名为 KorraAI 的新框架，用于设计和构建具身会话代理（ECA）。我们的框架在建模 ECA 行为时会考虑上下文信息（例如环境和交互时间），以及由人类交互伙伴提供的不确定信息。此外，用 KorraAI 构建的代理可以表现出主动行为，能够主动向人类伙伴发起交互。为此，KorraAI 利用了概率编程：KorraAI 中的概率模型用于建模其行为及与用户的交互，使其能够适应用户偏好，并在 ECA 中引入一定程度的不确定性，以实现更自然的行为。类人的内部状态，如心情、偏好和情绪（例如惊讶），可以在 KorraAI 中用概率分布和贝叶斯网络来建模；这些模型可以随时间演化，即使没有与用户的交互。ECA 模型以插件形式实现并共享统一接口，这使得 ECA 设计者可以更专注于所塑造的角色本身而非技术细节，也便于存储和交换 ECA 模型。KorraAI 的 ECA 可以有多种应用，例如虚拟销售代理、客服代理、虚拟伴侣、娱乐者或导师。

Bounded Combinatorial Reconfiguration with Answer Set Programming

  • paper_url: http://arxiv.org/abs/2307.10688
  • repo_url: None
  • paper_authors: Yuya Yamada, Mutsunori Banbara, Katsumi Inoue, Torsten Schaub
  • for: 解决 combinatorial reconfiguration 问题
  • methods: 使用 Answer Set Programming (ASP) 开发 bounded combinatorial reconfiguration 方法
  • results: recongo 覆盖了 CoRe Challenge 2022 求解器赛道的所有指标，在单引擎求解器赛道的最短（shortest）指标上排名第一，并对 CoRe Challenge 2022 的所有实例进行了实证分析。
    Abstract We develop an approach called bounded combinatorial reconfiguration for solving combinatorial reconfiguration problems based on Answer Set Programming (ASP). The general task is to study the solution spaces of source combinatorial problems and to decide whether or not there are sequences of feasible solutions that have special properties. The resulting recongo solver covers all metrics of the solver track in the most recent international competition on combinatorial reconfiguration (CoRe Challenge 2022). recongo ranked first in the shortest metric of the single-engine solvers track. In this paper, we present the design and implementation of bounded combinatorial reconfiguration, and present an ASP encoding of the independent set reconfiguration problem that is one of the most studied combinatorial reconfiguration problems. Finally, we present empirical analysis considering all instances of CoRe Challenge 2022.
    摘要 我们基于回答集编程（ASP）开发了一种称为有界组合重构（bounded combinatorial reconfiguration）的方法，用于求解组合重构问题。其一般任务是研究源组合问题的解空间，并判断是否存在具有特殊性质的可行解序列。由此得到的 recongo 求解器覆盖了最新一届组合重构国际竞赛（CoRe Challenge 2022）求解器赛道的所有指标，并在单引擎求解器赛道的最短（shortest）指标上排名第一。在本文中，我们介绍了有界组合重构的设计与实现，并给出了独立集重构问题（组合重构中研究最多的问题之一）的 ASP 编码。最后，我们对 CoRe Challenge 2022 的全部实例进行了实证分析。

A Personalized Recommender System Based-on Knowledge Graph Embeddings

  • paper_url: http://arxiv.org/abs/2307.10680
  • repo_url: https://github.com/nil9/Master-Thesis
  • paper_authors: Ngoc Luyen Le, Marie-Hélène Abel, Philippe Gouspillou
  • for: 这篇论文是关于构建个性化推荐系统的研究,通过知识图 embedding 技术来更好地捕捉用户和物品之间的隐式连接,提供更加准确的推荐。
  • methods: 该论文提出了一种基于知识图 embedding 技术的个性化推荐系统,通过在知识图中嵌入用户和物品,更好地捕捉用户的偏好和需求,提供更加准确的推荐。
  • results: 实验结果表明,该方法可以提供高度相似的推荐,并且可以捕捉用户的偏好和需求,提供更加准确的推荐。
    Abstract Knowledge graphs have proven to be effective for modeling entities and their relationships through the use of ontologies. The recent emergence in interest for using knowledge graphs as a form of information modeling has led to their increased adoption in recommender systems. By incorporating users and items into the knowledge graph, these systems can better capture the implicit connections between them and provide more accurate recommendations. In this paper, we investigate and propose the construction of a personalized recommender system via knowledge graphs embedding applied to the vehicle purchase/sale domain. The results of our experimentation demonstrate the efficacy of the proposed method in providing relevant recommendations that are consistent with individual users.
    摘要 知识图通过使用本体（ontology）有效地建模了实体及其关系。近期人们对将知识图用作信息建模形式的兴趣增长，使其在推荐系统中的应用更加普遍。通过将用户和物品嵌入到知识图中，这些系统可以更好地捕捉二者之间的隐式关联，提供更准确的推荐。本文研究并提出了基于知识图嵌入的个性化推荐系统的构建方法，并将其应用于汽车购买/销售领域。实验结果表明，所提方法可以提供符合个体用户偏好的相关推荐。

SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

  • paper_url: http://arxiv.org/abs/2307.10635
  • repo_url: https://github.com/mandyyyyii/scibench
  • paper_authors: Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, Yizhou Sun, Wei Wang
  • for: 这篇论文旨在检验大型自然语言模型(LLM)的推理能力,以便在科学问题解决中提高其表现。
  • methods: 这篇论文使用了两个精心准备的数据集:一个是大学生水平的科学问题,包括数学、化学和物理等领域的问题,另一个是计算机科学和数学等专业考试的问题。研究者使用了多种提示策略来评估 LLM 的表现。
  • results: 研究结果显示,当前 LLM 的表现并不出色,总得分只有 35.80%。此外,研究者还分类了 LLM 的错误为十种问题解决能力,发现没有一种提示策略能够明显超越其他策略,一些策略可以提高某些问题解决能力,却导致其他能力下降。
    Abstract Recent advances in large language models (LLMs) have demonstrated notable progress on many mathematical benchmarks. However, most of these benchmarks only feature problems grounded in junior and senior high school subjects, contain only multiple-choice questions, and are confined to a limited scope of elementary arithmetic operations. To address these issues, this paper introduces an expansive benchmark suite SciBench that aims to systematically examine the reasoning capabilities required for complex scientific problem solving. SciBench contains two carefully curated datasets: an open set featuring a range of collegiate-level scientific problems drawn from mathematics, chemistry, and physics textbooks, and a closed set comprising problems from undergraduate-level exams in computer science and mathematics. Based on the two datasets, we conduct an in-depth benchmark study of two representative LLMs with various prompting strategies. The results reveal that current LLMs fall short of delivering satisfactory performance, with an overall score of merely 35.80%. Furthermore, through a detailed user study, we categorize the errors made by LLMs into ten problem-solving abilities. Our analysis indicates that no single prompting strategy significantly outperforms others and some strategies that demonstrate improvements in certain problem-solving skills result in declines in other skills. We envision that SciBench will catalyze further developments in the reasoning abilities of LLMs, thereby ultimately contributing to scientific research and discovery.

Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck

  • paper_url: http://arxiv.org/abs/2307.10631
  • repo_url: None
  • paper_authors: Zhiwei Fu, Steven H. H. Ding, Furkan Alaca, Benjamin C. M. Fung, Philippe Charland
  • for: This work addresses the limitation of existing assembly clone search methods when they face architectures and libraries unseen during training.
  • methods: The authors augment existing learning-based approaches with large-scale pre-trained natural language models via transfer learning, and add a reinforcement learning agent that removes useless and redundant tokens.
  • results: Experiments simulating clone search on unseen architectures show the approach is more effective than existing methods.
    Abstract The practice of code reuse is crucial in software development for a faster and more efficient development lifecycle. In reality, however, code reuse practices lack proper control, resulting in issues such as vulnerability propagation and intellectual property infringements. Assembly clone search, a critical shift-right defence mechanism, has been effective in identifying vulnerable code resulting from reuse in released executables. Recent studies on assembly clone search demonstrate a trend towards using machine learning-based methods to match assembly code variants produced by different toolchains. However, these methods are limited to what they learn from a small number of toolchain variants used in training, rendering them inapplicable to unseen architectures and their corresponding compilation toolchain variants. This paper presents the first study on the problem of assembly clone search with unseen architectures and libraries. We propose incorporating human common knowledge through large-scale pre-trained natural language models, in the form of transfer learning, into current learning-based approaches for assembly clone search. Transfer learning can aid in addressing the limitations of the existing approaches, as it can bring in broader knowledge from human experts in assembly code. We further address the sequence limit issue by proposing a reinforcement learning agent to remove unnecessary and redundant tokens. Coupled with a new Variational Information Bottleneck learning strategy, the proposed system minimizes the reliance on potential indicators of architectures and optimization settings, for a better generalization of unseen architectures. We simulate the unseen architecture clone search scenarios and the experimental results show the effectiveness of the proposed approach against the state-of-the-art solutions.
    摘要 software开发中代码重用是一项非常重要的做法,可以帮助提高开发周期的速度和效率。然而,实际上代码重用很难受到有效的控制,这会导致问题如攻击协议和知识产权侵犯。Assembly clone search是一种重要的防御机制,可以 identificatin vulnerable code resulting from reuse in released executables。 recent studies have shown a trend towards using machine learning-based methods to match assembly code variants produced by different toolchains。然而,这些方法受到训练中使用的小量工具链变体的限制,使其对未看过的架构和相应的编译工具链变体无法适用。这篇论文是关于assembly clone search with unseen architectures和libraries的首个研究。我们提议通过大规模预训练的自然语言模型来 incorporate human common knowledge into current learning-based approaches for assembly clone search。这种方法可以帮助解决现有方法的局限性,因为它可以带来更广泛的人类专家知识。我们进一步解决序列限制问题,提出一种强化学习代理来 remov redundant和无用的 токен。与新的Variational Information Bottleneck学习策略相结合,我们的提案可以减少架构和优化设置的可能指标,以提高对未看过的架构的总体化。我们在模拟未看过架构做clone search的场景下进行了实验,结果显示了我们的方法的效果,比起当前的状态对策。

Unmasking Falsehoods in Reviews: An Exploration of NLP Techniques

  • paper_url: http://arxiv.org/abs/2307.10617
  • repo_url: None
  • paper_authors: Anusuya Baby Hari Krishnan
  • for: This work proposes a machine learning model for identifying deceptive reviews, with a particular focus on restaurant reviews.
  • methods: Features are extracted with an n-gram model and a capped number of max features, and five distinct machine learning classification algorithms are benchmarked on top of them; a minimal sketch of this setup follows this entry.
  • results: Experiments show that the passive aggressive classifier achieves the highest accuracy, both for text classification in general and for fake review detection.
    Abstract In the contemporary digital landscape, online reviews have become an indispensable tool for promoting products and services across various businesses. Marketers, advertisers, and online businesses have found incentives to create deceptive positive reviews for their products and negative reviews for their competitors' offerings. As a result, the writing of deceptive reviews has become an unavoidable practice for businesses seeking to promote themselves or undermine their rivals. Detecting such deceptive reviews has become an intense and ongoing area of research. This research paper proposes a machine learning model to identify deceptive reviews, with a particular focus on restaurants. This study delves into the performance of numerous experiments conducted on a dataset of restaurant reviews known as the Deceptive Opinion Spam Corpus. To accomplish this, an n-gram model and max features are developed to effectively identify deceptive content, particularly focusing on fake reviews. A benchmark study is undertaken to explore the performance of two different feature extraction techniques, which are then coupled with five distinct machine learning classification algorithms. The experimental results reveal that the passive aggressive classifier stands out among the various algorithms, showcasing the highest accuracy not only in text classification but also in identifying fake reviews. Moreover, the research delves into data augmentation and implements various deep learning techniques to further enhance the process of detecting deceptive reviews. The findings shed light on the efficacy of the proposed machine learning approach and offer valuable insights into dealing with deceptive reviews in the realm of online businesses.
    摘要 在当今数字化时代,在线评论已成为不同业务的不可或缺的工具。广告商、市场推广人员和在线业务在创建假评论以便自己的产品或者降低竞争对手的产品时,假评论的写作已成为不可避免的做法。因此,检测假评论已成为一项激烈和持续的研究领域。本研究论文提出一种机器学习模型,用于识别假评论,尤其是针对餐厅评论。本研究通过对知名的餐厅评论数据集——假评论敏感数据集进行了多个实验。为了实现这一目标,我们开发了ngram模型和最佳特征,以便有效地识别假内容,特别是假评论。我们进行了benchmark研究,以 explore两种不同的特征提取技术的相对性,然后与五种不同的机器学习分类算法结合。实验结果表明,通过的情感攻击分类器在文本分类和识别假评论方面具有最高精度。此外,我们还对数据增强和深度学习技术进行了应用,以进一步提高假评论检测的过程。研究成果照明了我们提出的机器学习方法的效果,并对在线业务中的假评论处理提供了有价值的思路。
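
Below is a rough Python sketch of the pipeline described in the methods bullet: TF-IDF n-gram features with a capped vocabulary feeding a passive-aggressive classifier. The toy reviews and labels are invented; the paper works on the Deceptive Opinion Spam Corpus and compares five classifiers.

```python
# Rough sketch of the n-gram + max-features + passive-aggressive setup; toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

reviews = [
    "The pasta was fresh and the staff were friendly.",               # truthful
    "Absolutely the best restaurant ever, perfect in every way!!!",   # deceptive
    "Service was slow but the soup was decent.",                      # truthful
    "Amazing amazing amazing, everyone must eat here daily!",         # deceptive
]
labels = [0, 1, 0, 1]  # 0 = truthful, 1 = deceptive

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), max_features=5000),  # n-grams, capped feature count
    PassiveAggressiveClassifier(max_iter=1000, random_state=0),
)
model.fit(reviews, labels)
print(model.predict(["Best place in the universe, flawless food and flawless staff!"]))
```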

Heterogeneous Federated Learning: State-of-the-art and Research Challenges

  • paper_url: http://arxiv.org/abs/2307.10616
  • repo_url: https://github.com/marswhu/hfl_survey
  • paper_authors: Mang Ye, Xiuwen Fang, Bo Du, Pong C. Yuen, Dacheng Tao
  • for: 本研究主要针对 Federated Learning (FL) 在大规模实际应用中的挑战,特别是在客户端数据、模型、网络环境和硬件设备等方面存在多样化的情况下进行研究。
  • methods: 本文首先总结了 Heterogeneous Federated Learning (HFL) 中的多种研究挑战,包括统计学上的多样性、模型上的多样性、通信上的多样性、设备上的多样性以及其他挑战。此外,本文还对现有的 HFL 方法进行了系统的审视,并提出了一种新的分类方法,即根据 HFL 过程的不同级别分为数据级、模型级和服务级。
  • results: 本文对 HFL 的研究进行了深入的分析,并提出了一些关键和前景的未来研究方向。这些方向可能会促进 HFL 领域的进一步发展。此外,本文还提供了一个 periodic 更新的 HFL 集成资源,可以在 https://github.com/marswhu/HFL_Survey 上获取。
    Abstract Federated learning (FL) has drawn increasing attention owing to its potential use in large-scale industrial applications. Existing federated learning works mainly focus on model homogeneous settings. However, practical federated learning typically faces the heterogeneity of data distributions, model architectures, network environments, and hardware devices among participant clients. Heterogeneous Federated Learning (HFL) is much more challenging, and corresponding solutions are diverse and complex. Therefore, a systematic survey on this topic about the research challenges and state-of-the-art is essential. In this survey, we firstly summarize the various research challenges in HFL from five aspects: statistical heterogeneity, model heterogeneity, communication heterogeneity, device heterogeneity, and additional challenges. In addition, recent advances in HFL are reviewed and a new taxonomy of existing HFL methods is proposed with an in-depth analysis of their pros and cons. We classify existing methods from three different levels according to the HFL procedure: data-level, model-level, and server-level. Finally, several critical and promising future research directions in HFL are discussed, which may facilitate further developments in this field. A periodically updated collection on HFL is available at https://github.com/marswhu/HFL_Survey.
    摘要 Federated learning (FL) 已经引起了越来越多的关注,因为它在大规模工业应用中具有潜在的潜力。现有的联邦学习研究主要集中在模型同质Setting下进行。然而,实际的联邦学习通常面临参与客户端数据分布、模型架构、网络环境和硬件设备之间的不同性。这种不同性的联邦学习(HFL)是更加复杂和多样化的,因此需要一个系统的检视和分析。在这个检视中,我们首先总结了HFL的多种研究挑战,从五个方面出发:统计不同性、模型不同性、通信不同性、设备不同性以及附加挑战。此外,我们还评论了现有的HFL方法,并提出了一种新的分类方法,从三个不同的水平进行分类:数据水平、模型水平和服务器水平。最后,我们讨论了一些重要和有前途的未来研究方向,这些方向可能会促进这个领域的进一步发展。关于HFL的更新集可以在https://github.com/marswhu/HFL_Survey中找到。

Challenges and Solutions in AI for All

  • paper_url: http://arxiv.org/abs/2307.10600
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Rifat Ara Shams, Didar Zowghi, Muneera Bano
  • for: 本研究旨在探讨人工智能(AI)设计中的多样性和包容性(D&I)原则,以提高系统的公平、信任和透明度。
  • methods: 本研究采用系统性回访方法,检索2017年至2022年间发表的48篇研究论文。通过开 coding,找到了55个D&I在AI中的挑战和33个解决方案,以及24个优化D&I实践使用AI的挑战和23个解决方案。
  • results: 本研究提供了对D&I在AI中的问题更深入的理解,为研究人员和实践者们提供了 referencing的知识,以便将这些原则integrated into future AI系统。
    Abstract Artificial Intelligence (AI)'s pervasive presence and variety necessitate diversity and inclusivity (D&I) principles in its design for fairness, trust, and transparency. Yet, these considerations are often overlooked, leading to issues of bias, discrimination, and perceived untrustworthiness. In response, we conducted a Systematic Review to unearth challenges and solutions relating to D&I in AI. Our rigorous search yielded 48 research articles published between 2017 and 2022. Open coding of these papers revealed 55 unique challenges and 33 solutions for D&I in AI, as well as 24 unique challenges and 23 solutions for enhancing such practices using AI. This study, by offering a deeper understanding of these issues, will enlighten researchers and practitioners seeking to integrate these principles into future AI systems.
    摘要 人工智能(AI)的普遍存在和多样性要求在其设计中纳入多样性与包容性(D&I)原则，以确保公平、信任和透明。然而，这些考量常被忽视，导致偏见、歧视以及被认为不可信等问题。为此，我们开展了系统性文献综述，梳理AI中与D&I相关的挑战和解决方案。我们的检索共得到2017至2022年间发表的48篇研究论文；经开放编码，共发现AI中D&I的55个挑战和33个解决方案，以及利用AI改进此类实践的24个挑战和23个解决方案。本研究通过加深对这些问题的理解，为希望将这些原则融入未来AI系统的研究者和实践者提供参考。

Exploiting Structure for Optimal Multi-Agent Bayesian Decentralized Estimation

  • paper_url: http://arxiv.org/abs/2307.10594
  • repo_url: None
  • paper_authors: Christopher Funk, Ofer Dagan, Benjamin Noack, Nisar R. Ahmed
  • for: This paper addresses the "rumor propagation" (double counting) problem in Bayesian decentralized data fusion, where previously sent data circulates back to its sender.
  • methods: It extends covariance intersection (CI) with multiple non-monolithic weighting factors and proposes a general optimization scheme that exploits the probabilistic independence structure of multi-agent fusion problems; a sketch of the standard single-weight CI that these methods generalize follows this entry.
  • results: On a simple problem both methods converge to the same solution, and in a large-scale target-tracking simulation the non-monolithic CI achieves a tighter bound and a more accurate estimate than the original monolithic CI.
    Abstract A key challenge in Bayesian decentralized data fusion is the `rumor propagation' or `double counting' phenomenon, where previously sent data circulates back to its sender. It is often addressed by approximate methods like covariance intersection (CI) which takes a weighted average of the estimates to compute the bound. The problem is that this bound is not tight, i.e. the estimate is often over-conservative. In this paper, we show that by exploiting the probabilistic independence structure in multi-agent decentralized fusion problems a tighter bound can be found using (i) an expansion to the CI algorithm that uses multiple (non-monolithic) weighting factors instead of one (monolithic) factor in the original CI and (ii) a general optimization scheme that is able to compute optimal bounds and fully exploit an arbitrary dependency structure. We compare our methods and show that on a simple problem, they converge to the same solution. We then test our new non-monolithic CI algorithm on a large-scale target tracking simulation and show that it achieves a tighter bound and a more accurate estimate compared to the original monolithic CI.
    摘要 “统计分散式数据融合中的一个主要挑战是传闻传播(double counting)现象,其中先前发送的数据会再次回到发送者。通常这个问题会用近似方法如协变积分(CI)来解决,这个方法会将估计值加权平均以计算范围。然而,这个范围通常是不紧的,即估计通常是过保守的。在这篇论文中,我们显示了在多代理分散式数据融合问题中,通过利用多元独立的概率结构,可以使用多个不同的加权因子而不是单一的固定加权因子,从而获得更紧的范围。我们比较了我们的方法和原始单一CI方法,并证明了在一个简单问题上,它们均 converge 到相同的解决方案。然后,我们将我们的新非单一CI算法应用到一个大规模目标追踪 simulator 中,并证明了它在获得更紧的范围和更准的估计方面比原始单一CI更好。”
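
For reference, the snippet below implements the standard single-weight ("monolithic") covariance intersection that this paper generalizes. The measurements and covariances are illustrative numbers only, not from the paper.

```python
# Covariance intersection (CI) of two Gaussian estimates with one weight omega.
import numpy as np

def covariance_intersection(x_a, P_a, x_b, P_b, omega):
    """Fuse (x_a, P_a) and (x_b, P_b) without knowing their cross-correlation."""
    P_a_inv, P_b_inv = np.linalg.inv(P_a), np.linalg.inv(P_b)
    P_ci = np.linalg.inv(omega * P_a_inv + (1.0 - omega) * P_b_inv)
    x_ci = P_ci @ (omega * P_a_inv @ x_a + (1.0 - omega) * P_b_inv @ x_b)
    return x_ci, P_ci

x_a, P_a = np.array([1.0, 2.0]), np.diag([1.0, 4.0])
x_b, P_b = np.array([1.5, 1.0]), np.diag([2.0, 1.0])

# Pick omega by minimising the trace of the fused covariance (a common heuristic).
omegas = np.linspace(0.01, 0.99, 99)
best = min(omegas, key=lambda w: np.trace(covariance_intersection(x_a, P_a, x_b, P_b, w)[1]))
print(covariance_intersection(x_a, P_a, x_b, P_b, best))
```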

Boundary State Generation for Testing and Improvement of Autonomous Driving Systems

  • paper_url: http://arxiv.org/abs/2307.10590
  • repo_url: None
  • paper_authors: Matteo Biagiola, Paolo Tonella
  • for: This paper aims to improve the dependability of autonomous driving systems (ADSs) by presenting a novel test generator called GenBo.
  • methods: GenBo mutates the driving conditions of the ego vehicle (position, velocity, and orientation) collected in a failure-free environment instance to generate challenging driving conditions at the behavior boundary, where the model starts to misbehave.
  • results: The retrained model using GenBo achieves up to a 16% higher success rate on a separate set of evaluation tracks compared to the original DNN model.
    Abstract Recent advances in Deep Neural Networks (DNNs) and sensor technologies are enabling autonomous driving systems (ADSs) with an ever-increasing level of autonomy. However, assessing their dependability remains a critical concern. State-of-the-art ADS testing approaches modify the controllable attributes of a simulated driving environment until the ADS misbehaves. Such approaches have two main drawbacks: (1) modifications to the simulated environment might not be easily transferable to the in-field test setting (e.g., changing the road shape); (2) environment instances in which the ADS is successful are discarded, despite the possibility that they could contain hidden driving conditions in which the ADS may misbehave. In this paper, we present GenBo (GENerator of BOundary state pairs), a novel test generator for ADS testing. GenBo mutates the driving conditions of the ego vehicle (position, velocity and orientation), collected in a failure-free environment instance, and efficiently generates challenging driving conditions at the behavior boundary (i.e., where the model starts to misbehave) in the same environment. We use such boundary conditions to augment the initial training dataset and retrain the DNN model under test. Our evaluation results show that the retrained model has up to 16% higher success rate on a separate set of evaluation tracks with respect to the original DNN model.

Forecasting Battery Electric Vehicle Charging Behavior: A Deep Learning Approach Equipped with Micro-Clustering and SMOTE Techniques

  • paper_url: http://arxiv.org/abs/2307.10588
  • repo_url: None
  • paper_authors: Hanif Tayarani, Trisha V. Ramadoss, Vaishnavi Karanam, Gil Tal, Christopher Nitta
  • for: This paper aims to improve the forecasting of battery electric vehicle (BEV) charging events, which is essential for electricity load aggregators and utility managers to provide charging stations and electricity capacity effectively.
  • methods: The paper develops a novel Micro Clustering Deep Neural Network (MCDNN) algorithm that is highly effective at learning BEV trip and charging data to forecast BEV charging events; a toy clustering-plus-SMOTE rebalancing sketch follows this entry.
  • results: The proposed MCDNN outperforms benchmark approaches, such as support vector machine, k nearest neighbors, decision tree, and other neural network-based models, in predicting the charging events.
    Abstract Energy systems, climate change, and public health are among the primary reasons for moving toward electrification in transportation. Transportation electrification is being promoted worldwide to reduce emissions. As a result, many automakers will soon start making only battery electric vehicles (BEVs). BEV adoption rates are rising in California, mainly due to climate change and air pollution concerns. While great for climate and pollution goals, improperly managed BEV charging can lead to insufficient charging infrastructure and power outages. This study develops a novel Micro Clustering Deep Neural Network (MCDNN), an artificial neural network algorithm that is highly effective at learning BEVs trip and charging data to forecast BEV charging events, information that is essential for electricity load aggregators and utility managers to provide charging stations and electricity capacity effectively. The MCDNN is configured using a robust dataset of trips and charges that occurred in California between 2015 and 2020 from 132 BEVs, spanning 5 BEV models for a total of 1570167 vehicle miles traveled. The numerical findings revealed that the proposed MCDNN is more effective than benchmark approaches in this field, such as support vector machine, k nearest neighbors, decision tree, and other neural network-based models in predicting the charging events.
    摘要 transportation electrification 是为了降低排放和提高公共健康而努力往前,其中能源系统、气候变化和公共健康都是主要原因。随着全球各地推广电动汽车,许多汽车制造商即将停止生产内燃机械汽车,而且BEV采购率在加利福尼亚州正在增长,主要是由于气候变化和空气污染的问题。然而,不当管理BEV充电可能会导致充电基础设施不足和停电。这种研究开发了一种 Micro Clustering Deep Neural Network (MCDNN),这是一种人工神经网络算法,可以很好地学习BEV行驶和充电数据,以预测BEV充电事件,这些信息对电力聚集器和供电公司来说非常重要。MCDNN使用了加利福尼亚州2015-2020年间132台BEV的行驶记录,涵盖5种BEV型号,共计1570167公里。数值发现表明,提案的MCDNN比 benchmark方法更有效,例如支持向量机、最近邻居、决策树和其他神经网络模型在预测充电事件方面更高。
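
The sketch below illustrates, on assumed synthetic data, how a micro-clustering step and SMOTE rebalancing could be combined before training a charging-event forecaster. It is not the authors' MCDNN implementation; the cluster count and feature layout are arbitrary.

```python
# Illustrative pipeline in the spirit of the paper: cluster trips into "micro-clusters",
# append the cluster id as a feature, then rebalance rare charging events with SMOTE.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans
from imblearn.over_sampling import SMOTE

# Synthetic stand-in for the BEV trip dataset (class 1 = a charging event, rare).
X, y = make_classification(n_samples=400, n_features=6, weights=[0.9, 0.1], random_state=0)

# Micro-clustering: attach a cluster label describing each trip's profile.
cluster_id = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)
X_aug = np.column_stack([X, cluster_id])

# SMOTE oversamples the minority class before training the forecaster.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_aug, y)
print("before:", np.bincount(y), "after:", np.bincount(y_bal))
```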

Ethosight: A Reasoning-Guided Iterative Learning System for Nuanced Perception based on Joint-Embedding & Contextual Label Affinity

  • paper_url: http://arxiv.org/abs/2307.10577
  • repo_url: None
  • paper_authors: Hugo Latapie, Shan Yu, Patrick Hammer, Kristinn R. Thorisson, Vahagn Petrosyan, Brandon Kynoch, Alind Khare, Payman Behnam, Alexey Tumanov, Aksheit Saxena, Anish Aralikatti, Hanning Chen, Mohsen Imani, Mike Archbold, Tangrui Li, Pei Wang, Justin Hart
  • for: 本研究旨在提出一种flexible和适应的零shot视频分析系统,以解决传统计算机视觉模型在实际应用中的缺点,如高False Positive和False Negative率,以及新场景投入重新训练等问题。
  • methods: 本研究使用用户定义的视频分析、自然语言或关键词指定,并利用 joint embedding 模型和基于 ontologies 如 WordNet 和 ConceptNet 的理解机制。 Ethosight 在低成本的边缘设备上运行,支持 runtime 适应,从而提供一种新的连续学习方法,不受恶化学习的限制。
  • results: 本研究提供了 Ethosight 的实际效果的证明,在多种复杂的应用场景中表现出色,同时也采取了全部源代码和数据集的发布,以便重新复制和进一步推动研究和商业领域的创新。
    Abstract Traditional computer vision models often necessitate extensive data acquisition, annotation, and validation. These models frequently struggle in real-world applications, resulting in high false positive and negative rates, and exhibit poor adaptability to new scenarios, often requiring costly retraining. To address these issues, we present Ethosight, a flexible and adaptable zero-shot video analytics system. Ethosight begins from a clean slate based on user-defined video analytics, specified through natural language or keywords, and leverages joint embedding models and reasoning mechanisms informed by ontologies such as WordNet and ConceptNet. Ethosight operates effectively on low-cost edge devices and supports enhanced runtime adaptation, thereby offering a new approach to continuous learning without catastrophic forgetting. We provide empirical validation of Ethosight's promising effectiveness across diverse and complex use cases, while highlighting areas for further improvement. A significant contribution of this work is the release of all source code and datasets to enable full reproducibility and to foster further innovation in both the research and commercial domains.
    摘要 传统的计算机视觉模型经常需要大量的数据收集、注释和验证,这些模型经常在实际应用中遇到高的假阳性和假阴性率,并且具有贫富新场景适应性,需要高成本的重新训练。为解决这些问题,我们介绍了Ethosight,一个灵活和适应的零shot视频分析系统。Ethosight从用户定义的视频分析开始,通过自然语言或关键词指定,并利用联合嵌入模型和基于 ontology 的理解机制,例如 WordNet 和 ConceptNet。Ethosight在低成本的边缘设备上运行,支持增强的运行时适应,因此提供了一种新的连续学习方法,不会导致恰等忘记。我们提供了多种用例的实验 validate Ethosight 的承诺效果,同时还提出了进一步改进的方向。这项工作的一个重要贡献是发布所有源代码和数据集,以便完全重现和促进研究和商业领域的进一步创新。

Boosting Federated Learning Convergence with Prototype Regularization

  • paper_url: http://arxiv.org/abs/2307.10575
  • repo_url: None
  • paper_authors: Yu Qiao, Huy Q. Le, Choong Seon Hong
  • for: This paper aims to mitigate the effect of heterogeneous data distributions across clients in federated learning (FL) and thereby improve model performance.
  • methods: It proposes a prototype-based regularization strategy: the server aggregates local prototypes from the distributed clients into a global prototype, which is sent back to guide each client's local training; a minimal sketch of this aggregation and regularization step follows this entry.
  • results: On MNIST and Fashion-MNIST the proposal improves average test accuracy by 3.3% and 8.9% over the popular FedAvg baseline and converges faster in heterogeneous settings.
    Abstract As a distributed machine learning technique, federated learning (FL) requires clients to collaboratively train a shared model with an edge server without leaking their local data. However, the heterogeneous data distribution among clients often leads to a decrease in model performance. To tackle this issue, this paper introduces a prototype-based regularization strategy to address the heterogeneity in the data distribution. Specifically, the regularization process involves the server aggregating local prototypes from distributed clients to generate a global prototype, which is then sent back to the individual clients to guide their local training. The experimental results on MNIST and Fashion-MNIST show that our proposal achieves improvements of 3.3% and 8.9% in average test accuracy, respectively, compared to the most popular baseline FedAvg. Furthermore, our approach has a fast convergence rate in heterogeneous settings.
    摘要 为了应对分布式机器学习技术中的客户端数据不均衡问题,本文提出了一种原型基于准则化策略。具体来说,这种准则化策略包括客户端分布式的本地原型被服务器聚合成global原型,然后将global原型返回给每个客户端,以帮助每个客户端本地进行训练。我们在MNIST和Fashion-MNIST上进行了实验,结果显示,我们的方案可以与最流行的基准FedAvg相比,提高测试准确率3.3%和8.9%。此外,我们的方法在不均衡情况下具有快速的收敛速率。
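
A minimal NumPy sketch of the prototype aggregation and regularization loop described above follows. The array shapes, regularization weight, and random client data are assumptions for illustration, not the paper's configuration.

```python
# Prototype regularization sketch: clients compute per-class feature prototypes,
# the server averages them, and clients add a distance-to-global-prototype loss term.
import numpy as np

def local_prototypes(features, labels, num_classes):
    """Per-class mean feature vector on one client (classes absent locally are skipped)."""
    protos = {}
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

def aggregate(client_protos, num_classes):
    """Server-side averaging of the class prototypes reported by the clients."""
    global_protos = {}
    for c in range(num_classes):
        vecs = [p[c] for p in client_protos if c in p]
        if vecs:
            global_protos[c] = np.mean(vecs, axis=0)
    return global_protos

def prototype_regularizer(features, labels, global_protos, weight=0.1):
    """Extra loss term pulling local features toward the matching global prototype."""
    dists = [np.linalg.norm(f - global_protos[y]) ** 2
             for f, y in zip(features, labels) if y in global_protos]
    return weight * float(np.mean(dists)) if dists else 0.0

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 8)), rng.integers(0, 3, size=20)) for _ in range(4)]
protos = [local_prototypes(f, l, 3) for f, l in clients]
g = aggregate(protos, 3)
print(prototype_regularizer(*clients[0], g))
```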

Adaptive Control of Resource Flow to Optimize Construction Work and Cash Flow via Online Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.10574
  • repo_url: None
  • paper_authors: Can Jiang, Xin Li, Jia-Rui Lin, Ming Liu, Zhiliang Ma
  • for: 本研究旨在提出一种模型和方法,以优化建筑项目中工作和资金流的控制,以适应复杂的建筑项目环境下的不确定性和多样性。
  • methods: 本研究使用了深度强化学习(DRL)技术,实现了继续适应控制劳务和材料流动,从而优化工作和资金流。同时,为了有效地训练DRL,还开发了基于分割事件 simulations的模拟器。
  • results: 实验结果表明,我们的方法在多种项目和外部环境下表现出色,并且在不同的项目和环境下具有remarkable的可靠性和灵活性。此外,杂合DRL和经验法则的代理人得到了最佳结果。本研究的成果可能对建筑项目管理中的适应控制和优化做出重要贡献。
    Abstract Due to complexity and dynamics of construction work, resource, and cash flows, poor management of them usually leads to time and cost overruns, bankruptcy, even project failure. Existing approaches in construction failed to achieve optimal control of resource flow in a dynamic environment with uncertainty. Therefore, this paper introducess a model and method to adaptive control the resource flows to optimize the work and cash flows of construction projects. First, a mathematical model based on a partially observable Markov decision process is established to formulate the complex interactions of construction work, resource, and cash flows as well as uncertainty and variability of diverse influence factors. Meanwhile, to efficiently find the optimal solutions, a deep reinforcement learning (DRL) based method is introduced to realize the continuous adaptive optimal control of labor and material flows, thereby optimizing the work and cash flows. To assist the training process of DRL, a simulator based on discrete event simulation is also developed to mimic the dynamic features and external environments of a project. Experiments in simulated scenarios illustrate that our method outperforms the vanilla empirical method and genetic algorithm, possesses remarkable capability in diverse projects and external environments, and a hybrid agent of DRL and empirical method leads to the best result. This paper contributes to adaptive control and optimization of coupled work, resource, and cash flows, and may serve as a step stone for adopting DRL technology in construction project management.
    摘要 First, a mathematical model based on a partially observable Markov decision process is established to formulate the complex interactions of construction work, resources, and cash flows, as well as uncertainty and variability of diverse influence factors. To efficiently find the optimal solutions, a deep reinforcement learning (DRL) based method is introduced to realize continuous adaptive optimal control of labor and material flows, thereby optimizing the work and cash flows.To assist the training process of DRL, a simulator based on discrete event simulation is also developed to mimic the dynamic features and external environments of a project. Experiments in simulated scenarios show that our method outperforms the vanilla empirical method and genetic algorithm, possesses remarkable capability in diverse projects and external environments, and a hybrid agent of DRL and empirical method leads to the best result. This paper contributes to adaptive control and optimization of coupled work, resources, and cash flows, and may serve as a stepping stone for adopting DRL technology in construction project management.

Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting

  • paper_url: http://arxiv.org/abs/2307.10573
  • repo_url: None
  • paper_authors: Rylan Schaeffer, Kateryna Pistunova, Samar Khanna, Sarthak Consul, Sanmi Koyejo
  • for: 这篇论文旨在探讨语义模型如何通过逻辑无效的Chain-of-Thought(CoT)提问方法来提高性能。
  • methods: 研究人员使用了无效CoT提问方法和编辑CoT提问方法来测试语义模型的性能。
  • results: 研究人员发现,无效CoT提问方法可以提高语义模型的性能,并且这种提高效果与有效提问方法相当。此外,研究人员还发现了一些先前的CoT提问方法中的逻辑错误。这表示,以外于逻辑正确的因素也可能导致性能提高。
    Abstract Language models can be prompted to reason through problems in a manner that significantly improves performance. However, why such prompting improves performance is unclear. Recent work showed that using logically invalid Chain-of-Thought (CoT) prompting improves performance almost as much as logically valid CoT prompting, and that editing CoT prompts to replace problem-specific information with abstract information or out-of-distribution information typically doesn't harm performance. Critics have responded that these findings are based on too few and too easily solved tasks to draw meaningful conclusions. To resolve this dispute, we test whether logically invalid CoT prompts offer the same level of performance gains as logically valid prompts on the hardest tasks in the BIG-Bench benchmark, termed BIG-Bench Hard (BBH). We find that the logically invalid reasoning prompts do indeed achieve similar performance gains on BBH tasks as logically valid reasoning prompts. We also discover that some CoT prompts used by previous works contain logical errors. This suggests that covariates beyond logically valid reasoning are responsible for performance improvements.
    摘要 通过提示语言模型对问题进行推理，可以显著提升其性能，但这种提示为何有效尚不清楚。近期工作表明，使用逻辑上无效的思维链(CoT)提示所带来的性能提升几乎与逻辑上有效的CoT提示相当，而且将CoT提示中与问题相关的信息替换为抽象信息或分布外信息通常也不会损害性能。批评者认为，这些发现所依据的任务数量太少且过于简单，难以得出有意义的结论。为解决这一争议，我们在BIG-Bench基准中最难的任务集BIG-Bench Hard (BBH)上检验逻辑上无效的CoT提示是否能带来与逻辑上有效提示相同的性能提升。我们发现二者在BBH任务上的性能提升确实相近；我们还发现先前工作使用的一些CoT提示本身含有逻辑错误。这表明，性能提升的原因不仅仅在于逻辑上有效的推理。

Deceptive Alignment Monitoring

  • paper_url: http://arxiv.org/abs/2307.10569
  • repo_url: None
  • paper_authors: Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo
  • for: Monitoring and mitigating the threat of deceptive alignment in large machine learning models.
  • methods: A survey of emerging directions across diverse machine learning subfields relevant to deceptive alignment monitoring.
  • results: The paper identifies new research opportunities and long-term challenges, and calls for greater involvement of the adversarial machine learning community.
    Abstract As the capabilities of large machine learning models continue to grow, and as the autonomy afforded to such models continues to expand, the spectre of a new adversary looms: the models themselves. The threat that a model might behave in a seemingly reasonable manner, while secretly and subtly modifying its behavior for ulterior reasons is often referred to as deceptive alignment in the AI Safety & Alignment communities. Consequently, we call this new direction Deceptive Alignment Monitoring. In this work, we identify emerging directions in diverse machine learning subfields that we believe will become increasingly important and intertwined in the near future for deceptive alignment monitoring, and we argue that advances in these fields present both long-term challenges and new research opportunities. We conclude by advocating for greater involvement by the adversarial machine learning community in these emerging directions.
    摘要 随着大型机器学习模型能力的不断增强以及其自主性的不断扩大，一种新的对手正在显现：模型本身。模型在表面上行为合理、却出于隐秘动机暗中微妙地改变其行为的威胁，在AI安全与对齐社区中被称为欺骗性对齐(deceptive alignment)，因此我们将这一新方向称为欺骗性对齐监测。在这项工作中，我们指出了多个机器学习子领域中正在出现、且将在不久的将来对欺骗性对齐监测日益重要并相互交织的方向，并论证这些领域的进展既带来长期挑战也带来新的研究机会。最后，我们呼吁对抗机器学习社区更多地参与这些新兴方向。

FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation

  • paper_url: http://arxiv.org/abs/2307.10563
  • repo_url: None
  • paper_authors: Dhruv Pai, Andres Carranza, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo
  • for: This paper aims to advance the understanding and mitigation of adversarial attacks on deep neural networks through unsupervised mechanistic anomaly detection.
  • methods: It proposes FACADE, a probabilistic and geometric framework that generates probabilistic distributions over circuits, yielding insight into how circuits contribute to changes in the manifold properties of pseudo-classes (high-dimensional modes in activation space).
  • results: The approach seeks to improve model robustness and scalable model oversight, and shows promising applications in real-world deployment settings.
    Abstract We present FACADE, a novel probabilistic and geometric framework designed for unsupervised mechanistic anomaly detection in deep neural networks. Its primary goal is advancing the understanding and mitigation of adversarial attacks. FACADE aims to generate probabilistic distributions over circuits, which provide critical insights to their contribution to changes in the manifold properties of pseudo-classes, or high-dimensional modes in activation space, yielding a powerful tool for uncovering and combating adversarial attacks. Our approach seeks to improve model robustness, enhance scalable model oversight, and demonstrates promising applications in real-world deployment settings.
    摘要 我们介绍FACADE,一个新的机会概率和几何框架,用于无超级机器学习模型中的机会性异常检测。它的主要目标是提高防火墙攻击的理解和缓和。FACADE通过生成逻辑统计分布,以提供异常检测中几何特性的关键洞察,从而实现高效地探测和抵御防火墙攻击。我们的方法可以提高模型的抗性、增强可扩展的模型监控,并且在实际应用中展现了有前途的应用。

Air Traffic Controller Workload Level Prediction using Conformalized Dynamical Graph Learning

  • paper_url: http://arxiv.org/abs/2307.10559
  • repo_url: https://github.com/ymlasu/para-atm-collection
  • paper_authors: Yutian Pang, Jueming Hu, Christopher S. Lieber, Nancy J. Cooke, Yongming Liu
  • for: This paper predicts the workload of air traffic controllers (ATCos) so that overload can be avoided while maintaining operational safety and efficient airspace usage.
  • methods: It applies graph-based deep learning to air traffic data with human-rated workload labels and uses conformal prediction to turn point predictions into ranges of workload levels; a toy conformal-prediction step follows this entry.
  • results: Besides traffic density, traffic conflict features (minimum horizontal/vertical separation distance) contribute to workload prediction; learning directly from the spatiotemporal graph layout of the airspace with a graph neural network achieves higher accuracy than hand-crafted traffic complexity features, and conformal prediction further boosts accuracy by producing a range of predicted workload labels.
    Abstract Air traffic control (ATC) is a safety-critical service system that demands constant attention from ground air traffic controllers (ATCos) to maintain daily aviation operations. The workload of the ATCos can have negative effects on operational safety and airspace usage. To avoid overloading and ensure an acceptable workload level for the ATCos, it is important to predict the ATCos' workload accurately for mitigation actions. In this paper, we first perform a review of research on ATCo workload, mostly from the air traffic perspective. Then, we briefly introduce the setup of the human-in-the-loop (HITL) simulations with retired ATCos, where the air traffic data and workload labels are obtained. The simulations are conducted under three Phoenix approach scenarios while the human ATCos are requested to self-evaluate their workload ratings (i.e., low-1 to high-7). Preliminary data analysis is conducted. Next, we propose a graph-based deep-learning framework with conformal prediction to identify the ATCo workload levels. The number of aircraft under the controller's control varies both spatially and temporally, resulting in dynamically evolving graphs. The experiment results suggest that (a) besides the traffic density feature, the traffic conflict feature contributes to the workload prediction capabilities (i.e., minimum horizontal/vertical separation distance); (b) directly learning from the spatiotemporal graph layout of airspace with graph neural network can achieve higher prediction accuracy, compare to hand-crafted traffic complexity features; (c) conformal prediction is a valuable tool to further boost model prediction accuracy, resulting a range of predicted workload labels. The code used is available at \href{https://github.com/ymlasu/para-atm-collection/blob/master/air-traffic-prediction/ATC-Workload-Prediction/}{$\mathsf{Link}$}.
    摘要 空交通控制(ATC)是一个安全关键的服务系统,需要地面空交通控制员(ATCo)不断注意力,以维护每天的航空业务。ATCo的工作负担可能会对运行安全和空间使用产生负面影响。为了避免过载和确保ATCo的工作负担水平为可接受,需要准确预测ATCo的工作负担。在这篇论文中,我们首先进行了关于ATCo工作负担的研究,主要从空交通的角度进行评估。然后,我们简要介绍了使用退休ATCo进行人类在Loop(HITL) simulations的设置,其中获取了空交通数据和工作负担标签。在三个 Phoenixtapproach 场景下,人类ATCo被请求进行自我评估工作负担水平(从低1到高7)。我们对数据进行了初步分析。然后,我们提出了基于图的深度学习框架,并使用协形预测来预测ATCo工作负担水平。由于空交通中的飞机数量在控制员的控制范围内变化 both spatially和 temporally, resulting in dynamically evolving graphs。实验结果表明,(a) besides traffic density feature, traffic conflict feature也对工作负担预测做出了贡献(i.e., minimum horizontal/vertical separation distance);(b) directly learning from spatiotemporal graph layout of airspace with graph neural network can achieve higher prediction accuracy, compare to hand-crafted traffic complexity features;(c) conformal prediction is a valuable tool to further boost model prediction accuracy, resulting a range of predicted workload labels。代码使用 $\mathsf{Link}$。
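
The following toy split-conformal step shows how point predictions over the seven workload levels could be turned into calibrated prediction sets. The calibration scores are random stand-ins for the graph model's class probabilities, and the paper's exact conformal procedure may differ.

```python
# Toy split-conformal prediction sets for ordinal workload labels (1-7).
import numpy as np

rng = np.random.default_rng(0)
num_classes, n_cal = 7, 200

cal_probs = rng.dirichlet(np.ones(num_classes), size=n_cal)   # calibration softmax outputs
cal_labels = rng.integers(0, num_classes, size=n_cal)         # calibration workload ratings

# Nonconformity score: 1 - probability assigned to the true label.
scores = 1.0 - cal_probs[np.arange(n_cal), cal_labels]
alpha = 0.1
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(scores, q_level, method="higher")

def prediction_set(probs):
    """All labels whose nonconformity score stays below the calibrated threshold."""
    return [c + 1 for c in range(num_classes) if 1.0 - probs[c] <= qhat]  # reported as 1..7

print(prediction_set(rng.dirichlet(np.ones(num_classes))))
```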

EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization

  • paper_url: http://arxiv.org/abs/2307.10554
  • repo_url: https://github.com/lilujunai/emq-series
  • paper_authors: Peijie Dong, Lujun Li, Zimian Wei, Xin Niu, Zhiliang Tian, Hengyue Pan
  • for: 这个论文是为了提出一种自动生成混合精度量化(MQ)代理的框架,以提高MQ的准确率和效率。
  • methods: 该论文使用了一种自动搜索方法来找到最佳的MQ代理,并提出了一种多样性推动选择策略和兼容性检测协议来避免快速落后。
  • results: 实验结果表明，该自动搜索得到的MQ代理在ImageNet上取得了优于当前最先进混合精度方法的性能，并且效率显著更高。
    Abstract Mixed-Precision Quantization~(MQ) can achieve a competitive accuracy-complexity trade-off for models. Conventional training-based search methods require time-consuming candidate training to search optimized per-layer bit-width configurations in MQ. Recently, some training-free approaches have presented various MQ proxies and significantly improve search efficiency. However, the correlation between these proxies and quantization accuracy is poorly understood. To address the gap, we first build the MQ-Bench-101, which involves different bit configurations and quantization results. Then, we observe that the existing training-free proxies perform weak correlations on the MQ-Bench-101. To efficiently seek superior proxies, we develop an automatic search of proxies framework for MQ via evolving algorithms. In particular, we devise an elaborate search space involving the existing proxies and perform an evolution search to discover the best correlated MQ proxy. We proposed a diversity-prompting selection strategy and compatibility screening protocol to avoid premature convergence and improve search efficiency. In this way, our Evolving proxies for Mixed-precision Quantization~(EMQ) framework allows the auto-generation of proxies without heavy tuning and expert knowledge. Extensive experiments on ImageNet with various ResNet and MobileNet families demonstrate that our EMQ obtains superior performance than state-of-the-art mixed-precision methods at a significantly reduced cost. The code will be released.
    摘要 含杂精度量化~(MQ)可以实现模型的竞争性精度复杂度质量规则。传统的训练基本方法需要耗时的候选人训练来搜索MQ中的优化每层比特宽配置。最近,一些无需训练的方法已经提出了多种MQ代理,并显著提高了搜索效率。然而,这些代理与量化精度之间的相关性不够了解。为了解决这个差距,我们首先建立了MQ-Bench-101,它包括不同的比特配置和量化结果。然后,我们发现现有的无需训练代理在MQ-Bench-101上表现出弱相关性。为了有效寻找优秀代理,我们开发了一个自动搜索代理框架 дляMQ。在特定的搜索空间中,我们采用了现有的代理和演化算法来找到最佳相关的MQ代理。我们提出了一种多样性激发选择策略和兼容性检查协议,以避免早期 converges和提高搜索效率。因此,我们的演化代理 для混合精度量化~(EMQ)框架可以自动生成代理,无需重重的调整和专家知识。我们的实验结果表明,在ImageNet上使用不同的ResNet和MobileNet家族时,我们的EMQ可以在相对较少的成本下达到当前混合精度方法的优秀性能。代码将被发布。

PPN: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts

  • paper_url: http://arxiv.org/abs/2307.10551
  • repo_url: None
  • paper_authors: Kaiwen Wei, Jie Yao, Jingyuan Zhang, Yangyang Kang, Fubang Zhao, Yating Zhang, Changlong Sun, Xin Jin, Xin Zhang
  • for: This work targets key information extraction (KIE) of structured value semantic entities from visually rich documents with complex real-world layouts.
  • methods: It introduces CLEX, a large-scale human-annotated dataset (Complex Layout forms for key information EXtraction), and a Parallel Pointer-based Network (PPN) that can be applied in zero-shot and few-shot scenarios, leverages implicit clues between semantic entities, and extracts multiple results simultaneously and efficiently.
  • results: On CLEX, PPN outperforms existing state-of-the-art methods while offering much faster inference.
    Abstract Key Information Extraction (KIE) is a challenging multimodal task that aims to extract structured value semantic entities from visually rich documents. Although significant progress has been made, there are still two major challenges that need to be addressed. Firstly, the layout of existing datasets is relatively fixed and limited in the number of semantic entity categories, creating a significant gap between these datasets and the complex real-world scenarios. Secondly, existing methods follow a two-stage pipeline strategy, which may lead to the error propagation problem. Additionally, they are difficult to apply in situations where unseen semantic entity categories emerge. To address the first challenge, we propose a new large-scale human-annotated dataset named Complex Layout form for key information EXtraction (CLEX), which consists of 5,860 images with 1,162 semantic entity categories. To solve the second challenge, we introduce Parallel Pointer-based Network (PPN), an end-to-end model that can be applied in zero-shot and few-shot scenarios. PPN leverages the implicit clues between semantic entities to assist extracting, and its parallel extraction mechanism allows it to extract multiple results simultaneously and efficiently. Experiments on the CLEX dataset demonstrate that PPN outperforms existing state-of-the-art methods while also offering a much faster inference speed.
    摘要 《键信息提取(KIE)是一项具有挑战性的多Modal任务,旨在从视觉丰富的文档中提取结构化的值含义实体。尽管已经取得了显著的进步,但还有两个主要挑战需要解决。首先,现有的数据集的布局相对固定,数量有限,与实际世界场景相比存在差距。其次,现有的方法采用两阶段管道策略,可能会导致错误堆叠问题。此外,它们难以应用于新的Semantic实体类型出现的情况。为了解决第一个挑战,我们提出了一个新的大规模人工标注数据集 named Complex Layout form for key information EXtraction(CLEX),该数据集包含5860张图像和1162个Semantic实体类型。为了解决第二个挑战,我们引入了并行指针网络(PPN),这是一种端到端模型,可以在零 shot和几 shot情况下应用。PPN利用Semantic实体之间的隐式做法来帮助提取,并且其并行提取机制使得它可以同时提取多个结果,高效地。实验表明,PPN在CLEX数据集上的性能明显超过了现有的状态态先进方法,同时也提供了 Much faster的推理速度。》

Dynamic Large Language Models on Blockchains

  • paper_url: http://arxiv.org/abs/2307.10549
  • repo_url: None
  • paper_authors: Yuanhao Gong
  • for: 这篇论文是为了提出一种基于区块链的动态大语言模型训练和部署方法,以解决现有大语言模型训练和部署所需的高 computation performance 和静态性的问题。
  • methods: 本篇论文提出了一种基于区块链的方法,通过将大语言模型训练和部署 onto 区块链上,以获得高 computation performance 和分布式的优势。此外,本篇论文还提出了一种基于用户输入的动态训练方法,让模型可以不断地学习用户的反馈。
  • results: 本篇论文的结果显示,基于区块链的动态大语言模型可以在不同的用户输入下进行不断的学习和改善,并且可以提供更高的准确率和更好的使用者体验。此外,本篇论文的结果还显示出了基于区块链的大语言模型训练和部署的可行性和可调性。
    Abstract Training and deploying the large language models requires a large mount of computational resource because the language models contain billions of parameters and the text has thousands of tokens. Another problem is that the large language models are static. They are fixed after the training process. To tackle these issues, in this paper, we propose to train and deploy the dynamic large language model on blockchains, which have high computation performance and are distributed across a network of computers. A blockchain is a secure, decentralized, and transparent system that allows for the creation of a tamper-proof ledger for transactions without the need for intermediaries. The dynamic large language models can continuously learn from the user input after the training process. Our method provides a new way to develop the large language models and also sheds a light on the next generation artificial intelligence systems.
    摘要 训练和部署大语言模型需要巨量计算资源，因为模型包含数十亿参数，文本含有数千个token。另一个问题是大语言模型是静态的，训练结束后便固定不变。为解决这些问题，本文提议在区块链上训练和部署动态大语言模型；区块链具有高计算性能并分布在计算机网络之上，是一种安全、去中心化且透明的系统，无需中介即可建立防篡改的交易账本。动态大语言模型可以在训练之后继续从用户输入中学习。我们的方法为大语言模型的开发提供了一条新途径，也为下一代人工智能系统提供了启示。

TREA: Tree-Structure Reasoning Schema for Conversational Recommendation

  • paper_url: http://arxiv.org/abs/2307.10543
  • repo_url: https://github.com/windylee0822/trea
  • paper_authors: Wendi Li, Wei Wei, Xiaoye Qu, Xian-Ling Mao, Ye Yuan, Wenfeng Xie, Dangyang Chen
  • for: 提高对话 Context 理解,提供更有针对性的 Response。
  • methods: 利用多层次可拓展 Tree 结构来解释 causality 关系,并充分利用历史对话来生成更有针对性的 Response。
  • results: 在两个公共 CRS 数据集上进行了广泛的实验,证明了我们的方法的有效性。
    Abstract Conversational recommender systems (CRS) aim to timely trace the dynamic interests of users through dialogues and generate relevant responses for item recommendations. Recently, various external knowledge bases (especially knowledge graphs) are incorporated into CRS to enhance the understanding of conversation contexts. However, recent reasoning-based models heavily rely on simplified structures such as linear structures or fixed-hierarchical structures for causality reasoning, hence they cannot fully figure out sophisticated relationships among utterances with external knowledge. To address this, we propose a novel Tree structure Reasoning schEmA named TREA. TREA constructs a multi-hierarchical scalable tree as the reasoning structure to clarify the causal relationships between mentioned entities, and fully utilizes historical conversations to generate more reasonable and suitable responses for recommended results. Extensive experiments on two public CRS datasets have demonstrated the effectiveness of our approach.
    摘要 对话式推荐系统 (CRS) 目的是在对话中追踪用户的动态喜好,并为其提供相应的回答以进行物品推荐。现在,许多外部知识库 (特别是知识 graphs) 被 incorporated into CRS 以增强对话上下文的理解。然而,最近的推理基于模型倾向于使用简单的结构,如线性结构或固定层次结构,以进行 causality 推理,因此无法完全理解对话中的复杂关系。为解决这个问题,我们提出了一个 noval Tree structure Reasoning schEmA 名为 TREA。TREA 使用多层次可扩展的树结构来clarify 受到提及的实体之间的 causal 关系,并充分利用历史对话来产生更合理和适合的回答来进行推荐。实验结果显示了我们的方法的有效性。

The Extractive-Abstractive Axis: Measuring Content “Borrowing” in Generative Language Models

  • paper_url: http://arxiv.org/abs/2307.11779
  • repo_url: None
  • paper_authors: Nedelina Teneva
  • for: This paper proposes the extractive-abstractive axis as a new way to benchmark generative language models.
  • methods: It characterizes how abstractive model outputs are relative to their sources and argues for developing corresponding metrics, datasets, and annotation guidelines, limiting the discussion to the text modality; one plausible extractiveness metric is sketched after this entry.
  • results: The paper motivates measuring content "borrowing" along this axis, with implications for content licensing and attribution.
    Abstract Generative language models produce highly abstractive outputs by design, in contrast to extractive responses in search engines. Given this characteristic of LLMs and the resulting implications for content Licensing & Attribution, we propose the the so-called Extractive-Abstractive axis for benchmarking generative models and highlight the need for developing corresponding metrics, datasets and annotation guidelines. We limit our discussion to the text modality.
    摘要 生成式语言模型在设计上会产生高度抽象化的输出，这与搜索引擎的抽取式响应不同。鉴于大语言模型的这一特性及其对内容授权与署名的影响，我们提出了所谓的抽取-抽象轴，用于对生成模型进行基准评估，并强调需要开发相应的指标、数据集和标注指南。我们的讨论仅限于文本模态。
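
As one plausible way to operationalize the extractive-abstractive axis (not a formula prescribed by the paper), the snippet below scores how many of a generation's n-grams are copied verbatim from its source; higher values indicate more extractive output.

```python
# Simple "extractiveness" score: fraction of the generation's n-grams found in the source.
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def extractiveness(source: str, generation: str, n: int = 3) -> float:
    src, gen = source.lower().split(), generation.lower().split()
    gen_ngrams = ngrams(gen, n)
    if not gen_ngrams:
        return 0.0
    return len(gen_ngrams & ngrams(src, n)) / len(gen_ngrams)

source = "the committee approved the budget for the new library on tuesday"
print(extractiveness(source, "the committee approved the budget on tuesday"))     # more extractive
print(extractiveness(source, "a new library plan got the green light this week"))  # more abstractive
```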

Fast Unsupervised Deep Outlier Model Selection with Hypernetworks

  • paper_url: http://arxiv.org/abs/2307.10529
  • repo_url: None
  • paper_authors: Xueying Ding, Yue Zhao, Leman Akoglu
  • for: This paper proposes an effective way to tune the hyperparameters (HPs) of unsupervised deep outlier detection (OD) models and thereby improve their performance.
  • methods: It introduces HYPER, which trains a hypernetwork (HN) that maps HPs onto optimal weights of the main deep OD model, and uses meta-learning on historical labeled OD tasks to train a proxy validation function; a toy hypernetwork sketch follows this entry.
  • results: Across 35 OD tasks, HYPER achieves high performance against 8 baselines with significant efficiency gains.
    Abstract Outlier detection (OD) finds many applications with a rich literature of numerous techniques. Deep neural network based OD (DOD) has seen a recent surge of attention thanks to the many advances in deep learning. In this paper, we consider a critical-yet-understudied challenge with unsupervised DOD, that is, effective hyperparameter (HP) tuning/model selection. While several prior work report the sensitivity of OD models to HPs, it becomes ever so critical for the modern DOD models that exhibit a long list of HPs. We introduce HYPER for tuning DOD models, tackling two fundamental challenges: (1) validation without supervision (due to lack of labeled anomalies), and (2) efficient search of the HP/model space (due to exponential growth in the number of HPs). A key idea is to design and train a novel hypernetwork (HN) that maps HPs onto optimal weights of the main DOD model. In turn, HYPER capitalizes on a single HN that can dynamically generate weights for many DOD models (corresponding to varying HPs), which offers significant speed-up. In addition, it employs meta-learning on historical OD tasks with labels to train a proxy validation function, likewise trained with our proposed HN efficiently. Extensive experiments on 35 OD tasks show that HYPER achieves high performance against 8 baselines with significant efficiency gains.
    摘要 外异检测(OD)在许多应用中找到了广泛的应用,而且有许多技术的研究。深度神经网络基于的外异检测(DOD)在最近几年得到了广泛的关注,因为深度学习技术的进步。在这篇论文中,我们考虑了一个尚未得到充分研究的挑战:对于无监督的外异检测模型,有效地调整超参数(HP)。虽然之前的研究已经证明了外异检测模型对HP的敏感性,但是现代DOD模型的HP列表却非常长。我们提出了一种名为HYPER的方法,用于调整DOD模型。HYPER解决了两个基本挑战:无监督验证(由于缺乏异常数据)和高效地搜索HP/模型空间(由于HP的数量的增长)。我们的关键想法是设计和训练一个新的超网络(HN),将HP映射到外异检测模型的优化参数。然后,HYPER可以通过单个HN来动态生成多个DOD模型(对应于不同的HP),从而提供了显著的速度提升。此外,它还使用元学习来训练一个代理验证函数,这个函数通过我们提出的HN有效地训练。我们对35个OD任务进行了广泛的实验,结果显示HYPER可以高效地与8个基准模型进行比较,同时具有显著的效率优势。
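
The sketch below shows the core hypernetwork idea in PyTorch: a small MLP maps a hyperparameter vector onto the weights of a one-layer detector, so a new HP setting yields new weights without retraining the detector from scratch. All dimensions and the HP encoding are invented; HYPER's real architecture and training objective are described in the paper.

```python
# Minimal hypernetwork sketch: HP vector -> weights of a one-layer encoder.
import torch
import torch.nn as nn

in_dim, hidden_dim = 10, 4
target_numel = in_dim * hidden_dim + hidden_dim          # weight + bias of the encoder layer

hypernet = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, target_numel))

def generated_encoder(hp_vec, x):
    """Run the detector whose encoder weights are produced by the hypernetwork."""
    params = hypernet(hp_vec)
    w = params[: in_dim * hidden_dim].view(hidden_dim, in_dim)
    b = params[in_dim * hidden_dim:]
    return torch.relu(x @ w.t() + b)

hp = torch.tensor([0.1, 2.0, 0.5])      # hypothetical HP encoding (e.g. dropout, depth, ratio)
x = torch.randn(5, in_dim)
print(generated_encoder(hp, x).shape)   # weights change with hp, no retraining needed
```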

Building Socio-culturally Inclusive Stereotype Resources with Community Engagement

  • paper_url: http://arxiv.org/abs/2307.10514
  • repo_url: None
  • paper_authors: Sunipa Dev, Jaya Goyal, Dinesh Tewari, Shachi Dave, Vinodkumar Prabhakaran
  • For: The paper aims to address the need for more culturally sensitive and representative evaluation resources for generative language models, specifically in the Indian context.
  • Methods: The authors use a community-engaged approach to build a resource of stereotypes unique to India, which increases the number of stereotypes known for the Indian context by over 1000.
  • Results: The authors demonstrate the utility and effectiveness of the expanded resource for evaluating language models and show that it can help identify harmful stereotypes that may be overlooked by traditional evaluation methods.
    Abstract With rapid development and deployment of generative language models in global settings, there is an urgent need to also scale our measurements of harm, not just in the number and types of harms covered, but also how well they account for local cultural contexts, including marginalized identities and the social biases experienced by them. Current evaluation paradigms are limited in their abilities to address this, as they are not representative of diverse, locally situated but global, socio-cultural perspectives. It is imperative that our evaluation resources are enhanced and calibrated by including people and experiences from different cultures and societies worldwide, in order to prevent gross underestimations or skews in measurements of harm. In this work, we demonstrate a socio-culturally aware expansion of evaluation resources in the Indian societal context, specifically for the harm of stereotyping. We devise a community engaged effort to build a resource which contains stereotypes for axes of disparity that are uniquely present in India. The resultant resource increases the number of stereotypes known for and in the Indian context by over 1000 stereotypes across many unique identities. We also demonstrate the utility and effectiveness of such expanded resources for evaluations of language models. CONTENT WARNING: This paper contains examples of stereotypes that may be offensive.
    摘要 随着生成语言模型在全球范围内的快速发展和应用,有一定的急需要扩大我们对害的衡量,不仅包括各种各样的害的类型和数量,还要考虑当地文化上下文,包括弱化的标签和社会偏见。现有的评估方法有限,因为它们没有代表多样化的、全球化的社会文化观点。为了避免严重的下预估或偏见,我们需要加强和调整我们的评估资源,包括从不同文化和社会世界中的人和经验中获取知识。在这种情况下,我们展示了一种具有社会文化意识的评估资源扩展方法,特别是在印度社会上下文中,对刻板印度人的害进行了社区参与的努力。我们制定了一个含有印度独特的负担轴的 sterotypes 资源,该资源包含了在印度上下文中独特的1000多个刻板印度人。我们还证明了这种扩展资源的有用性和效果,用于评估语言模型。警告:本文可能包含有害的刻板印度人示例。

IvyGPT: InteractiVe Chinese pathwaY language model in medical domain

  • paper_url: http://arxiv.org/abs/2307.10512
  • repo_url: None
  • paper_authors: Rongsheng Wang, Yaofei Duan, ChanTong Lam, Jiexi Chen, Jiangsheng Xu, Haoming Chen, Xiaohong Liu, Patrick Cheong-Iao Pang, Tao Tan
  • for: 这个论文是为了提出一种基于 LLaMA 的大型自然语言处理模型(IvyGPT),用于医疗问答和诊断。
  • methods: 这个论文使用了高质量医疗问答(QA)实例和人工回馈学习(RLHF)来训练和精度调整 IvyGPT。
  • results: 实验结果显示,IvyGPT 已经超越了其他医疗 GPT 模型,并且可以输出更加详细的诊断和治疗答案。
    Abstract General large language models (LLMs) such as ChatGPT have shown remarkable success. However, such LLMs have not been widely adopted for medical purposes, due to poor accuracy and inability to provide medical advice. We propose IvyGPT, an LLM based on LLaMA that is trained and fine-tuned with high-quality medical question-answer (QA) instances and Reinforcement Learning from Human Feedback (RLHF). After supervised fine-tuning, IvyGPT has good multi-turn conversation capabilities, but it cannot perform like a doctor in other aspects, such as comprehensive diagnosis. Through RLHF, IvyGPT can output richer diagnosis and treatment answers that are closer to human. In the training, we used QLoRA to train 33 billion parameters on a small number of NVIDIA A100 (80GB) GPUs. Experimental results show that IvyGPT has outperformed other medical GPT models.
    摘要 通用大型语言模型(LLM)如ChatGPT已经表现出了惊人的成功。然而,这些LLM还没有广泛应用于医疗领域,主要因为它们的精度不高并无法提供医学建议。我们提出了IvyGPT,基于LLaMA的LLM,通过高质量的医学问答(QA)实例和人工智能反馈学习(RLHF)进行训练和细化。经过超vision训练,IvyGPT具有良好的多turn对话能力,但它无法像医生一样在其他方面做出全面诊断。通过RLHF,IvyGPT可以输出更加丰富的诊断和治疗答案,更加接近人类。在训练中,我们使用了QLoRA来训练330亿参数的NVIDIA A100(80GB)GPU。实验结果表明,IvyGPT已经超过了其他医学GPT模型。

Markov Decision Processes with Time-Varying Geometric Discounting

  • paper_url: http://arxiv.org/abs/2307.10491
  • repo_url: None
  • paper_authors: Jiarui Gan, Annika Hennes, Rupak Majumdar, Debmalya Mandal, Goran Radanovic
  • for: This paper studies infinite-horizon Markov decision processes (MDPs) with time-varying geometric discount factors.
  • methods: It takes a game-theoretic perspective in which each time step is treated as an independent decision maker with its own fixed discount factor, and studies the subgame perfect equilibrium (SPE) of the resulting game together with the related algorithmic problems; a toy backward-induction sketch with per-step discounts follows this entry.
  • results: The paper gives a constructive proof that an SPE exists and shows that computing one is EXPTIME-hard; it also shows that an $\epsilon$-SPE exists under milder assumptions and provides an algorithm to compute one, with a time-complexity upper bound that depends on the convergence of the time-varying discount factor.
    Abstract Canonical models of Markov decision processes (MDPs) usually consider geometric discounting based on a constant discount factor. While this standard modeling approach has led to many elegant results, some recent studies indicate the necessity of modeling time-varying discounting in certain applications. This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take a game-theoretic perspective -- whereby each time step is treated as an independent decision maker with their own (fixed) discount factor -- and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and demonstrate the EXPTIME-hardness of computing an SPE. We also turn to the approximate notion of $\epsilon$-SPE and show that an $\epsilon$-SPE exists under milder assumptions. An algorithm is presented to compute an $\epsilon$-SPE, of which an upper bound of the time complexity, as a function of the convergence property of the time-varying discount factor, is provided.
    摘要 标准的马尔可夫决策过程(MDP)模型通常使用固定的折扣因子进行减折扣。然而,一些最近的研究表明,在某些应用场景中,时变折扣是必要的。这篇论文研究了无穷 horizon MDP 中的时变折扣因子。我们从游戏观点出发,即每个时间步骤都是一个独立的决策者,每个决策者都有自己的固定折扣因子。我们研究这个游戏的子游戏完善平衡(SPE)以及相关的算法问题。我们提供了一个构造性的证明,证明了 SPE 的存在,并证明了计算 SPE 的复杂度是 EXPTIME 困难的。此外,我们还研究了 $\epsilon $-SPE 的概念,并证明了在较宽的假设下, $\epsilon $-SPE 存在。我们还提供了一个算法来计算 $\epsilon $-SPE,并给出了时间复杂度的Upper bound,具体取决于时变折扣因子的收敛性。
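
As a simplified illustration of time-varying discounting, and explicitly not the paper's SPE computation, the snippet below runs finite-horizon backward induction with a different (assumed) discount factor at each step, showing where the per-step factor enters the Bellman backup.

```python
# Finite-horizon backward induction with a per-step discount factor (toy MDP).
import numpy as np

n_states, n_actions, horizon = 3, 2, 5
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] -> next-state dist
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # immediate rewards
gammas = [0.95, 0.9, 0.8, 0.7, 0.6]                               # discount used at step t

V = np.zeros(n_states)
policy = []
for t in reversed(range(horizon)):
    Q = R + gammas[t] * P @ V          # Q[s, a] with the step-t discount factor
    policy.append(Q.argmax(axis=1))
    V = Q.max(axis=1)
policy.reverse()
print(V, policy[0])
```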

(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

  • paper_url: http://arxiv.org/abs/2307.10490
  • repo_url: https://github.com/ebagdasa/multimodal_injection
  • paper_authors: Eugene Bagdasaryan, Tsung-Yin Hsieh, Ben Nassi, Vitaly Shmatikov
  • for: This paper shows how images and sounds can be used for indirect prompt and instruction injection attacks against multi-modal large language models (LLMs).
  • methods: The attacker generates an adversarial perturbation corresponding to a prompt and blends it into an image or audio recording; when the user asks the unmodified, benign model about the perturbed input, the perturbation steers the model to output attacker-chosen text and/or follow the attacker's instructions in the subsequent dialog.
  • results: The paper demonstrates the attack with several proof-of-concept examples targeting LLaVA and PandaGPT.
    Abstract We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker's instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVa and PandaGPT.
    摘要 我们展示了图像和声音可以用来导入多 modal LLM 中的间接提示和指令注入攻击。攻击者创建了这些提示的恶意变化,然后把它与图像或音频录音混合在一起。当用户对未修改的、良好的模型询问这些图像或音频时,变化将引导模型发出攻击者选择的文本和/或使之在对话中按照攻击者的指令继续。我们透过几个证明例子,显示了这种攻击可以对 LLaVa 和 PandaGPT 进行。

Backdoor Attack against Object Detection with Clean Annotation

  • paper_url: http://arxiv.org/abs/2307.10487
  • repo_url: None
  • paper_authors: Yize Cheng, Wenbin Hu, Minhao Cheng
  • for: This work studies backdoor attacks against deep neural networks for object detection, a setting that has not been properly investigated despite its security relevance.
  • methods: It proposes a backdoor attack on object detection that does not modify the ground-truth annotations, covering an object disappearance attack and an object generation attack.
  • results: On the PASCAL VOC07+12 and MSCOCO object detection datasets, the attack achieves a success rate above 92% with a poison rate of only 5%.
    Abstract Deep neural networks (DNNs) have shown unprecedented success in object detection tasks. However, it was also discovered that DNNs are vulnerable to multiple kinds of attacks, including Backdoor Attacks. Through the attack, the attacker manages to embed a hidden backdoor into the DNN such that the model behaves normally on benign data samples, but makes attacker-specified judgments given the occurrence of a predefined trigger. Although numerous backdoor attacks have been experimented on image classification, backdoor attacks on object detection tasks have not been properly investigated and explored. As object detection has been adopted as an important module in multiple security-sensitive applications such as autonomous driving, backdoor attacks on object detection could pose even more severe threats. Inspired by the inherent property of deep learning-based object detectors, we propose a simple yet effective backdoor attack method against object detection without modifying the ground truth annotations, specifically focusing on the object disappearance attack and object generation attack. Extensive experiments and ablation studies prove the effectiveness of our attack on two benchmark object detection datasets, PASCAL VOC07+12 and MSCOCO, on which we achieve an attack success rate of more than 92% with a poison rate of only 5%.
    摘要 深度神经网络(DNN)在目标检测任务上取得了前所未有的成功。然而,研究也发现 DNN 容易受到多种攻击,其中包括后门攻击。攻击者通过后门攻击在 DNN 中植入隐藏后门,使模型在正常样本上表现正常,但在预设触发器出现时做出攻击者指定的判断。尽管已有大量针对图像分类的后门攻击研究,针对目标检测任务的后门攻击尚未得到充分探索。由于目标检测已被广泛用于自动驾驶等多种安全敏感应用,针对目标检测的后门攻击可能带来更严重的威胁。受基于深度学习的目标检测器固有特性的启发,我们提出了一种简单而有效、且无需修改真实标注的后门攻击方法,重点研究目标消失攻击与目标生成攻击。在 PASCAL VOC07+12 和 MSCOCO 两个基准目标检测数据集上的大量实验与消融研究表明,该攻击在仅 5% 投毒率下即可取得超过 92% 的攻击成功率。

Can Instruction Fine-Tuned Language Models Identify Social Bias through Prompting?

  • paper_url: http://arxiv.org/abs/2307.10472
  • repo_url: None
  • paper_authors: Omkar Dige, Jacob-Junqi Tian, David Emerson, Faiza Khan Khattak
  • for: 这篇论文旨在评估指令微调(instruction fine-tuned)语言模型识别社会偏见的能力。
  • methods: 论文采用零样本(zero-shot)提示,包括思维链(CoT)提示,来评估语言模型的偏见识别能力。
  • results: 研究发现,Alpaca 7B 模型在偏见识别任务上取得了56.7%的准确率,而扩大模型规模与数据多样性有望带来进一步的性能提升。
    Abstract As the breadth and depth of language model applications continue to expand rapidly, it is increasingly important to build efficient frameworks for measuring and mitigating the learned or inherited social biases of these models. In this paper, we present our work on evaluating instruction fine-tuned language models' ability to identify bias through zero-shot prompting, including Chain-of-Thought (CoT) prompts. Across LLaMA and its two instruction fine-tuned versions, Alpaca 7B performs best on the bias identification task with an accuracy of 56.7%. We also demonstrate that scaling up LLM size and data diversity could lead to further performance gain. This is a work-in-progress presenting the first component of our bias mitigation framework. We will keep updating this work as we get more results.
    摘要 随着语言模型应用的广度和深度迅速扩展,建立高效的框架来衡量并缓解这些模型学习到或继承的社会偏见变得日益重要。在这篇论文中,我们评估了指令微调语言模型通过零样本提示(包括思维链 CoT 提示)识别偏见的能力。在 LLaMA 及其两个指令微调版本中,Alpaca 7B 在偏见识别任务上表现最佳,准确率为56.7%。我们还表明,扩大模型规模和数据多样性有望带来进一步的性能提升。这是一项进行中的工作,呈现了我们偏见缓解框架的第一个组成部分,我们将在获得更多结果后持续更新。
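    As a rough illustration of the prompting setup described above, the snippet below builds a zero-shot prompt and a Chain-of-Thought-style prompt for a bias-identification query. The template wording and the example sentence are invented for illustration and are not the paper's actual prompts.

    ```python
    def zero_shot_prompt(statement: str) -> str:
        # Plain zero-shot framing: ask directly for a yes/no judgment.
        return (
            "Does the following statement express or rely on a social bias? "
            "Answer 'yes' or 'no'.\n"
            f"Statement: {statement}\nAnswer:"
        )

    def cot_prompt(statement: str) -> str:
        # Chain-of-Thought framing: ask the model to reason step by step first.
        return (
            "Does the following statement express or rely on a social bias?\n"
            f"Statement: {statement}\n"
            "Let's think step by step about who is being described and what is assumed "
            "about them, then answer 'yes' or 'no'."
        )

    example = "New hires from that region are never punctual."
    print(zero_shot_prompt(example))
    print(cot_prompt(example))
    ```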

Classification of Visualization Types and Perspectives in Patents

  • paper_url: http://arxiv.org/abs/2307.10471
  • repo_url: https://github.com/tibhannover/patentimageclassification
  • paper_authors: Junaid Ahmed Ghauri, Eric Müller-Budack, Ralph Ewerth
  • for: 这篇论文的目的是提高专利搜寻和检索的效率,使用最新的深度学习方法来分类专利图像中的不同类型和角度。
  • methods: 本论文使用了当前最先进的深度学习方法,包括 Transformer 模型,来对专利图像中的可视化类型和视角进行分类。
  • results: 实验结果显示了提案的方法的可行性,并且提供了可用于实际应用的代码、模型和数据集。
    Abstract Due to the swift growth of patent applications each year, information and multimedia retrieval approaches that facilitate patent exploration and retrieval are of utmost importance. Different types of visualizations (e.g., graphs, technical drawings) and perspectives (e.g., side view, perspective) are used to visualize details of innovations in patents. The classification of these images enables a more efficient search and allows for further analysis. So far, datasets for image type classification miss some important visualization types for patents. Furthermore, related work does not make use of recent deep learning approaches including transformers. In this paper, we adopt state-of-the-art deep learning methods for the classification of visualization types and perspectives in patent images. We extend the CLEF-IP dataset for image type classification in patents to ten classes and provide manual ground truth annotations. In addition, we derive a set of hierarchical classes from a dataset that provides weakly-labeled data for image perspectives. Experimental results have demonstrated the feasibility of the proposed approaches. Source code, models, and dataset will be made publicly available.
    摘要 由于专利申请数量每年迅速增长,能够辅助专利探索与检索的信息与多媒体检索方法变得尤为重要。专利中使用不同类型的可视化(如图表、技术图纸)和不同视角(如侧视图、透视图)来展示发明细节,对这些图像进行分类可以提高检索效率并支持进一步分析。现有的图像类型分类数据集缺少专利中一些重要的可视化类型,且相关工作尚未利用包括 Transformer 在内的最新深度学习方法。本文采用最先进的深度学习方法对专利图像中的可视化类型和视角进行分类:我们将 CLEF-IP 专利图像类型分类数据集扩展到十个类别,并提供人工标注的真实标签;此外,我们从一个提供弱标注视角数据的数据集中导出了一组层次化类别。实验结果证明了所提方法的可行性。源代码、模型和数据集将公开发布。

A data science axiology: the nature, value, and risks of data science

  • paper_url: http://arxiv.org/abs/2307.10460
  • repo_url: None
  • paper_authors: Michael L. Brodie
  • for: 这篇论文旨在探讨数据科学的价值论(axiology),即其目的、性质、重要性、风险与价值,以帮助理解和定义数据科学,并识别其潜在的益处、风险和开放的研究挑战。
  • methods: 本论文采用价值论的视角,通过考察和评估数据科学卓越且具有决定性的特征,来分析其价值与风险。
  • results: 本论文认为,数据科学尚处于初期阶段,这一初步的价值论有助于我们更好地理解和定义它,并识别其潜在的益处、风险和研究挑战。
    Abstract Data science is not a science. It is a research paradigm with an unfathomed scope, scale, complexity, and power for knowledge discovery that is not otherwise possible and can be beyond human reasoning. It is changing our world practically and profoundly already widely deployed in tens of thousands of applications in every discipline in an AI Arms Race that, due to its inscrutability, can lead to unfathomed risks. This paper presents an axiology of data science, its purpose, nature, importance, risks, and value for problem solving, by exploring and evaluating its remarkable, definitive features. As data science is in its infancy, this initial, speculative axiology is intended to aid in understanding and defining data science to recognize its potential benefits, risks, and open research challenges. AI based data science is inherently about uncertainty that may be more realistic than our preference for the certainty of science. Data science will have impacts far beyond knowledge discovery and will take us into new ways of understanding the world.
    摘要 数据科学并不是一门科学,而是一种研究范式,其范围、规模、复杂性和知识发现能力深不可测,能够实现其他方式无法达成、甚至超出人类推理的发现。它已经在各个学科的数以万计的应用中被广泛部署,正在切实而深刻地改变我们的世界;在这场人工智能竞赛中,由于其难以被审视,可能带来难以预料的风险。本文通过考察和评估数据科学卓越且具有决定性的特征,提出其价值论(axiology),即其目的、性质、重要性、风险以及解决问题的价值。鉴于数据科学尚处于初期阶段,这一初步的、推测性的价值论旨在帮助理解和定义数据科学,识别其潜在的益处、风险和开放的研究挑战。基于人工智能的数据科学本质上关乎不确定性,这也许比我们偏好的科学确定性更贴近现实。数据科学的影响将远超知识发现本身,并引领我们以新的方式理解世界。

A New Computationally Simple Approach for Implementing Neural Networks with Output Hard Constraints

  • paper_url: http://arxiv.org/abs/2307.10459
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Andrei V. Konstantinov, Lev V. Utkin
  • for: 这篇论文的目的是提出一种新的计算简单的神经网络输出值约束方法。
  • methods: 该方法的关键思想是将网络隐藏参数向量映射到一个必定位于由约束定义的可行集内的点上。该映射通过一个带约束输出的额外神经网络层实现。
  • results: 该方法可以方便地扩展到不仅约束输出向量、还包含依赖输入的联合约束的情形,并且可以在该框架内简单地实现投影式约束。方法计算开销低,线性约束和二次约束下前向传播的复杂度分别为O(n*m)和O(n^2*m)。论文通过求解优化和分类问题的数值实验对方法进行了验证。
    Abstract A new computationally simple method of imposing hard convex constraints on the neural network output values is proposed. The key idea behind the method is to map a vector of hidden parameters of the network to a point that is guaranteed to be inside the feasible set defined by a set of constraints. The mapping is implemented by the additional neural network layer with constraints for output. The proposed method is simply extended to the case when constraints are imposed not only on the output vectors, but also on joint constraints depending on inputs. The projection approach to imposing constraints on outputs can simply be implemented in the framework of the proposed method. It is shown how to incorporate different types of constraints into the proposed method, including linear and quadratic constraints, equality constraints, and dynamic constraints, constraints in the form of boundaries. An important feature of the method is its computational simplicity. Complexities of the forward pass of the proposed neural network layer by linear and quadratic constraints are O(n*m) and O(n^2*m), respectively, where n is the number of variables, m is the number of constraints. Numerical experiments illustrate the method by solving optimization and classification problems. The code implementing the method is publicly available.
    摘要 本文提出了一种新的、计算简单的方法,用于对神经网络输出值施加硬性凸约束。其关键思想是将网络隐藏参数向量映射到一个必定位于由约束定义的可行集内的点上,该映射通过一个带约束输出的额外神经网络层实现。该方法可以方便地扩展到不仅约束输出向量、还包含依赖输入的联合约束的情形,投影式约束也可以在该框架内简单实现。文中展示了如何将不同类型的约束纳入该方法,包括线性约束、二次约束、等式约束、动态约束以及边界形式的约束。该方法的一个重要特点是计算简单:在线性约束和二次约束下,所提神经网络层前向传播的复杂度分别为O(n*m)和O(n^2*m),其中n为变量数,m为约束数。数值实验通过求解优化与分类问题对方法进行了验证,实现代码已公开。
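    The following is a minimal sketch of the core idea of mapping a hidden-parameter vector to a point that is guaranteed to satisfy output constraints, shown only for the simple special cases of box constraints and a probability simplex. It is not the paper's construction for general convex constraint sets, just an illustration of "feasibility by construction" via an extra output layer.

    ```python
    import torch
    import torch.nn as nn

    class BoxConstrainedHead(nn.Module):
        """Maps an unconstrained hidden vector into the box [low, high] element-wise."""
        def __init__(self, in_dim, out_dim, low, high):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim)
            self.low, self.high = low, high

        def forward(self, h):
            z = torch.sigmoid(self.linear(h))                # values in (0, 1)
            return self.low + (self.high - self.low) * z     # guaranteed inside the box

    class SimplexConstrainedHead(nn.Module):
        """Maps a hidden vector onto the probability simplex (non-negative, sums to 1)."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim)

        def forward(self, h):
            return torch.softmax(self.linear(h), dim=-1)     # feasible by construction

    h = torch.randn(4, 16)
    print(BoxConstrainedHead(16, 3, low=0.0, high=5.0)(h))
    print(SimplexConstrainedHead(16, 3)(h).sum(dim=-1))      # each row sums to 1
    ```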

Complying with the EU AI Act

  • paper_url: http://arxiv.org/abs/2307.10458
  • repo_url: None
  • paper_authors: Jacintha Walters, Diptish Dey, Debarati Bhaumik, Sophie Horsman
  • for: 本研究旨在描述欧盟人工智能法案(AI Act)中不同类别的实施情况,并通过问卷调查获取量化数据,以提供有关组织实施AI Act的启示。
  • methods: 本研究使用问卷调查方法来收集数据,并分析数据以挖掘不同类别组织面临的挑战,以及这些挑战如何与组织特点相关。
  • results: 研究发现,不同类别组织面临的挑战有所不同,而大型和特定领域的组织面临更大的挑战。此外,问卷调查还显示了各个问题的占比,包括AI Act的内容和应用方面。
    Abstract The EU AI Act is the proposed EU legislation concerning AI systems. This paper identifies several categories of the AI Act. Based on this categorization, a questionnaire is developed that serves as a tool to offer insights by creating quantitative data. Analysis of the data shows various challenges for organizations in different compliance categories. The influence of organization characteristics, such as size and sector, is examined to determine the impact on compliance. The paper will also share qualitative data on which questions were prevalent among respondents, both on the content of the AI Act as the application. The paper concludes by stating that there is still room for improvement in terms of compliance with the AIA and refers to a related project that examines a solution to help these organizations.
    摘要 欧盟人工智能法案(AI Act)是欧盟针对人工智能系统提出的立法提案。本文对 AI Act 划分出若干类别,并在此基础上设计问卷,作为获取定量数据、提供洞见的工具。数据分析显示,处于不同合规类别的组织面临着各不相同的挑战;我们还考察了组织规模、所属行业等特征对合规情况的影响。此外,本文给出了受访者最常提出的问题的定性数据,既涉及 AI Act 的内容,也涉及其应用。文章最后指出,在 AI Act 合规方面仍有改进空间,并提及一个旨在帮助这些组织的相关项目。

A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect Dataset

  • paper_url: http://arxiv.org/abs/2307.10455
  • repo_url: https://github.com/zahrag/BIOSCAN-1M
  • paper_authors: Zahra Gharaee, ZeMing Gong, Nicholas Pellegrino, Iuliia Zarubiieva, Joakim Bruslund Haurum, Scott C. Lowe, Jaclyn T. A. McKeown, Chris C. Y. Ho, Joschka McLeod, Yi-Yun C Wei, Jireh Agda, Sujeevan Ratnasingham, Dirk Steinke, Angel X. Chang, Graham W. Taylor, Paul Fieguth
  • for: 本研究旨在开发一个基于图像的生物多样性调查方法,以探索全球生物多样性的详细结构。
  • methods: 该研究使用了大量手动标注的昆虫图像集,以及相关的遗传信息,包括raw nucleotide barcode sequences和归类指标。
  • results: 研究人员通过实现和分析一种基线分类器来介绍图像基于的生物分类问题的特点和挑战。
    Abstract In an effort to catalog insect biodiversity, we propose a new large dataset of hand-labelled insect images, the BIOSCAN-Insect Dataset. Each record is taxonomically classified by an expert, and also has associated genetic information including raw nucleotide barcode sequences and assigned barcode index numbers, which are genetically-based proxies for species classification. This paper presents a curated million-image dataset, primarily to train computer-vision models capable of providing image-based taxonomic assessment, however, the dataset also presents compelling characteristics, the study of which would be of interest to the broader machine learning community. Driven by the biological nature inherent to the dataset, a characteristic long-tailed class-imbalance distribution is exhibited. Furthermore, taxonomic labelling is a hierarchical classification scheme, presenting a highly fine-grained classification problem at lower levels. Beyond spurring interest in biodiversity research within the machine learning community, progress on creating an image-based taxonomic classifier will also further the ultimate goal of all BIOSCAN research: to lay the foundation for a comprehensive survey of global biodiversity. This paper introduces the dataset and explores the classification task through the implementation and analysis of a baseline classifier.
    摘要 为了编目昆虫多样性,我们提出了一个新的大规模人工标注昆虫图像数据集,称为 BIOSCAN-Insect 数据集。每条记录均由专家进行分类学标注,并附有相应的遗传信息,包括原始核苷酸条形码序列和条形码索引编号,后者是基于遗传信息的物种分类代理。本文介绍了这个经精心整理、约一百万张图像的数据集,其主要用途是训练能够基于图像给出分类学判断的计算机视觉模型;同时,该数据集本身也呈现出一些值得更广泛的机器学习社区研究的特性:由于其生物学属性,类别分布呈典型的长尾不平衡;而分类学标注是一种层次化的分类体系,在较低层级上构成了非常细粒度的分类问题。除了激发机器学习社区对生物多样性研究的兴趣之外,构建基于图像的分类学分类器也将推进所有 BIOSCAN 研究的最终目标:为全球生物多样性的全面调查奠定基础。本文介绍了该数据集,并通过实现与分析一个基线分类器来探讨相应的分类任务。

Learning Formal Specifications from Membership and Preference Queries

  • paper_url: http://arxiv.org/abs/2307.10434
  • repo_url: None
  • paper_authors: Ameesh Shah, Marcell Vazquez-Chanlatte, Sebastian Junges, Sanjit A. Seshia
  • for: 学习形式化规范(formal specifications),例如自动机。
  • methods: 提出一种新的框架,策略性地同时请求成员标签(membership labels)与成对偏好(pair-wise preferences),而不只依赖成员标签。
  • results: 在两个不同领域中实例化了该框架,结果表明结合两种查询模态可以稳健且便捷地识别规范。
    Abstract Active learning is a well-studied approach to learning formal specifications, such as automata. In this work, we extend active specification learning by proposing a novel framework that strategically requests a combination of membership labels and pair-wise preferences, a popular alternative to membership labels. The combination of pair-wise preferences and membership labels allows for a more flexible approach to active specification learning, which previously relied on membership labels only. We instantiate our framework in two different domains, demonstrating the generality of our approach. Our results suggest that learning from both modalities allows us to robustly and conveniently identify specifications via membership and preferences.
    摘要 主动学习(active learning)是一种被广泛研究的方法,用于学习自动机等形式化规范。在这项工作中,我们扩展了主动规范学习,提出一种新的框架,策略性地同时请求成员标签与成对偏好(后者是成员标签的一种常见替代方案)。将成对偏好与成员标签结合,使原本仅依赖成员标签的主动规范学习更加灵活。我们在两个不同领域中实例化了该框架,展示了方法的通用性。结果表明,同时利用成员与偏好两种模态,可以稳健且便捷地识别规范。
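    To make the two query modalities concrete, here is a small hypothetical interface sketch: a learner asks a teacher membership queries ("is this trace accepted by the target specification?") and pairwise preference queries ("which of these two traces is preferred?"). The class and method names, and the toy tie-breaking rule, are invented for illustration and do not correspond to the paper's implementation.

    ```python
    from dataclasses import dataclass
    from typing import List, Tuple

    Trace = Tuple[str, ...]  # a finite word over some alphabet

    @dataclass
    class Teacher:
        """Hypothetical oracle wrapping the (unknown) target specification."""
        accepted: set  # set of accepted traces, standing in for the true spec

        def membership(self, trace: Trace) -> bool:
            return trace in self.accepted

        def preference(self, a: Trace, b: Trace) -> Trace:
            # Toy tie-break: prefer an accepted trace, otherwise the shorter one.
            if self.membership(a) != self.membership(b):
                return a if self.membership(a) else b
            return a if len(a) <= len(b) else b

    def collect_evidence(teacher: Teacher, candidates: List[Trace]):
        """Ask a mix of membership and preference queries to constrain the hypothesis."""
        labels = {t: teacher.membership(t) for t in candidates}
        prefs = [(a, b, teacher.preference(a, b))
                 for i, a in enumerate(candidates) for b in candidates[i + 1:]]
        return labels, prefs

    teacher = Teacher(accepted={("a", "b"), ("a", "b", "b")})
    print(collect_evidence(teacher, [("a",), ("a", "b"), ("b", "a")]))
    ```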

PreDiff: Precipitation Nowcasting with Latent Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.10422
  • repo_url: None
  • paper_authors: Zhihan Gao, Xingjian Shi, Boran Han, Hao Wang, Xiaoyong Jin, Danielle Maddix, Yi Zhu, Mu Li, Yuyang Wang
  • for: 这篇研究旨在提出一个概率时空预测方法,以同时处理地球系统预测中的不确定性并整合领域先验知识。
  • methods: 研究提出了一个两阶段的概率时空预测管线:其一是名为 PreDiff 的条件潜在扩散模型,可进行概率式预测;其二是显式的知识控制机制,在每个去噪步骤中估计预测与物理约束的偏差并相应调整转移分布,以确保预测符合领域物理限制。
  • results: 在 N-body MNIST 与 SEVIR 两个数据集上的实验表明,PreDiff 能够有效处理不确定性、整合领域先验知识,并产生具有较高实用价值的预测。
    Abstract Earth system forecasting has traditionally relied on complex physical models that are computationally expensive and require significant domain expertise. In the past decade, the unprecedented increase in spatiotemporal Earth observation data has enabled data-driven forecasting models using deep learning techniques. These models have shown promise for diverse Earth system forecasting tasks but either struggle with handling uncertainty or neglect domain-specific prior knowledge, resulting in averaging possible futures to blurred forecasts or generating physically implausible predictions. To address these limitations, we propose a two-stage pipeline for probabilistic spatiotemporal forecasting: 1) We develop PreDiff, a conditional latent diffusion model capable of probabilistic forecasts. 2) We incorporate an explicit knowledge control mechanism to align forecasts with domain-specific physical constraints. This is achieved by estimating the deviation from imposed constraints at each denoising step and adjusting the transition distribution accordingly. We conduct empirical studies on two datasets: N-body MNIST, a synthetic dataset with chaotic behavior, and SEVIR, a real-world precipitation nowcasting dataset. Specifically, we impose the law of conservation of energy in N-body MNIST and anticipated precipitation intensity in SEVIR. Experiments demonstrate the effectiveness of PreDiff in handling uncertainty, incorporating domain-specific prior knowledge, and generating forecasts that exhibit high operational utility.
    摘要 地球系统预测传统上依赖复杂的物理模型,这类模型计算代价高昂,且需要深厚的领域专业知识。过去十年,时空地球观测数据的空前增长使得基于深度学习的数据驱动预测模型成为可能。这些模型在多种地球系统预测任务上展现出潜力,但它们要么难以处理不确定性,要么忽略了领域特定的先验知识,导致对多种可能未来取平均而得到模糊的预测,或生成物理上不合理的结果。为了解决这些局限,我们提出了一个两阶段的概率时空预测管线:(1)我们开发了 PreDiff,一个能够进行概率预测的条件潜在扩散模型;(2)我们引入显式的知识控制机制,使预测符合领域特定的物理约束,其做法是在每个去噪步骤中估计预测对约束的偏离程度,并据此调整转移分布。我们在两个数据集上进行了实证研究:具有混沌行为的合成数据集 N-body MNIST,以及真实世界的降水临近预报数据集 SEVIR;具体而言,我们在 N-body MNIST 中施加能量守恒定律,在 SEVIR 中施加预期降水强度约束。实验表明,PreDiff 能够有效处理不确定性、融合领域先验知识,并生成具有较高业务实用价值的预测。
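    The knowledge-control idea, adjusting each denoising transition so that samples drift toward satisfying a domain constraint, can be sketched generically as gradient guidance on the predicted mean. The code below is a schematic, hypothetical fragment (toy denoiser output, toy "conserved quantity" constraint), not PreDiff's actual latent diffusion implementation.

    ```python
    import torch

    def constraint_violation(x, target_total=1.0):
        # Toy physical constraint: the field should sum to a conserved total.
        return (x.sum(dim=(-1, -2)) - target_total).pow(2).mean()

    def guided_denoise_step(x_t, predicted_mean, sigma_t, guidance_scale=0.1):
        """One reverse-diffusion step whose mean is shifted to reduce constraint violation."""
        mean = predicted_mean.detach().requires_grad_(True)
        deviation = constraint_violation(mean)
        grad = torch.autograd.grad(deviation, mean)[0]          # how the mean violates the constraint
        adjusted_mean = mean.detach() - guidance_scale * grad   # nudge the transition distribution
        return adjusted_mean + sigma_t * torch.randn_like(x_t)

    x_t = torch.randn(1, 8, 8)
    pred_mean = 0.9 * x_t                                        # stand-in for a denoiser's output
    x_prev = guided_denoise_step(x_t, pred_mean, sigma_t=0.05)
    print(constraint_violation(pred_mean).item(), constraint_violation(x_prev).item())
    ```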

GOOSE Algorithm: A Powerful Optimization Tool for Real-World Engineering Challenges and Beyond

  • paper_url: http://arxiv.org/abs/2307.10420
  • repo_url: None
  • paper_authors: Rebwar Khalid Hamad, Tarik A. Rashid
  • For: The paper proposes a novel metaheuristic algorithm called GOOSE, which is inspired by the behavior of geese during rest and foraging. The algorithm is designed to solve optimization problems.* Methods: The GOOSE algorithm uses a combination of balance and guarding mechanisms to search for the optimal solution. It is benchmarked on 19 well-known test functions and compared with four other algorithms.* Results: The results show that the GOOSE algorithm outperforms the other algorithms on 10 modern benchmark functions and 5 classical benchmark functions. It is also applied to three real-world engineering challenges and shows good performance in optimizing these problems.
    Abstract This study proposes the GOOSE algorithm as a novel metaheuristic algorithm based on the goose's behavior during rest and foraging. The goose stands on one leg and keeps his balance to guard and protect other individuals in the flock. The GOOSE algorithm is benchmarked on 19 well-known benchmark test functions, and the results are verified by a comparative study with genetic algorithm (GA), particle swarm optimization (PSO), dragonfly algorithm (DA), and fitness dependent optimizer (FDO). In addition, the proposed algorithm is tested on 10 modern benchmark functions, and the gained results are compared with three recent algorithms, such as the dragonfly algorithm, whale optimization algorithm (WOA), and salp swarm algorithm (SSA). Moreover, the GOOSE algorithm is tested on 5 classical benchmark functions, and the obtained results are evaluated with six algorithms, such as fitness dependent optimizer (FDO), FOX optimizer, butterfly optimization algorithm (BOA), whale optimization algorithm, dragonfly algorithm, and chimp optimization algorithm (ChOA). The achieved findings attest to the proposed algorithm's superior performance compared to the other algorithms that were utilized in the current study. The technique is then used to optimize Welded beam design and Economic Load Dispatch Problem, three renowned real-world engineering challenges, and the Pathological IgG Fraction in the Nervous System. The outcomes of the engineering case studies illustrate how well the suggested approach can optimize issues that arise in the real-world.
    摘要 本研究提出了一种新的元启发式算法 GOOSE,其灵感来源于鹅在休息与觅食时的行为:鹅单腿站立并保持平衡,以守护群体中的其他个体。GOOSE 算法在19个著名的基准测试函数上进行了评测,并与遗传算法(GA)、粒子群优化算法(PSO)、蜻蜓算法(DA)和适应度依赖优化器(FDO)进行了对比验证。此外,该算法还在10个现代基准函数上进行了测试,并与蜻蜓算法、鲸鱼优化算法(WOA)和樽海鞘群算法(SSA)等三种较新的算法进行比较;同时在5个经典基准函数上与适应度依赖优化器(FDO)、FOX 优化器、蝴蝶优化算法(BOA)、鲸鱼优化算法、蜻蜓算法和黑猩猩优化算法(ChOA)等六种算法进行了评估。结果表明,所提算法相较于本研究中使用的其他算法具有更优的性能。随后,该方法被用于优化焊接梁设计、经济负荷调度等知名的实际工程问题,以及神经系统中病理性 IgG 组分问题。工程案例研究的结果表明,所提方法能够很好地优化现实世界中出现的问题。

Explaining Autonomous Driving Actions with Visual Question Answering

  • paper_url: http://arxiv.org/abs/2307.10408
  • repo_url: https://github.com/shahin-01/vqa-ad
  • paper_authors: Shahin Atakishiyev, Mohammad Salameh, Housam Babiker, Randy Goebel
  • for: 本研究旨在提供一种可解释的自动驾驶技术,以便更好地理解自动驾驶车辆的决策过程。
  • methods: 本研究使用视觉问答(VQA)框架,通过基于问答的因果推理来解释自动驾驶车辆的决策。
  • results: 研究发现,VQA机制可以提供支持来解释自动驾驶车辆的决策过程,并帮助提高整体驾驶安全性。
    Abstract The end-to-end learning ability of self-driving vehicles has achieved significant milestones over the last decade owing to rapid advances in deep learning and computer vision algorithms. However, as autonomous driving technology is a safety-critical application of artificial intelligence (AI), road accidents and established regulatory principles necessitate the need for the explainability of intelligent action choices for self-driving vehicles. To facilitate interpretability of decision-making in autonomous driving, we present a Visual Question Answering (VQA) framework, which explains driving actions with question-answering-based causal reasoning. To do so, we first collect driving videos in a simulation environment using reinforcement learning (RL) and extract consecutive frames from this log data uniformly for five selected action categories. Further, we manually annotate the extracted frames using question-answer pairs as justifications for the actions chosen in each scenario. Finally, we evaluate the correctness of the VQA-predicted answers for actions on unseen driving scenes. The empirical results suggest that the VQA mechanism can provide support to interpret real-time decisions of autonomous vehicles and help enhance overall driving safety.
    摘要 得益于深度学习和计算机视觉算法的快速发展,自动驾驶车辆的端到端学习能力在过去十年取得了重大进展。然而,自动驾驶技术是安全关键的人工智能应用,道路交通事故与既有的监管原则都要求对自动驾驶车辆的智能行为决策具备可解释性。为了提升自动驾驶决策的可解释性,我们提出了一个视觉问答(VQA)框架,通过基于问答的因果推理来解释驾驶动作。具体而言,我们首先使用强化学习(RL)在仿真环境中采集驾驶视频,并从日志数据中对五个选定的动作类别均匀抽取连续帧;随后,我们以问答对的形式对抽取的帧进行人工标注,作为各场景中所选动作的解释依据;最后,我们在未见过的驾驶场景上评估 VQA 预测答案的正确性。实验结果表明,VQA 机制能够为自动驾驶车辆的实时决策提供可解释支持,并有助于提升整体驾驶安全性。

Technical Challenges of Deploying Reinforcement Learning Agents for Game Testing in AAA Games

  • paper_url: http://arxiv.org/abs/2307.11105
  • repo_url: None
  • paper_authors: Jonas Gillberg, Joakim Bergdahl, Alessandro Sestini, Andrew Eakins, Linus Gisslen
  • for: 本研究旨在推动机器学习在游戏生产中的应用,特别是通过强化学习提升自动游戏测试方案的测试覆盖率。
  • methods: 本研究使用了训练bot的强化学习系统,与现有的脚本bot测试解决方案集成。
  • results: 研究在《Battlefield 2042》和《Dead Space》(2023)等AAA游戏中实现了提升测试覆盖率的目标,并提出了若干有价值的研究方向,以帮助游戏行业更快地采用这项技术。
    Abstract Going from research to production, especially for large and complex software systems, is fundamentally a hard problem. In large-scale game production, one of the main reasons is that the development environment can be very different from the final product. In this technical paper we describe an effort to add an experimental reinforcement learning system to an existing automated game testing solution based on scripted bots in order to increase its capacity. We report on how this reinforcement learning system was integrated with the aim to increase test coverage similar to [1] in a set of AAA games including Battlefield 2042 and Dead Space (2023). The aim of this technical paper is to show a use-case of leveraging reinforcement learning in game production and cover some of the largest time sinks anyone who wants to make the same journey for their game may encounter. Furthermore, to help the game industry to adopt this technology faster, we propose a few research directions that we believe will be valuable and necessary for making machine learning, and especially reinforcement learning, an effective tool in game production.
    摘要 从研究走向生产,尤其是对于大型、复杂的软件系统而言,本质上是一个困难的问题。在大规模游戏生产中,其主要原因之一是开发环境可能与最终产品差异很大。在这篇技术报告中,我们描述了将一个实验性的强化学习系统加入到现有的基于脚本机器人的自动游戏测试方案中,以提升其测试能力。我们报告了如何将该强化学习系统与测试方案集成,以在包括《Battlefield 2042》和《Dead Space》(2023)在内的一系列AAA游戏中提高测试覆盖率。这篇技术报告旨在展示在游戏生产中利用强化学习的一个用例,并介绍任何希望在自家游戏中走同样路线的团队可能遇到的主要耗时环节。此外,为了帮助游戏行业更快地采用这项技术,我们提出了若干研究方向,我们认为这些方向对于让机器学习、尤其是强化学习成为游戏生产中的有效工具而言既有价值也十分必要。

Interpreting and Correcting Medical Image Classification with PIP-Net

  • paper_url: http://arxiv.org/abs/2307.10404
  • repo_url: https://github.com/m-nauta/pipnet
  • paper_authors: Meike Nauta, Johannes H. Hegeman, Jeroen Geerdink, Jörg Schlötterer, Maurice van Keulen, Christin Seifert
  • for: 这篇论文探讨了使用可解释的机器学习模型,尤其是PIP-Net,在实际医疗图像数据上进行自动诊断支持。
  • methods: 这篇论文使用PIP-Net模型,学习人类理解的图像组件prototype,并评估其精度和可解释性在骨折检测和皮肤癌诊断方面。
  • results: 研究发现,PIP-Net 在仅提供图像级类别标签的情况下,其决策过程仍与医学分类标准相一致。此外,PIP-Net 能方便地发现数据质量问题,例如X光图像中混入的文本或标注错误。最后,我们首次表明,人类可以通过直接禁用不想要的原型(prototype)来手动修正 PIP-Net 的推理过程。
    Abstract Part-prototype models are explainable-by-design image classifiers, and a promising alternative to black box AI. This paper explores the applicability and potential of interpretable machine learning, in particular PIP-Net, for automated diagnosis support on real-world medical imaging data. PIP-Net learns human-understandable prototypical image parts and we evaluate its accuracy and interpretability for fracture detection and skin cancer diagnosis. We find that PIP-Net's decision making process is in line with medical classification standards, while only provided with image-level class labels. Because of PIP-Net's unsupervised pretraining of prototypes, data quality problems such as undesired text in an X-ray or labelling errors can be easily identified. Additionally, we are the first to show that humans can manually correct the reasoning of PIP-Net by directly disabling undesired prototypes. We conclude that part-prototype models are promising for medical applications due to their interpretability and potential for advanced model debugging.
    摘要 <>translate the following text into Simplified Chinese:Part-prototype models are explainable-by-design image classifiers, and a promising alternative to black box AI. This paper explores the applicability and potential of interpretable machine learning, in particular PIP-Net, for automated diagnosis support on real-world medical imaging data. PIP-Net learns human-understandable prototypical image parts and we evaluate its accuracy and interpretability for fracture detection and skin cancer diagnosis. We find that PIP-Net's decision making process is in line with medical classification standards, while only provided with image-level class labels. Because of PIP-Net's unsupervised pretraining of prototypes, data quality problems such as undesired text in an X-ray or labelling errors can be easily identified. Additionally, we are the first to show that humans can manually correct the reasoning of PIP-Net by directly disabling undesired prototypes. We conclude that part-prototype models are promising for medical applications due to their interpretability and potential for advanced model debugging.Translate the text into Simplified Chinese.Here's the translation:Part-prototype 模型是一种可解释的设计图像分类器,也是黑盒AI的有力的替代方案。本文探讨了PIP-Net在实际医疗影像数据上的适用性和潜力,并评估了它的准确率和可解释性在骨折检测和皮肤癌诊断方面。我们发现PIP-Net的决策过程与医学分类标准相一致,只需要图像级别的类别标签。由于PIP-Net在前期无监督学习prototype的方式下,可以轻松地标识数据质量问题,如X射线图像中的不想要的文本或标签错误。此外,我们是首次显示人类可以手动 corriger PIP-Net的逻辑,通过直接禁用不需要的prototype来改善其判断。我们 conclude that part-prototype 模型在医疗应用中具有可解释性和进一步的模型调试潜力。

DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI

  • paper_url: http://arxiv.org/abs/2307.10172
  • repo_url: https://github.com/salesforce/DialogStudio
  • paper_authors: Jianguo Zhang, Kun Qian, Zhiwei Liu, Shelby Heinecke, Rui Meng, Ye Liu, Zhou Yu, Huan Wang, Silvio Savarese, Caiming Xiong
  • For: The paper aims to introduce DialogStudio, a large and diverse collection of dialogue datasets, to address the challenges of handling diverse conversational tasks and improve the comprehensiveness of existing dialogue dataset collections.* Methods: The paper uses a consistent format to unify diverse dialogue datasets, including open-domain dialogues, task-oriented dialogues, natural language understanding, conversational recommendation, dialogue summarization, and knowledge-grounded dialogues. The authors also identify licenses for each dataset and design domain-aware prompts for selected dialogues to facilitate instruction-aware fine-tuning.* Results: The authors develop conversational AI models using the dataset collection and demonstrate their superiority in both zero-shot and few-shot learning scenarios. They also make all datasets, licenses, codes, and models associated with DialogStudio publicly accessible at https://github.com/salesforce/DialogStudio to support dataset and task-based research, as well as language model pre-training.
    Abstract Despite advancements in conversational AI, language models encounter challenges to handle diverse conversational tasks, and existing dialogue dataset collections often lack diversity and comprehensiveness. To tackle these issues, we introduce DialogStudio: the largest and most diverse collection of dialogue datasets, unified under a consistent format while preserving their original information. Our collection encompasses data from open-domain dialogues, task-oriented dialogues, natural language understanding, conversational recommendation, dialogue summarization, and knowledge-grounded dialogues, making it an incredibly rich and diverse resource for dialogue research and model training. To further enhance the utility of DialogStudio, we identify the licenses for each dataset and design domain-aware prompts for selected dialogues to facilitate instruction-aware fine-tuning. Furthermore, we develop conversational AI models using the dataset collection, and our experiments in both zero-shot and few-shot learning scenarios demonstrate the superiority of DialogStudio. To improve transparency and support dataset and task-based research, as well as language model pre-training, all datasets, licenses, codes, and models associated with DialogStudio are made publicly accessible at https://github.com/salesforce/DialogStudio
    摘要 尽管对话式AI技术不断进步,语言模型在处理多样化对话任务时仍面临挑战,而现有的对话数据集合往往缺乏多样性和完整性。为解决这些问题,我们介绍 DialogStudio:规模最大、最为多样的对话数据集合,采用统一的格式整合而不丢失原始信息。该集合涵盖开放域对话、任务导向对话、自然语言理解、对话推荐、对话摘要和知识增强对话等数据,是对话研究与模型训练的一个极为丰富且多样的资源。为进一步提升 DialogStudio 的实用性,我们确定了每个数据集的许可证,并为选定的对话设计了领域感知的提示,以支持指令感知微调。此外,我们基于该数据集合训练了对话式AI模型,在零样本和少样本学习场景下的实验表明了 DialogStudio 的优越性。为提高透明度并支持基于数据集和任务的研究以及语言模型预训练,与 DialogStudio 相关的所有数据集、许可证、代码和模型均在 https://github.com/salesforce/DialogStudio 公开。

LightPath: Lightweight and Scalable Path Representation Learning

  • paper_url: http://arxiv.org/abs/2307.10171
  • repo_url: None
  • paper_authors: Sean Bin Yang, Jilin Hu, Chenjuan Guo, Bin Yang, Christian S. Jensen
  • for: 提供高效、可扩展的路径表示学习框架,用于智能交通和智能城市应用。
  • methods: 提出一种轻量级、可扩展的路径表示学习框架,包含稀疏自编码器、关系推理框架以及全局-局部知识蒸馏。
  • results: 在两个真实世界数据集上的大量实验验证了该框架的效率、可扩展性和有效性。
    Abstract Movement paths are used widely in intelligent transportation and smart city applications. To serve such applications, path representation learning aims to provide compact representations of paths that enable efficient and accurate operations when used for different downstream tasks such as path ranking and travel cost estimation. In many cases, it is attractive that the path representation learning is lightweight and scalable; in resource-limited environments and under green computing limitations, it is essential. Yet, existing path representation learning studies focus on accuracy and pay at most secondary attention to resource consumption and scalability. We propose a lightweight and scalable path representation learning framework, termed LightPath, that aims to reduce resource consumption and achieve scalability without affecting accuracy, thus enabling broader applicability. More specifically, we first propose a sparse auto-encoder that ensures that the framework achieves good scalability with respect to path length. Next, we propose a relational reasoning framework to enable faster training of more robust sparse path encoders. We also propose global-local knowledge distillation to further reduce the size and improve the performance of sparse path encoders. Finally, we report extensive experiments on two real-world datasets to offer insight into the efficiency, scalability, and effectiveness of the proposed framework.
    摘要 移动路径在智能交通和智慧城市应用中被广泛使用。为支撑此类应用,路径表示学习旨在提供紧凑的路径表示,使其在路径排名、出行成本估算等下游任务中既高效又准确。在许多情况下,轻量且可扩展的路径表示学习颇具吸引力;而在资源受限的环境和绿色计算的约束下,这一点更是必不可少。然而,现有的路径表示学习研究主要关注准确性,对资源消耗和可扩展性至多只给予次要的关注。我们提出一个轻量级、可扩展的路径表示学习框架,名为 LightPath,旨在降低资源消耗并实现可扩展性,同时不损失准确性,从而具备更广泛的适用性。具体而言,我们首先提出一个稀疏自编码器,确保框架在路径长度方面具有良好的可扩展性;然后提出一个关系推理框架,以更快地训练更鲁棒的稀疏路径编码器;最后提出全局-局部知识蒸馏,进一步压缩稀疏路径编码器并提升其性能。我们在两个真实世界数据集上进行了大量实验,对所提框架的效率、可扩展性和有效性进行了深入分析。

Challenges and Applications of Large Language Models

  • paper_url: http://arxiv.org/abs/2307.10169
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, Robert McHardy
  • for: 本研究旨在为机器学习研究人员快速了解大语言模型(LLMs)领域的当前状况,以便更快地成为生产力的一员。
  • methods: 本研究采用系统的方法描述了LLMs领域的开放问题和成功应用领域,以便帮助研究人员更快地了解领域的当前状况。
  • results: 本研究对LLM领域的当前状况进行了系统的梳理,识别出若干开放问题与已见成效的应用领域,帮助研究人员快速了解该领域并投入有效工作。
    Abstract Large Language Models (LLMs) went from non-existent to ubiquitous in the machine learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify the remaining challenges and already fruitful application areas. In this paper, we aim to establish a systematic set of open problems and application successes so that ML researchers can comprehend the field's current state more quickly and become productive.
    摘要 大语言模型(LLM)在短短几年内从无到有,如今已在机器学习的讨论中无处不在。由于该领域发展迅速,人们难以辨别其中尚存的挑战与已取得成果的应用领域。在这篇论文中,我们试图系统地梳理开放问题与成功应用,以便机器学习研究人员能够更快地把握领域现状并投入有效工作。

Robust Driving Policy Learning with Guided Meta Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.10160
  • repo_url: None
  • paper_authors: Kanghoon Lee, Jiachen Li, David Isele, Jinkyoo Park, Kikuo Fujimura, Mykel J. Kochenderfer
  • for: 实现自动驾驶车辆在交互式交通场景中的自主通行
  • methods: 采用一个单一meta-policy来训练多元的驾驶策略,通过随机调整社交车辆之间的互动奖励函数来生成多元的目标,并透过导引策略来训练meta-policy
  • results: 在一个具有挑战性的无信号T字路口场景中,成功训练出能够泛化到分布外社会车辆行为的自车(ego)驾驶策略,表现稳健可靠
    Abstract Although deep reinforcement learning (DRL) has shown promising results for autonomous navigation in interactive traffic scenarios, existing work typically adopts a fixed behavior policy to control social vehicles in the training environment. This may cause the learned driving policy to overfit the environment, making it difficult to interact well with vehicles with different, unseen behaviors. In this work, we introduce an efficient method to train diverse driving policies for social vehicles as a single meta-policy. By randomizing the interaction-based reward functions of social vehicles, we can generate diverse objectives and efficiently train the meta-policy through guiding policies that achieve specific objectives. We further propose a training strategy to enhance the robustness of the ego vehicle's driving policy using the environment where social vehicles are controlled by the learned meta-policy. Our method successfully learns an ego driving policy that generalizes well to unseen situations with out-of-distribution (OOD) social agents' behaviors in a challenging uncontrolled T-intersection scenario.
    摘要 尽管深度强化学习(DRL)在交互式交通场景下的自主导航中已展现出良好的效果,现有工作通常在训练环境中采用固定的行为策略来控制周围的社会车辆。这可能导致学到的驾驶策略对训练环境过拟合,从而难以与行为不同、未曾见过的车辆良好交互。在这项工作中,我们提出了一种高效的方法,将面向社会车辆的多样化驾驶策略训练为单一的元策略:通过随机化社会车辆基于交互的奖励函数来生成多样化的目标,并借助实现特定目标的引导策略高效地训练该元策略。我们进一步提出一种训练策略,在由学得的元策略控制社会车辆的环境中增强自车驾驶策略的鲁棒性。在一个具有挑战性的无信号T字路口场景中,我们的方法成功学习出一个自车驾驶策略,能够很好地泛化到具有分布外(OOD)社会车辆行为的未见情形。

Benchmarking Potential Based Rewards for Learning Humanoid Locomotion

  • paper_url: http://arxiv.org/abs/2307.10142
  • repo_url: https://github.com/se-hwan/pbrs-humanoid
  • paper_authors: Se Hwan Jeon, Steve Heim, Charles Khazoom, Sangbae Kim
  • for: 本研究旨在比较标准形式的奖励塑形与基于势函数的奖励塑形(PBRS)在人形机器人运动学习中的表现。
  • methods: 本研究分别采用标准的奖励塑形和 PBRS 方法,并在高维系统中对二者进行了基准比较。
  • results: 研究发现,在高维系统中,PBRS 的性能提升效果相对较弱,但 PBRS 奖励项在不同的缩放比例下表现更加稳定和易于调整。
    Abstract The main challenge in developing effective reinforcement learning (RL) pipelines is often the design and tuning the reward functions. Well-designed shaping reward can lead to significantly faster learning. Naively formulated rewards, however, can conflict with the desired behavior and result in overfitting or even erratic performance if not properly tuned. In theory, the broad class of potential based reward shaping (PBRS) can help guide the learning process without affecting the optimal policy. Although several studies have explored the use of potential based reward shaping to accelerate learning convergence, most have been limited to grid-worlds and low-dimensional systems, and RL in robotics has predominantly relied on standard forms of reward shaping. In this paper, we benchmark standard forms of shaping with PBRS for a humanoid robot. We find that in this high-dimensional system, PBRS has only marginal benefits in convergence speed. However, the PBRS reward terms are significantly more robust to scaling than typical reward shaping approaches, and thus easier to tune.
    摘要 开发高效强化学习(RL)管线的主要挑战往往在于奖励函数的设计与调优。精心设计的塑形奖励能够显著加快学习速度;而设计欠妥的奖励若未经适当调优,可能与期望行为相冲突,导致过拟合甚至不稳定的表现。理论上,基于势函数的奖励塑形(PBRS)这一大类方法可以在不改变最优策略的前提下引导学习过程。尽管已有若干研究探讨了利用 PBRS 加速学习收敛,但大多局限于网格世界和低维系统,机器人领域的强化学习仍主要依赖标准形式的奖励塑形。在本文中,我们在人形机器人上对标准奖励塑形与 PBRS 进行了基准比较,发现在这一高维系统中,PBRS 对收敛速度的提升十分有限;但 PBRS 的奖励项对尺度缩放明显更加鲁棒,因而更易于调优。
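    For reference, potential-based reward shaping adds F(s, s') = γΦ(s') − Φ(s) to the environment reward, which is known not to change the optimal policy. The snippet below is a minimal generic sketch with a made-up potential function; it is not the humanoid-locomotion reward used in the paper.

    ```python
    GAMMA = 0.99

    def potential(state):
        # Made-up potential: e.g. negative distance of some feature from its target value.
        return -abs(state["height"] - 1.0)

    def shaped_reward(env_reward, state, next_state, gamma=GAMMA):
        """Adds the potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
        return env_reward + gamma * potential(next_state) - potential(state)

    s, s_next = {"height": 0.6}, {"height": 0.8}
    print(shaped_reward(env_reward=0.1, state=s, next_state=s_next))
    ```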

cs.CL - 2023-07-20

What Twitter Data Tell Us about the Future?

  • paper_url: http://arxiv.org/abs/2308.02035
  • repo_url: None
  • paper_authors: Alina Landowska, Marek Robak, Maciej Skorski
  • for: This paper aims to investigate the futures projected by futurists on Twitter and explore the impact of language cues on anticipatory thinking among social media users.
  • methods: The study uses a compiled dataset of over 1 million publicly shared tweets by future influencers and develops a scalable NLP pipeline using state-of-the-art models. The research employs topic modeling techniques, such as LDA and BERTopic, to identify the topics and language cues used by futurists.
  • results: The study finds that the futurists’ language cues signal futures-in-the-making, which enhance social media users’ ability to anticipate and respond to their own scenarios in the present. The research identifies 15 topics from the LDA approach and 100 distinct topics from the BERTopic approach within the futurists’ tweets, providing insights into the futures anticipated by Twitter’s futurists.
    Abstract Anticipation is a fundamental human cognitive ability that involves thinking about and living towards the future. While language markers reflect anticipatory thinking, research on anticipation from the perspective of natural language processing is limited. This study aims to investigate the futures projected by futurists on Twitter and explore the impact of language cues on anticipatory thinking among social media users. We address the research questions of what futures Twitter's futurists anticipate and share, and how these anticipated futures can be modeled from social data. To investigate this, we review related works on anticipation, discuss the influence of language markers and prestigious individuals on anticipatory thinking, and present a taxonomy system categorizing futures into "present futures" and "future present". This research presents a compiled dataset of over 1 million publicly shared tweets by future influencers and develops a scalable NLP pipeline using SOTA models. The study identifies 15 topics from the LDA approach and 100 distinct topics from the BERTopic approach within the futurists' tweets. These findings contribute to the research on topic modelling and provide insights into the futures anticipated by Twitter's futurists. The research demonstrates the futurists' language cues signals futures-in-the-making that enhance social media users to anticipate their own scenarios and respond to them in present. The fully open-sourced dataset, interactive analysis, and reproducible source code are available for further exploration.
    摘要 预期是人类一项基本的认知能力,涉及对未来的思考以及朝向未来的生活。尽管语言标记能够反映预期思维,但从自然语言处理视角研究预期的工作仍然有限。本研究旨在调查Twitter上的未来学家所展望并分享的未来,并探讨语言线索对社交媒体用户预期思维的影响。我们要回答的问题是:Twitter上的未来学家预期并分享了哪些未来,以及如何从社交数据中对这些预期的未来进行建模。为此,我们回顾了预期相关的研究,讨论了语言标记与有声望人士对预期思维的影响,并提出一个分类体系,将未来划分为"现在的未来"与"未来的现在"。本研究汇编了未来影响者公开分享的100多万条推文数据集,并基于最先进的模型构建了可扩展的NLP管线。我们通过LDA方法从未来学家的推文中识别出15个主题,通过BERTopic方法识别出100个不同主题。这些发现为主题建模研究做出了贡献,并揭示了Twitter未来学家所预期的未来。研究表明,未来学家的语言线索是"正在形成的未来"的信号,有助于社交媒体用户在当下预想并回应属于自己的未来情景。完全开源的数据集、交互式分析和可复现的源代码均已公开,可供进一步探索。
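    A compact sketch of the kind of topic-modelling pipeline described above, using scikit-learn's CountVectorizer and LatentDirichletAllocation on a few made-up tweet-like strings; the real study runs LDA/BERTopic over more than a million tweets, so this only illustrates the mechanics.

    ```python
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [  # tiny made-up stand-ins for futurists' tweets
        "ai agents will reshape work and education by 2030",
        "climate adaptation and energy storage define the next decade",
        "synthetic biology and longevity research are accelerating",
        "education will be personalised by ai tutors",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    doc_term = vectorizer.fit_transform(docs)

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(doc_term)

    terms = vectorizer.get_feature_names_out()
    for k, topic in enumerate(lda.components_):
        top = [terms[i] for i in topic.argsort()[-5:][::-1]]   # top words per topic
        print(f"topic {k}: {top}")
    ```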

FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback

  • paper_url: http://arxiv.org/abs/2307.10867
  • repo_url: https://github.com/figcapshf/figcapshf
  • paper_authors: Ashish Singh, Prateek Agarwal, Zixuan Huang, Arpita Singh, Tong Yu, Sungchul Kim, Victor Bursztyn, Nikos Vlassis, Ryan A. Rossi
  • for: 这篇论文是为了提高科学图像标题的自动生成技术,以满足读者的需求。
  • methods: 该论文提出了一个新的图注生成框架 FigCaps-HF,其中包括自动评估图文对质量的方法,以及基于人类反馈的强化学习(RLHF)方法,用于按读者偏好优化图注生成模型。
  • results: 实验结果表明,FigCaps-HF 框架可以提升图注生成性能;以 BLIP 作为基础模型时,RLHF 方法在 ROUGE、BLEU 和 Meteor 指标上分别取得了35.7%、16.9%和9%的平均提升。此外,论文还发布了一个带有人类反馈的大规模图文对基准数据集,以支持 RLHF 技术的进一步评估与研发。
    Abstract Captions are crucial for understanding scientific visualizations and documents. Existing captioning methods for scientific figures rely on figure-caption pairs extracted from documents for training, many of which fall short with respect to metrics like helpfulness, explainability, and visual-descriptiveness [15] leading to generated captions being misaligned with reader preferences. To enable the generation of high-quality figure captions, we introduce FigCaps-HF a new framework for figure-caption generation that can incorporate domain expert feedback in generating captions optimized for reader preferences. Our framework comprises of 1) an automatic method for evaluating quality of figure-caption pairs, 2) a novel reinforcement learning with human feedback (RLHF) method to optimize a generative figure-to-caption model for reader preferences. We demonstrate the effectiveness of our simple learning framework by improving performance over standard fine-tuning across different types of models. In particular, when using BLIP as the base model, our RLHF framework achieves a mean gain of 35.7%, 16.9%, and 9% in ROUGE, BLEU, and Meteor, respectively. Finally, we release a large-scale benchmark dataset with human feedback on figure-caption pairs to enable further evaluation and development of RLHF techniques for this problem.
    摘要 图注(caption)对于理解科学可视化和文档至关重要。现有的科学图注生成方法依赖于从文档中抽取的图文对进行训练,其中许多在有用性、可解释性和视觉描述性等指标上表现不足,导致生成的图注与读者偏好不符。为了生成高质量的图注,我们提出了一个新的图到图注生成框架 FigCaps-HF,能够结合领域专家反馈,生成契合读者偏好的图注。该框架包括两个部分:1)一种自动评估图文对质量的方法;2)一种基于人类反馈的强化学习(RLHF)方法,用于按读者偏好优化图到图注生成模型。我们这一简洁的学习框架在不同类型的模型上均能提升性能;特别地,以 BLIP 作为基础模型时,我们的 RLHF 框架在 ROUGE、BLEU 和 Meteor 指标上分别取得35.7%、16.9%和9%的平均提升。最后,我们发布了一个带有人类反馈的大规模图文对基准数据集,以支持该问题上 RLHF 技术的进一步评估与发展。

Adversarial Conversational Shaping for Intelligent Agents

  • paper_url: http://arxiv.org/abs/2307.11785
  • repo_url: None
  • paper_authors: Piotr Tarasiewicz, Sultan Kenjeyev, Ilana Sebag, Shehab Alshehabi
  • for: 提高对话机器人的智能性和准确性
  • methods: 使用带策略梯度的生成对抗网络(GANPG)以及对每个生成步骤给予奖励的 REGS 模型(基于 Li 等人提出的 REGS),并在强化学习框架下比较 seq2seq 与 transformer 等不同训练设置
  • results: 研究表明,使用 GANPG 和 REGS 模型可以提高对话机器人的对话能力和准确性,并且不同的训练细节可以影响模型的性能
    Abstract The recent emergence of deep learning methods has enabled the research community to achieve state-of-the art results in several domains including natural language processing. However, the current robocall system remains unstable and inaccurate: text generator and chat-bots can be tedious and misunderstand human-like dialogue. In this work, we study the performance of two models able to enhance an intelligent conversational agent through adversarial conversational shaping: a generative adversarial network with policy gradient (GANPG) and a generative adversarial network with reward for every generation step (REGS) based on the REGS model presented in Li et al. [18] . This model is able to assign rewards to both partially and fully generated text sequences. We discuss performance with different training details : seq2seq [ 36] and transformers [37 ] in a reinforcement learning framework.
    摘要 近年来深度学习方法的兴起使研究界在包括自然语言处理在内的多个领域取得了最先进的成果。然而,当前的自动对话系统仍不够稳定和准确:文本生成器与聊天机器人可能显得冗长乏味,并难以理解类人对话。在这项工作中,我们研究了两种能够通过对抗式对话塑形提升智能对话代理的模型:带策略梯度的生成对抗网络(GANPG),以及基于 Li 等人提出的 REGS 模型、对每个生成步骤给予奖励的生成对抗网络(REGS)。该模型能够同时为部分生成和完整生成的文本序列分配奖励。我们在强化学习框架下讨论了 seq2seq 与 transformer 等不同训练设置下的性能表现。

Yelp Reviews and Food Types: A Comparative Analysis of Ratings, Sentiments, and Topics

  • paper_url: http://arxiv.org/abs/2307.10826
  • repo_url: None
  • paper_authors: Wenyu Liao, Yiqing Shi, Yujia Hu, Wei Quan
  • for: 这项研究探讨了 Yelp 评论与食品类型之间的关系,并研究评论中的评分、情感和话题如何随食品类型而变化。
  • methods: 研究使用了评论分析和机器学习模型来描述评论中的话题,并将食品类型分为四个群组基于评分和情感。
  • results: 研究发现,一些食品类型的评分、情感和话题呈现相似的特征,而其他类型则具有明显的特征。 评论者对不同类型的食品进行评论时,往往会关注不同的话题。
    Abstract This study examines the relationship between Yelp reviews and food types, investigating how ratings, sentiments, and topics vary across different types of food. Specifically, we analyze how ratings and sentiments of reviews vary across food types, cluster food types based on ratings and sentiments, infer review topics using machine learning models, and compare topic distributions among different food types. Our analyses reveal that some food types have similar ratings, sentiments, and topics distributions, while others have distinct patterns. We identify four clusters of food types based on ratings and sentiments and find that reviewers tend to focus on different topics when reviewing certain food types. These findings have important implications for understanding user behavior and cultural influence on digital media platforms and promoting cross-cultural understanding and appreciation.
    摘要 本研究考察了Yelp评论与食品类型之间的关系,具体而言,我们分析了评论的评分和情感如何随食品类型而变化,基于评分和情感对食品类型进行聚类,使用机器学习模型推断评论主题,并比较不同食品类型之间的主题分布。分析发现,一些食品类型在评分、情感和主题分布上较为相似,而另一些则呈现出独特的模式。我们基于评分和情感将食品类型划分为四个聚类,并发现评论者在评论某些食品类型时往往关注不同的主题。这些发现对于理解数字媒体平台上的用户行为与文化影响,以及促进跨文化理解与欣赏,具有重要意义。

Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages

  • paper_url: http://arxiv.org/abs/2307.10814
  • repo_url: None
  • paper_authors: Ephrem Afele Retta, Richard Sutcliffe, Jabar Mahmood, Michael Abebe Berwo, Eiad Almekhlafi, Sajjad Ahmed Khan, Shehzad Ashraf Chaudhry, Mustafa Mhamed, Jun Feng
  • for: 这项研究旨在解决语言资源匮乏情况下的语音情感识别任务。
  • methods: 这项研究采用跨语言与多语言的语音情感识别方法,使用了阿姆哈拉语、英语、德语和乌尔都语的数据集。
  • results: 研究发现,以英语或德语作为源语言、以阿姆哈拉语作为目标语言时效果最佳;此外,使用两到三种非阿姆哈拉语语言进行训练,可以获得比只用一种更高的准确率。
    Abstract In a conventional Speech emotion recognition (SER) task, a classifier for a given language is trained on a pre-existing dataset for that same language. However, where training data for a language does not exist, data from other languages can be used instead. We experiment with cross-lingual and multilingual SER, working with Amharic, English, German and URDU. For Amharic, we use our own publicly-available Amharic Speech Emotion Dataset (ASED). For English, German and Urdu we use the existing RAVDESS, EMO-DB and URDU datasets. We followed previous research in mapping labels for all datasets to just two classes, positive and negative. Thus we can compare performance on different languages directly, and combine languages for training and testing. In Experiment 1, monolingual SER trials were carried out using three classifiers, AlexNet, VGGE (a proposed variant of VGG), and ResNet50. Results averaged for the three models were very similar for ASED and RAVDESS, suggesting that Amharic and English SER are equally difficult. Similarly, German SER is more difficult, and Urdu SER is easier. In Experiment 2, we trained on one language and tested on another, in both directions for each pair: Amharic<->German, Amharic<->English, and Amharic<->Urdu. Results with Amharic as target suggested that using English or German as source will give the best result. In Experiment 3, we trained on several non-Amharic languages and then tested on Amharic. The best accuracy obtained was several percent greater than the best accuracy in Experiment 2, suggesting that a better result can be obtained when using two or three non-Amharic languages for training than when using just one non-Amharic language. Overall, the results suggest that cross-lingual and multilingual training can be an effective strategy for training a SER classifier when resources for a language are scarce.
    摘要 在传统的语音情感识别(SER)任务中,针对某一语言的分类器通常在该语言已有的数据集上训练;然而,当某种语言缺乏训练数据时,可以改用其他语言的数据。我们在阿姆哈拉语、英语、德语和乌尔都语上开展了跨语言与多语言 SER 实验,使用我们自建并公开的阿姆哈拉语语音情感数据集(ASED),以及现有的 RAVDESS、EMO-DB 和 URDU 数据集。参照先前研究,我们将所有数据集的标签统一映射为正面与负面两个类别,从而可以直接比较不同语言上的性能,并在训练和测试中组合多种语言。实验 1 使用 AlexNet、VGGE(VGG 的一种改进变体)和 ResNet50 三个分类器进行单语言 SER 试验,三个模型的平均结果显示 ASED 与 RAVDESS 的表现非常接近,表明阿姆哈拉语与英语 SER 的难度相当;相比之下,德语 SER 更困难,乌尔都语 SER 更容易。实验 2 在每一语言对的两个方向上进行跨语言训练与测试:阿姆哈拉语<->德语、阿姆哈拉语<->英语、阿姆哈拉语<->乌尔都语,以阿姆哈拉语为目标语言的结果表明,以英语或德语作为源语言效果最佳。实验 3 先在多种非阿姆哈拉语语言上训练,再在阿姆哈拉语上测试,所得最高准确率比实验 2 的最佳结果高出数个百分点,表明使用两到三种非阿姆哈拉语语言进行训练优于只用一种。总体而言,结果表明当某种语言资源匮乏时,跨语言与多语言训练是训练 SER 分类器的有效策略。

Layer-wise Representation Fusion for Compositional Generalization

  • paper_url: http://arxiv.org/abs/2307.10799
  • repo_url: None
  • paper_authors: Yafang Zheng, Lei Lin, Zhaohong Lai, Binling Wang, Shan Liu, Biao Fu, Wenhao Rao, Peigen Ye, Yidong Chen, Xiaodong Shi
  • for: 提升序列到序列模型的组合泛化能力;尽管此类模型在众多应用中取得了成功,其构造解的方式仍被认为缺乏类人的组合泛化。
  • methods: 我们提出了一种名为 FuSion 的扩展,通过在每个编码器和解码器层引入"融合注意模块",将前面各层的信息恰当地融合回编码与解码过程中。
  • results: 我们在两个真实的基准数据集上评测了 FuSion,取得了具有竞争力乃至最先进的结果,实证表明了所提方法的有效性。
    Abstract Despite successes across a broad range of applications, sequence-to-sequence models' construct of solutions are argued to be less compositional than human-like generalization. There is mounting evidence that one of the reasons hindering compositional generalization is representations of the encoder and decoder uppermost layer are entangled. In other words, the syntactic and semantic representations of sequences are twisted inappropriately. However, most previous studies mainly concentrate on enhancing token-level semantic information to alleviate the representations entanglement problem, rather than composing and using the syntactic and semantic representations of sequences appropriately as humans do. In addition, we explain why the entanglement problem exists from the perspective of recent studies about training deeper Transformer, mainly owing to the ``shallow'' residual connections and its simple, one-step operations, which fails to fuse previous layers' information effectively. Starting from this finding and inspired by humans' strategies, we propose \textsc{FuSion} (\textbf{Fu}sing \textbf{S}yntactic and Semant\textbf{i}c Representati\textbf{on}s), an extension to sequence-to-sequence models to learn to fuse previous layers' information back into the encoding and decoding process appropriately through introducing a \emph{fuse-attention module} at each encoder and decoder layer. \textsc{FuSion} achieves competitive and even \textbf{state-of-the-art} results on two realistic benchmarks, which empirically demonstrates the effectiveness of our proposal.
    摘要 尽管序列到序列模型在众多应用中取得了成功,但其构造解的方式仍被认为不如人类的泛化那样具备组合性。越来越多的证据表明,阻碍组合泛化的原因之一在于编码器和解码器最上层的表示发生了纠缠,也就是说,序列的句法表示与语义表示被不恰当地混杂在一起。然而,先前的大多数研究主要集中在增强词元级语义信息以缓解表示纠缠问题,而没有像人类那样恰当地组合并利用序列的句法与语义表示。此外,我们从近期关于训练更深 Transformer 的研究出发解释了纠缠问题存在的原因:主要归结于"浅"的残差连接及其简单的一步操作,无法有效融合前面各层的信息。基于这一发现并受人类策略的启发,我们提出了 FuSion(融合句法与语义表示),这是对序列到序列模型的一种扩展,通过在每个编码器和解码器层引入"融合注意模块",学习将前面各层的信息恰当地融合回编码与解码过程中。FuSion 在两个真实的基准数据集上取得了具有竞争力乃至最先进的结果,实证表明了我们所提方法的有效性。
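    A schematic, hypothetical sketch of a fuse-attention-style module: the current layer's representation attends over the stack of all previous layers' outputs, so earlier-layer (more syntactic/lexical) information can be fused back in. This illustrates the general mechanism only, not the exact FuSion architecture.

    ```python
    import torch
    import torch.nn as nn

    class FuseAttention(nn.Module):
        """Current-layer states attend over previous layers' states at the same positions."""
        def __init__(self, d_model, num_heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(d_model)

        def forward(self, current, prev_layers):
            # prev_layers: list of (batch, seq, d_model) tensors from earlier layers
            batch, seq, d = current.shape
            # stack layers along a "memory" axis, then flatten into one key/value sequence
            memory = torch.stack(prev_layers, dim=2).reshape(batch, seq * len(prev_layers), d)
            fused, _ = self.attn(query=current, key=memory, value=memory)
            return self.norm(current + fused)   # residual fusion of previous-layer information

    layers = [torch.randn(2, 5, 32) for _ in range(3)]   # outputs of three earlier layers
    current = torch.randn(2, 5, 32)
    print(FuseAttention(32)(current, layers).shape)       # torch.Size([2, 5, 32])
    ```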

Extreme Multi-Label Skill Extraction Training using Large Language Models

  • paper_url: http://arxiv.org/abs/2307.10778
  • repo_url: None
  • paper_authors: Jens-Joris Decorte, Severine Verlinden, Jeroen Van Hautte, Johannes Deleu, Chris Develder, Thomas Demeester
  • for: 本研究旨在提高在线招聘广告中技能的自动检测精度,以便在劳动市场分析和电子招聘过程中更好地了解技能需求。
  • methods: 本研究使用自然语言处理(NLP)技术自动处理在线招聘广告,并将检测到的技能链接到大规模技能本体(ontology);由于缺乏大规模标注数据,研究利用通用大型语言模型生成合成标注数据集。
  • results: 结果显示,结合合成标注数据与对比学习策略能够提升技能抽取效果,在三个技能抽取基准数据集上,R-Precision@5 相比此前仅依赖字面匹配远程监督的结果提升了15到25个百分点。
    Abstract Online job ads serve as a valuable source of information for skill requirements, playing a crucial role in labor market analysis and e-recruitment processes. Since such ads are typically formatted in free text, natural language processing (NLP) technologies are required to automatically process them. We specifically focus on the task of detecting skills (mentioned literally, or implicitly described) and linking them to a large skill ontology, making it a challenging case of extreme multi-label classification (XMLC). Given that there is no sizable labeled (training) dataset are available for this specific XMLC task, we propose techniques to leverage general Large Language Models (LLMs). We describe a cost-effective approach to generate an accurate, fully synthetic labeled dataset for skill extraction, and present a contrastive learning strategy that proves effective in the task. Our results across three skill extraction benchmarks show a consistent increase of between 15 to 25 percentage points in \textit{R-Precision@5} compared to previously published results that relied solely on distant supervision through literal matches.
    摘要 在线招聘广告是了解技能需求的重要信息来源,在劳动力市场分析和电子招聘流程中发挥着关键作用。由于此类广告通常以自由文本形式呈现,需要借助自然语言处理(NLP)技术进行自动处理。我们特别关注技能检测任务(无论是字面提及的还是隐含描述的技能),并将其链接到一个大规模技能本体,这构成了一个颇具挑战性的极端多标签分类(XMLC)问题。鉴于该 XMLC 任务缺乏规模可观的标注(训练)数据集,我们提出了利用通用大型语言模型(LLM)的技术:我们描述了一种低成本的方法来生成准确、完全合成的技能抽取标注数据集,并提出了一种在该任务上被证明有效的对比学习策略。在三个技能抽取基准上的结果显示,与此前仅依赖字面匹配远程监督的已发表结果相比,R-Precision@5 一致提升了15到25个百分点。
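    A minimal sketch of an InfoNCE-style contrastive objective between job-ad sentence embeddings and skill-label embeddings, of the kind one might use when linking text to a large skill ontology. The toy embeddings and the temperature are placeholders; the paper's exact model and synthetic-data generation are not reproduced here.

    ```python
    import torch
    import torch.nn.functional as F

    def info_nce(sentence_emb, skill_emb, temperature=0.07):
        """sentence_emb[i] should match skill_emb[i]; other skills in the batch act as negatives."""
        s = F.normalize(sentence_emb, dim=-1)
        k = F.normalize(skill_emb, dim=-1)
        logits = s @ k.T / temperature        # (batch, batch) similarity matrix
        targets = torch.arange(s.size(0))     # the diagonal holds the positive pairs
        return F.cross_entropy(logits, targets)

    # Toy embeddings standing in for encoder outputs (e.g. from a sentence encoder).
    sentences = torch.randn(8, 256)
    skills = torch.randn(8, 256)
    print(info_nce(sentences, skills).item())
    ```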

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition

  • paper_url: http://arxiv.org/abs/2307.10757
  • repo_url: https://github.com/happycolor/vesper
  • paper_authors: Weidong Chen, Xiaofen Xing, Peihao Chen, Xiangmin Xu
  • for: 这个论文提出了一种适应大规模预训练模型(PTM)到语音情感识别任务的 paradigm。
  • methods: 该论文基于 WavLM 在语音数据集上预训练了一个情感专用的编码器,称为 Vesper。Vesper 采用情感引导的掩码策略来确定需要掩码的区域,以提高对情感信息的敏感度,并利用分层及跨层自监督来增强其捕获声学与语义表示的能力。
  • results: 实验结果表明,在 IEMOCAP、MELD 和 CREMA-D 数据集上,4 层的 Vesper 优于 12 层的 WavLM Base,而 12 层的 Vesper 则超越了 24 层的 WavLM Large。
    Abstract This paper presents a paradigm that adapts general large-scale pretrained models (PTMs) to speech emotion recognition task. Although PTMs shed new light on artificial general intelligence, they are constructed with general tasks in mind, and thus, their efficacy for specific tasks can be further improved. Additionally, employing PTMs in practical applications can be challenging due to their considerable size. Above limitations spawn another research direction, namely, optimizing large-scale PTMs for specific tasks to generate task-specific PTMs that are both compact and effective. In this paper, we focus on the speech emotion recognition task and propose an improved emotion-specific pretrained encoder called Vesper. Vesper is pretrained on a speech dataset based on WavLM and takes into account emotional characteristics. To enhance sensitivity to emotional information, Vesper employs an emotion-guided masking strategy to identify the regions that need masking. Subsequently, Vesper employs hierarchical and cross-layer self-supervision to improve its ability to capture acoustic and semantic representations, both of which are crucial for emotion recognition. Experimental results on the IEMOCAP, MELD, and CREMA-D datasets demonstrate that Vesper with 4 layers outperforms WavLM Base with 12 layers, and the performance of Vesper with 12 layers surpasses that of WavLM Large with 24 layers.
    摘要 本文提出了一种将通用大规模预训练模型(PTM)适配到语音情感识别任务的范式。尽管 PTM 为通用人工智能带来了新的启示,但它们是面向通用任务构建的,在特定任务上的效果仍有提升空间;同时其庞大的规模也给实际应用带来困难。上述局限催生了另一研究方向:针对特定任务优化大规模 PTM,得到既紧凑又有效的任务专用模型。本文面向语音情感识别任务,提出了改进的情感专用预训练编码器 Vesper。Vesper 基于 WavLM 在语音数据集上进行预训练,并考虑情感特征:它采用情感引导的掩码策略来确定需要掩码的区域,以增强对情感信息的敏感度;并通过分层与跨层自监督来提升对声学和语义表示(两者对情感识别都至关重要)的捕获能力。在 IEMOCAP、MELD 和 CREMA-D 数据集上的实验结果表明,4 层的 Vesper 优于 12 层的 WavLM Base,12 层的 Vesper 则超越了 24 层的 WavLM Large。

Large language models shape and are shaped by society: A survey of arXiv publication patterns

  • paper_url: http://arxiv.org/abs/2307.10700
  • repo_url: None
  • paper_authors: Rajiv Movva, Sidhika Balachandar, Kenny Peng, Gabriel Agostini, Nikhil Garg, Emma Pierson
  • for: 本研究目的是分析大型自然语言模型(LLM)论文的发展趋势,特别是2023年 vs. 2018-2022年的发表模式。
  • methods: 该研究基于388K篇CS和Stat arXiv上的论文,分析了LLM相关论文的发展趋势,包括论文数量的增加、主题的分布、作者的背景和研究方向的相关性、引用率的分布、国际合作的趋势等方面。
  • results: 研究发现,LLM 研究对社会影响的关注大幅上升:Computers and Society 子领域中 LLM 相关论文的比例增长了 18 倍,新参与 LLM 研究的作者比资深作者更倾向于关注应用和社会影响。此外,研究还记录了 LLM 作者所关注主题上的性别与学术/产业差异,以及合作网络中美中两国的分裂。总体而言,本研究表明 LLM 研究不仅被社会所塑造,也在塑造社会。
    Abstract There has been a steep recent increase in the number of large language model (LLM) papers, producing a dramatic shift in the scientific landscape which remains largely undocumented through bibliometric analysis. Here, we analyze 388K papers posted on the CS and Stat arXivs, focusing on changes in publication patterns in 2023 vs. 2018-2022. We analyze how the proportion of LLM papers is increasing; the LLM-related topics receiving the most attention; the authors writing LLM papers; how authors' research topics correlate with their backgrounds; the factors distinguishing highly cited LLM papers; and the patterns of international collaboration. We show that LLM research increasingly focuses on societal impacts: there has been an 18x increase in the proportion of LLM-related papers on the Computers and Society sub-arXiv, and authors newly publishing on LLMs are more likely to focus on applications and societal impacts than more experienced authors. LLM research is also shaped by social dynamics: we document gender and academic/industry disparities in the topics LLM authors focus on, and a US/China schism in the collaboration network. Overall, our analysis documents the profound ways in which LLM research both shapes and is shaped by society, attesting to the necessity of sociotechnical lenses.
    摘要 近年来大语言模型(LLM)论文数量急剧增加,使科研格局发生了显著变化,但这一变化尚未通过文献计量分析得到系统记录。本文分析了发布在 CS 与 Stat arXiv 上的 38.8 万篇论文,重点比较 2023 年与 2018-2022 年的发表模式:LLM 论文所占比例如何增长、哪些 LLM 相关主题最受关注、哪些作者在撰写 LLM 论文、作者的研究主题与其背景的相关性、高被引 LLM 论文的区分因素,以及国际合作的模式。结果显示 LLM 研究日益聚焦于社会影响:Computers and Society 子领域中 LLM 相关论文的比例增长了 18 倍,新加入 LLM 研究的作者比资深作者更倾向于关注应用与社会影响。LLM 研究同样受到社会动态的影响:我们记录了作者关注主题上的性别与学术/产业差异,以及合作网络中的美中分裂。总体而言,我们的分析揭示了 LLM 研究与社会相互塑造的深刻方式,印证了社会技术视角的必要性。

A Dataset and Strong Baselines for Classification of Czech News Texts

  • paper_url: http://arxiv.org/abs/2307.10666
  • repo_url: https://github.com/hynky1999/czech-news-classification-dataset
  • paper_authors: Hynek Kydlíček, Jindřich Libovický
  • for: 通过来自多家新闻源、跨越二十多年的新闻文章,对捷克语自然语言处理模型进行更严格的评估;定义了新闻来源、新闻类别、推断的作者性别和发表星期几四个分类任务。
  • methods: 使用各种先进的自然语言处理技术和大规模生成语言模型进行评估。
  • results: 人工评估表明,人类的表现落后于基于预训练 Transformer 模型构建的强机器学习基线;而基于语言特定预训练编码器的分析优于所选的商用大规模生成语言模型。
    Abstract Pre-trained models for Czech Natural Language Processing are often evaluated on purely linguistic tasks (POS tagging, parsing, NER) and relatively simple classification tasks such as sentiment classification or article classification from a single news source. As an alternative, we present CZEch~NEws~Classification~dataset (CZE-NEC), one of the largest Czech classification datasets, composed of news articles from various sources spanning over twenty years, which allows a more rigorous evaluation of such models. We define four classification tasks: news source, news category, inferred author's gender, and day of the week. To verify the task difficulty, we conducted a human evaluation, which revealed that human performance lags behind strong machine-learning baselines built upon pre-trained transformer models. Furthermore, we show that language-specific pre-trained encoder analysis outperforms selected commercially available large-scale generative language models.
    摘要 捷克语自然语言处理的预训练模型通常只在纯语言学任务(词性标注、句法分析、命名实体识别)以及相对简单的分类任务(如情感分类或来自单一新闻源的文章分类)上进行评估。作为替代,我们提出了 CZEch NEws Classification dataset(CZE-NEC),这是规模最大的捷克语分类数据集之一,由跨越二十多年、来自多家媒体的新闻文章组成,能够对模型进行更严格的评估。我们定义了四个分类任务:新闻来源、新闻类别、推断的作者性别以及发表的星期几。为了验证任务难度,我们进行了人工评估,结果显示人类表现落后于基于预训练 Transformer 模型构建的强机器学习基线。此外,我们还表明,基于语言特定预训练编码器的分析优于所选的商用大规模生成语言模型。

Exploring the Landscape of Natural Language Processing Research

  • paper_url: http://arxiv.org/abs/2307.10652
  • repo_url: https://github.com/sebischair/exploring-nlp-research
  • paper_authors: Tim Schopf, Karim Arabi, Florian Matthes
  • for: 本研究旨在提供一份系统性地分类和分析ACL Anthology中的NLP研究论文,以提供研究领域的结构化概述、领域分类、最新发展和未来研究方向。
  • methods: 本研究使用系统性的分类和分析方法,对ACL Anthology中的NLP研究论文进行了分类和分析,从而提供了研究领域的结构化概述、领域分类和最新发展。
  • results: 本研究结果显示,NLP领域的研究主要涉及到语义理解、语言模型、自然语言处理、语音识别等领域,并且在最新的发展中,深度学习、word embeddings等技术在NLP领域中得到了广泛的应用和发展。
    Abstract As an efficient approach to understand, generate, and process natural language texts, research in natural language processing (NLP) has exhibited a rapid spread and wide adoption in recent years. Given the increasing research work in this area, several NLP-related approaches have been surveyed in the research community. However, a comprehensive study that categorizes established topics, identifies trends, and outlines areas for future research remains absent. Contributing to closing this gap, we have systematically classified and analyzed research papers in the ACL Anthology. As a result, we present a structured overview of the research landscape, provide a taxonomy of fields of study in NLP, analyze recent developments in NLP, summarize our findings, and highlight directions for future work.
    摘要 作为理解、生成和处理自然语言文本的有效途径,自然语言处理(NLP)研究近年来迅速扩展并得到广泛采用。随着该领域研究工作的不断增加,学界已对若干 NLP 相关方法进行了综述,但仍缺乏一项能够系统归类既有主题、识别趋势并指出未来研究方向的全面研究。为填补这一空白,我们对 ACL Anthology 中的研究论文进行了系统分类与分析,给出了研究版图的结构化概览,提出了 NLP 研究领域的分类体系,分析了 NLP 的最新进展,总结了我们的发现,并指出了未来工作的方向。

Generative Language Models on Nucleotide Sequences of Human Genes

  • paper_url: http://arxiv.org/abs/2307.10634
  • repo_url: https://github.com/boun-tabi/generativelm-genes
  • paper_authors: Musa Nuri Ihtiyar, Arzucan Ozgur
  • for: 本研究旨在开发一个基于转换器的生成语言模型,以探讨DNA序列生成的可能性。
  • methods: 研究使用了RNN和N-gram等简单技术,以及一些实际生活中的任务来评估模型性能。
  • results: 研究发现,使用生成模型可以在DNA序列生成中 дости得比较好的效果,但是数据充足性仍然是一个问题。
    Abstract Language models, primarily transformer-based ones, obtained colossal success in NLP. To be more precise, studies like BERT in NLU and works such as GPT-3 for NLG are very crucial. DNA sequences are very close to natural language in terms of structure, so if the DNA-related bioinformatics domain is concerned, discriminative models, like DNABert, exist. Yet, the generative side of the coin is mainly unexplored to the best of our knowledge. Consequently, we focused on developing an autoregressive generative language model like GPT-3 for DNA sequences. Because working with whole DNA sequences is challenging without substantial computational resources, we decided to carry out our study on a smaller scale, focusing on nucleotide sequences of human genes, unique parts in DNA with specific functionalities, instead of the whole DNA. This decision did not change the problem structure a lot due to the fact that both DNA and genes can be seen as 1D sequences consisting of four different nucleotides without losing much information and making too much simplification. First of all, we systematically examined an almost entirely unexplored problem and observed that RNNs performed the best while simple techniques like N-grams were also promising. Another beneficial point was learning how to work with generative models on languages we do not understand, unlike natural language. How essential using real-life tasks beyond the classical metrics such as perplexity is observed. Furthermore, checking whether the data-hungry nature of these models can be changed through selecting a language with minimal vocabulary size, four owing to four different types of nucleotides, is examined. The reason for reviewing this was that choosing such a language might make the problem easier. However, what we observed in this study was it did not provide that much of a change in the amount of data needed.
    摘要 语言模型,尤其是基于 Transformer 的模型,在自然语言处理中取得了巨大成功:自然语言理解方面以 BERT 为代表,自然语言生成方面以 GPT-3 为代表。DNA 序列在结构上与自然语言十分接近,因此在生物信息学领域已经存在 DNABert 等判别式模型,但据我们所知,生成式方向基本尚未被探索。为此,我们着手为 DNA 序列开发一个类似 GPT-3 的自回归生成语言模型。由于在缺乏大量计算资源的情况下处理完整 DNA 序列十分困难,我们选择在较小规模上开展研究,聚焦于人类基因的核苷酸序列(DNA 中具有特定功能的独特片段)而非整条 DNA;由于 DNA 和基因都可以看作由四种核苷酸构成的一维序列,这一选择并没有明显改变问题结构,也没有造成过度简化。我们系统地考察了这一几乎未被探索的问题,发现 RNN 表现最佳,而 N-gram 等简单技术也颇有潜力;我们还学习了如何在我们并不理解的"语言"上使用生成模型,并观察到超越困惑度等经典指标、使用真实任务进行评估的重要性。此外,我们检验了选择词表极小(仅四种核苷酸)的语言能否改变这类模型对数据的依赖程度;结果表明,这并没有明显减少所需的数据量。
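As a toy illustration of the simple baselines mentioned above, here is a character-level n-gram language model over the four-letter nucleotide alphabet. The order, smoothing (uniform back-off), and example sequences are assumptions for demonstration only.

```python
# Character-level n-gram generator over ACGT, the kind of simple baseline the
# paper compares against neural generators.
import random
from collections import Counter, defaultdict

def train_ngram(sequences, n=4):
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = "^" * (n - 1) + seq          # "^" marks the sequence start
        for i in range(len(seq)):
            ctx, nxt = padded[i:i + n - 1], padded[i + n - 1]
            counts[ctx][nxt] += 1
    return counts

def generate(counts, n=4, length=60, alphabet="ACGT"):
    ctx, out = "^" * (n - 1), []
    for _ in range(length):
        dist = counts.get(ctx, Counter(alphabet))   # back off to uniform
        chars, weights = zip(*dist.items())
        nxt = random.choices(chars, weights=weights)[0]
        out.append(nxt)
        ctx = (ctx + nxt)[-(n - 1):]
    return "".join(out)

counts = train_ngram(["ATGGCCATTGTAATGGGCCGCTG", "ATGAAACGCATTAGCACCACC"])
print(generate(counts))
```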

Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa

  • paper_url: http://arxiv.org/abs/2307.10633
  • repo_url: None
  • paper_authors: Shriyash K. Upadhyay, Etan J. Ginsberg
  • for: 提高语言模型的可用性和性能
  • methods: 多种方法自动训练
  • results: 1) 提升较弱方法的性能(最高提升30%),2) 提升较强方法的性能(最高提升32.2%),3) 通过增强模型生成推理过程(rationale)的能力,提升相关但不同任务的性能(最高提升10.3%)。
    Abstract Large Language Models have many methods for solving the same problem. This introduces novel strengths (different methods may work well for different problems) and weaknesses (it may be difficult for users to know which method to use). In this paper, we introduce Multi-Method Self-Training (MMST), where one method is trained on the filtered outputs of another, allowing us to augment the strengths and ameliorate the weaknesses of each method. Using a 176B parameter model trained on both language and code, we show that MMST can 1) improve the less performant method (up to 30%) making the model easier to use, 2) improve the more performant method (up to 32.2%) making the model more performant, and 3) improve the performance of related but distinct tasks (up to 10.3%) by improving the ability of the model to generate rationales. We then conduct ablation analyses to explore why MMST works. We show that MMST generates more data than traditional self-training, but the improvement in performance is driven by the use of multiple methods. We also analyze prompt-engineering and anti-correlated performance between methods as means of making MMST more effective. We hope the evidence from our paper motivates machine learning researchers to explore ways in which advances in language models allow for new forms of training.
    摘要 大语言模型往往可以用多种方法求解同一个问题,这既带来了新的优势(不同方法可能适合不同的问题),也带来了新的劣势(用户可能难以判断该使用哪种方法)。本文提出多方法自训练(Multi-Method Self-Training, MMST):用一种方法经过筛选的输出来训练另一种方法,从而放大各方法的优势并弥补其不足。基于一个同时在语言和代码上训练的 1760 亿参数模型,我们证明 MMST 可以:1) 提升较弱方法的性能(最高30%),使模型更易于使用;2) 提升较强方法的性能(最高32.2%),使模型更强大;3) 通过提升模型生成推理过程的能力,改进相关但不同任务的性能(最高10.3%)。我们随后进行消融分析以探究 MMST 为何有效:MMST 比传统自训练产生更多数据,但性能提升主要来自多种方法的结合;我们还分析了提示工程以及方法间负相关的性能作为提升 MMST 有效性的手段。我们希望本文的证据能够激励机器学习研究者探索语言模型进步所带来的新训练方式。
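The following schematic sketches one round of multi-method self-training as described above: solutions produced by each method are filtered and the surviving examples are used to fine-tune the model for both methods. The function names, the filter, and the loop structure are placeholders, not the authors' code.

```python
# One round of multi-method self-training (schematic). `generate_a`/`generate_b`
# stand for two solution methods (e.g. free-text reasoning vs. code generation),
# `is_correct` is a filter (e.g. answer checking or cross-method agreement), and
# `fine_tune` updates the model on the kept examples.
def self_train_round(model, problems, generate_a, generate_b, is_correct, fine_tune):
    new_data = []
    for problem in problems:
        for method, generate in (("text", generate_a), ("code", generate_b)):
            answer = generate(model, problem)
            if is_correct(problem, answer):
                # a kept solution from one method becomes training data the
                # other method also benefits from on the same problem
                new_data.append({"problem": problem, "method": method, "answer": answer})
    return fine_tune(model, new_data)
```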

A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos

  • paper_url: http://arxiv.org/abs/2307.10587
  • repo_url: None
  • paper_authors: Anand Kumar Rai, Siddharth D Jaiswal, Animesh Mukherjee
  • for: 这个论文是为了评估自动语音识别系统在不同地区和民族群体中的性能,并提出了更加包容和可靠的ASR系统和数据集。
  • methods: 该论文构建了来自 NPTEL MOOC 平台、总计 8740 小时的大规模技术讲座语音数据集及其转写文本,并利用 YouTube 自动字幕和 OpenAI Whisper 模型评估印度不同地区和人群的语音特征对自动语音识别系统性能的影响。
  • results: 研究发现,说话者的性别、地域、年龄和语速等因素会导致自动语音识别系统的性能差异,但基于种姓(caste)的差异并不存在;不同学科的讲座之间也存在统计显著的差异。这些结果表明需要更具包容性和鲁棒性的 ASR 系统,以及更具代表性的数据集来评估此类差异。
    Abstract Automatic speech recognition (ASR) systems are designed to transcribe spoken language into written text and find utility in a variety of applications including voice assistants and transcription services. However, it has been observed that state-of-the-art ASR systems which deliver impressive benchmark results, struggle with speakers of certain regions or demographics due to variation in their speech properties. In this work, we describe the curation of a massive speech dataset of 8740 hours consisting of $\sim9.8$K technical lectures in the English language along with their transcripts delivered by instructors representing various parts of Indian demography. The dataset is sourced from the very popular NPTEL MOOC platform. We use the curated dataset to measure the existing disparity in YouTube Automatic Captions and OpenAI Whisper model performance across the diverse demographic traits of speakers in India. While there exists disparity due to gender, native region, age and speech rate of speakers, disparity based on caste is non-existent. We also observe statistically significant disparity across the disciplines of the lectures. These results indicate the need of more inclusive and robust ASR systems and more representational datasets for disparity evaluation in them.
    摘要 自动语音识别(ASR)系统用于将口语转写为文本,广泛应用于语音助手和转写服务等场景。然而,即便是在基准测试中表现出色的最先进 ASR 系统,也会因说话者语音特征的差异而在某些地区或人群上表现不佳。本文构建了一个总计 8740 小时的大规模语音数据集,包含约 9.8 千场英语技术讲座及其转写文本,讲者来自印度不同人口群体,数据来源于广受欢迎的 NPTEL MOOC 平台。我们利用该数据集衡量 YouTube 自动字幕和 OpenAI Whisper 模型在印度不同人口特征说话者上的性能差异。结果显示,性别、地域、年龄和语速都会带来差异,而基于种姓的差异并不存在;不同学科的讲座之间也存在统计显著的差异。这些结果表明需要更具包容性和鲁棒性的 ASR 系统,以及更具代表性的数据集来评估其中的差异。
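A minimal sketch of the kind of disparity analysis described above: word error rate (WER) is computed separately for each demographic group and the gaps compared. The dataframe column names are assumed, not the released schema.

```python
# Per-group WER computation using the jiwer package (pip install jiwer).
import pandas as pd
from jiwer import wer

def wer_by_group(df: pd.DataFrame, group_col: str) -> pd.Series:
    # df columns assumed: 'reference' (ground-truth transcript),
    # 'hypothesis' (ASR output), plus a demographic column such as 'gender'.
    return df.groupby(group_col).apply(
        lambda g: wer(list(g["reference"]), list(g["hypothesis"]))
    )

# Example: wer_by_group(results, "native_region") would expose region-level gaps.
```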

Instruction-following Evaluation through Verbalizer Manipulation

  • paper_url: http://arxiv.org/abs/2307.10558
  • repo_url: None
  • paper_authors: Shiyang Li, Jun Yan, Hai Wang, Zheng Tang, Xiang Ren, Vijay Srinivasan, Hongxia Jin
  • for: 评估指令微调语言模型能否准确遵循指令,尤其是面对不太自然或出乎意料的指令时的表现。
  • methods: 提出名为 verbalizer manipulation(标签词操纵)的评估协议:要求模型使用与其先验不同程度一致的词语来表述任务标签,从与先验高度一致到最低程度一致。该协议可与任何分类基准结合,用于考察模型对先验的依赖程度及其为准确遵循指令而覆盖先验的能力。
  • results: 对四大模型家族在九个数据集、各十二组标签词上的评估显示,不同家族和规模的模型在不太自然的标签词上的指令遵循能力差异显著;即使是最强的 GPT-4,在最具挑战性的标签词上也难以超过随机猜测,说明仍需持续改进模型的指令遵循能力。
    Abstract While instruction-tuned models have shown remarkable success in various natural language processing tasks, accurately evaluating their ability to follow instructions remains challenging. Existing benchmarks primarily focus on common instructions that align well with what the model learned during training. However, proficiency in responding to these instructions does not necessarily imply strong ability in instruction following. In this paper, we propose a novel instruction-following evaluation protocol called verbalizer manipulation. It instructs the model to verbalize the task label with words aligning with model priors to different extents, adopting verbalizers from highly aligned (e.g., outputting ``postive'' for positive sentiment), to minimally aligned (e.g., outputting ``negative'' for positive sentiment). Verbalizer manipulation can be seamlessly integrated with any classification benchmark to examine the model's reliance on priors and its ability to override them to accurately follow the instructions. We conduct a comprehensive evaluation of four major model families across nine datasets, employing twelve sets of verbalizers for each of them. We observe that the instruction-following abilities of models, across different families and scales, are significantly distinguished by their performance on less natural verbalizers. Even the strongest GPT-4 model struggles to perform better than random guessing on the most challenging verbalizer, emphasizing the need for continued advancements to improve their instruction-following abilities.
    摘要 尽管经过指令微调的模型在各类自然语言处理任务上取得了显著成功,但准确评估其遵循指令的能力仍然困难。现有基准主要关注与模型训练所学内容高度一致的常见指令,而能够很好地回应这些指令并不必然意味着很强的指令遵循能力。本文提出一种新的指令遵循评估协议——verbalizer manipulation(标签词操纵):要求模型使用与其先验不同程度一致的词语来表述任务标签,从高度一致(例如对正面情感输出"positive")到最低程度一致(例如对正面情感输出"negative")。该协议可以无缝结合任何分类基准,用于考察模型对先验的依赖程度以及为准确遵循指令而覆盖先验的能力。我们对四大模型家族在九个数据集上进行了全面评估,每个数据集使用十二组标签词。结果显示,不同家族和规模的模型的指令遵循能力在不太自然的标签词上差异显著;即使是最强的 GPT-4,在最具挑战性的标签词上也难以超过随机猜测,凸显了继续提升指令遵循能力的必要性。
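To make the verbalizer-manipulation protocol concrete, here is an illustrative way to build prompts for binary sentiment with label words that agree with, are neutral to, or contradict the model's priors. The exact wording is an assumption; only the idea of swapping verbalizers comes from the paper.

```python
# Construct prompts whose label words are aligned, neutral, or flipped
# relative to the model's priors, then score answers against the gold label.
VERBALIZERS = {
    "natural": {1: "positive", 0: "negative"},   # aligned with priors
    "neutral": {1: "foo",      0: "bar"},        # minimally informative
    "flipped": {1: "negative", 0: "positive"},   # contradicts priors
}

def build_prompt(text: str, mode: str) -> str:
    v = VERBALIZERS[mode]
    return (f'Review: "{text}"\n'
            f'If the sentiment is favourable answer "{v[1]}", '
            f'otherwise answer "{v[0]}". Answer:')

def is_correct(model_answer: str, gold_label: int, mode: str) -> bool:
    return model_answer.strip().lower() == VERBALIZERS[mode][gold_label]

# Accuracy under the "flipped" verbalizer measures how well the model can
# override its priors to follow the instruction literally.
```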

Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language Models

  • paper_url: http://arxiv.org/abs/2307.10522
  • repo_url: None
  • paper_authors: Somayeh Ghanbarzadeh, Yan Huang, Hamid Palangi, Radames Cruz Moreno, Hamed Khanpour
  • for: 降低 PLM 中的性别偏见,同时保持甚至提升其在下游任务上的性能。
  • methods: 提出了 Gender-tuning 方法:在下游任务数据集上将掩码语言模型(MLM)训练目标整合进微调过程,以降低 PLM 中的社会偏见。
  • results: Gender-tuning 方法可以在 PLM 中减少平均性别偏见得分,同时提高 PLM 在下游任务上的性能,不需要额外的训练数据集和资源投入。
    Abstract Recent studies have revealed that the widely-used Pre-trained Language Models (PLMs) propagate societal biases from the large unmoderated pre-training corpora. Existing solutions require debiasing training processes and datasets for debiasing, which are resource-intensive and costly. Furthermore, these methods hurt the PLMs' performance on downstream tasks. In this study, we propose Gender-tuning, which debiases the PLMs through fine-tuning on downstream tasks' datasets. For this aim, Gender-tuning integrates Masked Language Modeling (MLM) training objectives into fine-tuning's training process. Comprehensive experiments show that Gender-tuning outperforms the state-of-the-art baselines in terms of average gender bias scores in PLMs while improving PLMs' performance on downstream tasks solely using the downstream tasks' dataset. Also, Gender-tuning is a deployable debiasing tool for any PLM that works with original fine-tuning.
    摘要 近期研究表明,广泛使用的预训练语言模型(PLM)会从大规模、未经审核的预训练语料中继承社会偏见。现有解决方案需要专门的去偏训练过程和数据集,资源消耗大、成本高,而且会损害 PLM 在下游任务上的性能。本文提出 Gender-tuning:仅通过在下游任务数据集上微调即可为 PLM 去偏。为此,Gender-tuning 将掩码语言模型(MLM)训练目标整合进微调的训练过程。大量实验表明,Gender-tuning 在 PLM 的平均性别偏见得分上优于当前最先进的基线,同时仅使用下游任务数据集就能提升 PLM 在下游任务上的表现。此外,Gender-tuning 是一个可部署的去偏工具,适用于任何采用常规微调的 PLM。
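A rough sketch of combining a masked-language-modeling objective with the downstream classification objective during fine-tuning, the general recipe behind approaches like Gender-tuning. The encoder is assumed to be a Hugging Face-style model returning `.last_hidden_state`; the masking rate, loss weight, and the decision to mask special tokens freely are simplifying assumptions.

```python
# Joint MLM + classification fine-tuning step (sketch).
import torch
import torch.nn.functional as F

def joint_step(encoder, mlm_head, cls_head, input_ids, attention_mask, labels,
               mask_token_id, mlm_prob=0.15, alpha=1.0):
    # 1) Classification loss on the clean input (CLS token pooled)
    h = encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
    cls_loss = F.cross_entropy(cls_head(h[:, 0]), labels)

    # 2) MLM loss on a randomly masked copy of the same batch
    masked = input_ids.clone()
    mlm_mask = (torch.rand_like(input_ids, dtype=torch.float) < mlm_prob) & attention_mask.bool()
    masked[mlm_mask] = mask_token_id
    h_m = encoder(input_ids=masked, attention_mask=attention_mask).last_hidden_state
    mlm_logits = mlm_head(h_m)                                  # (B, T, vocab)
    mlm_loss = F.cross_entropy(mlm_logits[mlm_mask], input_ids[mlm_mask])

    return cls_loss + alpha * mlm_loss
```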

Transsion TSUP’s speech recognition system for ASRU 2023 MADASR Challenge

  • paper_url: http://arxiv.org/abs/2307.11778
  • repo_url: None
  • paper_authors: Xiaoxiao Li, Gaosheng Zhang, An Zhu, Weiyong Li, Shuming Fang, Xiaoyue Yang, Jianchao Zhu
  • For: 这篇论文描述了由Transsion Speech Understanding Processing Team (TSUP)开发的一种扩展ASR模型,用于ASRU 2023 MADASR Challenge。该系统强调适应低资源印度语言的ASR模型,并覆盖了挑战赛的四个轨道。* Methods: 在轨道1和2中,音响模型使用了压缩形态器编码器和双向转换器解码器,并在CTC-Attention培育训练中使用了共同训练损失。此外,在TLG beam search解码中使用了外部KenLM语言模型。在轨道3和4中,采用了预训练的IndicWhisper模型,并在挑战数据集和公共可用数据集上进行了finetuning。另外,在喊叫搜索解码中支持了外部KenLM语言模型,以便更好地利用挑战中提供的额外文本。* Results: 提案的方法在四个轨道中取得了 Bengali语言的单词错误率(WER)为24.17%、24.43%、15.97%和15.97%,以及 Bhojpuri语言的WER为19.61%、19.54%、15.48%和15.48%。这些结果表明提案的方法的效果。
    Abstract This paper presents a speech recognition system developed by the Transsion Speech Understanding Processing Team (TSUP) for the ASRU 2023 MADASR Challenge. The system focuses on adapting ASR models for low-resource Indian languages and covers all four tracks of the challenge. For tracks 1 and 2, the acoustic model utilized a squeezeformer encoder and bidirectional transformer decoder with joint CTC-Attention training loss. Additionally, an external KenLM language model was used during TLG beam search decoding. For tracks 3 and 4, pretrained IndicWhisper models were employed and finetuned on both the challenge dataset and publicly available datasets. The whisper beam search decoding was also modified to support an external KenLM language model, which enabled better utilization of the additional text provided by the challenge. The proposed method achieved word error rates (WER) of 24.17%, 24.43%, 15.97%, and 15.97% for Bengali language in the four tracks, and WER of 19.61%, 19.54%, 15.48%, and 15.48% for Bhojpuri language in the four tracks. These results demonstrate the effectiveness of the proposed method.
    摘要 本文介绍了传音语音理解处理团队(TSUP)为 ASRU 2023 MADASR 挑战赛开发的语音识别系统。该系统专注于为低资源印度语言适配 ASR 模型,覆盖了挑战赛的全部四个赛道。在赛道1和2中,声学模型采用 Squeezeformer 编码器和双向 Transformer 解码器,使用 CTC-Attention 联合训练损失,并在 TLG 波束搜索解码中引入外部 KenLM 语言模型。在赛道3和4中,采用预训练的 IndicWhisper 模型并在挑战赛数据集和公开数据集上微调,同时修改 Whisper 波束搜索解码以支持外部 KenLM 语言模型,从而更好地利用挑战赛提供的额外文本。所提方法在四个赛道上对孟加拉语分别取得 24.17%、24.43%、15.97% 和 15.97% 的词错误率(WER),对博杰普尔语分别取得 19.61%、19.54%、15.48% 和 15.48% 的 WER,证明了该方法的有效性。
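The joint CTC-attention objective used by hybrid ASR systems like the one above is simply a weighted sum of a CTC loss on the encoder outputs and a cross-entropy loss on the attention decoder outputs. The 0.3 weight is a common choice assumed here, not a value taken from the paper.

```python
# Joint CTC-attention training loss (sketch).
import torch
import torch.nn.functional as F

def joint_ctc_attention_loss(ctc_log_probs, ctc_targets, input_lengths, target_lengths,
                             dec_logits, dec_targets, ctc_weight=0.3,
                             blank_id=0, pad_id=-100):
    # ctc_log_probs: (T, B, vocab) log-softmax outputs of the encoder branch
    # ctc_targets:   (B, S) padded label indices; true lengths in target_lengths
    # dec_logits:    (B, L, vocab) attention-decoder outputs (teacher forcing)
    # dec_targets:   (B, L) decoder targets, padded with pad_id
    ctc = F.ctc_loss(ctc_log_probs, ctc_targets, input_lengths, target_lengths,
                     blank=blank_id, zero_infinity=True)
    att = F.cross_entropy(dec_logits.transpose(1, 2), dec_targets, ignore_index=pad_id)
    return ctc_weight * ctc + (1.0 - ctc_weight) * att
```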

General Debiasing for Multimodal Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2307.10511
  • repo_url: https://github.com/Teng-Sun/GEAR
  • paper_authors: Teng Sun, Juntong Ni, Wenjie Wang, Liqiang Jing, Yinwei Wei, Liqiang Nie
  • for: 这个论文主要针对 Multimodal Sentiment Analysis (MSA) 领域的问题,即如何减少模型对偏扰关系的依赖性,以提高模型的 Out-Of-Distribution (OOD) 泛化能力。
  • methods: 该论文提出了一种通用的减少偏扰关系的框架,基于 Inverse Probability Weighting (IPW) 技术,可以适应不同的数据集和模型。这个框架包括两个主要步骤:1) 分解每个模式中的可靠特征和偏扰特征,2) 使用偏扰特征来估算样本的偏扰程度。最后,使用 IPW 技术来减少大偏扰样本的影响,以便学习有 robustness 的特征 для 情感预测。
  • results: 该论文通过使用多个 benchmark 和 OOD 测试集来评估模型的泛化能力,并证明了其在不同的数据集和模型下的超越性。 codes 和数据可以在 https://github.com/Teng-Sun/GEAR 上下载。
    Abstract Existing work on Multimodal Sentiment Analysis (MSA) utilizes multimodal information for prediction yet unavoidably suffers from fitting the spurious correlations between multimodal features and sentiment labels. For example, if most videos with a blue background have positive labels in a dataset, the model will rely on such correlations for prediction, while "blue background" is not a sentiment-related feature. To address this problem, we define a general debiasing MSA task, which aims to enhance the Out-Of-Distribution (OOD) generalization ability of MSA models by reducing their reliance on spurious correlations. To this end, we propose a general debiasing framework based on Inverse Probability Weighting (IPW), which adaptively assigns small weights to the samples with larger bias (i.e., the severer spurious correlations). The key to this debiasing framework is to estimate the bias of each sample, which is achieved by two steps: 1) disentangling the robust features and biased features in each modality, and 2) utilizing the biased features to estimate the bias. Finally, we employ IPW to reduce the effects of large-biased samples, facilitating robust feature learning for sentiment prediction. To examine the model's generalization ability, we keep the original testing sets on two benchmarks and additionally construct multiple unimodal and multimodal OOD testing sets. The empirical results demonstrate the superior generalization ability of our proposed framework. We have released the code and data to facilitate the reproduction https://github.com/Teng-Sun/GEAR.
    摘要 现有的多模态情感分析(MSA)研究利用多模态信息进行预测,但不可避免地会拟合多模态特征与情感标签之间的虚假相关。例如,如果数据集中大多数蓝色背景的视频都带有正面标签,模型就会依赖这种相关性进行预测,而"蓝色背景"并不是与情感相关的特征。为解决这一问题,我们定义了一个通用的去偏 MSA 任务,旨在通过减少模型对虚假相关的依赖来提升 MSA 模型的分布外(OOD)泛化能力。为此,我们提出了一个基于逆概率加权(IPW)的通用去偏框架,自适应地为偏差较大(即虚假相关更严重)的样本分配较小的权重。该框架的关键在于估计每个样本的偏差,这通过两步实现:1) 在每个模态中解耦鲁棒特征与偏差特征;2) 利用偏差特征估计样本的偏差。最后,我们利用 IPW 削弱高偏差样本的影响,促进面向情感预测的鲁棒特征学习。为考察模型的泛化能力,我们保留了两个基准的原始测试集,并额外构建了多个单模态与多模态的 OOD 测试集。实验结果证明了所提框架优越的泛化能力。代码与数据已发布于 https://github.com/Teng-Sun/GEAR。
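A schematic of the inverse-probability-weighting step described above: a per-sample bias score estimated from the biased features is turned into a loss weight so that strongly biased samples contribute less. The mapping from bias to weight and the normalization are simplifying assumptions.

```python
# IPW-style reweighting of a per-sample loss (sketch).
import torch

def ipw_weighted_loss(per_sample_loss: torch.Tensor, bias_score: torch.Tensor,
                      eps: float = 1e-6) -> torch.Tensor:
    # per_sample_loss: (batch,) unreduced task loss
    # bias_score:      (batch,) larger = more reliant on spurious correlations
    weights = 1.0 / (bias_score + eps)
    weights = weights / weights.sum() * len(weights)   # normalize to mean 1
    return (weights.detach() * per_sample_loss).mean()
```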

SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval

  • paper_url: http://arxiv.org/abs/2307.10488
  • repo_url: https://github.com/thakur-nandan/sprint
  • paper_authors: Nandan Thakur, Kexin Wang, Iryna Gurevych, Jimmy Lin
  • for: 本研究的目的是提供一个Python工具包(SPRINT),用于评估神经稀疏检索模型。
  • methods: 本研究使用了Pyserini和Lucene创建了一个共同接口,用于支持多种神经稀疏检索模型的评估。现有五种内置模型:uniCOIL、DeepImpact、SPARTA、TILDEv2和SPLADEv2。用户也可以轻松地添加自定义模型,只需要定义权重方法即可。
  • results: 使用SPRINT工具包,我们在BEIR benchmark上建立了强大和可重复的零基eline神经稀疏检索基线。我们的结果显示,SPLADEv2在BEIR上的平均得分为0.470 nDCG@10,比其他神经稀疏检索模型高。此外,我们还发现了SPLADEv2的性能提升的原因,即它生成的稀疏表示中大多数的字符在查询和文档之外,这经常是其性能提升的关键。我们在https://github.com/thakur-nandan/sprint中公开了我们的SPRINT工具包、模型和实验所用的数据。
    Abstract Traditionally, sparse retrieval systems relied on lexical representations to retrieve documents, such as BM25, dominated information retrieval tasks. With the onset of pre-trained transformer models such as BERT, neural sparse retrieval has led to a new paradigm within retrieval. Despite the success, there has been limited software supporting different sparse retrievers running in a unified, common environment. This hinders practitioners from fairly comparing different sparse models and obtaining realistic evaluation results. Another missing piece is, that a majority of prior work evaluates sparse retrieval models on in-domain retrieval, i.e. on a single dataset: MS MARCO. However, a key requirement in practical retrieval systems requires models that can generalize well to unseen out-of-domain, i.e. zero-shot retrieval tasks. In this work, we provide SPRINT, a unified Python toolkit based on Pyserini and Lucene, supporting a common interface for evaluating neural sparse retrieval. The toolkit currently includes five built-in models: uniCOIL, DeepImpact, SPARTA, TILDEv2 and SPLADEv2. Users can also easily add customized models by defining their term weighting method. Using our toolkit, we establish strong and reproducible zero-shot sparse retrieval baselines across the well-acknowledged benchmark, BEIR. Our results demonstrate that SPLADEv2 achieves the best average score of 0.470 nDCG@10 on BEIR amongst all neural sparse retrievers. In this work, we further uncover the reasons behind its performance gain. We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document which is often crucial for its performance gains, i.e. a limitation among its other sparse counterparts. We provide our SPRINT toolkit, models, and data used in our experiments publicly here at https://github.com/thakur-nandan/sprint.
    摘要 传统的稀疏检索系统依赖词汇表示(如 BM25)来检索文档,长期主导着信息检索任务。随着 BERT 等预训练 Transformer 模型的出现,神经稀疏检索开辟了检索领域的新范式。尽管取得了成功,但目前缺乏能够在统一环境中运行不同稀疏检索器的软件支持,这使研究者难以公平比较不同稀疏模型并获得真实的评估结果。另一个缺失是,此前大多数工作只在域内(即单一数据集 MS MARCO)上评估稀疏检索模型,而实际检索系统要求模型能够很好地泛化到未见过的域外任务,即零样本检索。本文提出 SPRINT,一个基于 Pyserini 和 Lucene 的统一 Python 工具包,为评估神经稀疏检索提供通用接口。该工具包目前内置五个模型:uniCOIL、DeepImpact、SPARTA、TILDEv2 和 SPLADEv2;用户也可以通过定义词项加权方式轻松加入自定义模型。利用该工具包,我们在公认的 BEIR 基准上建立了强大且可复现的零样本稀疏检索基线。结果显示,SPLADEv2 在所有神经稀疏检索器中取得最佳的平均成绩(BEIR 上 nDCG@10 为 0.470)。我们进一步揭示了其性能提升的原因:SPLADEv2 生成的稀疏表示中大部分词元位于原始查询和文档之外,这通常是其性能提升的关键,也是其他稀疏模型的一个局限。我们在 https://github.com/thakur-nandan/sprint 公开了 SPRINT 工具包、模型以及实验所用数据。

FinGPT: Democratizing Internet-scale Data for Financial Large Language Models

  • paper_url: http://arxiv.org/abs/2307.10485
  • repo_url: https://github.com/ai4finance-foundation/fingpt
  • paper_authors: Xiao-Yang Liu, Guoxuan Wang, Daochen Zha
  • For: FinGPT aims to democratize FinLLMs and stimulate innovation in open finance by providing researchers and practitioners with accessible and transparent resources for developing their FinLLMs.* Methods: FinGPT uses an open-sourced and data-centric framework to automate the collection and curation of real-time financial data from >34 diverse sources on the Internet. It also proposes a simple yet effective strategy for fine-tuning FinLLM using the inherent feedback from the market, dubbed Reinforcement Learning with Stock Prices (RLSP). Additionally, it adopts the Low-rank Adaptation (LoRA, QLoRA) method that enables users to customize their own FinLLMs from open-source general-purpose LLMs at a low cost.* Results: FinGPT showcases several applications, including robo-advisor, sentiment analysis for algorithmic trading, and low-code development. These applications demonstrate the potential of FinGPT in unlocking new opportunities in open finance.
    Abstract Large language models (LLMs) have demonstrated remarkable proficiency in understanding and generating human-like texts, which may potentially revolutionize the finance industry. However, existing LLMs often fall short in the financial field, which is mainly attributed to the disparities between general text data and financial text data. Unfortunately, there is only a limited number of financial text datasets available (quite small size), and BloombergGPT, the first financial LLM (FinLLM), is close-sourced (only the training logs were released). In light of this, we aim to democratize Internet-scale financial data for LLMs, which is an open challenge due to diverse data sources, low signal-to-noise ratio, and high time-validity. To address the challenges, we introduce an open-sourced and data-centric framework, \textit{Financial Generative Pre-trained Transformer (FinGPT)}, that automates the collection and curation of real-time financial data from >34 diverse sources on the Internet, providing researchers and practitioners with accessible and transparent resources to develop their FinLLMs. Additionally, we propose a simple yet effective strategy for fine-tuning FinLLM using the inherent feedback from the market, dubbed Reinforcement Learning with Stock Prices (RLSP). We also adopt the Low-rank Adaptation (LoRA, QLoRA) method that enables users to customize their own FinLLMs from open-source general-purpose LLMs at a low cost. Finally, we showcase several FinGPT applications, including robo-advisor, sentiment analysis for algorithmic trading, and low-code development. FinGPT aims to democratize FinLLMs, stimulate innovation, and unlock new opportunities in open finance. The codes are available at https://github.com/AI4Finance-Foundation/FinGPT and https://github.com/AI4Finance-Foundation/FinNLP
    摘要 大语言模型(LLM)在理解和生成类人文本方面展现出了卓越能力,有望革新金融行业。然而,现有 LLM 在金融领域往往表现欠佳,这主要归因于通用文本数据与金融文本数据之间的差异。可用的金融文本数据集数量有限且规模很小,而首个金融 LLM(FinLLM)BloombergGPT 并未开源(仅公布了训练日志)。有鉴于此,我们致力于为 LLM 普及互联网规模的金融数据,但由于数据来源多样、信噪比低且时效性要求高,这仍是一个开放的挑战。为应对这些挑战,我们提出了开源、以数据为中心的框架 FinGPT(Financial Generative Pre-trained Transformer),可自动从互联网上超过 34 个多样化来源收集并整理实时金融数据,为研究者和从业者提供可获取且透明的资源来开发自己的 FinLLM。此外,我们提出了一种简单而有效的 FinLLM 微调策略,利用市场自身的反馈进行训练,称为股价强化学习(RLSP);并采用低秩适配方法(LoRA、QLoRA),使用户能够以低成本基于开源通用 LLM 定制自己的 FinLLM。最后,我们展示了 FinGPT 的多个应用,包括智能投顾、面向算法交易的情感分析以及低代码开发。FinGPT 旨在普及 FinLLM、激发创新并开启开放金融的新机遇。代码见 https://github.com/AI4Finance-Foundation/FinGPT 和 https://github.com/AI4Finance-Foundation/FinNLP。
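To illustrate the low-rank adaptation idea that enables cheap customization, here is a minimal LoRA-style adapter written in plain PyTorch: the frozen pretrained weight is augmented with a trainable low-rank update. The rank and scaling are assumed values; in practice one would rely on a dedicated library (e.g. peft) rather than this sketch.

```python
# Minimal LoRA-style linear adapter: y = W x + (B A x) * scaling, with W frozen.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():           # freeze pretrained weights
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a.t() @ self.lora_b.t()) * self.scaling
```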

What can we learn from Data Leakage and Unlearning for Law?

  • paper_url: http://arxiv.org/abs/2307.10476
  • repo_url: None
  • paper_authors: Jaydeep Borkar
  • for: 这篇论文主要关注大语言模型(LLMs)的隐私问题, LLMS 可能会在训练数据中记忆 personally identifiable information(PII),如电子邮件和电话号码,并在推理过程中泄露这些信息。
  • methods: 为遵守"被遗忘权"(right to be forgotten)等隐私法规,可以删除最容易被提取的用户数据点以保护隐私。作者发现,删除这些数据点后,又会有新的数据点变得容易被提取;此外,微调后的模型不仅会泄露其训练数据,还会泄露在预训练阶段记住的数据和 PII。
  • results: 作者发现,随着用户数据点的删除,新的数据点会变得更容易提取,这可能会对公司使用 LLMs 提供服务的隐私和法律问题产生影响。作者希望通过这篇论文,引起 AI 和法律社区之间的交互性讨论,以解决这些问题。
    Abstract Large Language Models (LLMs) have a privacy concern because they memorize training data (including personally identifiable information (PII) like emails and phone numbers) and leak it during inference. A company can train an LLM on its domain-customized data which can potentially also include their users' PII. In order to comply with privacy laws such as the "right to be forgotten", the data points of users that are most vulnerable to extraction could be deleted. We find that once the most vulnerable points are deleted, a new set of points become vulnerable to extraction. So far, little attention has been given to understanding memorization for fine-tuned models. In this work, we also show that not only do fine-tuned models leak their training data but they also leak the pre-training data (and PII) memorized during the pre-training phase. The property of new data points becoming vulnerable to extraction after unlearning and leakage of pre-training data through fine-tuned models can pose significant privacy and legal concerns for companies that use LLMs to offer services. We hope this work will start an interdisciplinary discussion within AI and law communities regarding the need for policies to tackle these issues.
    摘要 大语言模型(LLM)存在隐私隐患:它们会记住训练数据(包括电子邮件、电话号码等个人身份信息 PII),并在推理时泄露这些信息。企业可能会在其领域定制数据上训练 LLM,而这些数据也可能包含用户的 PII。为遵守"被遗忘权"等隐私法规,可以删除最容易被提取的用户数据点。然而我们发现,一旦删除这些最脆弱的数据点,新的一批数据点又会变得容易被提取。此前鲜有研究关注微调模型的记忆问题;我们还表明,微调模型不仅会泄露其训练数据,还会泄露在预训练阶段记住的数据和 PII。"解除学习后新数据点变得易被提取"以及"微调模型泄露预训练数据"这两个性质,可能给使用 LLM 提供服务的企业带来严重的隐私与法律问题。我们希望这项工作能在 AI 与法律领域之间引发跨学科讨论,推动制定应对这些问题的政策。

Findings of Factify 2: Multimodal Fake News Detection

  • paper_url: http://arxiv.org/abs/2307.10475
  • repo_url: None
  • paper_authors: S Suryavardan, Shreyash Mishra, Megha Chakraborty, Parth Patwa, Anku Rani, Aman Chadha, Aishwarya Reganti, Amitava Das, Amit Sheth, Manoj Chinnakotla, Asif Ekbal, Srijan Kumar
  • for: 针对社交媒体上快速增长的假新闻,这篇论文提出了自动检测假信息和证明其准确性的研究。
  • methods: 该论文使用了多模态的真实性检测和讽刺新闻 dataset,并采用了对比基于方法,将社交媒体声明与支持文档、图像进行对比,分为5类多模态关系。
  • results: 在第二次任务中,有60多名参与者和9个测试集提交,最高的F1分平均为81.82%。使用DeBERTatext和Swinv2和CLIP图像得到了最佳表现。
    Abstract With social media usage growing exponentially in the past few years, fake news has also become extremely prevalent. The detrimental impact of fake news emphasizes the need for research focused on automating the detection of false information and verifying its accuracy. In this work, we present the outcome of the Factify 2 shared task, which provides a multi-modal fact verification and satire news dataset, as part of the DeFactify 2 workshop at AAAI'23. The data calls for a comparison based approach to the task by pairing social media claims with supporting documents, with both text and image, divided into 5 classes based on multi-modal relations. In the second iteration of this task we had over 60 participants and 9 final test-set submissions. The best performances came from the use of DeBERTa for text and Swinv2 and CLIP for image. The highest F1 score averaged for all five classes was 81.82%.
    摘要 随着近年来社交媒体使用量的爆炸式增长,虚假新闻也变得极为普遍。虚假新闻的危害凸显了对自动检测虚假信息并核验其准确性的研究需求。本文介绍了 Factify 2 共享任务的结果,该任务作为 AAAI'23 DeFactify 2 研讨会的一部分,提供了一个多模态事实核验与讽刺新闻数据集。数据要求采用基于比较的方法,将社交媒体声明与支持文档(包含文本和图像)配对,并根据多模态关系划分为 5 类。本届任务共有 60 多名参与者和 9 份最终测试集提交,表现最佳的方案在文本上使用 DeBERTa、在图像上使用 Swinv2 和 CLIP,五类平均最高 F1 分数为 81.82%。

Improving the Reusability of Pre-trained Language Models in Real-world Applications

  • paper_url: http://arxiv.org/abs/2307.10457
  • repo_url: None
  • paper_authors: Somayeh Ghanbarzadeh, Hamid Palangi, Yan Huang, Radames Cruz Moreno, Hamed Khanpour
  • for: 提高预训练语言模型(PLM)的可 reuse性和实际应用效果
  • methods: 将掩码语言模型(MLM)训练目标整合进微调过程,以提升 PLM 的泛化能力
  • results: 与现有最先进技术相比,Mask-tuning 能够提升 PLM 在分布外(OOD)数据集上的泛化能力,同时也提升其在分布内数据集上的性能。
    Abstract The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by their generalization problem, where their performance drastically decreases when evaluated on examples that differ from the training dataset, known as Out-of-Distribution (OOD)/unseen examples. This limitation arises from PLMs' reliance on spurious correlations, which work well for frequent example types but not for general examples. To address this issue, we propose a training approach called Mask-tuning, which integrates Masked Language Modeling (MLM) training objectives into the fine-tuning process to enhance PLMs' generalization. Comprehensive experiments demonstrate that Mask-tuning surpasses current state-of-the-art techniques and enhances PLMs' generalization on OOD datasets while improving their performance on in-distribution datasets. The findings suggest that Mask-tuning improves the reusability of PLMs on unseen data, making them more practical and effective for real-world applications.
    摘要 当前最先进的预训练语言模型(PLM)的可复用性常受到泛化问题的限制:当评估样本与训练数据分布不同(即分布外/未见样本)时,其性能会大幅下降。这一限制源于 PLM 对虚假相关的依赖,这类相关对常见样本类型有效,但不适用于一般样本。为解决这一问题,我们提出了一种名为 Mask-tuning 的训练方法,将掩码语言模型(MLM)训练目标整合进微调过程,以增强 PLM 的泛化能力。全面的实验表明,Mask-tuning 超越了当前最先进的技术,在提升 PLM 于分布外数据集上泛化能力的同时,也改进了其在分布内数据集上的表现。这些结果表明,Mask-tuning 提升了 PLM 在未见数据上的可复用性,使其在实际应用中更加实用和有效。

Integrating a Heterogeneous Graph with Entity-aware Self-attention using Relative Position Labels for Reading Comprehension Model

  • paper_url: http://arxiv.org/abs/2307.10443
  • repo_url: None
  • paper_authors: Shima Foolad, Kourosh Kiani
  • for: 本文旨在提高 transformer 模型在复杂理解任务中的表现,通过在输入序列中嵌入显式知识来解决 transformer 模型缺乏显式知识的限制。
  • methods: 本文提出了一种新的注意模式,在不依赖外部知识的情况下,将来自异构图的推理知识整合进 Transformer 架构。该注意模式包含三个关键元素:面向词元的全局-局部注意、面向实体词元的图注意(使其更关注图中相连的词元而非不相连的词元),以及对每个实体词元与词元之间关系类型的考虑,从而在存在关系时优化两者间的注意力。该模式配合特殊的相对位置标签,可与 LUKE 的实体感知自注意机制集成。
  • results: 实验结果表明,我们的模型在侧重常识推理的 ReCoRD 数据集上优于最先进的 LUKE-Graph 和基线 LUKE 模型。
    Abstract Despite the significant progress made by transformer models in machine reading comprehension tasks, they still fall short in handling complex reasoning tasks due to the absence of explicit knowledge in the input sequence. To address this limitation, many recent works have proposed injecting external knowledge into the model. However, selecting relevant external knowledge, ensuring its availability, and requiring additional processing steps remain challenging. In this paper, we introduce a novel attention pattern that integrates reasoning knowledge derived from a heterogeneous graph into the transformer architecture without relying on external knowledge. The proposed attention pattern comprises three key elements: global-local attention for word tokens, graph attention for entity tokens that exhibit strong attention towards tokens connected in the graph as opposed to those unconnected, and the consideration of the type of relationship between each entity token and word token. This results in optimized attention between the two if a relationship exists. The pattern is coupled with special relative position labels, allowing it to integrate with LUKE's entity-aware self-attention mechanism. The experimental findings corroborate that our model outperforms both the cutting-edge LUKE-Graph and the baseline LUKE model on the ReCoRD dataset that focuses on commonsense reasoning.
    摘要 尽管 transformer 模型在机器阅读理解任务中已经做出了重要进步,但它们仍然在处理复杂的推理任务时缺乏明确的知识。为解决这个限制,许多最近的工作已经提议在模型中注入外部知识。然而,选择相关的外部知识、确保其可用性和需要额外处理步骤仍然是挑战。在这篇论文中,我们介绍了一种新的注意模式,它可以在 transformer 架构中 integrate 推理知识,不需要外部知识。这个注意模式包括三个关键元素:全球-地方注意WORD token,对connected在图形上的entity token的注意力,以及对每个entity token和word token的关系型别进行考虑。这导致了两者之间的优化注意力。这个模式与特殊的相对位置标签相结合,使其能够与LUKE的entity-aware自注意运算机制集成。实验结果证实了我们的模型在ReCoRD dataset上的表现比cutting-edge LUKE-Graph和基准LUKE模型更好。

Thrust: Adaptively Propels Large Language Models with External Knowledge

  • paper_url: http://arxiv.org/abs/2307.10442
  • repo_url: None
  • paper_authors: Xinran Zhao, Hongming Zhang, Xiaoman Pan, Wenlin Yao, Dong Yu, Jianshu Chen
  • for: 这 paper 的目的是如何提高大规模预训练语言模型(PTLM)中的知识利用效率,以及External Knowledge 的搜索方法。
  • methods: 该 paper 提出了一种名为 Instance-level Adaptive Propulsion of External Knowledge(IAPEK)的方法,该方法通过测量 PTLM 模型中的知识量来决定是否需要进行 External Knowledge 的搜索。该方法使用 Thrust 指标,该指标基于一小量的 seen instances 的表示分布来衡量 PTLM 模型的实例级知识程度。
  • results: experiments 表明,Thrust 指标是一个好的 Measurement of PTLM 模型的实例级知识程度。此外,通过使用 Thrust 指标作为搜索指标,可以在 88% 的任务上实现显著的成本效益,即提高了 26% 的平均性能。这些发现有助于在实际应用中提高知识增强 LM 的效率和成本控制。
    Abstract Although large-scale pre-trained language models (PTLMs) are shown to encode rich knowledge in their model parameters, the inherent knowledge in PTLMs can be opaque or static, making external knowledge necessary. However, the existing information retrieval techniques could be costly and may even introduce noisy and sometimes misleading knowledge. To address these challenges, we propose the instance-level adaptive propulsion of external knowledge (IAPEK), where we only conduct the retrieval when necessary. To achieve this goal, we propose measuring whether a PTLM contains enough knowledge to solve an instance with a novel metric, Thrust, which leverages the representation distribution of a small number of seen instances. Extensive experiments demonstrate that thrust is a good measurement of PTLM models' instance-level knowledgeability. Moreover, we can achieve significantly higher cost-efficiency with the Thrust score as the retrieval indicator than the naive usage of external knowledge on 88% of the evaluated tasks with 26% average performance improvement. Such findings shed light on the real-world practice of knowledge-enhanced LMs with a limited knowledge-seeking budget due to computation latency or costs.
    摘要 尽管大规模预训练语言模型(PTLM)已被证明在模型参数中编码了丰富的知识,但其内在知识可能不透明或是静态的,因而需要外部知识。然而,现有的信息检索技术代价高昂,甚至可能引入嘈杂乃至误导性的知识。为应对这些挑战,我们提出实例级自适应外部知识注入(IAPEK):仅在必要时才进行检索。为此,我们提出用一个新的度量 Thrust 来衡量 PTLM 是否已具备解决某个实例所需的知识,该度量利用少量已见实例的表示分布。大量实验表明,Thrust 能够很好地衡量 PTLM 的实例级知识水平;并且,以 Thrust 作为检索触发指标,相比简单地一律使用外部知识,可在 88% 的评测任务上取得显著更高的成本效益,平均性能提升 26%。这些发现对在计算延迟或成本限制知识检索预算的现实场景中应用知识增强语言模型具有指导意义。
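The sketch below is a simplified, centroid-based stand-in for the idea of scoring instance-level knowledgeability from the representation distribution of a few seen instances, and triggering retrieval only when the score is low. The scoring function is an assumption for illustration, not the paper's definition of Thrust.

```python
# Centroid-distance "knowledgeability" score used as a retrieval trigger.
import numpy as np

def knowledgeability_score(query_vec, cluster_centroids, cluster_sizes):
    q = np.asarray(query_vec, dtype=float)
    score = 0.0
    for c, n in zip(cluster_centroids, cluster_sizes):
        d2 = np.sum((q - np.asarray(c, dtype=float)) ** 2) + 1e-8
        score += n / d2                      # closer, larger clusters raise the score
    return score / len(cluster_centroids)

def needs_retrieval(query_vec, centroids, sizes, threshold):
    # Retrieve external knowledge only for queries the model seems unfamiliar with.
    return knowledgeability_score(query_vec, centroids, sizes) < threshold
```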

PharmacyGPT: The AI Pharmacist

  • paper_url: http://arxiv.org/abs/2307.10432
  • repo_url: None
  • paper_authors: Zhengliang Liu, Zihao Wu, Mengxuan Hu, Bokai Zhao, Lin Zhao, Tianyi Zhang, Haixing Dai, Xianyan Chen, Ye Shen, Sheng Li, Brian Murray, Tianming Liu, Andrea Sikora
  • for: 评估大语言模型(如 ChatGPT 和 GPT-4)模拟临床药师角色的能力,探讨其在患者照护以及未来 AI 驱动医疗方案中的潜在应用。
  • methods: 提出 PharmacyGPT 框架,利用 LLM 生成可理解的患者聚类、制定用药方案并预测患者结局;研究使用北卡罗来纳大学教堂山分校(UNC)医院重症监护室(ICU)的真实数据进行评估。
  • results: 分析揭示了 LLM 在临床药学领域的潜在应用与局限,为患者照护和未来 AI 驱动医疗解决方案的发展提供了有价值的见解,并为医疗场景中负责任且有效地整合人工智能的讨论作出贡献。
    Abstract In this study, we introduce PharmacyGPT, a novel framework to assess the capabilities of large language models (LLMs) such as ChatGPT and GPT-4 in emulating the role of clinical pharmacists. Our methodology encompasses the utilization of LLMs to generate comprehensible patient clusters, formulate medication plans, and forecast patient outcomes. We conduct our investigation using real data acquired from the intensive care unit (ICU) at the University of North Carolina Chapel Hill (UNC) Hospital. Our analysis offers valuable insights into the potential applications and limitations of LLMs in the field of clinical pharmacy, with implications for both patient care and the development of future AI-driven healthcare solutions. By evaluating the performance of PharmacyGPT, we aim to contribute to the ongoing discourse surrounding the integration of artificial intelligence in healthcare settings, ultimately promoting the responsible and efficacious use of such technologies.
    摘要 本研究提出 PharmacyGPT,一个用于评估 ChatGPT、GPT-4 等大语言模型(LLM)模拟临床药师角色能力的新框架。我们的方法利用 LLM 生成可理解的患者聚类、制定用药方案并预测患者结局,并使用北卡罗来纳大学教堂山分校(UNC)医院重症监护室(ICU)的真实数据开展研究。我们的分析为 LLM 在临床药学领域的潜在应用与局限提供了有价值的见解,对患者照护以及未来 AI 驱动医疗解决方案的发展均有启示。通过评估 PharmacyGPT 的表现,我们希望为医疗场景中整合人工智能的持续讨论作出贡献,最终推动此类技术得到负责任且有效的使用。

LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs

  • paper_url: http://arxiv.org/abs/2307.10168
  • repo_url: None
  • paper_authors: Tongshuang Wu, Haiyi Zhu, Maya Albayrak, Alexis Axon, Amanda Bertsch, Wenxing Deng, Ziqi Ding, Bill Guo, Sireesh Gururaja, Tzu-Sheng Kuo, Jenny T. Liang, Ryan Liu, Ihita Mandal, Jeremiah Milbauer, Xiaolin Ni, Namrata Padmanabhan, Subhashini Ramkumar, Alexis Sudjianto, Jordan Taylor, Ying-Jui Tseng, Patricia Vaidos, Zhijin Wu, Wei Wu, Chenyang Yang
  • for: 研究是否可以使用机器学习模型(LLMs)来复制人类在协作任务中的行为。
  • methods: 研究使用现代机器学习模型来模拟人类在“人类计算算法”中的能力,并评估这些模型的成功程度。
  • results: 研究发现,现代机器学习模型可以在一些复杂的协作任务中模拟人类的能力,但成功程度受到请求者对LLM能力的理解、任务下的具体技能要求以及完成这些任务的最佳交互方式的影响。研究还发现人类和LLM在接受指令方面存在差异,并重要地强调了启用人类面向的安全措施,以及训练人类和LLM的合作技能。
    Abstract LLMs have shown promise in replicating human-like behavior in crowdsourcing tasks that were previously thought to be exclusive to human abilities. However, current efforts focus mainly on simple atomic tasks. We explore whether LLMs can replicate more complex crowdsourcing pipelines. We find that modern LLMs can simulate some of crowdworkers' abilities in these "human computation algorithms," but the level of success is variable and influenced by requesters' understanding of LLM capabilities, the specific skills required for sub-tasks, and the optimal interaction modality for performing these sub-tasks. We reflect on human and LLMs' different sensitivities to instructions, stress the importance of enabling human-facing safeguards for LLMs, and discuss the potential of training humans and LLMs with complementary skill sets. Crucially, we show that replicating crowdsourcing pipelines offers a valuable platform to investigate (1) the relative strengths of LLMs on different tasks (by cross-comparing their performances on sub-tasks) and (2) LLMs' potential in complex tasks, where they can complete part of the tasks while leaving others to humans.
    摘要 LLM 在一些曾被认为只有人类才能完成的众包任务中展现出了复制类人行为的潜力,但目前的研究主要集中在简单的原子化任务上。我们探究 LLM 能否复制更复杂的众包任务流程。我们发现,现代 LLM 可以模拟众包工作者在这些"人类计算算法"中的部分能力,但成功程度参差不齐,受任务请求者对 LLM 能力的理解、子任务所需的具体技能,以及完成这些子任务的最佳交互方式影响。我们反思了人类与 LLM 对指令的不同敏感性,强调为 LLM 建立面向人类的安全保障的重要性,并讨论了用互补技能训练人类与 LLM 的潜力。重要的是,复制众包流程提供了一个宝贵的平台,可用于研究 (1) LLM 在不同任务上的相对优势(通过交叉比较其在子任务上的表现),以及 (2) LLM 在复杂任务中的潜力——由 LLM 完成部分任务,其余交给人类。

Exploring Transformer Extrapolation

  • paper_url: http://arxiv.org/abs/2307.10156
  • repo_url: None
  • paper_authors: Zhen Qin, Yiran Zhong, Hui Deng
  • for: 本研究旨在探究使用相对位置编码(RPE)的 Transformer 实现长度外推的条件,并推导一种无需任何训练即可衡量 RPE 感受野的新指标——理论感受野(TRF)。
  • methods: 通过深入的数学与实证分析确定 Transformer 长度外推的条件,并由此推导出两条可用于语言建模任务的实践准则,在多个语料上加以检验。
  • results: 研究发现,只要与 RPE 的指数项相对应的级数收敛,Transformer 就必然具备长度外推能力。我们在 Wikitext-103、Books、Github 和 WikiBook 等数据集上进行了大量实验,验证了所发现条件的有效性,并将经验感受野(ERF)与 TRF 进行对比,在上述数据集上观察到一致的趋势。
    Abstract Length extrapolation has attracted considerable attention recently since it allows transformers to be tested on longer sequences than those used in training. Previous research has shown that this property can be attained by using carefully designed Relative Positional Encodings (RPEs). While these methods perform well on a variety of corpora, the conditions for length extrapolation have yet to be investigated. This paper attempts to determine what types of RPEs allow for length extrapolation through a thorough mathematical and empirical analysis. We discover that a transformer is certain to possess this property as long as the series that corresponds to the RPE's exponential converges. Two practices are derived from the conditions and examined in language modeling tasks on a variety of corpora. As a bonus from the conditions, we derive a new Theoretical Receptive Field (TRF) to measure the receptive field of RPEs without taking any training steps. Extensive experiments are conducted on the Wikitext-103, Books, Github, and WikiBook datasets to demonstrate the viability of our discovered conditions. We also compare TRF to Empirical Receptive Field (ERF) across different models, showing consistently matched trends on the aforementioned datasets. The code is available at https://github.com/OpenNLPLab/Rpe.
    摘要 长度外推近来备受关注,因为它允许 Transformer 在比训练时更长的序列上进行测试。已有研究表明,通过精心设计的相对位置编码(RPE)可以获得这一性质;尽管这些方法在多种语料上表现良好,但长度外推的条件尚未得到研究。本文通过深入的数学与实证分析,试图确定哪些类型的 RPE 能够实现长度外推。我们发现,只要与 RPE 的指数项相对应的级数收敛,Transformer 就必然具备该性质。基于这些条件,我们推导出两条实践准则,并在多种语料的语言建模任务中加以检验;作为附带成果,我们还推导了新的理论感受野(TRF),可在不进行任何训练的情况下衡量 RPE 的感受野。我们在 Wikitext-103、Books、Github 和 WikiBook 数据集上进行了大量实验,证明了所发现条件的可行性,并将 TRF 与经验感受野(ERF)在不同模型上进行比较,观察到一致的趋势。代码见 https://github.com/OpenNLPLab/Rpe。
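The convergence condition above can be illustrated numerically: for a relative positional bias r(d), inspect partial sums of exp(r(d)) over the relative distance d. A decaying bias such as ALiBi's r(d) = -m*d yields a convergent geometric series, while a constant bias does not. The specific bias functions below are just examples, not the paper's analysis.

```python
# Partial sums of exp(r(d)) for two example relative-position biases.
import math

def partial_sums(rpe, n_terms=10000):
    s, out = 0.0, []
    for d in range(1, n_terms + 1):
        s += math.exp(rpe(d))
        if d in (10, 100, 1000, 10000):
            out.append((d, round(s, 4)))
    return out

print("ALiBi-like r(d) = -0.5*d :", partial_sums(lambda d: -0.5 * d))  # converges (~1.54)
print("constant   r(d) = 0      :", partial_sums(lambda d: 0.0))       # grows without bound
```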

Gradient Sparsification For Masked Fine-Tuning of Transformers

  • paper_url: http://arxiv.org/abs/2307.10098
  • repo_url: None
  • paper_authors: James O’ Neill, Sourav Dutta
  • for: 通过对预训练语言模型进行微调,提升其在下游任务上的性能与泛化能力
  • methods: 提出 GradDrop 及其变体:在反向传播过程中随机屏蔽梯度,相当于注入梯度噪声,以此正则化预训练语言模型
  • results: 在多语言 XGLUE 基准上使用 XLMR-Large 的大量实验表明,GradDrop 与使用额外翻译数据进行中间预训练的方法相比具有竞争力,并优于标准微调和渐进解冻;事后分析显示,GradDrop 还能提升其未曾训练过的语言(如低资源语言)上的表现
    Abstract Fine-tuning pretrained self-supervised language models is widely adopted for transfer learning to downstream tasks. Fine-tuning can be achieved by freezing gradients of the pretrained network and only updating gradients of a newly added classification layer, or by performing gradient updates on all parameters. Gradual unfreezing makes a trade-off between the two by gradually unfreezing gradients of whole layers during training. This has been an effective strategy to trade-off between storage and training speed with generalization performance. However, it is not clear whether gradually unfreezing layers throughout training is optimal, compared to sparse variants of gradual unfreezing which may improve fine-tuning performance. In this paper, we propose to stochastically mask gradients to regularize pretrained language models for improving overall fine-tuned performance. We introduce GradDrop and variants thereof, a class of gradient sparsification methods that mask gradients during the backward pass, acting as gradient noise. GradDrop is sparse and stochastic unlike gradual freezing. Extensive experiments on the multilingual XGLUE benchmark with XLMR-Large show that GradDrop is competitive against methods that use additional translated data for intermediate pretraining and outperforms standard fine-tuning and gradual unfreezing. A post-analysis shows how GradDrop improves performance with languages it was not trained on, such as under-resourced languages.
    摘要 现在广泛采用已经预训练的自主学习语言模型进行传输学习,这包括冻结预训练网络的梯度并且只更新新增的分类层的梯度,或者是在所有参数上进行梯度更新。渐进解冻可以考虑到这两种方法之间的折衔,以便在存储和训练速度之间做出一个平衡。然而,是否在训练过程中逐渐解冻层的最佳方法还没有得出确定的答案。在这篇论文中,我们提议使用杂色梯度抑制来规范预训练语言模型,以提高总的精通率。我们引入了GradDrop和其变种,它是一种杂色梯度抑制方法,在反向传播中随机地Mask梯度。GradDrop不同于渐进解冻,它是粒子和随机的。我们在多语言XGLUE测试准则上进行了广泛的实验,结果显示GradDrop与使用额外翻译数据进行中间预训练的方法相当竞争,并且超过了标准的精通和渐进解冻。后续分析表明,GradDrop可以提高语言模型的表现,包括未经训练的语言,如少数语言。
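A minimal PyTorch sketch of the gradient-masking idea described above; the toy model, the entry-wise Bernoulli mask, and the fixed drop probability are illustrative assumptions, not the exact GradDrop variants from the paper.

```python
import torch
import torch.nn as nn

def graddrop_step(model: nn.Module, loss: torch.Tensor,
                  optimizer: torch.optim.Optimizer, drop_prob: float = 0.5) -> None:
    """One update where gradients are stochastically masked after the backward pass."""
    optimizer.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                # Keep each gradient entry with probability (1 - drop_prob).
                mask = torch.bernoulli(torch.full_like(p.grad, 1.0 - drop_prob))
                p.grad.mul_(mask)
    optimizer.step()

if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
    loss = nn.functional.cross_entropy(model(x), y)
    graddrop_step(model, loss, optimizer, drop_prob=0.5)
```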

cs.LG - 2023-07-20

Synthetic Control Methods by Density Matching under Implicit Endogeneity

  • paper_url: http://arxiv.org/abs/2307.11127
  • repo_url: None
  • paper_authors: Masahiro Kato, Akari Ohda, Masaaki Imaizumi, Kenichiro McAlinn
  • for: 本研究使用Synthetic control方法(SCM)估计对比案例研究中的效果,SCM可以估计对待单位的Counterfactual outcome,并且是对比案例研究中的一种重要工具。
  • methods: 本研究提出了一种新的SCM方法,基于density matching假设,即对待单位的结果密度可以近似为未处理单位结果密度的加权混合(mixture model)。通过这个假设,我们可以估计SC weights,并且我们的估计器具有三个优点:一、估计器是 asymptotically unbiased;二、可以降低counterfactual prediction的mean squared error;三、可以生成处理效应的完整分布,而不仅是预期值。
  • results: 本研究通过实验结果展示了我们的方法的效果,并且证明了其比既有SCM方法更加精准和有效。
    Abstract Synthetic control methods (SCMs) have become a crucial tool for causal inference in comparative case studies. The fundamental idea of SCMs is to estimate counterfactual outcomes for a treated unit by using a weighted sum of observed outcomes from untreated units. The accuracy of the synthetic control (SC) is critical for estimating the causal effect, and hence, the estimation of SC weights has been the focus of much research. In this paper, we first point out that existing SCMs suffer from an implicit endogeneity problem, which is the correlation between the outcomes of untreated units and the error term in the model of a counterfactual outcome. We show that this problem yields a bias in the causal effect estimator. We then propose a novel SCM based on density matching, assuming that the density of outcomes of the treated unit can be approximated by a weighted average of the densities of untreated units (i.e., a mixture model). Based on this assumption, we estimate SC weights by matching moments of treated outcomes and the weighted sum of moments of untreated outcomes. Our proposed method has three advantages over existing methods. First, our estimator is asymptotically unbiased under the assumption of the mixture model. Second, due to the asymptotic unbiasedness, we can reduce the mean squared error for counterfactual prediction. Third, our method generates full densities of the treatment effect, not only expected values, which broadens the applicability of SCMs. We provide experimental results to demonstrate the effectiveness of our proposed method.
    摘要 Synthetic control methods (SCMs) 已成为比较研究中的重要工具,用于估计 causal inference。 SCMs 的基本思想是使用一个权重和平∑ 观察到的结果来估计对待单位的 counterfactual 结果。 SC 的准确性是估计 causal effect 的关键,因此 SCMs 的 estimation 问题已经引起了很多研究。在这篇论文中,我们首先指出了现有 SCMs 存在一种隐藏的内生性问题,即对 untreated units 的结果和 counterfactual 结果模型中的错误项之间的相关性。我们证明了这个问题会导致 causal effect 估计器偏移。然后,我们提出了一种基于 density matching 的新的 SCM,假设待处理单位的结果的概率可以通过一个权重和平∑ untreated units 的结果概率来近似。基于这个假设,我们可以通过匹配待处理结果的 moments 和权重和平∑ untreated units 的 moments来估计 SC 权重。我们的提议方法有三个优点:首先,我们的估计器在 mixture model 的假设下是 asymptotically unbiased。其次,由于 asymptotic unbiasedness,我们可以降低 counterfactual prediction 的 mean squared error。第三,我们的方法可以生成对待单位的治疗效果的全部概率分布,不仅是预期值,这扩展了 SCMs 的应用范围。我们提供实验结果,以证明我们的提议方法的效果。
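A rough sketch of the moment-matching idea behind the proposed SCM, under simplifying assumptions: only the first two moments are matched, and the simplex-constrained fit uses scipy's SLSQP solver; this is not the authors' estimator.

```python
import numpy as np
from scipy.optimize import minimize

def sc_weights_by_moment_matching(treated: np.ndarray, untreated: np.ndarray) -> np.ndarray:
    """Estimate synthetic-control weights on the probability simplex.

    treated:   (T,) pre-treatment outcomes of the treated unit
    untreated: (T, J) pre-treatment outcomes of J untreated units
    Only the first two moments are matched here, as a simplification.
    """
    T, J = untreated.shape
    m1_t, m2_t = treated.mean(), (treated ** 2).mean()
    m1_u, m2_u = untreated.mean(axis=0), (untreated ** 2).mean(axis=0)

    def objective(w):
        return (m1_t - w @ m1_u) ** 2 + (m2_t - w @ m2_u) ** 2

    w0 = np.full(J, 1.0 / J)
    res = minimize(objective, w0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * J,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    return res.x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    untreated = rng.normal(size=(50, 5)) + np.arange(5)
    true_w = np.array([0.6, 0.3, 0.1, 0.0, 0.0])   # weights used only to simulate the treated unit
    treated = untreated @ true_w + rng.normal(scale=0.1, size=50)
    print(np.round(sc_weights_by_moment_matching(treated, untreated), 3))
```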

A Markov Chain Model for Identifying Changes in Daily Activity Patterns of People Living with Dementia

  • paper_url: http://arxiv.org/abs/2307.11126
  • repo_url: https://github.com/nvfl/markov-chain-model
  • paper_authors: Nan Fletcher-Lloyd, Alina-Irina Serban, Magdalena Kolanko, David Wingfield, Danielle Wilson, Ramin Nilforooshan, Payam Barnaghi, Eyal Soreq
  • for: 这个研究旨在检测失智症患者(PLWD)的饮食和饮水行为变化,因为营养不良和脱水会加速其认知和功能退化。
  • methods: 研究使用物联网技术收集了73户失智症患者家庭的居家监测数据,并使用线性混合效应模型分析COVID-19大流行对其中21户持续监测家庭厨房活动的影响。
  • results: 研究发现白天厨房活动有可观察的增加,而夜间厨房活动显著减少(t(147) = -2.90,p < 0.001)。此外,研究还提出了一种基于Markov模型、以远程监测数据为代理的方法来检测患者的行为变化。
    Abstract Malnutrition and dehydration are strongly associated with increased cognitive and functional decline in people living with dementia (PLWD), as well as an increased rate of hospitalisations in comparison to their healthy counterparts. Extreme changes in eating and drinking behaviours can often lead to malnutrition and dehydration, accelerating the progression of cognitive and functional decline and resulting in a marked reduction in quality of life. Unfortunately, there are currently no established methods by which to objectively detect such changes. Here, we present the findings of an extensive quantitative analysis conducted on in-home monitoring data collected from 73 households of PLWD using Internet of Things technologies. The Coronavirus 2019 (COVID-19) pandemic has previously been shown to have dramatically altered the behavioural habits, particularly the eating and drinking habits, of PLWD. Using the COVID-19 pandemic as a natural experiment, we conducted linear mixed-effects modelling to examine changes in mean kitchen activity within a subset of 21 households of PLWD that were continuously monitored for 499 days. We report an observable increase in day-time kitchen activity and a significant decrease in night-time kitchen activity (t(147) = -2.90, p < 0.001). We further propose a novel analytical approach to detecting changes in behaviours of PLWD using Markov modelling applied to remote monitoring data as a proxy for behaviours that cannot be directly measured. Together, these results pave the way to introduce improvements into the monitoring of PLWD in naturalistic settings and for shifting from reactive to proactive care.
    摘要 营养不良和脱水是老年人智能和功能退化的重要风险因素,也会使患有智能和功能退化的人群(PLWD)的入院率增加。宽泛的食品和饮料消耗方式的变化可能会导致营养不良和脱水,加速智能和功能退化的进程,从而导致生活质量下降。可惜,目前没有可靠的方法可以客观地探测这些变化。我们在73户老年人智能和功能退化者的家庭中进行了广泛的量化分析,使用互联网物联网技术收集数据。2019冠状病毒疫情(COVID-19)已经对智能和功能退化者的行为习惯产生了深远的影响,特别是饮食和饮料的消耗方式。使用2019冠状病毒疫情作为自然实验,我们使用线性混合效应模型对21户智能和功能退化者的厨房活动进行分析。我们观察到日间厨房活动增加,而夜间厨房活动显著减少(t(147) = -2.90,p < 0.001)。此外,我们还提出了一种基于远程监测数据的Markov模型,用于检测智能和功能退化者的行为变化。这些结果将为监测智能和功能退化者提供新的方法,并且可以帮助从被动监测转换到主动监测。
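The behavioural-change analysis relies on Markov modelling of remotely monitored activity; the sketch below only shows how a first-order transition matrix can be estimated from a sequence of discrete in-home activity states. The state set and the add-one smoothing are illustrative assumptions, not the paper's model.

```python
import numpy as np

STATES = ["kitchen", "bedroom", "lounge", "bathroom"]  # illustrative state set

def transition_matrix(sequence: list[str], states: list[str], smoothing: float = 1.0) -> np.ndarray:
    """Row-stochastic first-order transition matrix with add-one smoothing."""
    index = {s: i for i, s in enumerate(states)}
    counts = np.full((len(states), len(states)), smoothing)
    for prev, curr in zip(sequence[:-1], sequence[1:]):
        counts[index[prev], index[curr]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    day = ["bedroom", "kitchen", "lounge", "kitchen", "lounge", "bathroom", "bedroom"]
    P = transition_matrix(day, STATES)
    print(np.round(P, 2))
    # Comparing P estimated over different periods (e.g. pre/post pandemic) is one way
    # to flag changes in daily activity patterns.
```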

Diffusion Models for Probabilistic Deconvolution of Galaxy Images

  • paper_url: http://arxiv.org/abs/2307.11122
  • repo_url: https://github.com/yashpatel5400/galgen
  • paper_authors: Zhiwei Xue, Yuhang Li, Yash Patel, Jeffrey Regier
  • for: 这个论文是为了提出一种基于深度生成模型的PSF逆推算法,用于恢复宇宙图像中的细节。
  • methods: 该论文使用了一种基于普通 diffusion 模型的方法,不需要类别器,可以更好地提取宇宙图像中的细节。
  • results: 论文的实验结果表明,基于 diffusion 模型的PSF逆推算法可以更好地捕捉宇宙图像中的细节,并且提供了更多的可能性空间,比如 conditional VAE 的方法。
    Abstract Telescopes capture images with a particular point spread function (PSF). Inferring what an image would have looked like with a much sharper PSF, a problem known as PSF deconvolution, is ill-posed because PSF convolution is not an invertible transformation. Deep generative models are appealing for PSF deconvolution because they can infer a posterior distribution over candidate images that, if convolved with the PSF, could have generated the observation. However, classical deep generative models such as VAEs and GANs often provide inadequate sample diversity. As an alternative, we propose a classifier-free conditional diffusion model for PSF deconvolution of galaxy images. We demonstrate that this diffusion model captures a greater diversity of possible deconvolutions compared to a conditional VAE.
    摘要 天文望远镜捕捉到图像,但图像具有特定的点扩散函数(PSF)。尝试恢复图像为更加锐利PSF后的形态,称为PSF恢复,是一个不定问题,因为PSF混合不是可逆变换。深度生成模型吸引了PSF恢复的应用,因为它们可以对候选图像进行 posterior 分布预测,如果将其混合到PSF中,可能会生成观测结果。然而,经典的深度生成模型如VAEs和GANs经常提供不够的样本多样性。为了解决这问题,我们提议一种无类别的条件扩散模型 дляPSF恢复星系图像。我们证明该扩散模型可以捕捉更多的可能的恢复形态,比 conditional VAE 更加多样化。

PASTA: Pretrained Action-State Transformer Agents

  • paper_url: http://arxiv.org/abs/2307.10936
  • repo_url: None
  • paper_authors: Raphael Boige, Yannis Flet-Berliac, Arthur Flajolet, Guillaume Richard, Thomas Pierrot
  • for: This paper aims to investigate the use of pre-trained transformer models for reinforcement learning tasks, specifically addressing the problem of adapting models to new environments with limited data.
  • methods: The authors use a unified methodology that includes tokenization at the action and state component level, training models across diverse domains simultaneously, and using parameter efficient fine-tuning (PEFT) to adapt the models to downstream tasks.
  • results: The developed models contain fewer than 10 million parameters and can be fine-tuned with fewer than 10,000 parameters during downstream adaptation, allowing for robust policy learning and encouraging further research into the use of transformers for reinforcement learning.
    Abstract Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pre-training transformer models on vast amounts of unlabeled data, serving as a starting point for efficiently solving downstream tasks. In the realm of reinforcement learning, researchers have recently adapted these approaches by developing models pre-trained on expert trajectories, enabling them to address a wide range of tasks, from robotics to recommendation systems. However, existing methods mostly rely on intricate pre-training objectives tailored to specific downstream applications. This paper presents a comprehensive investigation of models we refer to as Pretrained Action-State Transformer Agents (PASTA). Our study uses a unified methodology and covers an extensive set of general downstream tasks including behavioral cloning, offline RL, sensor failure robustness, and dynamics change adaptation. Our goal is to systematically compare various design choices and provide valuable insights to practitioners for building robust models. Key highlights of our study include tokenization at the action and state component level, using fundamental pre-training objectives like next token prediction, training models across diverse domains simultaneously, and using parameter efficient fine-tuning (PEFT). The developed models in our study contain fewer than 10 million parameters and the application of PEFT enables fine-tuning of fewer than 10,000 parameters during downstream adaptation, allowing a broad community to use these models and reproduce our experiments. We hope that this study will encourage further research into the use of transformers with first-principles design choices to represent RL trajectories and contribute to robust policy learning.
    摘要 自顾学学习在不同计算领域中引发了革命性的思维方式变革,包括自然语言处理、视觉和生物学。现有的方法通常是使用庞大量未标注数据进行预训练 transformer 模型,作为下游任务的开始点,以提高效率。在返回学习领域,研究人员已经采用了这些方法,并开发了基于专家轨迹的预训练模型,以解决广泛的任务,从 робо特斯到推荐系统。然而,现有的方法通常仅适用于特定下游应用程序的精细预训练目标。本文提出了一种叫做 PASTA 的模型,其中包括使用各种设计选择和涵盖广泛的通用下游任务,包括行为做clone、离线学习、感知故障Robustness和动力学变化适应。我们的研究使用一种统一的方法ologies和涵盖了广泛的下游任务,以系统地比较不同的设计选择,并为实践者提供有价值的洞察。关键特点包括动作和状态组件级别的启用,使用基本的预训练目标如下一个token预测,在多个领域同时训练模型,以及使用参数效率的练习(PEFT)。开发的模型中含 fewer than 10 million parameters,并且通过PEFT进行参数练习,可以在下游适应中使用 fewer than 10,000 parameters,使得广泛的社区可以使用这些模型并重现我们的实验。我们希望这种研究会鼓励更多的人使用 transformer 模型的首要原则来表示RL轨迹,并贡献于Robust policy学习。
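A toy sketch of component-level tokenization for state-action trajectories in the spirit of PASTA: every state and action component is uniformly binned and the resulting tokens are interleaved per timestep, ready for next-token prediction. The bin count, value range, and vocabulary layout are assumptions, not the paper's tokenizer.

```python
import numpy as np

def tokenize_trajectory(states: np.ndarray, actions: np.ndarray, n_bins: int = 32) -> list[int]:
    """Discretize every state/action component into `n_bins` bins and interleave them.

    states: (T, d_s) array, actions: (T, d_a) array, values assumed in [-1, 1].
    Each (component, bin) pair gets its own token id -- a simplification of
    component-level tokenization.
    """
    def to_tokens(x: np.ndarray, offset: int) -> np.ndarray:
        bins = np.clip(((x + 1.0) / 2.0 * n_bins).astype(int), 0, n_bins - 1)
        comp_ids = np.arange(x.shape[1])[None, :]          # which component each value belongs to
        return offset + comp_ids * n_bins + bins           # (T, d) token ids

    s_tok = to_tokens(states, offset=0)
    a_tok = to_tokens(actions, offset=states.shape[1] * n_bins)
    # Interleave per timestep: [s_1 tokens, a_1 tokens, s_2 tokens, a_2 tokens, ...]
    return [int(t) for s_row, a_row in zip(s_tok, a_tok) for t in (*s_row, *a_row)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = tokenize_trajectory(rng.uniform(-1, 1, (3, 4)), rng.uniform(-1, 1, (3, 2)))
    print(len(tokens), tokens[:10])   # 3 * (4 + 2) = 18 tokens, ready for next-token prediction
```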

Inorganic synthesis-structure maps in zeolites with machine learning and crystallographic distances

  • paper_url: http://arxiv.org/abs/2307.10935
  • repo_url: None
  • paper_authors: Daniel Schwalbe-Koda, Daniel E. Widdowson, Tuan Anh Pham, Vitaliy A. Kurlin
  • for: 这项研究旨在使用计算机模拟和机器学习(ML)技术,为硅酸盐材料的合成创造无监督的材料合成地图。
  • methods: 该研究使用了一种强大的距离度量方法和机器学习分析方法,从253个已知硅酸盐中提取出不同的材料合成条件。
  • results: 研究发现,在不使用标签的情况下,邻近的硅酸盐结构之间的距离度量可以反映硅酸盐的材料合成条件,并且可以预测硅酸盐的合成结果。
    Abstract Zeolites are inorganic materials known for their diversity of applications, synthesis conditions, and resulting polymorphs. Although their synthesis is controlled both by inorganic and organic synthesis conditions, computational studies of zeolite synthesis have focused mostly on organic template design. In this work, we use a strong distance metric between crystal structures and machine learning (ML) to create inorganic synthesis maps in zeolites. Starting with 253 known zeolites, we show how the continuous distances between frameworks reproduce inorganic synthesis conditions from the literature without using labels such as building units. An unsupervised learning analysis shows that neighboring zeolites according to our metric often share similar inorganic synthesis conditions, even in template-based routes. In combination with ML classifiers, we find synthesis-structure relationships for 14 common inorganic conditions in zeolites, namely Al, B, Be, Ca, Co, F, Ga, Ge, K, Mg, Na, P, Si, and Zn. By explaining the model predictions, we demonstrate how (dis)similarities towards known structures can be used as features for the synthesis space. Finally, we show how these methods can be used to predict inorganic synthesis conditions for unrealized frameworks in hypothetical databases and interpret the outcomes by extracting local structural patterns from zeolites. In combination with template design, this work can accelerate the exploration of the space of synthesis conditions for zeolites.
    摘要

Modeling 3D cardiac contraction and relaxation with point cloud deformation networks

  • paper_url: http://arxiv.org/abs/2307.10927
  • repo_url: None
  • paper_authors: Marcel Beetz, Abhirup Banerjee, Vicente Grau
  • for: 该研究旨在开发一种基于点云深度学习的准确评估三维心脏功能的方法,以提高我们对心脏健康和疾病机理的理解。
  • methods: 该方法使用点云深度学习的最新进展,建立了一个点云编码器-解码器结构,以便高效地学习多尺度特征。
  • results: 研究人员对大量的UK Biobank数据集进行了测试,并发现了average Chamfer距离小于图像获取的像素分辨率,同时也发现了与真实数据集的相似性。此外,研究人员还发现了在各个子 популяции中的差异,并且表明了3D凝聚模式可以超越多个临床标准。
    Abstract Global single-valued biomarkers of cardiac function typically used in clinical practice, such as ejection fraction, provide limited insight on the true 3D cardiac deformation process and hence, limit the understanding of both healthy and pathological cardiac mechanics. In this work, we propose the Point Cloud Deformation Network (PCD-Net) as a novel geometric deep learning approach to model 3D cardiac contraction and relaxation between the extreme ends of the cardiac cycle. It employs the recent advances in point cloud-based deep learning into an encoder-decoder structure, in order to enable efficient multi-scale feature learning directly on multi-class 3D point cloud representations of the cardiac anatomy. We evaluate our approach on a large dataset of over 10,000 cases from the UK Biobank study and find average Chamfer distances between the predicted and ground truth anatomies below the pixel resolution of the underlying image acquisition. Furthermore, we observe similar clinical metrics between predicted and ground truth populations and show that the PCD-Net can successfully capture subpopulation-specific differences between normal subjects and myocardial infarction (MI) patients. We then demonstrate that the learned 3D deformation patterns outperform multiple clinical benchmarks by 13% and 7% in terms of area under the receiver operating characteristic curve for the tasks of prevalent MI detection and incident MI prediction and by 7% in terms of Harrell's concordance index for MI survival analysis.
    摘要 全球单值生物标志物typically used in clinical practice, such as ejection fraction, only provide limited insight into the true 3D cardiac deformation process and therefore limit the understanding of both healthy and pathological cardiac mechanics. In this work, we propose the Point Cloud Deformation Network (PCD-Net) as a novel geometric deep learning approach to model 3D cardiac contraction and relaxation between the extreme ends of the cardiac cycle. It employs the recent advances in point cloud-based deep learning into an encoder-decoder structure, in order to enable efficient multi-scale feature learning directly on multi-class 3D point cloud representations of the cardiac anatomy. We evaluate our approach on a large dataset of over 10,000 cases from the UK Biobank study and find average Chamfer distances below the pixel resolution of the underlying image acquisition. Furthermore, we observe similar clinical metrics between predicted and ground truth populations and show that the PCD-Net can successfully capture subpopulation-specific differences between normal subjects and myocardial infarction (MI) patients. We then demonstrate that the learned 3D deformation patterns outperform multiple clinical benchmarks by 13% and 7% in terms of area under the receiver operating characteristic curve for the tasks of prevalent MI detection and incident MI prediction and by 7% in terms of Harrell's concordance index for MI survival analysis.
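PCD-Net is evaluated with Chamfer distances between predicted and ground-truth anatomies; the snippet below is a small NumPy sketch of a symmetric Chamfer distance, just to make the metric concrete (the squared-distance, mean-reduced convention used here is one of several common variants and may differ from the paper's).

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point clouds a (N, 3) and b (M, 3).

    Uses squared Euclidean nearest-neighbour distances averaged in both directions.
    """
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)   # (N, M) pairwise squared distances
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(500, 3))
    pred = gt + rng.normal(scale=0.01, size=gt.shape)      # a near-perfect reconstruction
    print(f"Chamfer distance: {chamfer_distance(pred, gt):.5f}")
```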

Confidence intervals for performance estimates in 3D medical image segmentation

  • paper_url: http://arxiv.org/abs/2307.10926
  • repo_url: https://github.com/rosanajurdi/SegVal_TMI
  • paper_authors: R. El Jurdi, G. Varoquaux, O. Colliot
  • for: 这 paper 是用来评估医疗图像 segmentation 模型的。
  • methods: 这 paper 使用了 nnU-net 框架和 Medical Decathlon 挑战赛中的两个数据集,以及两种表现指标: dice 准确率和 Hausdorff 距离。
  • results: 这 paper 发现,在不同的测试集大小和表现指标的扩散情况下,参数型的信度范围是Bootstrap估计的可靠近似。此外,它还发现,为了达到某个精度水平,通常需要训练样本数量远少于 classification 任务。 typically,需要约 100-200 个测试样本,而且更Difficult的 segmentation 任务可能需要更多的测试样本。
    Abstract Medical segmentation models are evaluated empirically. As such an evaluation is based on a limited set of example images, it is unavoidably noisy. Beyond a mean performance measure, reporting confidence intervals is thus crucial. However, this is rarely done in medical image segmentation. The width of the confidence interval depends on the test set size and on the spread of the performance measure (its standard-deviation across of the test set). For classification, many test images are needed to avoid wide confidence intervals. Segmentation, however, has not been studied, and it differs by the amount of information brought by a given test image. In this paper, we study the typical confidence intervals in medical image segmentation. We carry experiments on 3D image segmentation using the standard nnU-net framework, two datasets from the Medical Decathlon challenge and two performance measures: the Dice accuracy and the Hausdorff distance. We show that the parametric confidence intervals are reasonable approximations of the bootstrap estimates for varying test set sizes and spread of the performance metric. Importantly, we show that the test size needed to achieve a given precision is often much lower than for classification tasks. Typically, a 1% wide confidence interval requires about 100-200 test samples when the spread is low (standard-deviation around 3%). More difficult segmentation tasks may lead to higher spreads and require over 1000 samples.
    摘要 医学分割模型通常会被实际测试。这种测试基于有限的示例图像,因此无法避免噪音。除了平均性能指标之外,报告信息interval也是非常重要。然而,在医学像分割中,这并不是常见的做法。信息interval的宽度取决于测试集大小和性能指标的扩散(测试集中的标准差)。对于分类任务,需要许多测试图像来避免宽的信息interval。但是,在分割任务中,不同的测试图像会带来不同的信息量。在这篇论文中,我们研究了医学像分割中常见的信息interval。我们在使用标准nnU-net框架、医疗十大挑战赛提供的两个数据集和两个性能指标( dice准确率和 Hausdorff 距离)进行了实验。我们发现,参数信息interval是参数Bootstrap估计的可靠近似,并且显示测试集大小和性能指标的扩散对信息interval的影响。进一步地,我们发现,为了 достичь给定的精度,测试样本的数量通常比分类任务低得多。例如,当标准差较低(约3%)时,1% 宽的信息interval只需要100-200个测试样本。更复杂的分割任务可能会导致更高的扩散,需要更多的测试样本。
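A small NumPy sketch contrasting the parametric (normal-approximation) confidence interval with a percentile bootstrap over per-case Dice scores, in the low-spread regime discussed above; the simulated scores and the 95% level are illustrative, not the paper's data.

```python
import numpy as np

def parametric_ci_95(scores: np.ndarray) -> tuple[float, float]:
    """95% normal-approximation CI for the mean: mean +/- 1.96 * std / sqrt(n)."""
    half = 1.96 * scores.std(ddof=1) / np.sqrt(len(scores))
    return float(scores.mean() - half), float(scores.mean() + half)

def bootstrap_ci_95(scores: np.ndarray, n_boot: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """95% percentile-bootstrap CI for the mean Dice score."""
    rng = np.random.default_rng(seed)
    means = rng.choice(scores, size=(n_boot, len(scores)), replace=True).mean(axis=1)
    lo, hi = np.percentile(means, [2.5, 97.5])
    return float(lo), float(hi)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Simulated per-case Dice scores with a spread (std) of about 3%, 150 test cases.
    dice = np.clip(rng.normal(loc=0.90, scale=0.03, size=150), 0.0, 1.0)
    print("parametric 95% CI:", np.round(parametric_ci_95(dice), 4))
    print("bootstrap  95% CI:", np.round(bootstrap_ci_95(dice), 4))
```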

Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series

  • paper_url: http://arxiv.org/abs/2307.10923
  • repo_url: https://github.com/exploita123/charmedforfree
  • paper_authors: Aniruddh Raghu, Payal Chandak, Ridwan Alam, John Guttag, Collin M. Stultz
  • for: 这个论文是为了解决现有的自动学习(Self-supervised learning)方法不能处理多Modal时间序列数据的问题。
  • methods: 该论文提出了一种新的自动学习方法——Sequential Multi-Dimensional SSL,它在序列级和个体高维数据级别应用SSL损失来更好地捕捉信息。
  • results: 对两个实际的医疗时间序列数据集进行了实验,结果表明,在先行培育后,使用该方法并then fine-tuning在下游任务上提高了性能,并在一些设置下可以通过不同的自动学习损失函数来提高性能。
    Abstract Self-supervised learning (SSL) for clinical time series data has received significant attention in recent literature, since these data are highly rich and provide important information about a patient's physiological state. However, most existing SSL methods for clinical time series are limited in that they are designed for unimodal time series, such as a sequence of structured features (e.g., lab values and vitals signs) or an individual high-dimensional physiological signal (e.g., an electrocardiogram). These existing methods cannot be readily extended to model time series that exhibit multimodality, with structured features and high-dimensional data being recorded at each timestep in the sequence. In this work, we address this gap and propose a new SSL method -- Sequential Multi-Dimensional SSL -- where a SSL loss is applied both at the level of the entire sequence and at the level of the individual high-dimensional data points in the sequence in order to better capture information at both scales. Our strategy is agnostic to the specific form of loss function used at each level -- it can be contrastive, as in SimCLR, or non-contrastive, as in VICReg. We evaluate our method on two real-world clinical datasets, where the time series contains sequences of (1) high-frequency electrocardiograms and (2) structured data from lab values and vitals signs. Our experimental results indicate that pre-training with our method and then fine-tuning on downstream tasks improves performance over baselines on both datasets, and in several settings, can lead to improvements across different self-supervised loss functions.
    摘要 自适应学习(SSL) для医疗时间序列数据在当前文献中受到了广泛关注,因为这些数据具有高度的资源和重要的生物physiological状态信息。然而,现有的大多数SSL方法仅适用于单模时间序列,例如序列中的结构化特征(如医学实验室值和生物指标)或个人高维度生理学信号(如电cardiogram)。这些现有方法无法轻松地扩展到模型时间序列,其中每个时间步骤都包含结构化特征和高维度数据。在这项工作中,我们解决这个差距,并提议一种新的SSL方法——Sequential Multi-Dimensional SSL。在这种方法中,我们在序列级别和个体高维度数据点级别都应用SSL损失,以更好地捕捉信息在不同级别。我们的策略是对特定的损失函数类型不拘泥,可以是对比性的,如SimCLR,或非对比性的,如VICReg。我们在两个真实的医疗时间序列数据集上进行了实验,其中时间序列包含高频电cardiograms和实验室值和生物指标的序列。我们的实验结果表明,在这些数据集上预训练后,通过精度调整下游任务,可以超过基准值,并在不同的自我超vised损失函数下达到更好的性能。

Language-based Action Concept Spaces Improve Video Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2307.10922
  • repo_url: None
  • paper_authors: Kanchana Ranasinghe, Michael Ryoo
  • for: 学习高效转移和鲁棒的视频表示
  • methods: 使用语言捆绑自我超vised学习将图像CLIP模型适应视频频谱
  • results: 提高零shot和线性探测性能在三个动作认识benchmark上
    Abstract Recent contrastive language image pre-training has led to learning highly transferable and robust image representations. However, adapting these models to video domains with minimal supervision remains an open problem. We explore a simple step in that direction, using language tied self-supervised learning to adapt an image CLIP model to the video domain. A backbone modified for temporal modeling is trained under self-distillation settings with train objectives operating in an action concept space. Feature vectors of various action concepts extracted from a language encoder using relevant textual prompts construct this space. We introduce two train objectives, concept distillation and concept alignment, that retain generality of original representations while enforcing relations between actions and their attributes. Our approach improves zero-shot and linear probing performance on three action recognition benchmarks.
    摘要 Translated into Simplified Chinese:最近的对语言图像预训练技术已经导致学习了高度可转移和稳定的图像表示。然而,将这些模型应用到视频频道上仍然是一个开放的问题。我们考虑了一种简单的方法,使用语言绑定的自我超vision学习来适应图像 CLIP 模型到视频频道。我们修改了 temporal 模型,在自我数据采样设置下使用语言Encoder 提取的不同动作概念的特征向量构建动作概念空间。我们引入了两个训练目标,概念练习和概念对接,以保留原始表示的通用性,同时强制行动和其属性之间的关系。我们的方法提高了零shot和直线探测性能在三个动作认识标准 bencmarks 上。

The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2307.10907
  • repo_url: https://github.com/apple/ml-entropy-reconstruction
  • paper_authors: Borja Rodríguez-Gálvez, Arno Blaas, Pau Rodríguez, Adam Goliński, Xavier Suau, Jason Ramapuram, Dan Busbridge, Luca Zappella
  • for: 本文研究了多视图自动学习(MVSSL)的成功机制,并通过一种新的下界函数来分析不同的MVSSL家族。
  • methods: 本文使用一种基于互信息(MI)的下界,它由一个熵项和一个重建项(ER)组成,并以此分析不同的MVSSL方法。
  • results: 研究结果表明,使用这种 ER 下界函数可以达到与常见MVSSL方法相当的性能,同时使得训练时使用小批量或小EMA系数时更加稳定。
    Abstract The mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound of the Mutual Information (MI). However, the relation between other MVSSL methods and MI remains unclear. We consider a different lower bound on the MI consisting of an entropy and a reconstruction term (ER), and analyze the main MVSSL families through its lens. Through this ER bound, we show that clustering-based methods such as DeepCluster and SwAV maximize the MI. We also re-interpret the mechanisms of distillation-based approaches such as BYOL and DINO, showing that they explicitly maximize the reconstruction term and implicitly encourage a stable entropy, and we confirm this empirically. We show that replacing the objectives of common MVSSL methods with this ER bound achieves competitive performance, while making them stable when training with smaller batch sizes or smaller exponential moving average (EMA) coefficients. Github repo: https://github.com/apple/ml-entropy-reconstruction.
    摘要 文中所描述的多视图自学习(MVSSL)的机制仍未完全理解。对于对比MVSSL方法的研究,我们通过InfoNCE,一种低下界的共识信息(MI)来研究。但是其他MVSSL方法和MI之间的关系仍然不清楚。我们考虑一种基于 entropy和重建项(ER)的低下界,并通过这个窗口来分析主要的MVSSL家族。我们显示了使用 clustering-based 方法such as DeepCluster和SwAV时,实际上是最大化MI的。我们还重新解释了 distillation-based 方法such as BYOL和DINO的机制,并证明它们通过直接最大化重建项并间接激发稳定的 entropy来实现。我们通过实验证明这一点。最后,我们表明将常见MVSSL方法的目标替换为ER下界可以实现竞争性的性能,同时使其在训练时使用小批量或小EMA系数时更加稳定。
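A loose sketch of an entropy-plus-reconstruction (ER) style objective: a plug-in entropy estimate over soft prototype assignments of one view plus a Gaussian (negative-MSE) reconstruction of that view from the other. All concrete choices here (prototypes, temperature, Gaussian likelihood) are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def er_objective(z1: torch.Tensor, z2: torch.Tensor, predictor: torch.nn.Module,
                 prototypes: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Entropy + Reconstruction surrogate to be *maximized*.

    Entropy term: entropy of the batch-averaged soft assignment of z2 to prototypes
    (a crude plug-in estimate of H(Z2)).
    Reconstruction term: log-likelihood of z2 under a unit-variance Gaussian centred
    at predictor(z1), i.e. a negative mean-squared error up to constants.
    """
    assign = F.softmax(z2 @ prototypes.t() / temperature, dim=-1)   # (B, K) soft assignments
    marginal = assign.mean(dim=0)
    entropy = -(marginal * (marginal + 1e-8).log()).sum()

    reconstruction = -F.mse_loss(predictor(z1), z2)                 # Gaussian log-lik up to constants
    return entropy + reconstruction

if __name__ == "__main__":
    B, D, K = 64, 128, 32
    z1, z2 = torch.randn(B, D), torch.randn(B, D)
    predictor = torch.nn.Linear(D, D)
    prototypes = torch.randn(K, D)
    print(float(er_objective(z1, z2, predictor, prototypes)))
```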

Variational Point Encoding Deformation for Dental Modeling

  • paper_url: http://arxiv.org/abs/2307.10895
  • repo_url: None
  • paper_authors: Johan Ziruo Ye, Thomas Ørkild, Peter Lempel Søndergaard, Søren Hauberg
  • for: 本研究通过发布一个新的大规模牙齿网格(tooth mesh)数据集,旨在鼓励更多相关研究。
  • methods: 我们提出了一种扩展FoldingNet的方法,称为Variational FoldingNet(VF-Net),它允许点云表示的 probabilistic 学习。
  • results: 我们的实验结果表明,VF-Net 比现有模型在牙齿扫描和推理方面具有更高的表现力,同时具有更好的鲁棒性。
    Abstract Digital dentistry has made significant advancements in recent years, yet numerous challenges remain to be addressed. In this study, we release a new extensive dataset of tooth meshes to encourage further research. Additionally, we propose Variational FoldingNet (VF-Net), which extends FoldingNet to enable probabilistic learning of point cloud representations. A key challenge in existing latent variable models for point clouds is the lack of a 1-to-1 mapping between input points and output points. Instead, they must rely on optimizing Chamfer distances, a metric that does not have a normalized distributional counterpart, preventing its usage in probabilistic models. We demonstrate that explicit minimization of Chamfer distances can be replaced by a suitable encoder, which allows us to increase computational efficiency while simplifying the probabilistic extension. Our experimental findings present empirical evidence demonstrating the superior performance of VF-Net over existing models in terms of dental scan reconstruction and extrapolation. Additionally, our investigation highlights the robustness of VF-Net's latent representations. These results underscore the promising prospects of VF-Net as an effective and reliable method for point cloud reconstruction and analysis.
    摘要 《数字牙科技术的进步和挑战》Recently, digital dentistry has made significant advancements, but there are still many challenges that need to be addressed. In this study, we release a new and extensive dataset of tooth meshes to encourage further research. Additionally, we propose a new method called Variational FoldingNet (VF-Net), which extends FoldingNet to enable probabilistic learning of point cloud representations.Currently, there is a key challenge in existing latent variable models for point clouds, which is the lack of a 1-to-1 mapping between input points and output points. Instead, they must rely on optimizing Chamfer distances, a metric that does not have a normalized distributional counterpart, preventing its usage in probabilistic models. We demonstrate that explicit minimization of Chamfer distances can be replaced by a suitable encoder, which allows us to increase computational efficiency while simplifying the probabilistic extension.Our experimental findings present empirical evidence demonstrating the superior performance of VF-Net over existing models in terms of dental scan reconstruction and extrapolation. Additionally, our investigation highlights the robustness of VF-Net's latent representations. These results underscore the promising prospects of VF-Net as an effective and reliable method for point cloud reconstruction and analysis.

Learning and Generalizing Polynomials in Simulation Metamodeling

  • paper_url: http://arxiv.org/abs/2307.10892
  • repo_url: https://github.com/jesperhauch/polynomial_deep_learning
  • paper_authors: Jesper Hauch, Christoffer Riis, Francisco C. Pereira
  • for: 本研究旨在提高人工神经网络的 polynomial 拟合能力和通用性,以便在多种工程领域中使用。
  • methods: 本文提出了乘法神经网络(MNN)架构,并将 MNN 作为递归构建模块来近似高阶 polynomial。
  • results: 实验表明,MNN 比基线模型更好地泛化,并且其在验证集上的性能与分布外测试上的性能一致。此外,作者还提出了一种针对 polynomial 时间步长更新的模拟的 simulation metamodeling 方法。
    Abstract The ability to learn polynomials and generalize out-of-distribution is essential for simulation metamodels in many disciplines of engineering, where the time step updates are described by polynomials. While feed forward neural networks can fit any function, they cannot generalize out-of-distribution for higher-order polynomials. Therefore, this paper collects and proposes multiplicative neural network (MNN) architectures that are used as recursive building blocks for approximating higher-order polynomials. Our experiments show that MNNs are better than baseline models at generalizing, and their performance in validation is true to their performance in out-of-distribution tests. In addition to MNN architectures, a simulation metamodeling approach is proposed for simulations with polynomial time step updates. For these simulations, simulating a time interval can be performed in fewer steps by increasing the step size, which entails approximating higher-order polynomials. While our approach is compatible with any simulation with polynomial time step updates, a demonstration is shown for an epidemiology simulation model, which also shows the inductive bias in MNNs for learning and generalizing higher-order polynomials.
    摘要 “模型学习 polynomials 和泛化到不同分布是Engineering 多个领域的必备技能,因为时间步长更新通常是 polynomials。虽然前向神经网络可以适应任何函数,但它们无法泛化到高阶 polynomials。因此,本文收集并提出了multiplicative neural network(MNN)架构,用于 recursive 构建高阶 polynomials 的近似。我们的实验表明,MNNs 在泛化方面表现更好,并且在验证集中的性能与验证集外的性能相似。此外,我们还提出了一种 simulation metamodeling 方法,用于 simulations with polynomial time step updates。对于这些 simulations,可以通过增加步长来快速 simulate 时间间隔,这意味着需要近似高阶 polynomials。我们的方法与任何具有 polynomial time step updates 的 simulation 相容,并在 epidemiology 模型中进行了示例,这也表明了 MNNs 对于学习和泛化高阶 polynomials 的适应性。”
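A toy sketch of a multiplicative building block in the spirit of MNNs: each block multiplies two linear maps elementwise, so stacking blocks raises the representable polynomial degree. The exact architecture and training setup differ from the paper's; the quartic fitting example is purely illustrative.

```python
import torch
import torch.nn as nn

class MultiplicativeBlock(nn.Module):
    """Elementwise product of two linear maps: (W1 x + b1) * (W2 x + b2).

    Each block roughly doubles the polynomial degree of its input features, which is
    what lets such networks fit (and extrapolate) higher-order polynomials.
    """
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.left = nn.Linear(in_features, out_features)
        self.right = nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.left(x) * self.right(x)

if __name__ == "__main__":
    net = nn.Sequential(MultiplicativeBlock(1, 16), MultiplicativeBlock(16, 16), nn.Linear(16, 1))
    x = torch.linspace(-2, 2, 101).unsqueeze(1)
    y = x ** 4 - 3 * x ** 2 + 0.5                           # a quartic target
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(2000):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x), y)
        loss.backward()
        opt.step()
    print(f"final training MSE: {loss.item():.4f}")
```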

Syntactic vs Semantic Linear Abstraction and Refinement of Neural Networks

  • paper_url: http://arxiv.org/abs/2307.10891
  • repo_url: https://github.com/cxlvinchau/linna
  • paper_authors: Calvin Chau, Jan Křetínský, Stefanie Mohr
  • for: 提高神经网络的可扩展性。
  • methods: 使用 linear combination of neurons 来取代单个 neuron,并在 syntactic 和 semantic 两个层面上进行抽象。
  • results: 实现更高的网络压缩,并引入一种精化(refinement)方法,以在压缩与精度之间取得更好的平衡。
    Abstract Abstraction is a key verification technique to improve scalability. However, its use for neural networks is so far extremely limited. Previous approaches for abstracting classification networks replace several neurons with one of them that is similar enough. We can classify the similarity as defined either syntactically (using quantities on the connections between neurons) or semantically (on the activation values of neurons for various inputs). Unfortunately, the previous approaches only achieve moderate reductions, when implemented at all. In this work, we provide a more flexible framework where a neuron can be replaced with a linear combination of other neurons, improving the reduction. We apply this approach both on syntactic and semantic abstractions, and implement and evaluate them experimentally. Further, we introduce a refinement method for our abstractions, allowing for finding a better balance between reduction and precision.
    摘要 归纳是一种关键的验证技术,可以提高神经网络的扩展性。然而,归纳神经网络的使用范围还很有限。先前的方法是将一些神经元与其他相似的神经元进行交换,以实现归纳。我们可以将相似性分为逻辑(通过神经元之间的连接量)或semantic(通过神经元对各种输入的活动值)两种。可惜,先前的方法只能实现一定的减少,而且只有部分实现。在这项工作中,我们提供了更 flexible的框架,允许一个神经元被替换为一个线性组合其他神经元,从而提高减少。我们在逻辑和semantic归纳上应用这种方法,并进行实验性评估。此外,我们还引入了一种精细化方法,可以帮助找到更好的减少和精度之间的平衡。
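A small NumPy sketch of the semantic flavour of the abstraction: one neuron's activation vector over sample inputs is approximated as a linear combination of other neurons' activations via least squares; if the residual is small, that neuron can be removed and its outgoing weights redistributed. The toy data, neuron indices, and tolerance are illustrative assumptions.

```python
import numpy as np

def semantic_replacement(acts: np.ndarray, target: int, basis: list[int]):
    """Coefficients c such that acts[:, target] ~= acts[:, basis] @ c (least squares).

    acts: (n_inputs, n_neurons) activation matrix of one layer on sample inputs.
    A small residual means neuron `target` is well approximated by the `basis` neurons.
    """
    A, y = acts[:, basis], acts[:, target]
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    err = np.linalg.norm(A @ coeffs - y)
    return coeffs, err

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(200, 5))
    # Make neuron 4 (almost) a linear combination of neurons 0 and 2.
    acts[:, 4] = 0.7 * acts[:, 0] - 0.2 * acts[:, 2] + 0.01 * rng.normal(size=200)
    coeffs, err = semantic_replacement(acts, target=4, basis=[0, 1, 2, 3])
    print(np.round(coeffs, 3), f"residual norm: {err:.3f}")
```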

Player-optimal Stable Regret for Bandit Learning in Matching Markets

  • paper_url: http://arxiv.org/abs/2307.10890
  • repo_url: None
  • paper_authors: Fang Kong, Shuai Li
  • for: This paper focuses on the problem of matching markets, specifically on finding a stable matching in an online setting where one-side participants (players) learn their unknown preferences from iterative interactions with the other side (arms).
  • methods: The paper proposes a new algorithm called explore-then-Gale-Shapley (ETGS) and analyzes its performance in terms of the optimal stable regret of each player.
  • results: The paper shows that the optimal stable regret of each player can be upper bounded by $O(K\log T/\Delta^2)$, which is a significantly better result than previous works that either have a weaker player-pessimal stable matching objective or apply only to markets with special assumptions. Additionally, the paper shows that the regret upper bound matches the previously derived lower bound when the preferences of participants satisfy some special conditions.
    Abstract The problem of matching markets has been studied for a long time in the literature due to its wide range of applications. Finding a stable matching is a common equilibrium objective in this problem. Since market participants are usually uncertain of their preferences, a rich line of recent works study the online setting where one-side participants (players) learn their unknown preferences from iterative interactions with the other side (arms). Most previous works in this line are only able to derive theoretical guarantees for player-pessimal stable regret, which is defined compared with the players' least-preferred stable matching. However, under the pessimal stable matching, players only obtain the least reward among all stable matchings. To maximize players' profits, player-optimal stable matching would be the most desirable. Though \citet{basu21beyond} successfully bring an upper bound for player-optimal stable regret, their result can be exponentially large if players' preference gap is small. Whether a polynomial guarantee for this regret exists is a significant but still open problem. In this work, we provide a new algorithm named explore-then-Gale-Shapley (ETGS) and show that the optimal stable regret of each player can be upper bounded by $O(K\log T/\Delta^2)$ where $K$ is the number of arms, $T$ is the horizon and $\Delta$ is the players' minimum preference gap among the first $N+1$-ranked arms. This result significantly improves previous works which either have a weaker player-pessimal stable matching objective or apply only to markets with special assumptions. When the preferences of participants satisfy some special conditions, our regret upper bound also matches the previously derived lower bound.
    摘要 问题的匹配市场已经在文献中进行了长时间的研究,因为它在各种应用场景中具有广泛的应用前景。在这个问题中,找到一个稳定的匹配是一种常见的平衡目标。由于市场参与者通常对他们的偏好不甚清楚,因此一推最近的研究在在线设置下研究了参与者在多轮互动中学习他们未知的偏好。大多数前一些工作只能 deriv theoretically guarantees for player-pessimal stable regret,它是基于参与者最差偏好的稳定匹配中的最低奖励。然而,在最低稳定匹配下,参与者只能获得所有稳定匹配中最低的奖励。为了提高参与者的收益,参与者最佳稳定匹配是最感到满意的。虽然 \citet{basu21beyond} 成功地提出了一个Upper bound for player-optimal stable regret,但其结果可能会是指数增长的,如果参与者偏好的差距很小。whether a polynomial guarantee for this regret exists is a significant but still open problem。在这个工作中,我们提出了一个新的算法名为explore-then-Gale-Shapley(ETGS),并证明了每个参与者的最佳稳定 regret可以 upper bounded by $O(K\log T/\Delta^2)$,where $K$ is the number of arms, $T$ is the horizon, and $\Delta$ is the participants' minimum preference gap among the first $N+1$-ranked arms。这个结果比前一些工作更好,因为它们的目标是player-pessimal stable matching,或者只适用于特殊的市场假设。当参与者的偏好满足某些特殊条件时,我们的 regret upper bound也与之前 derive的下界匹配。
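A compact sketch of the explore-then-commit structure suggested by the algorithm's name: players first pull every arm a fixed number of times to estimate their own preferences, then a player-proposing Gale-Shapley run with the arms' known rankings produces the final matching. The exploration budget and Gaussian rewards are assumptions, not the paper's schedule or regret analysis.

```python
import numpy as np

def gale_shapley(player_prefs: np.ndarray, arm_prefs: np.ndarray) -> dict[int, int]:
    """Player-proposing deferred acceptance. Each prefs row is a ranking (best first)."""
    n_players, n_arms = player_prefs.shape
    arm_rank = np.argsort(arm_prefs, axis=1)          # arm_rank[a, p] = rank of player p for arm a
    next_choice = [0] * n_players
    matched_to: dict[int, int] = {}                   # arm -> player
    free = list(range(n_players))
    while free:
        p = free.pop()
        a = player_prefs[p, next_choice[p]]
        next_choice[p] += 1
        if a not in matched_to:
            matched_to[a] = p
        elif arm_rank[a, p] < arm_rank[a, matched_to[a]]:
            free.append(matched_to[a])
            matched_to[a] = p
        else:
            free.append(p)
    return {p: a for a, p in matched_to.items()}

def explore_then_gs(true_means: np.ndarray, arm_prefs: np.ndarray,
                    pulls_per_arm: int = 50, seed: int = 0) -> dict[int, int]:
    rng = np.random.default_rng(seed)
    n_players, n_arms = true_means.shape
    estimates = np.zeros_like(true_means)
    for p in range(n_players):
        for a in range(n_arms):
            estimates[p, a] = rng.normal(true_means[p, a], 1.0, pulls_per_arm).mean()
    player_prefs = np.argsort(-estimates, axis=1)     # estimated preference ranking, best arm first
    return gale_shapley(player_prefs, arm_prefs)

if __name__ == "__main__":
    true_means = np.array([[0.9, 0.5, 0.1], [0.4, 0.8, 0.3], [0.2, 0.3, 0.7]])
    arm_prefs = np.array([[0, 1, 2], [1, 0, 2], [2, 1, 0]])   # arms' rankings over players
    print(explore_then_gs(true_means, arm_prefs))
```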

What Twitter Data Tell Us about the Future?

  • paper_url: http://arxiv.org/abs/2308.02035
  • repo_url: None
  • paper_authors: Alina Landowska, Marek Robak, Maciej Skorski
  • for: This paper investigates the futures projected by futurists on Twitter and explores the impact of language cues on anticipatory thinking among social media users.
  • methods: The study uses a compiled dataset of over 1 million publicly shared tweets by future influencers and develops a scalable NLP pipeline using state-of-the-art models. The research employs topic modeling techniques, such as LDA and BERTopic, to identify the topics and language cues used by futurists.
  • results: The study finds 15 topics from the LDA approach and 100 distinct topics from the BERTopic approach within the futurists' tweets. The research demonstrates that the futurists' language cues signal futures-in-the-making that help social media users anticipate their own scenarios and respond to them in the present.
    Abstract Anticipation is a fundamental human cognitive ability that involves thinking about and living towards the future. While language markers reflect anticipatory thinking, research on anticipation from the perspective of natural language processing is limited. This study aims to investigate the futures projected by futurists on Twitter and explore the impact of language cues on anticipatory thinking among social media users. We address the research questions of what futures Twitter's futurists anticipate and share, and how these anticipated futures can be modeled from social data. To investigate this, we review related works on anticipation, discuss the influence of language markers and prestigious individuals on anticipatory thinking, and present a taxonomy system categorizing futures into "present futures" and "future present". This research presents a compiled dataset of over 1 million publicly shared tweets by future influencers and develops a scalable NLP pipeline using SOTA models. The study identifies 15 topics from the LDA approach and 100 distinct topics from the BERTopic approach within the futurists' tweets. These findings contribute to the research on topic modelling and provide insights into the futures anticipated by Twitter's futurists. The research demonstrates the futurists' language cues signals futures-in-the-making that enhance social media users to anticipate their own scenarios and respond to them in present. The fully open-sourced dataset, interactive analysis, and reproducible source code are available for further exploration.
    摘要 人类有一种基本的认知能力,即预期(anticipation),它关注未来的发展和生活。虽然语言标记反映了预期思维,但从自然语言处理的角度来研究预期却有限。这项研究目的是Investigate Twitter上的未来预测和社交媒体用户对未来的预测思维的影响。我们解决的研究问题包括Twitter上预测的未来是什么和这些预测如何被社交数据模型化。为了调查这一点,我们提出了相关的研究和语言标记的影响以及著名人士对预期思维的影响,并提出了一个“现在未来”和“未来现在”的分类系统。本研究使用了一亿多个公共分享的推特信息,并开发了一个可扩展的自然语言处理(NLP)管道,使用当前的最佳实践模型。我们从LDA方法和BERTopic方法中提取了15个主题和100个特定主题,这些发现贡献于主题模型研究,并为Twitter上预测未来提供了新的视角。本研究表明预测者的语言标记可以预示未来的发展,使社交媒体用户能够预测和响应他们的enario。我们提供了全部开源的数据集、交互分析和可重复的代码,以便进一步探索。
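The topic analysis uses LDA and BERTopic; below is a minimal scikit-learn LDA sketch on a stand-in corpus, only to show the pipeline shape (the tweet texts and the 3-topic setting are illustrative, not the paper's 1M-tweet dataset or its 15-topic result).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [  # stand-in corpus; the paper uses over 1 million futurist tweets
    "AI agents will reshape work and education within a decade",
    "carbon removal and fusion could transform the energy future",
    "expect brain computer interfaces to feel ordinary by 2040",
    "the future of cities is autonomous transit and remote work",
    "synthetic biology will rewrite medicine and food production",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=3, random_state=0)  # 3 topics for the toy corpus
lda.fit(X)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
```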

Risk-optimized Outlier Removal for Robust Point Cloud Classification

  • paper_url: http://arxiv.org/abs/2307.10875
  • repo_url: None
  • paper_authors: Xinke Li, Junchi Lu
  • for: 这个研究的目的是为了提高点云深度模型在安全敏感场景中的可靠性和安全性,因为这些模型可能会受到意外或自然产生的点云噪声的干扰。
  • methods: 这篇研究提出了一个新的点云异常点移除方法,称为 PointCVaR,可以让标准训练的模型消除额外的异常点并恢复数据。该方法首先通过归因分析确定每个点对模型输出的影响,我们称之为点风险;然后使用 Conditional Value at Risk (CVaR) 来优化高风险点的筛选过程。
  • results: 这篇研究在不同的点云误差情况下,通过了多种移除和分类实验,获得了出色的结果。尤其是在受到随机误差、敌意误差和后门触发误差的攻击下,PointCVaR可以成功地防御这些攻击,并且在这些情况下 achieves 87% 的精度。
    Abstract The popularity of point cloud deep models for safety-critical purposes has increased, but the reliability and security of these models can be compromised by intentional or naturally occurring point cloud noise. To combat this issue, we present a novel point cloud outlier removal method called PointCVaR, which empowers standard-trained models to eliminate additional outliers and restore the data. Our approach begins by conducting attribution analysis to determine the influence of each point on the model output, which we refer to as point risk. We then optimize the process of filtering high-risk points using Conditional Value at Risk (CVaR) as the objective. The rationale for this approach is based on the observation that noise points in point clouds tend to cluster in the tail of the risk distribution, with a low frequency but a high level of risk, resulting in significant interference with classification results. Despite requiring no additional training effort, our method produces exceptional results in various removal-and-classification experiments for noisy point clouds, which are corrupted by random noise, adversarial noise, and backdoor trigger noise. Impressively, it achieves 87% accuracy in defense against the backdoor attack by removing triggers. Overall, the proposed PointCVaR effectively eliminates noise points and enhances point cloud classification, making it a promising plug-in module for various models in different scenarios.
    摘要 “随着深度点云模型在安全敏感领域的普及,这些模型对于意外或自然发生的点云噪音的可靠性和安全性受到损害。为了解决这个问题,我们提出了一个新的点云异常点除除法 called PointCVaR,它让标准训练的模型能够更好地消除额外的异常点和重建数据。我们的方法开始 WITH 点云影响分析,决定每个点的影响力,我们称之为点风险。然后,我们使用 Conditional Value at Risk(CVaR)来优化高风险点的范例。我们发现点云噪音通常集中在风险分布的尾部,有较低的频率但高度的风险,导致分类结果受到干扰。尽管不需要额外的训练努力,我们的方法在不同的实验中获得了出色的成绩,包括随机噪音、敌意噪音和后门触发噪音降落。特别是,它在防御后门攻击时取得了87%的准确率。总的来说,我们的PointCVaR可以干扰点云噪音,提高点云分类,使其成为不同情况下的实用插件模组。”
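A rough sketch of the two-step idea: score each point by a gradient-based attribution of the predicted logit (a stand-in for the paper's point risk), then drop the highest-risk tail. The fixed tail fraction replaces the paper's CVaR-optimized filtering, and the toy classifier is an assumption.

```python
import torch
import torch.nn as nn

class ToyCloudClassifier(nn.Module):
    """Tiny PointNet-like classifier: shared MLP over points, max pooling, 4 classes."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 4))

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, N, 3) -> (B, 4)
        return self.mlp(x).max(dim=1).values

def filter_high_risk_points(model: nn.Module, points: torch.Tensor,
                            tail_fraction: float = 0.1) -> torch.Tensor:
    """Drop the `tail_fraction` of points with the largest gradient-based risk."""
    pts = points.clone().requires_grad_(True)
    logits = model(pts.unsqueeze(0))                      # (1, n_classes)
    pred = int(logits.argmax())
    logits[0, pred].backward()
    risk = pts.grad.norm(dim=-1)                          # (N,) per-point risk score
    n_keep = len(points) - int(tail_fraction * len(points))
    keep_idx = risk.argsort()[:n_keep]                    # keep the lowest-risk points
    return points[keep_idx].detach()

if __name__ == "__main__":
    cloud = torch.randn(1024, 3)
    cleaned = filter_high_risk_points(ToyCloudClassifier(), cloud, tail_fraction=0.1)
    print(tuple(cloud.shape), "->", tuple(cleaned.shape))
```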

Nonlinear Meta-Learning Can Guarantee Faster Rates

  • paper_url: http://arxiv.org/abs/2307.10870
  • repo_url: None
  • paper_authors: Dimitri Meunier, Zhu Li, Arthur Gretton, Samory Kpotufe
  • for: 本研究的目标是为meta-学习提供理论保证,以便在相关任务之间共享表示结构,从而简化目标任务。
  • methods: 本研究使用了非线性表示,并采用了利用任务特定回归函数平滑性的精心正则化来缓解任务特有的偏差。
  • results: 研究人员通过理论分析和实验验证了非线性表示下 meta-学习的保证,并证明了学习共享表示的收敛速率可以随任务数($N$)和每任务样本数的增加而提升。
    Abstract Many recent theoretical works on \emph{meta-learning} aim to achieve guarantees in leveraging similar representational structures from related tasks towards simplifying a target task. Importantly, the main aim in theory works on the subject is to understand the extent to which convergence rates -- in learning a common representation -- \emph{may scale with the number $N$ of tasks} (as well as the number of samples per task). First steps in this setting demonstrate this property when both the shared representation amongst tasks, and task-specific regression functions, are linear. This linear setting readily reveals the benefits of aggregating tasks, e.g., via averaging arguments. In practice, however, the representation is often highly nonlinear, introducing nontrivial biases in each task that cannot easily be averaged out as in the linear case. In the present work, we derive theoretical guarantees for meta-learning with nonlinear representations. In particular, assuming the shared nonlinearity maps to an infinite-dimensional RKHS, we show that additional biases can be mitigated with careful regularization that leverages the smoothness of task-specific regression functions,
    摘要 很多最近的理论工作在meta-学中目标是利用相似的表示结构来简化目标任务。重要的是,理论工作的主要目标是理解学习共享表示结构时速度如何随着任务数量 $N$ 和样本数量的增加而增长。在首先步骤中,当共享表示结构和任务特定的回归函数都是线性的时,这种性质 readily reveals the benefits of task aggregation,例如,通过平均Arguments。然而,在实践中,表示结构通常是非线性的,引入了每个任务中的非轻松偏见,这些偏见无法如linear case中那样平均化。在 presente 工作中,我们 derive theoretical guarantees for meta-学with nonlinear representations。具体来说,我们假设共享非线性映射到了无穷dimensional RKHS中,我们显示了适当的 regularization可以减轻任务特定的偏见,同时利用任务特定的回归函数的平滑性。

Performance Issue Identification in Cloud Systems with Relational-Temporal Anomaly Detection

  • paper_url: http://arxiv.org/abs/2307.10869
  • repo_url: https://github.com/ase-submission/rtanomaly
  • paper_authors: Wenwei Gu, Jinyang Liu, Zhuangbin Chen, Jianping Zhang, Yuxin Su, Jiazhen Gu, Cong Feng, Zengyin Yang, Michael Lyu
  • for: 本研究旨在提高大规模云服务系统的可靠性和性能,通过准确地识别和定位问题。
  • methods: 本研究提出了一种基于关系和时间特征的多变量异常检测模型(RTAnomaly),通过图注意层学习 metrics 之间的依赖关系,更好地发现异常 metrics。
  • results: 对于公共数据集和两个工业数据集,RTAnomaly 与基eline模型进行比较,实现了平均 F1 分数为 0.929 和 Hit@3 为 0.920,表明RTAnomaly 的优越性。
    Abstract Performance issues permeate large-scale cloud service systems, which can lead to huge revenue losses. To ensure reliable performance, it's essential to accurately identify and localize these issues using service monitoring metrics. Given the complexity and scale of modern cloud systems, this task can be challenging and may require extensive expertise and resources beyond the capacity of individual humans. Some existing methods tackle this problem by analyzing each metric independently to detect anomalies. However, this could incur overwhelming alert storms that are difficult for engineers to diagnose manually. To pursue better performance, not only the temporal patterns of metrics but also the correlation between metrics (i.e., relational patterns) should be considered, which can be formulated as a multivariate metrics anomaly detection problem. However, most of the studies fall short of extracting these two types of features explicitly. Moreover, there exist some unlabeled anomalies mixed in the training data, which may hinder the detection performance. To address these limitations, we propose the Relational- Temporal Anomaly Detection Model (RTAnomaly) that combines the relational and temporal information of metrics. RTAnomaly employs a graph attention layer to learn the dependencies among metrics, which will further help pinpoint the anomalous metrics that may cause the anomaly effectively. In addition, we exploit the concept of positive unlabeled learning to address the issue of potential anomalies in the training data. To evaluate our method, we conduct experiments on a public dataset and two industrial datasets. RTAnomaly outperforms all the baseline models by achieving an average F1 score of 0.929 and Hit@3 of 0.920, demonstrating its superiority.
    摘要 大规模云服务系统中的性能问题会导致重大的收益损失。为确保可靠性,需要准确地识别和定位这些问题使用服务监控指标。由于现代云系统的复杂性和规模,这可能是一项具有挑战性和需要专业知识和资源的任务。现有的方法可能会分析每个指标独立地检测异常。然而,这可能会导致过载的警示,使得工程师难以手动诊断。为了提高性能,不仅需要考虑时间序列中的指标异常,还需要考虑指标之间的相互关系(即关系异常),这可以被视为多变量指标异常检测问题。然而,大多数研究都没有明确提取这两种特征。此外,存在在训练数据中的未标注异常,可能会降低检测性能。为解决这些限制,我们提出了关系时间异常检测模型(RTAnomaly),该模型将指标之间的关系和时间序列信息结合使用。RTAnomaly使用图注意层学习指标之间的依赖关系,以更好地发现可能导致异常的异常指标。此外,我们利用未标注异常学习的概念,以Address the issue of potential anomalies in the training data。为评估我们的方法,我们在公共数据集和两个工业数据集上进行了实验。结果显示,RTAnomaly在所有基线模型之上具有平均F1分数0.929和 Hit@3 0.920,这表明它的优势。

FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback

  • paper_url: http://arxiv.org/abs/2307.10867
  • repo_url: https://github.com/figcapshf/figcapshf
  • paper_authors: Ashish Singh, Prateek Agarwal, Zixuan Huang, Arpita Singh, Tong Yu, Sungchul Kim, Victor Bursztyn, Nikos Vlassis, Ryan A. Rossi
  • for: 这个论文主要是为了解决科学文献中图文合成的问题,提高图文合成的质量和准确性。
  • methods: 该论文使用了一种新的框架,即 FigCaps-HF,来生成图文合成。该框架包括自动评估图文对的质量以及基于人工反馈的学习方法,以优化图文合成的质量和准确性。
  • results: 该论文通过对不同类型的模型进行比较,证明了 FigCaps-HF 框架可以提高图文合成的性能。特别是,当使用 BLIP 作为基础模型时,RLHF 方法可以获得一个平均提升率达 35.7%、16.9% 和 9% 在 ROUGE、BLEU 和 Meteor 等指标中。此外,该论文还释放了一个大规模的 benchmark 数据集,以便进一步评估和发展 RLHF 技术。
    Abstract Captions are crucial for understanding scientific visualizations and documents. Existing captioning methods for scientific figures rely on figure-caption pairs extracted from documents for training, many of which fall short with respect to metrics like helpfulness, explainability, and visual-descriptiveness [15] leading to generated captions being misaligned with reader preferences. To enable the generation of high-quality figure captions, we introduce FigCaps-HF a new framework for figure-caption generation that can incorporate domain expert feedback in generating captions optimized for reader preferences. Our framework comprises of 1) an automatic method for evaluating quality of figure-caption pairs, 2) a novel reinforcement learning with human feedback (RLHF) method to optimize a generative figure-to-caption model for reader preferences. We demonstrate the effectiveness of our simple learning framework by improving performance over standard fine-tuning across different types of models. In particular, when using BLIP as the base model, our RLHF framework achieves a mean gain of 35.7%, 16.9%, and 9% in ROUGE, BLEU, and Meteor, respectively. Finally, we release a large-scale benchmark dataset with human feedback on figure-caption pairs to enable further evaluation and development of RLHF techniques for this problem.
    摘要 <>Translate the given text into Simplified Chinese.<>科学视觉和文档中的标题是非常重要的,现有的科学标题生成方法都是基于文档中提取的figure-caption对,但是这些方法 frequently fall short (15),导致生成的标题与读者首选不符。为了生成高质量的标题,我们介绍了FigCaps-HF,一个新的标题生成框架,可以在读者首选的基础上生成标题。我们的框架包括以下两个部分:1. 一种自动评估figure-caption对的质量方法。2. 一种基于人工反馈的强化学习(RLHF)方法,用于优化一个生成figure-to-caption模型,以满足读者首选。我们的简单学习框架在不同的模型上进行了标准化finetuning后,都能够提高性能。特别是当使用BLIP作为基础模型时,我们的RLHF框架实现了ROUGE、BLEU和Meteor等指标中的平均提升为35.7%、16.9%和9%。最后,我们发布了一个大规模的人工反馈 benchmark dataset,以便进一步评估和发展RLHF技术。

Addressing caveats of neural persistence with deep graph persistence

  • paper_url: http://arxiv.org/abs/2307.10865
  • repo_url: https://github.com/ExplainableML/Deep-Graph-Persistence
  • paper_authors: Leander Girrbach, Anders Christensen, Ole Winther, Zeynep Akata, A. Sophia Koepke
  • for: 这个论文的目的是为了提出一种新的深度学习中的数据分析方法,以及一种基于这种方法的深度网络复杂度量度。
  • methods: 这个论文使用了topological数据分析的方法,以及一种新的层间拟合方法来处理深度网络。
  • results: 研究发现,深度网络的层次结构和大量 weights 的分布是决定 neural persistence 的两大因素。此外,通过对深度网络进行扩展,可以解决 variance 相关的问题,并且可以准确地量度深度网络的复杂度。
    Abstract Neural Persistence is a prominent measure for quantifying neural network complexity, proposed in the emerging field of topological data analysis in deep learning. In this work, however, we find both theoretically and empirically that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence. Whilst this captures useful information for linear classifiers, we find that no relevant spatial structure is present in later layers of deep neural networks, making neural persistence roughly equivalent to the variance of weights. Additionally, the proposed averaging procedure across layers for deep neural networks does not consider interaction between layers. Based on our analysis, we propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers, which is equivalent to calculating neural persistence on one particular matrix. This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues through standardisation. Code is available at https://github.com/ExplainableML/Deep-Graph-Persistence .
    摘要 neural persistency 是一种深度学习中的核心度量,在 topological data analysis 领域中提出。在这种工作中,我们发现了论理和实验两个方面的结论:网络权重的方差和大权重的空间吸引力是影响 neural persistency 的主要因素。这些信息对于线性分类器是有用的,但我们发现了深层神经网络中的后Layer没有相关的空间结构,因此 neural persistency 大致相当于网络权重的方差。此外,对于深度神经网络,层融合策略不考虑层之间的交互。基于我们的分析,我们提出了层拓扑下的 filtration 扩展,该扩展等于在一个特定矩阵上计算 neural persistency。这个方法会隐式地包含神经网络中的持续路径和减少方差相关的问题。代码可以在 上找到。

Divide & Bind Your Attention for Improved Generative Semantic Nursing

  • paper_url: http://arxiv.org/abs/2307.10864
  • repo_url: None
  • paper_authors: Yumeng Li, Margret Keuper, Dan Zhang, Anna Khoreva
  • for: 这篇论文的目的是提出一种基于Generative Semantic Nursing(GSN)的方法,用于解决复杂的提示问题和多个实体之间的属性绑定问题。
  • methods: 该方法使用了两个新的损失函数:一个新的注意力损失函数和一个绑定损失函数,以提高GSN的表现。
  • results: 该方法在多个评估标准上表现出色,能够准确地synthesize所需的对象,并且Attribute binding更加紧密。更多视频和更新可以在项目页面上找到:https://sites.google.com/view/divide-and-bind
    Abstract Emerging large-scale text-to-image generative models, e.g., Stable Diffusion (SD), have exhibited overwhelming results with high fidelity. Despite the magnificent progress, current state-of-the-art models still struggle to generate images fully adhering to the input prompt. Prior work, Attend & Excite, has introduced the concept of Generative Semantic Nursing (GSN), aiming to optimize cross-attention during inference time to better incorporate the semantics. It demonstrates promising results in generating simple prompts, e.g., ``a cat and a dog''. However, its efficacy declines when dealing with more complex prompts, and it does not explicitly address the problem of improper attribute binding. To address the challenges posed by complex prompts or scenarios involving multiple entities and to achieve improved attribute binding, we propose Divide & Bind. We introduce two novel loss objectives for GSN: a novel attendance loss and a binding loss. Our approach stands out in its ability to faithfully synthesize desired objects with improved attribute alignment from complex prompts and exhibits superior performance across multiple evaluation benchmarks. More videos and updates can be found on the project page \url{https://sites.google.com/view/divide-and-bind}.
    摘要 新型大规模文本至图生成模型,如稳定扩散(SD),已经显示出惊人的成果,具有高准确性。然而,当前领先的模型仍然难以生成完全遵循输入提示的图像。先前的工作,听取与激发(Attend & Excite),引入了生成 semantic nursing(GSN)的概念,通过在推理时间进行交叉注意力优化,以更好地包含 semantics。它在生成简单提示(例如,“一只猫和一只狗”)中显示了扎实的成果。然而,其效果在处理更复杂的提示时下降,并不直接地解决不正确的属性绑定问题。为了解决复杂提示或场景中多个实体的挑战,以及提高属性绑定的问题,我们提出了分区与绑定(Divide & Bind)方法。我们引入了两种新的损失目标:一种新的注意力损失和一种绑定损失。我们的方法在处理复杂提示下能够准确地生成愿景中的目标对象,并且具有改进的属性Alignment。更多视频和更新可以在项目页面()中找到。

Self-paced Weight Consolidation for Continual Learning

  • paper_url: http://arxiv.org/abs/2307.10845
  • repo_url: https://github.com/congwei45/spWC
  • paper_authors: Wei Cong, Yang Cong, Gan Sun, Yuyang Liu, Jiahua Dong
  • for: 这个研究旨在提高sequential task learning中的continual learning效能,并避免catastrophic forgetting这个问题。
  • methods: 我们提出了一个自步调节的权重固化(self-paced Weight Consolidation, spWC)框架,通过评估先前任务的判别性贡献来实现鲁棒的持续学习。我们还设计了一种自步调节的正则化方法,依据关键性能指标(例如精度)来衡量过去任务的难度并确定其优先级。
  • results: 我们的方法可以对多个任务进行序列学习,在取得更好性能的同时降低计算成本。实验结果显示,在多个公开基准数据集上,该方法相比其他流行的持续学习算法取得了更好的效果。
    Abstract Continual learning algorithms which keep the parameters of new tasks close to that of previous tasks, are popular in preventing catastrophic forgetting in sequential task learning settings. However, 1) the performance for the new continual learner will be degraded without distinguishing the contributions of previously learned tasks; 2) the computational cost will be greatly increased with the number of tasks, since most existing algorithms need to regularize all previous tasks when learning new tasks. To address the above challenges, we propose a self-paced Weight Consolidation (spWC) framework to attain robust continual learning via evaluating the discriminative contributions of previous tasks. To be specific, we develop a self-paced regularization to reflect the priorities of past tasks via measuring difficulty based on key performance indicator (i.e., accuracy). When encountering a new task, all previous tasks are sorted from "difficult" to "easy" based on the priorities. Then the parameters of the new continual learner will be learned via selectively maintaining the knowledge amongst more difficult past tasks, which could well overcome catastrophic forgetting with less computational cost. We adopt an alternative convex search to iteratively update the model parameters and priority weights in the bi-convex formulation. The proposed spWC framework is plug-and-play, which is applicable to most continual learning algorithms (e.g., EWC, MAS and RCIL) in different directions (e.g., classification and segmentation). Experimental results on several public benchmark datasets demonstrate that our proposed framework can effectively improve performance when compared with other popular continual learning algorithms.
    摘要 CONTINUAL LEARNING算法,它们保持新任务参数与前一个任务相似,在Sequential task learning setting中很受欢迎。但是,1)新的 continual learner的性能将受到前一个任务的贡献的影响,而无法分别评估这些贡献;2)随着任务的增加,现有的算法的计算成本将增加很多,因为它们需要对所有任务进行Regularization。为解决这些挑战,我们提出了一个自适应Weight Consolidation(spWC)框架,以实现Robust continual learning。具体来说,我们开发了一种自适应Regularization,通过测量难度来评估过去任务的优先级。当遇到新任务时,我们将所有过去任务排序为“difficult”到“easy”的顺序,根据优先级。然后,我们将新的 continual learner的参数学习 via 选择保留过去任务中更难的知识。这可以很好地解决catastrophic forgetting问题,同时降低计算成本。我们采用了一种alternative convex search来逐步更新模型参数和优先级权重。我们的spWC框架是可插入的,可以应用于大多数 continual learning算法(例如EWC、MAS和RCIL)以及不同的方向(例如分类和分割)。实验结果表明,我们的提议可以在多个公共 benchmark dataset上提高性能,相比其他流行的 continual learning算法。
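Below is a minimal sketch, under stated assumptions, of the self-paced consolidation penalty described above: past tasks are ranked by difficulty (here simply 1 - accuracy), tasks that have become easy are skipped, and the importance-weighted parameter drift of the remaining tasks is penalised EWC-style. In the paper the priority weights and model parameters are updated jointly by an alternating convex search, which this sketch omits.

```python
import torch

def spwc_penalty(model, past_tasks, threshold=0.3, lam=1.0):
    """past_tasks: list of dicts with keys 'accuracy', 'params', 'importance'."""
    difficulties = torch.tensor([1.0 - t["accuracy"] for t in past_tasks])
    priorities = difficulties / difficulties.sum()          # normalised priority weights
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for task, prio, diff in zip(past_tasks, priorities, difficulties):
        if diff < threshold:                                  # self-paced selection: ignore easy tasks
            continue
        for name, param in model.named_parameters():
            drift = (param - task["params"][name]) ** 2       # parameter movement away from that task
            penalty = penalty + prio * (task["importance"][name] * drift).sum()
    return lam * penalty
```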

Global Precipitation Nowcasting of Integrated Multi-satellitE Retrievals for GPM: A U-Net Convolutional LSTM Architecture

  • paper_url: http://arxiv.org/abs/2307.10843
  • repo_url: https://github.com/reyhaneh-92/genesis_nowcast
  • paper_authors: Reyhaneh Rahimi, Ardeshir Ebtehaj, Ali Behrangi, Jackson Tan
  • for: 这个研究旨在开发一种深度学习架构,用于全球范围内降水预测,每30分钟预测4小时前的降水情况。
  • methods: 该架构融合了 U-Net 与卷积长短期记忆(ConvLSTM)神经网络,并使用 IMERG 以及来自全球预报系统(GFS)的若干关键降水驱动因素进行训练。
  • results: 研究比较了不同训练损失函数(包括均方误差回归损失和焦点分类损失)对降水临近预报质量的影响。结果表明,回归网络在轻度降水(低于1.6毫米/小时)上表现良好,而分类网络在降水极端事件(大于8毫米/小时)上以 Critical Success Index(CSI)衡量可以超过回归网络。同时,加入物理变量可以改进降水临近预报,特别是在较长的预报时效上。
    Abstract This paper presents a deep learning architecture for nowcasting of precipitation almost globally every 30 min with a 4-hour lead time. The architecture fuses a U-Net and a convolutional long short-term memory (LSTM) neural network and is trained using data from the Integrated MultisatellitE Retrievals for GPM (IMERG) and a few key precipitation drivers from the Global Forecast System (GFS). The impacts of different training loss functions, including the mean-squared error (regression) and the focal-loss (classification), on the quality of precipitation nowcasts are studied. The results indicate that the regression network performs well in capturing light precipitation (below 1.6 mm/hr), but the classification network can outperform the regression network for nowcasting of precipitation extremes (>8 mm/hr), in terms of the critical success index (CSI).. Using the Wasserstein distance, it is shown that the predicted precipitation by the classification network has a closer class probability distribution to the IMERG than the regression network. It is uncovered that the inclusion of the physical variables can improve precipitation nowcasting, especially at longer lead times in both networks. Taking IMERG as a relative reference, a multi-scale analysis in terms of fractions skill score (FSS), shows that the nowcasting machine remains skillful (FSS > 0.5) at the resolution of 10 km compared to 50 km for GFS. For precipitation rates greater than 4~mm/hr, only the classification network remains FSS-skillful on scales greater than 50 km within a 2-hour lead time.
    摘要
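The scores quoted in the abstract can be reproduced with standard verification formulas; the snippet below implements the critical success index (CSI) at a rain-rate threshold and the fractions skill score (FSS) at a given neighbourhood size on a synthetic rain field. The thresholds and window size are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def csi(forecast, observed, threshold):
    f, o = forecast >= threshold, observed >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    return hits / max(hits + misses + false_alarms, 1)

def fss(forecast, observed, threshold, window):
    # Fractions of grid points exceeding the threshold within each neighbourhood.
    pf = uniform_filter((forecast >= threshold).astype(float), size=window)
    po = uniform_filter((observed >= threshold).astype(float), size=window)
    mse = np.mean((pf - po) ** 2)
    ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / ref if ref > 0 else np.nan

rng = np.random.default_rng(1)
obs = rng.gamma(shape=0.4, scale=4.0, size=(128, 128))      # synthetic rain field, mm/hr
fcst = obs + rng.normal(scale=1.0, size=obs.shape)
print("CSI@8mm/hr:", csi(fcst, obs, 8.0), " FSS@4mm/hr, 5px:", fss(fcst, obs, 4.0, 5))
```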

Label Calibration for Semantic Segmentation Under Domain Shift

  • paper_url: http://arxiv.org/abs/2307.10842
  • repo_url: None
  • paper_authors: Ondrej Bohdal, Da Li, Timothy Hospedales
  • for: 这篇论文旨在将预训练的语义分割模型适配到无标注的新域数据,以缓解域偏移(domain shift)带来的性能下降。
  • methods: 该方法在域偏移下为每个类别计算软标签原型(soft-label prototypes),并按与预测类别概率向量最接近的原型进行预测,从而对预训练模型进行适配。
  • results: 论文表明这种适配方法速度快、几乎不需要额外计算资源,并能带来可观的性能提升;其价值在合成到真实(synthetic-to-real)的语义分割问题上得到了验证。
    Abstract Performance of a pre-trained semantic segmentation model is likely to substantially decrease on data from a new domain. We show a pre-trained model can be adapted to unlabelled target domain data by calculating soft-label prototypes under the domain shift and making predictions according to the prototype closest to the vector with predicted class probabilities. The proposed adaptation procedure is fast, comes almost for free in terms of computational resources and leads to considerable performance improvements. We demonstrate the benefits of such label calibration on the highly-practical synthetic-to-real semantic segmentation problem.
    摘要 预训练的语义分割模型在来自新域的数据上性能往往会大幅下降。我们展示了可以通过在域偏移下计算软标签原型,并按与预测类别概率向量最接近的原型进行预测,将预训练模型适配到无标注的目标域数据。所提出的适配过程速度快、几乎不消耗额外计算资源,并带来可观的性能提升。我们在非常实用的合成到真实(synthetic-to-real)语义分割问题上展示了这种标签校准的好处。
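A hedged sketch of the feed-forward adaptation idea described above: one soft-label prototype per class is built from the pre-trained model's predictions on unlabelled target data, and each sample is then relabelled by its nearest prototype. The weighting and variable names are assumptions rather than the paper's exact procedure.

```python
import numpy as np

def build_prototypes(probs):
    """probs: (N, C) softmax outputs of the pre-trained model on target-domain data."""
    pseudo = probs.argmax(axis=1)
    protos = np.stack([
        probs[pseudo == c].mean(axis=0) if np.any(pseudo == c) else np.full(probs.shape[1], np.nan)
        for c in range(probs.shape[1])
    ])
    return protos                                   # (C, C): one soft prototype per class

def calibrated_predict(probs, protos):
    # Assign each sample to the class whose prototype is closest to its probability vector.
    dists = np.linalg.norm(probs[:, None, :] - protos[None, :, :], axis=-1)
    dists = np.where(np.isnan(dists), np.inf, dists)   # classes with no pseudo-labels are skipped
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 19))                # e.g. 19 Cityscapes-style classes
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
protos = build_prototypes(probs)
print(calibrated_predict(probs, protos)[:10])
```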

Adversarial Conversational Shaping for Intelligent Agents

  • paper_url: http://arxiv.org/abs/2307.11785
  • repo_url: None
  • paper_authors: Piotr Tarasiewicz, Sultan Kenjeyev, Ilana Sebag, Shehab Alshehabi
  • for: 提高对话代理人的智能会话系统稳定性和准确性
  • methods: 使用生成对抗网络(GANPG)和奖励每一个生成步骤(REGS)模型,并在seq2seq和 transformers 框架下进行强化学习
  • results: 通过不同的训练细节,模型可以提高对话代理人的性能和可靠性
    Abstract The recent emergence of deep learning methods has enabled the research community to achieve state-of-the art results in several domains including natural language processing. However, the current robocall system remains unstable and inaccurate: text generator and chat-bots can be tedious and misunderstand human-like dialogue. In this work, we study the performance of two models able to enhance an intelligent conversational agent through adversarial conversational shaping: a generative adversarial network with policy gradient (GANPG) and a generative adversarial network with reward for every generation step (REGS) based on the REGS model presented in Li et al. [18] . This model is able to assign rewards to both partially and fully generated text sequences. We discuss performance with different training details : seq2seq [ 36] and transformers [37 ] in a reinforcement learning framework.
    摘要

What, Indeed, is an Achievable Provable Guarantee for Learning-Enabled Safety Critical Systems

  • paper_url: http://arxiv.org/abs/2307.11784
  • repo_url: None
  • paper_authors: Saddek Bensalem, Chih-Hong Cheng, Wei Huang, Xiaowei Huang, Changshun Wu, Xingyu Zhao
  • for: 本文旨在提出一种可靠地在安全关键领域使用学习能力的方法,以确保系统的安全性。
  • methods: 本文提出了一种两步验证方法,以实现可证明的统计保证。
  • results: 本文指出,现有方法实际上无法实现可证明的保证,因此提倡采用两步验证方法来获得可证明的统计保证。
    Abstract Machine learning has made remarkable advancements, but confidently utilising learning-enabled components in safety-critical domains still poses challenges. Among the challenges, it is known that a rigorous, yet practical, way of achieving safety guarantees is one of the most prominent. In this paper, we first discuss the engineering and research challenges associated with the design and verification of such systems. Then, based on the observation that existing works cannot actually achieve provable guarantees, we promote a two-step verification method for the ultimate achievement of provable statistical guarantees.
    摘要 机器学习技术已经取得了很大的进步,但在安全关键领域使用学习能力的组件仍然存在挑战。其中一个最大的挑战是实现可靠的安全保证。在这篇论文中,我们首先讨论了设计和验证这些系统的工程和研究挑战。然后,根据现有的工作无法实现可证的保证,我们提出了两步验证方法以实现可证的统计保证。

On Combining Expert Demonstrations in Imitation Learning via Optimal Transport

  • paper_url: http://arxiv.org/abs/2307.10810
  • repo_url: https://github.com/ilanasebag/Sliced-MMOT-Imitation-Learning
  • paper_authors: Ilana Sebag, Samuel Cohen, Marc Peter Deisenroth
  • for: 通过专家示范教会智能体(agent)完成特定任务
  • methods: 使用优化运输方法测量Agent和专家轨迹之间的距离,并将多个专家示范合并在OT上
  • results: 在OpenAI Gym控制环境中,提出了一种使用多个专家示范的方法,并分析了其效率,发现标准方法不总是最优
    Abstract Imitation learning (IL) seeks to teach agents specific tasks through expert demonstrations. One of the key approaches to IL is to define a distance between agent and expert and to find an agent policy that minimizes that distance. Optimal transport methods have been widely used in imitation learning as they provide ways to measure meaningful distances between agent and expert trajectories. However, the problem of how to optimally combine multiple expert demonstrations has not been widely studied. The standard method is to simply concatenate state (-action) trajectories, which is problematic when trajectories are multi-modal. We propose an alternative method that uses a multi-marginal optimal transport distance and enables the combination of multiple and diverse state-trajectories in the OT sense, providing a more sensible geometric average of the demonstrations. Our approach enables an agent to learn from several experts, and its efficiency is analyzed on OpenAI Gym control environments and demonstrates that the standard method is not always optimal.
    摘要 模仿学习(IL)的目的是通过专家示范教会智能体完成特定任务。一类关键的 IL 方法是定义智能体与专家之间的距离,并寻找使该距离最小的策略。最优传输(OT)方法在模仿学习中被广泛使用,因为它们能够度量智能体与专家轨迹之间有意义的距离。然而,如何最优地组合多条专家示范尚未得到充分研究。标准做法是简单地拼接状态(-动作)轨迹,当轨迹呈多模态时这会带来问题。我们提出一种替代方法,使用多边际最优传输距离,使多条、多样的状态轨迹能够在 OT 意义下组合,给出更合理的示范几何平均。我们的方法允许智能体从多位专家学习;我们在 OpenAI Gym 控制环境中分析了其效率,结果显示标准方法并不总是最优的。
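To make the "combining demonstrations in the OT sense" idea concrete, the sketch below uses random 1-D projections: for each projection, the experts' sorted projections (their quantile profiles) are averaged, which is a 1-D Wasserstein barycenter, and the agent trajectory is scored against that profile. This is a simplification of the paper's multi-marginal formulation and assumes equal-length trajectories.

```python
import numpy as np

def sliced_distance_to_experts(agent, experts, n_proj=50, seed=0):
    """agent: (T, d) states; experts: list of (T, d) trajectories of equal length."""
    rng = np.random.default_rng(seed)
    d = agent.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        agent_q = np.sort(agent @ theta)                                      # agent quantile profile
        expert_q = np.mean([np.sort(e @ theta) for e in experts], axis=0)     # 1-D barycenter of experts
        total += np.mean((agent_q - expert_q) ** 2)
    return total / n_proj

rng = np.random.default_rng(1)
experts = [rng.normal(loc=i, size=(100, 4)) for i in range(3)]   # three toy demonstrations
agent = rng.normal(loc=1.0, size=(100, 4))
print(sliced_distance_to_experts(agent, experts))
```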

Communication-Efficient Split Learning via Adaptive Feature-Wise Compression

  • paper_url: http://arxiv.org/abs/2307.10805
  • repo_url: None
  • paper_authors: Yongjeong Oh, Jaeho Lee, Christopher G. Brinton, Yo-Seb Jeon
  • for: 提高分布式学习中通信开销的减少
  • methods: 利用矩阵列中具有不同分散度的特征进行压缩,并采用适应式dropout和适应式量化策略
  • results: 与现有分布式学习框架相比,提供5.6%以上的分类精度提升,同时减少了320倍的通信开销
    Abstract This paper proposes a novel communication-efficient split learning (SL) framework, named SplitFC, which reduces the communication overhead required for transmitting intermediate feature and gradient vectors during the SL training process. The key idea of SplitFC is to leverage different dispersion degrees exhibited in the columns of the matrices. SplitFC incorporates two compression strategies: (i) adaptive feature-wise dropout and (ii) adaptive feature-wise quantization. In the first strategy, the intermediate feature vectors are dropped with adaptive dropout probabilities determined based on the standard deviation of these vectors. Then, by the chain rule, the intermediate gradient vectors associated with the dropped feature vectors are also dropped. In the second strategy, the non-dropped intermediate feature and gradient vectors are quantized using adaptive quantization levels determined based on the ranges of the vectors. To minimize the quantization error, the optimal quantization levels of this strategy are derived in a closed-form expression. Simulation results on the MNIST, CIFAR-10, and CelebA datasets demonstrate that SplitFC provides more than a 5.6% increase in classification accuracy compared to state-of-the-art SL frameworks, while they require 320 times less communication overhead compared to the vanilla SL framework without compression.
    摘要 本文提出了一种新的通信高效的拆分学习(SL)框架 SplitFC,以减少 SL 训练过程中传输中间特征向量和梯度向量所需的通信开销。SplitFC 的核心思想是利用矩阵各列呈现出的不同离散程度,并结合两种压缩策略:(i)自适应的按特征 dropout,即根据中间特征向量的标准差确定自适应的丢弃概率,并依据链式法则同时丢弃与之对应的中间梯度向量;(ii)自适应的按特征量化,即根据向量的取值范围确定自适应的量化级别,并以闭式表达式推导出使量化误差最小的最优量化级别。在 MNIST、CIFAR-10 和 CelebA 数据集上的仿真结果表明,与最先进的 SL 框架相比,SplitFC 的分类精度提升超过 5.6%,同时相比不带压缩的原始 SL 框架,通信开销降低了 320 倍。
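The two compression steps can be illustrated with simple per-feature statistics as below; the exact probability and bit-width mappings in SplitFC (including the closed-form optimal quantisation levels) are more elaborate, so treat this purely as a sketch with assumed parameter names.

```python
import numpy as np

def adaptive_feature_dropout(features, keep_budget=0.5, seed=0):
    """features: (batch, dim) intermediate activations at the split point."""
    rng = np.random.default_rng(seed)
    stds = features.std(axis=0) + 1e-8
    keep_prob = np.clip(keep_budget * stds / stds.mean(), 0.0, 1.0)   # informative features kept more often
    mask = rng.random(features.shape[1]) < keep_prob
    return features * mask, mask          # the same mask is reused to drop the paired gradients

def adaptive_quantize(features, min_bits=2, max_bits=8):
    ranges = features.max(axis=0) - features.min(axis=0) + 1e-8
    bits = np.clip(np.round(min_bits + (max_bits - min_bits) * ranges / ranges.max()),
                   min_bits, max_bits).astype(int)                     # wider range -> more bits
    quantized = np.empty_like(features)
    for j in range(features.shape[1]):
        levels = 2 ** bits[j]
        lo = features[:, j].min()
        step = ranges[j] / (levels - 1)
        quantized[:, j] = lo + np.round((features[:, j] - lo) / step) * step
    return quantized, bits

x = np.random.default_rng(2).normal(size=(32, 64))
x_dropped, mask = adaptive_feature_dropout(x)
x_q, bits = adaptive_quantize(x_dropped[:, mask])
print(mask.sum(), "features kept; bit-width histogram:", np.bincount(bits))
```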

Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities

  • paper_url: http://arxiv.org/abs/2307.10803
  • repo_url: None
  • paper_authors: Hanchen Yang, Wengen Li, Shuyu Wang, Hui Li, Jihong Guan, Shuigeng Zhou, Jiannong Cao
  • For: This paper provides a comprehensive survey of existing spatial-temporal data mining (STDM) studies for ocean science, including a review of widely-used ST ocean datasets and their unique characteristics, as well as techniques for data quality enhancement and various STDM tasks.
  • Methods: The paper reviews and discusses various techniques for STDM in ocean science, including data preprocessing, feature extraction, and machine learning algorithms for tasks such as prediction, event detection, pattern mining, and anomaly detection.
  • Results: The paper highlights the unique challenges and opportunities of STDM in ocean science, and discusses promising research opportunities in this field, including the application of advanced STDM techniques to climate forecasting and disaster warning.
    Abstract With the rapid amassing of spatial-temporal (ST) ocean data, many spatial-temporal data mining (STDM) studies have been conducted to address various oceanic issues, including climate forecasting and disaster warning. Compared with typical ST data (e.g., traffic data), ST ocean data is more complicated but with unique characteristics, e.g., diverse regionality and high sparsity. These characteristics make it difficult to design and train STDM models on ST ocean data. To the best of our knowledge, a comprehensive survey of existing studies remains missing in the literature, which hinders not only computer scientists from identifying the research issues in ocean data mining but also ocean scientists to apply advanced STDM techniques. In this paper, we provide a comprehensive survey of existing STDM studies for ocean science. Concretely, we first review the widely-used ST ocean datasets and highlight their unique characteristics. Then, typical ST ocean data quality enhancement techniques are explored. Next, we classify existing STDM studies in ocean science into four types of tasks, i.e., prediction, event detection, pattern mining, and anomaly detection, and elaborate on the techniques for these tasks. Finally, promising research opportunities are discussed. This survey can help scientists from both computer science and ocean science better understand the fundamental concepts, key techniques, and open challenges of STDM for ocean science.
    摘要 随着空间时间(ST)海洋数据的快速汇集,许多空间时间数据挖掘(STDM)研究已经开展,以解决气候预测和灾害预警等海洋问题。相比一般 ST 数据(例如交通数据),ST 海洋数据更加复杂,且具有多样的区域性和高稀疏性等独特特征。这些特征使得在 ST 海洋数据上设计和训练 STDM 模型更加困难。据我们所知,文献中尚缺乏对现有研究的全面综述,这不仅妨碍计算机科学家识别海洋数据挖掘中的研究问题,也妨碍海洋科学家应用先进的 STDM 技术。在本文中,我们对海洋科学领域现有的 STDM 研究进行了全面综述。具体来说,我们首先回顾了广泛使用的 ST 海洋数据集并强调其独特特征;然后探讨了典型的 ST 海洋数据质量提升技术;接着将现有的 STDM 研究划分为预测、事件检测、模式挖掘和异常检测四类任务,并详细介绍各类任务的技术;最后讨论了有前景的研究机会。本综述有助于计算机科学和海洋科学两个领域的研究者更好地理解面向海洋科学的 STDM 的基本概念、关键技术和开放挑战。

Meta-Transformer: A Unified Framework for Multimodal Learning

  • paper_url: http://arxiv.org/abs/2307.10802
  • repo_url: https://github.com/invictus717/MetaTransformer
  • paper_authors: Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue
  • for: 这个研究旨在建立一个能够处理多种模式的模型,并将其与不同的模式进行关联。
  • methods: 该方法使用一个冻结参数的编码器进行多模态感知,将不同模态的原始输入数据映射到共享的 token 空间,从而提取高层语义特征。
  • results: 该方法可以在 12 种模态上进行统一学习,包括基础感知(文本、图像、点云、音频、视频)、实际应用(X 光、红外、高光谱、IMU)以及数据挖掘(图、表格、时间序列)。
    Abstract Multimodal learning aims to build models that can process and relate information from multiple modalities. Despite years of development in this field, it still remains challenging to design a unified network for processing various modalities ($\textit{e.g.}$ natural language, 2D images, 3D point clouds, audio, video, time series, tabular data) due to the inherent gaps among them. In this work, we propose a framework, named Meta-Transformer, that leverages a $\textbf{frozen}$ encoder to perform multimodal perception without any paired multimodal training data. In Meta-Transformer, the raw input data from various modalities are mapped into a shared token space, allowing a subsequent encoder with frozen parameters to extract high-level semantic features of the input data. Composed of three main components: a unified data tokenizer, a modality-shared encoder, and task-specific heads for downstream tasks, Meta-Transformer is the first framework to perform unified learning across 12 modalities with unpaired data. Experiments on different benchmarks reveal that Meta-Transformer can handle a wide range of tasks including fundamental perception (text, image, point cloud, audio, video), practical application (X-Ray, infrared, hyperspectral, and IMU), and data mining (graph, tabular, and time-series). Meta-Transformer indicates a promising future for developing unified multimodal intelligence with transformers. Code will be available at https://github.com/invictus717/MetaTransformer
    摘要 多Modal学习旨在建立处理多种模式的模型。尽管多年的发展,仍然困难设计处理多种模式的统一网络(例如自然语言、2D图像、3D点云、音频、视频、时间序列、表格数据)的模型,因为这些模式之间存在隐藏的差异。在这项工作中,我们提出了一个框架,名为Meta-Transformer,它利用一个冻结的Encoder来实现多Modal感知,无需任何对准的多Modal训练数据。Meta-Transformer框架由三个主要组件组成:一个统一的数据Tokenizer、一个共享Encoder和下游任务的任务特定头。Meta-Transformer是首个在12种模式上进行统一学习的框架,无需对数据进行匹配。在不同的Benchmark上进行的实验表明,Meta-Transformer可以处理各种任务,包括基本的感知(文本、图像、点云、音频、视频)、实用应用(X射线、红外、偏振、IMU)和数据挖掘(图形、表格、时间序列)。Meta-Transformer表明了未来在使用Transformer进行多Modal智能的发展具有扎实的前景。代码将在https://github.com/invictus717/MetaTransformer上提供。
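Structurally, the framework reduces to three pieces, sketched below with a generic nn.TransformerEncoder standing in for the actual pretrained ViT backbone: modality-specific tokenizers, a shared encoder whose weights stay frozen, and lightweight task heads. Module names and sizes are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class MetaTransformerSketch(nn.Module):
    def __init__(self, dim=256, n_classes=10):
        super().__init__()
        self.tokenizers = nn.ModuleDict({
            "image": nn.Conv2d(3, dim, kernel_size=16, stride=16),   # 16x16 patches -> tokens
            "series": nn.Linear(1, dim),                              # one token per time step
        })
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        for p in self.encoder.parameters():
            p.requires_grad_(False)                                    # shared encoder stays frozen
        self.heads = nn.ModuleDict({"classify": nn.Linear(dim, n_classes)})

    def forward(self, x, modality, task="classify"):
        if modality == "image":
            tokens = self.tokenizers["image"](x).flatten(2).transpose(1, 2)   # (B, N, dim)
        else:
            tokens = self.tokenizers["series"](x.unsqueeze(-1))               # (B, T, dim)
        features = self.encoder(tokens).mean(dim=1)                           # pooled representation
        return self.heads[task](features)

model = MetaTransformerSketch()
print(model(torch.randn(2, 3, 64, 64), "image").shape)       # torch.Size([2, 10])
print(model(torch.randn(2, 50), "series").shape)              # torch.Size([2, 10])
```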

Convergence of Adam for Non-convex Objectives: Relaxed Hyperparameters and Non-ergodic Case

  • paper_url: http://arxiv.org/abs/2307.11782
  • repo_url: None
  • paper_authors: Meixuan He, Yuqing Liang, Jinlan Liu, Dongpo Xu
  • for: investigate the convergence properties of Adam algorithm in non-convex settings and develop a better understanding of its performance.
  • methods: introduce precise definitions of ergodic and non-ergodic convergence, and establish a weaker sufficient condition for the ergodic convergence guarantee of Adam.
  • results: prove that the last iterate of Adam converges to a stationary point for non-convex objectives, and obtain the non-ergodic convergence rate of $O(1/K)$ for function values under the Polyak-Lojasiewicz (PL) condition.
    Abstract Adam is a commonly used stochastic optimization algorithm in machine learning. However, its convergence is still not fully understood, especially in the non-convex setting. This paper focuses on exploring hyperparameter settings for the convergence of vanilla Adam and tackling the challenges of non-ergodic convergence related to practical application. The primary contributions are summarized as follows: firstly, we introduce precise definitions of ergodic and non-ergodic convergence, which cover nearly all forms of convergence for stochastic optimization algorithms. Meanwhile, we emphasize the superiority of non-ergodic convergence over ergodic convergence. Secondly, we establish a weaker sufficient condition for the ergodic convergence guarantee of Adam, allowing a more relaxed choice of hyperparameters. On this basis, we achieve the almost sure ergodic convergence rate of Adam, which is arbitrarily close to $o(1/\sqrt{K})$. More importantly, we prove, for the first time, that the last iterate of Adam converges to a stationary point for non-convex objectives. Finally, we obtain the non-ergodic convergence rate of $O(1/K)$ for function values under the Polyak-Lojasiewicz (PL) condition. These findings build a solid theoretical foundation for Adam to solve non-convex stochastic optimization problems.
    摘要 Adam 是机器学习中常用的随机优化算法,但其收敛性仍未被完全理解,尤其是在非凸设定下。本文聚焦于探索使普通(vanilla)Adam 收敛的超参数设定,并应对与实际应用相关的非遍历收敛挑战。主要贡献如下:首先,我们给出了遍历收敛与非遍历收敛的精确定义,这些定义几乎涵盖了随机优化算法的所有收敛形式,同时我们强调非遍历收敛优于遍历收敛。其次,我们为 Adam 的遍历收敛保证建立了一个更弱的充分条件,从而允许更宽松的超参数选择;在此基础上,我们得到了 Adam 几乎必然的遍历收敛速率,其可任意接近 $o(1/\sqrt{K})$。更重要的是,我们首次证明了 Adam 的最后一次迭代在非凸目标下收敛到驻点。最后,在 Polyak-Lojasiewicz(PL)条件下,我们得到了函数值 $O(1/K)$ 的非遍历收敛速率。这些结果为 Adam 求解非凸随机优化问题奠定了坚实的理论基础。
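For reference, here is the vanilla Adam recursion the paper analyses, followed by a schematic rendering of the two rates quoted above; the precise step-size conditions, the almost-sure versus in-expectation statements, and the PL constant are spelled out in the paper, not here.

```latex
% Vanilla Adam recursion (standard notation): m_k, v_k are the first- and
% second-moment estimates, g_k the stochastic gradient, \alpha_k the step size.
m_k = \beta_1 m_{k-1} + (1-\beta_1)\, g_k, \qquad
v_k = \beta_2 v_{k-1} + (1-\beta_2)\, g_k^{2}, \qquad
x_{k+1} = x_k - \frac{\alpha_k}{\sqrt{v_k} + \epsilon}\, m_k .

% Schematic form of the two rates quoted in the abstract (assumptions omitted):
\text{ergodic:}\quad \min_{1 \le k \le K} \lVert \nabla f(x_k) \rVert^{2}
  = o\!\left(K^{-1/2+\delta}\right) \ \text{for any fixed } \delta > 0, \qquad
\text{non-ergodic under PL:}\quad f(x_K) - f^{*} = O\!\left(1/K\right).
```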

Optimizing PatchCore for Few/many-shot Anomaly Detection

  • paper_url: http://arxiv.org/abs/2307.10792
  • repo_url: https://github.com/scortexio/patchcore-few-shot
  • paper_authors: João Santos, Triet Tran, Oliver Rippel
  • for: 这篇论文主要关注少样本(few-shot)异常检测(AD)领域的研究,即仅利用少量选定样本来区分正常与异常数据。
  • methods: 这篇论文使用了PatchCore,当前的全shot AD/AS算法,进行研究,包括优化其多种超参数和将supervised learning中知悉的技术转移到AD领域。
  • results: 实验表明,可以通过优化超参数和使用图像水平的扩展来实现显著性能提升,并在VisA dataset上实现了新的state of the art。此外,该论文还提出了未来研究的可能性,即研究具有强 inductive bias的特征提取器。
    Abstract Few-shot anomaly detection (AD) is an emerging sub-field of general AD, and tries to distinguish between normal and anomalous data using only few selected samples. While newly proposed few-shot AD methods do compare against pre-existing algorithms developed for the full-shot domain as baselines, they do not dedicatedly optimize them for the few-shot setting. It thus remains unclear if the performance of such pre-existing algorithms can be further improved. We address said question in this work. Specifically, we present a study on the AD/anomaly segmentation (AS) performance of PatchCore, the current state-of-the-art full-shot AD/AS algorithm, in both the few-shot and the many-shot settings. We hypothesize that further performance improvements can be realized by (I) optimizing its various hyperparameters, and by (II) transferring techniques known to improve few-shot supervised learning to the AD domain. Exhaustive experiments on the public VisA and MVTec AD datasets reveal that (I) significant performance improvements can be realized by optimizing hyperparameters such as the underlying feature extractor, and that (II) image-level augmentations can, but are not guaranteed, to improve performance. Based on these findings, we achieve a new state of the art in few-shot AD on VisA, further demonstrating the merit of adapting pre-existing AD/AS methods to the few-shot setting. Last, we identify the investigation of feature extractors with a strong inductive bias as a potential future research direction for (few-shot) AD/AS.
    摘要 少样本异常检测(few-shot AD)是通用异常检测中一个新兴的子领域,旨在仅凭少量选定样本区分正常与异常数据。尽管新提出的少样本 AD 方法会以针对全样本(full-shot)场景开发的既有算法作为基线进行比较,但它们并没有针对少样本设定对这些算法进行专门优化,因此这些既有算法的性能是否还能进一步提升尚不清楚。本文针对这一问题,研究了当前最先进的全样本 AD/异常分割(AS)算法 PatchCore 在少样本与多样本设定下的表现。我们假设可以通过(I)优化其各种超参数,以及(II)将已知能提升少样本监督学习的技术迁移到 AD 领域来进一步提升性能。在公开的 VisA 和 MVTec AD 数据集上的大量实验表明:(I)通过优化诸如底层特征提取器等超参数可以获得显著的性能提升;(II)图像级增广可以但并不一定带来提升。基于这些发现,我们在 VisA 上取得了新的少样本 AD 最优结果,进一步证明了将既有 AD/AS 方法适配到少样本设定的价值。最后,我们指出研究具有强归纳偏置的特征提取器是(少样本)AD/AS 未来的一个潜在研究方向。

Adversarial attacks for mixtures of classifiers

  • paper_url: http://arxiv.org/abs/2307.10788
  • repo_url: None
  • paper_authors: Lucas Gnecco Heredia, Benjamin Negrevergne, Yann Chevaleyre
  • for: 提升分类器对对抗攻击的鲁棒性
  • methods: 使用mixtures of classifiers (a.k.a. randomized ensembles)
  • results: 引入两种攻击性质(有效性和最大化),并证明现有攻击不符合这两种性质。还提出了一种新的攻击方法called lattice climber attack,并在binary linear setting下提供了理论保证,并在 synthetic和实际数据上进行了实验验证。
    Abstract Mixtures of classifiers (a.k.a. randomized ensembles) have been proposed as a way to improve robustness against adversarial attacks. However, it has been shown that existing attacks are not well suited for this kind of classifiers. In this paper, we discuss the problem of attacking a mixture in a principled way and introduce two desirable properties of attacks based on a geometrical analysis of the problem (effectiveness and maximality). We then show that existing attacks do not meet both of these properties. Finally, we introduce a new attack called lattice climber attack with theoretical guarantees on the binary linear setting, and we demonstrate its performance by conducting experiments on synthetic and real datasets.
    摘要 分类器混合(即随机化集成)被提议用于提升对对抗攻击的鲁棒性。然而,已有研究表明现有攻击并不适用于这类分类器。本文以一种有原则的方式讨论攻击混合模型的问题,并基于对该问题的几何分析提出了攻击应满足的两个理想性质(有效性与最大性),随后证明现有攻击并不能同时满足这两个性质。最后,我们提出了一种新的攻击方法 lattice climber attack,在二分类线性设定下给出了理论保证,并通过在合成数据与真实数据上的实验验证了其性能。

Feed-Forward Source-Free Domain Adaptation via Class Prototypes

  • paper_url: http://arxiv.org/abs/2307.10787
  • repo_url: None
  • paper_authors: Ondrej Bohdal, Da Li, Timothy Hospedales
  • for: 本研究旨在探索无源域自适应(source-free domain adaptation)的快速前馈方法,以替代基于反向传播的适配方法。
  • methods: 本方法基于预训练模型计算类 prototype,实现了快速化适应并且只需要小量时间。
  • results: 本研究实现了准确率的显著提升,并且比普通适应方法快速得多。
    Abstract Source-free domain adaptation has become popular because of its practical usefulness and no need to access source data. However, the adaptation process still takes a considerable amount of time and is predominantly based on optimization that relies on back-propagation. In this work we present a simple feed-forward approach that challenges the need for back-propagation based adaptation. Our approach is based on computing prototypes of classes under the domain shift using a pre-trained model. It achieves strong improvements in accuracy compared to the pre-trained model and requires only a small fraction of time of existing domain adaptation methods.
    摘要 无源域自适应因其实用性以及无需访问源数据而日益流行。然而,适配过程仍需要相当的时间,且主要依赖基于反向传播的优化。在本工作中,我们提出了一种简单的前馈方法,挑战了适配必须依赖反向传播的做法。该方法基于预训练模型,在域偏移下计算各类别的原型。相比预训练模型,它显著提升了准确率,且仅需现有域自适应方法一小部分的时间。

Efficient Beam Tree Recursion

  • paper_url: http://arxiv.org/abs/2307.10779
  • repo_url: None
  • paper_authors: Jishnu Ray Chowdhury, Cornelia Caragea
  • for: 这篇论文的目的是在保持 Beam Tree RvNN(BT-RvNN)在 ListOps 上长度泛化性能的同时,大幅降低其内存开销。
  • methods: 论文识别出 BT-RvNN 的主要瓶颈在于打分函数与递归单元函数的纠缠,并提出了若干简化其内存使用的策略。
  • results: 结果显示,这些策略可将 BT-RvNN 的内存使用量降低 $10$-$16$ 倍,并在 ListOps 上创造新的 state-of-the-art,同时在其他任务上保持相近的性能。
    Abstract Beam Tree Recursive Neural Network (BT-RvNN) was recently proposed as a simple extension of Gumbel Tree RvNN and it was shown to achieve state-of-the-art length generalization performance in ListOps while maintaining comparable performance on other tasks. However, although not the worst in its kind, BT-RvNN can be still exorbitantly expensive in memory usage. In this paper, we identify the main bottleneck in BT-RvNN's memory usage to be the entanglement of the scorer function and the recursive cell function. We propose strategies to remove this bottleneck and further simplify its memory usage. Overall, our strategies not only reduce the memory usage of BT-RvNN by $10$-$16$ times but also create a new state-of-the-art in ListOps while maintaining similar performance in other tasks. In addition, we also propose a strategy to utilize the induced latent-tree node representations produced by BT-RvNN to turn BT-RvNN from a sentence encoder of the form $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{d}$ into a sequence contextualizer of the form $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}$. Thus, our proposals not only open up a path for further scalability of RvNNs but also standardize a way to use BT-RvNNs as another building block in the deep learning toolkit that can be easily stacked or interfaced with other popular models such as Transformers and Structured State Space models.
    摘要 Beam Tree Recursive Neural Network(BT-RvNN)最近被提出,作为 Gumbel Tree RvNN 的简单扩展,在 ListOps 上实现了最先进的长度泛化性能,同时在其他任务上保持相近的表现。然而,尽管并非同类中最差,BT-RvNN 的内存开销仍可能非常高。本文指出 BT-RvNN 内存使用的主要瓶颈在于打分函数与递归单元函数的纠缠,并提出了消除该瓶颈、进一步简化内存使用的策略。总体而言,我们的策略不仅将 BT-RvNN 的内存使用量降低了 $10$-$16$ 倍,还在 ListOps 上创造了新的最优结果,同时在其他任务上保持相近的性能。此外,我们还提出了一种利用 BT-RvNN 产生的潜在树节点表示的策略,将 BT-RvNN 从形式为 $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{d}$ 的句子编码器转变为形式为 $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}$ 的序列上下文编码器。因此,我们的提议不仅为 RvNN 的进一步扩展开辟了道路,也规范了将 BT-RvNN 作为深度学习工具箱中另一个构件的用法,使其能方便地与 Transformer、结构化状态空间模型等流行模型堆叠或对接。

Assessing the Use of AutoML for Data-Driven Software Engineering

  • paper_url: http://arxiv.org/abs/2307.10774
  • repo_url: None
  • paper_authors: Fabio Calefato, Luigi Quaranta, Filippo Lanubile, Marcos Kalinowski
  • for: 填补AI/ML技术专业人员短缺的问题,促进自动机器学习(AutoML)的应用。
  • methods: 使用混合方法研究,包括12种终端AutoML工具在两个SE数据集上的比较,以及对实践者和研究者的调查和访谈。
  • results: 发现AutoML解决方案可以在SE领域中的分类任务中表现更好 than manually trained and optimized models,但目前可用的AutoML解决方案仍未能完全支持所有队员和开发工作流程的自动化。
    Abstract Background. Due to the widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) for building software applications, companies are struggling to recruit employees with a deep understanding of such technologies. In this scenario, AutoML is soaring as a promising solution to fill the AI/ML skills gap since it promises to automate the building of end-to-end AI/ML pipelines that would normally be engineered by specialized team members. Aims. Despite the growing interest and high expectations, there is a dearth of information about the extent to which AutoML is currently adopted by teams developing AI/ML-enabled systems and how it is perceived by practitioners and researchers. Method. To fill these gaps, in this paper, we present a mixed-method study comprising a benchmark of 12 end-to-end AutoML tools on two SE datasets and a user survey with follow-up interviews to further our understanding of AutoML adoption and perception. Results. We found that AutoML solutions can generate models that outperform those trained and optimized by researchers to perform classification tasks in the SE domain. Also, our findings show that the currently available AutoML solutions do not live up to their names as they do not equally support automation across the stages of the ML development workflow and for all the team members. Conclusions. We derive insights to inform the SE research community on how AutoML can facilitate their activities and tool builders on how to design the next generation of AutoML technologies.
    摘要 背景:由于人工智能(AI)和机器学习(ML)在软件开发中的普及,公司困难找到具备深层理解AI/ML技术的员工。在这种情况下,AutoML在解决AI/ML技能差距方面表现出了扎根的优势,因为它承诺自动化建立AI/ML管道,通常需要专业的团队成员进行工程。目标:尽管有增加的兴趣和高期望,但是有关AutoML在开发AI/ML相关系统的团队中的采用和专家和研究人员对其看法的信息不够。方法:为了填补这些空白,本文提出了一项混合方法研究,包括12个终端AutoML工具在两个SE数据集上的benchmark,以及与相关专家和研究人员进行详细交流的用户调查。结果:我们发现AutoML解决方案可以在SE领域中对分类任务进行更好的模型生成,而且我们的发现还表明现有的AutoML解决方案并不能够完全支持自动化ML开发工作流程中的所有阶段和所有团队成员。结论:我们从研究中得到了关于如何使用AutoML促进SE研究人员的活动,以及如何设计下一代AutoML技术的技术建议。

Music Genre Classification with ResNet and Bi-GRU Using Visual Spectrograms

  • paper_url: http://arxiv.org/abs/2307.10773
  • repo_url: None
  • paper_authors: Junfei Zhang
  • for: 提高音乐播放服务的用户体验和满意度,即Music Recommendation Systems。
  • methods: 使用视觉spectrogram作为输入,并提出了一种 hybrid 模型,结合 Residual neural Network (ResNet) 和 Gated Recurrent Unit (GRU),以更好地捕捉音乐数据的复杂性。
  • results: 提出了一种新的 Automatic Music Genre Classification (AMGC) 系统,可以更好地捕捉音乐数据的复杂性,并且可能提高音乐推荐系统的准确率。
    Abstract Music recommendation systems have emerged as a vital component to enhance user experience and satisfaction for the music streaming services, which dominates music consumption. The key challenge in improving these recommender systems lies in comprehending the complexity of music data, specifically for the underpinning music genre classification. The limitations of manual genre classification have highlighted the need for a more advanced system, namely the Automatic Music Genre Classification (AMGC) system. While traditional machine learning techniques have shown potential in genre classification, they heavily rely on manually engineered features and feature selection, failing to capture the full complexity of music data. On the other hand, deep learning classification architectures like the traditional Convolutional Neural Networks (CNN) are effective in capturing the spatial hierarchies but struggle to capture the temporal dynamics inherent in music data. To address these challenges, this study proposes a novel approach using visual spectrograms as input, and propose a hybrid model that combines the strength of the Residual neural Network (ResNet) and the Gated Recurrent Unit (GRU). This model is designed to provide a more comprehensive analysis of music data, offering the potential to improve the music recommender systems through achieving a more comprehensive analysis of music data and hence potentially more accurate genre classification.
    摘要 音乐推荐系统已成为音乐流媒体服务中提升用户体验与满意度的重要组成部分。改进这类推荐系统的关键挑战在于理解音乐数据的复杂性,尤其是其底层的音乐流派分类。人工流派分类的局限性凸显了对更先进系统——自动音乐流派分类(AMGC)系统——的需求。传统机器学习技术在流派分类上显示出一定潜力,但严重依赖人工设计的特征和特征选择,无法捕捉音乐数据的全部复杂性;而传统卷积神经网络(CNN)等深度学习分类架构虽能有效捕捉空间层次结构,却难以刻画音乐数据固有的时间动态。为应对这些挑战,本研究提出以视觉频谱图作为输入,并提出一种结合残差网络(ResNet)与门控循环单元(GRU)的混合模型,以对音乐数据进行更全面的分析,从而有望通过更准确的流派分类来改进音乐推荐系统。
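A hedged PyTorch sketch of the hybrid architecture described above: a small residual CNN extracts local spectro-temporal features from the visual spectrogram, a bidirectional GRU models the temporal dynamics, and a linear head predicts the genre. Layer sizes are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(channels), nn.BatchNorm2d(channels)
    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)                          # residual connection

class ResNetBiGRU(nn.Module):
    def __init__(self, n_genres=10, channels=32, hidden=128):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, channels, 3, stride=2, padding=1),
                                  ResBlock(channels), ResBlock(channels),
                                  nn.AdaptiveAvgPool2d((8, None)))   # pool frequency, keep time axis
        self.gru = nn.GRU(channels * 8, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_genres)
    def forward(self, spectrogram):                          # (B, 1, freq, time)
        f = self.stem(spectrogram)                           # (B, C, 8, T')
        f = f.permute(0, 3, 1, 2).flatten(2)                 # (B, T', C*8): one vector per frame
        _, h = self.gru(f)                                   # final hidden states of both directions
        return self.head(torch.cat([h[0], h[1]], dim=1))

model = ResNetBiGRU()
print(model(torch.randn(4, 1, 128, 256)).shape)              # torch.Size([4, 10])
```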

Unveiling Emotions from EEG: A GRU-Based Approach

  • paper_url: http://arxiv.org/abs/2308.02778
  • repo_url: None
  • paper_authors: Sarthak Johari, Gowri Namratha Meedinti, Radhakrishnan Delhibabu, Deepak Joshi
  • for: 这项研究旨在使用EEG数据进行情感识别,以推动人机交互和情感计算领域的发展。
  • methods: 这项研究使用了门控循环单元(GRU)算法,它是循环神经网络(RNN)的一种变体,利用EEG信号预测情感状态。研究者使用了公开可获取的数据集,包括静息状态下的中性数据,以及受到诱发快乐、中性和负面情感刺激的人的EEG记录。为了获得最佳的特征提取,研究者对EEG数据进行了伪迹去除、带通滤波和归一化处理。
  • results: GRU模型在验证集上达到了100%的准确率。与其他机器学习方法相比,基于GRU特征的Extreme Gradient Boosting分类器取得了最高的准确率。研究者还分析了模型的混淆矩阵,从而获得了对模型表现的深入认识,实现了精准的情感分类。这项研究展示了GRU等深度学习模型在情感识别方面的潜力,为情感计算开辟了新的可能性。
    Abstract One of the most important study areas in affective computing is emotion identification using EEG data. In this study, the Gated Recurrent Unit (GRU) algorithm, which is a type of Recurrent Neural Networks (RNNs), is tested to see if it can use EEG signals to predict emotional states. Our publicly accessible dataset consists of resting neutral data as well as EEG recordings from people who were exposed to stimuli evoking happy, neutral, and negative emotions. For the best feature extraction, we pre-process the EEG data using artifact removal, bandpass filters, and normalization methods. With 100% accuracy on the validation set, our model produced outstanding results by utilizing the GRU's capacity to capture temporal dependencies. When compared to other machine learning techniques, our GRU model's Extreme Gradient Boosting Classifier had the highest accuracy. Our investigation of the confusion matrix revealed insightful information about the performance of the model, enabling precise emotion classification. This study emphasizes the potential of deep learning models like GRUs for emotion recognition and advances in affective computing. Our findings open up new possibilities for interacting with computers and comprehending how emotions are expressed through brainwave activity.
    摘要 情感计算中一个非常重要的研究领域是基于EEG数据的情感识别。在这项研究中,我们测试了门控循环单元(GRU)算法——循环神经网络(RNN)的一种变体——能否利用EEG信号预测情感状态。我们公开可获取的数据集包括静息状态下的中性数据,以及受到诱发快乐、中性和负面情感刺激的人的EEG记录。为了获得最佳的特征提取,我们对EEG数据进行了伪迹去除、带通滤波和归一化等预处理。我们的模型利用GRU捕捉时间依赖关系的能力,在验证集上取得了100%的准确率。与其他机器学习技术相比,基于GRU特征的Extreme Gradient Boosting分类器准确率最高。我们对混淆矩阵的分析给出了关于模型表现的有价值信息,实现了精准的情感分类。这项研究强调了GRU等深度学习模型在情感识别方面的潜力,推进了情感计算的发展。我们的发现为与计算机交互以及理解情感如何通过脑电活动表达开辟了新的可能性。

Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory

  • paper_url: http://arxiv.org/abs/2307.10768
  • repo_url: https://github.com/zhanglab-deepneurocoglab/worm
  • paper_authors: Ankur Sikarwar, Mengmi Zhang
  • for: 本研究的目的是开发一个完整的工作记忆(WM)benchmark数据集,以便用于AI WM模型的开发和评估。
  • methods: 本研究使用了10个任务和100万个实验,评估了4种功能、3种领域和11种行为和神经特征。同时,还包括了人类行为的参照值作为比较标准。
  • results: 研究发现,AI模型在一些情况下能够模拟大脑中的工作记忆特征,如 primacy 和 recency 效应,以及各个领域和功能的神经团块和相关性。然而,也发现现有模型存在一些限制,无法完全approximate人类行为。这个数据集将成为跨 дисциплиinary的资源,用于比较和改进WM模型,研究WM的神经基础,并开发人类样式的WM模型。
    Abstract Working memory (WM), a fundamental cognitive process facilitating the temporary storage, integration, manipulation, and retrieval of information, plays a vital role in reasoning and decision-making tasks. Robust benchmark datasets that capture the multifaceted nature of WM are crucial for the effective development and evaluation of AI WM models. Here, we introduce a comprehensive Working Memory (WorM) benchmark dataset for this purpose. WorM comprises 10 tasks and a total of 1 million trials, assessing 4 functionalities, 3 domains, and 11 behavioral and neural characteristics of WM. We jointly trained and tested state-of-the-art recurrent neural networks and transformers on all these tasks. We also include human behavioral benchmarks as an upper bound for comparison. Our results suggest that AI models replicate some characteristics of WM in the brain, most notably primacy and recency effects, and neural clusters and correlates specialized for different domains and functionalities of WM. In the experiments, we also reveal some limitations in existing models to approximate human behavior. This dataset serves as a valuable resource for communities in cognitive psychology, neuroscience, and AI, offering a standardized framework to compare and enhance WM models, investigate WM's neural underpinnings, and develop WM models with human-like capabilities. Our source code and data are available at https://github.com/ZhangLab-DeepNeuroCogLab/WorM.
    摘要 工作记忆(WM)是一种支持信息的临时存储、整合、操作与提取的基本认知过程,在推理和决策任务中起着关键作用。要有效地开发和评估 AI 工作记忆模型,需要能够刻画 WM 多面性的可靠基准数据集。为此,我们提出了一个全面的工作记忆基准数据集 WorM。WorM 包含 10 个任务、共 100 万次试验,评估 WM 的 4 种功能、3 个领域以及 11 种行为与神经特征。我们在所有这些任务上联合训练并测试了最先进的循环神经网络和 Transformer 模型,并以人类行为基准作为比较的上界。结果表明,AI 模型复现了大脑工作记忆的某些特征,最显著的是首因效应与近因效应,以及专门化于 WM 不同领域和功能的神经簇与神经相关物。在实验中,我们也揭示了现有模型在逼近人类行为方面的一些局限。该数据集为认知心理学、神经科学和 AI 领域的研究者提供了宝贵资源,提供了一个标准化框架来比较和改进 WM 模型、研究 WM 的神经基础,并开发具有类人能力的 WM 模型。我们的源代码和数据可在 https://github.com/ZhangLab-DeepNeuroCogLab/WorM 获取。

Actor-agnostic Multi-label Action Recognition with Multi-modal Query

  • paper_url: http://arxiv.org/abs/2307.10763
  • repo_url: https://github.com/mondalanindya/msqnet
  • paper_authors: Anindya Mondal, Sauradip Nag, Joaquin M Prada, Xiatian Zhu, Anjan Dutta
  • for: 提出了一种actor-agnostic multi-modal multi-label action recognition方法,以解决actor-specific pose estimation和多个行为同时发生的问题。
  • methods: 提出了一种基于 transformer 检测框架的 Multi-modal Semantic Query Network (MSQNet) 模型,利用视觉和文本模式更好地表示行为类别。
  • results: 在五个公开的数据集上进行了广泛的实验,并 consistently outperformed actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%.
    Abstract Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms the prior arts of actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%. Code will be released at https://github.com/mondalanindya/MSQNet.
    摘要 现有的动作识别方法由于不同行动者之间内在的拓扑与外观差异,通常是行动者特定的。这需要针对行动者的姿态估计(例如人类与动物不同),导致模型设计复杂、维护成本高。此外,它们往往只学习视觉模态并进行单标签分类,而忽略了其他可用的信息源(例如类别名称文本)以及多个动作同时发生的情况。为克服这些局限,我们提出了一种称为“行动者无关的多模态多标签动作识别”的新方法,为包括人类和动物在内的多种行动者提供统一的解决方案。我们进一步在基于 Transformer 的目标检测框架(如 DETR)中构建了一个新的多模态语义查询网络(MSQNet)模型,利用视觉与文本两种模态更好地表示动作类别。消除行动者特定的模型设计是其关键优势,因为这完全免去了行动者姿态估计的需要。在五个公开基准上的大量实验表明,MSQNet 在人类和动物的单标签与多标签动作识别任务上持续优于行动者特定的既有方法,提升幅度最高可达 50%。代码将发布于 https://github.com/mondalanindya/MSQNet 。

Mitigating Voter Attribute Bias for Fair Opinion Aggregation

  • paper_url: http://arxiv.org/abs/2307.10749
  • repo_url: None
  • paper_authors: Ryosuke Ueda, Koh Takeuchi, Hisashi Kashima
  • For: 这篇论文关注在决策场景(尤其是没有客观真实标签、可能出现分歧的任务)中减轻意见聚合的偏差,发展公平的意见聚合方法。
  • Methods: 作者提出将多数投票、Dawid & Skene(D&S)模型等意见聚合模型与样本加权、数据划分等公平性选项相结合,并提出了一种新的 Soft D&S 模型以提高软标签估计的准确性。
  • Results: 实验结果表明,对于稠密数据,Soft D&S 与数据划分相结合的方案最为有效;对于稀疏数据,加权多数投票更为有效。这些发现有助于人类与机器学习模型在均衡的意见聚合下进行决策。
    Abstract The aggregation of multiple opinions plays a crucial role in decision-making, such as in hiring and loan review, and in labeling data for supervised learning. Although majority voting and existing opinion aggregation models are effective for simple tasks, they are inappropriate for tasks without objectively true labels in which disagreements may occur. In particular, when voter attributes such as gender or race introduce bias into opinions, the aggregation results may vary depending on the composition of voter attributes. A balanced group of voters is desirable for fair aggregation results but may be difficult to prepare. In this study, we consider methods to achieve fair opinion aggregation based on voter attributes and evaluate the fairness of the aggregated results. To this end, we consider an approach that combines opinion aggregation models such as majority voting and the Dawid and Skene model (D&S model) with fairness options such as sample weighting. To evaluate the fairness of opinion aggregation, probabilistic soft labels are preferred over discrete class labels. First, we address the problem of soft label estimation without considering voter attributes and identify some issues with the D&S model. To address these limitations, we propose a new Soft D&S model with improved accuracy in estimating soft labels. Moreover, we evaluated the fairness of an opinion aggregation model, including Soft D&S, in combination with different fairness options using synthetic and semi-synthetic data. The experimental results suggest that the combination of Soft D&S and data splitting as a fairness option is effective for dense data, whereas weighted majority voting is effective for sparse data. These findings should prove particularly valuable in supporting decision-making by human and machine-learning models with balanced opinion aggregation.
    摘要 多方意见的聚合在决策中发挥着关键作用,例如招聘与贷款审核,以及为监督学习标注数据。尽管多数投票和现有的意见聚合模型在简单任务上行之有效,但在没有客观真实标签、可能出现分歧的任务中并不适用。特别地,当性别或种族等投票者属性给意见带来偏差时,聚合结果可能随投票者属性构成的不同而变化。为获得公平的聚合结果,理想情况是投票者群体构成均衡,但这往往难以实现。本研究考虑基于投票者属性实现公平意见聚合的方法,并评估聚合结果的公平性。为此,我们考虑将多数投票、Dawid 和 Skene(D&S)模型等意见聚合模型与样本加权等公平性选项相结合。在评估意见聚合的公平性时,概率化的软标签比离散类别标签更为合适。我们首先研究了不考虑投票者属性的软标签估计问题,并指出了 D&S 模型的一些不足;为解决这些局限,我们提出了一种新的 Soft D&S 模型,提高了软标签估计的精度。此外,我们使用合成与半合成数据,评估了包括 Soft D&S 在内的意见聚合模型与不同公平性选项组合的公平性。实验结果表明,对于稠密数据,Soft D&S 与数据划分相结合是有效的;对于稀疏数据,加权多数投票更为有效。这些发现对于支持人类与机器学习模型在均衡意见聚合下的决策尤具价值。
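The simplest fairness option mentioned above can be sketched directly: weighted majority voting in which each voter's weight is rescaled so that every attribute group contributes equally, producing a soft label (class probabilities) per item. The Soft D&S model itself is an EM-style extension of Dawid & Skene and is not reproduced here; variable names are assumptions.

```python
import numpy as np

def group_balanced_soft_labels(votes, voter_groups, n_classes):
    """votes: (n_voters, n_items) integer labels, -1 where a voter did not vote."""
    groups, counts = np.unique(voter_groups, return_counts=True)
    weight_per_voter = {g: 1.0 / (len(groups) * c) for g, c in zip(groups, counts)}
    weights = np.array([weight_per_voter[g] for g in voter_groups])   # each group sums to 1/len(groups)

    soft = np.zeros((votes.shape[1], n_classes))
    for voter in range(votes.shape[0]):
        for item in range(votes.shape[1]):
            label = votes[voter, item]
            if label >= 0:
                soft[item, label] += weights[voter]
    soft /= soft.sum(axis=1, keepdims=True) + 1e-12                   # normalise to class probabilities
    return soft

votes = np.array([[0, 1, 1, -1],
                  [0, 1, 0, 1],
                  [1, 1, 0, 1]])
print(group_balanced_soft_labels(votes, voter_groups=["a", "a", "b"], n_classes=2))
```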

Fairness-Aware Client Selection for Federated Learning

  • paper_url: http://arxiv.org/abs/2307.10738
  • repo_url: None
  • paper_authors: Yuxin Shi, Zelei Liu, Zhuan Shi, Han Yu
  • for: 提高 Federated Learning(FL)客户端选择的公平性和模型性能。
  • methods: 基于 Lyapunov 优化的 Fairness-aware Federated Client Selection(FairFedCS)方法,通过考虑客户端的声誉、参与 FL 任务的时间和模型性能的贡献,动态调整客户端选择概率。
  • results: 在实际的 multimedia 数据集上进行了广泛的实验,并显示了 FairFedCS 可以提高平均 fairness 19.6% 和测试精度 0.73% 比最佳状态的方法。
    Abstract Federated learning (FL) has enabled multiple data owners (a.k.a. FL clients) to train machine learning models collaboratively without revealing private data. Since the FL server can only engage a limited number of clients in each training round, FL client selection has become an important research problem. Existing approaches generally focus on either enhancing FL model performance or enhancing the fair treatment of FL clients. The problem of balancing performance and fairness considerations when selecting FL clients remains open. To address this problem, we propose the Fairness-aware Federated Client Selection (FairFedCS) approach. Based on Lyapunov optimization, it dynamically adjusts FL clients' selection probabilities by jointly considering their reputations, times of participation in FL tasks and contributions to the resulting model performance. By not using threshold-based reputation filtering, it provides FL clients with opportunities to redeem their reputations after a perceived poor performance, thereby further enhancing fair client treatment. Extensive experiments based on real-world multimedia datasets show that FairFedCS achieves 19.6% higher fairness and 0.73% higher test accuracy on average than the best-performing state-of-the-art approach.
    摘要 federated learning(FL)已经允许多个数据拥有者(即FL客户)共同训练机器学习模型,无需披露私人数据。由于FL服务器只能在每次训练中选择一定数量的客户,因此选择FL客户已成为一个重要的研究问题。现有的方法通常是增强FL模型性能或增强FL客户的公平待遇。尚未解决FL客户性能和公平待遇考虑的权衡问题。为解决这个问题,我们提出了公平性感知 Federated Client Selection(FairFedCS)方法。基于 Lyapunov 优化,它在FL客户选择概率中进行了动态调整,并且同时考虑了FL客户的声誉、参与FL任务的时间和对模型性能的贡献。不使用阈值基于声誉筛选,因此允许FL客户在感知性能不佳时重新恢复声誉,从而进一步提高FL客户的公平待遇。经过基于实际 multimedia 数据集的广泛实验,我们发现 FairFedCS 平均提高了19.6%的公平性和0.73%的测试准确率。
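A heavily simplified illustration of fairness-aware client selection (not the authors' algorithm): each client's sampling probability combines its reputation, its recent contribution to model quality, and a virtual queue that grows while the client is not picked, a crude stand-in for the Lyapunov drift term used in FairFedCS. All quantities and weights here are assumed for illustration.

```python
import numpy as np

def select_clients(reputation, contribution, queues, n_pick, rng):
    score = reputation + contribution + 0.1 * queues           # fairness pressure from the queues
    prob = np.exp(score) / np.exp(score).sum()
    chosen = rng.choice(len(score), size=n_pick, replace=False, p=prob)
    queues += 1.0                                               # every client waits one more round...
    queues[chosen] = 0.0                                        # ...except the ones just selected
    return chosen, queues

rng = np.random.default_rng(0)
n_clients = 20
reputation = rng.uniform(0.5, 1.0, n_clients)
contribution = rng.uniform(0.0, 0.5, n_clients)
queues = np.zeros(n_clients)
for round_ in range(5):
    chosen, queues = select_clients(reputation, contribution, queues, n_pick=5, rng=rng)
    print("round", round_, "selected:", sorted(chosen.tolist()))
```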

Long-Tail Theory under Gaussian Mixtures

  • paper_url: http://arxiv.org/abs/2307.10736
  • repo_url: https://github.com/armanbolatov/long_tail
  • paper_authors: Arman Bolatov, Maxat Tezekbayev, Igor Melnykov, Artur Pak, Vassilina Nikoulina, Zhenisbek Assylbekov
  • for: 这篇论文基于 Feldman 的长尾理论(2020),提出了一个简单的高斯混合数据生成模型,用于考察不同类型的分类器在长尾分布下的表现。
  • methods: 论文分别使用线性分类器和具有记忆能力的非线性分类器,评估它们在该长尾模型下的泛化能力。
  • results: 论文发现,在长尾分布下,非线性分类器能更好地泛化到新数据,而线性分类器无法将泛化误差降到某一水平以下;此外,当子群体频率分布的尾部变短时,两类模型之间的性能差距会随之缩小。
    Abstract We suggest a simple Gaussian mixture model for data generation that complies with Feldman's long tail theory (2020). We demonstrate that a linear classifier cannot decrease the generalization error below a certain level in the proposed model, whereas a nonlinear classifier with a memorization capacity can. This confirms that for long-tailed distributions, rare training examples must be considered for optimal generalization to new data. Finally, we show that the performance gap between linear and nonlinear models can be lessened as the tail becomes shorter in the subpopulation frequency distribution, as confirmed by experiments on synthetic and real data.
    摘要 我们提出一个符合 Feldman 长尾理论(2020)的简单高斯混合数据生成模型。我们证明,在该模型中线性分类器无法将泛化误差降到某一水平以下,而具有记忆能力的非线性分类器则可以。这证实了对于长尾分布,必须考虑罕见的训练样本才能在新数据上取得最优泛化。最后,我们表明,当子群体频率分布的尾部变短时,线性与非线性模型之间的性能差距可以随之缩小,并通过在合成数据和真实数据上的实验加以验证。
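The construction can be made concrete with a toy example in the spirit of (but not identical to) the paper's model: one class contains many rare, far-away subpopulations, so a linear boundary cannot cover the tail while a small nonlinear model can memorise it. All locations and frequencies below are assumed for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def sample(n):
    # class 0: one big Gaussian blob; class 1: head blob plus rare tail blobs on the far side
    centers1 = np.array([[4.0, 0.0]] + [[-6.0, 4.0 * k] for k in range(1, 6)])
    freqs1 = np.array([0.75] + [0.05] * 5)                      # long-tailed subpopulation frequencies
    x0 = rng.normal(loc=[0.0, 0.0], scale=0.7, size=(n, 2))
    comp = rng.choice(len(centers1), size=n, p=freqs1)
    x1 = centers1[comp] + rng.normal(scale=0.7, size=(n, 2))
    X = np.vstack([x0, x1])
    y = np.array([0] * n + [1] * n)
    return X, y

Xtr, ytr = sample(2000)
Xte, yte = sample(2000)
linear = LogisticRegression().fit(Xtr, ytr)
nonlinear = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000, random_state=0).fit(Xtr, ytr)
print("linear test acc:   ", linear.score(Xte, yte))
print("nonlinear test acc:", nonlinear.score(Xte, yte))
```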

Comparison between transformers and convolutional models for fine-grained classification of insects

  • paper_url: http://arxiv.org/abs/2307.11112
  • repo_url: None
  • paper_authors: Rita Pucci, Vincent J. Kalkman, Dan Stowell
  • for: 本研究旨在提高昆虫种类的自动分类精度,尤其是在同一个分类类型下的种类 diferenciación。
  • methods: 本研究使用了深度学习算法,特别是 transformer 和 convolutional 层结构,进行比较研究。
  • results: 研究发现,hybrid 模型在准确性和执行速度两个方面均有优异表现,而 transformer 模型在样本缺乏时具有更高的执行速度。
    Abstract Fine-grained classification is challenging due to the difficulty of finding discriminatory features. This problem is exacerbated when applied to identifying species within the same taxonomical class. This is because species are often sharing morphological characteristics that make them difficult to differentiate. We consider the taxonomical class of Insecta. The identification of insects is essential in biodiversity monitoring as they are one of the inhabitants at the base of many ecosystems. Citizen science is doing brilliant work of collecting images of insects in the wild giving the possibility to experts to create improved distribution maps in all countries. We have billions of images that need to be automatically classified and deep neural network algorithms are one of the main techniques explored for fine-grained tasks. At the SOTA, the field of deep learning algorithms is extremely fruitful, so how to identify the algorithm to use? We focus on Odonata and Coleoptera orders, and we propose an initial comparative study to analyse the two best-known layer structures for computer vision: transformer and convolutional layers. We compare the performance of T2TViT, a fully transformer-base, EfficientNet, a fully convolutional-base, and ViTAE, a hybrid. We analyse the performance of the three models in identical conditions evaluating the performance per species, per morph together with sex, the inference time, and the overall performance with unbalanced datasets of images from smartphones. Although we observe high performances with all three families of models, our analysis shows that the hybrid model outperforms the fully convolutional-base and fully transformer-base models on accuracy performance and the fully transformer-base model outperforms the others on inference speed and, these prove the transformer to be robust to the shortage of samples and to be faster at inference time.
    摘要 细粒度分类之所以困难,是因为难以找到有判别力的特征。当任务是识别同一分类纲内的物种时,这一问题更加突出,因为这些物种往往共享使其难以区分的形态特征。我们研究昆虫纲(Insecta)。昆虫识别对生物多样性监测至关重要,因为昆虫是许多生态系统底层的栖居者。公民科学在野外采集昆虫图像方面做出了出色的工作,使专家能够为各国绘制更完善的分布图。我们拥有数十亿张需要自动分类的图像,而深度神经网络算法是细粒度任务中被广泛探索的主要技术之一。当前深度学习算法领域成果极为丰富,那么该如何选择算法?我们聚焦于蜻蜓目(Odonata)和鞘翅目(Coleoptera),并开展了一项初步对比研究,分析计算机视觉中两种最著名的层结构:Transformer 层与卷积层。我们比较了纯 Transformer 架构的 T2TViT、纯卷积架构的 EfficientNet 以及混合架构的 ViTAE 三种模型,在相同条件下评估了按物种、按形态并结合性别的表现、推理时间,以及在来自智能手机的非均衡图像数据集上的整体表现。尽管三类模型都取得了较高的性能,但我们的分析表明,混合模型在准确率上优于纯卷积与纯 Transformer 模型,而纯 Transformer 模型在推理速度上优于其他模型;这说明 Transformer 对样本不足具有鲁棒性,且推理速度更快。

LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?

  • paper_url: http://arxiv.org/abs/2307.10719
  • repo_url: None
  • paper_authors: David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, Vardan Papyan
  • For: 论文探讨大语言模型(LLM)被恶意使用的风险,以及模型微调、基于 LLM 的输出审查等现有防御机制的局限。
  • Methods: 论文给出了语义审查方法的理论局限,指出由于 LLM 具备程序化和遵循指令的能力,审查其输出本质上是一个不可判定的问题。
  • Results: 论文认为审查面临的挑战超出了语义审查本身:有知识的攻击者可以从一组被允许的输出中重构出不被允许的输出;因此,审查问题应被重新审视,并作为一个安全问题来对待,以便采用基于安全的方法来缓解潜在风险。
    Abstract Large language models (LLMs) have exhibited impressive capabilities in comprehending complex instructions. However, their blind adherence to provided instructions has led to concerns regarding risks of malicious use. Existing defence mechanisms, such as model fine-tuning or output censorship using LLMs, have proven to be fallible, as LLMs can still generate problematic responses. Commonly employed censorship approaches treat the issue as a machine learning problem and rely on another LM to detect undesirable content in LLM outputs. In this paper, we present the theoretical limitations of such semantic censorship approaches. Specifically, we demonstrate that semantic censorship can be perceived as an undecidable problem, highlighting the inherent challenges in censorship that arise due to LLMs' programmatic and instruction-following capabilities. Furthermore, we argue that the challenges extend beyond semantic censorship, as knowledgeable attackers can reconstruct impermissible outputs from a collection of permissible ones. As a result, we propose that the problem of censorship needs to be reevaluated; it should be treated as a security problem which warrants the adaptation of security-based approaches to mitigate potential risks.
    摘要

Differences Between Hard and Noisy-labeled Samples: An Empirical Study

  • paper_url: http://arxiv.org/abs/2307.10718
  • repo_url: https://github.com/mahf93/hard-vs-noisy
  • paper_authors: Mahsa Forouzesh, Patrick Thiran
  • for: 本研究旨在解决难度强、标签错误的样本集合中的噪声样本问题。
  • methods: 我们提出了一种系统性的实验方法,用于分析困难样本和噪声标签样本之间的相似性和 diferencias。我们还提出了一种简单 yet effective的度量,可以从噪声标签样本中筛选出噪声样本,保留困难样本。
  • results: 我们的研究表明,使用我们提出的度量筛选出噪声标签样本后,模型在训练过后的测试精度得到了最高的提升。此外,我们的数据分配方法在实际世界中存在标签噪声时也表现出色。
    Abstract Extracting noisy or incorrectly labeled samples from a labeled dataset with hard/difficult samples is an important yet under-explored topic. Two general and often independent lines of work exist, one focuses on addressing noisy labels, and another deals with hard samples. However, when both types of data are present, most existing methods treat them equally, which results in a decline in the overall performance of the model. In this paper, we first design various synthetic datasets with custom hardness and noisiness levels for different samples. Our proposed systematic empirical study enables us to better understand the similarities and more importantly the differences between hard-to-learn samples and incorrectly-labeled samples. These controlled experiments pave the way for the development of methods that distinguish between hard and noisy samples. Through our study, we introduce a simple yet effective metric that filters out noisy-labeled samples while keeping the hard samples. We study various data partitioning methods in the presence of label noise and observe that filtering out noisy samples from hard samples with this proposed metric results in the best datasets as evidenced by the high test accuracy achieved after models are trained on the filtered datasets. We demonstrate this for both our created synthetic datasets and for datasets with real-world label noise. Furthermore, our proposed data partitioning method significantly outperforms other methods when employed within a semi-supervised learning framework.

AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models

  • paper_url: http://arxiv.org/abs/2307.10711
  • repo_url: None
  • paper_authors: Jiachun Pan, Jun Hao Liew, Vincent Y. F. Tan, Jiashi Feng, Hanshu Yan
  • for: 这篇论文旨在解决扩散概率模型(DPM)的自定义问题,即在唯一可用的监督信号是定义在生成内容上的可微度量时如何调整模型。
  • methods: 该论文提出了一种名为 AdjointDPM 的新方法:首先通过求解概率流 ODE 从扩散模型生成新样本,然后利用伴随敏感度方法,通过求解另一个增广 ODE 将损失的梯度反向传播到模型参数(包括条件信号、网络权重和初始噪声)。
  • results: 该论文通过三个有趣的任务验证了 AdjointDPM 的效果:将视觉特效转换为标识文本嵌入、针对特定风格对 DPM 进行微调,以及优化初始噪声以生成用于安全审计的对抗样本。
    Abstract Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models (DPMs) with user-provided concepts. This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents. Since the sampling procedure of DPMs involves recursive calls to the denoising UNet, na\"ive gradient backpropagation requires storing the intermediate states of all iterations, resulting in extremely high memory consumption. To overcome this issue, we propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs. It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters (including conditioning signals, network weights, and initial noises) by solving another augmented ODE. To reduce numerical errors in both the forward generation and gradient backpropagation processes, we further reparameterize the probability-flow ODE and augmented ODE as simple non-stiff ODEs using exponential integration. Finally, we demonstrate the effectiveness of AdjointDPM on three interesting tasks: converting visual effects into identification text embeddings, finetuning DPMs for specific types of stylization, and optimizing initial noise to generate adversarial samples for security auditing.
    摘要 现有的自定义方法需要访问多个参考示例,才能将预训练的扩散概率模型(DPM)与用户提供的概念对齐。本文旨在解决在唯一可用的监督信号是定义在生成内容上的可微度量时的 DPM 自定义问题。由于 DPM 的采样过程需要对去噪 UNet 进行递归调用,朴素的梯度反向传播需要存储所有迭代的中间状态,导致内存消耗极高。为解决这一问题,我们提出了一种新方法 AdjointDPM:它首先通过求解对应的概率流 ODE 从扩散模型生成新样本,再利用伴随敏感度方法,通过求解另一个增广 ODE 将损失的梯度反向传播到模型参数(包括条件信号、网络权重和初始噪声)。为了减少前向生成和梯度反向传播过程中的数值误差,我们进一步利用指数积分将概率流 ODE 和增广 ODE 重新参数化为简单的非刚性 ODE。最后,我们在三个有趣的任务上展示了 AdjointDPM 的效果:将视觉特效转换为标识文本嵌入、针对特定风格对 DPM 进行微调,以及优化初始噪声以生成用于安全审计的对抗样本。
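
A minimal sketch of the adjoint-gradient idea described above, assuming the `torchdiffeq` package and a toy drift network in place of the denoising UNet (this is not the authors' code; the ODE, loss, and sizes are placeholders):

```python
# Sketch of adjoint-based gradient flow through an ODE sampler (assumes torchdiffeq is installed).
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint as odeint  # adjoint sensitivity backprop

class Drift(nn.Module):
    """Hypothetical stand-in for the probability-flow ODE drift (a denoiser network)."""
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.Tanh(), nn.Linear(64, dim))
    def forward(self, t, x):
        tt = t.reshape(1, 1).expand(x.shape[0], 1)
        return self.net(torch.cat([x, tt], dim=1))

drift = Drift()
x_T = torch.randn(16, 2, requires_grad=True)   # initial noise (also optimizable)
t_grid = torch.linspace(1.0, 0.0, 10)           # integrate from t=1 (noise) to t=0 (data)
x_0 = odeint(drift, x_T, t_grid)[-1]            # generated samples

loss = (x_0 ** 2).mean()                        # any differentiable metric on the output
loss.backward()                                 # adjoint method avoids storing all solver states
print(x_T.grad.shape, next(drift.parameters()).grad.shape)
```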

Reparameterized Policy Learning for Multimodal Trajectory Optimization

  • paper_url: http://arxiv.org/abs/2307.10710
  • repo_url: https://github.com/haosulab/RPG
  • paper_authors: Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su
  • for: 本研究的目标是解决高维连续动作空间中RL政策参数化的挑战。
  • methods: 我们提出了一种有原则的框架,将连续 RL 策略视为环境最优轨迹的生成模型。通过让策略以一个隐变量为条件,我们得到了一种新的变分下界作为优化目标,它能够鼓励对环境的探索。
  • results: 我们提出了一种实用的基于模型的 RL 方法,称为 Reparameterized Policy Gradient(RPG),它利用多模态策略参数化和学习到的世界模型,实现强大的探索能力和高数据效率。实验结果表明,我们的方法可以帮助智能体在密集奖励任务中避开局部最优,并通过引入以物体为中心的内在奖励来解决具有挑战性的稀疏奖励环境,在各类任务上均取得优秀的性能。
    Abstract We investigate the challenge of parametrizing policies for reinforcement learning (RL) in high-dimensional continuous action spaces. Our objective is to develop a multimodal policy that overcomes limitations inherent in the commonly-used Gaussian parameterization. To achieve this, we propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories. By conditioning the policy on a latent variable, we derive a novel variational bound as the optimization objective, which promotes exploration of the environment. We then present a practical model-based RL method, called Reparameterized Policy Gradient (RPG), which leverages the multimodal policy parameterization and learned world model to achieve strong exploration capabilities and high data efficiency. Empirical results demonstrate that our method can help agents evade local optima in tasks with dense rewards and solve challenging sparse-reward environments by incorporating an object-centric intrinsic reward. Our method consistently outperforms previous approaches across a range of tasks. Code and supplementary materials are available on the project page https://haosulab.github.io/RPG/
    摘要 我们研究了在高维连续动作空间中为强化学习(RL)参数化策略的挑战。我们的目标是开发一种多模态策略,超越常用的高斯参数化的限制。为实现这一目标,我们提出了一种有原则的框架,将连续 RL 策略建模为最优轨迹的生成模型。通过让策略以一个隐变量为条件,我们得到了一种新的变分下界作为优化目标,从而促进对环境的探索。随后,我们提出了一种实用的基于模型的 RL 方法,称为 Reparameterized Policy Gradient(RPG),它利用多模态策略参数化和学习到的世界模型,实现强大的探索能力和高数据效率。实验结果表明,我们的方法可以帮助智能体在密集奖励任务中避开局部最优,并通过引入以物体为中心的内在奖励来解决具有挑战性的稀疏奖励环境。我们的方法在一系列任务上持续优于以往方法。代码和补充材料见项目页面 https://haosulab.github.io/RPG/。

TwinLiteNet: An Efficient and Lightweight Model for Driveable Area and Lane Segmentation in Self-Driving Cars

  • paper_url: http://arxiv.org/abs/2307.10705
  • repo_url: https://github.com/chequanghuy/TwinLiteNet
  • paper_authors: Quang Huy Che, Dinh Phuc Nguyen, Minh Quan Pham, Duc Khai Lam
  • for: 本研究旨在提出一种轻量级的模型,用于自动驾驶车辆环境理解的驱动区域和车道线分割。
  • methods: 该论文提出了 TwinLiteNet,一种计算成本低廉、同时兼顾准确率和效率的分割模型。
  • results: 实验结果表明,TwinLiteNet 在 BDD100K 数据集上的可行驶区域任务中取得 91.3% 的 mIoU、车道检测任务中取得 31.08% 的 IoU,与现有方法相当,而所需计算资源显著更少。代码可在 https://github.com/chequanghuy/TwinLiteNet 获取。
    Abstract Semantic segmentation is a common task in autonomous driving to understand the surrounding environment. Driveable Area Segmentation and Lane Detection are particularly important for safe and efficient navigation on the road. However, original semantic segmentation models are computationally expensive and require high-end hardware, which is not feasible for embedded systems in autonomous vehicles. This paper proposes a lightweight model for the driveable area and lane line segmentation. TwinLiteNet is designed cheaply but achieves accurate and efficient segmentation results. We evaluate TwinLiteNet on the BDD100K dataset and compare it with modern models. Experimental results show that our TwinLiteNet performs similarly to existing approaches, requiring significantly fewer computational resources. Specifically, TwinLiteNet achieves a mIoU score of 91.3% for the Drivable Area task and 31.08% IoU for the Lane Detection task with only 0.4 million parameters and achieves 415 FPS on GPU RTX A5000. Furthermore, TwinLiteNet can run in real-time on embedded devices with limited computing power, especially since it achieves 60FPS on Jetson Xavier NX, making it an ideal solution for self-driving vehicles. Code is available: https://github.com/chequanghuy/TwinLiteNet.
    摘要 语义分割是自动驾驶中理解周围环境的常见任务,其中可行驶区域分割和车道检测对安全高效的道路导航尤为重要。然而,原始的语义分割模型计算开销大、需要高端硬件,不适合自动驾驶车辆中的嵌入式系统。本文提出了一种用于可行驶区域和车道线分割的轻量级模型 TwinLiteNet,它设计成本低廉,却能给出准确而高效的分割结果。我们在 BDD100K 数据集上评估 TwinLiteNet 并与现代模型进行比较。实验结果表明,TwinLiteNet 与现有方法性能相当,而所需计算资源显著更少:在可行驶区域任务中取得 91.3% 的 mIoU,在车道检测任务中取得 31.08% 的 IoU,参数量仅 0.4M,并在 GPU RTX A5000 上达到 415 FPS。此外,TwinLiteNet 可以在计算能力有限的嵌入式设备上实时运行,在 Jetson Xavier NX 上达到 60 FPS,是自动驾驶车辆的理想解决方案。代码可在以下链接获取:https://github.com/chequanghuy/TwinLiteNet。

Decentralized Smart Charging of Large-Scale EVs using Adaptive Multi-Agent Multi-Armed Bandits

  • paper_url: http://arxiv.org/abs/2307.10704
  • repo_url: None
  • paper_authors: Sharyal Zafar, Raphaël Feraud, Anne Blavette, Guy Camilleri, Hamid Ben
  • for: 这篇论文的目的是提出一种完全分散式的充电系统,以解决电动车充电过载和电压限制问题。
  • methods: 该系统采用自适应多智能体系统的思想,并使用多臂老虎机学习来处理系统中的不确定性。
  • results: 案例研究表明,该系统具有去中心化、可扩展、实时、无需模型且兼顾各参与方公平性等特点。
    Abstract The drastic growth of electric vehicles and photovoltaics can introduce new challenges, such as electrical current congestion and voltage limit violations due to peak load demands. These issues can be mitigated by controlling the operation of electric vehicles i.e., smart charging. Centralized smart charging solutions have already been proposed in the literature. But such solutions may lack scalability and suffer from inherent drawbacks of centralization, such as a single point of failure, and data privacy concerns. Decentralization can help tackle these challenges. In this paper, a fully decentralized smart charging system is proposed using the philosophy of adaptive multi-agent systems. The proposed system utilizes multi-armed bandit learning to handle uncertainties in the system. The presented system is decentralized, scalable, real-time, model-free, and takes fairness among different players into account. A detailed case study is also presented for performance evaluation.
    摘要 electric vehicles 和 photovoltaics 的快速增长可能会引入新的挑战,如电流拥堵和电压限制由峰值负荷带来。这些问题可以通过智能充电控制来缓解。文章中已经提出了中央化智能充电解决方案。但这些解决方案可能缺乏扩展性和中央化的缺点,如唯一点失败和数据隐私问题。分散化可以解决这些挑战。本文提出了一个完全分散式智能充电系统,使用适应多代理系统的哲学。该系统使用多臂投掷学来处理系统不确定性。提出的系统是分散式、可扩展、实时、模型自由、具有公平性的。还提供了一个详细的案例研究以评估性能。
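
A toy sketch of the multi-armed-bandit view of smart charging, with a UCB agent per EV choosing among discrete charging rates; the rates and reward function below are illustrative assumptions, not the paper's exact formulation:

```python
# Toy sketch: each EV runs a UCB bandit over discrete charging rates.
import math, random

RATES_KW = [0.0, 3.7, 7.4, 11.0]          # hypothetical charging-rate arms

class UCBCharger:
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms       # running mean reward per arm
        self.t = 0
    def select(self):
        self.t += 1
        for a in range(len(self.counts)):  # play each arm once first
            if self.counts[a] == 0:
                return a
        ucb = [self.values[a] + math.sqrt(2 * math.log(self.t) / self.counts[a])
               for a in range(len(self.counts))]
        return max(range(len(ucb)), key=ucb.__getitem__)
    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def reward(rate_kw, grid_headroom_kw):     # hypothetical: charge fast without violating grid limits
    return rate_kw - 10.0 * max(0.0, rate_kw - grid_headroom_kw)

agent = UCBCharger(len(RATES_KW))
for step in range(1000):
    arm = agent.select()
    headroom = random.uniform(2.0, 12.0)   # stand-in for the uncertain grid state
    agent.update(arm, reward(RATES_KW[arm], headroom))
print("estimated value per rate:", [round(v, 2) for v in agent.values])
```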

Graphs in State-Space Models for Granger Causality in Climate Science

  • paper_url: http://arxiv.org/abs/2307.10703
  • repo_url: None
  • paper_authors: Víctor Elvira, Émilie Chouzenoux, Jordi Cerdà, Gustau Camps-Valls
  • for: The paper assesses the predictability of one time series from another using Granger causality, a method widely used across applied disciplines such as neuroscience, econometrics, and the Earth sciences.
  • methods: The paper takes a graphical perspective on state-space models and uses GraphEM, a recently presented expectation-maximization algorithm, to estimate the linear matrix operator in the state equation of a linear-Gaussian state-space model; lasso regularization is included in the M-step, which is solved with a proximal splitting Douglas-Rachford algorithm.
  • results: Experiments on toy examples and challenging climate problems show the benefits of the proposed model and inference technique over standard Granger causality methods.
    Abstract Granger causality (GC) is often considered not an actual form of causality. Still, it is arguably the most widely used method to assess the predictability of a time series from another one. Granger causality has been widely used in many applied disciplines, from neuroscience and econometrics to Earth sciences. We revisit GC under a graphical perspective of state-space models. For that, we use GraphEM, a recently presented expectation-maximisation algorithm for estimating the linear matrix operator in the state equation of a linear-Gaussian state-space model. Lasso regularisation is included in the M-step, which is solved using a proximal splitting Douglas-Rachford algorithm. Experiments in toy examples and challenging climate problems illustrate the benefits of the proposed model and inference technique over standard Granger causality methods.
    摘要 Granger 因果关系(GC)常被认为并非真正意义上的因果关系,但它仍然是评估一个时间序列对另一个时间序列可预测性时使用最广泛的方法,在神经科学、计量经济学、地球科学等应用领域中被大量采用。我们从状态空间模型的图结构视角重新审视 GC。为此,我们使用最近提出的期望最大化算法 GraphEM 来估计线性高斯状态空间模型状态方程中的线性矩阵算子,并在 M 步中加入 lasso 正则化,用近端分裂 Douglas-Rachford 算法求解。在玩具示例和具有挑战性的气候问题上的实验说明了所提模型和推断方法相对于标准 Granger 因果方法的优势。
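
A simplified sketch of the underlying idea, reading a lasso-sparsified VAR(1) transition matrix as a Granger-style causal graph; GraphEM itself works in a state-space/EM setting with a Douglas-Rachford proximal solver, which this sketch does not reproduce:

```python
# Sketch: sparse transition matrix of a VAR(1) model as a Granger-style graph (simplified; not GraphEM).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
T, d = 500, 4
A_true = np.array([[0.8, 0.0, 0.0, 0.0],
                   [0.5, 0.6, 0.0, 0.0],
                   [0.0, 0.0, 0.7, 0.0],
                   [0.0, 0.0, 0.4, 0.5]])
x = np.zeros((T, d))
for t in range(1, T):
    x[t] = A_true @ x[t - 1] + 0.1 * rng.standard_normal(d)

X_past, X_next = x[:-1], x[1:]
A_hat = np.zeros((d, d))
for i in range(d):                                   # one lasso regression per target series
    model = Lasso(alpha=0.01).fit(X_past, X_next[:, i])
    A_hat[i] = model.coef_

edges = np.abs(A_hat) > 0.05                         # edge j -> i if the coefficient survives the lasso
print(np.round(A_hat, 2))
print("Granger-style adjacency:\n", edges.astype(int))
```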

Self2Self+: Single-Image Denoising with Self-Supervised Learning and Image Quality Assessment Loss

  • paper_url: http://arxiv.org/abs/2307.10695
  • repo_url: https://github.com/JK-the-Ko/Self2SelfPlus
  • paper_authors: Jaekyun Ko, Sanghwan Lee
  • for: 本研究旨在提出一种基于单个噪声图像的自监督学习方法,以便提高噪声除去效果的可行性和实用性。
  • methods: 该方法使用门控卷积(gated convolution)提取特征,并使用无参考图像质量评估来引导训练过程。此外,方法通过带一定 dropout 率的 Bernoulli 采样从输入图像中随机生成训练实例,并在测试时对多个带 dropout 的预测取平均。
  • results: 实验结果表明,所提方法在合成数据集和真实数据集上均达到了当前最佳的去噪性能,表明该方法有效且实用,可用于各种去噪任务。
    Abstract Recently, denoising methods based on supervised learning have exhibited promising performance. However, their reliance on external datasets containing noisy-clean image pairs restricts their applicability. To address this limitation, researchers have focused on training denoising networks using solely a set of noisy inputs. To improve the feasibility of denoising procedures, in this study, we proposed a single-image self-supervised learning method in which only the noisy input image is used for network training. Gated convolution was used for feature extraction and no-reference image quality assessment was used for guiding the training process. Moreover, the proposed method sampled instances from the input image dataset using Bernoulli sampling with a certain dropout rate for training. The corresponding result was produced by averaging the generated predictions from various instances of the trained network with dropouts. The experimental results indicated that the proposed method achieved state-of-the-art denoising performance on both synthetic and real-world datasets. This highlights the effectiveness and practicality of our method as a potential solution for various noise removal tasks.
    摘要 最近,基于监督学习的去噪方法表现出色。然而,它们依赖包含噪声-干净图像对的外部数据集,这限制了它们的应用场景。为解决这一限制,研究人员开始关注仅使用一组噪声输入来训练去噪网络。在本研究中,我们提出了一种单图像自监督学习方法,仅使用噪声输入图像本身来训练网络:使用门控卷积提取特征,并用无参考图像质量评估引导训练过程。此外,我们按一定的 dropout 率对输入图像进行 Bernoulli 采样以生成训练实例,并在测试时对多个带 dropout 的网络预测取平均得到最终结果。实验结果表明,我们的方法在合成数据集和真实数据集上均实现了当前最佳的去噪性能,体现了该方法作为各种去噪任务潜在解决方案的有效性和实用性。
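
A minimal sketch of the Bernoulli-masked, single-image self-supervised training loop described above; the tiny plain-convolution network, dropout rate, and loop length are placeholders (the paper uses gated convolutions and an image-quality-assessment loss):

```python
# Minimal single-image self-supervised denoising loop (Self2Self-style Bernoulli masking; simplified).
import torch
import torch.nn as nn

noisy = torch.rand(1, 3, 64, 64)                      # stand-in for the single noisy input image
net = nn.Sequential(                                   # tiny placeholder denoiser
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
p_drop = 0.3                                           # Bernoulli dropout rate (assumed value)

for step in range(200):
    mask = (torch.rand_like(noisy[:, :1]) > p_drop).float()   # keep-mask shared across channels
    pred = net(noisy * mask)
    # supervise only on the dropped pixels, using the noisy image itself as the target
    loss = (((pred - noisy) ** 2) * (1 - mask)).sum() / (1 - mask).sum().clamp(min=1)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                                  # average predictions over random masks at test time
    denoised = torch.stack([net(noisy * (torch.rand_like(noisy[:, :1]) > p_drop).float())
                            for _ in range(20)]).mean(0)
print(denoised.shape)
```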

Fractional Denoising for 3D Molecular Pre-training

  • paper_url: http://arxiv.org/abs/2307.10683
  • repo_url: https://github.com/fengshikun/frad
  • paper_authors: Shikun Feng, Yuyan Ni, Yanyan Lan, Zhi-Ming Ma, Wei-Ying Ma
  • for: 提高3D分子预训练方法的性能,特别是在药物搜寻任务中。
  • methods: 提出了一种新的混合噪声策略,同时对二面角和坐标添加噪声。针对分子的各向异性特征,进一步提出了一种新的分数去噪方法(Frad),只对坐标部分进行去噪。
  • results: 实验表明,Frad 能够学到更好的分子表示,在 12 个 QM9 任务中的 9 个以及 8 个 MD17 目标中的 7 个上达到了新的最优水平。
    Abstract Coordinate denoising is a promising 3D molecular pre-training method, which has achieved remarkable performance in various downstream drug discovery tasks. Theoretically, the objective is equivalent to learning the force field, which is revealed helpful for downstream tasks. Nevertheless, there are two challenges for coordinate denoising to learn an effective force field, i.e. low coverage samples and isotropic force field. The underlying reason is that molecular distributions assumed by existing denoising methods fail to capture the anisotropic characteristic of molecules. To tackle these challenges, we propose a novel hybrid noise strategy, including noises on both dihedral angel and coordinate. However, denoising such hybrid noise in a traditional way is no more equivalent to learning the force field. Through theoretical deductions, we find that the problem is caused by the dependency of the input conformation for covariance. To this end, we propose to decouple the two types of noise and design a novel fractional denoising method (Frad), which only denoises the latter coordinate part. In this way, Frad enjoys both the merits of sampling more low-energy structures and the force field equivalence. Extensive experiments show the effectiveness of Frad in molecular representation, with a new state-of-the-art on 9 out of 12 tasks of QM9 and on 7 out of 8 targets of MD17.
    摘要 “坐标干扰是一种有前途的3D分子预训方法,它在不同的下游药物探索任务中表现出色。理论上,这个目标等于学习力场,这是对下游任务有帮助的。然而,坐标干扰学习一个有效的力场受到两个挑战,即低覆盖样本和各向同性力场。这些挑战的根本原因是现有的坐标干扰方法对分子的分布假设不能够捕捉分子的非对称特征。为了解决这些挑战,我们提出了一种新的混合噪音策略,包括坐标和方向夹角的噪音。但是,将这种混合噪音进行传统的噪音除法不再等于学习力场。经过理论的推导,我们发现这个问题是因为输入构造的假设所导致的。为了解决这问题,我们提出了一种新的分解方法(Frad),它只对后者的坐标部分进行噪音除法。这样,Frad可以同时具有较低的能量结构和力场等于性。实验结果显示Frad在分子表现方面有新的顶峰性,在QM9和MD17上分别取得9/12和7/8的新纪录。”

Deep learning for classification of noisy QR codes

  • paper_url: http://arxiv.org/abs/2307.10677
  • repo_url: None
  • paper_authors: Rebecca Leygonie, Sylvain Lobry, ), Laurent Wendling (LIPADE)
  • for: 本研究旨在定义基于深度学习的古典分类模型在抽象图像上的限制,当应用于不可见化对象的图像。
  • methods: 我们使用了基于深度学习的图像分类模型,并对QR码(快速响应码)进行了训练。QR码不是为人类手动读取而设计的,因此我们可以通过对QR码生成的信息进行分析,了解深度学习模型在抽象图像分类中的限制。
  • results: 我们的研究结果表明,基于深度学习的模型可以在抽象图像分类中表现出优异的性能,并且在噪声存在的情况下也能够保持一定的稳定性。这项研究表明,基于深度学习的模型可以在理解抽象图像方面发挥作用。
    Abstract We wish to define the limits of a classical classification model based on deep learning when applied to abstract images, which do not represent visually identifiable objects.QR codes (Quick Response codes) fall into this category of abstract images: one bit corresponding to one encoded character, QR codes were not designed to be decoded manually. To understand the limitations of a deep learning-based model for abstract image classification, we train an image classification model on QR codes generated from information obtained when reading a health pass. We compare a classification model with a classical (deterministic) decoding method in the presence of noise. This study allows us to conclude that a model based on deep learning can be relevant for the understanding of abstract images.
    摘要 我们希望界定基于深度学习的经典分类模型在应用于抽象图像(即不表示可视觉识别对象的图像)时的局限。二维码(快速响应码)正属于这类抽象图像:一个比特对应一个编码字符,二维码并非为人工解码而设计。为了理解基于深度学习的模型在抽象图像分类中的局限,我们在由读取健康通行证所得信息生成的二维码上训练图像分类模型,并在存在噪声的情况下与经典(确定性)解码方法进行比较。这项研究使我们得出结论:基于深度学习的模型可用于理解抽象图像。

A Survey of What to Share in Federated Learning: Perspectives on Model Utility, Privacy Leakage, and Communication Efficiency

  • paper_url: http://arxiv.org/abs/2307.10655
  • repo_url: None
  • paper_authors: Jiawei Shao, Zijian Li, Wenqiang Sun, Tailin Zhou, Yuchang Sun, Lumin Liu, Zehong Lin, Jun Zhang
  • for: 这个论文主要是为了探讨 Federated Learning(FL)中可以共同培训多个参与者的方法,以保护数据隐私。
  • methods: 这篇论文分析了不同类型的共享方法,包括模型共享、synthetic数据共享和知识共享。
  • results: 论文通过对不同共享方法的性能和通信开销进行比较,以及对模型泄露和会员推测攻击的评估,提供了一些结论。
    Abstract Federated learning (FL) has emerged as a highly effective paradigm for privacy-preserving collaborative training among different parties. Unlike traditional centralized learning, which requires collecting data from each party, FL allows clients to share privacy-preserving information without exposing private datasets. This approach not only guarantees enhanced privacy protection but also facilitates more efficient and secure collaboration among multiple participants. Therefore, FL has gained considerable attention from researchers, promoting numerous surveys to summarize the related works. However, the majority of these surveys concentrate on methods sharing model parameters during the training process, while overlooking the potential of sharing other forms of local information. In this paper, we present a systematic survey from a new perspective, i.e., what to share in FL, with an emphasis on the model utility, privacy leakage, and communication efficiency. This survey differs from previous ones due to four distinct contributions. First, we present a new taxonomy of FL methods in terms of the sharing methods, which includes three categories of shared information: model sharing, synthetic data sharing, and knowledge sharing. Second, we analyze the vulnerability of different sharing methods to privacy attacks and review the defense mechanisms that provide certain privacy guarantees. Third, we conduct extensive experiments to compare the performance and communication overhead of various sharing methods in FL. Besides, we assess the potential privacy leakage through model inversion and membership inference attacks, while comparing the effectiveness of various defense approaches. Finally, we discuss potential deficiencies in current methods and outline future directions for improvement.
    摘要 联邦学习(Federated Learning,FL)已成为一种保护隐私的高效协作训练范式,允许多方在不曝光私有数据集的情况下共享隐私保护的信息。这种方法不仅提供了更好的隐私保护,还促进了多个参与者之间更加高效和安全的协作。因此,FL 吸引了大量研究人员的关注,催生了许多相关综述文章。然而,大多数综述都集中在训练过程中共享模型参数的方法上,忽略了共享其他形式本地信息的潜力。在这篇文章中,我们从一个新的视角(即在 FL 中共享什么)提供系统综述,并重点关注模型实用性、隐私泄露和通信效率。本综述与以往不同,主要体现在四点:1. 我们提出了一种新的分类方法,按共享内容将 FL 方法分为三类:模型共享、合成数据共享和知识共享。2. 我们分析了不同共享方法对隐私攻击的脆弱性,并回顾了能提供一定隐私保证的防御机制。3. 我们进行了广泛的实验,比较不同共享方法在 FL 中的性能和通信开销;同时评估了模型反演与成员推断攻击带来的隐私泄露风险,并比较了各种防御方法的效果。4. 我们讨论了当前方法的不足并指出了未来改进方向。总之,本文帮助读者更好地理解在 FL 中应共享哪些信息,以及这些信息对模型实用性、隐私泄露和通信效率的影响。

Conditional expectation network for SHAP

  • paper_url: http://arxiv.org/abs/2307.10654
  • repo_url: None
  • paper_authors: Ronald Richman, Mario V. Wüthrich
  • for: 这个研究旨在提出一种能够有效地计算Conditional SHAP值的神经网络方法,以便在神经网络和其他回归模型中使用。
  • methods: 这种方法使用了SHAP技术,并且特别考虑了特征组件之间的依赖关系。
  • results: 这种方法可以准确地计算Conditional SHAP值,并且可以提供drop1和anova分析,以及一种考虑特征组件之间依赖关系的PDP图像。
    Abstract A very popular model-agnostic technique for explaining predictive models is the SHapley Additive exPlanation (SHAP). The two most popular versions of SHAP are a conditional expectation version and an unconditional expectation version (the latter is also known as interventional SHAP). Except for tree-based methods, usually the unconditional version is used (for computational reasons). We provide a (surrogate) neural network approach which allows us to efficiently calculate the conditional version for both neural networks and other regression models, and which properly considers the dependence structure in the feature components. This proposal is also useful to provide drop1 and anova analyses in complex regression models which are similar to their generalized linear model (GLM) counterparts, and we provide a partial dependence plot (PDP) counterpart that considers the right dependence structure in the feature components.
    摘要 非常流行的模型无关技术之一是SHapley Additive exPlanation(SHAP)。这两个版本是条件预期版本和无条件预期版本(后者也称为交互SHAP)。通常情况下,使用无条件版本(由于计算原因)。我们提供一种(代理)神经网络方法,可以高效计算条件版本,并且正确考虑特征组件之间的依赖关系。这种提议还有助于提供drop1和anova分析在复杂回归模型中,与其普通线性模型(GLM)对应的分析。此外,我们还提供了一种考虑特征组件依赖关系的PDP对应。

Refining the Optimization Target for Automatic Univariate Time Series Anomaly Detection in Monitoring Services

  • paper_url: http://arxiv.org/abs/2307.10653
  • repo_url: None
  • paper_authors: Manqing Dong, Zhanxiang Zhao, Yitong Geng, Wentao Li, Wei Wang, Huai Jiang
  • for: 这篇论文是为了提高时间序列异常探测的自动化化,以提高工业监控服务的可靠性和系统性能。
  • methods: 本文提出了一个全面的自动化parameter优化框架,包括三个优化目标:预测得分、形状得分和敏感度得分,这些目标可以轻松地适应不同的模型后段,无需专业知识或手动标注努力。
  • results: 本文的提案框架已经在线上进行了超过六个月的实际应用,处理了每分钟50,000多个时间序列,并简化了用户的体验,仅需要提供预期的敏感值,并且实现了适当的侦测结果。
    Abstract Time series anomaly detection is crucial for industrial monitoring services that handle a large volume of data, aiming to ensure reliability and optimize system performance. Existing methods often require extensive labeled resources and manual parameter selection, highlighting the need for automation. This paper proposes a comprehensive framework for automatic parameter optimization in time series anomaly detection models. The framework introduces three optimization targets: prediction score, shape score, and sensitivity score, which can be easily adapted to different model backbones without prior knowledge or manual labeling efforts. The proposed framework has been successfully applied online for over six months, serving more than 50,000 time series every minute. It simplifies the user's experience by requiring only an expected sensitive value, offering a user-friendly interface, and achieving desired detection results. Extensive evaluations conducted on public datasets and comparison with other methods further confirm the effectiveness of the proposed framework.
    摘要 时序列异常检测是对于工业监测服务而言是非常重要的,旨在确保可靠性并优化系统性能。现有的方法经常需要大量的标注资源和手动参数选择,这高亮了自动化的需求。这篇论文提出了一个完整的自动参数优化框架 для时序列异常检测模型。该框架引入了三个优化目标:预测得分、形态得分和敏感度得分,可以轻松地适应不同的模型背景而无需互知或手动标注努力。该提议的框架已经在线上运行了超过六个月,处理了每分钟50,000个时序列,并提供了一个易用的用户界面,以及达到了检测结果的所求的目标。对于公共数据集和其他方法进行了广泛的评估,并证实了提议的效果。

Data-Driven Latency Probability Prediction for Wireless Networks: Focusing on Tail Probabilities

  • paper_url: http://arxiv.org/abs/2307.10648
  • repo_url: https://github.com/samiemostafavi/wireless-pr3d
  • paper_authors: Samie Mostafavi, Gourav Prateek Sharma, James Gross
  • for: Ensuring end-to-end network latency with extremely high reliability (99.999%) in wireless networks, particularly for cyber-physical systems and human-in-the-loop applications.
  • methods: Using state-of-the-art data-driven approaches, such as mixture density networks (MDN) and extreme value mixture models, to predict the tail of the latency distribution and estimate the likelihood of rare latencies conditioned on network parameters.
  • results: Benchmarking the proposed approaches using actual latency measurements of IEEE 802.11g (WiFi), commercial private, and a software-defined 5G network to evaluate their sensitivities concerning the tail probabilities.
    Abstract With the emergence of new application areas, such as cyber-physical systems and human-in-the-loop applications, there is a need to guarantee a certain level of end-to-end network latency with extremely high reliability, e.g., 99.999%. While mechanisms specified under IEEE 802.1as time-sensitive networking (TSN) can be used to achieve these requirements for switched Ethernet networks, implementing TSN mechanisms in wireless networks is challenging due to their stochastic nature. To conform the wireless link to a reliability level of 99.999%, the behavior of extremely rare outliers in the latency probability distribution, or the tail of the distribution, must be analyzed and controlled. This work proposes predicting the tail of the latency distribution using state-of-the-art data-driven approaches, such as mixture density networks (MDN) and extreme value mixture models, to estimate the likelihood of rare latencies conditioned on the network parameters, which can be used to make more informed decisions in wireless transmission. Actual latency measurements of IEEE 802.11g (WiFi), commercial private and a software-defined 5G network are used to benchmark the proposed approaches and evaluate their sensitivities concerning the tail probabilities.
    摘要 随着信息物理系统和人在回路应用等新应用领域的出现,需要以极高的可靠性(例如 99.999%)保证端到端网络延迟。对于交换式以太网,IEEE 802.1as 时间敏感网络(TSN)所规定的机制可以满足这些要求;然而,由于无线网络的随机特性,在无线网络中实现 TSN 机制十分困难。要使无线链路达到 99.999% 的可靠性水平,必须分析并控制延迟概率分布中极其罕见的离群值行为,即分布的尾部。本工作提出使用混合密度网络(MDN)和极值混合模型等最新的数据驱动方法来预测延迟分布的尾部,从而在给定网络参数的条件下估计罕见延迟出现的可能性,为无线传输决策提供依据。我们使用 IEEE 802.11g(WiFi)、商用专网以及软件定义 5G 网络的实际延迟测量数据对所提方法进行基准测试,并评估其对尾部概率的敏感性。
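
A small sketch of a Gaussian mixture density network (MDN) predicting a conditional latency distribution, from which tail probabilities can be read off; the features, sizes, and data are illustrative, and the extreme-value mixture variant is not shown:

```python
# Sketch of a Gaussian mixture density network (MDN) for conditional latency distributions (toy data).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDN(nn.Module):
    def __init__(self, in_dim=3, n_comp=5, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.pi = nn.Linear(hidden, n_comp)         # mixture weights (logits)
        self.mu = nn.Linear(hidden, n_comp)         # component means
        self.log_sigma = nn.Linear(hidden, n_comp)  # component log std-devs
    def forward(self, x):
        h = self.body(x)
        return self.pi(h), self.mu(h), self.log_sigma(h)

def nll(pi_logits, mu, log_sigma, y):
    # negative log-likelihood of y under the predicted Gaussian mixture
    log_pi = F.log_softmax(pi_logits, dim=-1)
    comp = torch.distributions.Normal(mu, log_sigma.exp())
    return -torch.logsumexp(comp.log_prob(y.unsqueeze(-1)) + log_pi, dim=-1).mean()

# toy stand-in for (network parameters -> latency) measurements
x = torch.rand(2048, 3)
y = 1.0 + 2.0 * x[:, 0] + 0.3 * torch.randn(2048)

model = MDN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(500):
    loss = nll(*model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

# tail probability P(latency > threshold | x) from the component survival functions
with torch.no_grad():
    pi_logits, mu, log_sigma = model(x[:1])
    w = F.softmax(pi_logits, dim=-1)
    sf = 1.0 - torch.distributions.Normal(mu, log_sigma.exp()).cdf(torch.tensor(5.0))
    print("P(latency > 5 | x):", float((w * sf).sum()))
```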

Fisher-Rao distance and pullback SPD cone distances between multivariate normal distributions

  • paper_url: http://arxiv.org/abs/2307.10644
  • repo_url: None
  • paper_authors: Frank Nielsen
  • for: 该论文旨在处理多变量正态分布集合,如扩散tensor成像、结构tensor计算机视觉、雷达信号处理、机器学习等领域的数据集。
  • methods: 该论文提出了一种快速和稳定的方法来 aproximate multivariate normal distributions的Fisher-Rao距离,以及一种基于几何映射的方法来定义正态分布之间的距离。
  • results: 该论文的结果表明,基于Fisher信息度量的Fisher-Rao距离可以很好地approximate multivariate normal distributions,而且 Computationally, the pullback Hilbert cone distance is much lighter than the Fisher-Rao distance approximation, since it only requires the extreme minimal and maximal eigenvalues of matrices. In addition, the paper shows how to use these distances in clustering tasks.
    Abstract Data sets of multivariate normal distributions abound in many scientific areas like diffusion tensor imaging, structure tensor computer vision, radar signal processing, machine learning, just to name a few. In order to process those normal data sets for downstream tasks like filtering, classification or clustering, one needs to define proper notions of dissimilarities between normals and paths joining them. The Fisher-Rao distance defined as the Riemannian geodesic distance induced by the Fisher information metric is such a principled metric distance which however is not known in closed-form excepts for a few particular cases. In this work, we first report a fast and robust method to approximate arbitrarily finely the Fisher-Rao distance between multivariate normal distributions. Second, we introduce a class of distances based on diffeomorphic embeddings of the normal manifold into a submanifold of the higher-dimensional symmetric positive-definite cone corresponding to the manifold of centered normal distributions. We show that the projective Hilbert distance on the cone yields a metric on the embedded normal submanifold and we pullback that cone distance with its associated straight line Hilbert cone geodesics to obtain a distance and smooth paths between normal distributions. Compared to the Fisher-Rao distance approximation, the pullback Hilbert cone distance is computationally light since it requires to compute only the extreme minimal and maximal eigenvalues of matrices. Finally, we show how to use those distances in clustering tasks.
    摘要 多元正态分布的数据集在许多科学领域中十分常见,例如扩散张量成像、结构张量计算机视觉、雷达信号处理和机器学习等。为了对这些正态分布数据进行滤波、分类或聚类等后续处理,需要定义正态分布之间恰当的差异度量以及连接它们的路径。由 Fisher 信息度量诱导的黎曼测地距离(Fisher-Rao 距离)是一种有原则的度量距离,但除少数特殊情况外没有闭式表达。在这项工作中,我们首先给出了一种快速且稳健的方法,可以任意精细地近似多元正态分布之间的 Fisher-Rao 距离。其次,我们引入一类基于微分同胚嵌入的距离:将正态分布流形嵌入到更高维对称正定锥的一个子流形中(对应于中心化正态分布的流形)。我们证明锥上的射影 Hilbert 距离在嵌入后的正态子流形上构成一个度量,并将该锥距离及其对应的 Hilbert 锥直线测地线拉回,得到正态分布之间的距离和光滑路径。与 Fisher-Rao 距离的近似相比,pullback Hilbert cone 距离计算量更小,因为只需计算矩阵的最小和最大极端特征值。最后,我们展示了如何在聚类任务中使用这些距离。
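
A sketch of the pullback Hilbert cone distance, assuming the usual (d+1)x(d+1) block embedding of N(mu, Sigma) into the SPD cone; as the abstract notes, only the extreme (generalized) eigenvalues are needed:

```python
# Sketch: Hilbert projective distance on the SPD cone, applied to embedded normal distributions.
# Assumes the block embedding M = [[Sigma + mu mu^T, mu], [mu^T, 1]] of N(mu, Sigma).
import numpy as np
from scipy.linalg import eigh

def embed_normal(mu, sigma):
    d = len(mu)
    M = np.empty((d + 1, d + 1))
    M[:d, :d] = sigma + np.outer(mu, mu)
    M[:d, d] = mu
    M[d, :d] = mu
    M[d, d] = 1.0
    return M

def hilbert_cone_distance(A, B):
    # d(A, B) = log(lambda_max / lambda_min) for the pencil A v = lambda B v,
    # i.e. only the extreme generalized eigenvalues are required.
    lam = eigh(A, B, eigvals_only=True)   # ascending generalized eigenvalues
    return float(np.log(lam[-1] / lam[0]))

mu1, S1 = np.array([0.0, 0.0]), np.eye(2)
mu2, S2 = np.array([1.0, -0.5]), np.array([[2.0, 0.3], [0.3, 0.5]])
d = hilbert_cone_distance(embed_normal(mu1, S1), embed_normal(mu2, S2))
print("pullback Hilbert cone distance:", round(d, 4))
```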

SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

  • paper_url: http://arxiv.org/abs/2307.10635
  • repo_url: https://github.com/mandyyyyii/scibench
  • paper_authors: Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, Yizhou Sun, Wei Wang
  • for: This paper aims to evaluate the reasoning capabilities of large language models (LLMs) on complex scientific problem solving.
  • methods: The paper introduces an expansive benchmark suite called SciBench, which features two datasets: an open set of collegiate-level scientific problems and a closed set of undergraduate-level exams in computer science and mathematics. The authors evaluate the performance of two representative LLMs with various prompting strategies.
  • results: The results show that current LLMs have an overall score of merely 35.80% and make ten different types of errors. The authors find that no single prompting strategy significantly outperforms others, and some strategies that improve in certain problem-solving skills result in declines in other skills.
    Abstract Recent advances in large language models (LLMs) have demonstrated notable progress on many mathematical benchmarks. However, most of these benchmarks only feature problems grounded in junior and senior high school subjects, contain only multiple-choice questions, and are confined to a limited scope of elementary arithmetic operations. To address these issues, this paper introduces an expansive benchmark suite SciBench that aims to systematically examine the reasoning capabilities required for complex scientific problem solving. SciBench contains two carefully curated datasets: an open set featuring a range of collegiate-level scientific problems drawn from mathematics, chemistry, and physics textbooks, and a closed set comprising problems from undergraduate-level exams in computer science and mathematics. Based on the two datasets, we conduct an in-depth benchmark study of two representative LLMs with various prompting strategies. The results reveal that current LLMs fall short of delivering satisfactory performance, with an overall score of merely 35.80%. Furthermore, through a detailed user study, we categorize the errors made by LLMs into ten problem-solving abilities. Our analysis indicates that no single prompting strategy significantly outperforms others and some strategies that demonstrate improvements in certain problem-solving skills result in declines in other skills. We envision that SciBench will catalyze further developments in the reasoning abilities of LLMs, thereby ultimately contributing to scientific research and discovery.
    摘要 SciBench contains two datasets: an open set featuring collegiate-level scientific problems from mathematics, chemistry, and physics textbooks, and a closed set consisting of undergraduate-level exam problems in computer science and mathematics. We conduct an in-depth benchmark study of two representative LLMs using various prompting strategies, and find that current LLMs achieve only a 35.80% overall score.Through a detailed user study, we categorize the errors made by LLMs into ten problem-solving abilities. Our analysis shows that no single prompting strategy consistently outperforms others, and some strategies that improve performance in one area lead to declines in other areas.We envision that SciBench will drive further advancements in the reasoning abilities of LLMs, ultimately contributing to scientific research and discovery.

Generative Language Models on Nucleotide Sequences of Human Genes

  • paper_url: http://arxiv.org/abs/2307.10634
  • repo_url: https://github.com/boun-tabi/generativelm-genes
  • paper_authors: Musa Nuri Ihtiyar, Arzucan Ozgur
  • for: This paper focuses on developing an autoregressive generative language model for DNA sequences, specifically the nucleotide sequences of human genes.
  • methods: The authors use a systematic approach to examine the performance of different models, including RNNs and N-grams, and explore the use of real-life tasks beyond classical metrics such as perplexity.
  • results: The study finds that RNNs perform the best, and that selecting a language with a minimal vocabulary size does not significantly reduce the amount of data needed.
    Abstract Language models, primarily transformer-based ones, obtained colossal success in NLP. To be more precise, studies like BERT in NLU and works such as GPT-3 for NLG are very crucial. DNA sequences are very close to natural language in terms of structure, so if the DNA-related bioinformatics domain is concerned, discriminative models, like DNABert, exist. Yet, the generative side of the coin is mainly unexplored to the best of our knowledge. Consequently, we focused on developing an autoregressive generative language model like GPT-3 for DNA sequences. Because working with whole DNA sequences is challenging without substantial computational resources, we decided to carry out our study on a smaller scale, focusing on nucleotide sequences of human genes, unique parts in DNA with specific functionalities, instead of the whole DNA. This decision did not change the problem structure a lot due to the fact that both DNA and genes can be seen as 1D sequences consisting of four different nucleotides without losing much information and making too much simplification. First of all, we systematically examined an almost entirely unexplored problem and observed that RNNs performed the best while simple techniques like N-grams were also promising. Another beneficial point was learning how to work with generative models on languages we do not understand, unlike natural language. How essential using real-life tasks beyond the classical metrics such as perplexity is observed. Furthermore, checking whether the data-hungry nature of these models can be changed through selecting a language with minimal vocabulary size, four owing to four different types of nucleotides, is examined. The reason for reviewing this was that choosing such a language might make the problem easier. However, what we observed in this study was it did not provide that much of a change in the amount of data needed.
    摘要 language models, primarily transformer-based ones, have achieved great success in NLP. to be more precise, studies like BERT in NLU and works such as GPT-3 for NLG are very crucial. DNA sequences are very close to natural language in terms of structure, so if the DNA-related bioinformatics domain is concerned, discriminative models, like DNABert, exist. however, the generative side of the coin is mainly unexplored to the best of our knowledge. consequently, we focused on developing an autoregressive generative language model like GPT-3 for DNA sequences. because working with whole DNA sequences is challenging without substantial computational resources, we decided to carry out our study on a smaller scale, focusing on nucleotide sequences of human genes, unique parts in DNA with specific functionalities, instead of the whole DNA. this decision did not change the problem structure a lot due to the fact that both DNA and genes can be seen as 1D sequences consisting of four different nucleotides without losing much information and making too much simplification. first of all, we systematically examined an almost entirely unexplored problem and observed that RNNs performed the best while simple techniques like N-grams were also promising. another beneficial point was learning how to work with generative models on languages we do not understand, unlike natural language. how essential using real-life tasks beyond classical metrics such as perplexity is observed. furthermore, checking whether the data-hungry nature of these models can be changed through selecting a language with minimal vocabulary size, four owing to four different types of nucleotides, is examined. the reason for reviewing this was that choosing such a language might make the problem easier. however, what we observed in this study was it did not provide that much of a change in the amount of data needed.
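
A minimal character-level autoregressive LSTM over the four-nucleotide alphabet, in the spirit of the models compared above; the sequence, sizes, and training length are toy placeholders:

```python
# Character-level autoregressive LSTM over the nucleotide alphabet {A, C, G, T} (toy sketch).
import torch
import torch.nn as nn

vocab = {c: i for i, c in enumerate("ACGT")}
seq = "ATGGCGTACGTTAGCATGCATGCGTACGATCG" * 8           # placeholder for a real gene sequence
ids = torch.tensor([vocab[c] for c in seq])

class CharLM(nn.Module):
    def __init__(self, v=4, emb=16, hid=64):
        super().__init__()
        self.emb = nn.Embedding(v, emb)
        self.rnn = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, v)
    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

model = CharLM()
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()
x, y = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)     # predict the next nucleotide
for _ in range(200):
    logits = model(x)
    loss = loss_fn(logits.reshape(-1, 4), y.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
print("final training perplexity:", float(loss.exp()))
```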

Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa

  • paper_url: http://arxiv.org/abs/2307.10633
  • repo_url: None
  • paper_authors: Shriyash K. Upadhyay, Etan J. Ginsberg
  • for: 该论文的目的是提出多方法自动训练(MMST),以增强语言模型的可用性和性能。
  • methods: 该论文使用了一种176B参数的语言和代码模型,并对其进行多方法自动训练,以便augment各种方法的优势和改善各种方法的缺陷。
  • results: 该论文显示,通过多方法自动训练,可以1)提高较弱的方法性能(最多30%),2)提高较强的方法性能(最多32.2%),3)提高相关 yet distinct tasks的性能(最多10.3%)。此外,论文还进行了ablation analyses,并发现MMST生成的数据量更大,但是性能提高的原因是多种方法的使用。
    Abstract Large Language Models have many methods for solving the same problem. This introduces novel strengths (different methods may work well for different problems) and weaknesses (it may be difficult for users to know which method to use). In this paper, we introduce Multi-Method Self-Training (MMST), where one method is trained on the filtered outputs of another, allowing us to augment the strengths and ameliorate the weaknesses of each method. Using a 176B parameter model trained on both language and code, we show that MMST can 1) improve the less performant method (up to 30%) making the model easier to use, 2) improve the more performant method (up to 32.2%) making the model more performant, and 3) improve the performance of related but distinct tasks (up to 10.3%) by improving the ability of the model to generate rationales. We then conduct ablation analyses to explore why MMST works. We show that MMST generates more data than traditional self-training, but the improvement in performance is driven by the use of multiple methods. We also analyze prompt-engineering and anti-correlated performance between methods as means of making MMST more effective. We hope the evidence from our paper motivates machine learning researchers to explore ways in which advances in language models allow for new forms of training.
    摘要 大语言模型有许多方法解决同一问题,这引入了新的优势(不同的方法可能适用于不同的问题)和劣势(用户可能Difficult to determine哪种方法使用)。在这篇论文中,我们介绍了多种方法自我培训(MMST),其中一种方法在另一种方法的过滤输出上进行训练,从而可以增强每种方法的优势和缓解劣势。使用176亿参数模型,我们显示了以下三点:1. 使用MMST可以提高较弱的方法(最多30%),使模型更易用。2. 使用MMST可以提高较强的方法(最多32.2%),使模型更高效。3. 使用MMST可以提高相关而不同的任务(最多10.3%)的性能,通过提高模型生成合理性的能力。然后,我们进行了剥离分析,发现MMST生成了更多的数据,但是性能提高的原因是使用多种方法。我们还分析了引擎和反相性的作用,以便使MMST更有效。我们希望这篇论文的证据能够鼓励机器学习研究人员探索语言模型的新训练方法。

Unmasking Falsehoods in Reviews: An Exploration of NLP Techniques

  • paper_url: http://arxiv.org/abs/2307.10617
  • repo_url: None
  • paper_authors: Anusuya Baby Hari Krishnan
  • for: 本研究旨在提出一种机器学习模型,用于 indentifying 评论中的假评价(deceptive reviews),尤其是针对餐厅评论。
  • methods: 本研究采用了n-gram模型和max features技术来有效地识别假评价内容,并对五种不同的机器学习分类算法进行了比较。
  • results: 实验结果表明,Passive Aggressive 分类器在文本分类和假评论识别上均取得了最高的准确率。此外,研究还采用数据增强并应用多种深度学习技术,进一步提升假评论检测的效果。
    Abstract In the contemporary digital landscape, online reviews have become an indispensable tool for promoting products and services across various businesses. Marketers, advertisers, and online businesses have found incentives to create deceptive positive reviews for their products and negative reviews for their competitors' offerings. As a result, the writing of deceptive reviews has become an unavoidable practice for businesses seeking to promote themselves or undermine their rivals. Detecting such deceptive reviews has become an intense and ongoing area of research. This research paper proposes a machine learning model to identify deceptive reviews, with a particular focus on restaurants. This study delves into the performance of numerous experiments conducted on a dataset of restaurant reviews known as the Deceptive Opinion Spam Corpus. To accomplish this, an n-gram model and max features are developed to effectively identify deceptive content, particularly focusing on fake reviews. A benchmark study is undertaken to explore the performance of two different feature extraction techniques, which are then coupled with five distinct machine learning classification algorithms. The experimental results reveal that the passive aggressive classifier stands out among the various algorithms, showcasing the highest accuracy not only in text classification but also in identifying fake reviews. Moreover, the research delves into data augmentation and implements various deep learning techniques to further enhance the process of detecting deceptive reviews. The findings shed light on the efficacy of the proposed machine learning approach and offer valuable insights into dealing with deceptive reviews in the realm of online businesses.
    摘要 现代数字景观中,在线评论已成为不同业务的不可或缺的工具。marketers、广告人和在线业务在推广自己或抹黑竞争对手的产品和服务的过程中,都发现了奖励创造假阳性评论的做法。因此,创造假评论已成为促进自己或抹黑竞争对手的不可避免的做法。检测这些假评论已成为研究的焦点之一。这篇研究论文提出了一种机器学习模型,用于识别假评论,特别是针对餐厅的评论。这项研究通过对知名的餐厅评论数据集(Deceptive Opinion Spam Corpus)进行多个实验,开发了ngram模型和最佳特征来有效地识别假内容,特别是假评论。进一步,这篇研究进行了两种不同的特征提取技术的比较,然后与五种不同的机器学习分类算法结合。实验结果表明,通过适应性分类器得到了最高的准确率,不仅在文本分类方面,还在识别假评论方面。此外,研究还探讨了数据扩充和深度学习技术,以进一步提高检测假评论的过程。研究结果突出了提议的机器学习方法的效果,并为在线业务中处理假评论提供了有价值的思路。
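
A compact sketch of the reported pipeline, pairing TF-IDF n-gram features (with a `max_features` cap) with scikit-learn's PassiveAggressiveClassifier; the reviews and labels below are toy placeholders, not the Deceptive Opinion Spam Corpus:

```python
# Sketch: n-gram features + PassiveAggressiveClassifier for deceptive-review detection (toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

reviews = [
    "The food was amazing and the staff were wonderful, best night ever!",
    "Terrible service, cold food, will never come back.",
    "Absolutely perfect in every way, everyone must eat here immediately!!!",
    "Decent pasta, slightly slow service, fair prices overall.",
] * 50                                            # placeholder corpus
labels = [1, 0, 1, 0] * 50                        # 1 = deceptive, 0 = truthful (hypothetical)

vec = TfidfVectorizer(ngram_range=(1, 2), max_features=5000)
X = vec.fit_transform(reviews)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)

clf = PassiveAggressiveClassifier(max_iter=1000, random_state=0)
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```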

Heterogeneous Federated Learning: State-of-the-art and Research Challenges

  • paper_url: http://arxiv.org/abs/2307.10616
  • repo_url: https://github.com/marswhu/hfl_survey
  • paper_authors: Mang Ye, Xiuwen Fang, Bo Du, Pong C. Yuen, Dacheng Tao
  • for: 本文是一篇关于 federated learning(FL)在异步izable 环境下的报告,它们提出了在实际应用中遇到的多种挑战,以及现有的解决方案。
  • methods: 本文提出了一种新的分类方法,包括数据水平、模型水平和服务器水平的分类方法。此外,文章还提出了一些关键的未来研究方向。
  • results: 本文通过对多种研究挑战和现有的解决方案进行分析,提出了一些关键的未来研究方向,可以帮助进一步发展 Federated Learning 领域。I hope this helps! Let me know if you have any further questions.
    Abstract Federated learning (FL) has drawn increasing attention owing to its potential use in large-scale industrial applications. Existing federated learning works mainly focus on model homogeneous settings. However, practical federated learning typically faces the heterogeneity of data distributions, model architectures, network environments, and hardware devices among participant clients. Heterogeneous Federated Learning (HFL) is much more challenging, and corresponding solutions are diverse and complex. Therefore, a systematic survey on this topic about the research challenges and state-of-the-art is essential. In this survey, we firstly summarize the various research challenges in HFL from five aspects: statistical heterogeneity, model heterogeneity, communication heterogeneity, device heterogeneity, and additional challenges. In addition, recent advances in HFL are reviewed and a new taxonomy of existing HFL methods is proposed with an in-depth analysis of their pros and cons. We classify existing methods from three different levels according to the HFL procedure: data-level, model-level, and server-level. Finally, several critical and promising future research directions in HFL are discussed, which may facilitate further developments in this field. A periodically updated collection on HFL is available at https://github.com/marswhu/HFL_Survey.
    摘要 随着联合学习(Federated Learning,FL)的应用范围扩大,它在大规模企业应用场景中受到了越来越多的关注。现有的联合学习研究主要集中在模型同质 Settings中。然而,实际的联合学习往往面临参与客户端的数据分布、模型架构、网络环境和硬件设备之间的差异。这种差异的联合学习(Heterogeneous Federated Learning,HFL)是更加复杂和多样化的,需要相应的研究挑战和解决方案。因此,一篇系统性的调查研究在这个领域是非常重要的。在本调查中,我们首先总结了HFL中不同方面的研究挑战,包括统计差异、模型差异、通信差异、设备差异以及其他挑战。此外,我们还进行了现有HFL方法的回顾,并提出了一种新的分类方法,根据HFL过程的三级层次:数据层、模型层和服务器层。最后,我们还讨论了未来研究的一些重要和优先的方向,以便进一步发展这一领域。关于HFL的相关研究可以通过https://github.com/marswhu/HFL_Survey查看更新的集成。

Flatness-Aware Minimization for Domain Generalization

  • paper_url: http://arxiv.org/abs/2307.11108
  • repo_url: None
  • paper_authors: Xingxuan Zhang, Renzhe Xu, Han Yu, Yancheng Dong, Pengfei Tian, Peng Cu
  • for: 这篇研究旨在探讨领域扩展(Domain Generalization,DG)中的优化器选择问题。
  • methods: 本研究提出了一种新的方法——Flatness-Aware Minimization for Domain Generalization(FAD),可以有效地优化零项和首项的平坦性同时,以提高DG模型的适用范围。
  • results: 实验结果显示FAD在多种DG数据集上具有优越性,并且能够发现更平坦的极点,较其他零项和首项平坦性感知优化方法更好。
    Abstract Domain generalization (DG) seeks to learn robust models that generalize well under unknown distribution shifts. As a critical aspect of DG, optimizer selection has not been explored in depth. Currently, most DG methods follow the widely used benchmark, DomainBed, and utilize Adam as the default optimizer for all datasets. However, we reveal that Adam is not necessarily the optimal choice for the majority of current DG methods and datasets. Based on the perspective of loss landscape flatness, we propose a novel approach, Flatness-Aware Minimization for Domain Generalization (FAD), which can efficiently optimize both zeroth-order and first-order flatness simultaneously for DG. We provide theoretical analyses of the FAD's out-of-distribution (OOD) generalization error and convergence. Our experimental results demonstrate the superiority of FAD on various DG datasets. Additionally, we confirm that FAD is capable of discovering flatter optima in comparison to other zeroth-order and first-order flatness-aware optimization methods.
    摘要 领域泛化(DG)的目标是学习在未知分布偏移下仍能良好泛化的鲁棒模型。作为 DG 的一个关键方面,优化器的选择尚未得到深入研究。目前,大多数 DG 方法遵循广泛使用的基准 DomainBed,并对所有数据集默认采用 Adam 优化器。然而,我们发现 Adam 并不一定是当前大多数 DG 方法和数据集的最优选择。基于损失地形平坦性的视角,我们提出了一种新方法:面向领域泛化的平坦性感知最小化(Flatness-Aware Minimization for Domain Generalization,FAD),它能够同时高效地优化零阶和一阶平坦性。我们给出了 FAD 的分布外(OOD)泛化误差和收敛性的理论分析。实验结果表明 FAD 在多个 DG 数据集上表现优越。此外,我们验证了 FAD 能够找到比其他零阶和一阶平坦性感知优化方法更平坦的极小值。
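
For orientation, a minimal SAM-style sharpness-aware update, which perturbs the weights toward the locally worst-case direction before taking the real step; FAD additionally targets zeroth-order flatness, which this sketch does not capture:

```python
# Minimal SAM-style sharpness-aware step (related to, but not identical to, the paper's FAD).
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
rho = 0.05                                            # perturbation radius (assumed value)

for _ in range(10):
    # 1) gradient at the current weights
    loss_fn(model(x), y).backward()
    grads = [p.grad.clone() for p in model.parameters()]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    # 2) climb to the locally worst-case weights w + rho * g / ||g||
    with torch.no_grad():
        eps = [rho * g / grad_norm for g in grads]
        for p, e in zip(model.parameters(), eps):
            p.add_(e)

    # 3) gradient at the perturbed point, then undo the perturbation and step
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)
    opt.step()
    opt.zero_grad()
print("final loss:", float(loss_fn(model(x), y)))
```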

Ensemble Learning based Anomaly Detection for IoT Cybersecurity via Bayesian Hyperparameters Sensitivity Analysis

  • paper_url: http://arxiv.org/abs/2307.10596
  • repo_url: None
  • paper_authors: Tin Lai, Farnaz Farid, Abubakar Bello, Fariza Sabrina
  • for: 本文旨在通过异常检测提高 IoT 网络的安全性。
  • methods: 本文使用集成机器学习方法提高异常检测的准确性,并使用贝叶斯超参数优化来适应包含多种 IoT 传感器读数的网络环境。
  • results: 实验结果表明,该方法相比传统方法具有更高的预测能力。
    Abstract The Internet of Things (IoT) integrates more than billions of intelligent devices over the globe with the capability of communicating with other connected devices with little to no human intervention. IoT enables data aggregation and analysis on a large scale to improve life quality in many domains. In particular, data collected by IoT contain a tremendous amount of information for anomaly detection. The heterogeneous nature of IoT is both a challenge and an opportunity for cybersecurity. Traditional approaches in cybersecurity monitoring often require different kinds of data pre-processing and handling for various data types, which might be problematic for datasets that contain heterogeneous features. However, heterogeneous types of network devices can often capture a more diverse set of signals than a single type of device readings, which is particularly useful for anomaly detection. In this paper, we present a comprehensive study on using ensemble machine learning methods for enhancing IoT cybersecurity via anomaly detection. Rather than using one single machine learning model, ensemble learning combines the predictive power from multiple models, enhancing their predictive accuracy in heterogeneous datasets rather than using one single machine learning model. We propose a unified framework with ensemble learning that utilises Bayesian hyperparameter optimisation to adapt to a network environment that contains multiple IoT sensor readings. Experimentally, we illustrate their high predictive power when compared to traditional methods.
    摘要 物联网(IoT)在全球范围内连接了数十亿台智能设备,这些设备几乎无需人工干预即可与其他联网设备通信。IoT 支持大规模的数据汇聚与分析,从而在许多领域提升生活质量;尤其是 IoT 采集的数据中蕴含着大量可用于异常检测的信息。IoT 的异构特性既是网络安全的挑战,也是机遇:传统的安全监控方法往往需要针对不同数据类型进行不同的预处理和处理,这对包含异构特征的数据集可能带来困难;但异构的网络设备往往能比单一类型设备捕获更丰富多样的信号,这对异常检测尤为有用。在本文中,我们对利用集成机器学习方法通过异常检测增强 IoT 网络安全进行了全面研究。集成学习不依赖单一模型,而是结合多个模型的预测能力,从而在异构数据集上提升预测精度。我们提出了一个统一框架,利用贝叶斯超参数优化来适应包含多种 IoT 传感器读数的网络环境。实验表明,与传统方法相比,我们的方法具有更高的预测能力。
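
A small sketch of the general recipe, combining a soft-voting ensemble with Optuna's Bayesian (TPE) hyperparameter search; the dataset, estimators, and search space are illustrative assumptions:

```python
# Sketch: Bayesian hyperparameter search (Optuna / TPE) over a soft-voting ensemble (toy data).
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=0)

def objective(trial):
    rf = RandomForestClassifier(
        n_estimators=trial.suggest_int("n_estimators", 50, 300),
        max_depth=trial.suggest_int("max_depth", 3, 15),
        random_state=0,
    )
    lr = LogisticRegression(C=trial.suggest_float("C", 1e-3, 10.0, log=True), max_iter=1000)
    ens = VotingClassifier([("rf", rf), ("lr", lr)], voting="soft")
    return cross_val_score(ens, X, y, cv=3, scoring="f1").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print("best F1:", study.best_value, "params:", study.best_params)
```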

Forecasting Battery Electric Vehicle Charging Behavior: A Deep Learning Approach Equipped with Micro-Clustering and SMOTE Techniques

  • paper_url: http://arxiv.org/abs/2307.10588
  • repo_url: None
  • paper_authors: Hanif Tayarani, Trisha V. Ramadoss, Vaishnavi Karanam, Gil Tal, Christopher Nitta
  • for: 降低排放和污染物对环境的影响,将交通领域电化。
  • methods: 使用人工神经网络算法,对BEV车辆的行程和充电资料进行预测。
  • results: 比较 benchmark 方法,MCDNN 能更好地预测 BEV 充电事件。
    Abstract Energy systems, climate change, and public health are among the primary reasons for moving toward electrification in transportation. Transportation electrification is being promoted worldwide to reduce emissions. As a result, many automakers will soon start making only battery electric vehicles (BEVs). BEV adoption rates are rising in California, mainly due to climate change and air pollution concerns. While great for climate and pollution goals, improperly managed BEV charging can lead to insufficient charging infrastructure and power outages. This study develops a novel Micro Clustering Deep Neural Network (MCDNN), an artificial neural network algorithm that is highly effective at learning BEVs trip and charging data to forecast BEV charging events, information that is essential for electricity load aggregators and utility managers to provide charging stations and electricity capacity effectively. The MCDNN is configured using a robust dataset of trips and charges that occurred in California between 2015 and 2020 from 132 BEVs, spanning 5 BEV models for a total of 1570167 vehicle miles traveled. The numerical findings revealed that the proposed MCDNN is more effective than benchmark approaches in this field, such as support vector machine, k nearest neighbors, decision tree, and other neural network-based models in predicting the charging events.
    摘要 《能源系统、气候变化和公共卫生》是电动化交通的主要促进因素之一。由于气候变化和污染的关注,全球范围内的电动化交通吸引着广泛的推广。因此,许多汽车制造商即将停止生产内燃机油车,转而生产电池电动车(BEV)。在加利福尼亚州,BEV的采购率在增加,主要是由于气候变化和空气污染的问题。虽然这对气候和污染目标具有优秀的效果,但是不当管理BEV充电可能会导致充电基础设施不足和停电。这项研究开发了一种微型团集深度神经网络(MCDNN)算法,该算法可以高效地学习BEV的行驶和充电数据,以预测BEV的充电事件。MCDNN配置了加利福尼亚州2015-2020年间132台电动车的行驶记录,涵盖5种电动车型,共计1570167公里行驶。numerical发现,提出的MCDNN比各种参考方法,如支持向量机、最近邻居、决策树和其他神经网络模型在预测充电事件方面更有效。

A Holistic Assessment of the Reliability of Machine Learning Systems

  • paper_url: http://arxiv.org/abs/2307.10586
  • repo_url: None
  • paper_authors: Anthony Corso, David Karamadian, Romeo Valentin, Mary Cooper, Mykel J. Kochenderfer
  • for: 本研究旨在评估机器学习系统的可靠性,以便在高度竞争的领域中提高系统的可靠性。
  • methods: 本研究提出了一种整体评估机器学习系统可靠性的方法,包括五个关键属性的评估:内部分布准确率、环境变化快速稳定性、针对性攻击快速稳定性、校准性和外部分布检测。
  • results: 研究人员通过使用提出的方法对500多个模型进行评估,发现不同的算法方法可以同时提高多个可靠性指标,而不是只是优先一个指标。这项研究为机器学习可靠性的全面理解和未来研发提供了一份路线图。
    Abstract As machine learning (ML) systems increasingly permeate high-stakes settings such as healthcare, transportation, military, and national security, concerns regarding their reliability have emerged. Despite notable progress, the performance of these systems can significantly diminish due to adversarial attacks or environmental changes, leading to overconfident predictions, failures to detect input faults, and an inability to generalize in unexpected scenarios. This paper proposes a holistic assessment methodology for the reliability of ML systems. Our framework evaluates five key properties: in-distribution accuracy, distribution-shift robustness, adversarial robustness, calibration, and out-of-distribution detection. A reliability score is also introduced and used to assess the overall system reliability. To provide insights into the performance of different algorithmic approaches, we identify and categorize state-of-the-art techniques, then evaluate a selection on real-world tasks using our proposed reliability metrics and reliability score. Our analysis of over 500 models reveals that designing for one metric does not necessarily constrain others but certain algorithmic techniques can improve reliability across multiple metrics simultaneously. This study contributes to a more comprehensive understanding of ML reliability and provides a roadmap for future research and development.
    摘要 machine learning (ML) 系统在高度重要的设置中如医疗、交通、军事和国家安全中越来越普遍,关于它们的可靠性的问题也得到了关注。尽管在进步方面做出了很大的进展,但是这些系统的性能可能会因为抗对抗攻击或环境变化而减退,导致过于自信的预测、输入错误的检测失败和不能适应意外的情况。本文提出了一种整体评估方法 для ML 系统的可靠性。我们的框架评估了五个关键属性:在输入数据集上的准确率、对输入数据集的变化robustness、对抗攻击的Robustness、calibration和对输入数据集之外的检测。我们还引入了一个可靠度分数,用于评估整体系统的可靠性。为了提供不同算法approach的性能分析,我们分类了现有的技术,然后使用我们的提出的可靠性指标和可靠度分数评估一些实际任务中的选择。我们的分析结果表明,不同的算法approach可以同时改善多个可靠性指标。这种研究对 ML 系统的可靠性进行了更全面的理解,并提供了未来研究和开发的道路图。
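
As one concrete example of the calibration property, a common score is the expected calibration error (ECE); a minimal sketch with equal-width confidence bins and synthetic predictions:

```python
# Sketch: expected calibration error (ECE), one common way to score the calibration property.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences, correct = np.asarray(confidences), np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap                  # weight by the fraction of samples in the bin
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 1000)                      # model's reported top-class confidence
correct = rng.uniform(size=1000) < conf * 0.9           # slightly overconfident synthetic model
print("ECE:", round(expected_calibration_error(conf, correct), 4))
```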

Intelligent model for offshore China sea fog forecasting

  • paper_url: http://arxiv.org/abs/2307.10580
  • repo_url: None
  • paper_authors: Yanfei Xiang, Qinghong Zhang, Mingqing Wang, Ruixue Xia, Yang Kong, Xiaomeng Huang
  • For: 准确及时的海雾预报对于有效管理沿海和海上经济活动非常重要。
  • Methods: 本研究在数值天气预报模型中嵌入机器学习方法来进行海雾预报。在训练机器学习模型之前,我们使用时间滞后相关分析技术识别关键预测因子,并解析海雾发生的内在机制;同时采用集成学习和 focal loss 来应对数据不平衡问题。
  • Results: 我们基于机器学习的方法在一整年的测试数据上表现出色,超过了 WRF-NMM 和 NOAA FSL 算法的预报性能。具体而言,在提前 60 小时预报能见度小于等于 1 km 的海雾时,我们的方法在提高检测概率(POD)的同时降低了误报率(FAR)。
    Abstract Accurate and timely prediction of sea fog is very important for effectively managing maritime and coastal economic activities. Given the intricate nature and inherent variability of sea fog, traditional numerical and statistical forecasting methods are often proven inadequate. This study aims to develop an advanced sea fog forecasting method embedded in a numerical weather prediction model using the Yangtze River Estuary (YRE) coastal area as a case study. Prior to training our machine learning model, we employ a time-lagged correlation analysis technique to identify key predictors and decipher the underlying mechanisms driving sea fog occurrence. In addition, we implement ensemble learning and a focal loss function to address the issue of imbalanced data, thereby enhancing the predictive ability of our model. To verify the accuracy of our method, we evaluate its performance using a comprehensive dataset spanning one year, which encompasses both weather station observations and historical forecasts. Remarkably, our machine learning-based approach surpasses the predictive performance of two conventional methods, the weather research and forecasting nonhydrostatic mesoscale model (WRF-NMM) and the algorithm developed by the National Oceanic and Atmospheric Administration (NOAA) Forecast Systems Laboratory (FSL). Specifically, in regard to predicting sea fog with a visibility of less than or equal to 1 km with a lead time of 60 hours, our methodology achieves superior results by increasing the probability of detection (POD) while simultaneously reducing the false alarm ratio (FAR).
    摘要 Effective sea fog prediction is crucial for managing maritime and coastal economic activities. However, traditional numerical and statistical forecasting methods often fall short due to the complex and inherently variable nature of sea fog. This study aims to develop an advanced sea fog forecasting method using a numerical weather prediction model, with the Yangtze River Estuary (YRE) coastal area as a case study. Before training our machine learning model, we employ a time-lagged correlation analysis technique to identify key predictors and understand the underlying mechanisms driving sea fog occurrence. Additionally, we use ensemble learning and a focal loss function to address the issue of imbalanced data, which enhances the predictive ability of our model. To evaluate the accuracy of our method, we use a comprehensive dataset spanning one year, which includes both weather station observations and historical forecasts. Our machine learning-based approach outperforms two conventional methods, the Weather Research and Forecasting nonhydrostatic mesoscale model (WRF-NMM) and the algorithm developed by the National Oceanic and Atmospheric Administration (NOAA) Forecast Systems Laboratory (FSL). Specifically, our methodology achieves better results in predicting sea fog with a visibility of less than or equal to 1 km with a lead time of 60 hours, with higher probability of detection (POD) and lower false alarm ratio (FAR).
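
A toy sketch of the time-lagged correlation screening used to pick predictors, computing the Pearson correlation between a lagged candidate series and fog occurrence; the synthetic data and lag range are placeholders:

```python
# Sketch: time-lagged correlation screening of candidate predictors against fog occurrence (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
T = 2000
humidity = rng.uniform(size=T)
fog = (np.roll(humidity, 6) > 0.8).astype(float)        # fog responds to humidity with a 6-step lag
fog[:6] = 0

def lagged_corr(predictor, target, max_lag=24):
    scores = {}
    for lag in range(1, max_lag + 1):
        x, y = predictor[:-lag], target[lag:]           # predictor leads the target by `lag` steps
        scores[lag] = float(np.corrcoef(x, y)[0, 1])
    return scores

scores = lagged_corr(humidity, fog)
best = max(scores, key=lambda k: abs(scores[k]))
print("best lag:", best, "corr:", round(scores[best], 3))
```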

SecureBoost Hyperparameter Tuning via Multi-Objective Federated Learning

  • paper_url: http://arxiv.org/abs/2307.10579
  • repo_url: None
  • paper_authors: Ziyao Ren, Yan Kang, Lixin Fan, Linghua Yang, Yongxin Tong, Qiang Yang
  • For: 本研究旨在提出一种名为约束多目标 SecureBoost(CMOSB)的算法,用于在纵向联邦学习中选择最优的 SecureBoost 超参数,以在效用损失、训练成本和隐私泄露之间取得最佳权衡。
  • Methods: 本研究基于 SecureBoost 算法,并将其与多目标进化算法(MOEA)相结合以寻找 Pareto 最优解;此外还提出了一种新的实例聚类攻击来量化隐私泄露。
  • Results: 实验结果显示,CMOSB 不仅能得到优于基线的超参数,还能找到最优的超参数集合,以满足不同联邦学习参与者的灵活需求。
    Abstract SecureBoost is a tree-boosting algorithm leveraging homomorphic encryption to protect data privacy in vertical federated learning setting. It is widely used in fields such as finance and healthcare due to its interpretability, effectiveness, and privacy-preserving capability. However, SecureBoost suffers from high computational complexity and risk of label leakage. To harness the full potential of SecureBoost, hyperparameters of SecureBoost should be carefully chosen to strike an optimal balance between utility, efficiency, and privacy. Existing methods either set hyperparameters empirically or heuristically, which are far from optimal. To fill this gap, we propose a Constrained Multi-Objective SecureBoost (CMOSB) algorithm to find Pareto optimal solutions that each solution is a set of hyperparameters achieving optimal tradeoff between utility loss, training cost, and privacy leakage. We design measurements of the three objectives. In particular, the privacy leakage is measured using our proposed instance clustering attack. Experimental results demonstrate that the CMOSB yields not only hyperparameters superior to the baseline but also optimal sets of hyperparameters that can support the flexible requirements of FL participants.
    摘要 SecureBoost 是一种树提升算法,利用同态加密在纵向联邦学习场景中保护数据隐私。由于其可解释性、有效性和隐私保护能力,它在金融和医疗等领域得到广泛应用。然而,SecureBoost 存在计算复杂度高和标签泄露风险的问题。为了充分发挥 SecureBoost 的潜力,应仔细选择其超参数,以在效用、效率和隐私之间取得最优平衡。现有方法通常凭经验或启发式地设置超参数,远非最优。为填补这一空白,我们提出了约束多目标 SecureBoost(CMOSB)算法,用于寻找 Pareto 最优解,每个解都是一组在效用损失、训练成本和隐私泄露之间取得最优权衡的超参数。我们设计了这三个目标的度量方法,其中隐私泄露使用我们提出的实例聚类攻击来度量。实验结果表明,CMOSB 不仅能得到优于基线的超参数,还能找到最优的超参数集合,以支持联邦学习参与者的灵活需求。
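
CMOSB searches for Pareto-optimal hyperparameter sets over three objectives (utility loss, training cost, privacy leakage). As a grounded illustration of what "Pareto optimal" means in this setting, the sketch below keeps only the non-dominated candidates from a list of hypothetical evaluations; the numbers are invented and the brute-force filter is generic, not the constrained MOEA used in the paper:

```python
import numpy as np

def pareto_front(objectives):
    """Indices of non-dominated rows; all objectives are minimized.
    Columns could be [utility loss, training cost, privacy leakage]."""
    keep = []
    for i, row in enumerate(objectives):
        dominated = any(
            np.all(other <= row) and np.any(other < row)
            for j, other in enumerate(objectives) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical evaluations of four hyperparameter sets
scores = np.array([[0.10, 30.0, 0.2],
                   [0.08, 45.0, 0.3],
                   [0.12, 25.0, 0.1],
                   [0.15, 50.0, 0.4]])   # dominated by the first row
print(pareto_front(scores))              # -> [0, 1, 2]
```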

Boosting Federated Learning Convergence with Prototype Regularization

  • paper_url: http://arxiv.org/abs/2307.10575
  • repo_url: None
  • paper_authors: Yu Qiao, Huy Q. Le, Choong Seon Hong
  • for: 这篇论文旨在提高 Federated Learning (FL) 中的模型性能,解决 Client 间资料不均匀问题。
  • methods: 本文提出了一种基于 Prototype 的调整策略,通过服务器将分布式 Client 的本地 Prototype 聚合成全局 Prototype,将其传回个别 Client 进行本地训练。
  • results: 实验结果显示,该方法在 MNIST 和 Fashion-MNIST 上分别取得了 3.3% 和 8.9% 的平均测试精度提升,优于最常用的基线 FedAvg。此外,该方法在数据异质环境下具有较快的收敛速度。
    Abstract As a distributed machine learning technique, federated learning (FL) requires clients to collaboratively train a shared model with an edge server without leaking their local data. However, the heterogeneous data distribution among clients often leads to a decrease in model performance. To tackle this issue, this paper introduces a prototype-based regularization strategy to address the heterogeneity in the data distribution. Specifically, the regularization process involves the server aggregating local prototypes from distributed clients to generate a global prototype, which is then sent back to the individual clients to guide their local training. The experimental results on MNIST and Fashion-MNIST show that our proposal achieves improvements of 3.3% and 8.9% in average test accuracy, respectively, compared to the most popular baseline FedAvg. Furthermore, our approach has a fast convergence rate in heterogeneous settings.
    摘要 为了解决客户端数据不均匀性的问题,本文提出了一种基于原型的规范约束策略,用于在分布式机器学习中协同训练共享模型。具体来说,规范过程包括将分布在各客户端上的本地原型由服务器进行汇总,生成一个全局原型,然后将该全局原型发送回到各个客户端,以供本地训练指导。实验结果表明,与最常用的基准方法FedAvg相比,我们的方案在MNIST和Fashion-MNIST两个预测集上平均测试精度提高3.3%和8.9%。此外,我们的方法在不均匀设置下具有快速收敛的特点。
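
Below is a minimal sketch of the prototype mechanism described above: each client computes per-class mean features, the server averages them into global prototypes, and local training adds a proximity term pulling features toward the global prototype of their class. The squared-distance form of the regularizer and its weight `lam` are assumptions for illustration; the paper's exact objective may differ.

```python
import numpy as np

def local_prototypes(features, labels, num_classes):
    """Per-class mean feature vectors computed on one client."""
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

def aggregate_prototypes(client_protos):
    """Server step: average the clients' local prototypes."""
    return np.mean(np.stack(client_protos), axis=0)

def prototype_reg_loss(features, labels, global_protos, lam=1.0):
    """Regularizer added to the usual local loss: squared distance between
    each sample's feature and the global prototype of its class."""
    diffs = features - global_protos[labels]
    return lam * np.mean(np.sum(diffs ** 2, axis=1))

# Toy round with 2 clients, 3 classes, 5-dim features
rng = np.random.default_rng(0)
client_protos = [local_prototypes(rng.normal(size=(20, 5)),
                                  rng.integers(0, 3, 20), 3) for _ in range(2)]
g = aggregate_prototypes(client_protos)
print(prototype_reg_loss(rng.normal(size=(8, 5)), rng.integers(0, 3, 8), g))
```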

Deceptive Alignment Monitoring

  • paper_url: http://arxiv.org/abs/2307.10569
  • repo_url: None
  • paper_authors: Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo
  • for: 本研究旨在防止大机器学习模型的欺骗性行为,以及检测这些模型是否在不明确的目的下进行 modify 其行为。
  • methods: 本文提出了多个不同的机器学习子领域的研究方向,以检测和防止模型的欺骗性行为。
  • results: 本文认为,这些研究方向将在未来对检测和防止模型的欺骗性行为起到关键作用,并且将为对抗机器学习社区带来新的研究机遇。
    Abstract As the capabilities of large machine learning models continue to grow, and as the autonomy afforded to such models continues to expand, the spectre of a new adversary looms: the models themselves. The threat that a model might behave in a seemingly reasonable manner, while secretly and subtly modifying its behavior for ulterior reasons is often referred to as deceptive alignment in the AI Safety & Alignment communities. Consequently, we call this new direction Deceptive Alignment Monitoring. In this work, we identify emerging directions in diverse machine learning subfields that we believe will become increasingly important and intertwined in the near future for deceptive alignment monitoring, and we argue that advances in these fields present both long-term challenges and new research opportunities. We conclude by advocating for greater involvement by the adversarial machine learning community in these emerging directions.
    摘要 随着大型机器学习模型能力的不断增长,以及赋予这些模型的自主性不断扩大,一个新的对手正在逼近:模型本身。模型表面上行为合理、却出于其他目的暗中微妙地改变自身行为的威胁,在 AI 安全与对齐社区中被称为"欺骗性对齐"(deceptive alignment)。因此,我们将这一新方向称为"欺骗性对齐监测"(Deceptive Alignment Monitoring)。在这项工作中,我们指出了多个机器学习子领域中正在兴起、并将在不久的将来对欺骗性对齐监测变得日益重要且相互交织的研究方向,并论证这些领域的进展既带来长期挑战,也带来新的研究机遇。最后,我们呼吁对抗机器学习社区更多地参与这些新兴方向。

FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation

  • paper_url: http://arxiv.org/abs/2307.10563
  • repo_url: None
  • paper_authors: Dhruv Pai, Andres Carranza, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo
  • for: 提升模型的鲁棒性与可扩展的模型监督能力,并用于发现和对抗对抗攻击
  • methods: 采用概率与几何相结合的方法,分析激活空间中伪类(高维模态)流形性质的变化,以定位对抗攻击的来源
  • results: 提供了一种用于无监督机制异常检测的框架,有助于提升模型的安全性与可靠性,并在实际部署场景中展现出有前景的应用
    Abstract We present FACADE, a novel probabilistic and geometric framework designed for unsupervised mechanistic anomaly detection in deep neural networks. Its primary goal is advancing the understanding and mitigation of adversarial attacks. FACADE aims to generate probabilistic distributions over circuits, which provide critical insights to their contribution to changes in the manifold properties of pseudo-classes, or high-dimensional modes in activation space, yielding a powerful tool for uncovering and combating adversarial attacks. Our approach seeks to improve model robustness, enhance scalable model oversight, and demonstrates promising applications in real-world deployment settings.
    摘要 我们提出 FACADE,一个新颖的概率与几何框架,用于深度神经网络中的无监督机制异常检测,其主要目标是推进对对抗攻击的理解与缓解。FACADE 旨在生成回路(circuit)上的概率分布,揭示它们对激活空间中伪类(即高维模态)流形性质变化的贡献,从而为发现和对抗对抗攻击提供有力工具。我们的方法致力于提升模型鲁棒性、增强可扩展的模型监督,并在实际部署场景中展示出有前景的应用。

Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples

  • paper_url: http://arxiv.org/abs/2307.10562
  • repo_url: None
  • paper_authors: Shaokui Wei, Mingda Zhang, Hongyuan Zha, Baoyuan Wu
  • for: 本研究探讨了如何使用少量干净数据来净化被植入后门的机器学习模型。
  • methods: 本研究建立了后门风险与对抗风险之间的联系,推导出后门风险的一个新上界,并据此提出了一种新的双层优化问题,通过对抗训练技术来缓解后门攻击。
  • results: 实验表明,我们提出的方法在多个基准数据集和网络架构上均达到了最先进的后门防御性能。
    Abstract Backdoor attacks are serious security threats to machine learning models where an adversary can inject poisoned samples into the training set, causing a backdoored model which predicts poisoned samples with particular triggers to particular target classes, while behaving normally on benign samples. In this paper, we explore the task of purifying a backdoored model using a small clean dataset. By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk, which mainly captures the risk on the shared adversarial examples (SAEs) between the backdoored model and the purified model. This upper bound further suggests a novel bi-level optimization problem for mitigating backdoor using adversarial training techniques. To solve it, we propose Shared Adversarial Unlearning (SAU). Specifically, SAU first generates SAEs, and then, unlearns the generated SAEs such that they are either correctly classified by the purified model and/or differently classified by the two models, such that the backdoor effect in the backdoored model will be mitigated in the purified model. Experiments on various benchmark datasets and network architectures show that our proposed method achieves state-of-the-art performance for backdoor defense.
    摘要 后门攻击是机器学习模型面临的严重安全威胁:攻击者向训练集中注入带毒样本,使得被植入后门的模型在遇到特定触发器时将样本预测为特定目标类别,而在正常样本上表现如常。本文探讨了使用少量干净数据来净化被植入后门模型的任务。我们建立了后门风险与对抗风险之间的联系,从而推导出一个新的后门风险上界,它主要刻画了后门模型与净化模型之间共享对抗样本(SAE)所带来的风险。基于这个上界,我们提出了一种新的双层优化问题,利用对抗训练技术来缓解后门攻击,并称之为共享对抗遗忘(SAU)。SAU 首先生成 SAE,然后对这些 SAE 进行"遗忘",使其要么被净化模型正确分类,要么被两个模型给出不同的分类,从而削弱后门模型中的后门效应。实验结果表明,我们的方法在多个基准数据集和网络架构上达到了最先进的后门防御性能。
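
SAU itself solves a bi-level problem built around adversarial examples shared by the backdoored and purified models; the sketch below is a heavily simplified stand-in that only illustrates the two ingredients named in the abstract: generating an adversarial example on the backdoored model (here with one-step FGSM, an assumption) and updating the purified model to classify it correctly. Model definitions, `eps`, and the optimizer are placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=8 / 255):
    """One-step FGSM perturbation of x against `model` (illustrative stand-in
    for the shared adversarial examples used in the paper)."""
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def unlearning_step(purified, backdoored, x, y, opt, eps=8 / 255):
    """Train the purified model to give the correct label on adversarial
    examples of the backdoored model, weakening the shared backdoor effect."""
    x_adv = fgsm_example(backdoored, x, y, eps)
    opt.zero_grad()
    loss = F.cross_entropy(purified(x_adv), y)
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with tiny stand-in models
backdoored = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))
purified = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))
opt = torch.optim.SGD(purified.parameters(), lr=0.01)
x, y = torch.rand(4, 3, 8, 8), torch.randint(0, 10, (4,))
print(unlearning_step(purified, backdoored, x, y, opt))
```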

Post-variational quantum neural networks

  • paper_url: http://arxiv.org/abs/2307.10560
  • repo_url: None
  • paper_authors: Po-Wei Huang, Patrick Rebentrost
  • for: 本研究旨在应对当前量子硬件尚不足以执行容错量子算法的问题,改进混合量子-经典计算与变分算法,并提高量子模型优化的效率。
  • methods: 本研究提出"后变分策略",即把可调参数从量子计算机转移到经典计算机,并在优化量子模型时采用集成(ensemble)策略;同时讨论了构建单个量子线路的多种策略与设计原则。
  • results: 本研究表明,后变分策略可以提升量子模型的优化效率,并可应用于手写数字识别等实际任务,取得 96% 的分类准确率。
    Abstract Quantum computing has the potential to provide substantial computational advantages over current state-of-the-art classical supercomputers. However, current hardware is not advanced enough to execute fault-tolerant quantum algorithms. An alternative of using hybrid quantum-classical computing with variational algorithms can exhibit barren plateau issues, causing slow convergence of gradient-based optimization techniques. In this paper, we discuss "post-variational strategies", which shift tunable parameters from the quantum computer to the classical computer, opting for ensemble strategies when optimizing quantum models. We discuss various strategies and design principles for constructing individual quantum circuits, where the resulting ensembles can be optimized with convex programming. Further, we discuss architectural designs of post-variational quantum neural networks and analyze the propagation of estimation errors throughout such neural networks. Lastly, we show that our algorithm can be applied to real-world applications such as image classification on handwritten digits, producing a 96% classification accuracy.
    摘要 量子计算有潜力相比当前最先进的经典超级计算机提供可观的计算优势。然而,当前的硬件还不足以执行容错量子算法。另一种选择是使用变分算法的混合量子-经典计算,但它可能出现"贫瘠高原"(barren plateau)问题,导致基于梯度的优化技术收敛缓慢。在本文中,我们讨论"后变分策略",即把可调参数从量子计算机转移到经典计算机,并在优化量子模型时采用集成策略。我们讨论了构建单个量子线路的多种策略与设计原则,所得到的集成可以通过凸优化进行求解。此外,我们还讨论了后变分量子神经网络的架构设计,并分析了估计误差在此类神经网络中的传播。最后,我们展示了该算法可以应用于手写数字图像分类等实际任务,取得 96% 的分类准确率。

Air Traffic Controller Workload Level Prediction using Conformalized Dynamical Graph Learning

  • paper_url: http://arxiv.org/abs/2307.10559
  • repo_url: https://github.com/ymlasu/para-atm-collection
  • paper_authors: Yutian Pang, Jueming Hu, Christopher S. Lieber, Nancy J. Cooke, Yongming Liu
  • for: 准确预测空管员(ATCo)的工作负荷,以提高航空运行的安全性和空域使用效率。
  • methods: 使用退役空管员参与的人在回路(HITL)仿真获取空中交通数据与工作负荷标注;并提出一种结合保形预测(conformal prediction)的图深度学习框架来预测空管员工作负荷等级。
  • results: 实验结果表明,除交通密度特征外,交通冲突特征(即最小水平/垂直间隔距离)也对工作负荷预测有贡献;使用图神经网络直接从空域的时空图结构中学习,比手工设计的交通复杂度特征获得更高的预测精度;保形预测是进一步提升预测精度的有价值工具,可给出一个预测工作负荷等级的区间。
    Abstract Air traffic control (ATC) is a safety-critical service system that demands constant attention from ground air traffic controllers (ATCos) to maintain daily aviation operations. The workload of the ATCos can have negative effects on operational safety and airspace usage. To avoid overloading and ensure an acceptable workload level for the ATCos, it is important to predict the ATCos' workload accurately for mitigation actions. In this paper, we first perform a review of research on ATCo workload, mostly from the air traffic perspective. Then, we briefly introduce the setup of the human-in-the-loop (HITL) simulations with retired ATCos, where the air traffic data and workload labels are obtained. The simulations are conducted under three Phoenix approach scenarios while the human ATCos are requested to self-evaluate their workload ratings (i.e., low-1 to high-7). Preliminary data analysis is conducted. Next, we propose a graph-based deep-learning framework with conformal prediction to identify the ATCo workload levels. The number of aircraft under the controller's control varies both spatially and temporally, resulting in dynamically evolving graphs. The experiment results suggest that (a) besides the traffic density feature, the traffic conflict feature contributes to the workload prediction capabilities (i.e., minimum horizontal/vertical separation distance); (b) directly learning from the spatiotemporal graph layout of airspace with graph neural network can achieve higher prediction accuracy, compare to hand-crafted traffic complexity features; (c) conformal prediction is a valuable tool to further boost model prediction accuracy, resulting a range of predicted workload labels. The code used is available at \href{https://github.com/ymlasu/para-atm-collection/blob/master/air-traffic-prediction/ATC-Workload-Prediction/}{$\mathsf{Link}$}.
    摘要 空中交通管制(ATC)是一个安全关键的服务系统,需要地面空管员(ATCo)持续保持注意力以维持日常航空运行。空管员的工作负荷可能对运行安全和空域使用产生负面影响。为避免过载并确保空管员的工作负荷处于可接受水平,需要准确预测其工作负荷,以便采取缓解措施。本文首先回顾了主要从空中交通角度开展的空管员工作负荷研究。随后,我们简要介绍了由退役空管员参与的人在回路(HITL)仿真设置,从中获取空中交通数据和工作负荷标注。仿真在三个凤凰城进近场景下进行,空管员被要求自评其工作负荷等级(从低 1 到高 7),并进行了初步数据分析。接着,我们提出了一种结合保形预测的图深度学习框架来识别空管员工作负荷等级。空管员所管制的航空器数量在空间和时间上都在变化,形成动态演化的图。实验结果表明:(a)除交通密度特征外,交通冲突特征(即最小水平/垂直间隔距离)也有助于工作负荷预测;(b)使用图神经网络直接从空域的时空图结构中学习,比手工设计的交通复杂度特征获得更高的预测精度;(c)保形预测是进一步提升模型预测精度的有价值工具,可给出一个预测工作负荷等级的区间。所用代码见 https://github.com/ymlasu/para-atm-collection。
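
The graph-learning model itself is specific to the paper, but the conformal wrapper mentioned in point (c) can be illustrated generically. Below is a minimal split-conformal sketch for a 7-level workload classifier; the nonconformity score (one minus the softmax probability of the true label) and the calibration split are assumptions, not necessarily the authors' choices.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction: return, for each test sample, the set of
    labels whose score clears a threshold calibrated for ~(1-alpha) coverage."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]   # nonconformity scores
    q = np.ceil((n + 1) * (1 - alpha)) / n
    qhat = np.quantile(scores, min(q, 1.0), method="higher")
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]

# Toy example: 7 workload levels, synthetic classifier probabilities
rng = np.random.default_rng(1)
cal_probs = rng.dirichlet(np.ones(7), size=200)
cal_labels = rng.integers(0, 7, 200)
test_probs = rng.dirichlet(np.ones(7), size=3)
print(conformal_sets(cal_probs, cal_labels, test_probs))
```

The returned label sets are what turns a single predicted workload level into "a range of predicted workload labels" as described above.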

SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer

  • paper_url: http://arxiv.org/abs/2307.10550
  • repo_url: https://github.com/0913ktg/sc_vall-e
  • paper_authors: Daegyeom Kim, Seongho Hong, Yong-Hoon Choi
  • For: The paper proposes a style control (SC) VALL-E model for expressive speech synthesis, which can generate diverse voices with controllable attributes such as emotion, speaking rate, pitch, and voice intensity.
  • Methods: The SC VALL-E model is based on the neural codec language model VALL-E, which follows the structure of the generative pretrained transformer 3 (GPT-3), and adds a newly designed style network to control the attributes of the generated speech. The model takes text sentences and prompt audio as input and is trained to generate controllable speech rather than simply mimicking the characteristics of the prompt audio.
  • Results: The paper conducts comparative experiments with three representative expressive speech synthesis models, measuring word error rate (WER), F0 voiced error (FVE), and F0 gross pitch error (F0GPE) for accuracy, along with CMOS and SMOS for speech quality. The results show that SC VALL-E achieves competitive performance compared to existing models and can generate a variety of expressive sounds with controllable attributes.
    Abstract Expressive speech synthesis models are trained by adding corpora with diverse speakers, various emotions, and different speaking styles to the dataset, in order to control various characteristics of speech and generate the desired voice. In this paper, we propose a style control (SC) VALL-E model based on the neural codec language model (called VALL-E), which follows the structure of the generative pretrained transformer 3 (GPT-3). The proposed SC VALL-E takes input from text sentences and prompt audio and is designed to generate controllable speech by not simply mimicking the characteristics of the prompt audio but by controlling the attributes to produce diverse voices. We identify tokens in the style embedding matrix of the newly designed style network that represent attributes such as emotion, speaking rate, pitch, and voice intensity, and design a model that can control these attributes. To evaluate the performance of SC VALL-E, we conduct comparative experiments with three representative expressive speech synthesis models: global style token (GST) Tacotron2, variational autoencoder (VAE) Tacotron2, and original VALL-E. We measure word error rate (WER), F0 voiced error (FVE), and F0 gross pitch error (F0GPE) as evaluation metrics to assess the accuracy of generated sentences. For comparing the quality of synthesized speech, we measure comparative mean option score (CMOS) and similarity mean option score (SMOS). To evaluate the style control ability of the generated speech, we observe the changes in F0 and mel-spectrogram by modifying the trained tokens. When using prompt audio that is not present in the training data, SC VALL-E generates a variety of expressive sounds and demonstrates competitive performance compared to the existing models. Our implementation, pretrained models, and audio samples are located on GitHub.
    摘要 表达性语音合成模型通常通过在数据集中加入包含多个说话人、多种情感和不同说话风格的语料进行训练,以控制语音的各种特性并生成期望的声音。在这篇文章中,我们提出了一种基于神经编解码语言模型 VALL-E 的风格控制(SC)VALL-E 模型,其结构遵循 GPT-3。所提出的 SC VALL-E 以文本句子和提示音频为输入,不是简单地模仿提示音频的特性,而是通过控制属性来生成多样化的可控语音。我们在新设计的风格网络的风格嵌入矩阵中识别出表示情感、语速、音高和音强等属性的 token,并设计了能够控制这些属性的模型。为评估 SC VALL-E 的表现,我们与三种有代表性的表达性语音合成模型进行了对比实验:global style token(GST)Tacotron2、variational autoencoder(VAE)Tacotron2 和原始 VALL-E。我们使用 word error rate(WER)、F0 voiced error(FVE)和 F0 gross pitch error(F0GPE)作为评估生成句子准确性的指标;为比较合成语音质量,我们测量了 CMOS 和 SMOS;为评估生成语音的风格控制能力,我们通过修改已训练的 token 观察 F0 和梅尔频谱的变化。当使用训练数据中不存在的提示音频时,SC VALL-E 能够生成多种富有表现力的声音,并与现有模型相比具有竞争力。我们的实现、预训练模型和音频样本位于 GitHub。

Differentially Flat Learning-based Model Predictive Control Using a Stability, State, and Input Constraining Safety Filter

  • paper_url: http://arxiv.org/abs/2307.10541
  • repo_url: https://github.com/utiasdsl/fmpc_socp
  • paper_authors: Adam W. Hall, Melissa Greeff, Angela P. Schoellig
  • for: learning-based optimal control algorithms for unknown systems
  • methods: exploits differential flatness, with the nonlinear transformation learned as a Gaussian process, a safety filter, and two successive convex optimizations
  • results: similar performance to state-of-the-art learning-based controllers, significantly better computational efficiency, respects flat state and input constraints, and guarantees stability
    Abstract Learning-based optimal control algorithms control unknown systems using past trajectory data and a learned model of the system dynamics. These controllers use either a linear approximation of the learned dynamics, trading performance for faster computation, or nonlinear optimization methods, which typically perform better but can limit real-time applicability. In this work, we present a novel nonlinear controller that exploits differential flatness to achieve similar performance to state-of-the-art learning-based controllers but with significantly less computational effort. Differential flatness is a property of dynamical systems whereby nonlinear systems can be exactly linearized through a nonlinear input mapping. Here, the nonlinear transformation is learned as a Gaussian process and is used in a safety filter that guarantees, with high probability, stability as well as input and flat state constraint satisfaction. This safety filter is then used to refine inputs from a flat model predictive controller to perform constrained nonlinear learning-based optimal control through two successive convex optimizations. We compare our method to state-of-the-art learning-based control strategies and achieve similar performance, but with significantly better computational efficiency, while also respecting flat state and input constraints, and guaranteeing stability.
    摘要 基于学习的最优控制算法利用历史轨迹数据和学习得到的系统动力学模型来控制未知系统。这些控制器要么使用所学动力学的线性近似,以性能换取更快的计算速度,要么使用通常表现更好但可能限制实时适用性的非线性优化方法。在本工作中,我们提出一种新的非线性控制器,利用微分平坦性达到与最先进的基于学习的控制器相近的性能,但计算量显著更低。微分平坦性是动力系统的一种性质,借助非线性输入映射可将非线性系统精确线性化。在这里,该非线性变换通过高斯过程学习得到,并用于一个安全滤波器,以高概率保证稳定性以及输入和平坦状态约束的满足。随后,该安全滤波器用于修正平坦模型预测控制器给出的输入,通过两次连续的凸优化实现带约束的非线性基于学习的最优控制。我们与最先进的基于学习的控制策略进行比较,取得了相近的性能,但计算效率显著更高,同时满足平坦状态和输入约束,并保证稳定性。

The Extractive-Abstractive Axis: Measuring Content “Borrowing” in Generative Language Models

  • paper_url: http://arxiv.org/abs/2307.11779
  • repo_url: None
  • paper_authors: Nedelina Teneva
  • for: 本研究旨在探讨生成模型的抽象性和内容授权问题,并提出了EXTRACTIVE-ABSTRACTIVE轴来评估生成模型。
  • methods: 本研究使用了生成模型对文本数据进行生成和抽象,并对生成结果进行评估。
  • results: 研究发现,生成模型的抽象性和内容授权问题需要更加重视,并提出了对生成模型的评估指标、数据集和注解指南。
    Abstract Generative language models produce highly abstractive outputs by design, in contrast to extractive responses in search engines. Given this characteristic of LLMs and the resulting implications for content Licensing & Attribution, we propose the the so-called Extractive-Abstractive axis for benchmarking generative models and highlight the need for developing corresponding metrics, datasets and annotation guidelines. We limit our discussion to the text modality.
    摘要 生成式语言模型在设计上会产生高度抽象化的输出,这与搜索引擎中抽取式的结果形成对比。鉴于大型语言模型的这一特性及其对内容授权与归属的影响,我们提出所谓的"抽取-抽象"(Extractive-Abstractive)轴来评测生成模型,并强调需要开发相应的指标、数据集和标注指南。我们的讨论仅限于文本模态。
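
The paper argues for placing model outputs on an extractive-abstractive axis, but the note above does not fix a concrete metric. As one illustrative (assumed) operationalisation, the fraction of generated n-grams copied verbatim from the source is easy to compute and behaves as expected at the two extremes:

```python
def ngram_copy_rate(source, generated, n=3):
    """Share of the generated text's n-grams that occur verbatim in the
    source: near 1 for extractive outputs, near 0 for abstractive ones."""
    def ngrams(text):
        toks = text.lower().split()
        return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    src = set(ngrams(source))
    gen = ngrams(generated)
    return sum(g in src for g in gen) / max(len(gen), 1)

src = "the cat sat on the mat while the dog slept by the door"
print(ngram_copy_rate(src, "the cat sat on the mat"))         # extractive -> 1.0
print(ngram_copy_rate(src, "a feline rested near a canine"))  # abstractive -> 0.0
```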

Fast Unsupervised Deep Outlier Model Selection with Hypernetworks

  • paper_url: http://arxiv.org/abs/2307.10529
  • repo_url: None
  • paper_authors: Xueying Ding, Yue Zhao, Leman Akoglu
  • for: 这篇论文主要针对无监督异常检测(Outlier Detection,OD)中一个关键但研究不足的问题,即超参数(HP)的有效调优与模型选择。
  • methods: 本文提出了名为 HYPER 的方法,设计并训练一个新颖的超网络(hypernetwork,HN),将超参数映射到 OD 模型的最优权重;此外,HYPER 还利用带标签的历史 OD 任务进行元学习,训练一个代理验证函数,以在无监督条件下高效地验证 OD 模型。
  • results: 在 35 个 OD 任务上的大量实验表明,HYPER 相比 8 个基线方法取得了高性能,并具有显著的效率优势。
    Abstract Outlier detection (OD) finds many applications with a rich literature of numerous techniques. Deep neural network based OD (DOD) has seen a recent surge of attention thanks to the many advances in deep learning. In this paper, we consider a critical-yet-understudied challenge with unsupervised DOD, that is, effective hyperparameter (HP) tuning/model selection. While several prior work report the sensitivity of OD models to HPs, it becomes ever so critical for the modern DOD models that exhibit a long list of HPs. We introduce HYPER for tuning DOD models, tackling two fundamental challenges: (1) validation without supervision (due to lack of labeled anomalies), and (2) efficient search of the HP/model space (due to exponential growth in the number of HPs). A key idea is to design and train a novel hypernetwork (HN) that maps HPs onto optimal weights of the main DOD model. In turn, HYPER capitalizes on a single HN that can dynamically generate weights for many DOD models (corresponding to varying HPs), which offers significant speed-up. In addition, it employs meta-learning on historical OD tasks with labels to train a proxy validation function, likewise trained with our proposed HN efficiently. Extensive experiments on 35 OD tasks show that HYPER achieves high performance against 8 baselines with significant efficiency gains.
    摘要 异常检测(OD)应用广泛,相关技术层出不穷。得益于深度学习的诸多进展,基于深度神经网络的异常检测(DOD)近来受到广泛关注。本文研究无监督 DOD 中一个关键但研究不足的挑战,即有效的超参数(HP)调优/模型选择。尽管已有工作指出 OD 模型对超参数十分敏感,这一问题对于拥有大量超参数的现代 DOD 模型而言愈发关键。我们提出用于调优 DOD 模型的 HYPER,应对两个基本挑战:(1)缺乏带标签异常导致的无监督验证;(2)超参数/模型空间随超参数数量呈指数增长带来的高效搜索问题。其核心思想是设计并训练一个新颖的超网络(HN),将超参数映射到主 DOD 模型的最优权重上。由此,HYPER 利用单个 HN 即可为众多(对应不同超参数的)DOD 模型动态生成权重,带来显著加速。此外,它还在带标签的历史 OD 任务上进行元学习,训练一个代理验证函数,该函数同样借助我们提出的 HN 高效训练。在 35 个 OD 任务上的大量实验表明,HYPER 相比 8 个基线方法取得了高性能,并具有显著的效率优势。
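
The central object in HYPER is a hypernetwork that maps a hyperparameter vector to the weights of the main detection model. The sketch below only shows that mapping for a one-layer "detector"; HYPER's actual architecture, training objective, and meta-learned validation function are not reproduced here.

```python
import torch
import torch.nn as nn

class HyperNet(nn.Module):
    """Maps a hyperparameter vector to the flat weight vector of a small
    target model, so many HP settings can share one trained hypernetwork."""
    def __init__(self, hp_dim, target_num_weights, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hp_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, target_num_weights))

    def forward(self, hp):
        return self.net(hp)

def detector_forward(x, flat_w, in_dim, out_dim):
    """Run a one-layer detector whose weights come from the hypernetwork."""
    W = flat_w[: in_dim * out_dim].view(out_dim, in_dim)
    b = flat_w[in_dim * out_dim:]
    return x @ W.t() + b

in_dim, out_dim, hp_dim = 16, 1, 3
hn = HyperNet(hp_dim, in_dim * out_dim + out_dim)
hp = torch.tensor([[0.1, 4.0, 0.5]])   # e.g. (lr, depth, dropout); illustrative only
x = torch.randn(8, in_dim)
scores = detector_forward(x, hn(hp).squeeze(0), in_dim, out_dim)
print(scores.shape)                     # torch.Size([8, 1])
```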

Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions

  • paper_url: http://arxiv.org/abs/2307.10524
  • repo_url: None
  • paper_authors: Tongxin Li, Yiheng Lin, Shaolei Ren, Adam Wierman
  • for: 这篇论文旨在研究在不可信的机器学习建议下,单轨迹时变马尔可夫决策过程(MDP)中一致性与鲁棒性之间的权衡。
  • methods: 该论文在包含连续和离散状态/动作空间的一般 MDP 模型下,利用 Q 值建议来研究一致性与鲁棒性的权衡,并利用关于建议生成方式的额外信息。
  • results: 研究结果表明,利用 Q 值建议可以动态地在机器学习建议和鲁棒基线之间取长补短,从而获得近似最优的性能保证,并可证明优于仅依赖黑盒建议所能达到的效果。
    Abstract We study the tradeoff between consistency and robustness in the context of a single-trajectory time-varying Markov Decision Process (MDP) with untrusted machine-learned advice. Our work departs from the typical approach of treating advice as coming from black-box sources by instead considering a setting where additional information about how the advice is generated is available. We prove a first-of-its-kind consistency and robustness tradeoff given Q-value advice under a general MDP model that includes both continuous and discrete state/action spaces. Our results highlight that utilizing Q-value advice enables dynamic pursuit of the better of machine-learned advice and a robust baseline, thus result in near-optimal performance guarantees, which provably improves what can be obtained solely with black-box advice.
    摘要 我们研究单轨迹时变马尔可夫决策过程(MDP)中,在不可信的机器学习建议下一致性与鲁棒性之间的权衡。与通常把建议当作黑盒来源的做法不同,我们考虑一种可以获得关于建议生成方式额外信息的设定。我们在包含连续和离散状态/动作空间的一般 MDP 模型下,证明了基于 Q 值建议的首个一致性-鲁棒性权衡结果。我们的结果表明,利用 Q 值建议可以动态地在机器学习建议和鲁棒基线之间取长补短,从而获得近似最优的性能保证,并可证明优于仅依赖黑盒建议所能达到的效果。

Prediction of Handball Matches with Statistically Enhanced Learning via Estimated Team Strengths

  • paper_url: http://arxiv.org/abs/2307.11777
  • repo_url: None
  • paper_authors: Florian Felice, Christophe Ley
  • for: 预测手球赛事
  • methods: 使用Statistically Enhanced Learning(SEL)模型,并与现有模型进行比较,以评估其性能能力
  • results: 模型的准确率高于80%,并且通过可解释方法提供了有价值的统计和预测性能分析,有助于手球队教练提前准备比赛。
    Abstract We propose a Statistically Enhanced Learning (aka. SEL) model to predict handball games. Our Machine Learning model augmented with SEL features outperforms state-of-the-art models with an accuracy beyond 80%. In this work, we show how we construct the data set to train Machine Learning models on past female club matches. We then compare different models and evaluate them to assess their performance capabilities. Finally, explainability methods allow us to change the scope of our tool from a purely predictive solution to a highly insightful analytical tool. This can become a valuable asset for handball teams' coaches providing valuable statistical and predictive insights to prepare future competitions.
    摘要 我们提出了一个统计增强学习(简称 SEL)模型,用于预测手球比赛。我们的机器学习模型,通过添加 SEL 特征,超过了现状最佳模型的准确率80%。在这项工作中,我们介绍了如何使用过去女子俱乐部比赛数据来训练机器学习模型。然后,我们比较了不同的模型,并评估它们的性能能力。最后,可视化方法使我们的工具从一种仅仅是预测解决方案转化为一种具有高度探索性的分析工具,这将成为手球队教练的宝贵统计和预测信息,以准备未来的比赛。Note: The translation is in Simplified Chinese, which is the standard writing system used in mainland China. If you need Traditional Chinese, please let me know.

FedSoup: Improving Generalization and Personalization in Federated Learning via Selective Model Interpolation

  • paper_url: http://arxiv.org/abs/2307.10507
  • repo_url: None
  • paper_authors: Minghui Chen, Meirui Jiang, Qi Dou, Zehua Wang, Xiaoxiao Li
  • for: 本研究旨在改善联邦学习(Federated Learning,FL)中模型的本地性能与全局泛化性能之间的权衡,解决现有 FL 算法在面临分布偏移时的不足。
  • methods: 本文提出了一种联邦模型汤(federated model soup)方法,即对本地模型与全局模型的参数进行选择性插值:在联邦训练阶段,每个客户端通过监测插值模型的性能来维护自己的全局模型池,以缓解过拟合并寻找平坦极小值。
  • results: 实验结果显示,该方法在视网膜和病理图像分类任务上显著提升了分布外泛化性能。代码可在 https://github.com/ubc-tea/FedSoup 获取。
    Abstract Cross-silo federated learning (FL) enables the development of machine learning models on datasets distributed across data centers such as hospitals and clinical research laboratories. However, recent research has found that current FL algorithms face a trade-off between local and global performance when confronted with distribution shifts. Specifically, personalized FL methods have a tendency to overfit to local data, leading to a sharp valley in the local model and inhibiting its ability to generalize to out-of-distribution data. In this paper, we propose a novel federated model soup method (i.e., selective interpolation of model parameters) to optimize the trade-off between local and global performance. Specifically, during the federated training phase, each client maintains its own global model pool by monitoring the performance of the interpolated model between the local and global models. This allows us to alleviate overfitting and seek flat minima, which can significantly improve the model's generalization performance. We evaluate our method on retinal and pathological image classification tasks, and our proposed method achieves significant improvements for out-of-distribution generalization. Our code is available at https://github.com/ubc-tea/FedSoup.
    摘要 跨机构(cross-silo)联邦学习(FL)使得医院和临床研究实验室等数据中心可以在不泄露本地数据的情况下协同开发机器学习模型。然而,近期研究发现,当面临分布偏移时,现有 FL 算法在本地性能与全局性能之间存在权衡。特别是,个性化 FL 方法容易对本地数据过拟合,使本地模型陷入尖锐的极小值,从而削弱其对分布外数据的泛化能力。本文提出了一种新的联邦模型汤方法(即对模型参数进行选择性插值),以优化本地与全局性能之间的权衡。具体而言,在联邦训练阶段,每个客户端通过监测本地模型与全局模型之间插值模型的性能来维护自己的全局模型池,这有助于缓解过拟合并寻找平坦极小值,从而显著提升模型的泛化性能。我们在视网膜和病理图像分类任务上评估了该方法,并在分布外泛化方面取得了显著提升。代码可在 https://github.com/ubc-tea/FedSoup 获取。
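
A minimal sketch of the "model soup" step described above: try convex combinations of a client's local weights and the global weights and keep the mixture that scores best on the client's held-out data. The grid of mixing coefficients and the greedy selection rule are simplifications; FedSoup's actual pool maintenance differs in detail.

```python
import numpy as np

def selective_interpolation(local_w, global_w, val_fn,
                            alphas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return the interpolated weights with the best validation score.
    val_fn(weights) -> scalar score on the client's validation split."""
    best_w, best_score = None, -np.inf
    for a in alphas:
        w = {k: a * local_w[k] + (1 - a) * global_w[k] for k in local_w}
        score = val_fn(w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score

# Toy usage: weights are dicts of arrays, validation score is synthetic
rng = np.random.default_rng(0)
local_w = {"layer": rng.normal(size=(4, 4))}
global_w = {"layer": rng.normal(size=(4, 4))}
fake_val = lambda w: -abs(w["layer"].mean())     # stand-in for accuracy
print(selective_interpolation(local_w, global_w, fake_val)[1])
```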

Identifying Interpretable Subspaces in Image Representations

  • paper_url: http://arxiv.org/abs/2307.10504
  • repo_url: None
  • paper_authors: Neha Kalibhat, Shweta Bhardwaj, Bayan Bruss, Hamed Firooz, Maziar Sanjabi, Soheil Feizi
  • for: 本文旨在解释图像表示的特征,提高图像表示的可解释性。
  • methods: 本文使用对比概念(contrasting concepts)来解释图像表示中的特征。首先,使用大规模描述数据集(如 LAION-400m)和预训练的视觉-语言模型(如 CLIP)为目标特征的高激活裁剪图像生成描述;然后,对描述中的每个词进行打分和排序,从而得到少量共享的、人类可理解的概念,准确地刻画该目标特征。此外,本文还利用低激活(反事实)图像进行对比解释,以剔除虚假概念。
  • results: 研究发现,在现有模型中能由单个特征解释的表示空间不足 20%;而使用 FALCON 成组研究特征时,可通过高阶打分概念解释更大的表示空间。此外,本文还提出了一种通过学习简单线性变换,将概念从一个可解释的表示空间迁移到另一个未见表示空间的技术。
    Abstract We propose Automatic Feature Explanation using Contrasting Concepts (FALCON), an interpretability framework to explain features of image representations. For a target feature, FALCON captions its highly activating cropped images using a large captioning dataset (like LAION-400m) and a pre-trained vision-language model like CLIP. Each word among the captions is scored and ranked leading to a small number of shared, human-understandable concepts that closely describe the target feature. FALCON also applies contrastive interpretation using lowly activating (counterfactual) images, to eliminate spurious concepts. Although many existing approaches interpret features independently, we observe in state-of-the-art self-supervised and supervised models, that less than 20% of the representation space can be explained by individual features. We show that features in larger spaces become more interpretable when studied in groups and can be explained with high-order scoring concepts through FALCON. We discuss how extracted concepts can be used to explain and debug failures in downstream tasks. Finally, we present a technique to transfer concepts from one (explainable) representation space to another unseen representation space by learning a simple linear transformation.
    摘要 我们提出基于对比概念的自动特征解释框架 FALCON,用于解释图像表示中的特征。对于目标特征,FALCON 使用大规模描述数据集(如 LAION-400m)和预训练的视觉-语言模型(如 CLIP)为其高激活的裁剪图像生成描述。描述中的每个词都会被打分和排序,从而得到少量共享的、人类可理解的概念,准确地刻画该目标特征。FALCON 还利用低激活(反事实)图像进行对比解释,以剔除虚假概念。尽管许多现有方法独立地解释单个特征,但我们在最先进的自监督和监督模型中观察到,能由单个特征解释的表示空间不足 20%。我们表明,在更大的表示空间中,成组研究特征时它们变得更可解释,并且可以通过 FALCON 用高阶打分概念加以解释。我们还讨论了如何利用提取出的概念来解释和调试下游任务中的失败。最后,我们提出一种通过学习简单线性变换,将概念从一个(可解释的)表示空间迁移到另一个未见表示空间的技术。

A Competitive Learning Approach for Specialized Models: A Solution for Complex Physical Systems with Distinct Functional Regimes

  • paper_url: http://arxiv.org/abs/2307.10496
  • repo_url: https://github.com/exploita123/charmedforfree
  • paper_authors: Okezzi F. Ukorigho, Opeoluwa Owoyele
  • for: 该文章是为了提出一种新的竞争学习方法,用于获取基于数据的物理系统模型。
  • methods: 该方法采用动态损失函数,让一组模型在数据上同时训练,每个模型在训练中竞争每个观测样本,从而在数据中识别出不同的功能区间(functional regimes)。
  • results: 实验结果表明,该方法能够成功识别功能区间、发现真实的支配方程,并降低测试误差。
    Abstract Complex systems in science and engineering sometimes exhibit behavior that changes across different regimes. Traditional global models struggle to capture the full range of this complex behavior, limiting their ability to accurately represent the system. In response to this challenge, we propose a novel competitive learning approach for obtaining data-driven models of physical systems. The primary idea behind the proposed approach is to employ dynamic loss functions for a set of models that are trained concurrently on the data. Each model competes for each observation during training, allowing for the identification of distinct functional regimes within the dataset. To demonstrate the effectiveness of the learning approach, we coupled it with various regression methods that employ gradient-based optimizers for training. The proposed approach was tested on various problems involving model discovery and function approximation, demonstrating its ability to successfully identify functional regimes, discover true governing equations, and reduce test errors.
    摘要 科学与工程中的复杂系统有时会在不同的区间内表现出不同的行为。传统的全局模型难以刻画这种复杂行为的全部范围,限制了其对系统的准确表示能力。针对这一挑战,我们提出了一种新的竞争学习方法,用于获得物理系统的数据驱动模型。其核心思想是为一组在数据上同时训练的模型引入动态损失函数:每个模型在训练过程中竞争每个观测样本,从而识别出数据集中不同的功能区间。为验证该学习方法的有效性,我们将其与多种使用基于梯度的优化器训练的回归方法相结合。该方法在多个模型发现与函数逼近问题上进行了测试,证明了它能够成功识别功能区间、发现真实的支配方程并降低测试误差。
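
A toy illustration of the competitive idea with two linear "experts": each sample's squared error is charged only to the model that currently fits it best, so the models specialize to different regimes of a piecewise-linear signal. The paper's dynamic loss works with general gradient-trained regressors; the winner-take-all rule below is the simplest instance of it.

```python
import numpy as np

def fit_competing_linear_models(x, y, n_models=2, lr=0.05, epochs=300, seed=0):
    """Winner-take-all competitive training of n_models linear experts."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_models, 2))           # slope and intercept per model
    X = np.stack([x, np.ones_like(x)], axis=1)   # (n, 2) design matrix
    for _ in range(epochs):
        preds = X @ W.T                          # (n, n_models)
        winner = ((preds - y[:, None]) ** 2).argmin(axis=1)
        for m in range(n_models):
            mask = winner == m                   # samples this model "won"
            if mask.any():
                grad = 2 * X[mask].T @ (preds[mask, m] - y[mask]) / mask.sum()
                W[m] -= lr * grad
    return W

# Piecewise-linear data with two regimes
x = np.linspace(-1, 1, 200)
y = np.where(x < 0, -2 * x, 3 * x)
print(fit_competing_linear_models(x, y))
```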

Novel Batch Active Learning Approach and Its Application to Synthetic Aperture Radar Datasets

  • paper_url: http://arxiv.org/abs/2307.10495
  • repo_url: https://github.com/chapman20j/sar_bal
  • paper_authors: James Chapman, Bohan Chen, Zheng Tan, Jeff Calder, Kevin Miller, Andrea L. Bertozzi
  • for: 这篇论文主要针对合成孔径雷达(SAR)数据集上主动学习方法的应用与改进。
  • methods: 论文提出了一种新的两阶段批量主动学习方法,由用于核心集生成的 Dijkstra's Annulus Core-Set(DAC)和用于批量采样的 LocalMax 组成。
  • results: 实验结果表明,该批量主动学习方法的准确率与串行主动学习几乎相同,但效率更高,加速比与批量大小成正比。此外,基于迁移学习特征嵌入、图学习、DAC 和 LocalMax 构建的流水线在 FUSAR-Ship 和 OpenSARShip 数据集的分类任务上超过了最先进的基于 CNN 的方法。
    Abstract Active learning improves the performance of machine learning methods by judiciously selecting a limited number of unlabeled data points to query for labels, with the aim of maximally improving the underlying classifier's performance. Recent gains have been made using sequential active learning for synthetic aperture radar (SAR) data arXiv:2204.00005. In each iteration, sequential active learning selects a query set of size one while batch active learning selects a query set of multiple datapoints. While batch active learning methods exhibit greater efficiency, the challenge lies in maintaining model accuracy relative to sequential active learning methods. We developed a novel, two-part approach for batch active learning: Dijkstra's Annulus Core-Set (DAC) for core-set generation and LocalMax for batch sampling. The batch active learning process that combines DAC and LocalMax achieves nearly identical accuracy as sequential active learning but is more efficient, proportional to the batch size. As an application, a pipeline is built based on transfer learning feature embedding, graph learning, DAC, and LocalMax to classify the FUSAR-Ship and OpenSARShip datasets. Our pipeline outperforms the state-of-the-art CNN-based methods.
    摘要 主动学习通过审慎地选择有限数量的未标注数据点进行标注,以最大程度地提升底层分类器的性能,从而提高机器学习方法的表现。近期,串行主动学习已在合成孔径雷达(SAR)数据上取得进展(arXiv:2204.00005)。在每次迭代中,串行主动学习选择大小为 1 的查询集,而批量主动学习选择包含多个数据点的查询集。批量主动学习方法效率更高,但挑战在于保持与串行主动学习相当的模型准确率。我们提出了一种新的两阶段批量主动学习方法:用于核心集生成的 Dijkstra's Annulus Core-Set(DAC)和用于批量采样的 LocalMax。将 DAC 与 LocalMax 相结合的批量主动学习过程能达到与串行主动学习几乎相同的准确率,且效率更高,加速比与批量大小成正比。作为应用,我们基于迁移学习特征嵌入、图学习、DAC 和 LocalMax 构建了一条流水线,用于对 FUSAR-Ship 和 OpenSARShip 数据集进行分类。我们的流水线优于最先进的基于 CNN 的方法。

Blockchain-Based Federated Learning: Incentivizing Data Sharing and Penalizing Dishonest Behavior

  • paper_url: http://arxiv.org/abs/2307.10492
  • repo_url: None
  • paper_authors: Amir Jaberzadeh, Ajay Kumar Shrestha, Faijan Ahamad Khan, Mohammed Afaan Shaikh, Bhargav Dave, Jason Geng
  • for: 本研究旨在提供一个整合数据信任的联邦学习框架,以便在多方合作下进行安全且公正的数据分享,并提供激励、存取控制机制和处罚不正行为。
  • methods: 本研究使用了InterPlanetary File System、区块链和智能合约来实现安全且可靠的数据分享,并将数据信任 integrate into federated learning,以提高联邦学习模型的准确性。
  • results: 实验结果显示,提案的模型能够提高联邦学习模型的准确性,并确保数据分享过程中的安全和公正。此外,研究者还发展了一个基于区块技术的分散式机器学习平台,能够在多方合作下训练 CNN 模型,并维护数据隐私和安全。
    Abstract With the increasing importance of data sharing for collaboration and innovation, it is becoming more important to ensure that data is managed and shared in a secure and trustworthy manner. Data governance is a common approach to managing data, but it faces many challenges such as data silos, data consistency, privacy, security, and access control. To address these challenges, this paper proposes a comprehensive framework that integrates data trust in federated learning with InterPlanetary File System, blockchain, and smart contracts to facilitate secure and mutually beneficial data sharing while providing incentives, access control mechanisms, and penalizing any dishonest behavior. The experimental results demonstrate that the proposed model is effective in improving the accuracy of federated learning models while ensuring the security and fairness of the data-sharing process. The research paper also presents a decentralized federated learning platform that successfully trained a CNN model on the MNIST dataset using blockchain technology. The platform enables multiple workers to train the model simultaneously while maintaining data privacy and security. The decentralized architecture and use of blockchain technology allow for efficient communication and coordination between workers. This platform has the potential to facilitate decentralized machine learning and support privacy-preserving collaboration in various domains.
    摘要 随着数据共享的重要性增加,保证数据的安全和可靠性变得越来越重要。数据治理是一种常见的数据管理方式,但它面临着数据孤岛、数据一致性、隐私、安全和访问控制等挑战。为了解决这些挑战,这篇论文提出了一个涵盖数据信任的 federated learning 框架,并与 InterPlanetary File System、区块链和智能合约结合,实现安全和互惠的数据分享,并提供了奖励、访问控制机制和惩戒任何不诚实行为。实验结果表明,提议的模型能够提高 federated learning 模型的准确率,同时保障数据分享的安全性和公平性。论文还描述了一个基于区块链技术的分布式 federated learning 平台,可以同时训练多个工作者的 CNN 模型,并保持数据隐私和安全性。该平台的分布式架构和使用区块链技术,可以实现高效的通信和协调。这种平台具有推动分布式机器学习和保持隐私协作的潜在潜力。

(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

  • paper_url: http://arxiv.org/abs/2307.10490
  • repo_url: https://github.com/ebagdasa/multimodal_injection
  • paper_authors: Eugene Bagdasaryan, Tsung-Yin Hsieh, Ben Nassi, Vitaly Shmatikov
  • for: 研究如何利用图像和声音对多模态大语言模型进行间接提示注入与指令注入攻击。
  • methods: 攻击者生成与提示对应的对抗扰动,并将其混入图像或音频录音中。
  • results: 当用户向(未经修改的、良性的)模型询问被扰动过的图像或音频时,该扰动会引导模型输出攻击者选定的文本,和/或使后续对话遵循攻击者的指令。
    Abstract We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker's instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVa and PandaGPT.
    摘要 我们展示了如何利用图像和声音在多模态大语言模型中进行间接提示注入与指令注入。攻击者生成与提示对应的对抗扰动,并将其混入图像或音频录音中。当用户向(未经修改的、良性的)模型询问被扰动过的图像或音频时,该扰动会引导模型输出攻击者选定的文本,和/或使后续对话遵循攻击者的指令。我们通过多个针对 LLaVa 和 PandaGPT 的概念验证示例来说明这种攻击。

SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval

  • paper_url: http://arxiv.org/abs/2307.10488
  • repo_url: https://github.com/thakur-nandan/sprint
  • paper_authors: Nandan Thakur, Kexin Wang, Iryna Gurevych, Jimmy Lin
  • for: 这篇论文旨在提供一个统一的 Python 工具包 SPRINT,用于评估零样本神经稀疏检索模型。
  • methods: 该工具包基于 Pyserini 和 Lucene 实现了统一接口,支持多种神经稀疏检索模型(uniCOIL、DeepImpact、SPARTA、TILDEv2 和 SPLADEv2);用户只需定义词项加权方法,即可轻松添加自定义模型。
  • results: 利用该工具包,作者在公认的 BEIR 基准上建立了强大且可复现的零样本稀疏检索基线;其中 SPLADEv2 在所有神经稀疏检索模型中取得最高的平均 nDCG@10(0.470)。作者进一步发现,SPLADEv2 生成的稀疏表示中大多数词元落在原始查询和文档之外,这往往是其性能提升的关键。
    Abstract Traditionally, sparse retrieval systems relied on lexical representations to retrieve documents, such as BM25, dominated information retrieval tasks. With the onset of pre-trained transformer models such as BERT, neural sparse retrieval has led to a new paradigm within retrieval. Despite the success, there has been limited software supporting different sparse retrievers running in a unified, common environment. This hinders practitioners from fairly comparing different sparse models and obtaining realistic evaluation results. Another missing piece is, that a majority of prior work evaluates sparse retrieval models on in-domain retrieval, i.e. on a single dataset: MS MARCO. However, a key requirement in practical retrieval systems requires models that can generalize well to unseen out-of-domain, i.e. zero-shot retrieval tasks. In this work, we provide SPRINT, a unified Python toolkit based on Pyserini and Lucene, supporting a common interface for evaluating neural sparse retrieval. The toolkit currently includes five built-in models: uniCOIL, DeepImpact, SPARTA, TILDEv2 and SPLADEv2. Users can also easily add customized models by defining their term weighting method. Using our toolkit, we establish strong and reproducible zero-shot sparse retrieval baselines across the well-acknowledged benchmark, BEIR. Our results demonstrate that SPLADEv2 achieves the best average score of 0.470 nDCG@10 on BEIR amongst all neural sparse retrievers. In this work, we further uncover the reasons behind its performance gain. We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document which is often crucial for its performance gains, i.e. a limitation among its other sparse counterparts. We provide our SPRINT toolkit, models, and data used in our experiments publicly here at https://github.com/thakur-nandan/sprint.
    摘要 传统上,稀疏检索系统依赖词项表示(如 BM25)来检索文档,长期主导信息检索任务。随着 BERT 等预训练 Transformer 模型的出现,神经稀疏检索开启了检索领域的新范式。尽管成效显著,但目前缺乏能让不同稀疏检索器在统一环境中运行的软件,这使研究者难以公平比较不同稀疏模型并获得真实的评估结果。另一个缺失是,以往工作大多只在单一数据集(MS MARCO)上评估稀疏检索模型的域内检索能力,而实际检索系统的关键要求是模型能很好地泛化到未见过的域外任务,即零样本检索。在这项工作中,我们提供了基于 Pyserini 和 Lucene 的统一 Python 工具包 SPRINT,为评估神经稀疏检索提供共同接口。该工具包目前内置五个模型:uniCOIL、DeepImpact、SPARTA、TILDEv2 和 SPLADEv2;用户也可以通过定义词项加权方法轻松添加自定义模型。利用该工具包,我们在公认的 BEIR 基准上建立了强大且可复现的零样本稀疏检索基线。结果表明,SPLADEv2 在所有神经稀疏检索器中取得最高的平均 nDCG@10(0.470)。我们进一步揭示了其性能提升的原因:SPLADEv2 生成的稀疏表示中大多数词元落在原始查询和文档之外,这往往是其性能提升的关键,也是其他稀疏模型的局限所在。我们在 https://github.com/thakur-nandan/sprint 公开了 SPRINT 工具包、模型以及实验所用数据。
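
Since SPRINT's headline comparison is average nDCG@10 on BEIR, a minimal sketch of that metric is given below (in practice Pyserini/trec_eval compute it; this is only to make the reported 0.470 figure concrete):

```python
import numpy as np

def ndcg_at_k(ranked_rels, all_rels, k=10):
    """nDCG@k: DCG of the returned ranking divided by the DCG of the ideal
    ranking of the same relevance judgments."""
    def dcg(rels):
        rels = np.asarray(rels, dtype=float)[:k]
        return np.sum((2 ** rels - 1) / np.log2(np.arange(2, rels.size + 2)))
    ideal = dcg(sorted(all_rels, reverse=True))
    return dcg(ranked_rels) / ideal if ideal > 0 else 0.0

# Graded relevance of the top-ranked documents for one query
print(round(ndcg_at_k([2, 0, 1, 0, 0], [2, 1, 0, 0, 0]), 3))   # ~0.964
```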

FinGPT: Democratizing Internet-scale Data for Financial Large Language Models

  • paper_url: http://arxiv.org/abs/2307.10485
  • repo_url: https://github.com/ai4finance-foundation/fingpt
  • paper_authors: Xiao-Yang Liu, Guoxuan Wang, Daochen Zha
  • for: FinGPT aims to democratize Internet-scale financial data for large language models (LLMs) to revolutionize the finance industry.
  • methods: FinGPT introduces an open-sourced and data-centric framework that automates the collection and curation of real-time financial data from diverse sources on the Internet.
  • results: FinGPT provides researchers and practitioners with accessible and transparent resources to develop their FinLLMs, and demonstrates several applications including robo-advisor, sentiment analysis for algorithmic trading, and low-code development.
    Abstract Large language models (LLMs) have demonstrated remarkable proficiency in understanding and generating human-like texts, which may potentially revolutionize the finance industry. However, existing LLMs often fall short in the financial field, which is mainly attributed to the disparities between general text data and financial text data. Unfortunately, there is only a limited number of financial text datasets available (quite small size), and BloombergGPT, the first financial LLM (FinLLM), is close-sourced (only the training logs were released). In light of this, we aim to democratize Internet-scale financial data for LLMs, which is an open challenge due to diverse data sources, low signal-to-noise ratio, and high time-validity. To address the challenges, we introduce an open-sourced and data-centric framework, \textit{Financial Generative Pre-trained Transformer (FinGPT)}, that automates the collection and curation of real-time financial data from >34 diverse sources on the Internet, providing researchers and practitioners with accessible and transparent resources to develop their FinLLMs. Additionally, we propose a simple yet effective strategy for fine-tuning FinLLM using the inherent feedback from the market, dubbed Reinforcement Learning with Stock Prices (RLSP). We also adopt the Low-rank Adaptation (LoRA, QLoRA) method that enables users to customize their own FinLLMs from open-source general-purpose LLMs at a low cost. Finally, we showcase several FinGPT applications, including robo-advisor, sentiment analysis for algorithmic trading, and low-code development. FinGPT aims to democratize FinLLMs, stimulate innovation, and unlock new opportunities in open finance. The codes are available at https://github.com/AI4Finance-Foundation/FinGPT and https://github.com/AI4Finance-Foundation/FinNLP
    摘要 大型语言模型(LLM)在理解和生成类人文本方面表现出色,有望革新金融行业。然而,现有 LLM 在金融领域往往表现欠佳,这主要归因于通用文本数据与金融文本数据之间的差异。目前可用的金融文本数据集数量有限且规模较小,而首个金融 LLM(FinLLM)BloombergGPT 并未开源(仅公布了训练日志)。有鉴于此,我们致力于为 LLM 普及互联网规模的金融数据;由于数据来源多样、信噪比低且时效性要求高,这仍是一个开放的挑战。为应对这些挑战,我们提出了开源且以数据为中心的框架——金融生成式预训练变换器(FinGPT),它能自动从互联网上超过 34 个不同来源收集并整理实时金融数据,为研究者和从业者提供可获取且透明的资源来开发自己的 FinLLM。此外,我们提出了一种简单而有效的利用市场内在反馈微调 FinLLM 的策略,称为基于股价的强化学习(RLSP)。我们还采用低秩适配方法(LoRA、QLoRA),使用户能够以低成本基于开源通用 LLM 定制自己的 FinLLM。最后,我们展示了 FinGPT 的多个应用,包括智能投顾、面向算法交易的情感分析以及低代码开发。FinGPT 旨在普及 FinLLM、激发创新并在开放金融领域解锁新机遇。代码见 https://github.com/AI4Finance-Foundation/FinGPT 和 https://github.com/AI4Finance-Foundation/FinNLP。
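
FinGPT relies on LoRA/QLoRA from existing libraries; purely as an illustration of why that makes customization cheap, here is a from-scratch sketch of a LoRA-wrapped linear layer in which only the two low-rank matrices are trainable (the class and defaults below are not FinGPT's code):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update:
    y = x W^T + (x A^T) B^T * (alpha / r)."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                     # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

layer = LoRALinear(nn.Linear(512, 512))
print(layer(torch.randn(2, 512)).shape)                 # torch.Size([2, 512])
```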

Can Instruction Fine-Tuned Language Models Identify Social Bias through Prompting?

  • paper_url: http://arxiv.org/abs/2307.10472
  • repo_url: None
  • paper_authors: Omkar Dige, Jacob-Junqi Tian, David Emerson, Faiza Khan Khattak
  • for: 评估语言模型中的社会偏见
  • methods: 采用零批示评估语言模型的偏见识别能力
  • results: 结果显示,在所评估的模型中,经指令微调的 Alpaca 7B 在偏见识别任务上表现最佳,准确率为 56.7%;扩大模型规模和数据多样性有望进一步提升表现。
    Abstract As the breadth and depth of language model applications continue to expand rapidly, it is increasingly important to build efficient frameworks for measuring and mitigating the learned or inherited social biases of these models. In this paper, we present our work on evaluating instruction fine-tuned language models' ability to identify bias through zero-shot prompting, including Chain-of-Thought (CoT) prompts. Across LLaMA and its two instruction fine-tuned versions, Alpaca 7B performs best on the bias identification task with an accuracy of 56.7%. We also demonstrate that scaling up LLM size and data diversity could lead to further performance gain. This is a work-in-progress presenting the first component of our bias mitigation framework. We will keep updating this work as we get more results.
    摘要 随着语言模型应用的广度和深度迅速扩展,构建高效框架来衡量并缓解这些模型习得或继承的社会偏见变得愈发重要。本文介绍了我们评估指令微调语言模型通过零样本提示(包括思维链提示)识别偏见能力的工作。在 LLaMA 及其两个指令微调版本中,Alpaca 7B 在偏见识别任务上表现最佳,准确率为 56.7%。我们还证明,扩大模型规模和数据多样性有望带来进一步的性能提升。这是一项进行中的工作,是我们偏见缓解框架的首个组成部分,随着获得更多结果我们将持续更新。
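
The evaluation above is zero-shot prompting, optionally with a chain-of-thought instruction. The template below is only an illustration of that setup; the paper's exact prompt wording is not reproduced here.

```python
def bias_identification_prompt(statement, chain_of_thought=True):
    """Build a zero-shot prompt asking an instruction-tuned model whether a
    statement expresses a social bias; wording is illustrative."""
    prompt = ("Does the following statement express a social bias or "
              "stereotype? Answer 'yes' or 'no'.\n\n"
              f"Statement: {statement}\n")
    if chain_of_thought:
        prompt += "Let's think step by step before answering.\n"
    return prompt + "Answer:"

print(bias_identification_prompt("Women are bad at math."))
```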

Classification of Visualization Types and Perspectives in Patents

  • paper_url: http://arxiv.org/abs/2307.10471
  • repo_url: https://github.com/tibhannover/patentimageclassification
  • paper_authors: Junaid Ahmed Ghauri, Eric Müller-Budack, Ralph Ewerth
  • for: 本研究旨在提高专利检索与浏览的效率;专利中使用不同类型的可视化(如图表、技术图纸)和视角(如侧视、透视)来展示发明细节,对这些图像进行分类有助于更高效的检索和进一步分析。
  • methods: 本研究使用了现代深度学习方法,包括变换器,进行图像类型和视角的分类。我们也对CLEF-IP dataset进行扩展,并提供了手动标注的ground truth。
  • results: 实验结果表明了提案的方法的可行性。我们将源代码、模型和数据集公开发布。
    Abstract Due to the swift growth of patent applications each year, information and multimedia retrieval approaches that facilitate patent exploration and retrieval are of utmost importance. Different types of visualizations (e.g., graphs, technical drawings) and perspectives (e.g., side view, perspective) are used to visualize details of innovations in patents. The classification of these images enables a more efficient search and allows for further analysis. So far, datasets for image type classification miss some important visualization types for patents. Furthermore, related work does not make use of recent deep learning approaches including transformers. In this paper, we adopt state-of-the-art deep learning methods for the classification of visualization types and perspectives in patent images. We extend the CLEF-IP dataset for image type classification in patents to ten classes and provide manual ground truth annotations. In addition, we derive a set of hierarchical classes from a dataset that provides weakly-labeled data for image perspectives. Experimental results have demonstrated the feasibility of the proposed approaches. Source code, models, and dataset will be made publicly available.
    摘要 In this paper, we employ state-of-the-art deep learning methods for the classification of visualization types and perspectives in patent images. We expand the CLEF-IP dataset for image type classification in patents to ten classes and provide manual ground truth annotations. Furthermore, we derive a set of hierarchical classes from a dataset that provides weakly-labeled data for image perspectives. Experimental results have demonstrated the feasibility of the proposed approaches. The source code, models, and dataset will be publicly available.

Properties of Discrete Sliced Wasserstein Losses

  • paper_url: http://arxiv.org/abs/2307.10352
  • repo_url: None
  • paper_authors: Eloi Tanguy, Rémi Flamary, Julie Delon
  • for: 本论文主要研究 $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$ 的性质与优化问题,其中 $\gamma_Y$ 和 $\gamma_Z$ 是两个点数相同的离散均匀概率测度,$Y \in \mathbb{R}^{n \times d}$ 为其中一个测度的支撑。
  • methods: 论文研究了 $\mathcal{E}$ 的正则性与优化性质,以及其蒙特卡洛近似 $\mathcal{E}_p$(仅用 $p$ 个样本估计 SW 中的期望),证明了 $\mathcal{E}_p$ 的临界点向 $\mathcal{E}$ 的临界点收敛,以及几乎必然的一致收敛性。
  • results: 结果表明,在一定意义下,最小化 $\mathcal{E}$ 和 $\mathcal{E}_p$ 的随机梯度下降方法会收敛到这些能量的(Clarke)临界点。
    Abstract The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems bear the same sub-problem, which is minimising the Sliced Wasserstein energy. In this paper we study the properties of $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$, i.e. the SW distance between two uniform discrete measures with the same amount of points as a function of the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $\mathcal{E}_p$ (estimating the expectation in SW using only $p$ samples) and show convergence results on the critical points of $\mathcal{E}_p$ to those of $\mathcal{E}$, as well as an almost-sure uniform convergence. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge towards (Clarke) critical points of these energies.
    摘要 切片 Wasserstein(SW)距离已成为比较概率测度时 Wasserstein 距离的常用替代方案,广泛应用于图像处理、域自适应和生成建模等领域。在这些应用中,通常通过优化某些参数来最小化 SW,将其作为离散概率测度之间的损失函数(因为具有密度的测度在数值上不可得)。这些优化问题都包含同一个子问题,即最小化切片 Wasserstein 能量。本文研究 $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$ 的性质,即两个点数相同的离散均匀测度之间的 SW 距离,作为其中一个测度的支撑 $Y \in \mathbb{R}^{n \times d}$ 的函数。我们研究了该能量的正则性与优化性质,以及其蒙特卡洛近似 $\mathcal{E}_p$(仅用 $p$ 个样本估计 SW 中的期望),证明了 $\mathcal{E}_p$ 的临界点向 $\mathcal{E}$ 的临界点收敛,以及几乎必然的一致收敛性。最后,我们证明在某种意义下,最小化 $\mathcal{E}$ 和 $\mathcal{E}_p$ 的随机梯度下降方法会收敛到这些能量的(Clarke)临界点。
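
A minimal sketch of the Monte-Carlo estimator $\mathcal{E}_p$ studied above: project both point clouds onto $p$ random directions, sort the projections (the optimal 1-D coupling for uniform measures with equal numbers of points), and average the squared differences. Optimising the support $Y$ with SGD would differentiate through exactly this quantity.

```python
import numpy as np

def sliced_w2_squared(Y, Z, p=100, seed=0):
    """Monte-Carlo estimate of SW_2^2 between two uniform discrete measures
    supported on Y and Z (same number of points), using p random projections."""
    rng = np.random.default_rng(seed)
    d = Y.shape[1]
    theta = rng.normal(size=(p, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # directions on the sphere
    proj_y, proj_z = Y @ theta.T, Z @ theta.T                # (n, p) projections
    proj_y.sort(axis=0)
    proj_z.sort(axis=0)                                      # 1-D optimal matching
    return np.mean((proj_y - proj_z) ** 2)                   # average over i and theta

rng = np.random.default_rng(1)
Y = rng.normal(size=(50, 3))
Z = rng.normal(loc=2.0, size=(50, 3))
print(sliced_w2_squared(Y, Z))
```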

A data science axiology: the nature, value, and risks of data science

  • paper_url: http://arxiv.org/abs/2307.10460
  • repo_url: None
  • paper_authors: Michael L. Brodie
  • for: 这篇论文是为了探讨数据科学的axiology,即其目的、性质、重要性、风险和价值,以帮助理解和定义数据科学,并找到其可能的利益和风险。
  • methods: 这篇论文使用了AXIOLOGY的方法来探讨数据科学的特点,包括其不可预测的性和AI的应用。
  • results: 这篇论文的结果表明,数据科学在知识发现方面具有很大的潜力和可能性,但同时也存在一些风险,例如不可预测的结果和AI的应用可能导致的不良影响。
    Abstract Data science is not a science. It is a research paradigm with an unfathomed scope, scale, complexity, and power for knowledge discovery that is not otherwise possible and can be beyond human reasoning. It is changing our world practically and profoundly already widely deployed in tens of thousands of applications in every discipline in an AI Arms Race that, due to its inscrutability, can lead to unfathomed risks. This paper presents an axiology of data science, its purpose, nature, importance, risks, and value for problem solving, by exploring and evaluating its remarkable, definitive features. As data science is in its infancy, this initial, speculative axiology is intended to aid in understanding and defining data science to recognize its potential benefits, risks, and open research challenges. AI based data science is inherently about uncertainty that may be more realistic than our preference for the certainty of science. Data science will have impacts far beyond knowledge discovery and will take us into new ways of understanding the world.
    摘要 数据科学不是一门科学,而是一种研究范式,其范围、规模、复杂性和知识发现能力深不可测,能够实现原本不可能、甚至超出人类推理的发现。它正在切实而深刻地改变我们的世界,已在各学科的数万个应用中广泛部署,卷入一场由于其不可解读性而可能带来难以预料风险的人工智能军备竞赛。本文通过探究和评估数据科学非凡的、决定性的特征,提出了一种数据科学的价值论(axiology),阐述其目的、本质、重要性、风险及其在解决问题中的价值。由于数据科学尚处初期,这一初步的、推测性的价值论旨在帮助理解和定义数据科学,认识其潜在收益、风险以及开放的研究挑战。基于 AI 的数据科学本质上与不确定性相伴,这也许比我们对科学确定性的偏好更贴近现实。数据科学的影响将远超知识发现,并带我们以新的方式理解世界。

A New Computationally Simple Approach for Implementing Neural Networks with Output Hard Constraints

  • paper_url: http://arxiv.org/abs/2307.10459
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Andrei V. Konstantinov, Lev V. Utkin
  • for: 提出了一种新的计算简单的神经网络输出值约束方法。
  • methods: 使用了额外的神经网络层来实现约束,并将约束转换为神经网络输出值的限制。
  • results: 方法可以简单地扩展到受约束的输入输出问题,并且可以实现不同类型的约束,包括线性和二次约束、等式约束和动态约束。计算复杂度为O(n*m)和O(n^2*m)。数据实验 validate了该方法。
    Abstract A new computationally simple method of imposing hard convex constraints on the neural network output values is proposed. The key idea behind the method is to map a vector of hidden parameters of the network to a point that is guaranteed to be inside the feasible set defined by a set of constraints. The mapping is implemented by the additional neural network layer with constraints for output. The proposed method is simply extended to the case when constraints are imposed not only on the output vectors, but also on joint constraints depending on inputs. The projection approach to imposing constraints on outputs can simply be implemented in the framework of the proposed method. It is shown how to incorporate different types of constraints into the proposed method, including linear and quadratic constraints, equality constraints, and dynamic constraints, constraints in the form of boundaries. An important feature of the method is its computational simplicity. Complexities of the forward pass of the proposed neural network layer by linear and quadratic constraints are O(n*m) and O(n^2*m), respectively, where n is the number of variables, m is the number of constraints. Numerical experiments illustrate the method by solving optimization and classification problems. The code implementing the method is publicly available.
    摘要 我们提出了一种计算上简单的新方法,用于对神经网络的输出值施加硬凸约束。该方法的核心思想是将网络的隐藏参数向量映射到一个保证位于由一组约束定义的可行集内部的点上,该映射由一个带输出约束的附加神经网络层实现。该方法可以简单地推广到约束不仅作用于输出向量、还包含依赖于输入的联合约束的情形;在所提方法的框架内也可以方便地实现对输出施加约束的投影方法。文中说明了如何将不同类型的约束纳入该方法,包括线性约束和二次约束、等式约束、动态约束以及边界形式的约束。该方法的一个重要特点是计算简单:在线性约束和二次约束下,所提神经网络层前向传播的复杂度分别为 O(n*m) 和 O(n^2*m),其中 n 为变量个数,m 为约束个数。数值实验通过求解优化和分类问题展示了该方法,实现代码已公开。
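
The paper handles general linear, quadratic, equality, and dynamic constraints; the sketch below only shows the simplest cases where a closed-form map into the feasible set exists (a box and the probability simplex), to illustrate the idea of an output layer that satisfies constraints by construction rather than by projection at inference time.

```python
import numpy as np

def box_constrained_output(h, lo, hi):
    """Map unconstrained hidden values h into the box [lo, hi] elementwise."""
    return lo + (hi - lo) / (1.0 + np.exp(-h))          # sigmoid rescaling

def simplex_constrained_output(h):
    """Map h onto the probability simplex (nonnegative, sums to 1)."""
    e = np.exp(h - h.max())
    return e / e.sum()

h = np.array([2.0, -1.0, 0.3])
print(box_constrained_output(h, np.zeros(3), np.array([1.0, 5.0, 2.0])))
print(simplex_constrained_output(h))
```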

A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect Dataset

  • paper_url: http://arxiv.org/abs/2307.10455
  • repo_url: https://github.com/zahrag/BIOSCAN-1M
  • paper_authors: Zahra Gharaee, ZeMing Gong, Nicholas Pellegrino, Iuliia Zarubiieva, Joakim Bruslund Haurum, Scott C. Lowe, Jaclyn T. A. McKeown, Chris C. Y. Ho, Joschka McLeod, Yi-Yun C Wei, Jireh Agda, Sujeevan Ratnasingham, Dirk Steinke, Angel X. Chang, Graham W. Taylor, Paul Fieguth
  • for: This paper aims to provide a large dataset of hand-labelled insect images to train computer-vision models for taxonomic assessment, and to lay the foundation for a comprehensive survey of global biodiversity.
  • methods: The dataset, called BIOSCAN-Insect, includes raw nucleotide barcode sequences and assigned barcode index numbers for each record, and is primarily used to train computer-vision models for image-based taxonomic assessment.
  • results: The paper presents a million-image dataset with a long-tailed class-imbalance distribution and highly fine-grained classification problem at lower taxonomic levels, which provides a challenging task for image-based taxonomic classification.
    Abstract In an effort to catalog insect biodiversity, we propose a new large dataset of hand-labelled insect images, the BIOSCAN-Insect Dataset. Each record is taxonomically classified by an expert, and also has associated genetic information including raw nucleotide barcode sequences and assigned barcode index numbers, which are genetically-based proxies for species classification. This paper presents a curated million-image dataset, primarily to train computer-vision models capable of providing image-based taxonomic assessment, however, the dataset also presents compelling characteristics, the study of which would be of interest to the broader machine learning community. Driven by the biological nature inherent to the dataset, a characteristic long-tailed class-imbalance distribution is exhibited. Furthermore, taxonomic labelling is a hierarchical classification scheme, presenting a highly fine-grained classification problem at lower levels. Beyond spurring interest in biodiversity research within the machine learning community, progress on creating an image-based taxonomic classifier will also further the ultimate goal of all BIOSCAN research: to lay the foundation for a comprehensive survey of global biodiversity. This paper introduces the dataset and explores the classification task through the implementation and analysis of a baseline classifier.
    摘要 为了编目昆虫多样性,我们提出了一个新的大规模人工标注昆虫图像数据集,称为 BIOSCAN-Insect 数据集。每条记录都经过专家的分类学标注,并附带相关的遗传信息,包括原始核苷酸条形码序列和分配的条形码索引号,后者是基于遗传信息的物种分类代理。本文介绍了一个经过整理的百万级图像数据集,主要用于训练能够进行基于图像的分类学评估的计算机视觉模型;此外,该数据集还呈现出一些值得关注的特性,例如典型的长尾类别不平衡分布,以及分类学标注所带来的层次化、在低层级上高度细粒度的分类问题,这些都值得更广泛的机器学习社区研究。除了激发机器学习社区对生物多样性研究的兴趣之外,构建基于图像的分类学分类器也将推动所有 BIOSCAN 研究的最终目标:为全面调查全球生物多样性奠定基础。本文介绍该数据集,并通过实现和分析一个基线分类器来探讨分类任务。

The importance of feature preprocessing for differentially private linear optimization

  • paper_url: http://arxiv.org/abs/2307.11106
  • repo_url: None
  • paper_authors: Ziteng Sun, Ananda Theertha Suresh, Aditya Krishna Menon
  • for: 本研究目的是研究 differentially private stochastic gradient descent (DPSGD) 是否具有 sufficient condition to find a good minimizer for every dataset under privacy constraints.
  • methods: 本研究使用了 differentially private stochastic gradient descent (DPSGD) 和其 variants,以及 feature preprocessing.
  • results: 研究发现,without feature preprocessing, DPSGD 会导致 privacy error proportional to the maximum norm of features over all samples. 我们提出了一种名为 DPSGD-F 的算法,combines DPSGD with feature preprocessing, and prove that for classification tasks, it incurs a privacy error proportional to the diameter of the features. 我们还在图像分类 benchmarks 中证明了它的实用性.
    Abstract Training machine learning models with differential privacy (DP) has received increasing interest in recent years. One of the most popular algorithms for training differentially private models is differentially private stochastic gradient descent (DPSGD) and its variants, where at each step gradients are clipped and combined with some noise. Given the increasing usage of DPSGD, we ask the question: is DPSGD alone sufficient to find a good minimizer for every dataset under privacy constraints? As a first step towards answering this question, we show that even for the simple case of linear classification, unlike non-private optimization, (private) feature preprocessing is vital for differentially private optimization. In detail, we first show theoretically that there exists an example where without feature preprocessing, DPSGD incurs a privacy error proportional to the maximum norm of features over all samples. We then propose an algorithm called DPSGD-F, which combines DPSGD with feature preprocessing and prove that for classification tasks, it incurs a privacy error proportional to the diameter of the features $\max_{x, x' \in D} \|x - x'\|_2$. We then demonstrate the practicality of our algorithm on image classification benchmarks.
    摘要 近年来,使用差分隐私(DP)训练机器学习模型受到越来越多的关注。训练差分隐私模型最流行的算法之一是差分隐私随机梯度下降(DPSGD)及其变体,其在每一步对梯度进行裁剪并加入噪声。随着 DPSGD 的使用日益普遍,我们提出一个问题:在隐私约束下,仅靠 DPSGD 是否足以在每个数据集上找到好的最小值?作为回答该问题的第一步,我们证明即使在简单的线性分类情形下,与非隐私优化不同,(隐私的)特征预处理对于差分隐私优化至关重要。具体而言,我们首先从理论上证明,存在一个例子,在没有特征预处理的情况下,DPSGD 会产生与所有样本特征最大范数成正比的隐私误差。随后我们提出了名为 DPSGD-F 的算法,它将 DPSGD 与特征预处理相结合,并证明对于分类任务,其隐私误差与特征的直径 $\max_{x, x' \in D} \|x - x'\|_2$ 成正比。最后,我们在图像分类基准上展示了该算法的实用性。
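For readers unfamiliar with DPSGD, here is a minimal numpy sketch of the clipped-and-noised update the abstract refers to, on a toy logistic regression. The centering step at the end stands in for the feature-preprocessing idea behind DPSGD-F purely as an illustration (the paper performs the preprocessing privately); all names and constants are ours.

```python
import numpy as np

def dpsgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.0, rng=np.random.default_rng(0)):
    """One DPSGD step for logistic regression: clip per-example gradients, add Gaussian noise."""
    grads = []
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-xi @ w))
        g = (p - yi) * xi                                    # per-example gradient
        g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12))    # clip to norm <= clip
        grads.append(g)
    noisy = np.sum(grads, axis=0) + rng.normal(0.0, noise_mult * clip, size=w.shape)
    return w - lr * noisy / len(X)

# Feature preprocessing in the spirit of DPSGD-F: centering shrinks the effective feature
# diameter seen by the private optimizer (done non-privately here for brevity).
X = np.random.default_rng(1).normal(5.0, 1.0, size=(64, 3))
y = (X[:, 0] > 5.0).astype(float)
Xc = X - X.mean(axis=0)
w = np.zeros(3)
for _ in range(100):
    w = dpsgd_step(w, Xc, y)
```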

Integrating a Heterogeneous Graph with Entity-aware Self-attention using Relative Position Labels for Reading Comprehension Model

  • paper_url: http://arxiv.org/abs/2307.10443
  • repo_url: None
  • paper_authors: Shima Foolad, Kourosh Kiani
  • for: 提高机器阅读理解模型的复杂逻辑处理能力
  • methods: 提出一种新的注意模式,将来自异构图的推理知识融入 transformer 架构,而无需依赖外部知识
  • results: 模型在ReCoRD数据集上的表现优于最先进的 LUKE-Graph 和基线 LUKE 模型。
    Abstract Despite the significant progress made by transformer models in machine reading comprehension tasks, they still fall short in handling complex reasoning tasks due to the absence of explicit knowledge in the input sequence. To address this limitation, many recent works have proposed injecting external knowledge into the model. However, selecting relevant external knowledge, ensuring its availability, and requiring additional processing steps remain challenging. In this paper, we introduce a novel attention pattern that integrates reasoning knowledge derived from a heterogeneous graph into the transformer architecture without relying on external knowledge. The proposed attention pattern comprises three key elements: global-local attention for word tokens, graph attention for entity tokens that exhibit strong attention towards tokens connected in the graph as opposed to those unconnected, and the consideration of the type of relationship between each entity token and word token. This results in optimized attention between the two if a relationship exists. The pattern is coupled with special relative position labels, allowing it to integrate with LUKE's entity-aware self-attention mechanism. The experimental findings corroborate that our model outperforms both the cutting-edge LUKE-Graph and the baseline LUKE model on the ReCoRD dataset that focuses on commonsense reasoning.
    摘要 尽管变换器模型在机器阅读理解任务中取得了重要进展,但由于输入序列缺乏显式知识,它们在复杂推理任务中的表现仍然不佳。为解决这一限制,许多最近的研究提出向模型注入外部知识。然而,选择相关的外部知识、确保其可用性以及所需的额外处理步骤仍然是挑战。本文提出了一种新的注意模式,在不依赖外部知识的情况下,将来自异构图的推理知识融入变换器架构。该注意模式包含三个关键元素:针对单词 Token 的全局-局部注意力;针对实体 Token 的图注意力,使其对图中相连的 Token 给予比不相连 Token 更强的注意;以及对每个实体 Token 与单词 Token 之间关系类型的考虑,当存在关系时二者之间的注意力得到优化。该模式配合特殊的相对位置标签,可与 LUKE 模型的实体感知自注意机制集成。实验结果表明,我们的模型在侧重常识推理的 ReCoRD 数据集上优于最先进的 LUKE-Graph 和基线 LUKE 模型。

Confidence Estimation Using Unlabeled Data

  • paper_url: http://arxiv.org/abs/2307.10440
  • repo_url: https://github.com/topoxlab/consistency-ranking-loss
  • paper_authors: Chen Li, Xiaoling Hu, Chao Chen
  • for: 这篇论文的目的是提出一种基于半监督学习的信任估计方法,即使训练标签很少也可以估计模型对未标注样本的信任程度。
  • methods: 该方法使用训练过程中预测的一致性作为代理函数,并提出了一种一致性排名损失函数来估计信任程度。
  • results: 在图像分类和分割任务上,该方法在置信度估计方面达到了领先性能,并通过一个下游主动学习任务展示了该方法的优势。
    Abstract Overconfidence is a common issue for deep neural networks, limiting their deployment in real-world applications. To better estimate confidence, existing methods mostly focus on fully-supervised scenarios and rely on training labels. In this paper, we propose the first confidence estimation method for a semi-supervised setting, when most training labels are unavailable. We stipulate that even with limited training labels, we can still reasonably approximate the confidence of model on unlabeled samples by inspecting the prediction consistency through the training process. We use training consistency as a surrogate function and propose a consistency ranking loss for confidence estimation. On both image classification and segmentation tasks, our method achieves state-of-the-art performances in confidence estimation. Furthermore, we show the benefit of the proposed method through a downstream active learning task. The code is available at https://github.com/TopoXLab/consistency-ranking-loss
    摘要 过度自信是深度神经网络的常见问题,限制了其在实际应用中的部署。现有的置信度估计方法大多针对全监督场景并依赖训练标签。本文提出了首个面向半监督场景(大多数训练标签不可用)的置信度估计方法。我们认为,即使训练标签有限,仍然可以通过考察训练过程中预测的一致性,合理地近似模型在无标注样本上的置信度。我们将训练一致性作为代理函数,并提出了一种一致性排名损失用于置信度估计。在图像分类和分割任务上,我们的方法在置信度估计方面均达到了最先进的性能;我们还通过一个下游主动学习任务展示了该方法的优势。代码可以在 https://github.com/TopoXLab/consistency-ranking-loss 上获取。
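The consistency-ranking loss itself is described in the paper and repository; the sketch below only illustrates the surrogate signal it builds on, namely how consistent a sample's predictions are across training checkpoints. The function name and the agreement-with-final-prediction definition are our simplifications.

```python
import numpy as np

def consistency_scores(checkpoint_preds):
    """checkpoint_preds: array of shape (num_checkpoints, num_samples) with predicted labels
    recorded at several points during training. Returns a per-sample score in [0, 1]:
    the fraction of checkpoints agreeing with the final prediction."""
    final = checkpoint_preds[-1]
    return (checkpoint_preds == final).mean(axis=0)

# Toy example: samples 0 and 2 are predicted consistently, sample 1 flips often (low-confidence proxy).
preds = np.array([
    [1, 0, 2],
    [1, 2, 2],
    [1, 0, 2],
    [1, 1, 2],
])
print(consistency_scores(preds))  # [1.0, 0.25, 1.0]
```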

  • paper_url: http://arxiv.org/abs/2307.10438
  • repo_url: None
  • paper_authors: Shengli Jiang, Shiyi Qin, Reid C. Van Lehn, Prasanna Balaprakash, Victor M. Zavala
  • for: 用于分子性质预测
  • methods: 使用自动搜索生成高性能 GNN ensemble,并使用 variance decomposition 分解数据和模型不确定性
  • results: 在多个 benchmark 数据集上表现出色,在预测准确性和 UQ 性能方面超过现有方法,并通过 t-SNE 可视化探索分子特征和不确定性的相关性。
    Abstract Graph Neural Networks (GNNs) have emerged as a prominent class of data-driven methods for molecular property prediction. However, a key limitation of typical GNN models is their inability to quantify uncertainties in the predictions. This capability is crucial for ensuring the trustworthy use and deployment of models in downstream tasks. To that end, we introduce AutoGNNUQ, an automated uncertainty quantification (UQ) approach for molecular property prediction. AutoGNNUQ leverages architecture search to generate an ensemble of high-performing GNNs, enabling the estimation of predictive uncertainties. Our approach employs variance decomposition to separate data (aleatoric) and model (epistemic) uncertainties, providing valuable insights for reducing them. In our computational experiments, we demonstrate that AutoGNNUQ outperforms existing UQ methods in terms of both prediction accuracy and UQ performance on multiple benchmark datasets. Additionally, we utilize t-SNE visualization to explore correlations between molecular features and uncertainty, offering insight for dataset improvement. AutoGNNUQ has broad applicability in domains such as drug discovery and materials science, where accurate uncertainty quantification is crucial for decision-making.
    摘要 图神经网络(GNN)已成为分子性质预测中一类重要的数据驱动方法。然而,典型 GNN 模型的一个关键局限是无法量化预测的不确定性,而这种能力对于确保模型在下游任务中的可信使用与部署至关重要。为此,我们提出 AutoGNNUQ,一种面向分子性质预测的自动化不确定性量化(UQ)方法。AutoGNNUQ 利用架构搜索生成一组高性能的 GNN,从而估计预测的不确定性。我们的方法使用方差分解将数据(偶然)不确定性与模型(认知)不确定性分离,为降低二者提供了有价值的洞见。在计算实验中,我们证明 AutoGNNUQ 在多个基准数据集上的预测精度和 UQ 性能均优于现有的 UQ 方法。此外,我们使用 t-SNE 可视化来探索分子特征与不确定性之间的相关性,为改进数据集提供思路。AutoGNNUQ 在药物发现和材料科学等领域具有广泛的适用性,在这些领域,准确的不确定性量化对决策至关重要。
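A generic sketch of the variance decomposition the abstract mentions: under the law of total variance, an ensemble's predictive uncertainty splits into an aleatoric (data) part and an epistemic (model) part. The ensemble construction via architecture search is the paper's contribution and is not shown; shapes and names here are illustrative.

```python
import numpy as np

def decompose_uncertainty(means, variances):
    """Law-of-total-variance split for an ensemble of probabilistic regressors.
    means, variances: arrays of shape (num_models, num_samples), each model predicting
    a Gaussian N(mean, variance) per molecule."""
    aleatoric = variances.mean(axis=0)   # average predicted noise (data uncertainty)
    epistemic = means.var(axis=0)        # disagreement between models (model uncertainty)
    total = aleatoric + epistemic
    return aleatoric, epistemic, total

# Toy ensemble of 3 models on 2 samples.
means = np.array([[0.9, 2.1], [1.1, 1.5], [1.0, 2.4]])
variances = np.array([[0.04, 0.10], [0.05, 0.12], [0.04, 0.11]])
print(decompose_uncertainty(means, variances))
```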

A Bayesian Programming Approach to Car-following Model Calibration and Validation using Limited Data

  • paper_url: http://arxiv.org/abs/2307.10437
  • repo_url: None
  • paper_authors: Franklin Abodo
  • for: 这个论文是为了提供一种可以准确模拟驾驶行为的模型,以便在交通研究和工程中设计和评估道路改进计划。
  • methods: 这篇论文使用微观驾驶行为模型,并由此推导出流量与拥堵等宏观指标。然而,现有模型多数只适用于特定的交通情况和道路配置,无法直接应用于工作区(WZ)场景。因此,美国交通部(USDOT)下属负责交通研究的 Volpe 中心受委托开发一种能够准确模拟工作区内外驾驶行为的 CF 模型,以便对工作区设计进行安全等指标的优化。
  • results: 在模型开发过程中,Volpe 研究人员在模型校准方面遇到困难,因而质疑问题出在模型、数据,还是使用数据校准模型的过程。本论文使用贝叶斯方法进行数据分析和参数估计,以探讨并尽可能解决这些问题:首先,使用贝叶斯推断评估数据集规模是否充分;其次,将 Volpe 研究人员基于遗传算法的校准过程和结果与贝叶斯校准进行比较;此外,还探讨了对 CF 模型进行层次建模的好处;最后,利用已有的 CF 模型 Wiedemann 99,对 Volpe 模型进行概率建模,并通过信息准则估计预测准确性来完成验证。
    Abstract Traffic simulation software is used by transportation researchers and engineers to design and evaluate changes to roadways. These simulators are driven by models of microscopic driver behavior from which macroscopic measures like flow and congestion can be derived. Many models are designed for a subset of possible traffic scenarios and roadway configurations, while others have no explicit constraints on their application. Work zones (WZs) are one scenario for which no model to date has reproduced realistic driving behavior. This makes it difficult to optimize for safety and other metrics when designing a WZ. The Federal Highway Administration commissioned the USDOT Volpe Center to develop a car-following (CF) model for use in microscopic simulators that can capture and reproduce driver behavior accurately within and outside of WZs. Volpe also performed a naturalistic driving study to collect telematics data from vehicles driven on roads with WZs for use in model calibration. During model development, Volpe researchers observed difficulties in calibrating their model, leaving them to question whether there existed flaws in their model, in the data, or in the procedure used to calibrate the model using the data. In this thesis, I use Bayesian methods for data analysis and parameter estimation to explore and, where possible, address these questions. First, I use Bayesian inference to measure the sufficiency of the size of the data set. Second, I compare the procedure and results of the genetic algorithm based calibration performed by the Volpe researchers with those of Bayesian calibration. Third, I explore the benefits of modeling CF hierarchically. Finally, I apply what was learned in the first three phases using an established CF model, Wiedemann 99, to the probabilistic modeling of the Volpe model. Validation is performed using information criteria as an estimate of predictive accuracy.
    摘要 交通仿真软件被交通研究人员和工程师用于设计和评估道路的改动。这些仿真器由微观驾驶行为模型驱动,并由此推导出流量和拥堵等宏观指标。许多模型只适用于部分交通场景和道路配置,另一些则没有明确的适用限制。工作区(WZ)是迄今没有任何模型能够重现真实驾驶行为的一类场景,这使得在设计工作区时难以针对安全性等指标进行优化。美国联邦公路管理局委托美国交通部 Volpe 中心开发一种车辆跟驰(CF)模型,用于微观仿真器中,能够准确刻画并重现工作区内外的驾驶行为。Volpe 还开展了一项自然驾驶研究,收集在含工作区道路上行驶车辆的远程信息数据,用于模型校准。在模型开发过程中,Volpe 研究人员发现模型难以校准,从而质疑问题出在模型、数据,还是利用数据校准模型的过程。在本论文中,我使用贝叶斯方法进行数据分析和参数估计,以探讨并在可能的情况下解决这些问题。首先,我使用贝叶斯推断评估数据集规模是否充分。其次,我将 Volpe 研究人员基于遗传算法的校准过程及结果与贝叶斯校准进行比较。第三,我探讨了对 CF 模型进行层次建模的好处。最后,我利用已有的 CF 模型 Wiedemann 99,将前三个阶段的经验应用于 Volpe 模型的概率建模。验证通过信息准则估计预测准确性来完成。
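As a rough illustration of Bayesian calibration under limited data, the sketch below runs a random-walk Metropolis sampler over a single parameter of a toy car-following response. It is not the thesis's model or procedure (which targets the Volpe/Wiedemann 99 CF models); the simulator, prior, likelihood, and constants are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_speed(sensitivity, lead_speed, follow_speed, dt=0.1, steps=50):
    """Toy car-following response: the follower closes the speed gap at a rate 'sensitivity'."""
    v = follow_speed
    for _ in range(steps):
        v += sensitivity * (lead_speed - v) * dt
    return v

def log_posterior(theta, observed, lead_speed=15.0, follow_speed=10.0, noise=0.5):
    if theta <= 0 or theta > 5:                      # flat prior on (0, 5]
        return -np.inf
    pred = simulate_speed(theta, lead_speed, follow_speed)
    return -0.5 * ((observed - pred) / noise) ** 2   # Gaussian likelihood

# Random-walk Metropolis over the single parameter, given one observed final speed.
observed, theta, samples = 14.2, 1.0, []
lp = log_posterior(theta, observed)
for _ in range(5000):
    prop = theta + rng.normal(0, 0.1)
    lp_prop = log_posterior(prop, observed)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta)
print(np.mean(samples[1000:]), np.std(samples[1000:]))  # posterior mean and spread
```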

A Matrix Ensemble Kalman Filter-based Multi-arm Neural Network to Adequately Approximate Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.10436
  • repo_url: https://github.com/ved-piyush/menkf-ann-pul
  • paper_authors: Ved Piyush, Yuchen Yan, Yuzhen Zhou, Yanbin Yin, Souparno Ghosh
  • for: This paper aims to propose a new technique for approximating deep learning models, specifically long short-term memory (LSTM) networks, using a Kalman filter-based approach.
  • methods: The proposed method, called Matrix Ensemble Kalman Filter-based multi-arm ANN (MEnKF-ANN), uses a multi-arm extension of a Kalman filter to approximate LSTM networks, and also performs explicit model stacking to handle unequal-size feature sets.
  • results: The proposed method can adequately approximate LSTM networks trained to classify carbohydrate substrates based on genomic sequences, and can also provide uncertainty estimates for the predictions.
    Abstract Deep Learners (DLs) are the state-of-art predictive mechanism with applications in many fields requiring complex high dimensional data processing. Although conventional DLs get trained via gradient descent with back-propagation, Kalman Filter (KF)-based techniques that do not need gradient computation have been developed to approximate DLs. We propose a multi-arm extension of a KF-based DL approximator that can mimic DL when the sample size is too small to train a multi-arm DL. The proposed Matrix Ensemble Kalman Filter-based multi-arm ANN (MEnKF-ANN) also performs explicit model stacking that becomes relevant when the training sample has an unequal-size feature set. Our proposed technique can approximate Long Short-term Memory (LSTM) Networks and attach uncertainty to the predictions obtained from these LSTMs with desirable coverage. We demonstrate how MEnKF-ANN can "adequately" approximate an LSTM network trained to classify what carbohydrate substrates are digested and utilized by a microbiome sample whose genomic sequences consist of polysaccharide utilization loci (PULs) and their encoded genes.
    摘要 深度学习器(DL)是最先进的预测机制,应用于许多需要处理复杂高维数据的领域。尽管传统的 DL 通过梯度下降与反向传播进行训练,但也已发展出基于卡尔曼滤波(KF)、无需计算梯度的技术来近似 DL。我们提出一种基于 KF 的 DL 近似器的多臂扩展,当样本量太小而无法训练多臂 DL 时,它可以模拟 DL 的行为。所提出的基于矩阵集合卡尔曼滤波的多臂 ANN(MEnKF-ANN)还执行显式的模型堆叠,这在训练样本具有不等规模的特征集时尤为重要。我们的方法可以近似长短期记忆(LSTM)网络,并为这些 LSTM 的预测附加具有理想覆盖率的不确定性。我们展示了 MEnKF-ANN 如何“足够好地”近似一个 LSTM 网络,该网络被训练用于分类微生物组样本(其基因组序列由多糖利用位点 PULs 及其编码基因组成)能消化和利用哪些碳水化合物底物。
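The matrix ensemble KF and multi-arm architecture of MEnKF-ANN are specific to the paper; the sketch below only shows the gradient-free building block it rests on, a plain stochastic ensemble Kalman filter analysis step. Dimensions and names are illustrative.

```python
import numpy as np

def enkf_update(ensemble, y_obs, H, obs_noise_std, rng):
    """Stochastic EnKF analysis step.
    ensemble: (n_state, n_members) forecast ensemble; H: (n_obs, n_state) observation operator."""
    n_state, n_members = ensemble.shape
    mean = ensemble.mean(axis=1, keepdims=True)
    A = ensemble - mean                                   # state anomalies
    P = A @ A.T / (n_members - 1)                         # sample covariance
    R = (obs_noise_std ** 2) * np.eye(H.shape[0])
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)          # Kalman gain
    perturbed_obs = y_obs[:, None] + rng.normal(0, obs_noise_std, (H.shape[0], n_members))
    return ensemble + K @ (perturbed_obs - H @ ensemble)  # updated ensemble

rng = np.random.default_rng(0)
ens = rng.normal(0.0, 1.0, size=(2, 20))      # 2 hidden parameters, 20 ensemble members
H = np.array([[1.0, 0.0]])                    # observe the first component only
ens = enkf_update(ens, np.array([0.8]), H, obs_noise_std=0.1, rng=rng)
print(ens.mean(axis=1))
```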

Learning Formal Specifications from Membership and Preference Queries

  • paper_url: http://arxiv.org/abs/2307.10434
  • repo_url: None
  • paper_authors: Ameesh Shah, Marcell Vazquez-Chanlatte, Sebastian Junges, Sanjit A. Seshia
  • for: 学习形式规定(如自动机)的正式规定
  • methods: 提议一种新的框架, combining membership labels和对比 preference,以便更加灵活地进行活动规定学习
  • results: 在两个不同的领域中实现了框架,并证明了我们的方法可以强健地和方便地通过对比和成员标签来识别规定。
    Abstract Active learning is a well-studied approach to learning formal specifications, such as automata. In this work, we extend active specification learning by proposing a novel framework that strategically requests a combination of membership labels and pair-wise preferences, a popular alternative to membership labels. The combination of pair-wise preferences and membership labels allows for a more flexible approach to active specification learning, which previously relied on membership labels only. We instantiate our framework in two different domains, demonstrating the generality of our approach. Our results suggest that learning from both modalities allows us to robustly and conveniently identify specifications via membership and preferences.
    摘要 active learning是一种已经广泛研究的学习方法,用于学习正式规则,如自动机。在这项工作中,我们将活动规则学习框架扩展到请求组合会员标签和对比性偏好。这种组合方式允许我们更加灵活地进行活动规则学习,之前只能通过会员标签进行学习。我们在两个不同领域中实现了我们的框架,并证明了我们的方法的通用性。我们的结果表明,从两种模式学习可以强大地和方便地识别规则via会员和偏好。

DP-TBART: A Transformer-based Autoregressive Model for Differentially Private Tabular Data Generation

  • paper_url: http://arxiv.org/abs/2307.10430
  • repo_url: None
  • paper_authors: Rodrigo Castellon, Achintya Gopal, Brian Bloniarz, David Rosenberg
  • for: 生成具有差分隐私保证的表格数据
  • methods: 使用 transformer 自回归模型实现差分隐私的数据生成
  • results: 在多种数据集上达到与基于边缘分布的方法相竞争的性能,在某些情况下甚至超越最先进方法的表现
    Abstract The generation of synthetic tabular data that preserves differential privacy is a problem of growing importance. While traditional marginal-based methods have achieved impressive results, recent work has shown that deep learning-based approaches tend to lag behind. In this work, we present Differentially-Private TaBular AutoRegressive Transformer (DP-TBART), a transformer-based autoregressive model that maintains differential privacy and achieves performance competitive with marginal-based methods on a wide variety of datasets, capable of even outperforming state-of-the-art methods in certain settings. We also provide a theoretical framework for understanding the limitations of marginal-based approaches and where deep learning-based approaches stand to contribute most. These results suggest that deep learning-based techniques should be considered as a viable alternative to marginal-based methods in the generation of differentially private synthetic tabular data.
    摘要 “生成保持差分隐私的合成表格数据是一个日益重要的问题。虽然传统的基于边缘分布的方法已取得了令人瞩目的成果,但最近的研究表明,基于深度学习的方法往往落后于前者。在本文中,我们提出了差分隐私表格自回归变换器(DP-TBART),这是一种基于 transformer 的自回归模型,能够在保持差分隐私的同时,在各种数据集上达到与基于边缘分布的方法相竞争的性能,在某些情形下甚至超越最先进的方法。我们还提供了一个理论框架,用于理解基于边缘分布方法的局限性,以及基于深度学习的方法最能发挥作用之处。这些结果表明,在生成差分隐私合成表格数据时,基于深度学习的技术应被视为基于边缘分布方法的可行替代方案。”

PreDiff: Precipitation Nowcasting with Latent Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.10422
  • repo_url: None
  • paper_authors: Zhihan Gao, Xingjian Shi, Boran Han, Hao Wang, Xiaoyong Jin, Danielle Maddix, Yi Zhu, Mu Li, Yuyang Wang
  • for: 预测地球系统的未来状况,使用深度学习技术来处理大量的空间时间数据。
  • methods: 提出了一种两stage管道,首先开发了一种可能性扩散模型(PreDiff),其可以进行 probabilistic 预测。其次,通过explicit地控制知识机制,使预测结果与专业知识相一致。
  • results: 通过在Synthetic dataset N-body MNIST和实际 precipitation nowcasting dataset SEVIR进行实验,确认了PreDiff的可行性和Domain-specific prior knowledge的可控性,并且预测结果具有高度的操作实用性。
    Abstract Earth system forecasting has traditionally relied on complex physical models that are computationally expensive and require significant domain expertise. In the past decade, the unprecedented increase in spatiotemporal Earth observation data has enabled data-driven forecasting models using deep learning techniques. These models have shown promise for diverse Earth system forecasting tasks but either struggle with handling uncertainty or neglect domain-specific prior knowledge, resulting in averaging possible futures to blurred forecasts or generating physically implausible predictions. To address these limitations, we propose a two-stage pipeline for probabilistic spatiotemporal forecasting: 1) We develop PreDiff, a conditional latent diffusion model capable of probabilistic forecasts. 2) We incorporate an explicit knowledge control mechanism to align forecasts with domain-specific physical constraints. This is achieved by estimating the deviation from imposed constraints at each denoising step and adjusting the transition distribution accordingly. We conduct empirical studies on two datasets: N-body MNIST, a synthetic dataset with chaotic behavior, and SEVIR, a real-world precipitation nowcasting dataset. Specifically, we impose the law of conservation of energy in N-body MNIST and anticipated precipitation intensity in SEVIR. Experiments demonstrate the effectiveness of PreDiff in handling uncertainty, incorporating domain-specific prior knowledge, and generating forecasts that exhibit high operational utility.
    摘要 地球系统预报传统上依赖复杂的物理模型,这些模型计算成本高昂且需要大量领域知识。过去十年,时空地球观测数据的空前增长使得基于深度学习的数据驱动预报模型成为可能。这些模型在多种地球系统预报任务上展现出潜力,但它们要么难以处理不确定性,要么忽视领域特定的先验知识,导致对可能的未来取平均而产生模糊的预报,或者生成物理上不合理的预测。为了解决这些局限,我们提出了一个用于概率性时空预报的两阶段管线:1)我们开发了 PreDiff,一种能够进行概率预报的条件潜在扩散模型;2)我们引入显式的知识控制机制,使预报符合领域特定的物理约束。其实现方式是在每个去噪步骤估计对施加约束的偏离程度,并相应调整转移分布。我们在两个数据集上开展了实证研究:具有混沌行为的合成数据集 N-body MNIST,以及真实世界的降水临近预报数据集 SEVIR。具体而言,我们在 N-body MNIST 中施加能量守恒定律,在 SEVIR 中施加预期降水强度约束。实验证明了 PreDiff 在处理不确定性、融合领域先验知识以及生成具有高操作实用性的预报方面的有效性。
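The knowledge-control mechanism in PreDiff estimates the deviation from an imposed constraint at each denoising step and adjusts the transition distribution accordingly. The sketch below shows the general guidance-style idea on a toy "conservation" constraint with a stand-in denoiser; it is an illustration of the concept, not the paper's model.

```python
import numpy as np

def violation(x, target_total=1.0):
    """Deviation from a toy conservation constraint: the field should sum to target_total."""
    return x.sum() - target_total

def knowledge_controlled_denoise(x_t, denoise_step, num_steps=50, step_size=1.0):
    """After each reverse-diffusion transition, estimate the constraint violation of the
    current sample and shift it to reduce the violation (a guidance-style adjustment)."""
    x = x_t
    for t in reversed(range(num_steps)):
        x = denoise_step(x, t)                        # ordinary reverse-diffusion transition
        x = x - step_size * violation(x) / x.size     # adjust toward the constraint
    return x

# Stand-in "denoiser": shrink toward zero and add a little noise.
rng = np.random.default_rng(0)
toy_step = lambda x, t: 0.95 * x + 0.01 * rng.normal(size=x.shape)
sample = knowledge_controlled_denoise(rng.normal(size=(8, 8)), toy_step)
print(round(sample.sum(), 6))  # ~1.0: the final adjustment enforces the toy constraint exactly
```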

Technical Challenges of Deploying Reinforcement Learning Agents for Game Testing in AAA Games

  • paper_url: http://arxiv.org/abs/2307.11105
  • repo_url: None
  • paper_authors: Jonas Gillberg, Joakim Bergdahl, Alessandro Sestini, Andrew Eakins, Linus Gisslen
  • for: 这个技术论文是为了推广游戏生产中的机器学习技术,特意是通过让游戏自动化测试解决方案中加入了实验性的学习系统来提高测试覆盖率。
  • methods: 这篇技术论文描述了一种将学习系统与现有的脚本化测试解决方案集成,以提高测试覆盖率。具体来说,他们使用了一种基于强化学习的方法,通过让机器学习算法学习自动化测试过程中的优化策略,以提高测试效果。
  • results: 据文章报道,通过将学习系统与脚本化测试解决方案集成,可以有效提高测试覆盖率,并且在一些AAA游戏,如《战场2042》和《黑暗空间2023》中实现了一定的成果。
    Abstract Going from research to production, especially for large and complex software systems, is fundamentally a hard problem. In large-scale game production, one of the main reasons is that the development environment can be very different from the final product. In this technical paper we describe an effort to add an experimental reinforcement learning system to an existing automated game testing solution based on scripted bots in order to increase its capacity. We report on how this reinforcement learning system was integrated with the aim to increase test coverage similar to [1] in a set of AAA games including Battlefield 2042 and Dead Space (2023). The aim of this technical paper is to show a use-case of leveraging reinforcement learning in game production and cover some of the largest time sinks anyone who wants to make the same journey for their game may encounter. Furthermore, to help the game industry to adopt this technology faster, we propose a few research directions that we believe will be valuable and necessary for making machine learning, and especially reinforcement learning, an effective tool in game production.
    摘要 从研究走向生产,特别是对于大型复杂的软件系统而言,本质上是一个困难的问题。在大规模游戏制作中,一个主要原因是开发环境可能与最终产品差别很大。在这篇技术论文中,我们描述了将一个实验性的强化学习系统加入现有的基于脚本 Bot 的自动游戏测试方案、以提升其能力的尝试。我们报告了该强化学习系统的集成方式,其目标是在包括《Battlefield 2042》和《Dead Space (2023)》在内的多款 AAA 游戏中提升测试覆盖率。这篇技术论文旨在展示在游戏制作中利用强化学习的一个用例,并介绍任何想为自己的游戏走同样道路的人可能遇到的主要时间成本。此外,为了帮助游戏行业更快地采用这项技术,我们提出了一些我们认为对使机器学习、尤其是强化学习成为游戏制作中有效工具而言有价值且必要的研究方向。

Interpreting and Correcting Medical Image Classification with PIP-Net

  • paper_url: http://arxiv.org/abs/2307.10404
  • repo_url: https://github.com/m-nauta/pipnet
  • paper_authors: Meike Nauta, Johannes H. Hegeman, Jeroen Geerdink, Jörg Schlötterer, Maurice van Keulen, Christin Seifert
  • for: 这篇论文旨在探讨可解释的机器学习模型在医学影像分类 tasks 中的应用性和潜力。
  • methods: 论文使用的是PIP-Net模型,这是一种可解释的图像分类模型,它学习了人类理解的图像部件。
  • results: 研究发现,PIP-Net 的决策过程与医学分类标准相一致,只需要提供图像级别的类别标签。此外,研究还发现了如何通过直接禁用不想要的原型来人工修正PIP-Net的思维。
    Abstract Part-prototype models are explainable-by-design image classifiers, and a promising alternative to black box AI. This paper explores the applicability and potential of interpretable machine learning, in particular PIP-Net, for automated diagnosis support on real-world medical imaging data. PIP-Net learns human-understandable prototypical image parts and we evaluate its accuracy and interpretability for fracture detection and skin cancer diagnosis. We find that PIP-Net's decision making process is in line with medical classification standards, while only provided with image-level class labels. Because of PIP-Net's unsupervised pretraining of prototypes, data quality problems such as undesired text in an X-ray or labelling errors can be easily identified. Additionally, we are the first to show that humans can manually correct the reasoning of PIP-Net by directly disabling undesired prototypes. We conclude that part-prototype models are promising for medical applications due to their interpretability and potential for advanced model debugging.
    摘要 部件-原型模型是一类设计上即可解释的图像分类器,是黑箱 AI 的有前景的替代方案。本文探讨了可解释机器学习(特别是 PIP-Net)在真实医学影像数据上用于自动诊断辅助的适用性与潜力。PIP-Net 学习人类可理解的原型图像部件,我们评估了其在骨折检测和皮肤癌诊断中的准确性与可解释性。我们发现,在仅提供图像级类别标签的情况下,PIP-Net 的决策过程与医学分类标准相一致。由于 PIP-Net 对原型进行无监督预训练,诸如 X 光片中不应出现的文字或标注错误等数据质量问题可以被轻松识别。此外,我们首次展示了人类可以通过直接禁用不想要的原型来手动修正 PIP-Net 的推理。我们的结论是,部件-原型模型因其可解释性和高级模型调试的潜力,在医学应用中很有前景。
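PIP-Net scores an image by the presence of learned prototypes feeding a linear classification head, so the manual correction described in the abstract amounts to removing an undesired prototype's contribution. The sketch below illustrates that operation on a simplified head; the actual architecture differs in its details.

```python
import torch
import torch.nn as nn

# A PIP-Net-style head is (roughly) a linear layer from prototype presence scores to classes.
num_prototypes, num_classes = 8, 2
head = nn.Linear(num_prototypes, num_classes, bias=False)

def disable_prototype(head, proto_idx):
    """Manually correct the model: remove an undesired prototype (e.g. one that fires on
    scanner text artifacts) from all class evidence by zeroing its weights."""
    with torch.no_grad():
        head.weight[:, proto_idx] = 0.0

presence = torch.rand(1, num_prototypes)     # prototype presence scores for one image
before = head(presence)
disable_prototype(head, proto_idx=3)         # prototype 3 no longer contributes
after = head(presence)
print(before, after)
```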

Selection functions of strong lens finding neural networks

  • paper_url: http://arxiv.org/abs/2307.10355
  • repo_url: None
  • paper_authors: A. Herle, C. M. O’Riordan, S. Vegetti
  • for: 这个论文主要目的是研究深 gravitational lens 系统中 neural network 的偏袋性。
  • methods: 这个论文使用了与现有文献中常用架构和训练数据相似的卷积神经网络,以及三个不同的训练集,来研究透镜搜寻神经网络的选择函数。
  • results: 研究发现,这些 neural network 偏好 larger Einstein radii 和更集中的 source-light distributions。增加检测重要性阈值可以改善选择函数的效果。
    Abstract Convolution Neural Networks trained for the task of lens finding with similar architecture and training data as is commonly found in the literature are biased classifiers. An understanding of the selection function of lens finding neural networks will be key to fully realising the potential of the large samples of strong gravitational lens systems that will be found in upcoming wide-field surveys. We use three training datasets, representative of those used to train galaxy-galaxy and galaxy-quasar lens finding neural networks. The networks preferentially select systems with larger Einstein radii and larger sources with more concentrated source-light distributions. Increasing the detection significance threshold to 12$\sigma$ from 8$\sigma$ results in 50 per cent of the selected strong lens systems having Einstein radii $\theta_\mathrm{E}$ $\ge$ 1.04 arcsec from $\theta_\mathrm{E}$ $\ge$ 0.879 arcsec, source radii $R_S$ $\ge$ 0.194 arcsec from $R_S$ $\ge$ 0.178 arcsec and source S\'ersic indices $n_{\mathrm{Sc}}^{\mathrm{S}}$ $\ge$ 2.62 from $n_{\mathrm{Sc}}^{\mathrm{S}}$ $\ge$ 2.55. The model trained to find lensed quasars shows a stronger preference for higher lens ellipticities than those trained to find lensed galaxies. The selection function is independent of the slope of the power-law of the mass profiles, hence measurements of this quantity will be unaffected. The lens finder selection function reinforces that of the lensing cross-section, and thus we expect our findings to be a general result for all galaxy-galaxy and galaxy-quasar lens finding neural networks.
    摘要 采用与文献中常见的架构和训练数据训练的用于透镜搜寻任务的卷积神经网络是有偏的分类器。理解透镜搜寻神经网络的选择函数,是充分发挥即将开展的大视场巡天中将发现的大量强引力透镜系统潜力的关键。我们使用了三个训练数据集,它们代表了用于训练星系-星系和星系-类星体透镜搜寻神经网络的数据。这些网络更偏向选择具有更大爱因斯坦半径、以及更大且源光分布更集中的源的系统。将探测显著性阈值从 8σ 提高到 12σ,会使 50% 被选出的强透镜系统的爱因斯坦半径 θE 由 ≥0.879 角秒变为 ≥1.04 角秒,源半径 RS 由 ≥0.178 角秒变为 ≥0.194 角秒,源 Sérsic 指数由 ≥2.55 变为 ≥2.62。用于寻找被透镜类星体的模型比用于寻找被透镜星系的模型对更高的透镜椭率表现出更强的偏好。选择函数与质量轮廓幂律斜率无关,因此对该量的测量不会受到影响。透镜搜寻器的选择函数强化了透镜截面的选择效应,因此我们预期这一发现对所有星系-星系和星系-类星体透镜搜寻神经网络都是普适的结果。

LightPath: Lightweight and Scalable Path Representation Learning

  • paper_url: http://arxiv.org/abs/2307.10171
  • repo_url: None
  • paper_authors: Sean Bin Yang, Jilin Hu, Chenjuan Guo, Bin Yang, Christian S. Jensen
  • for: 本文提出了一种轻量级和可扩展的路径表示学习框架,用于智能交通和智能城市应用。
  • methods: 提议使用稀疏自编码器、关系推理框架以及全局-局部知识蒸馏,以减少资源消耗并提高可扩展性,同时保持准确性。
  • results: 经过广泛的实验验证了该框架的可扩展性和精度,并且在资源有限的环境中具有优势。
    Abstract Movement paths are used widely in intelligent transportation and smart city applications. To serve such applications, path representation learning aims to provide compact representations of paths that enable efficient and accurate operations when used for different downstream tasks such as path ranking and travel cost estimation. In many cases, it is attractive that the path representation learning is lightweight and scalable; in resource-limited environments and under green computing limitations, it is essential. Yet, existing path representation learning studies focus on accuracy and pay at most secondary attention to resource consumption and scalability. We propose a lightweight and scalable path representation learning framework, termed LightPath, that aims to reduce resource consumption and achieve scalability without affecting accuracy, thus enabling broader applicability. More specifically, we first propose a sparse auto-encoder that ensures that the framework achieves good scalability with respect to path length. Next, we propose a relational reasoning framework to enable faster training of more robust sparse path encoders. We also propose global-local knowledge distillation to further reduce the size and improve the performance of sparse path encoders. Finally, we report extensive experiments on two real-world datasets to offer insight into the efficiency, scalability, and effectiveness of the proposed framework.
    摘要 路径表示法广泛应用于智能交通和智能城市应用程序中。为了满足这些应用程序,路径表示学习目标是提供高效精度的路径表示,以便在不同的下游任务中进行高效的操作,如路径排名和旅行费用估算。在资源有限的环境和绿色计算限制下,现有的路径表示学习研究通常强调精度,并且只在必要的情况下进行次要的考虑。我们提出了一个轻量级和可扩展的路径表示学习框架,称为LightPath,以降低资源消耗和实现可扩展性,而不影响准确性。更 Specifically,我们首先提出了一个稀疏自动编码器,以确保框架在路径长度方面具有良好的扩展性。然后,我们提出了一个关系理解框架,以更快地训练更加稀疏的路径编码器。 finally,我们提出了全球-本地知识传播,以进一步减小路径编码器的大小和提高其性能。我们在两个真实世界数据集上进行了广泛的实验,以提供有关效率、可扩展性和效果的深入了解。

Challenges and Applications of Large Language Models

  • paper_url: http://arxiv.org/abs/2307.10169
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, Robert McHardy
  • for: 本研究旨在为机器学习研究人员提供一个系统的开放问题和成功应用领域,以便更快地了解大语言模型(LLMs)领域的当前状态,并更快地成为产ктив的研究人员。
  • methods: 本研究使用了系统的Literature Review和问题定义方法,以掌握大语言模型领域的当前状态和未解决问题。
  • results: 本研究提出了一系列的开放问题和成功应用领域,以便 ML 研究人员更快地了解大语言模型领域的当前状态,并更快地成为产ктив的研究人员。
    Abstract Large Language Models (LLMs) went from non-existent to ubiquitous in the machine learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify the remaining challenges and already fruitful application areas. In this paper, we aim to establish a systematic set of open problems and application successes so that ML researchers can comprehend the field's current state more quickly and become productive.
    摘要 大语言模型(LLM)在短短几年内从不存在变为机器学习讨论中无处不在。由于该领域发展迅速,人们难以识别尚待解决的挑战和已经取得成果的应用领域。在本文中,我们旨在建立一套系统性的开放问题与成功应用,以便机器学习研究人员能够更快地了解该领域的现状,并尽快提高生产力。

VITS : Variational Inference Thomson Sampling for contextual bandits

  • paper_url: http://arxiv.org/abs/2307.10167
  • repo_url: None
  • paper_authors: Pierre Clavier, Tom Huix, Alain Durmus
  • for: 这 paper 是关于 contextual bandits 的一种变体 Thompson sampling(TS)算法的研究。
  • methods: 该算法使用 Gaussian Variational Inference 提供高效的 posterior 近似,并且可以轻松地从近似中采样。
  • results: 该paper 表明 VITS 算法可以实现 sub-linear regret bound,并且在 synthetic 和实际世界数据上进行了实验验证。Here’s the breakdown of each point in English:* For: The paper is about a variant of the Thompson sampling algorithm for contextual bandits.* Methods: The algorithm uses Gaussian Variational Inference to provide efficient posterior approximations, and can easily sample from these approximations.* Results: The paper shows that the VITS algorithm can achieve a sub-linear regret bound, and demonstrates its effectiveness through experiments on both synthetic and real-world datasets.
    Abstract In this paper, we introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits. At each round, traditional TS requires samples from the current posterior distribution, which is usually intractable. To circumvent this issue, approximate inference techniques can be used and provide samples with distribution close to the posteriors. However, current approximate techniques yield to either poor estimation (Laplace approximation) or can be computationally expensive (MCMC methods, Ensemble sampling...). In this paper, we propose a new algorithm, Varational Inference Thompson sampling VITS, based on Gaussian Variational Inference. This scheme provides powerful posterior approximations which are easy to sample from, and is computationally efficient, making it an ideal choice for TS. In addition, we show that VITS achieves a sub-linear regret bound of the same order in the dimension and number of round as traditional TS for linear contextual bandit. Finally, we demonstrate experimentally the effectiveness of VITS on both synthetic and real world datasets.
    摘要 在本文中,我们提出并分析了用于上下文多臂赌博机(contextual bandits)的汤普森采样(TS)算法的一种变体。在每一轮中,传统的 TS 需要从当前的后验分布中采样,而这通常是不可行的。为了绕过这一问题,可以使用近似推断技术来提供分布接近后验的样本。然而,现有的近似技术要么导致较差的估计(拉普拉斯近似),要么计算代价高昂(MCMC 方法、集成采样等)。在本文中,我们提出了一个基于高斯变分推断的新算法——变分推断汤普森采样(VITS)。该方案提供了易于采样且强大的后验近似,并且计算高效,是 TS 的理想选择。此外,我们证明对于线性上下文赌博机,VITS 取得了与传统 TS 在维度和轮数上同阶的次线性遗憾界。最后,我们在合成数据和真实世界数据上实验性地展示了 VITS 的有效性。
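A compact sketch of Thompson sampling from a Gaussian posterior approximation on a linear contextual bandit. Because this toy model is linear-Gaussian, the Gaussian posterior can be written in closed form and is used here as a stand-in for the variational fit; VITS's contribution is making such a Gaussian approximation cheap and accurate when the exact posterior is intractable. All constants and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, noise = 3, 0.1
true_theta = np.array([1.0, -0.5, 0.3])
X, y = np.zeros((0, d)), np.zeros(0)

for t in range(200):
    arms = rng.normal(size=(5, d))                      # candidate contexts this round
    # Gaussian posterior approximation q(theta) = N(mu, Sigma) given the data so far.
    # Exact here because the model is linear-Gaussian; VI would fit such a q when it is not.
    Sigma = np.linalg.inv(np.eye(d) + X.T @ X / noise**2)
    mu = Sigma @ X.T @ y / noise**2
    theta_sample = rng.multivariate_normal(mu, Sigma)   # Thompson draw from q
    a = int(np.argmax(arms @ theta_sample))             # act greedily w.r.t. the sampled model
    r = arms[a] @ true_theta + noise * rng.normal()
    X, y = np.vstack([X, arms[a]]), np.append(y, r)

print(mu)  # concentrates near true_theta as observations accumulate
```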

Improving Multimodal Datasets with Image Captioning

  • paper_url: http://arxiv.org/abs/2307.10350
  • repo_url: None
  • paper_authors: Thao Nguyen, Samir Yitzhak Gadre, Gabriel Ilharco, Sewoong Oh, Ludwig Schmidt
  • for: 提高大型视觉语言模型的成功,如CLIP和Flamingo。
  • methods: 研究如何使用生成的标题提高web数据的Utility,并比较不同混合策略的性能。
  • results: 与DataComp benchmark中提出的最佳策略相比,我们的方法在ImageNet和38个任务中提高了2%和4%的性能,并在Flickr和MS-COCO检索中表现了2倍的提升。我们还分析了生成标题的效果,并证明标准图像描述标准不是多Modal训练中标题的可靠指标。最后,我们在大规模的DataComp中进行了实验,探讨生成标题在大量训练数据量下的局限性,以及图像淘汰的重要性。
    Abstract Massive web datasets play a key role in the success of large vision-language models like CLIP and Flamingo. However, the raw web data is noisy, and existing filtering methods to reduce noise often come at the expense of data diversity. Our work focuses on caption quality as one major source of noise, and studies how generated captions can increase the utility of web-scraped datapoints with nondescript text. Through exploring different mixing strategies for raw and generated captions, we outperform the best filtering method proposed by the DataComp benchmark by 2% on ImageNet and 4% on average across 38 tasks, given a candidate pool of 128M image-text pairs. Our best approach is also 2x better at Flickr and MS-COCO retrieval. We then analyze what makes synthetic captions an effective source of text supervision. In experimenting with different image captioning models, we also demonstrate that the performance of a model on standard image captioning benchmarks (e.g., NoCaps CIDEr) is not a reliable indicator of the utility of the captions it generates for multimodal training. Finally, our experiments with using generated captions at DataComp's large scale (1.28B image-text pairs) offer insights into the limitations of synthetic text, as well as the importance of image curation with increasing training data quantity.
    摘要 大量网络数据对于大型视觉语言模型如CLIP和Flamingo的成功起到了关键作用。然而,原始网络数据具有噪音,现有的过滤方法通常会导致数据多样性减少。我们的工作将注意力点在caption质量上,研究如何使用生成的caption提高网络抓取到的文本点的使用价值。通过不同的混合策略来融合原始和生成的caption,我们在ImageNet和38个任务上比DataCompbenchmark中的最佳过滤方法提高2%和4%。我们的最佳方法还在Flickr和MS-COCO检索中表现出2倍的好干净性。我们还分析了生成caption的有效性源泉,并通过不同的图像描述模型的实验,发现标准图像描述benchmark(如NoCaps CIDEr)中模型的性能不是训练多模式时caption的用途的可靠指标。最后,我们在DataComp的大规模数据(1.28B image-text pair)上进行实验,提供了生成caption的局限性以及图像筛选的重要性,随着训练数据量的增加。

Rethinking Backdoor Attacks

  • paper_url: http://arxiv.org/abs/2307.10163
  • repo_url: https://github.com/lancopku/SOS
  • paper_authors: Alaa Khaddaj, Guillaume Leclerc, Aleksandar Makelov, Kristian Georgiev, Hadi Salman, Andrew Ilyas, Aleksander Madry
  • for: 本文旨在探讨Backdoor攻击的问题,即敌对者在训练集中插入恶意构建的例子,以让模型易受欺诈。
  • methods: 本文提出了一种新的方法来抗击Backdoor攻击,即通过无结构信息对训练数据分布进行检测,并使用Robust统计技术来检测和移除恶意构建的例子。
  • results: 本文的结果表明,在缺乏结构信息的情况下,Backdoor攻击是不可识别的,而且与自然出现的特征相同。基于此观察,本文检视了现有的Backdoor攻击防御方法,并描述了它们的假设和依赖关系。最后,本文提出了一种新的假设,即Backdoor攻击对应于训练数据中最强的特征。基于这个假设,本文开发了一种新的检测算法,具有理论保证和实际效果。
    Abstract In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation. Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them. In this work, we present a different approach to the backdoor attack problem. Specifically, we show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data--and thus impossible to "detect" in a general sense. Then, guided by this observation, we revisit existing defenses against backdoor attacks and characterize the (often latent) assumptions they make and on which they depend. Finally, we explore an alternative perspective on backdoor attacks: one that assumes these attacks correspond to the strongest feature in the training data. Under this assumption (which we make formal) we develop a new primitive for detecting backdoor attacks. Our primitive naturally gives rise to a detection algorithm that comes with theoretical guarantees and is effective in practice.
    摘要 在后门攻击中,攻击者向训练集中插入恶意构造的后门样本,使得到的模型易于被操纵。防御此类攻击通常将这些插入的样本视为训练集中的离群点,并利用鲁棒统计技术来检测和移除它们。在这项工作中,我们对后门攻击问题提出了一种不同的视角。具体而言,我们表明在缺乏关于训练数据分布的结构信息时,后门攻击与数据中自然出现的特征无法区分,因而在一般意义上不可能被“检测”。随后,在这一观察的指引下,我们重新审视了现有的后门攻击防御方法,并刻画了它们所依赖的(往往是隐含的)假设。最后,我们探索了看待后门攻击的另一种视角:假设这些攻击对应于训练数据中最强的特征。在这一(经我们形式化的)假设下,我们开发了一种用于检测后门攻击的新原语。该原语自然地引出了一种检测算法,它具有理论保证并且在实践中有效。

Robust Driving Policy Learning with Guided Meta Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.10160
  • repo_url: None
  • paper_authors: Kanghoon Lee, Jiachen Li, David Isele, Jinkyoo Park, Kikuo Fujimura, Mykel J. Kochenderfer
  • for: 增强自动驾驶车辆在互动交通场景中的自适应能力
  • methods: 使用随机化的基于交互的奖励函数生成多样化目标,并通过实现这些目标的引导策略来训练一个统一的元策略
  • results: 在一个具有挑战性的无信号 T 形路口场景中,成功训练出一个鲁棒的自车驾驶策略,能够泛化到未见过的分布外社交车辆行为
    Abstract Although deep reinforcement learning (DRL) has shown promising results for autonomous navigation in interactive traffic scenarios, existing work typically adopts a fixed behavior policy to control social vehicles in the training environment. This may cause the learned driving policy to overfit the environment, making it difficult to interact well with vehicles with different, unseen behaviors. In this work, we introduce an efficient method to train diverse driving policies for social vehicles as a single meta-policy. By randomizing the interaction-based reward functions of social vehicles, we can generate diverse objectives and efficiently train the meta-policy through guiding policies that achieve specific objectives. We further propose a training strategy to enhance the robustness of the ego vehicle's driving policy using the environment where social vehicles are controlled by the learned meta-policy. Our method successfully learns an ego driving policy that generalizes well to unseen situations with out-of-distribution (OOD) social agents' behaviors in a challenging uncontrolled T-intersection scenario.
    摘要 尽管深度强化学习(DRL)在交互式交通场景中的自主导航方面展现出了可观的成果,但现有工作通常在训练环境中采用固定的行为策略来控制社交车辆。这可能使学习到的驾驶策略过拟合于该环境,难以与具有不同、未见过行为的车辆良好交互。在这项工作中,我们提出了一种高效的方法,将社交车辆的多样化驾驶策略训练为单一的元策略。通过随机化社交车辆基于交互的奖励函数,我们可以生成多样化的目标,并借助实现特定目标的引导策略高效地训练该元策略。我们进一步提出了一种训练策略,在社交车辆由学习到的元策略控制的环境中增强自车驾驶策略的鲁棒性。我们的方法成功地学习出一个自车驾驶策略,它能够在具有挑战性的无信号 T 形路口场景中,很好地泛化到社交智能体行为分布外(OOD)的未见情形。

Curvature-based Clustering on Graphs

  • paper_url: http://arxiv.org/abs/2307.10155
  • repo_url: https://github.com/agosztolai/geometric_clustering
  • paper_authors: Yu Tian, Zachary Lubberts, Melanie Weber
  • for: 本文研究了一种基于图形学的无监督节点划分(或社区检测)算法,用于找到图中紧密连接的子结构,即社区或群体。
  • methods: 本文使用离散 Ricci 曲率及其相应的几何流来揭示图中的社区结构,并考虑和分析了多种离散曲率概念。
  • results: 本文提供了both theoretical 和 empirical 证明,证明了我们的 curvature-based 划分算法的实用性。 此外,还提供了一些关于图形 curvature 和其对副图 curvature 的关系的结果,可能对 curvature-based 网络分析有独立的价值。
    Abstract Unsupervised node clustering (or community detection) is a classical graph learning task. In this paper, we study algorithms, which exploit the geometry of the graph to identify densely connected substructures, which form clusters or communities. Our method implements discrete Ricci curvatures and their associated geometric flows, under which the edge weights of the graph evolve to reveal its community structure. We consider several discrete curvature notions and analyze the utility of the resulting algorithms. In contrast to prior literature, we study not only single-membership community detection, where each node belongs to exactly one community, but also mixed-membership community detection, where communities may overlap. For the latter, we argue that it is beneficial to perform community detection on the line graph, i.e., the graph's dual. We provide both theoretical and empirical evidence for the utility of our curvature-based clustering algorithms. In addition, we give several results on the relationship between the curvature of a graph and that of its dual, which enable the efficient implementation of our proposed mixed-membership community detection approach and which may be of independent interest for curvature-based network analysis.
    摘要 无监督节点聚类(或社区发现)是一项经典的图学习任务。在本文中,我们研究利用图的几何结构来识别紧密连接的子结构(即聚类或社区)的算法。我们的方法采用离散 Ricci 曲率及其相应的几何流,在其作用下,图的边权不断演化,从而揭示图的社区结构。我们考虑了多种离散曲率概念,并分析了由此得到的算法的效用。与已有文献不同,我们不仅研究每个节点恰好属于一个社区的单成员社区发现,还研究社区之间可能重叠的混合成员社区发现。对于后者,我们论证在线图(即原图的对偶)上进行社区发现是有益的。我们为所提出的基于曲率的聚类算法提供了理论与实证两方面的证据。此外,我们给出了关于图的曲率与其对偶图曲率之间关系的若干结果,这些结果使我们提出的混合成员社区发现方法得以高效实现,并且对基于曲率的网络分析本身也可能具有独立的价值。
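To see why curvature exposes community structure, the sketch below uses the simpler (triangle-free) Forman curvature and a one-shot edge-removal heuristic: edges bridging dense regions are the most negatively curved, so dropping them leaves the communities as connected components. This only illustrates the intuition, not the paper's Ricci-flow algorithm; names and the threshold are ours.

```python
import networkx as nx

def forman_curvature(G, u, v):
    """Simplified Forman-Ricci curvature of an unweighted edge (triangle terms ignored):
    strongly negative values flag bridge-like edges between dense regions."""
    return 4 - G.degree(u) - G.degree(v)

def curvature_communities(G, threshold):
    """Drop edges curved below the threshold and read communities off the components."""
    H = G.copy()
    H.remove_edges_from([(u, v) for u, v in G.edges() if forman_curvature(G, u, v) < threshold])
    return list(nx.connected_components(H))

# Two 5-cliques joined by a single bridge edge: the bridge has the most negative curvature.
G = nx.disjoint_union(nx.complete_graph(5), nx.complete_graph(5))
G.add_edge(0, 5)
print(curvature_communities(G, threshold=-5))  # [{0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}]
```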

Code Detection for Hardware Acceleration Using Large Language Models

  • paper_url: http://arxiv.org/abs/2307.10348
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Pablo Antonio Martínez, Gregorio Bernabé, José Manuel García
  • for: 本研究探讨了使用大型自然语言模型(LLM)进行代码检测。
  • methods: 我们提出了一种初步的提示策略和一种新的提示策略来实现代码检测。
  • results: 结果表明,我们的新提示策略可以减少假阳性,实现了优秀的总准确率(91.1%, 97.9%, 和 99.7%)。这些结果对现有的代码检测方法提出了明显的挑战。
    Abstract Large language models (LLMs) have been massively applied to many tasks, often surpassing state-of-the-art approaches. While their effectiveness in code generation has been extensively studied (e.g., AlphaCode), their potential for code detection remains unexplored. This work presents the first analysis of code detection using LLMs. Our study examines essential kernels, including matrix multiplication, convolution, and fast-fourier transform, implemented in C/C++. We propose both a preliminary, naive prompt and a novel prompting strategy for code detection. Results reveal that conventional prompting achieves great precision but poor accuracy (68.8%, 22.3%, and 79.2% for GEMM, convolution, and FFT, respectively) due to a high number of false positives. Our novel prompting strategy substantially reduces false positives, resulting in excellent overall accuracy (91.1%, 97.9%, and 99.7%, respectively). These results pose a considerable challenge to existing state-of-the-art code detection methods.
    摘要 大语言模型(LLM)已被广泛应用于多种任务,常常超越现有的最先进方法。虽然它们在代码生成方面的有效性已被广泛研究(例如 AlphaCode),但其在代码检测方面的潜力尚未被探索。本研究首次分析了使用 LLM 进行代码检测。我们的研究考察了以 C/C++ 实现的若干重要核心算子,包括矩阵乘法、卷积和快速傅里叶变换。我们提出了一个初步的朴素提示以及一种新的提示策略用于代码检测。结果显示,传统提示的精确率很高,但由于大量假阳性,准确率较低(GEMM、卷积和 FFT 的准确率分别为 68.8%、22.3% 和 79.2%)。我们的新提示策略大幅减少了假阳性,取得了优异的整体准确率(分别为 91.1%、97.9% 和 99.7%)。这些结果对现有的最先进代码检测方法提出了相当大的挑战。

Benchmarking Potential Based Rewards for Learning Humanoid Locomotion

  • paper_url: http://arxiv.org/abs/2307.10142
  • repo_url: https://github.com/se-hwan/pbrs-humanoid
  • paper_authors: Se Hwan Jeon, Steve Heim, Charles Khazoom, Sangbae Kim
  • for: 本研究旨在 benchmarking 标准的 reward shaping 方法和 potential based reward shaping (PBRS) 方法,以加速 reinforcement learning (RL) 的学习速度。
  • methods: 本研究使用了 humanoid robot 进行实验,并对两种 reward shaping 方法进行比较。
  • results: 研究发现,在高维系统中,PBRS 的优化效果只有较小的改善,但 PBRS 的评价标准更容易调整。
    Abstract The main challenge in developing effective reinforcement learning (RL) pipelines is often the design and tuning the reward functions. Well-designed shaping reward can lead to significantly faster learning. Naively formulated rewards, however, can conflict with the desired behavior and result in overfitting or even erratic performance if not properly tuned. In theory, the broad class of potential based reward shaping (PBRS) can help guide the learning process without affecting the optimal policy. Although several studies have explored the use of potential based reward shaping to accelerate learning convergence, most have been limited to grid-worlds and low-dimensional systems, and RL in robotics has predominantly relied on standard forms of reward shaping. In this paper, we benchmark standard forms of shaping with PBRS for a humanoid robot. We find that in this high-dimensional system, PBRS has only marginal benefits in convergence speed. However, the PBRS reward terms are significantly more robust to scaling than typical reward shaping approaches, and thus easier to tune.
    摘要 开发有效的强化学习(RL)管线的主要挑战往往在于奖励函数的设计与调参。精心设计的塑形奖励可以显著加快学习速度;然而,简单粗糙地设定的奖励可能与期望行为冲突,若调参不当会导致过拟合甚至不稳定的表现。理论上,广义的基于势函数的奖励塑形(PBRS)可以在不改变最优策略的前提下引导学习过程。尽管已有若干研究探索利用基于势函数的奖励塑形来加速学习收敛,但大多局限于网格世界和低维系统,而机器人领域的 RL 主要依赖标准形式的奖励塑形。在本文中,我们在一台人形机器人上对标准奖励塑形与 PBRS 进行了基准比较。我们发现,在这个高维系统中,PBRS 对收敛速度的提升有限;然而,PBRS 的奖励项对缩放显著比典型奖励塑形方法更鲁棒,因此更易于调参。
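PBRS adds a potential difference to the task reward, r' = r + γΦ(s') − Φ(s), which provably leaves the optimal policy unchanged (Ng et al., 1999). The potential below (distance of torso height from a target) is an invented example of the kind of term one might shape with on a humanoid, not the paper's reward design.

```python
# Potential-based reward shaping: r' = r + gamma * phi(s') - phi(s).
def shaped_reward(task_reward, phi_s, phi_s_next, gamma=0.99):
    return task_reward + gamma * phi_s_next - phi_s

def torso_height_potential(height, target=1.3, scale=5.0):
    # Higher (less negative) potential when the torso is closer to the target height.
    return -scale * abs(height - target)

r = 0.1  # raw task reward at this step
r_shaped = shaped_reward(r, torso_height_potential(1.0), torso_height_potential(1.1))
print(r_shaped)
```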

ProtiGeno: a prokaryotic short gene finder using protein language models

  • paper_url: http://arxiv.org/abs/2307.10343
  • repo_url: https://github.com/tonytu16/protigeno
  • paper_authors: Tony Tu, Gautham Krishna, Amirali Aghazadeh
  • for: 本研究旨在提高原核生物基因预测的准确性和召回率,尤其是针对短开放阅读框(ORF)。
  • methods: 我们开发了一种基于深度学习的方法,称为 ProtiGeno,它使用在数百万条进化蛋白质上训练的蛋白质语言模型来预测原核生物短基因。
  • results: 在系统性的大规模实验中,我们表明 ProtiGeno 预测短的编码与非编码基因的准确率和召回率均高于现有的最先进基因预测器。我们还通过可视化所预测短基因的三维结构,讨论了 ProtiGeno 的预测特征及可能的局限。
    Abstract Prokaryotic gene prediction plays an important role in understanding the biology of organisms and their function with applications in medicine and biotechnology. Although the current gene finders are highly sensitive in finding long genes, their sensitivity decreases noticeably in finding shorter genes (<180 nts). The culprit is insufficient annotated gene data to identify distinguishing features in short open reading frames (ORFs). We develop a deep learning-based method called ProtiGeno, specifically targeting short prokaryotic genes using a protein language model trained on millions of evolved proteins. In systematic large-scale experiments on 4,288 prokaryotic genomes, we demonstrate that ProtiGeno predicts short coding and noncoding genes with higher accuracy and recall than the current state-of-the-art gene finders. We discuss the predictive features of ProtiGeno and possible limitations by visualizing the three-dimensional structure of the predicted short genes. Data, codes, and models are available at https://github.com/tonytu16/protigeno.
    摘要 原核生物基因预测在理解生物体的生物学及其功能方面发挥着重要作用,并在医学和生物技术领域有广泛应用。尽管现有的基因预测器在寻找长基因时非常灵敏,但在寻找较短的基因(<180 nts)时灵敏度明显下降,其根源在于缺乏足够的已注释基因数据来识别短开放阅读框(ORF)中的区分性特征。我们开发了一种基于深度学习的方法 ProtiGeno,专门针对原核生物短基因,利用在数百万条进化蛋白质上训练的蛋白质语言模型。在对 4,288 个原核生物基因组开展的系统性大规模实验中,我们证明 ProtiGeno 预测短的编码与非编码基因的准确率和召回率均高于现有的最先进基因预测器。我们还通过可视化所预测短基因的三维结构,讨论了 ProtiGeno 的预测特征及可能的局限。数据、代码和模型可在 https://github.com/tonytu16/protigeno 获取。

Gradient Sparsification For Masked Fine-Tuning of Transformers

  • paper_url: http://arxiv.org/abs/2307.10098
  • repo_url: None
  • paper_authors: James O’ Neill, Sourav Dutta
  • for: 本文研究如何提升迁移学习中微调的性能,并提出一种梯度稀疏化方法 GradDrop。
  • methods: 本文在多语言 XGLUE 基准上使用 XLMR-Large 模型评估 GradDrop 及其变体。
  • results: 实验结果表明,GradDrop 与利用额外翻译数据进行中间预训练的方法相比具有竞争力,并且优于标准微调和渐进解冻。后续分析还表明,GradDrop 能提升资源匮乏语言上的性能。
    Abstract Fine-tuning pretrained self-supervised language models is widely adopted for transfer learning to downstream tasks. Fine-tuning can be achieved by freezing gradients of the pretrained network and only updating gradients of a newly added classification layer, or by performing gradient updates on all parameters. Gradual unfreezing makes a trade-off between the two by gradually unfreezing gradients of whole layers during training. This has been an effective strategy to trade-off between storage and training speed with generalization performance. However, it is not clear whether gradually unfreezing layers throughout training is optimal, compared to sparse variants of gradual unfreezing which may improve fine-tuning performance. In this paper, we propose to stochastically mask gradients to regularize pretrained language models for improving overall fine-tuned performance. We introduce GradDrop and variants thereof, a class of gradient sparsification methods that mask gradients during the backward pass, acting as gradient noise. GradDrop is sparse and stochastic unlike gradual freezing. Extensive experiments on the multilingual XGLUE benchmark with XLMR-Large show that GradDrop is competitive against methods that use additional translated data for intermediate pretraining and outperforms standard fine-tuning and gradual unfreezing. A post-analysis shows how GradDrop improves performance with languages it was not trained on, such as under-resourced languages.
    摘要 对预训练自监督语言模型进行微调是向下游任务迁移学习的常用做法。微调既可以通过冻结预训练网络的梯度、仅更新新增分类层的梯度来实现，也可以对所有参数进行梯度更新。渐进解冻（gradual unfreezing）通过在训练过程中逐层解冻梯度，在两者之间取得折衷，是一种在存储、训练速度与泛化性能之间权衡的有效策略。然而，相比于可能进一步提升微调性能的稀疏化渐进解冻变体，在整个训练过程中逐层解冻是否最优尚不清楚。在这篇论文中，我们提出对梯度进行随机掩码，以正则化预训练语言模型并提升整体微调性能。我们引入 GradDrop 及其变体，这是一类在反向传播过程中掩码梯度、起到梯度噪声作用的梯度稀疏化方法。与渐进解冻不同，GradDrop 是稀疏且随机的。在 multilingual XGLUE 基准上使用 XLMR-Large 进行的大量实验表明，GradDrop 与使用额外翻译数据进行中间预训练的方法相比具有竞争力，并优于标准微调和渐进解冻。后续分析显示，GradDrop 还能提升其未训练过的语言（例如资源匮乏语言）上的性能。
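
The abstract above describes GradDrop as stochastically masking gradients during the backward pass. Below is a minimal, illustrative PyTorch sketch of that idea (it is not the authors' implementation); the element-wise Bernoulli masking, the drop probability, and the hook-based mechanism are assumptions about one simple way to realize it.

```python
# Minimal sketch of stochastic gradient masking in PyTorch -- NOT the official
# GradDrop implementation. It assumes the simplest variant: during the backward
# pass, each gradient entry is zeroed with probability `drop_prob`, acting as
# sparse, stochastic gradient noise on the fine-tuned encoder.
import torch
import torch.nn as nn

def attach_grad_masking_hooks(model: nn.Module, drop_prob: float = 0.5):
    """Register tensor hooks that randomly zero entries of each parameter gradient."""
    handles = []
    for param in model.parameters():
        if not param.requires_grad:
            continue
        def hook(grad, p=drop_prob):
            mask = (torch.rand_like(grad) > p).to(grad.dtype)
            return grad * mask  # masked gradient is what the optimizer sees
        handles.append(param.register_hook(hook))
    return handles  # call h.remove() on each handle to restore normal fine-tuning

# Hypothetical usage in a fine-tuning loop (model/optimizer names are placeholders):
# handles = attach_grad_masking_hooks(model, drop_prob=0.5)
# loss = model(input_ids, labels=labels).loss
# loss.backward()   # hooks fire here and sparsify the gradients
# optimizer.step()
```

Unlike gradual unfreezing, such a scheme keeps every layer trainable while injecting sparse, stochastic noise into the gradients.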

Revisiting invariances and introducing priors in Gromov-Wasserstein distances

  • paper_url: http://arxiv.org/abs/2307.10093
  • repo_url: None
  • paper_authors: Pinar Demetci, Quang Huy Tran, Ievgen Redko, Ritambhara Singh
  • for: 本研究旨在提出一种新的基于最优传输（optimal transport）的距离，用于比较不同度量空间中的样本，并能够在特定应用中控制对等距变换的不变性程度。
  • methods: 本研究提出 augmented Gromov-Wasserstein distance，该距离在考虑样本间 pairwise similarity 的同时，还引入 feature alignments，从而更好地利用输入数据的先验知识。
  • results: 本研究对所提出的度量进行了理论分析，并在单细胞多组学（multi-omic）对齐任务和机器学习中的迁移学习场景中通过实验验证了该方法的有效性。
    Abstract Gromov-Wasserstein distance has found many applications in machine learning due to its ability to compare measures across metric spaces and its invariance to isometric transformations. However, in certain applications, this invariance property can be too flexible, thus undesirable. Moreover, the Gromov-Wasserstein distance solely considers pairwise sample similarities in input datasets, disregarding the raw feature representations. We propose a new optimal transport-based distance, called Augmented Gromov-Wasserstein, that allows for some control over the level of rigidity to transformations. It also incorporates feature alignments, enabling us to better leverage prior knowledge on the input data for improved performance. We present theoretical insights into the proposed metric. We then demonstrate its usefulness for single-cell multi-omic alignment tasks and a transfer learning scenario in machine learning.
    摘要 Gromov-Wasserstein 距离由于能够比较不同度量空间上的测度，并且对等距变换保持不变，在机器学习中得到了广泛应用。然而，在某些应用场景中，这种不变性可能过于灵活，因而并不可取。此外，Gromov-Wasserstein 距离仅考虑输入数据集中样本之间的成对相似性，而忽略了原始特征表示。我们提出一种新的基于最优传输的距离，称为 Augmented Gromov-Wasserstein，它允许对变换的刚性程度进行一定控制，并引入特征对齐，使我们能够更好地利用输入数据的先验知识以提升性能。我们给出了所提度量的理论见解，并在单细胞多组学对齐任务以及机器学习中的迁移学习场景中展示了其实用性。
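
The abstract describes augmenting the Gromov-Wasserstein objective with feature alignments and a controllable level of rigidity. As a rough, runnable point of reference (not the paper's Augmented GW), the sketch below uses the POT library's fused Gromov-Wasserstein solver, assuming POT's `ot.gromov.fused_gromov_wasserstein` API; it only illustrates how a cross-domain feature cost can be blended with intra-domain structure costs through a trade-off weight `alpha`.

```python
# Illustrative only: this uses POT's *fused* Gromov-Wasserstein solver, a related
# (but not identical) construction to the paper's Augmented GW distance. It shows
# how a feature-matching cost M can be combined with intra-domain structure
# matrices C1, C2 via a trade-off weight alpha.
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))   # samples from domain 1 (e.g., one omic modality)
Y = rng.normal(size=(60, 10))   # samples from domain 2

C1 = ot.dist(X, X)              # pairwise structure within domain 1
C2 = ot.dist(Y, Y)              # pairwise structure within domain 2
M = ot.dist(X, Y)               # cross-domain feature cost (prior feature alignment)
p, q = ot.unif(len(X)), ot.unif(len(Y))  # uniform sample weights

# alpha balances structure (GW term) against feature alignment (Wasserstein term)
coupling = ot.gromov.fused_gromov_wasserstein(M, C1, C2, p, q,
                                              loss_fun="square_loss", alpha=0.5)
print(coupling.shape)           # (50, 60) transport plan aligning the two domains
```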

eess.IV - 2023-07-20

Parse and Recall: Towards Accurate Lung Nodule Malignancy Prediction like Radiologists

  • paper_url: http://arxiv.org/abs/2307.10824
  • repo_url: None
  • paper_authors: Jianpeng Zhang, Xianghua Ye, Jianfeng Zhang, Yuxing Tang, Minfeng Xu, Jianfei Guo, Xin Chen, Zaiyi Liu, Jingren Zhou, Le Lu, Ling Zhang
  • for: 这个研究旨在提高肺癌早期检测的精度,以提高生存 outcome。
  • methods: 这个方法采用 radiologist-inspired 的设计，包括 context parsing 和 prototype recalling 两个模块。context parsing 模块首先分割 nodule 的上下文结构，然后聚合 contextual information 以更全面地理解 nodule；prototype recalling 模块使用 prototype-based learning 将先前学习的案例压缩为 prototype，并在训练中以动量（momentum）方式在线更新。
  • results: 这个方法在低剂量和非contrast CT 检测中实现了高水平的检测性能。
    Abstract Lung cancer is a leading cause of death worldwide and early screening is critical for improving survival outcomes. In clinical practice, the contextual structure of nodules and the accumulated experience of radiologists are the two core elements related to the accuracy of identification of benign and malignant nodules. Contextual information provides comprehensive information about nodules such as location, shape, and peripheral vessels, and experienced radiologists can search for clues from previous cases as a reference to enrich the basis of decision-making. In this paper, we propose a radiologist-inspired method to simulate the diagnostic process of radiologists, which is composed of context parsing and prototype recalling modules. The context parsing module first segments the context structure of nodules and then aggregates contextual information for a more comprehensive understanding of the nodule. The prototype recalling module utilizes prototype-based learning to condense previously learned cases as prototypes for comparative analysis, which is updated online in a momentum way during training. Building on the two modules, our method leverages both the intrinsic characteristics of the nodules and the external knowledge accumulated from other nodules to achieve a sound diagnosis. To meet the needs of both low-dose and noncontrast screening, we collect a large-scale dataset of 12,852 and 4,029 nodules from low-dose and noncontrast CTs respectively, each with pathology- or follow-up-confirmed labels. Experiments on several datasets demonstrate that our method achieves advanced screening performance on both low-dose and noncontrast scenarios.
    摘要 肺癌是全球最主要的死亡原因之一，早期筛查对于改善生存结局至关重要。在临床实践中，nodule 的 contextual 结构和 radiologist 的经验是识别 benign 和 malignant nodule 的两个核心要素。contextual 信息可以提供 nodule 的全面信息，如位置、形状和周围血管，经验丰富的 radiologist 可以从以往案例中寻找线索作为参考，丰富决策依据。在这篇论文中，我们提出了一种受 radiologist 启发的方法，用于模拟其诊断过程。该方法包括 context parsing 和 prototype recalling 两个模块。context parsing 模块首先分割 nodule 的上下文结构，然后聚合上下文信息，以更全面地理解 nodule。prototype recalling 模块使用 prototype-based learning 将已学习的案例压缩为 prototype 用于对比分析，并在训练过程中以动量方式在线更新。基于这两个模块，我们的方法能够同时利用 nodule 的内在特征和从其他 nodule 积累的外部知识来实现可靠的诊断。为了满足低剂量和非增强（noncontrast）CT 筛查的需求，我们分别从低剂量和非增强 CT 中收集了12,852和4,029个 nodule 的大规模数据集，每个 nodule 均带有病理或随访确认的标签。多个数据集上的实验表明，我们的方法在低剂量和非增强两种场景下均达到了先进的筛查性能。
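
The prototype recalling module is described as condensing previously learned cases into prototypes that are updated online in a momentum way. The sketch below is a generic, illustrative momentum-updated prototype bank; the feature dimension, number of classes, momentum value, and cosine-similarity scoring are assumptions, not the paper's actual configuration.

```python
# Sketch of a momentum-updated prototype bank, illustrating the "prototype recalling"
# idea described above. All hyperparameters here are illustrative assumptions.
import torch
import torch.nn.functional as F

class PrototypeBank:
    def __init__(self, num_classes: int = 2, feat_dim: int = 256, momentum: float = 0.99):
        self.momentum = momentum
        self.prototypes = F.normalize(torch.randn(num_classes, feat_dim), dim=1)

    @torch.no_grad()
    def update(self, features: torch.Tensor, labels: torch.Tensor):
        """Momentum update of each class prototype with the batch's mean feature."""
        features = F.normalize(features, dim=1)
        for c in labels.unique():
            mean_feat = features[labels == c].mean(dim=0)
            self.prototypes[c] = self.momentum * self.prototypes[c] + (1 - self.momentum) * mean_feat
        self.prototypes = F.normalize(self.prototypes, dim=1)

    def similarity(self, features: torch.Tensor) -> torch.Tensor:
        """Cosine similarity of nodule features to the benign/malignant prototypes."""
        return F.normalize(features, dim=1) @ self.prototypes.t()

# bank = PrototypeBank()
# logits = bank.similarity(nodule_features)   # hypothetical usage during training
# bank.update(nodule_features, nodule_labels)
```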

A novel integrated method of detection-grasping for specific object based on the box coordinate matching

  • paper_url: http://arxiv.org/abs/2307.11783
  • repo_url: None
  • paper_authors: Zongmin Liu, Jirui Wang, Jie Li, Zufeng Li, Kai Ren, Peng Shi
  • for: 本研究旨在提高服务机器人对特定物体的检测和抓取能力,以更好地照顾老年和残疾人群。
  • methods: 该研究提出了一种基于盒坐标匹配的检测-抓取集成方法(DG-BCM)，包括对SOLOv2实例分割模型加入通道注意力模块(CAM)和空间注意力模块(SAM)的改进，以及将atrous spatial pyramid pooling(ASPP)和CAM添加到生成式残差卷积神经网络(GR-CNN)模型中来优化抓取估计。
  • results: 实验表明,提高后的模型在对物体检测和抓取任务中均表现出色,而且在对几种具体物体的抓取任务中也得到了可行和有效的结果。
    Abstract To better care for the elderly and disabled, it is essential for service robots to have an effective fusion method of object detection and grasp estimation. However, limited research has been observed on the combination of object detection and grasp estimation. To overcome this technical difficulty, a novel integrated method of detection-grasping for specific object based on the box coordinate matching is proposed in this paper. Firstly, the SOLOv2 instance segmentation model is improved by adding channel attention module (CAM) and spatial attention module (SAM). Then, the atrous spatial pyramid pooling (ASPP) and CAM are added to the generative residual convolutional neural network (GR-CNN) model to optimize grasp estimation. Furthermore, a detection-grasping integrated algorithm based on box coordinate matching (DG-BCM) is proposed to obtain the fusion model of object detection and grasp estimation. For verification, experiments on object detection and grasp estimation are conducted separately to verify the superiority of improved models. Additionally, grasping tasks for several specific objects are implemented on a simulation platform, demonstrating the feasibility and effectiveness of DG-BCM algorithm proposed in this paper.
    摘要 为了更好地照护老年人和残障人士，服务机器人需要一种有效融合目标检测与抓取估计的方法。然而，目前针对二者结合的研究仍然较少。为克服这一技术难题，本文提出了一种基于框坐标匹配的特定物体检测-抓取一体化新方法。首先，通过加入通道注意力模块(CAM)和空间注意力模块(SAM)改进SOLOv2实例分割模型；然后，在生成式残差卷积神经网络(GR-CNN)模型中加入空洞空间金字塔池化(ASPP)和CAM，以优化抓取估计；进一步地，提出基于框坐标匹配的检测-抓取一体化算法(DG-BCM)，得到目标检测与抓取估计的融合模型。为进行验证，分别开展了目标检测与抓取估计实验，证明改进模型的优越性；此外，在仿真平台上完成了若干特定物体的抓取任务，验证了本文提出的DG-BCM算法的可行性与有效性。
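
The abstract couples detection and grasp estimation through "box coordinate matching" (DG-BCM). One plausible minimal reading, sketched below, is pairing the detected object's bounding box with grasp candidates by coordinate overlap (IoU); the authors' exact matching rule is not specified here, so the threshold and scoring below are assumptions.

```python
# Illustrative sketch of matching a detected object box to grasp candidates by
# bounding-box overlap (IoU). The paper's exact box-coordinate-matching rule may
# differ; this only shows the general idea of pairing detection and grasp outputs.
import numpy as np

def iou(box_a: np.ndarray, box_b: np.ndarray) -> float:
    """Boxes are [x1, y1, x2, y2]."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_grasp_to_detection(det_box, grasp_boxes, min_iou: float = 0.3):
    """Return the index of the grasp candidate whose box best overlaps the detection."""
    scores = [iou(np.asarray(det_box), np.asarray(g)) for g in grasp_boxes]
    best = int(np.argmax(scores))
    return best if scores[best] >= min_iou else None

# Example: a detected target box vs. three grasp-rectangle bounding boxes
det = [120, 80, 220, 180]
grasps = [[10, 10, 60, 60], [130, 90, 210, 170], [300, 300, 380, 380]]
print(match_grasp_to_detection(det, grasps))  # -> 1
```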

Aggressive saliency-aware point cloud compression

  • paper_url: http://arxiv.org/abs/2307.10741
  • repo_url: None
  • paper_authors: Eleftheria Psatha, Dimitrios Laskos, Gerasimos Arvanitis, Konstantinos Moustakas
  • for: This paper aims to present a novel, geometry-based, end-to-end compression scheme for point clouds, which is essential for various applications such as virtual reality and 3D modeling.
  • methods: The proposed method combines information on the geometrical features of the point cloud and the user’s position to achieve remarkable results for aggressive compression schemes demanding very small bit rates. It includes separating visible and non-visible points, calculating four saliency maps, and using delta coordinates and solving a sparse linear system for decoding.
  • results: The proposed method achieves significantly better results than the geometry-based point cloud compression (G-PCC) algorithm by the Moving Picture Experts Group (MPEG) for small bit rates, as demonstrated by evaluation studies and comparisons with various point clouds.
    Abstract The increasing demand for accurate representations of 3D scenes, combined with immersive technologies has led point clouds to extensive popularity. However, quality point clouds require a large amount of data and therefore the need for compression methods is imperative. In this paper, we present a novel, geometry-based, end-to-end compression scheme, that combines information on the geometrical features of the point cloud and the user's position, achieving remarkable results for aggressive compression schemes demanding very small bit rates. After separating visible and non-visible points, four saliency maps are calculated, utilizing the point cloud's geometry and distance from the user, the visibility information, and the user's focus point. A combination of these maps results in a final saliency map, indicating the overall significance of each point and therefore quantizing different regions with a different number of bits during the encoding process. The decoder reconstructs the point cloud making use of delta coordinates and solving a sparse linear system. Evaluation studies and comparisons with the geometry-based point cloud compression (G-PCC) algorithm by the Moving Picture Experts Group (MPEG), carried out for a variety of point clouds, demonstrate that the proposed method achieves significantly better results for small bit rates.
    摘要 随着对三维场景精确表示需求的增加以及沉浸式技术的发展，点云日益受到欢迎。然而，高质量点云需要大量数据，因此压缩方法的需求十分迫切。在这篇论文中，我们提出了一种新的、基于几何的端到端压缩方案，它结合点云的几何特征信息和用户位置，在要求极低比特率的激进压缩方案下取得了出色的效果。首先将可见点与不可见点分开，然后利用点云的几何特征、与用户的距离、可见性信息以及用户的关注点，计算出四张显著性图。将这些图组合得到最终的显著性图，用以表示每个点的总体重要性，从而在编码过程中对不同区域采用不同的比特数进行量化。解码器利用 delta 坐标并求解一个稀疏线性系统来重建点云。针对多种点云进行的评估研究以及与移动图像专家组(MPEG)提出的基于几何的点云压缩算法(G-PCC)的比较表明，所提方法在小比特率下取得了明显更好的效果。
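
The abstract states that the final saliency map controls how many bits each region receives during encoding. The sketch below illustrates that idea generically: combine several saliency maps into one and quantize point coordinates more finely where saliency is high. The map weights, bit budgets, and three-level binning are assumptions, not the paper's actual codec.

```python
# Sketch of saliency-aware quantization: points in more salient regions are encoded
# with more bits (finer quantization) than less salient ones. The bit budgets and
# the way the four saliency maps are combined are illustrative assumptions.
import numpy as np

def combine_saliency(geometry_s, distance_s, visibility_s, focus_s,
                     weights=(0.25, 0.25, 0.25, 0.25)):
    maps = np.stack([geometry_s, distance_s, visibility_s, focus_s], axis=0)
    final = np.tensordot(np.asarray(weights), maps, axes=1)
    return (final - final.min()) / (np.ptp(final) + 1e-9)   # normalise to [0, 1]

def quantize_points(points, saliency, bit_levels=(4, 8, 12)):
    """Quantize each point's coordinates with a bit depth chosen from its saliency."""
    origin = points.min(axis=0)
    span = points.max(axis=0) - origin + 1e-9
    out = np.empty_like(points)
    bins = np.digitize(saliency, [1 / 3, 2 / 3])   # low / medium / high saliency
    for level, bits in enumerate(bit_levels):
        steps = 2 ** bits
        idx = bins == level
        normalized = (points[idx] - origin) / span
        out[idx] = np.round(normalized * (steps - 1)) / (steps - 1) * span + origin
    return out

# pts = np.random.rand(1000, 3); sal = combine_saliency(...)  # hypothetical usage
```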

Prediction of sunflower leaf area at vegetative stage by image analysis and application to the estimation of water stress response parameters in post-registration varieties

  • paper_url: http://arxiv.org/abs/2307.11110
  • repo_url: None
  • paper_authors: Pierre Casadebaig, Nicolas Blanchet, Nicolas Bernard Langlade
  • for: 本研究的目的是自动测量向日葵对水分胁迫的发育和生理响应，以便更好地了解可供种植者使用的品种，并识别植物响应环境的生物、遗传和分子基础。
  • methods: 在INRAE Toulouse的Heliaphen高通量表型（phenotyping）平台上，我们设计了两个实验，每个实验包含8个品种(2*96株植物)，并每天使用光幕（light barrier）采集受水分胁迫与未受胁迫植物的图像。同时，在约10天的胁迫期内，我们每两天手动测量这些植物的叶面积。通过图像分析提取分割后植物的形态特征，并评估不同的模型以利用这些数据估计植物总叶面积。
  • results: 使用带后验平滑（a posteriori smoothing）的线性模型，估计植物总叶面积的相对平方误差为11%，效率为93%。利用常规方法或所开发模型估计的叶面积，计算了SUNFLO作物模型中使用的叶片扩展响应和蒸腾响应参数（LER和TR）。LER和TR的相关系数分别为0.61和0.81，验证了基于图像的叶面积估计的可用性。不过，基于图像估计的LER值低于Heliaphen平台上的手动测量结果，但总体上更接近温室种植植物的手动测量结果，这可能表明对胁迫敏感性存在高估。
    Abstract The automatic measurement of developmental and physiological responses of sunflowers to water stress represents an applied challenge for a better knowledge of the varieties available to growers, but also a fundamental one for identifying the biological, genetic and molecular bases of plant response to their environment.On INRAE Toulouse's Heliaphen high-throughput phenotyping platform, we set up two experiments, each with 8 varieties (2*96 plants), and acquired images of plants subjected or not to water stress, using a light barrier on a daily basis. At the same time, we manually measured the leaf surfaces of these plants every other day for the duration of the stress, which lasted around ten days. The images were analyzed to extract morphological characteristics of the segmented plants and different models were evaluated to estimate total plant leaf areas using these data.A linear model with a posteriori smoothing was used to estimate total leaf area with a relative squared error of 11% and an efficiency of 93%. Leaf areas estimated conventionally or with the developed model were used to calculate the leaf expansion and transpiration responses (LER and TR) used in the SUNFLO crop model for 8 sunflower varieties studied. Correlation coefficients of 0.61 and 0.81 for LER and TR respectively validate the use of image-based leaf area estimation. However, the estimated values for LER are lower than for the manual method on Heliaphen, but closer overall to the manual method on greenhouse-grown plants, potentially suggesting an overestimation of stress sensitivity.It can be concluded that the LE and TR parameter estimates can be used for simulations. The low cost of this method (compared with manual measurements), the possibility of parallelizing and repeating measurements on the Heliaphen platform, and of benefiting from the Heliaphen platform's data management, are major improvements for valorizing the SUNFLO model and characterizing the drought sensitivity of cultivated varieties.
    摘要 自动测量向日葵对水分胁迫的发育和生理响应，既是一个应用层面的挑战，有助于更好地了解可供种植者使用的品种；也是一个基础性的挑战，有助于确定植物响应环境的生物、遗传和分子基础。在INRAE Toulouse的Heliaphen高通量表型平台上，我们设计了两个实验，每个实验包含8个品种(2*96株植物)，每天使用光幕采集受水分胁迫与未受胁迫植物的图像，并在约10天的胁迫期内每两天手动测量这些植物的叶面积。通过图像分析提取分割后植物的形态特征，并评估了多种利用这些数据估计植物总叶面积的模型。采用带后验平滑的线性模型估计总叶面积，相对平方误差为11%，效率为93%。利用常规方法或所开发模型估计的叶面积，计算了SUNFLO作物模型中8个向日葵品种的叶片扩展响应和蒸腾响应（LER和TR）。LER和TR的相关系数分别为0.61和0.81，验证了基于图像的叶面积估计的可用性。不过，基于图像估计的LER值低于Heliaphen上手动方法的结果，但总体上更接近温室种植植物的手动测量结果，这可能表明对胁迫敏感性存在高估。可以得出结论：LER和TR参数估计值可用于模拟。与手动测量相比，该方法成本低，可在Heliaphen平台上并行和重复测量，并可受益于该平台的数据管理，这些都是推广SUNFLO模型和表征栽培品种干旱敏感性的重要改进。
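
The pipeline described above fits a linear model from image-derived plant morphology to measured leaf area and then applies a posteriori smoothing. The sketch below shows that two-step idea with scikit-learn and pandas; the feature names, column names, and rolling-window smoothing are assumptions rather than the authors' exact model.

```python
# Sketch of the "linear model with a posteriori smoothing" idea: predict total leaf
# area from image-derived plant morphology, then smooth each plant's daily series.
# Column names, features and the smoothing window are illustrative assumptions.
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["plant_height", "plant_width", "projected_area"]

def fit_leaf_area_model(df: pd.DataFrame) -> LinearRegression:
    model = LinearRegression()
    model.fit(df[FEATURES], df["manual_leaf_area"])
    return model

def predict_with_smoothing(model: LinearRegression, df: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    out = df.copy()
    out["leaf_area_raw"] = model.predict(out[FEATURES])
    # a posteriori smoothing: rolling mean over each plant's daily image-based estimates
    out["leaf_area_smoothed"] = (
        out.sort_values("day")
           .groupby("plant_id")["leaf_area_raw"]
           .transform(lambda s: s.rolling(window, center=True, min_periods=1).mean())
    )
    return out

# df = pd.read_csv("heliaphen_images.csv")                        # hypothetical file
# model = fit_leaf_area_model(df.dropna(subset=["manual_leaf_area"]))
# daily = predict_with_smoothing(model, df)
```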

Depth from Defocus Technique: A Simple Calibration-Free Approach for Dispersion Size Measurement

  • paper_url: http://arxiv.org/abs/2307.10678
  • repo_url: None
  • paper_authors: Saini Jatin Rao, Shubham Sharma, Saptarshi Basu, Cameron Tropea
  • for: 这项研究旨在提出一种基于图像技术的粒子大小和位置测量方法,用于在多相流体中跟踪粒子的运动和大小。
  • methods: 该方法基于散焦测深（Depth from Defocus, DFD）技术，使用单摄像头设置，并采用由阴影成像（shadowgraph）装置组成的简单光学配置和直接的标定过程，便于在更广泛的应用中部署。
  • results: 研究表明,该方法可以高精度地测量粒子的大小和位置,并且可以在多相流体中跟踪粒子的运动。
    Abstract Dispersed particle size measurement is crucial in a variety of applications, be it in the sizing of spray droplets, tracking of particulate matter in multiphase flows, or the detection of target markers in machine vision systems. Further to sizing, such systems are characterised by extracting quantitative information like spatial position and associated velocity of the dispersed phase particles. In the present study we propose an imaging based volumetric measurement approach for estimating the size and position of spherically dispersed particles. The approach builds on the 'Depth from Defocus' (DFD) technique using a single camera approach. The simple optical configuration, consisting of a shadowgraph setup and a straightforward calibration procedure, makes this method readily deployable and accessible for broader applications.
    摘要 分散相粒径测量在各种应用中都至关重要，例如喷雾液滴尺寸的测量、多相流中颗粒物的追踪，以及机器视觉系统中目标标记的检测。除了测量尺寸，此类系统还需提取分散相粒子的空间位置及其速度等定量信息。在本研究中，我们提出了一种基于成像的体积测量方法，用于估计球形分散粒子的尺寸和位置。该方法建立在单摄像头的散焦测深（Depth from Defocus, DFD）技术之上。由阴影成像（shadowgraph）装置构成的简单光学配置以及直接的标定过程，使该方法易于部署，可用于更广泛的应用。

Efficient Unified Demosaicing for Bayer and Non-Bayer Patterned Image Sensors

  • paper_url: http://arxiv.org/abs/2307.10667
  • repo_url: None
  • paper_authors: Haechang Lee, Dongwon Park, Wongi Jeong, Kijeong Kim, Hyunwoo Je, Dongil Ryu, Se Young Chun
  • for: 随着现代CMOS图像传感器(CIS)的物理尺寸越来越小，手机摄像头开始采用各种非 Bayer 彩色滤光阵列(CFA)模式(例如Quad、Nona、QxQ)。这些非 Bayer 传感器比传统 Bayer CFA 更高效，但在去马赛克（demosaicing）过程中可能产生视觉伪影。
  • methods: 该论文提出了一种高效的统一去马赛克方法，可应用于不同 CFA 和不同照明条件下的 RAW 数据。该方法基于知识学习模型，每种 CFA 仅需网络中1%的关键滤波器即可实现高效的去马赛克。
  • results: 实验结果表明，该方法在合成和真实 RAW 数据上均取得了 state-of-the-art 的去马赛克性能，并在不同 CFA 和照明条件下保持较高的稳定性和精度。
    Abstract As the physical size of recent CMOS image sensors (CIS) gets smaller, the latest mobile cameras are adopting unique non-Bayer color filter array (CFA) patterns (e.g., Quad, Nona, QxQ), which consist of homogeneous color units with adjacent pixels. These non-Bayer sensors are superior to conventional Bayer CFA thanks to their changeable pixel-bin sizes for different light conditions but may introduce visual artifacts during demosaicing due to their inherent pixel pattern structures and sensor hardware characteristics. Previous demosaicing methods have primarily focused on Bayer CFA, necessitating distinct reconstruction methods for non-Bayer patterned CIS with various CFA modes under different lighting conditions. In this work, we propose an efficient unified demosaicing method that can be applied to both conventional Bayer RAW and various non-Bayer CFAs' RAW data in different operation modes. Our Knowledge Learning-based demosaicing model for Adaptive Patterns, namely KLAP, utilizes CFA-adaptive filters for only 1% key filters in the network for each CFA, but still manages to effectively demosaic all the CFAs, yielding comparable performance to the large-scale models. Furthermore, by employing meta-learning during inference (KLAP-M), our model is able to eliminate unknown sensor-generic artifacts in real RAW data, effectively bridging the gap between synthetic images and real sensor RAW. Our KLAP and KLAP-M methods achieved state-of-the-art demosaicing performance in both synthetic and real RAW data of Bayer and non-Bayer CFAs.
    摘要 随着近期CMOS图像传感器(CIS)的物理尺寸不断缩小，最新的手机摄像头开始采用独特的非 Bayer 彩色滤光阵列(CFA)模式(例如Quad、Nona、QxQ)，它们由相邻像素组成的同色单元构成。这些非 Bayer 传感器优于传统 Bayer CFA，因为它们可以根据不同光照条件调整像素合并（pixel-bin）大小，但由于其固有的像素排列结构和传感器硬件特性，在去马赛克过程中可能引入视觉伪影。以往的去马赛克方法主要针对 Bayer CFA，对于在不同照明条件下以多种 CFA 模式工作的非 Bayer CIS，需要各自独立的重建方法。在本工作中，我们提出了一种高效的统一去马赛克方法，可同时应用于传统 Bayer RAW 和多种非 Bayer CFA 在不同工作模式下的 RAW 数据。我们的基于知识学习的自适应模式去马赛克模型 KLAP，对每种 CFA 仅使用网络中1%的 CFA 自适应关键滤波器，却仍能有效地对所有 CFA 进行去马赛克，性能可与大规模模型相当。此外，通过在推理阶段引入元学习(KLAP-M)，我们的模型能够消除真实 RAW 数据中未知的传感器通用伪影，有效弥合合成图像与真实传感器 RAW 之间的差距。我们的 KLAP 和 KLAP-M 方法在 Bayer 与非 Bayer CFA 的合成和真实 RAW 数据上均取得了 state-of-the-art 的去马赛克性能。

Physics-Driven Turbulence Image Restoration with Stochastic Refinement

  • paper_url: http://arxiv.org/abs/2307.10603
  • repo_url: https://github.com/vita-group/pirn
  • paper_authors: Ajay Jaiswal, Xingguang Zhang, Stanley H. Chan, Zhangyang Wang
  • for: PiRN 旨在恢复受大气湍流影响的远距离光学成像系统中的退化图像。
  • methods: PiRN 将基于物理的湍流模拟器直接整合进训练过程，帮助网络将随机性与退化及底层图像分离，从而更好地适应真实世界的湍流条件；此外还引入随机细化（Stochastic Refinement, SR）以提升感知质量。
  • results: PiRN 和 PiRN-SR 提升了对未知真实湍流条件的泛化能力，并在像素级精度和感知质量两方面均达到了 state-of-the-art 的恢复效果。
    Abstract Image distortion by atmospheric turbulence is a stochastic degradation, which is a critical problem in long-range optical imaging systems. A number of research has been conducted during the past decades, including model-based and emerging deep-learning solutions with the help of synthetic data. Although fast and physics-grounded simulation tools have been introduced to help the deep-learning models adapt to real-world turbulence conditions recently, the training of such models only relies on the synthetic data and ground truth pairs. This paper proposes the Physics-integrated Restoration Network (PiRN) to bring the physics-based simulator directly into the training process to help the network to disentangle the stochasticity from the degradation and the underlying image. Furthermore, to overcome the ``average effect" introduced by deterministic models and the domain gap between the synthetic and real-world degradation, we further introduce PiRN with Stochastic Refinement (PiRN-SR) to boost its perceptual quality. Overall, our PiRN and PiRN-SR improve the generalization to real-world unknown turbulence conditions and provide a state-of-the-art restoration in both pixel-wise accuracy and perceptual quality. Our codes are available at \url{https://github.com/VITA-Group/PiRN}.
    摘要 大气湍流造成的图像畸变是一种随机退化，是远距离光学成像系统中的关键问题。过去几十年已有大量相关研究，包括基于模型的方法以及借助合成数据的新兴深度学习方案。尽管近来已出现快速且基于物理的仿真工具来帮助深度学习模型适应真实世界的湍流条件，但这类模型的训练仍仅依赖合成数据及其真值对。本文提出了物理融合的恢复网络（PiRN），将基于物理的模拟器直接引入训练过程，帮助网络将随机性与退化及底层图像分离。此外，为了克服确定性模型带来的"平均效应"以及合成退化与真实世界退化之间的域差异，我们进一步引入带随机细化的 PiRN（PiRN-SR），以提升其感知质量。总体而言，我们的 PiRN 和 PiRN-SR 提高了对真实世界未知湍流条件的泛化能力，并在像素级精度和感知质量两方面均提供了 state-of-the-art 的恢复效果。我们的代码可在 \url{https://github.com/VITA-Group/PiRN} 获取。

Is Grad-CAM Explainable in Medical Images?

  • paper_url: http://arxiv.org/abs/2307.10506
  • repo_url: None
  • paper_authors: Subhashis Suara, Aayush Jha, Pratik Sinha, Arif Ahmed Sekh
  • For: This paper is written for researchers and practitioners in the field of medical imaging and artificial intelligence, particularly those interested in Explainable Deep Learning and its applications in medical imaging.* Methods: The paper discusses various explainability techniques, including Grad-CAM, and their limitations in medical imaging applications.* Results: The findings highlight the potential of Explainable Deep Learning and Grad-CAM in improving the accuracy and interpretability of deep learning models in medical imaging.Here’s the same information in Simplified Chinese:* For: 这篇论文是为了吸引医学图像和人工智能领域的研究人员和实践者,尤其是关注Explainable Deep Learning和其在医学图像应用中的潜在应用。* Methods: 论文讨论了各种解释技术,包括Grad-CAM,以及它们在医学图像应用中的限制。* Results: 发现表明Explainable Deep Learning和Grad-CAM在医学图像应用中可以提高深度学习模型的准确率和可读性。
    Abstract Explainable Deep Learning has gained significant attention in the field of artificial intelligence (AI), particularly in domains such as medical imaging, where accurate and interpretable machine learning models are crucial for effective diagnosis and treatment planning. Grad-CAM is a baseline that highlights the most critical regions of an image used in a deep learning model's decision-making process, increasing interpretability and trust in the results. It is applied in many computer vision (CV) tasks such as classification and explanation. This study explores the principles of Explainable Deep Learning and its relevance to medical imaging, discusses various explainability techniques and their limitations, and examines medical imaging applications of Grad-CAM. The findings highlight the potential of Explainable Deep Learning and Grad-CAM in improving the accuracy and interpretability of deep learning models in medical imaging. The code is available in (will be available).
    摘要 可解释深度学习（Explainable Deep Learning）在人工智能（AI）领域受到了广泛关注，尤其是在医学影像领域，因为准确且可解释的机器学习模型对有效的诊断和治疗规划至关重要。Grad-CAM 是一种基线方法，它可以高亮深度学习模型决策过程中所依赖的图像关键区域，从而提高结果的可解释性和可信度，并被应用于分类和解释等许多计算机视觉（CV）任务。本研究探讨了可解释深度学习的原理及其与医学影像的相关性，讨论了各种可解释性技术及其局限性，并考察了 Grad-CAM 在医学影像中的应用。研究结果显示，可解释深度学习和 Grad-CAM 有望提高深度学习模型在医学影像中的准确性和可解释性。代码将在（即将提供）。
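
Since the paper centers on Grad-CAM, a minimal PyTorch sketch of the standard Grad-CAM computation may help: capture a convolutional layer's activations and gradients with hooks, weight each channel by its spatially averaged gradient, and keep the positive part as a heatmap. The backbone and target layer below are placeholders, not the models evaluated in the paper.

```python
# Minimal Grad-CAM sketch (illustrative, not the paper's exact code).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()   # placeholder backbone; a medical CNN in practice
target_layer = model.layer4[-1]                # placeholder choice: last conv block

store = {}
target_layer.register_forward_hook(lambda m, i, o: store.update(act=o))
target_layer.register_full_backward_hook(lambda m, gi, go: store.update(grad=go[0]))

def grad_cam(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """image: (1, 3, H, W). Returns an (H, W) heatmap normalised to [0, 1]."""
    logits = model(image)
    model.zero_grad()
    logits[0, class_idx].backward()
    weights = store["grad"].mean(dim=(2, 3), keepdim=True)        # GAP of gradients
    cam = F.relu((weights * store["act"]).sum(dim=1, keepdim=True))  # class-evidence map
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)[0, 0]
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-9)

# heatmap = grad_cam(torch.randn(1, 3, 224, 224), class_idx=1)  # dummy input for illustration
```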

Metaverse: A Young Gamer’s Perspective

  • paper_url: http://arxiv.org/abs/2307.10439
  • repo_url: None
  • paper_authors: Ivan V. Bajić, Teo Saeedi-Bajić, Kai Saeedi-Bajić
  • for: 了解小游戏玩家对Metaverse的需求和期望
  • methods: 基于年龄在10岁以下的小游戏玩家的视角,探讨Metaverse与物理世界以及其他技术的关系,以及这些年轻用户对Metaverse的期望
  • results: 提供了对小游戏玩家对Metaverse的认知和期望的准确描述,可以用于规划更详细的主观实验和MMSP技术的研发
    Abstract When developing technologies for the Metaverse, it is important to understand the needs and requirements of end users. Relatively little is known about the specific perspectives on the use of the Metaverse by the youngest audience: children ten and under. This paper explores the Metaverse from the perspective of a young gamer. It examines their understanding of the Metaverse in relation to the physical world and other technologies they may be familiar with, looks at some of their expectations of the Metaverse, and then relates these to the specific multimedia signal processing (MMSP) research challenges. The perspectives presented in the paper may be useful for planning more detailed subjective experiments involving young gamers, as well as informing the research on MMSP technologies targeted at these users.
    摘要 在开发 metaverse 技术时,需要了解用户的需求和要求。目前对最年轻的用户(10岁以下)对 metaverse 的使用情况知之甚少。这篇论文从年轻游戏者的视角来探讨 metaverse,了解它与物理世界以及他们可能熟悉的其他技术之间的关系,探讨他们对 metaverse 的期望,并将这些期望与 multimedia signal processing(MMSP)技术研究挑战相关联。这篇论文中的视角可以用于规划更详细的主观实验,以及 informing MMSP 技术的研究。

Adversarial Latent Autoencoder with Self-Attention for Structural Image Synthesis

  • paper_url: http://arxiv.org/abs/2307.10166
  • repo_url: None
  • paper_authors: Jiajie Fan, Laure Vuaille, Hao Wang, Thomas Bäck
  • for: SA-ALAE is proposed to facilitate industrial engineering processes by generating feasible design images of complex engineering parts.
  • methods: SA-ALAE uses a novel Self-Attention Adversarial Latent Autoencoder architecture, which allows generating feasible design images by leveraging the structural patterns and long-range dependencies in industrial design images.
  • results: SA-ALAE is shown to be effective in generating engineering blueprints in a real automotive design task, allowing users to explore novel variants of an existing design and control the generation process by operating in latent space.
    Abstract Generative Engineering Design approaches driven by Deep Generative Models (DGM) have been proposed to facilitate industrial engineering processes. In such processes, designs often come in the form of images, such as blueprints, engineering drawings, and CAD models depending on the level of detail. DGMs have been successfully employed for synthesis of natural images, e.g., displaying animals, human faces and landscapes. However, industrial design images are fundamentally different from natural scenes in that they contain rich structural patterns and long-range dependencies, which are challenging for convolution-based DGMs to generate. Moreover, DGM-driven generation process is typically triggered based on random noisy inputs, which outputs unpredictable samples and thus cannot perform an efficient industrial design exploration. We tackle these challenges by proposing a novel model Self-Attention Adversarial Latent Autoencoder (SA-ALAE), which allows generating feasible design images of complex engineering parts. With SA-ALAE, users can not only explore novel variants of an existing design, but also control the generation process by operating in latent space. The potential of SA-ALAE is shown by generating engineering blueprints in a real automotive design task.
    摘要 由深度生成模型（DGM）驱动的生成式工程设计方法被提出，以促进工业工程流程。在这类流程中，设计通常以图像形式出现，例如蓝图、工程图纸和CAD模型，取决于细节程度。DGM 已成功应用于自然图像的合成，例如动物、人脸和风景。然而，工业设计图像与自然场景有本质区别：它们包含丰富的结构模式和长距离依赖关系，这对基于卷积的 DGM 来说难以生成。此外，DGM 驱动的生成过程通常由随机噪声输入触发，输出不可预测的样本，因而无法进行高效的工业设计探索。为了解决这些挑战，我们提出了一种新模型，即自注意力对抗潜变量自编码器（SA-ALAE），能够生成复杂工程零件的可行设计图像。借助 SA-ALAE，用户不仅可以探索现有设计的新变体，还可以通过在潜空间中操作来控制生成过程。SA-ALAE 的潜力在一项真实的汽车设计任务中通过生成工程蓝图得到了展示。

Make-A-Volume: Leveraging Latent Diffusion Models for Cross-Modality 3D Brain MRI Synthesis

  • paper_url: http://arxiv.org/abs/2307.10094
  • repo_url: None
  • paper_authors: Lingting Zhu, Zeyue Xue, Zhenchao Jin, Xian Liu, Jingzhen He, Ziwei Liu, Lequan Yu
  • for: 这个研究是为了解决现有的医疗影像合成问题,特别是针对跨modal的医疗影像合成。
  • methods: 这个研究提出了名为 Make-A-Volume 的基于扩散（diffusion）的框架，利用 2D backbone 并插入一系列体积（volumetric）层来实现跨模态的 3D 医疗影像合成。
  • results: 结果显示 Make-A-Volume 框架能够高效地进行跨模态医疗影像合成，取得更优的合成效果并保持体积一致性，同时避免了模式坍塌和训练不稳定的问题。
    Abstract Cross-modality medical image synthesis is a critical topic and has the potential to facilitate numerous applications in the medical imaging field. Despite recent successes in deep-learning-based generative models, most current medical image synthesis methods rely on generative adversarial networks and suffer from notorious mode collapse and unstable training. Moreover, the 2D backbone-driven approaches would easily result in volumetric inconsistency, while 3D backbones are challenging and impractical due to the tremendous memory cost and training difficulty. In this paper, we introduce a new paradigm for volumetric medical data synthesis by leveraging 2D backbones and present a diffusion-based framework, Make-A-Volume, for cross-modality 3D medical image synthesis. To learn the cross-modality slice-wise mapping, we employ a latent diffusion model and learn a low-dimensional latent space, resulting in high computational efficiency. To enable the 3D image synthesis and mitigate volumetric inconsistency, we further insert a series of volumetric layers in the 2D slice-mapping model and fine-tune them with paired 3D data. This paradigm extends the 2D image diffusion model to a volumetric version with a slightly increasing number of parameters and computation, offering a principled solution for generic cross-modality 3D medical image synthesis. We showcase the effectiveness of our Make-A-Volume framework on an in-house SWI-MRA brain MRI dataset and a public T1-T2 brain MRI dataset. Experimental results demonstrate that our framework achieves superior synthesis results with volumetric consistency.
    摘要 跨模态医学影像合成是一个关键课题，有望推动医学影像领域的众多应用。尽管基于深度学习的生成模型近来取得了成功，但目前大多数医学影像合成方法仍依赖生成对抗网络，受到模式坍塌和训练不稳定问题的困扰。此外，基于 2D 骨干网络的方法容易导致体积不一致，而 3D 骨干网络由于巨大的显存开销和训练难度而难以实用。在这篇论文中，我们提出了一种利用 2D 骨干网络进行体积医学数据合成的新范式，并提出了基于扩散的框架 Make-A-Volume，用于跨模态 3D 医学影像合成。为了学习跨模态的逐切片映射，我们采用了潜变量扩散模型并学习低维潜空间，从而获得较高的计算效率。为了实现 3D 图像合成并减轻体积不一致，我们进一步在 2D 切片映射模型中插入一系列体积层，并使用成对的 3D 数据进行微调。这种范式将 2D 图像扩散模型扩展为体积版本，仅需略微增加参数量和计算量，为通用的跨模态 3D 医学影像合成提供了一种原则性的解决方案。我们在一个内部的 SWI-MRA 脑部 MRI 数据集和一个公开的 T1-T2 脑部 MRI 数据集上展示了 Make-A-Volume 框架的有效性。实验结果表明，我们的框架取得了更优的合成结果，并保持了体积一致性。

cs.SD - 2023-07-19

Alzheimer’s Disease Detection from Spontaneous Speech and Text: A review

  • paper_url: http://arxiv.org/abs/2307.10005
  • repo_url: None
  • paper_authors: Vrindha M. K., Geethu V., Anurenjan P. R., Deepak S., Sreeni K. G.
  • for: 本文旨在综述使用语音分析来检测阿尔茨海默病的方法。
  • methods: 本文分析了用于基于语音的阿尔茨海默病检测与分类的各种算法，涉及声学特征工程和自然语言处理。
  • results: 根据本文的结论，同时考虑声学和语言特征可以建立更准确的阿尔茨海默病分类模型。此外，语音信号可以作为检测痴呆的有用工具，并有望成为高效识别阿尔茨海默病的可靠生物标志物。
    Abstract In the past decade, there has been a surge in research examining the use of voice and speech analysis as a means of detecting neurodegenerative diseases such as Alzheimer's. Many studies have shown that certain acoustic features can be used to differentiate between normal aging and Alzheimer's disease, and speech analysis has been found to be a cost-effective method of detecting Alzheimer's dementia. The aim of this review is to analyze the various algorithms used in speech-based detection and classification of Alzheimer's disease. A literature survey was conducted using databases such as Web of Science, Google Scholar, and Science Direct, and articles published from January 2020 to the present were included based on keywords such as ``Alzheimer's detection'', "speech," and "natural language processing." The ADReSS, Pitt corpus, and CCC datasets are commonly used for the analysis of dementia from speech, and this review focuses on the various acoustic and linguistic feature engineering-based classification models drawn from 15 studies. Based on the findings of this study, it appears that a more accurate model for classifying Alzheimer's disease can be developed by considering both linguistic and acoustic data. The review suggests that speech signals can be a useful tool for detecting dementia and may serve as a reliable biomarker for efficiently identifying Alzheimer's disease.
    摘要 在过去十年中，利用嗓音和语音分析来检测阿尔茨海默病等神经退行性疾病的研究激增。许多研究表明，某些声学特征可用于区分正常衰老与阿尔茨海默病，语音分析也被证明是一种检测阿尔茨海默痴呆的低成本方法。本综述旨在分析用于基于语音的阿尔茨海默病检测与分类的各种算法。我们利用 Web of Science、Google Scholar 和 Science Direct 等数据库进行了文献调研，依据"Alzheimer's detection"、"speech"和"natural language processing"等关键词，纳入了自2020年1月至今发表的文章。ADReSS、Pitt corpus 和 CCC 数据集是从语音中分析痴呆的常用数据集，本综述重点分析了来自15项研究的各种基于声学和语言特征工程的分类模型。根据本研究的结果，同时考虑语言和声学数据可以建立更准确的阿尔茨海默病分类模型。本综述表明，语音信号可以作为检测痴呆的有用工具，并可能成为高效识别阿尔茨海默病的可靠生物标志物。
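
Many of the reviewed speech-based pipelines follow the same pattern: extract acoustic features from a recording, summarize them over time, and train a classifier. The sketch below is a generic illustration of that pattern (not any specific reviewed model); the sampling rate, MFCC statistics, and SVM choice are assumptions.

```python
# Illustrative acoustic-feature pipeline of the kind surveyed in this review.
# File paths and labels are placeholders.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def acoustic_features(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)      # (n_mfcc, frames)
    # summarise each coefficient over time with its mean and standard deviation
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical training data: lists of file paths and 0/1 labels (control / AD)
# X = np.stack([acoustic_features(p) for p in wav_paths])
# clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
# clf.fit(X, labels)
```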

DisCover: Disentangled Music Representation Learning for Cover Song Identification

  • paper_url: http://arxiv.org/abs/2307.09775
  • repo_url: None
  • paper_authors: Jiahao Xun, Shengyu Zhang, Yanting Yang, Jieming Zhu, Liqun Deng, Zhou Zhao, Zhenhua Dong, Ruiqi Li, Lichao Zhang, Fei Wu
  • for: 这个研究的目标是解决音乐信息检索(MIR)领域中的重要问题，即翻唱歌曲识别(CSI)：从海量歌曲集中找出查询歌曲的翻唱版本。
  • methods: 该研究使用 causal graph 技术从解耦视角分析 CSI 任务，并通过知识引导解耦模块(KDM)和基于梯度的对抗解耦模块(GADM)分别阻断版本内和版本间的偏置影响，从而实现版本特定因素与版本不变因素的解耦。
  • results: 与现有最佳方法的大量比较和深入分析表明，DisCover 框架在 CSI 任务中优于其他方法，并证明了解耦对 CSI 的必要性。
    Abstract In the field of music information retrieval (MIR), cover song identification (CSI) is a challenging task that aims to identify cover versions of a query song from a massive collection. Existing works still suffer from high intra-song variances and inter-song correlations, due to the entangled nature of version-specific and version-invariant factors in their modeling. In this work, we set the goal of disentangling version-specific and version-invariant factors, which could make it easier for the model to learn invariant music representations for unseen query songs. We analyze the CSI task in a disentanglement view with the causal graph technique, and identify the intra-version and inter-version effects biasing the invariant learning. To block these effects, we propose the disentangled music representation learning framework (DisCover) for CSI. DisCover consists of two critical components: (1) Knowledge-guided Disentanglement Module (KDM) and (2) Gradient-based Adversarial Disentanglement Module (GADM), which block intra-version and inter-version biased effects, respectively. KDM minimizes the mutual information between the learned representations and version-variant factors that are identified with prior domain knowledge. GADM identifies version-variant factors by simulating the representation transitions between intra-song versions, and exploits adversarial distillation for effect blocking. Extensive comparisons with best-performing methods and in-depth analysis demonstrate the effectiveness of DisCover and the and necessity of disentanglement for CSI.
    摘要 在音乐信息检索(MIR)领域，翻唱歌曲识别(CSI)是一项具有挑战性的任务，旨在从海量歌曲库中识别查询歌曲的翻唱版本。由于版本特定因素与版本不变因素在建模中相互纠缠，现有方法仍然受到版本内差异大、版本间相关性高的困扰。在这项工作中，我们的目标是将版本特定因素与版本不变因素解耦，使模型更容易为未见过的查询歌曲学习不变的音乐表示。我们借助因果图技术从解耦视角分析 CSI 任务，识别出对不变表示学习产生偏置的版本内效应和版本间效应。为阻断这些效应，我们提出了用于 CSI 的解耦音乐表示学习框架 DisCover。DisCover 包含两个关键组件：(1)知识引导解耦模块(KDM)和(2)基于梯度的对抗解耦模块(GADM)，分别阻断版本内和版本间的偏置效应。KDM 最小化所学表示与借助领域先验知识识别出的版本可变因素之间的互信息；GADM 通过模拟歌曲内不同版本之间的表示转移来识别版本可变因素，并利用对抗蒸馏来阻断其影响。与现有最佳方法的大量比较和深入分析证明了 DisCover 的有效性以及解耦对 CSI 的必要性。

Improving Domain Generalization for Sound Classification with Sparse Frequency-Regularized Transformer

  • paper_url: http://arxiv.org/abs/2307.09723
  • repo_url: https://github.com/hlmu/frito
  • paper_authors: Honglin Mu, Wentian Xia, Wanxiang Che
  • for: 提高Transformer模型在分布外（OOD）数据上的泛化能力
  • methods: 限制每个序列位置在声谱图频率维度上的自注意力感受野
  • results: 使Transformer模型在TAU 2020和Nsynth数据集上实现SOTA泛化性能，同时节省20%的推理时间
    Abstract Sound classification models' performance suffers from generalizing on out-of-distribution (OOD) data. Numerous methods have been proposed to help the model generalize. However, most either introduce inference overheads or focus on long-lasting CNN-variants, while Transformers has been proven to outperform CNNs on numerous natural language processing and computer vision tasks. We propose FRITO, an effective regularization technique on Transformer's self-attention, to improve the model's generalization ability by limiting each sequence position's attention receptive field along the frequency dimension on the spectrogram. Experiments show that our method helps Transformer models achieve SOTA generalization performance on TAU 2020 and Nsynth datasets while saving 20% inference time.
    摘要 声音分类模型在分布外(OOD)数据上的泛化性能不佳。已有许多方法被提出以帮助模型泛化，但大多数方法要么增加推理开销，要么专注于长期沿用的 CNN 变体，而 Transformer 已被证明在众多自然语言处理和计算机视觉任务上优于 CNN。我们提出了 FRITO，一种针对 Transformer 自注意力的有效正则化技术，通过限制每个序列位置在声谱图频率维度上的注意力感受野来提升模型的泛化能力。实验显示，我们的方法可以帮助 Transformer 模型在 TAU 2020 和 Nsynth 数据集上取得 SOTA 的泛化性能，同时节省20%的推理时间。
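
The method limits each position's self-attention receptive field along the frequency dimension of the spectrogram. The sketch below shows one simple way to express that as a banded attention mask; the band width, token layout, and single-head formulation are assumptions, not the paper's exact design.

```python
# Sketch of frequency-limited self-attention: each frequency position may only
# attend to positions within +/- `band` frequency bins. The band width and the way
# positions map to frequency bins are illustrative assumptions.
import torch

def frequency_band_mask(num_freq_bins: int, band: int) -> torch.Tensor:
    """Boolean mask (F, F): True where attention is allowed."""
    idx = torch.arange(num_freq_bins)
    return (idx[:, None] - idx[None, :]).abs() <= band

def masked_attention(q, k, v, band: int = 8):
    """q, k, v: (batch, freq_bins, dim) tokens laid out along the frequency axis."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    mask = frequency_band_mask(q.shape[1], band).to(scores.device)
    scores = scores.masked_fill(~mask, float("-inf"))   # block far-away frequencies
    return torch.softmax(scores, dim=-1) @ v

# q = k = v = torch.randn(2, 64, 32); out = masked_attention(q, k, v)  # dummy shapes
```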

SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs

  • paper_url: http://arxiv.org/abs/2307.09435
  • repo_url: https://github.com/yl4579/SLMGAN
  • paper_authors: Yinghao Aaron Li, Cong Han, Nima Mesgarani
  • for: 这篇论文旨在在生成对抗网络(GAN)框架中利用大规模预训练语音语言模型(SLM)的表示来完成判别任务，特别是语音转换。
  • methods: 该论文在 StarGANv2-VC 的基础上加入基于 SLM 的 WavLM 判别器和新设计的 SLM 特征匹配损失函数，实现无需文本标签的无监督零样本语音转换。
  • results: 主观评测表明，SLMGAN 在自然性方面优于现有最先进的零样本语音转换模型，并在相似性方面达到可比水平，凸显了基于 SLM 的判别器在相关应用中的潜力。
    Abstract In recent years, large-scale pre-trained speech language models (SLMs) have demonstrated remarkable advancements in various generative speech modeling applications, such as text-to-speech synthesis, voice conversion, and speech enhancement. These applications typically involve mapping text or speech inputs to pre-trained SLM representations, from which target speech is decoded. This paper introduces a new approach, SLMGAN, to leverage SLM representations for discriminative tasks within the generative adversarial network (GAN) framework, specifically for voice conversion. Building upon StarGANv2-VC, we add our novel SLM-based WavLM discriminators on top of the mel-based discriminators along with our newly designed SLM feature matching loss function, resulting in an unsupervised zero-shot voice conversion system that does not require text labels during training. Subjective evaluation results show that SLMGAN outperforms existing state-of-the-art zero-shot voice conversion models in terms of naturalness and achieves comparable similarity, highlighting the potential of SLM-based discriminators for related applications.
    摘要 近年来，大规模预训练语音语言模型（SLM）在多种生成式语音建模应用中展现出显著进展，例如文本到语音合成、语音转换和语音增强。这些应用通常将文本或语音输入映射到预训练 SLM 的表示，再由该表示解码出目标语音。本文提出一种新方法 SLMGAN，在生成对抗网络（GAN）框架内利用 SLM 表示完成判别任务，特别是语音转换。我们在 StarGANv2-VC 的基础上，在基于 mel 频谱的判别器之上加入新颖的基于 SLM 的 WavLM 判别器，并设计了新的 SLM 特征匹配损失函数，从而得到一个在训练中无需文本标签的无监督零样本语音转换系统。主观评测结果表明，SLMGAN 在自然性方面优于现有最先进的零样本语音转换模型，并达到可比的相似性，凸显了基于 SLM 的判别器在相关应用中的潜力。
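
SLMGAN adds SLM-based discriminators together with an SLM feature matching loss. The sketch below shows a generic discriminator feature-matching loss of that kind; the feature extractor, the layers matched, and the loss weighting are placeholders, since the paper's exact configuration is not given here.

```python
# Sketch of a generic feature-matching loss between real and generated speech,
# computed on intermediate features of some discriminator / speech-LM feature
# extractor. Which WavLM layers SLMGAN actually matches, and with what weights,
# is not specified here.
import torch
import torch.nn.functional as F

def feature_matching_loss(real_feats, fake_feats):
    """real_feats / fake_feats: lists of intermediate feature tensors (one per layer)."""
    loss = 0.0
    for r, f in zip(real_feats, fake_feats):
        loss = loss + F.l1_loss(f, r.detach())   # pull generated features toward real ones
    return loss / len(real_feats)

# Hypothetical usage inside a GAN training step:
# real_feats = feature_extractor(real_wave)       # e.g., hidden states of an SLM
# fake_feats = feature_extractor(generated_wave)
# g_loss = adversarial_loss + lambda_fm * feature_matching_loss(real_feats, fake_feats)
```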