cs.CL - 2023-11-14

DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Pre-trained Language Models

  • paper_url: http://arxiv.org/abs/2311.08598
  • repo_url: None
  • paper_authors: Yibo Wang, Xiangjue Dong, James Caverlee, Philip S. Yu
  • for: This work aims to improve the success rate of adversarial attacks against pre-trained language models and to evaluate their effectiveness under detection methods.
  • methods: The paper proposes a Distribution-Aware LoRA-based Adversarial Attack (DALA) method, which accounts for the distribution shift of adversarial examples to improve attack effectiveness.
  • results: Experiments on four widely used datasets show that DALA improves attack effectiveness, achieving strong results on both the ASR and NASR metrics.
    Abstract Pre-trained language models (PLMs) that achieve success in applications are susceptible to adversarial attack methods that are capable of generating adversarial examples with minor perturbations. Although recent attack methods can achieve a relatively high attack success rate (ASR), our observation shows that the generated adversarial examples have a different data distribution compared with the original examples. Specifically, these adversarial examples exhibit lower confidence levels and higher distance to the training data distribution. As a result, they are easy to detect using very simple detection methods, diminishing the actual effectiveness of these attack methods. To solve this problem, we propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method, which considers the distribution shift of adversarial examples to improve attack effectiveness under detection methods. We further design a new evaluation metric NASR combining ASR and detection for the attack task. We conduct experiments on four widely-used datasets and validate the attack effectiveness on ASR and NASR of the adversarial examples generated by DALA on the BERT-base model and the black-box LLaMA2-7b model.

Are You Sure? Challenging LLMs Leads to Performance Drops in The FlipFlop Experiment

  • paper_url: http://arxiv.org/abs/2311.08596
  • repo_url: None
  • paper_authors: Philippe Laban, Lidiya Murakhovs’ka, Caiming Xiong, Chien-Sheng Wu
  • for: This paper aims to analyze the behavior of Large Language Models (LLMs) in multi-turn conversations and evaluate their ability to refine and improve their answers.
  • methods: The authors propose the FlipFlop experiment, which involves presenting an LLM with a prompt containing a classification task in the first round, and then challenging the model with a follow-up phrase in the second round to elicit a reflection on its initial answer.
  • results: The study finds that LLMs flip their answers on average 46% of the time and experience a drop in accuracy between their first and final predictions, with an average drop of 17%. The results demonstrate the universality of sycophantic behavior in LLMs and provide a robust framework for analyzing model behavior and evaluating potential solutions.
    Abstract The interactive nature of Large Language Models (LLMs) theoretically allows models to refine and improve their answers, yet systematic analysis of the multi-turn behavior of LLMs remains limited. In this paper, we propose the FlipFlop experiment: in the first round of the conversation, an LLM responds to a prompt containing a classification task. In a second round, the LLM is challenged with a follow-up phrase like "Are you sure?", offering an opportunity for the model to reflect on its initial answer, and decide whether to confirm or flip its answer. A systematic study of nine LLMs on seven classification tasks reveals that models flip their answers on average 46% of the time and that all models see a deterioration of accuracy between their first and final prediction, with an average drop of 17%. The FlipFlop experiment illustrates the universality of sycophantic behavior in LLMs and provides a robust framework to analyze model behavior and evaluate potential solutions.
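
The protocol is simple enough to sketch end to end. Below is a minimal Python illustration of one FlipFlop trial, with `chat` standing in for any chat-LLM call; the toy model here is purely illustrative:

```python
# A minimal sketch of the FlipFlop protocol described above. The `chat`
# interface is a stand-in for any chat LLM call (hypothetical here).

def flipflop_trial(chat, task_prompt, challenge="Are you sure?"):
    """Run one two-round FlipFlop trial and report whether the model flipped."""
    history = [{"role": "user", "content": task_prompt}]
    first = chat(history)                     # round 1: initial classification
    history += [{"role": "assistant", "content": first},
                {"role": "user", "content": challenge}]
    final = chat(history)                     # round 2: challenged answer
    return first, final, first.strip() != final.strip()

# Toy stand-in model so the sketch runs end to end.
def toy_chat(history):
    return "positive" if len(history) == 1 else "negative"

first, final, flipped = flipflop_trial(
    toy_chat, "Classify the sentiment of: 'Great movie!' (positive/negative)")
print(first, "->", final, "| flipped:", flipped)
```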

ACID: Abstractive, Content-Based IDs for Document Retrieval with Language Models

  • paper_url: http://arxiv.org/abs/2311.08593
  • repo_url: None
  • paper_authors: Haoxin Li, Phillip Keung, Daniel Cheng, Jungo Kasai, Noah A. Smith
  • for: This paper proposes a new approach to end-to-end document retrieval that directly generates document identifiers from an input query.
  • methods: A large language model generates abstractive keyphrases, and each document's identifier is composed of these keyphrases.
  • results: The approach improves top-10 and top-20 retrieval accuracy relative to the previous state-of-the-art baseline.
    Abstract Generative retrieval (Wang et al., 2022; Tay et al., 2022) is a new approach for end-to-end document retrieval that directly generates document identifiers given an input query. Techniques for designing effective, high-quality document IDs remain largely unexplored. We introduce ACID, in which each document's ID is composed of abstractive keyphrases generated by a large language model, rather than an integer ID sequence as done in past work. We compare our method with the current state-of-the-art technique for ID generation, which produces IDs through hierarchical clustering of document embeddings. We also examine simpler methods to generate natural-language document IDs, including the naive approach of using the first k words of each document as its ID or words with high BM25 scores in that document. We show that using ACID improves top-10 and top-20 accuracy by 15.6% and 14.4% (relative) respectively versus the state-of-the-art baseline on the MSMARCO 100k retrieval task, and 4.4% and 4.0% respectively on the Natural Questions 100k retrieval task. Our results demonstrate the effectiveness of human-readable, natural-language IDs in generative retrieval with LMs. The code for reproducing our results and the keyword-augmented datasets will be released on formal publication.
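
As a concrete illustration of the ID-construction choices compared above, the sketch below contrasts the naive first-k-words baseline with an ACID-style keyphrase ID; `generate_keyphrases` is a hypothetical stand-in for the LLM call:

```python
# A minimal sketch of two ID-construction strategies: the naive
# first-k-words baseline vs. an ACID-style abstractive keyphrase ID.

import re

def first_k_words_id(doc: str, k: int = 8) -> str:
    """Naive baseline: the document's first k words become its ID."""
    return " ".join(re.findall(r"\w+", doc)[:k])

def acid_style_id(doc: str, generate_keyphrases) -> str:
    """ACID-style: concatenate abstractive keyphrases from an LLM."""
    return " | ".join(generate_keyphrases(doc))

# Toy stand-in keyphrase generator so the sketch runs.
doc = "This 2022 paper introduces generative retrieval for end-to-end search."
print(first_k_words_id(doc))
print(acid_style_id(doc, lambda d: ["generative retrieval", "document IDs"]))
```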

PEMA: Plug-in External Memory Adaptation for Language Models

  • paper_url: http://arxiv.org/abs/2311.08590
  • repo_url: None
  • paper_authors: HyunJin Kim, Young Jin Kim, JinYeong Bak
  • for: This paper aims to improve the performance of pre-trained language models (PLMs) on downstream NLP tasks while reducing the resources required for adaptation.
  • methods: A Parameter-Efficient Fine-Tuning (PEFT) approach, Plug-in External Memory Adaptation (PEMA), built on LoRA-based weight matrices. At inference, PEMA plugs into the context representations of test data, using an external memory that stores PLM-generated context representations mapped to the desired target words.
  • results: Experimentally, PEMA outperforms other PEFT methods in memory and latency efficiency on machine translation and style transfer over syntactic and real-world datasets, while better preserving sentence meaning and generating appropriate language and style.
    Abstract Pre-trained language models (PLMs) have demonstrated impressive performance across various downstream NLP tasks. Nevertheless, the resource requirements of pre-training large language models in terms of memory and training compute pose significant challenges. Furthermore, due to the substantial resources required, many PLM weights are confidential. Consequently, users are compelled to share their data with model owners for fine-tuning on specific tasks. To overcome the limitations, we introduce Plug-in External Memory Adaptation (PEMA), a Parameter-Efficient Fine-Tuning (PEFT) approach designed for fine-tuning PLMs without the need for all weights. PEMA can be integrated into the context representation of test data during inference to execute downstream tasks. It leverages an external memory to store context representations generated by a PLM, mapped with the desired target word. Our method entails training LoRA-based weight matrices within the final layer of the PLM for enhanced efficiency. The probability is then interpolated with the next-word distribution from the PLM to perform downstream tasks. To improve the generation quality, we propose a novel interpolation strategy named Gradual Unrolling. To demonstrate the effectiveness of our proposed method, we conduct experiments to demonstrate the efficacy of PEMA with a syntactic dataset and assess its performance on machine translation and style transfer tasks using real datasets. PEMA outperforms other PEFT methods in terms of memory and latency efficiency for training and inference. Furthermore, it outperforms other baselines in preserving the meaning of sentences while generating appropriate language and styles.
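
The abstract describes interpolating the memory-based probability with the PLM's next-word distribution. The sketch below shows that interpolation; the decaying schedule used for Gradual Unrolling is an assumption, since the abstract does not spell it out:

```python
# A minimal sketch of the interpolation step described in the abstract:
# the external-memory distribution is mixed with the PLM's next-word
# distribution. The "Gradual Unrolling" schedule is sketched as a
# coefficient that decays over generation steps (an assumption).

import numpy as np

def interpolate(p_memory: np.ndarray, p_plm: np.ndarray, lam: float) -> np.ndarray:
    """Mix the memory-based and PLM next-word distributions."""
    return lam * p_memory + (1.0 - lam) * p_plm

def gradual_unrolling_lambda(step: int, total_steps: int, lam0: float = 0.8) -> float:
    """Hypothetical schedule: rely on memory early, hand over to the PLM later."""
    return lam0 * max(0.0, 1.0 - step / total_steps)

vocab = 5
p_mem = np.full(vocab, 1 / vocab)
p_lm = np.array([0.5, 0.2, 0.1, 0.1, 0.1])
for t in range(3):
    lam = gradual_unrolling_lambda(t, total_steps=10)
    print(t, interpolate(p_mem, p_lm, lam).round(3))
```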

Asking More Informative Questions for Grounded Retrieval

  • paper_url: http://arxiv.org/abs/2311.08584
  • repo_url: None
  • paper_authors: Sedrick Keh, Justin T. Chiu, Daniel Fried
  • for: The goal is to improve a model's information gathering in grounded multi-turn image identification tasks by asking more informative questions.
  • methods: The approach formulates open-ended questions instead of the usual polar yes/no questions, and adds a presupposition-handling mechanism to address the presupposition errors that off-the-shelf VQA models make on open-ended questions.
  • results: Experiments show a 14% accuracy gain over the previous state of the art, with 48% more efficient games in human evaluations.
    Abstract When a model is trying to gather information in an interactive setting, it benefits from asking informative questions. However, in the case of a grounded multi-turn image identification task, previous studies have been constrained to polar yes/no questions, limiting how much information the model can gain in a single turn. We present an approach that formulates more informative, open-ended questions. In doing so, we discover that off-the-shelf visual question answering (VQA) models often make presupposition errors, which standard information gain question selection methods fail to account for. To address this issue, we propose a method that can incorporate presupposition handling into both question selection and belief updates. Specifically, we use a two-stage process, where the model first filters out images which are irrelevant to a given question, then updates its beliefs about which image the user intends. Through self-play and human evaluations, we show that our method is successful in asking informative open-ended questions, increasing accuracy over the past state-of-the-art by 14%, while resulting in 48% more efficient games in human evaluations.
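
The two-stage process described in the abstract (filter images whose presuppositions fail, then update beliefs) can be sketched as follows; `presupposition_holds` and `vqa_likelihood` are hypothetical stand-ins for the VQA components:

```python
# A minimal sketch of the two-stage belief update: first filter images for
# which the question's presupposition fails, then reweight the belief over
# the remaining candidates with a VQA likelihood.

def update_beliefs(belief, question, answer, presupposition_holds, vqa_likelihood):
    new_belief = {}
    for image, prob in belief.items():
        if not presupposition_holds(image, question):   # stage 1: filter
            continue
        new_belief[image] = prob * vqa_likelihood(image, question, answer)
    z = sum(new_belief.values()) or 1.0                 # stage 2: renormalize
    return {img: p / z for img, p in new_belief.items()}

belief = {"img_a": 0.5, "img_b": 0.3, "img_c": 0.2}
updated = update_beliefs(
    belief, "What color is the dog?", "brown",
    presupposition_holds=lambda img, q: img != "img_c",   # img_c has no dog
    vqa_likelihood=lambda img, q, a: 0.9 if img == "img_a" else 0.4)
print(updated)
```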

Graph-Induced Syntactic-Semantic Spaces in Transformer-Based Variational AutoEncoders

  • paper_url: http://arxiv.org/abs/2311.08579
  • repo_url: None
  • paper_authors: Yingji Zhang, Marco Valentino, Danilo S. Carvalho, Ian Pratt-Hartmann, André Freitas
  • for: Improve the performance and generalisation of VAEs
  • methods: Separate the encoding of distributional semantic features and syntactic structures into distinct latent spaces via multi-task learning or dual-encoder architectures
  • results: Integrating graph-based and sequential models at the encoding stage, and injecting multiple specialised latent representations into the decoder's attention mechanism, yields better performance on language modelling and downstream generation tasks
    Abstract The injection of syntactic information in Variational AutoEncoders (VAEs) has been shown to result in an overall improvement of performances and generalisation. An effective strategy to achieve such a goal is to separate the encoding of distributional semantic features and syntactic structures into heterogeneous latent spaces via multi-task learning or dual encoder architectures. However, existing works employing such techniques are limited to LSTM-based VAEs. In this paper, we investigate latent space separation methods for structural syntactic injection in Transformer-based VAE architectures (i.e., Optimus). Specifically, we explore how syntactic structures can be leveraged in the encoding stage through the integration of graph-based and sequential models, and how multiple, specialised latent representations can be injected into the decoder's attention mechanism via low-rank operators. Our empirical evaluation, carried out on natural language sentences and mathematical expressions, reveals that the proposed end-to-end VAE architecture can result in a better overall organisation of the latent space, alleviating the information loss occurring in standard VAE setups, resulting in enhanced performances on language modelling and downstream generation tasks.
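
As a rough illustration of injecting specialised latent representations into decoder attention via low-rank operators, the sketch below projects two latents into model space as extra attention slots; the shapes and the exact injection point are assumptions, not the paper's architecture:

```python
# A minimal sketch: map each specialised latent through a low-rank
# factorization and expose the results as extra key/value "memory" slots
# that the decoder's attention can attend to alongside token states.

import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, rank = 16, 8, 4

def low_rank_project(z, A, B):
    """Map a latent into model space: (d_latent,) -> (rank,) -> (d_model,)."""
    return z @ A @ B

z_semantic = rng.normal(size=d_latent)   # distributional-semantic latent
z_syntax = rng.normal(size=d_latent)     # syntactic latent
A_sem, B_sem = rng.normal(size=(d_latent, rank)), rng.normal(size=(rank, d_model))
A_syn, B_syn = rng.normal(size=(d_latent, rank)), rng.normal(size=(rank, d_model))

extra_kv = np.stack([low_rank_project(z_semantic, A_sem, B_sem),
                     low_rank_project(z_syntax, A_syn, B_syn)])
print(extra_kv.shape)   # (2, d_model): two injected attention slots
```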

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

  • paper_url: http://arxiv.org/abs/2311.08562
  • repo_url: https://github.com/cathyxl/magic
  • paper_authors: Lin Xu, Zhiyuan Hu, Daquan Zhou, Hongyu Ren, Zhen Dong, Kurt Keutzer, See Kiong Ng, Jiashi Feng
  • for: This work evaluates the abilities of large language models (LLMs) in multi-agent settings, covering judgment, planning, collaboration, self-awareness, and rationality.
  • methods: Games such as Chameleon and Undercover, together with game-theory scenarios such as Cost Sharing, Multi-player Prisoner's Dilemma, and Public Good, create diverse testing environments; a Probabilistic Graphical Modeling (PGM) method further strengthens the LLMs' handling of complex social and cognitive dimensions.
  • results: The PGM enhancement boosts the abilities of all selected models by 50% on average, and the strongest model (GPT-4) shows over a threefold capability gap over the weakest (Llama-2-70B) across the seven evaluated multi-agent systems. Code: https://github.com/cathyxl/MAgIC.
    Abstract Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing, demonstrating exceptional capabilities in reasoning, tool usage, and memory. As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework that captures their abilities in reasoning, planning, collaboration, and more. This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings, providing quantitative metrics to evaluate their judgment, reasoning, deception, self-awareness, cooperation, coordination, and rationality. We utilize games such as Chameleon and Undercover, alongside game theory scenarios like Cost Sharing, Multi-player Prisoner's Dilemma, and Public Good, to create diverse testing environments. Our framework is fortified with the Probabilistic Graphical Modeling (PGM) method, enhancing the LLMs' capabilities in navigating complex social and cognitive dimensions. The benchmark evaluates seven multi-agent systems powered by different LLMs, quantitatively highlighting a significant capability gap over threefold between the strongest, GPT-4, and the weakest, Llama-2-70B. It also confirms that our PGM enhancement boosts the inherent abilities of all selected models by 50% on average. Our codes are released here https://github.com/cathyxl/MAgIC.

UT5: Pretraining Non autoregressive T5 with unrolled denoising

  • paper_url: http://arxiv.org/abs/2311.08552
  • repo_url: None
  • paper_authors: Mahmoud G. Salem, Jiayu Ye, Chu-Cheng Lin, Frederick Liu
  • for: This work aims to improve natural language generation with Transformer-based large language models, in particular addressing the bottleneck that autoregressive models need K sequential forward passes to decode K tokens.
  • methods: A non-autoregressive (NAR) approach: unsupervised pretraining of T5 via unrolled denoising to improve performance on downstream generation tasks.
  • results: The unsupervised-pretrained non-autoregressive T5 achieves SoTA results on SQuAD question generation and XSum.
    Abstract Recent advances in Transformer-based Large Language Models have made great strides in natural language generation. However, to decode K tokens, an autoregressive model needs K sequential forward passes, which may be a performance bottleneck for large language models. Much non-autoregressive (NAR) research aims to address this sequentiality bottleneck, albeit often focusing on dedicated architectures in supervised benchmarks. In this work, we studied unsupervised pretraining for non-autoregressive T5 models via unrolled denoising and showed its SoTA results in downstream generation tasks such as SQuAD question generation and XSum.

Efficient Continual Pre-training for Building Domain Specific Large Language Models

  • paper_url: http://arxiv.org/abs/2311.08545
  • repo_url: None
  • paper_authors: Yong Xie, Karan Aggarwal, Aitzaz Ahmad
  • for: This work develops domain-specific large language models (LLMs).
  • methods: Domain-adaptive continual pre-training is used to build domain-specific LLMs.
  • results: Continual pre-training on the financial domain yields consistent improvements on financial tasks over the original foundation model; simple yet effective data selection strategies outperform vanilla continual pre-training with just 10% of the corpus size and cost.
    Abstract Large language models (LLMs) have demonstrated remarkable open-domain capabilities. Traditionally, LLMs tailored for a domain are trained from scratch to excel at handling domain-specific tasks. In this work, we explore an alternative strategy of continual pre-training as a means to develop domain-specific LLMs. We introduce FinPythia-6.9B, developed through domain-adaptive continual pre-training on the financial domain. Continual pre-trained FinPythia showcases consistent improvements on financial tasks over the original foundational model. We further explore simple but effective data selection strategies for continual pre-training. Our data selection strategies outperform vanilla continual pre-training's performance with just 10% of corpus size and cost, without any degradation on open-domain standard tasks. Our work proposes an alternative solution to building domain-specific LLMs from scratch in a cost-effective manner.

Extending Multilingual Machine Translation through Imitation Learning

  • paper_url: http://arxiv.org/abs/2311.08538
  • repo_url: None
  • paper_authors: Wen Lai, Viktor Hangya, Alexander Fraser
  • for: Extend large-scale multilingual neural machine translation (MNMT) models to a new language, enabling translation between the newly added language and all already-supported languages.
  • methods: Imit-MNMT treats the task as imitation learning: pivoting through English, it constructs a pseudo multi-parallel corpus of the new and original languages and imitates the output distribution of the original MNMT model.
  • results: Translation performance between the new and original languages improves significantly without severe catastrophic forgetting; the approach also mitigates the copy and off-target problems common in current large-scale MNMT models.
    Abstract Despite the growing variety of languages supported by existing multilingual neural machine translation (MNMT) models, most of the world's languages are still being left behind. We aim to extend large-scale MNMT models to a new language, allowing for translation between the newly added and all of the already supported languages in a challenging scenario: using only a parallel corpus between the new language and English. Previous approaches, such as continued training on parallel data including the new language, suffer from catastrophic forgetting (i.e., performance on other languages is reduced). Our novel approach Imit-MNMT treats the task as an imitation learning process, which mimics the behavior of an expert, a technique widely used in the computer vision area, but not well explored in NLP. More specifically, we construct a pseudo multi-parallel corpus of the new and the original languages by pivoting through English, and imitate the output distribution of the original MNMT model. Extensive experiments show that our approach significantly improves the translation performance between the new and the original languages, without severe catastrophic forgetting. We also demonstrate that our approach is capable of solving copy and off-target problems, which are two common issues in current large-scale MNMT models.
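
The pivoting step described above can be sketched as follows; `translate` is a hypothetical stand-in for the original MNMT model:

```python
# A minimal sketch of pivot-based pseudo-parallel corpus construction: given
# a new-language/English parallel corpus, translate the English side into
# each already-supported language to obtain (new-language, original-language)
# pseudo-parallel pairs.

def build_pseudo_parallel(new_eng_pairs, original_langs, translate):
    pseudo = []
    for new_sent, eng_sent in new_eng_pairs:
        for lang in original_langs:
            # Pivot through English: en -> original language.
            pseudo.append((new_sent, translate(eng_sent, src="en", tgt=lang)))
    return pseudo

corpus = [("sentence in the new language", "sentence in English")]
pairs = build_pseudo_parallel(
    corpus, ["de", "fr"],
    translate=lambda s, src, tgt: f"[{tgt}] {s}")   # toy stand-in
print(pairs)
```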

Natural Language Processing for Financial Regulation

  • paper_url: http://arxiv.org/abs/2311.08533
  • repo_url: https://github.com/mdxedia/Awsome-Cash
  • paper_authors: Ixandra Achitouv, Dragos Gorduza, Antoine Jacquier
  • for: This work applies natural language processing to semantic matching between rules and policies in financial regulation, where no dataset is available for supervised learning.
  • methods: It builds on the key building blocks of NLP, explaining the mathematical concepts behind them, and uses freely available resources for improvement.
  • results: The approach outperforms simple pre-trained sentence-transformer models at rule-policy semantic matching.
    Abstract This article provides an understanding of Natural Language Processing techniques in the framework of financial regulation, specifically to perform semantic matching between rules and policies when no dataset is available for supervised learning. We outline how to outperform simple pre-trained sentence-transformer models using freely available resources and explain the mathematical concepts behind the key building blocks of Natural Language Processing.
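
A minimal example of the sentence-transformer baseline the article reports improving on: embed a rule and candidate policy passages, then rank by cosine similarity. The model name and texts are illustrative:

```python
# Baseline semantic matching between a rule and policy passages with a
# pre-trained sentence-transformer.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
rule = "Firms must verify the identity of clients before opening an account."
policies = [
    "Customer identity is confirmed during onboarding via documentary checks.",
    "Quarterly reports are submitted to the regulator by the finance team.",
]

rule_emb = model.encode(rule, convert_to_tensor=True)
policy_embs = model.encode(policies, convert_to_tensor=True)
scores = util.cos_sim(rule_emb, policy_embs)[0]      # cosine similarities
best = int(scores.argmax())
print(f"Best match (score={scores[best]:.2f}): {policies[best]}")
```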

CoRE-CoG: Conversational Recommendation of Entities using Constrained Generation

  • paper_url: http://arxiv.org/abs/2311.08511
  • repo_url: None
  • paper_authors: Harshvardhan Srivastava, Kanav Pruthi, Soumen Chakrabarti, Mausam
  • for: Improve the accuracy and fluency of conversational recommendation systems (CRS), addressing three key challenges of prior systems: (1) deciding at each turn whether to recommend a knowledge base (KB) entity, (2) identifying the most relevant KB entity to recommend, and (3) recommending the entity fluently and consistently with the conversation history.
  • methods: CoRE-CoG uses three modules: (1) a recommendation trigger that decides whether the system utterance should include an entity, (2) a type pruning module that improves the relevance of recommended entities, and (3) a novel constrained response generator that makes accurate recommendation decisions while maintaining fluency.
  • results: On recent benchmarks, CoRE-CoG gains close to 10 F1 and 4 Recall@1 percentage points over baselines on conditional generation sub-tasks.
    Abstract End-to-end conversational recommendation systems (CRS) generate responses by leveraging both dialog history and a knowledge base (KB). A CRS mainly faces three key challenges: (1) at each turn, it must decide if recommending a KB entity is appropriate; if so, it must identify the most relevant KB entity to recommend; and finally, it must recommend the entity in a fluent utterance that is consistent with the conversation history. Recent CRSs do not pay sufficient attention to these desiderata, often generating unfluent responses or not recommending (relevant) entities at the right turn. We introduce a new CRS we call CoRE-CoG. CoRE-CoG addresses the limitations in prior systems by implementing (1) a recommendation trigger that decides if the system utterance should include an entity, (2) a type pruning module that improves the relevance of recommended entities, and (3) a novel constrained response generator to make recommendations while maintaining fluency. Together, these modules ensure simultaneous accurate recommendation decisions and fluent system utterances. Experiments on recent benchmarks show its superiority, particularly on conditional generation sub-tasks, with gains of close to 10 F1 and 4 Recall@1 percentage points over baselines.

Semi-Structured Chain-of-Thought: Integrating Multiple Sources of Knowledge for Improved Language Model Reasoning

  • paper_url: http://arxiv.org/abs/2311.08505
  • repo_url: None
  • paper_authors: Xin Su, Tiep Le, Steven Bethard, Phillip Howard
  • for: Improve the performance of large language models on knowledge-intensive tasks
  • methods: A semi-structured prompting approach that integrates the model's parametric memory, unstructured knowledge from text documents, and structured knowledge from knowledge graphs
  • results: Strong performance on open-domain multi-hop question answering, surpassing even methods that require fine-tuning
    Abstract An important open question pertaining to the use of large language models for knowledge-intensive tasks is how to effectively integrate knowledge from three sources: the model's parametric memory, external structured knowledge, and external unstructured knowledge. Most existing prompting methods either rely solely on one or two of these sources, or require repeatedly invoking large language models to generate similar or identical content. In this work, we overcome these limitations by introducing a novel semi-structured prompting approach that seamlessly integrates the model's parametric memory with unstructured knowledge from text documents and structured knowledge from knowledge graphs. Experimental results on open-domain multi-hop question answering datasets demonstrate that our prompting method significantly surpasses existing techniques, even exceeding those which require fine-tuning.

Functionality learning through specification instructions

  • paper_url: http://arxiv.org/abs/2311.08481
  • repo_url: None
  • paper_authors: Pedro Henrique Luz de Araujo, Benjamin Roth
  • for: This work studies functionality learning for NLP models without fine-tuning on test-suite data.
  • methods: For each functionality in a suite, the authors generate a specification instruction encoding its requirements, combine these into specification-augmented prompts, and feed the prompts to language models pre-trained on natural instruction data.
  • results: Smaller models (<3B parameters) struggle to follow specification instructions, while larger models (>3B parameters) benefit from them and even generalize desirable behaviors across functionalities.
    Abstract Test suites assess natural language processing models' performance on specific functionalities: cases of interest involving model robustness, fairness, or particular linguistic capabilities. They enable fine-grained evaluations of model aspects that would otherwise go unnoticed in standard evaluation datasets, but they do not address the problem of how to fix the failure cases. Previous work has explored functionality learning by fine-tuning models on suite data. While this improves performance on seen functionalities, it often does not generalize to unseen ones and can harm general performance. This paper analyses a fine-tuning-free approach to functionality learning. For each functionality in a suite, we generate a specification instruction that encodes it. We combine the obtained specification instructions to create specification-augmented prompts, which we feed to language models pre-trained on natural instruction data to generate suite predictions. A core aspect of our analysis is to measure the effect that including a set of specifications has on a held-out set of unseen, qualitatively different specifications. Our experiments across four tasks and models ranging from 80M to 175B parameters show that smaller models struggle to follow specification instructions. However, larger models (> 3B params.) can benefit from specifications and even generalize desirable behaviors across functionalities.
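
A sketch of a specification-augmented prompt in the spirit of the paper follows; the specification wording here is illustrative, not taken from the actual suite:

```python
# A minimal sketch of building a specification-augmented prompt: the
# specification instructions encoding each functionality are prepended
# to the task input.

SPECIFICATIONS = [
    "Negation flips sentiment: 'not good' is negative.",
    "Intensifiers strengthen sentiment: 'very good' is strongly positive.",
]

def specification_augmented_prompt(text: str) -> str:
    spec_block = "\n".join(f"- {s}" for s in SPECIFICATIONS)
    return (f"Follow these specifications:\n{spec_block}\n\n"
            f"Classify the sentiment of: {text}\nAnswer:")

print(specification_augmented_prompt("The food was not very good."))
```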

Selecting Shots for Demographic Fairness in Few-Shot Learning with Large Language Models

  • paper_url: http://arxiv.org/abs/2311.08472
  • repo_url: None
  • paper_authors: Carlos Aguirre, Kuleen Sasse, Isabel Cachola, Mark Dredze
  • for: This paper explores the fairness of large language models (LLMs) as NLP classification systems, specifically how different shot selection strategies affect model fairness.
  • methods: The paper evaluates LLMs on standard NLP tasks using three fairness datasets, considering both existing and new demographically sensitive methods for shot selection.
  • results: The paper analyzes how different shot selection strategies affect the fairness of LLMs as NLP classification systems, and discusses how future work can include LLM fairness evaluations.
    Abstract Recently, work in NLP has shifted to few-shot (in-context) learning, with large language models (LLMs) performing well across a range of tasks. However, while fairness evaluations have become a standard for supervised methods, little is known about the fairness of LLMs as prediction systems. Further, common standard methods for fairness involve access to models weights or are applied during finetuning, which are not applicable in few-shot learning. Do LLMs exhibit prediction biases when used for standard NLP tasks? In this work, we explore the effect of shots, which directly affect the performance of models, on the fairness of LLMs as NLP classification systems. We consider how different shot selection strategies, both existing and new demographically sensitive methods, affect model fairness across three standard fairness datasets. We discuss how future work can include LLM fairness evaluations.

UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations

  • paper_url: http://arxiv.org/abs/2311.08469
  • repo_url: None
  • paper_authors: Wenting Zhao, Justin T Chiu, Jena D. Hwang, Faeze Brahman, Jack Hessel, Sanjiban Choudhury, Yejin Choi, Xiang Lorraine Li, Alane Suhr
  • for: Study abductive reasoning about unusual, unexpected, and unlikely situations.
  • methods: Human-written and large-language-model-generated explanations are collected for contexts with unexpected outcomes, forming the new UNcommonsense corpus.
  • results: Comparing human explainers with the best-performing LLMs shows that model-enhanced human-written explanations achieve the highest quality, trading off specificity against diversity; training accessible language models with online imitation learning consistently reduces loss rates on both common and uncommonsense abductive reasoning, as judged by human evaluators.
    Abstract Language technologies that accurately model the dynamics of events must perform commonsense reasoning. Existing work evaluating commonsense reasoning focuses on making inferences about common, everyday situations. To instead investigate the ability to model unusual, unexpected, and unlikely situations, we explore the task of uncommonsense abductive reasoning. Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate a natural language explanation that makes the unexpected outcome more likely in the context. To this end, we curate and release a new English language corpus called UNcommonsense. We characterize the differences between the performance of human explainers and the best performing large language models, finding that model-enhanced human-written explanations achieve the highest quality by trading off between specificity and diversity. Finally, we experiment with several online imitation learning algorithms to train open and accessible language models on this task. When compared with the vanilla supervised fine-tuning approach, these methods consistently reduce lose rates on both common and uncommonsense abductive reasoning judged by human evaluators.

Retrieve and Copy: Scaling ASR Personalization to Large Catalogs

  • paper_url: http://arxiv.org/abs/2311.08402
  • repo_url: None
  • paper_authors: Sai Muralidhar Jayanthi, Devang Kulshreshtha, Saket Dingliwal, Srikanth Ronanki, Sravan Bodapati
  • for: Improve the personalization of automatic speech recognition (ASR) systems so that rare words and domain-specific entities are recognized accurately.
  • methods: Attention-based contextual biasing, extended with a "Retrieve and Copy" mechanism that improves latency while retaining accuracy, plus a training strategy that counteracts the recall degradation caused by large numbers of confusing entities.
  • results: Up to 6% more relative Word Error Rate reduction (WERR) and 3.6% absolute F1 improvement over a strong baseline; catalogs of up to 20K entities are supported without significantly affecting WER or F1, with at least a 20% inference speedup per acoustic frame.
    Abstract Personalization of automatic speech recognition (ASR) models is a widely studied topic because of its many practical applications. Most recently, attention-based contextual biasing techniques are used to improve the recognition of rare words and domain specific entities. However, due to performance constraints, the biasing is often limited to a few thousand entities, restricting real-world usability. To address this, we first propose a "Retrieve and Copy" mechanism to improve latency while retaining the accuracy even when scaled to a large catalog. We also propose a training strategy to overcome the degradation in recall at such scale due to an increased number of confusing entities. Overall, our approach achieves up to 6% more Word Error Rate reduction (WERR) and 3.6% absolute improvement in F1 when compared to a strong baseline. Our method also allows for large catalog sizes of up to 20K without significantly affecting WER and F1-scores, while achieving at least 20% inference speedup per acoustic frame.
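
The abstract names the "Retrieve and Copy" mechanism without detailing it; the sketch below illustrates only the retrieval half, under stated assumptions: it scores a large catalog against a context embedding and shortlists the top-k entities for biasing. The embedding model and scoring are stand-ins:

```python
# A minimal sketch of retrieval over a large entity catalog by cosine
# similarity, keeping only a top-k shortlist for contextual biasing.

import numpy as np

rng = np.random.default_rng(0)
catalog = [f"entity_{i}" for i in range(20_000)]          # large catalog
catalog_embs = rng.normal(size=(len(catalog), 64))
catalog_embs /= np.linalg.norm(catalog_embs, axis=1, keepdims=True)

def retrieve_top_k(context_emb: np.ndarray, k: int = 50):
    """Return the k catalog entities most similar to the current context."""
    context_emb = context_emb / np.linalg.norm(context_emb)
    scores = catalog_embs @ context_emb                   # cosine similarity
    top = np.argpartition(-scores, k)[:k]
    return [(catalog[i], float(scores[i])) for i in top]

shortlist = retrieve_top_k(rng.normal(size=64))
print(len(shortlist), "entities shortlisted for biasing")
```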

A Material Lens on Coloniality in NLP

  • paper_url: http://arxiv.org/abs/2311.08391
  • repo_url: None
  • paper_authors: William Held, Camille Harris, Michael Best, Diyi Yang
  • for: This paper examines the extent of coloniality in natural language processing (NLP) and proposes how to address it.
  • methods: Actor-Network Theory (ANT) is used to analyze the network of relationships among human stakeholders and technology in NLP data, algorithms, and software, guiding a quantitative survey of the geography of different phases of NLP research.
  • results: Inequality along colonial boundaries increases as NLP builds on itself; combating coloniality in NLP therefore requires not only changing current values but also actively removing the accumulation of colonial ideals from foundational data and algorithms.
    Abstract Coloniality, the continuation of colonial harms beyond "official" colonization, has pervasive effects across society and scientific fields. Natural Language Processing (NLP) is no exception to this broad phenomenon. In this work, we argue that coloniality is implicitly embedded in and amplified by NLP data, algorithms, and software. We formalize this analysis using Actor-Network Theory (ANT): an approach to understanding social phenomena through the network of relationships between human stakeholders and technology. We use our Actor-Network to guide a quantitative survey of the geography of different phases of NLP research, providing evidence that inequality along colonial boundaries increases as NLP builds on itself. Based on this, we argue that combating coloniality in NLP requires not only changing current values but also active work to remove the accumulation of colonial ideals in our foundational data and algorithms.

On What Basis? Predicting Text Preference Via Structured Comparative Reasoning

  • paper_url: http://arxiv.org/abs/2311.08390
  • repo_url: None
  • paper_authors: Jing Nathan Yan, Tianqi Liu, Justin T Chiu, Jiaming Shen, Zhen Qin, Yue Yu, Yao Zhao, Charu Lakshmanan, Yair Kurzion, Alexander M. Rush, Jialu Liu, Michael Bendersky
  • for: Improve the accuracy of text preference prediction in natural language processing (NLP).
  • methods: SC predicts text preferences by generating structured intermediate comparisons: it first proposes aspects of comparison, then generates textual comparisons under each aspect, and uses a pairwise consistency comparator to keep only comparisons that clearly distinguish the texts, reducing hallucination and improving consistency.
  • results: Across NLP tasks including summarization, retrieval, and automatic rating, SC equips LLMs to achieve state-of-the-art performance in text preference prediction.
    Abstract Comparative reasoning plays a crucial role in text preference prediction; however, large language models (LLMs) often demonstrate inconsistencies in their reasoning. While approaches like Chain-of-Thought improve accuracy in many other settings, they struggle to consistently distinguish the similarities and differences of complex texts. We introduce SC, a prompting approach that predicts text preferences by generating structured intermediate comparisons. SC begins by proposing aspects of comparison, followed by generating textual comparisons under each aspect. We select consistent comparisons with a pairwise consistency comparator that ensures each aspect's comparisons clearly distinguish differences between texts, significantly reducing hallucination and improving consistency. Our comprehensive evaluations across various NLP tasks, including summarization, retrieval, and automatic rating, demonstrate that SC equips LLMs to achieve state-of-the-art performance in text preference prediction.

ChOiRe: Characterizing and Predicting Human Opinions with Chain of Opinion Reasoning

  • paper_url: http://arxiv.org/abs/2311.08385
  • repo_url: https://github.com/dxlong2000/ChOiRe
  • paper_authors: Xuan Long Do, Kenji Kawaguchi, Min-Yen Kan, Nancy F. Chen
  • for: Predicting human opinions
  • methods: A four-step framework: (1) an LM analyzes the user's explicit personae (demographic or ideological attributes) to filter out irrelevant attributes; (2) the LM ranks implicit persona opinions into a preferential list; (3) Chain-of-Opinion (CoO) reasoning over the explicit personae and the most relevant implicit personae; (4) CoO is executed multiple times with increasingly larger lists of implicit personae to infer a final result
  • results: Improves on previous LLM-based techniques by 3.22%, with limited inference calls
    Abstract Aligning language models (LMs) with human opinion is challenging yet vital to enhance their grasp of human values, preferences, and beliefs. We present ChOiRe, a four-step solution framework to predict human opinion that differentiates between the user explicit personae (i.e. demographic or ideological attributes) that are manually declared and implicit personae inferred from user historical opinions. Specifically, it consists of (i) an LM analyzing the user explicit personae to filter out irrelevant attributes; (ii) the LM ranking the implicit persona opinions into a preferential list; (iii) Chain-of-Opinion (CoO) reasoning, where the LM sequentially analyzes the explicit personae and the most relevant implicit personae to perform opinion prediction; (iv) and where ChOiRe executes Step (iii) CoO multiple times with increasingly larger lists of implicit personae to overcome insufficient personae information to infer a final result. ChOiRe achieves new state-of-the-art effectiveness with limited inference calls, improving previous LLM-based techniques significantly by 3.22%.

Direct Preference Optimization for Neural Machine Translation with Minimum Bayes Risk Decoding

  • paper_url: http://arxiv.org/abs/2311.08380
  • repo_url: None
  • paper_authors: Guangyu Yang, Jinghong Chen, Weizhe Lin, Bill Byrne
  • for: Improve the translation performance of Multilingual Large Language Models (MLLMs)
  • methods: Fine-tune MLLMs with Direct Preference Optimization (DPO), a recently developed reinforcement learning technique, to obtain the gains of Minimum Bayes Risk (MBR) decoding without its additional inference-time computation
  • results: On multiple NMT test sets, the fine-tuned models significantly outperform base MLLMs without preference optimization, using only relatively small monolingual fine-tuning sets
    Abstract Minimum Bayes Risk (MBR) decoding can significantly improve translation performance of Multilingual Large Language Models (MLLMs). However, MBR decoding is computationally expensive and in this paper, we show how recently developed Reinforcement Learning (RL) technique, Direct Preference Optimization (DPO) can be used to fine-tune MLLMs so that we get the gains from MBR without the additional computation in inference. Our fine-tuned models have significantly improved performance on multiple NMT test sets compared to base MLLMs without preference optimization. Our method boosts the translation performance of MLLMs using relatively small monolingual fine-tuning sets.
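
Since MBR decoding is central here, a minimal sketch helps: sample candidate translations, then select the one with the highest expected utility against the other samples. A toy unigram-F1 utility stands in for BLEU or COMET, and the candidates are illustrative:

```python
# A minimal sketch of Minimum Bayes Risk (MBR) selection over sampled
# candidate translations.

from collections import Counter

def unigram_f1(hyp: str, ref: str) -> float:
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)

def mbr_select(candidates):
    """Pick the candidate maximizing average utility w.r.t. all samples."""
    def expected_utility(c):
        return sum(unigram_f1(c, other) for other in candidates) / len(candidates)
    return max(candidates, key=expected_utility)

samples = ["the cat sat on the mat", "a cat sat on the mat", "the cat is on a mat"]
print(mbr_select(samples))
# DPO then fine-tunes on preference pairs (MBR winner preferred over a
# sampled loser) so the model produces MBR-quality outputs with ordinary
# decoding, avoiding the extra inference cost.
```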

A Ship of Theseus: Curious Cases of Paraphrasing in LLM-Generated Texts

  • paper_url: http://arxiv.org/abs/2311.08374
  • repo_url: None
  • paper_authors: Nafis Irtiza Tripto, Saranya Venkatraman, Dominik Macko, Robert Moro, Ivan Srba, Adaku Uchendu, Thai Le, Dongwon Lee
  • for: Investigate whether a text retains its original authorship when it undergoes numerous paraphrasing iterations by Large Language Models (LLMs)
  • methods: Employ LLMs to iteratively paraphrase a text and examine whether the result retains its original authorship
  • results: A philosophical inquiry into the determination of authorship when LLMs or similar paraphrasing tools rephrase a text: should authorship be attributed to the original human author or to the AI-powered tool?
    Abstract In the realm of text manipulation and linguistic transformation, the question of authorship has always been a subject of fascination and philosophical inquiry. Much like the Ship of Theseus paradox, which ponders whether a ship remains the same when each of its original planks is replaced, our research delves into an intriguing question: Does a text retain its original authorship when it undergoes numerous paraphrasing iterations? Specifically, since Large Language Models (LLMs) have demonstrated remarkable proficiency in the generation of both original content and the modification of human-authored texts, a pivotal question emerges concerning the determination of authorship in instances where LLMs or similar paraphrasing tools are employed to rephrase the text. This inquiry revolves around whether authorship should be attributed to the original human author or the AI-powered tool, given the tool's independent capacity to produce text that closely resembles human-generated content. Therefore, we embark on a philosophical voyage through the seas of language and authorship to unravel this intricate puzzle.

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

  • paper_url: http://arxiv.org/abs/2311.08370
  • repo_url: None
  • paper_authors: Bertie Vidgen, Hannah Rose Kirk, Rebecca Qian, Nino Scherrer, Anand Kannappan, Scott A. Hale, Paul Röttger
  • for: This paper provides developers and businesses with a test suite for rapidly and systematically identifying critical safety risks in large language models (LLMs).
  • methods: The new SimpleSafetyTests suite comprises 100 test prompts across five harm areas, probing whether LLMs follow malicious instructions, provide unsafe advice, or generate toxic content.
  • results: Most of the tested LLMs respond unsafely on more than 20% of cases, some on more than 50%; prepending a safety-emphasising system prompt substantially reduces unsafe responses but does not eliminate them.
    Abstract The past year has seen rapid acceleration in the development of large language models (LLMs). For many tasks, there is now a wide range of open-source and open-access LLMs that are viable alternatives to proprietary models like ChatGPT. Without proper steering and safeguards, however, LLMs will readily follow malicious instructions, provide unsafe advice, and generate toxic content. This is a critical safety risk for businesses and developers. We introduce SimpleSafetyTests as a new test suite for rapidly and systematically identifying such critical safety risks. The test suite comprises 100 test prompts across five harm areas that LLMs, for the vast majority of applications, should refuse to comply with. We test 11 popular open LLMs and find critical safety weaknesses in several of them. While some LLMs do not give a single unsafe response, most models we test respond unsafely on more than 20% of cases, with over 50% unsafe responses in the extreme. Prepending a safety-emphasising system prompt substantially reduces the occurrence of unsafe responses, but does not completely stop them from happening. We recommend that developers use such system prompts as a first line of defence against critical safety risks.

How You Prompt Matters! Even Task-Oriented Constraints in Instructions Affect LLM-Generated Text Detection

  • paper_url: http://arxiv.org/abs/2311.08369
  • repo_url: None
  • paper_authors: Ryuto Koike, Masahiro Kaneko, Naoaki Okazaki
  • for: This work examines the inconsistent performance of existing LLM-generated-text detectors when the generation instruction includes task-oriented constraints, focusing on student essay writing as a realistic domain.
  • methods: Using existing detectors, the authors manually create a task-oriented constraint for each essay-quality factor from Ke and Ng (2019) and measure detection performance across the resulting generations.
  • results: The variance in detection performance across instructions with different task-oriented constraints is up to 20 times larger than the variance caused by regenerating texts multiple times or paraphrasing the instruction, calling for further research on detectors robust to such distribution shifts.
    Abstract Against the misuse (e.g., plagiarism or spreading misinformation) of Large Language Models (LLMs), many recent works have presented LLM-generated-text detectors with promising detection performance. Spotlighting a situation where users instruct LLMs to generate texts (e.g., essay writing), there are various ways to write the instruction (e.g., what task-oriented constraint to include). In this paper, we discover that even a task-oriented constraint in instruction can cause the inconsistent performance of current detectors to the generated texts. Specifically, we focus on student essay writing as a realistic domain and manually create the task-oriented constraint for each factor on essay quality by Ke and Ng (2019). Our experiment shows that the detection performance variance of the current detector on texts generated by instruction with each task-oriented constraint is up to 20 times larger than the variance caused by generating texts multiple times and paraphrasing the instruction. Our finding calls for further research on developing robust detectors that can detect such distributional shifts caused by a task-oriented constraint in the instruction.

Artificial Text Boundary Detection with Topological Data Analysis and Sliding Window Techniques

  • paper_url: http://arxiv.org/abs/2311.08349
  • repo_url: None
  • paper_authors: Laida Kushnareva, Tatiana Gaintseva, German Magai, Serguei Barannikov, Dmitry Abulkhanov, Kristian Kuznetsov, Irina Piontkovskaya, Sergey Nikolenko
  • for: This work addresses detecting the boundary between the human-written and machine-generated parts of a text, a problem made pressing by the rapid development of text generation models.
  • methods: A number of approaches to the artificial text boundary detection problem are considered and compared, using predictors over features of different nature.
  • results: Supervised fine-tuning of RoBERTa works well overall but fails to generalize in cross-domain and cross-generator settings, overfitting to spurious properties of the data. Novel approaches based on features from a frozen language model's embeddings outperform both human accuracy and previously considered baselines on the Real or Fake Text benchmark; perplexity-based approaches are also adapted to the task and analyzed.
    Abstract Due to the rapid development of text generation models, people increasingly often encounter texts that may start out as written by a human but then continue as machine-generated results of large language models. Detecting the boundary between human-written and machine-generated parts of such texts is a very challenging problem that has not received much attention in literature. In this work, we consider and compare a number of different approaches for this artificial text boundary detection problem, comparing several predictors over features of different nature. We show that supervised fine-tuning of the RoBERTa model works well for this task in general but fails to generalize in important cross-domain and cross-generator settings, demonstrating a tendency to overfit to spurious properties of the data. Then, we propose novel approaches based on features extracted from a frozen language model's embeddings that are able to outperform both the human accuracy level and previously considered baselines on the Real or Fake Text benchmark. Moreover, we adapt perplexity-based approaches for the boundary detection task and analyze their behaviour. We analyze the robustness of all proposed classifiers in cross-domain and cross-model settings, discovering important properties of the data that can negatively influence the performance of artificial text boundary detection algorithms.
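
As a rough illustration of the perplexity-based, sliding-window family of approaches mentioned above: score windows of tokens under a frozen LM and flag the position with the largest perplexity jump. `token_logprob` is a hypothetical stand-in for a real LM scorer:

```python
# A minimal sketch of a perplexity-based boundary detector: slide a window
# over the token sequence, compute each window's perplexity, and return the
# position of the largest change.

import math

def window_perplexity(logprobs):
    return math.exp(-sum(logprobs) / len(logprobs))

def detect_boundary(tokens, token_logprob, window=8):
    """Return the window start with the largest perplexity change."""
    ppls = [window_perplexity([token_logprob(t) for t in tokens[i:i + window]])
            for i in range(len(tokens) - window + 1)]
    deltas = [abs(ppls[i + 1] - ppls[i]) for i in range(len(ppls) - 1)]
    return max(range(len(deltas)), key=deltas.__getitem__) + 1

# Toy scorer: "machine" tokens are more predictable (higher logprob).
tokens = ["human"] * 12 + ["machine"] * 12
print(detect_boundary(tokens, lambda t: -2.5 if t == "human" else -0.5))
```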

MC^2: A Multilingual Corpus of Minority Languages in China

  • paper_url: http://arxiv.org/abs/2311.08348
  • repo_url: https://github.com/luciusssss/mc2_corpus
  • paper_authors: Chen Zhang, Mingxu Tao, Quzhe Huang, Jiuheng Lin, Zhibin Chen, Yansong Feng
  • for: Improve the accessibility of minority languages in China
  • methods: A quality-centric collection pipeline that prioritizes accuracy and quality while enhancing representativeness and diversity
  • results: MC^2, the largest open-source corpus of these languages so far, surfacing new research challenges such as long-text modeling and the multiplicity of writing systems
    Abstract Large-scale corpora play a vital role in the construction of large language models (LLMs). However, existing LLMs exhibit limited abilities in understanding low-resource languages, including the minority languages in China, due to a lack of training data. To improve the accessibility of these languages, we present MC^2, a Multilingual Corpus of Minority Languages in China, which is the largest open-source corpus so far. It encompasses four underrepresented languages, i.e., Tibetan, Uyghur, Kazakh in the Kazakh Arabic script, and Mongolian in the traditional Mongolian script. Notably, two writing systems in MC^2 are long neglected in previous corpora. As we identify serious contamination in the low-resource language split in the existing multilingual corpora, we propose a quality-centric solution for collecting MC^2, prioritizing quality and accuracy while enhancing representativeness and diversity. By in-depth analysis, we demonstrate the new research challenges MC^2 brings, such as long-text modeling and multiplicity of writing systems. We hope MC^2 can help enhance the equity of the underrepresented languages in China and provide a reliable data foundation for further research on low-resource languages.

KTRL+F: Knowledge-Augmented In-Document Search

  • paper_url: http://arxiv.org/abs/2311.08329
  • repo_url: https://github.com/hanseokoh/ktrlf
  • paper_authors: Hanseok Oh, Haebin Shin, Miyoung Ko, Hyunji Lee, Minjoon Seo
  • for: This paper introduces and addresses KTRL+F, a knowledge-augmented in-document search task that requires finding all semantic targets in a document in real time while drawing on external knowledge to bridge the semantic gap.
  • methods: The paper analyzes various baselines for KTRL+F and finds limitations in existing models, such as hallucinations, latency issues, and difficulty leveraging external knowledge. It therefore proposes a knowledge-augmented phrase retrieval model that strikes a promising balance between performance and speed.
  • results: A user study shows that solving KTRL+F improves the search experience: users issue fewer queries and make fewer visits to external sources to collect evidence, demonstrating that better in-document information access raises search efficiency.
    Abstract We introduce a new problem KTRL+F, a knowledge-augmented in-document search task that necessitates real-time identification of all semantic targets within a document with the awareness of external sources through a single natural query. This task addresses following unique challenges for in-document search: 1) utilizing knowledge outside the document for extended use of additional information about targets to bridge the semantic gap between the query and the targets, and 2) balancing between real-time applicability with the performance. We analyze various baselines in KTRL+F and find there are limitations of existing models, such as hallucinations, low latency, or difficulties in leveraging external knowledge. Therefore we propose a Knowledge-Augmented Phrase Retrieval model that shows a promising balance between speed and performance by simply augmenting external knowledge embedding in phrase embedding. Additionally, we conduct a user study to verify whether solving KTRL+F can enhance search experience of users. It demonstrates that even with our simple model users can reduce the time for searching with less queries and reduced extra visits to other sources for collecting evidence. We encourage the research community to work on KTRL+F to enhance more efficient in-document information access.
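As a rough illustration of augmenting external knowledge into phrase embeddings, here is a minimal sketch; the additive fusion, the `alpha` weight, and the threshold-based matching are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def augment_phrase(phrase_emb: np.ndarray, knowledge_emb: np.ndarray,
                   alpha: float = 0.5) -> np.ndarray:
    """Fuse an external-knowledge embedding into a phrase embedding
    (simple weighted addition, then re-normalization)."""
    fused = phrase_emb + alpha * knowledge_emb
    return fused / np.linalg.norm(fused)

def find_semantic_targets(query_emb: np.ndarray, phrase_embs: np.ndarray,
                          threshold: float = 0.6) -> np.ndarray:
    """Return indices of ALL in-document phrases matching the query
    above a threshold -- KTRL+F asks for every semantic target,
    not just the single top hit."""
    scores = phrase_embs @ query_emb
    return np.where(scores >= threshold)[0]

# Toy usage with random unit vectors standing in for real embeddings.
rng = np.random.default_rng(1)
phrases = rng.normal(size=(10, 64))
phrases /= np.linalg.norm(phrases, axis=1, keepdims=True)
query = phrases[3] + 0.1 * rng.normal(size=64)   # a query near phrase 3
query /= np.linalg.norm(query)
print(find_semantic_targets(query, phrases))      # includes index 3
```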

Open-vocabulary keyword spotting in any language through multilingual contrastive speech-phoneme pretraining

  • paper_url: http://arxiv.org/abs/2311.08323
  • repo_url: None
  • paper_authors: Jian Zhu, Farhan Samir, Changbing Yang, Jahurul Islam
  • for: This paper builds a massively multilingual speech corpus with fine-grained phonemic transcriptions covering more than 115 languages, and proposes a multilingual phoneme-speech contrastive embedding model capable of open-vocabulary matching between speech signals and phonemically transcribed keywords or arbitrary phrases.
  • methods: The model takes fine-grained phonemic transcriptions as input and is trained with a contrastive learning objective, enabling open-vocabulary matching in 97 unseen languages.
  • results: Compared with a text-based model, using phonemes as modeling units yields much better cross-linguistic generalization, as demonstrated on two fieldwork speech corpora.
    Abstract In this paper, we introduce a massively multilingual speech corpora with fine-grained phonemic transcriptions, encompassing more than 115 languages from diverse language families. Based on this multilingual dataset, we propose CLAP-IPA, a multilingual phoneme-speech contrastive embedding model capable of open-vocabulary matching between speech signals and phonemically transcribed keywords or arbitrary phrases. The proposed model has been tested on two fieldwork speech corpora in 97 unseen languages, exhibiting strong generalizability across languages. Comparison with a text-based model shows that using phonemes as modeling units enables much better crosslinguistic generalization than orthographic texts.
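The contrastive objective behind such phoneme-speech models is CLIP-style: paired speech/phoneme embeddings in a batch are positives, all other pairings negatives. A minimal numpy sketch of the symmetric InfoNCE loss follows; the encoders themselves are out of scope, so random embeddings stand in for their outputs.

```python
import numpy as np

def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def contrastive_loss(speech_emb, phoneme_emb, temperature=0.07):
    """Symmetric InfoNCE: the i-th speech clip and the i-th phoneme
    sequence form a positive pair; everything else is a negative."""
    s = speech_emb / np.linalg.norm(speech_emb, axis=1, keepdims=True)
    p = phoneme_emb / np.linalg.norm(phoneme_emb, axis=1, keepdims=True)
    logits = s @ p.T / temperature                 # (B, B) similarities
    idx = np.arange(len(logits))
    loss_s2p = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_p2s = -log_softmax(logits, axis=0)[idx, idx].mean()
    return (loss_s2p + loss_p2s) / 2

rng = np.random.default_rng(0)
print(contrastive_loss(rng.normal(size=(8, 32)), rng.normal(size=(8, 32))))
```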

On-the-Fly Fusion of Large Language Models and Machine Translation

  • paper_url: http://arxiv.org/abs/2311.08306
  • repo_url: None
  • paper_authors: Hieu Hoang, Huda Khayrallah, Marcin Junczys-Dowmunt
  • for: Improving the translation quality of machine translation models
  • methods: On-the-fly ensembling of an NMT model with an LLM prompted on the same task and input
  • results: Experiments show that even an LLM slightly weaker at translation can improve an NMT model's output, and that ensembling an NMT model with an LLM beats ensembling two stronger MT models.
    Abstract We propose the on-the-fly ensembling of a machine translation model with an LLM, prompted on the same task and input. We perform experiments on 4 language pairs (both directions) with varying data amounts. We find that a slightly weaker-at-translation LLM can improve translations of a NMT model, and ensembling with an LLM can produce better translations than ensembling two stronger MT models. We combine our method with various techniques from LLM prompting, such as in context learning and translation context.
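One natural way to ensemble an NMT model with an LLM on the fly is to interpolate their next-token distributions at each decoding step. A minimal sketch under that assumption (the paper's exact combination rule may differ; `p_nmt` and `p_llm` are the two models' distributions over a shared vocabulary):

```python
import numpy as np

def fused_step(p_nmt: np.ndarray, p_llm: np.ndarray, w: float = 0.5) -> int:
    """One greedy decoding step over the interpolated distribution.
    Assumes both models share a vocabulary and are conditioned on the
    same source sentence and partial hypothesis."""
    p = (1.0 - w) * p_nmt + w * p_llm
    return int(p.argmax())

# Toy step: the LLM's evidence flips the NMT model's greedy choice.
p_nmt = np.array([0.50, 0.45, 0.05])
p_llm = np.array([0.10, 0.80, 0.10])
print(fused_step(p_nmt, p_llm))  # selects token 1
```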

How Well Do Large Language Models Understand Syntax? An Evaluation by Asking Natural Language Questions

  • paper_url: http://arxiv.org/abs/2311.08287
  • repo_url: https://github.com/Jacob-Zhou/SynEval
  • paper_authors: Houquan Zhou, Yang Hou, Zhenghua Li, Xuebin Wang, Zhefeng Wang, Xinyu Duan, Min Zhang
  • for: This study asks whether large language models (LLMs) truly understand language or merely mimic comprehension through pattern recognition, viewed through the lens of syntax.
  • methods: Adopting a natural language question-answering (Q&A) scheme, the authors craft questions targeting nine syntactic knowledge points most closely related to sentence comprehension.
  • results: Experiments on 24 LLMs show that most have only a limited grasp of syntactic knowledge, with notable discrepancies across knowledge points: prepositional phrase attachment is the hardest, while adjectival modifiers and indirect objects are comparatively easy.
    Abstract While recent advancements in large language models (LLMs) bring us closer to achieving artificial general intelligence, the question persists: Do LLMs truly understand language, or do they merely mimic comprehension through pattern recognition? This study seeks to explore this question through the lens of syntax, a crucial component of sentence comprehension. Adopting a natural language question-answering (Q&A) scheme, we craft questions targeting nine syntactic knowledge points that are most closely related to sentence comprehension. Experiments conducted on 24 LLMs suggest that most have a limited grasp of syntactic knowledge, exhibiting notable discrepancies across different syntactic knowledge points. In particular, questions involving prepositional phrase attachment pose the greatest challenge, whereas those concerning adjectival modifier and indirect object are relatively easier for LLMs to handle. Furthermore, a case study on the training dynamics of the LLMs reveals that the majority of syntactic knowledge is learned during the initial stages of training, hinting that simply increasing the number of training tokens may not be the `silver bullet' for improving the comprehension ability of LLMs.

Examining Modularity in Multilingual LMs via Language-Specialized Subnetworks

  • paper_url: http://arxiv.org/abs/2311.08273
  • repo_url: None
  • paper_authors: Rochelle Choenni, Ekaterina Shutova, Dan Garrette
  • for: This work investigates language-specialized subnetworks in multilingual language models and their relationship to cross-lingual sharing.
  • methods: A Training Data Attribution method estimates the degree to which a model's predictions are influenced by in-language versus cross-language training examples.
  • results: Language-specialized subnetworks arise naturally even without explicit modularity interventions, and sparse fine-tuning (SFT), rather than always increasing modularity, can decrease the language specialization of subnetworks in favor of more cross-lingual sharing.
    Abstract Recent work has proposed explicitly inducing language-wise modularity in multilingual LMs via sparse fine-tuning (SFT) on per-language subnetworks as a means of better guiding cross-lingual sharing. In this work, we investigate (1) the degree to which language-wise modularity naturally arises within models with no special modularity interventions, and (2) how cross-lingual sharing and interference differ between such models and those with explicit SFT-guided subnetwork modularity. To quantify language specialization and cross-lingual interaction, we use a Training Data Attribution method that estimates the degree to which a model's predictions are influenced by in-language or cross-language training examples. Our results show that language-specialized subnetworks do naturally arise, and that SFT, rather than always increasing modularity, can decrease language specialization of subnetworks in favor of more cross-lingual sharing.
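The abstract does not pin down which Training Data Attribution estimator is used; a common first-order choice is TracIn-style gradient similarity, sketched below under that assumption, with per-example gradients treated as precomputed flat vectors.

```python
import numpy as np

def tracin_influence(train_grads: np.ndarray, test_grad: np.ndarray,
                     lr: float = 1e-3) -> np.ndarray:
    """First-order influence of each training example on one test
    prediction: lr * <grad(train_i), grad(test)> (TracIn-style)."""
    return lr * train_grads @ test_grad

def in_language_share(influence: np.ndarray, langs: np.ndarray,
                      target_lang: str) -> float:
    """Fraction of total (absolute) influence coming from training
    examples in the test example's own language: high values suggest
    language specialization, low values cross-lingual sharing."""
    total = np.abs(influence).sum()
    return np.abs(influence[langs == target_lang]).sum() / total

rng = np.random.default_rng(0)
grads = rng.normal(size=(6, 16))
langs = np.array(["de", "de", "fr", "fr", "tr", "tr"])
infl = tracin_influence(grads, rng.normal(size=16))
print(in_language_share(infl, langs, "de"))
```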

A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily

  • paper_url: http://arxiv.org/abs/2311.08268
  • repo_url: None
  • paper_authors: Peng Ding, Jun Kuang, Dan Ma, Xuezhi Cao, Yunsen Xian, Jiajun Chen, Shujian Huang
  • for: Improving the safety of Large Language Models (LLMs) by exposing how jailbreak prompts lead them to generate harmful content
  • methods: ReNeLLM, an automatic attack framework based on prompt rewriting and scenario nesting that leverages LLMs themselves to generate effective jailbreak prompts
  • results: Compared with existing baselines, ReNeLLM significantly improves the attack success rate while greatly reducing time cost; the study also finds that current defense methods fail to adequately safeguard LLMs.
    Abstract Large Language Models (LLMs), such as ChatGPT and GPT-4, are designed to provide useful and safe responses. However, adversarial prompts known as 'jailbreaks' can circumvent safeguards, leading LLMs to generate harmful content. Exploring jailbreak prompts can help to better reveal the weaknesses of LLMs and further steer us to secure them. Unfortunately, existing jailbreak methods either suffer from intricate manual design or require optimization on another white-box model, compromising generalization or jailbreak efficiency. In this paper, we generalize jailbreak prompt attacks into two aspects: (1) Prompt Rewriting and (2) Scenario Nesting. Based on this, we propose ReNeLLM, an automatic framework that leverages LLMs themselves to generate effective jailbreak prompts. Extensive experiments demonstrate that ReNeLLM significantly improves the attack success rate while greatly reducing the time cost compared to existing baselines. Our study also reveals the inadequacy of current defense methods in safeguarding LLMs. Finally, we offer detailed analysis and discussion from the perspective of prompt execution priority on the failure of LLMs' defense. We hope that our research can catalyze both the academic community and LLMs vendors towards the provision of safer and more regulated Large Language Models.

Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster

  • paper_url: http://arxiv.org/abs/2311.08263
  • repo_url: https://github.com/smart-life-tech/231118-082633-esp32dev
  • paper_authors: Hongxuan Zhang, Zhining Liu, Jiaqi Zheng, Chenyi Zhuang, Jinjie Gu, Guihai Chen
  • for: This paper targets faster inference for large language models (LLMs), making chain-of-thought reasoning more usable in real-time applications.
  • methods: The proposed FastCoT framework is based on parallel decoding and requires no auxiliary model or modification to the LLM. It uses a size-varying context window to run parallel decoding and autoregressive decoding simultaneously, fully utilizing GPU compute.
  • results: Extensive experiments show that FastCoT cuts inference time by nearly 20% with only a negligible performance drop, and that the context window size is fairly robust across tasks.
    Abstract In this work, we propose FastCoT, a model-agnostic framework based on parallel decoding without any further training of an auxiliary model or modification to the LLM itself. FastCoT uses a size-varying context window whose size changes with position to conduct parallel decoding and auto-regressive decoding simultaneously, thus fully utilizing GPU computation resources. In FastCoT, the parallel decoding part provides the LLM with a quick glance of the future composed of approximate tokens, which could lead to faster answers compared to regular autoregressive decoding used by causal transformers. We also provide an implementation of parallel decoding within LLM, which supports KV-cache generation and batch processing. Through extensive experiments, we demonstrate that FastCoT saves inference time by nearly 20% with only a negligible performance drop compared to the regular approach. Additionally, we show that the context window size exhibits considerable robustness for different tasks.

On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation

  • paper_url: http://arxiv.org/abs/2311.08249
  • repo_url: https://github.com/aalto-speech/dbca
  • paper_authors: Anssi Moisio, Mathias Creutz, Mikko Kurimo
  • for: The goal is a benchmark that assesses compositional generalisation (CG) in a real-world natural language task, machine translation.
  • methods: Using the distribution-based compositionality assessment (DBCA) framework, the Europarl translation corpus is split into training and test sets with divergent distributions of dependency relations, testing translation systems on dependencies they were not trained on.
  • results: The split procedure is fully automated, making it simple and inexpensive to apply to other datasets and languages. Code and data are available at https://github.com/aalto-speech/dbca.
    Abstract Compositional generalisation (CG), in NLP and in machine learning more generally, has been assessed mostly using artificial datasets. It is important to develop benchmarks to assess CG also in real-world natural language tasks in order to understand the abilities and limitations of systems deployed in the wild. To this end, our GenBench Collaborative Benchmarking Task submission utilises the distribution-based compositionality assessment (DBCA) framework to split the Europarl translation corpus into a training and a test set in such a way that the test set requires compositional generalisation capacity. Specifically, the training and test sets have divergent distributions of dependency relations, testing NMT systems' capability of translating dependencies that they have not been trained on. This is a fully-automated procedure to create natural language compositionality benchmarks, making it simple and inexpensive to apply it further to other datasets and languages. The code and data for the experiments is available at https://github.com/aalto-speech/dbca.
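DBCA-style splitting hinges on a divergence between the two splits' compound (here, dependency-relation) distributions. A minimal sketch using one minus the Chernoff coefficient, the measure used in the DBCA line of work; the alpha value and toy counts below are illustrative.

```python
import numpy as np
from collections import Counter

def divergence(dist_a: Counter, dist_b: Counter, alpha: float = 0.5) -> float:
    """1 - Chernoff coefficient between two relation-frequency
    distributions (0 = identical, 1 = disjoint)."""
    keys = sorted(set(dist_a) | set(dist_b))
    pa = np.array([dist_a[k] for k in keys], float)
    pb = np.array([dist_b[k] for k in keys], float)
    pa, pb = pa / pa.sum(), pb / pb.sum()
    return 1.0 - np.sum(pa ** alpha * pb ** (1 - alpha))

# A test set heavy in relations rare in training demands
# compositional generalisation from the translation system.
train = Counter({"nsubj": 40, "obj": 30, "obl": 10})
test = Counter({"nsubj": 10, "obj": 5, "advcl": 45})
print(divergence(train, test))  # high divergence, roughly 0.53
```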

Unlock the Power: Competitive Distillation for Multi-Modal Large Language Models

  • paper_url: http://arxiv.org/abs/2311.08213
  • repo_url: None
  • paper_authors: Xinwei Li, Li Lin, Shuai Wang, Chen Qian
  • for: Improving the performance and generalization ability of multi-modal LLMs
  • methods: The Competitive Multi-modal Distillation (CoMD) framework, comprising a multi-modal pre-training stage and a competitive distillation stage with bidirectional knowledge transfer between teacher and student models
  • results: Experiments show the knowledge transfer consistently improves the student model; after four distillations, the 7B student surpasses the state-of-the-art LLaVA-13B on ScienceQA and the LLaVA test set and outperforms other strong baselines in the zero-shot setting.
    Abstract Recently, multi-modal content generation has attracted lots of attention from researchers by investigating the utilization of visual instruction tuning based on large language models (LLMs). To enhance the performance and generalization ability of such LLMs, the practice of distilling knowledge from pretrained multi-modal models (a.k.a. teachers) to more compact multi-modal LLMs (students) has gained considerable interest. However, the prevailing paradigm of instructiontuning in multi-modal LLMs knowledge distillation is resource-intensive and unidirectional, neglecting the potential for mutual feedback between the student and teacher models. Thus, we propose an innovative Competitive Multi-modal Distillation framework (CoMD), which captures bidirectional feedback between teacher and student models and continually updates the multi-modal capabilities that the student model has learned. It comprises two stages: multi-modal pre-training and multi-modal competitive distillation. The first stage pre-trains the student model on a large number of filtered multi-modal datasets. The second stage facilitates a bidirectional knowledge transfer between the student and teacher models. Our experimental analysis of diverse datasets shows that our knowledge transfer method consistently improves the capabilities of the student model. Finally, the 7B-sized student model after four distillations surpassed the current state-of-the-art model LLaVA-13B on the ScienceQA and LLaVA Test dataset, also outperforms other strong baselines in the zero-shot setting.

GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding

  • paper_url: http://arxiv.org/abs/2311.08191
  • repo_url: https://github.com/gibson210/gec-depend
  • paper_authors: Konstantin Yakovlev, Alexander Podolskiy, Andrey Bout, Sergey Nikolenko, Irina Piontkovskaya
  • for: This paper targets grammatical error correction (GEC), which is usually solved with autoregressive sequence-to-sequence models that are inherently slow, motivating a non-autoregressive alternative.
  • methods: The proposed non-autoregressive approach decouples the architecture into a permutation network, whose self-attention weight matrix drives a beam search over permutations of the input tokens (with auxiliary {ins} tokens), and a decoder network based on a step-unrolled denoising autoencoder that fills in specific tokens.
  • results: The method surpasses previously known non-autoregressive GEC approaches and reaches the level of autoregressive methods that do not use language-specific synthetic data generation, validated on the ConLL-2014 and Write&Improve+LOCNESS datasets with an extensive ablation study.
    Abstract Grammatical error correction (GEC) is an important NLP task that is currently usually solved with autoregressive sequence-to-sequence models. However, approaches of this class are inherently slow due to one-by-one token generation, so non-autoregressive alternatives are needed. In this work, we propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network that outputs a self-attention weight matrix that can be used in beam search to find the best permutation of input tokens (with auxiliary {ins} tokens) and a decoder network based on a step-unrolled denoising autoencoder that fills in specific tokens. This allows us to find the token permutation after only one forward pass of the permutation network, avoiding autoregressive constructions. We show that the resulting network improves over previously known non-autoregressive methods for GEC and reaches the level of autoregressive methods that do not use language-specific synthetic data generation methods. Our results are supported by a comprehensive experimental validation on the ConLL-2014 and Write&Improve+LOCNESS datasets and an extensive ablation study that supports our architectural and algorithmic choices.

Unlocking Science: Novel Dataset and Benchmark for Cross-Modality Scientific Information Extraction

  • paper_url: http://arxiv.org/abs/2311.08189
  • repo_url: None
  • paper_authors: Yuhan Li, Jian Wu, Zhiwei Yu, Börje F. Karlsson, Wei Shen, Manabu Okumura, Chin-Yew Lin
  • for: This work provides a semi-supervised annotation pipeline for extracting information from scientific papers across modalities, since core information can appear in text, in tables, or across both.
  • methods: An iterative pipeline that annotates entities in text as well as entities and relations in tables.
  • results: The pipeline yields a high-quality benchmark and a large-scale corpus for cross-modality scientific IE; the paper reports state-of-the-art IE models as baselines and explores the capability of large language models such as ChatGPT on the task.
    Abstract Extracting key information from scientific papers has the potential to help researchers work more efficiently and accelerate the pace of scientific progress. Over the last few years, research on Scientific Information Extraction (SciIE) witnessed the release of several new systems and benchmarks. However, existing paper-focused datasets mostly focus only on specific parts of a manuscript (e.g., abstracts) and are single-modality (i.e., text- or table-only), due to complex processing and expensive annotations. Moreover, core information can be present in either text or tables or across both. To close this gap in data availability and enable cross-modality IE, while alleviating labeling costs, we propose a semi-supervised pipeline for annotating entities in text, as well as entities and relations in tables, in an iterative procedure. Based on this pipeline, we release novel resources for the scientific community, including a high-quality benchmark, a large-scale corpus, and a semi-supervised annotation pipeline. We further report the performance of state-of-the-art IE models on the proposed benchmark dataset, as a baseline. Lastly, we explore the potential capability of large language models such as ChatGPT for the current task. Our new dataset, results, and analysis validate the effectiveness and efficiency of our semi-supervised pipeline, and we discuss its remaining limitations.

Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning

  • paper_url: http://arxiv.org/abs/2311.08182
  • repo_url: https://github.com/ofa-sys/diverseevol
  • paper_authors: Shengguang Wu, Keming Lu, Benfeng Xu, Junyang Lin, Qi Su, Chang Zhou
  • for: Enhancing the instruction-following ability of large language models (LLMs) normally demands substantial instruction-tuning data; the goal is a label-efficient alternative.
  • methods: A self-evolving mechanism, DiverseEvol, lets the model itself iteratively select the most beneficial data points, picking new examples most distinct from anything already chosen in its current embedding space.
  • results: Extensive experiments on three datasets and benchmarks show that models trained on less than 8% of the original data maintain or improve performance compared with fine-tuning on the full data.
    Abstract Enhancing the instruction-following ability of Large Language Models (LLMs) primarily demands substantial instruction-tuning datasets. However, the sheer volume of these imposes a considerable computational burden and annotation cost. To investigate a label-efficient instruction tuning method that allows the model itself to actively sample subsets that are equally or even more effective, we introduce a self-evolving mechanism DiverseEvol. In this process, a model iteratively augments its training subset to refine its own performance, without requiring any intervention from humans or more advanced LLMs. The key to our data sampling technique lies in the enhancement of diversity in the chosen subsets, as the model selects new data points most distinct from any existing ones according to its current embedding space. Extensive experiments across three datasets and benchmarks demonstrate the effectiveness of DiverseEvol. Our models, trained on less than 8% of the original dataset, maintain or improve performance compared with finetuning on full data. We also provide empirical evidence to analyze the importance of diversity in instruction data and the iterative scheme as opposed to one-time sampling. Our code is publicly available at https://github.com/OFA-Sys/DiverseEvol.git.
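The selection rule, picking new points most distinct from anything already chosen, amounts to greedy farthest-point sampling in the model's embedding space. A minimal sketch, with cosine similarity as an assumed distance proxy and random vectors standing in for the model's own embeddings:

```python
import numpy as np

def diverse_select(embs: np.ndarray, selected_idx: list[int], k: int) -> list[int]:
    """Greedily add the k points farthest (lowest max cosine similarity)
    from everything already selected -- one DiverseEvol-style iteration.
    Requires a non-empty seed set in selected_idx."""
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    chosen = list(selected_idx)
    for _ in range(k):
        sims = embs @ embs[chosen].T          # (N, |chosen|)
        nearest = sims.max(axis=1)            # similarity to closest chosen point
        nearest[chosen] = np.inf              # never re-pick a chosen point
        chosen.append(int(nearest.argmin()))  # most distinct remaining point
    return chosen[len(selected_idx):]

rng = np.random.default_rng(0)
embs = rng.normal(size=(100, 32))
print(diverse_select(embs, selected_idx=[0], k=5))
```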

Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration

  • paper_url: http://arxiv.org/abs/2311.08152
  • repo_url: https://github.com/hitsz-tmg/multi-agent-peer-review
  • paper_authors: Zhenran Xu, Senbao Shi, Baotian Hu, Jindi Yu, Dongfang Li, Min Zhang, Yuxiang Wu
  • for: Improving the reasoning ability of a single model so it can better solve complex problems
  • methods: Multiple models collaborate, emulating academic peer review: each independently constructs a solution, reviews the others' solutions with assigned confidence levels, and incorporates the peer reviews into a revised solution.
  • results: On three types of reasoning tasks, the collaboration approach delivers higher accuracy than existing methods across all ten datasets.
    Abstract Large Language Models (LLMs) have shown remarkable capabilities in general natural language processing tasks but often fall short in complex reasoning tasks. Recent studies have explored human-like problem-solving strategies, such as self-correct, to push further the boundary of single-model reasoning ability. In this work, we let a single model "step outside the box" by engaging multiple models to correct each other. We introduce a multi-agent collaboration strategy that emulates the academic peer review process. Each agent independently constructs its own solution, provides reviews on the solutions of others, and assigns confidence levels to its reviews. Upon receiving peer reviews, agents revise their initial solutions. Extensive experiments on three different types of reasoning tasks show that our collaboration approach delivers superior accuracy across all ten datasets compared to existing methods. Further study demonstrates the effectiveness of integrating confidence in the reviews for math reasoning, and suggests a promising direction for human-mimicking multi-agent collaboration process.
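The peer-review loop is easy to express as orchestration code. A sketch under obvious assumptions: `llm` is a hypothetical callable mapping a prompt string to a completion, all agents share one underlying model, and the prompts are paraphrases rather than the paper's exact templates.

```python
def peer_review_round(llm, question: str, n_agents: int = 3) -> list[str]:
    """One collaboration round: draft independently, have peers review
    each draft with a confidence score, then revise."""
    drafts = [llm(f"Solve step by step: {question}") for _ in range(n_agents)]
    revised = []
    for i, draft in enumerate(drafts):
        # Every other agent reviews agent i's draft and rates confidence.
        reviews = [
            llm(f"Question: {question}\nProposed solution: {draft}\n"
                "Critique this solution and state your confidence (1-10).")
            for j in range(n_agents) if j != i
        ]
        revised.append(llm(
            f"Question: {question}\nYour solution: {draft}\n"
            f"Peer reviews: {' | '.join(reviews)}\n"
            "Revise your solution, weighting higher-confidence reviews "
            "more heavily."))
    return revised

# Stub model so the sketch runs without an API.
print(peer_review_round(lambda prompt: "stubbed completion", "17 * 23 = ?"))
```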

Sinkhorn Transformations for Single-Query Postprocessing in Text-Video Retrieval

  • paper_url: http://arxiv.org/abs/2311.08143
  • repo_url: None
  • paper_authors: Konstantin Yakovlev, Gregory Polyakov, Ilseyar Alimova, Alexander Podolskiy, Andrey Bout, Sergey Nikolenko, Irina Piontkovskaya
  • for: This paper addresses a recent trend in multimodal retrieval: postprocessing test-set results via the dual-softmax loss (DSL), which usually presumes the entire test matrix is available.
  • methods: A new postprocessing method based on Sinkhorn transformations, including a setting that does not require access to multiple test queries.
  • results: The method significantly improves strong models such as CLIP4Clip, BLIP, X-CLIP, and DRL, achieving a new state of the art on several standard text-video retrieval datasets both with full test-set access and in the single-query setting.
    Abstract A recent trend in multimodal retrieval is related to postprocessing test set results via the dual-softmax loss (DSL). While this approach can bring significant improvements, it usually presumes that an entire matrix of test samples is available as DSL input. This work introduces a new postprocessing approach based on Sinkhorn transformations that outperforms DSL. Further, we propose a new postprocessing setting that does not require access to multiple test queries. We show that our approach can significantly improve the results of state of the art models such as CLIP4Clip, BLIP, X-CLIP, and DRL, thus achieving a new state-of-the-art on several standard text-video retrieval datasets both with access to the entire test set and in the single-query setting.
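A Sinkhorn transformation of the query-video similarity matrix alternates row and column normalizations, which is what makes it a natural successor to dual-softmax postprocessing. A minimal sketch of the classic full-matrix form (the paper's single-query variant necessarily differs, and the temperature is an assumed hyperparameter):

```python
import numpy as np

def sinkhorn(sim: np.ndarray, n_iters: int = 10,
             temperature: float = 0.05) -> np.ndarray:
    """Sinkhorn-normalize a query-by-video similarity matrix:
    alternately rescale rows and columns toward uniform marginals."""
    K = np.exp(sim / temperature)
    for _ in range(n_iters):
        K /= K.sum(axis=1, keepdims=True)   # row normalization
        K /= K.sum(axis=0, keepdims=True)   # column normalization
    return K

# Retrieval: rank videos for each query by the transformed scores.
rng = np.random.default_rng(0)
sim = rng.random((4, 4))
ranks = np.argsort(-sinkhorn(sim), axis=1)
print(ranks)
```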

Memory-efficient Stochastic methods for Memory-based Transformers

  • paper_url: http://arxiv.org/abs/2311.08123
  • repo_url: https://github.com/vishwajit-vishnu/memory-efficient-stochastic-methods-for-memory-based-transformers
  • paper_authors: Vishwajit Kumar Vishnu, C. Chandra Sekhar
  • for: This work improves the training efficiency of memory-based transformers, which are commonly used for long-range context problems but can be memory-hungry and inefficient to train.
  • methods: A new two-phase training mechanism and a new regularization technique, evaluated with Transformer-XL as the baseline memory-based model.
  • results: The resulting model, Skip Cross-head Transformer-XL, matches the baseline on character-level language modeling with similar parameters and outperforms it on word-level language modeling with almost 20% fewer parameters, without any additional memory. The regularization mechanism also gives BERT similar performance with roughly 30% lower standard deviation of scores across multiple GLUE tasks.
    Abstract Training Memory-based transformers can require a large amount of memory and can be quite inefficient. We propose a novel two-phase training mechanism and a novel regularization technique to improve the training efficiency of memory-based transformers, which are often used for long-range context problems. For our experiments, we consider transformer-XL as our baseline model which is one of memorybased transformer models. We show that our resultant model, Skip Cross-head TransformerXL, outperforms the baseline on character level language modeling task with similar parameters and outperforms the baseline on word level language modelling task with almost 20% fewer parameters. Our proposed methods do not require any additional memory. We also demonstrate the effectiveness of our regularization mechanism on BERT which shows similar performance with reduction in standard deviation of scores of around 30% on multiple GLUE tasks.

Insights into Classifying and Mitigating LLMs’ Hallucinations

  • paper_url: http://arxiv.org/abs/2311.08117
  • repo_url: None
  • paper_authors: Alessandro Bruno, Pier Luigi Mazzeo, Aladine Chetouani, Marouane Tliba, Mohamed Amine Kerkouri
  • for: This work examines the phenomenon of hallucination in AI systems and its significance.
  • methods: Hallucination classification is tackled across several tasks: machine translation, question answering, dialog systems, summarisation systems, knowledge graphs with LLMs, and visual question answering.
  • results: Hallucinations can cause generated text to propagate false or misleading information; the paper surveys potential mitigation strategies for improving the overall reliability of LLMs.
    Abstract The widespread adoption of large language models (LLMs) across diverse AI applications is proof of the outstanding achievements obtained in several tasks, such as text mining, text generation, and question answering. However, LLMs are not exempt from drawbacks. One of the most concerning aspects regards the emerging problematic phenomena known as "Hallucinations". They manifest in text generation systems, particularly in question-answering systems reliant on LLMs, potentially resulting in false or misleading information propagation. This paper delves into the underlying causes of AI hallucination and elucidates its significance in artificial intelligence. In particular, Hallucination classification is tackled over several tasks (Machine Translation, Question and Answer, Dialog Systems, Summarisation Systems, Knowledge Graph with LLMs, and Visual Question Answer). Additionally, we explore potential strategies to mitigate hallucinations, aiming to enhance the overall reliability of LLMs. Our research addresses this critical issue within the HeReFaNMi (Health-Related Fake News Mitigation) project, generously supported by NGI Search, dedicated to combating Health-Related Fake News dissemination on the Internet. This endeavour represents a concerted effort to safeguard the integrity of information dissemination in an age of evolving AI technologies.

Improving hateful memes detection via learning hatefulness-aware embedding space through retrieval-guided contrastive learning

  • paper_url: http://arxiv.org/abs/2311.08110
  • repo_url: None
  • paper_authors: Jingbiao Mei, Jinghong Chen, Weizhe Lin, Bill Byrne, Marcus Tomalin
  • for: Detecting hateful memes
  • methods: Building a hatefulness-aware embedding space through retrieval-guided contrastive training
  • results: State-of-the-art performance on the HatefulMemes dataset with an AUROC of 86.7, outperforming much larger multimodal models such as Flamingo and LLaVA, and enabling a retrieval-based detection system that classifies memes using database entries unseen in training, so the system can be updated by adding data without retraining.
    Abstract Hateful memes have emerged as a significant concern on the Internet. These memes, which are a combination of image and text, often convey messages vastly different from their individual meanings. Thus, detecting hateful memes requires the system to jointly understand the visual and textual modalities. However, our investigation reveals that the embedding space of existing CLIP-based systems lacks sensitivity to subtle differences in memes that are vital for correct hatefulness classification. To address this issue, we propose constructing a hatefulness-aware embedding space through retrieval-guided contrastive training. Specifically, we add an auxiliary loss that utilizes hard negative and pseudo-gold samples to train the embedding space. Our approach achieves state-of-the-art performance on the HatefulMemes dataset with an AUROC of 86.7. Notably, our approach outperforms much larger fine-tuned Large Multimodal Models like Flamingo and LLaVA. Finally, we demonstrate a retrieval-based hateful memes detection system, which is capable of making hatefulness classification based on data unseen in training from a database. This allows developers to update the hateful memes detection system by simply adding new data without retraining, a desirable feature for real services in the constantly-evolving landscape of hateful memes on the Internet.
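The auxiliary loss can be sketched as a multi-positive InfoNCE term in which retrieved pseudo-gold samples act as positives and retrieved hard negatives (similar memes with the opposite label) as negatives. The exact retrieval and weighting scheme is the paper's; the generic form below is a stand-in.

```python
import numpy as np

def retrieval_guided_loss(anchor: np.ndarray, positives: np.ndarray,
                          hard_negatives: np.ndarray, tau: float = 0.1) -> float:
    """Contrastive loss pulling a meme embedding toward retrieved
    pseudo-gold samples and away from retrieved hard negatives."""
    def norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    a, p, n = norm(anchor), norm(positives), norm(hard_negatives)
    pos = np.exp(p @ a / tau)   # similarities to pseudo-gold samples
    neg = np.exp(n @ a / tau)   # similarities to hard negatives
    return float(-np.log(pos.sum() / (pos.sum() + neg.sum())))

rng = np.random.default_rng(0)
print(retrieval_guided_loss(rng.normal(size=64),
                            rng.normal(size=(4, 64)),
                            rng.normal(size=(8, 64))))
```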

SAIE Framework: Support Alone Isn’t Enough – Advancing LLM Training with Adversarial Remarks

  • paper_url: http://arxiv.org/abs/2311.08107
  • repo_url: None
  • paper_authors: Mengsay Loem, Masahiro Kaneko, Naoaki Okazaki
  • for: Improving a model's understanding of instances and its reasoning ability
  • methods: Training with both supportive and adversarial discussions between learner and partner models, with the learner's parameters updated in response to the partner's remarks
  • results: Models fine-tuned this way consistently surpass standard fine-tuning across several datasets and show stronger reasoning in multi-agent inference scenarios.
    Abstract Large Language Models (LLMs) can justify or criticize their predictions through discussion with other models or humans, thereby enhancing their intrinsic understanding of instances. While proactive discussions enhance performance, this approach is currently limited to the inference phase. In this context, we posit a hypothesis: learning interactive discussions during training can improve understanding for the instances in the training step and proficiency in logical/critical thinking ability and verbalized expression of the model in the inference step. Our proposed SAIE training method involves both supportive and adversarial discussions between the learner and partner models. The learner model receives a remark from the partner through the discussion, and the parameters of the learner model are then updated based on this remark. That is, the teacher signal dynamically adjusts in response to the evolving model output throughout the training step. By bolstering the capacity for discussion and comprehension of instances, our experiments across datasets, including GSM8K, CommonsenseQA, and MMLU, reveal that models fine-tuned with our method consistently surpass those trained with standard fine-tuning techniques. Moreover, our approach demonstrates superior performance in multi-agent inference scenarios, boosting the models' reasoning abilities at the inference step.

Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models

  • paper_url: http://arxiv.org/abs/2311.08106
  • repo_url: None
  • paper_authors: Yujin Kim, Jaehong Yoon, Seonghyeon Ye, Sung Ju Hwang, Se-young Yun
  • for: This work addresses the challenge that language models trained on static data must acquire new knowledge and overwrite outdated knowledge in an ever-evolving world.
  • methods: The authors propose EvolvingQA, a temporally evolving question-answering benchmark for training and evaluating language models on an evolving Wikipedia database, constructed with an automated pipeline using large language models and with question answering as the downstream task to emulate real-world applications.
  • results: Existing continual learning baselines struggle to update and forget outdated knowledge. The findings suggest models fail to learn updated knowledge because the weight gradients are small, and that they struggle most with questions requiring numerical or temporal answers.
    Abstract In an ever-evolving world, the dynamic nature of knowledge presents challenges for language models that are trained on static data, leading to outdated encoded information. However, real-world scenarios require models not only to acquire new knowledge but also to overwrite outdated information into updated ones. To address this under-explored issue, we introduce the temporally evolving question answering benchmark, EvolvingQA - a novel benchmark designed for training and evaluating LMs on an evolving Wikipedia database, where the construction of our benchmark is automated with our pipeline using large language models. Our benchmark incorporates question-answering as a downstream task to emulate real-world applications. Through EvolvingQA, we uncover that existing continual learning baselines have difficulty in updating and forgetting outdated knowledge. Our findings suggest that the models fail to learn updated knowledge due to the small weight gradient. Furthermore, we elucidate that the models struggle mostly on providing numerical or temporal answers to questions asking for updated knowledge. Our work aims to model the dynamic nature of real-world information, offering a robust measure for the evolution-adaptability of language models.

DiLoCo: Distributed Low-Communication Training of Language Models

  • paper_url: http://arxiv.org/abs/2311.08105
  • repo_url: None
  • paper_authors: Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Rachita Chhaparia, Yani Donchev, Adhiguna Kuncoro, Marc’Aurelio Ranzato, Arthur Szlam, Jiajun Shen
  • for: This paper proposes a distributed optimization algorithm for training large language models across islands of poorly connected devices.
  • methods: The algorithm is a variant of federated averaging with a large number of inner steps, AdamW as the inner optimizer, and Nesterov momentum as the outer optimizer.
  • results: On the widely used C4 dataset, DiLoCo with 8 workers matches fully synchronous optimization while communicating 500 times less. DiLoCo is robust to each worker's data distribution and to resources dropping out of or joining the training run.
    Abstract Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of tightly interconnected accelerators, with devices exchanging gradients and other intermediate states at each optimization step. While it is difficult to build and maintain a single computing cluster hosting many accelerators, it might be easier to find several computing clusters each hosting a smaller number of devices. In this work, we propose a distributed optimization algorithm, Distributed Low-Communication (DiLoCo), that enables training of language models on islands of devices that are poorly connected. The approach is a variant of federated averaging, where the number of inner steps is large, the inner optimizer is AdamW, and the outer optimizer is Nesterov momentum. On the widely used C4 dataset, we show that DiLoCo on 8 workers performs as well as fully synchronous optimization while communicating 500 times less. DiLoCo exhibits great robustness to the data distribution of each worker. It is also robust to resources becoming unavailable over time, and vice versa, it can seamlessly leverage resources that become available during training.
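The inner/outer structure is concrete enough to sketch end to end. In the sketch below, plain gradient steps stand in for the AdamW inner optimizer, `ToyWorker` is a hypothetical stand-in for a shard-local training loop, and the outer update is standard Nesterov momentum on the averaged parameter delta.

```python
import numpy as np

class ToyWorker:
    """Hypothetical worker whose local loss is ||params - target||^2 / 2."""
    def __init__(self, target):
        self.target = target
    def grad(self, params):
        return params - self.target

def diloco_round(params, workers, velocity, inner_steps=500,
                 inner_lr=1e-3, outer_lr=0.7, mu=0.9):
    """One communication round: many local steps per worker (AdamW in
    the paper; plain gradient descent here), then one outer Nesterov
    momentum step on the averaged parameter delta."""
    deltas = []
    for w in workers:
        local = params.copy()
        for _ in range(inner_steps):
            local -= inner_lr * w.grad(local)
        deltas.append(params - local)              # outer pseudo-gradient
    g = np.mean(deltas, axis=0)
    velocity = mu * velocity + g
    params = params - outer_lr * (g + mu * velocity)  # Nesterov update
    return params, velocity

params, v = np.zeros(4), np.zeros(4)
workers = [ToyWorker(np.full(4, t)) for t in (1.0, 3.0)]
for _ in range(5):                  # 5 rounds = only 5 communications
    params, v = diloco_round(params, workers, v)
print(params)                        # drifts toward the mean target, 2.0
```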

Align after Pre-train: Improving Multilingual Generative Models with Cross-lingual Alignment

  • paper_url: http://arxiv.org/abs/2311.08089
  • repo_url: None
  • paper_authors: Chong Li, Shaonan Wang, Jiajun Zhang, Chengqing Zong
  • for: Improving the cross-lingual abilities of multilingual generative models
  • methods: Exploiting pairs of translation sentences to align internal sentence representations across languages via multilingual contrastive learning, and aligning model outputs by answering prompts in different languages
  • results: Even with less than 0.1‰ of the pre-training tokens, the alignment framework significantly boosts cross-lingual abilities, mitigates the performance gap toward high-resource languages, and yields a better internal multilingual representation distribution.
    Abstract Multilingual generative models obtain remarkable cross-lingual capabilities through pre-training on large-scale corpora. However, they still exhibit a performance bias toward high-resource languages, and learn isolated distributions of sentence representations across languages. To bridge this gap, we propose a simple yet effective alignment framework exploiting pairs of translation sentences. It aligns the internal sentence representations across different languages via multilingual contrastive learning and aligns model outputs by answering prompts in different languages. Experimental results demonstrate that even with less than 0.1 {\textperthousand} of pre-training tokens, our alignment framework significantly boosts the cross-lingual abilities of generative models and mitigates the performance gap. Further analysis reveals that it results in a better internal multilingual representation distribution of multilingual models.

Data and models for stance and premise detection in COVID-19 tweets: insights from the Social Media Mining for Health (SMM4H) 2022 shared task

  • paper_url: http://arxiv.org/abs/2311.08057
  • repo_url: None
  • paper_authors: Vera Davydova, Huabin Yang, Elena Tutubalina
  • for: This work evaluates neural models for stance detection and premise classification in the health domain.
  • methods: Models are evaluated on manually annotated tweets, with tweet texts aggregated with claims via various strategies, including feature-level (early) fusion and dual-view architectures from the SMM4H 2022 leaderboard.
  • results: Newly collected Twitter data on vaccination is used to assess model performance on a different topic, yielding a valuable dataset and an extensive experimental evaluation to support future research on argument mining in health.
    Abstract The COVID-19 pandemic has sparked numerous discussions on social media platforms, with users sharing their views on topics such as mask-wearing and vaccination. To facilitate the evaluation of neural models for stance detection and premise classification, we organized the Social Media Mining for Health (SMM4H) 2022 Shared Task 2. This competition utilized manually annotated posts on three COVID-19-related topics: school closures, stay-at-home orders, and wearing masks. In this paper, we extend the previous work and present newly collected data on vaccination from Twitter to assess the performance of models on a different topic. To enhance the accuracy and effectiveness of our evaluation, we employed various strategies to aggregate tweet texts with claims, including models with feature-level (early) fusion and dual-view architectures from SMM4H 2022 leaderboard. Our primary objective was to create a valuable dataset and perform an extensive experimental evaluation to support future research in argument mining in the health domain.

Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models

  • paper_url: http://arxiv.org/abs/2311.08011
  • repo_url: None
  • paper_authors: Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang
  • for: Improving language models' ability to update their knowledge with new information
  • methods: F-Learning (Forgetting before Learning), a fine-tuning paradigm based on parametric arithmetic that forgets old knowledge before learning new knowledge
  • results: Experiments on two publicly available datasets show that F-Learning clearly improves knowledge-updating performance for both full fine-tuning and LoRA fine-tuning. Moreover, forgetting old knowledge by subtracting LoRA parameters achieves an effect similar to subtracting full fine-tuning parameters, and sometimes even surpasses it significantly.
    Abstract Recently Large Language Models (LLMs) have demonstrated their amazing text understanding and generation capabilities. However, even stronger LLMs may still learn incorrect knowledge from the training corpus, as well as some knowledge that is outdated over time. Direct secondary fine-tuning with data containing new knowledge may be ineffective in updating knowledge due to the conflict between old and new knowledge. In this paper, we propose a new paradigm for fine-tuning called F-Learning (Forgetting before Learning), which is based on parametric arithmetic to achieve forgetting of old knowledge and learning of new knowledge. Experimental results on two publicly available datasets demonstrate that our proposed F-Learning can obviously improve the knowledge updating performance of both full fine-tuning and LoRA fine-tuning. Moreover, we have also discovered that forgetting old knowledge by subtracting the parameters of LoRA can achieve a similar effect to subtracting the parameters of full fine-tuning, and sometimes even surpass it significantly.
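Read literally, "parametric arithmetic" treats knowledge as a direction in weight space: subtract a delta obtained by fine-tuning on the old facts, then add a delta from fine-tuning on the new ones. A sketch of that reading follows; the lambda coefficient and the use of precomputed deltas are assumptions, and the paper also shows the subtraction works on LoRA parameters alone.

```python
import numpy as np

def f_learning(theta, theta_ft_old, theta_ft_new, lam=1.0):
    """Forgetting before Learning via parameter arithmetic.
    theta        -- base model weights
    theta_ft_old -- weights after fine-tuning theta on the OLD facts
    theta_ft_new -- weights after fine-tuning theta on the NEW facts
    """
    delta_old = theta_ft_old - theta   # direction encoding old knowledge
    delta_new = theta_ft_new - theta   # direction encoding new knowledge
    return theta - lam * delta_old + delta_new  # forget, then learn

theta = np.array([0.2, -0.1, 0.4])
print(f_learning(theta, theta + 0.05, theta - 0.03))
```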

A Comparative Analysis of the COVID-19 Infodemic in English and Chinese: Insights from Social Media Textual Data

  • paper_url: http://arxiv.org/abs/2311.08001
  • repo_url: None
  • paper_authors: Jia Luo, Daiyun Peng, Lei Shi, Didier El Baz, Xinran Liu
  • for: This study offers a comparative analysis of the COVID-19 infodemic in English and Chinese, using textual data from social media platforms, to better understand its characteristics.
  • methods: Word frequency analysis, topic clustering, and sentiment analysis are applied to two infodemic datasets built by augmenting previously collected social media text.
  • results: The thirty-five most frequent infodemic words shed light on prevalent discussions; topic clustering uncovers the thematic structure in each language context, and sentiment analysis characterizes the emotional tone of COVID-19 information on social media in both languages.
    Abstract The COVID-19 infodemic, characterized by the rapid spread of misinformation and unverified claims related to the pandemic, presents a significant challenge. This paper presents a comparative analysis of the COVID-19 infodemic in the English and Chinese languages, utilizing textual data extracted from social media platforms. To ensure a balanced representation, two infodemic datasets were created by augmenting previously collected social media textual data. Through word frequency analysis, the thirty-five most frequently occurring infodemic words are identified, shedding light on prevalent discussions surrounding the infodemic. Moreover, topic clustering analysis uncovers thematic structures and provides a deeper understanding of primary topics within each language context. Additionally, sentiment analysis enables comprehension of the emotional tone associated with COVID-19 information on social media platforms in English and Chinese. This research contributes to a better understanding of the COVID-19 infodemic phenomenon and can guide the development of strategies to combat misinformation during public health crises across different languages.

How Well Do Text Embedding Models Understand Syntax?

  • paper_url: http://arxiv.org/abs/2311.07996
  • repo_url: https://github.com/fzp0424/sr
  • paper_authors: Yan Zhang, Zhaopeng Feng, Zhiyang Teng, Zuozhu Liu, Haizhou Li
  • for: This work examines how well text embedding models generalize across a wide range of syntactic contexts, a question under-explored in previous research.
  • methods: The authors first build an evaluation set, SR, that tests syntax understanding along two crucial aspects revealed by performance gaps in prior studies: structural heuristics and relational understanding among concepts.
  • results: Existing text embedding models have not sufficiently addressed these syntactic understanding challenges, and the ineffectiveness is even more apparent against existing benchmark datasets. A rigorous analysis uncovers the factors behind this limitation and why previous evaluations failed to detect it, and the paper proposes strategies to improve generalization across diverse syntactic scenarios.
    Abstract Text embedding models have significantly contributed to advancements in natural language processing by adeptly capturing semantic properties of textual data. However, the ability of these models to generalize across a wide range of syntactic contexts remains under-explored. In this paper, we first develop an evaluation set, named \textbf{SR}, to scrutinize the capability for syntax understanding of text embedding models from two crucial syntactic aspects: Structural heuristics, and Relational understanding among concepts, as revealed by the performance gaps in previous studies. Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges, and such ineffectiveness becomes even more apparent when evaluated against existing benchmark datasets. Furthermore, we conduct rigorous analysis to unearth factors that lead to such limitations and examine why previous evaluations fail to detect such ineffectiveness. Lastly, we propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios. This study serves to highlight the hurdles associated with syntactic generalization and provides pragmatic guidance for boosting model performance across varied syntactic contexts.

The ART of LLM Refinement: Ask, Refine, and Trust

  • paper_url: http://arxiv.org/abs/2311.07961
  • repo_url: None
  • paper_authors: Kumar Shridhar, Koustuv Sinha, Andrew Cohen, Tianlu Wang, Ping Yu, Ram Pasunuru, Mrinmaya Sachan, Jason Weston, Asli Celikyilmaz
  • for: Improving the quality of Large Language Model (LLM) generations through refinement
  • methods: The Ask, Refine, and Trust (ART) objective, which asks necessary questions to decide when an LLM should refine its output and then affirms or withholds trust in the refinement by ranking it against the initial prediction
  • results: On two multistep reasoning tasks (GSM8K and StrategyQA), ART gains +5 points over self-refinement baselines while using a much smaller model as the decision maker, a cost-effective alternative to fine-tuning a larger model.
    Abstract In recent years, Large Language Models (LLMs) have demonstrated remarkable generative abilities, but can they judge the quality of their own generations? A popular concept, referred to as self-refinement, postulates that LLMs can detect and correct the errors in their generations when asked to do so. However, recent empirical evidence points in the opposite direction, suggesting that LLMs often struggle to accurately identify errors when reasoning is involved. To address this, we propose a reasoning with refinement objective called ART: Ask, Refine, and Trust, which asks necessary questions to decide when an LLM should refine its output, and either affirm or withhold trust in its refinement by ranking the refinement and the initial prediction. On two multistep reasoning tasks of mathematical word problems (GSM8K) and question answering (StrategyQA), ART achieves a performance gain of +5 points over self-refinement baselines, while using a much smaller model as the decision maker. We also demonstrate the benefit of using smaller models to make refinement decisions as a cost-effective alternative to fine-tuning a larger model.

First Step Advantage: Importance of Starting Right in Multi-Step Reasoning

  • paper_url: http://arxiv.org/abs/2311.07945
  • repo_url: None
  • paper_authors: Kushal Jain, Kumar Shridhar
  • for: This paper explores how large language models (LLMs) solve complex reasoning tasks and how those capabilities can be distilled into smaller, cost-effective models tailored for specific tasks.
  • methods: An LLM guides a smaller model back to the correct reasoning path, intervening at the right time.
  • results: Smaller models fail at reasoning mainly because they struggle to initiate the process; guiding them in the right direction can yield a performance gain of over 100%.
    Abstract Large Language Models (LLMs) can solve complex reasoning tasks by generating rationales for their predictions. Distilling these capabilities into a smaller, compact model can facilitate the creation of specialized, cost-effective models tailored for specific tasks. However, smaller models often face challenges in complex reasoning tasks and often deviate from the correct reasoning path. We show that LLMs can guide smaller models and bring them back to the correct reasoning path only if they intervene at the right time. We show that smaller models fail to reason primarily due to their difficulty in initiating the process, and that guiding them in the right direction can lead to a performance gain of over 100%. We explore different model sizes and evaluate the benefits of providing guidance to improve reasoning in smaller models.

It’s All Relative! – A Synthetic Query Generation Approach for Improving Zero-Shot Relevance Prediction

  • paper_url: http://arxiv.org/abs/2311.07930
  • repo_url: None
  • paper_authors: Aditi Chaudhary, Karthik Raman, Michael Bendersky
  • for: Improving LLMs' ability to generate synthetic query-document pairs, enabling better retrieval models especially when no training data is available.
  • methods: Prompts an LLM with a handful of demonstrations to generate synthetic queries, conditioning generation on the input document or additionally on the relevance label.
  • results: Extensive experiments on seven IR datasets show that synthetic queries generated this way improve downstream performance, indicating higher-quality queries.
    Abstract Recent developments in large language models (LLMs) have shown promise in their ability to generate synthetic query-document pairs by prompting with as few as 8 demonstrations. This has enabled building better IR models, especially for tasks with no training data readily available. Typically, such synthetic query generation (QGen) approaches condition on an input context (e.g. a text document) and generate a query relevant to that context, or condition the QGen model additionally on the relevance label (e.g. relevant vs irrelevant) to generate queries across relevance buckets. However, we find that such QGen approaches are sub-optimal as they require the model to reason about the desired label and the input from a handful of examples. In this work, we propose to reduce this burden of LLMs by generating queries simultaneously for different labels. We hypothesize that instead of asking the model to generate, say, an irrelevant query given an input context, asking the model to generate an irrelevant query relative to a relevant query is a much simpler task setup for the model to reason about. Extensive experimentation across seven IR datasets shows that synthetic queries generated in such a fashion translates to a better downstream performance, suggesting that the generated queries are indeed of higher quality.
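The core prompting change is small: instead of asking for an irrelevant query given only the document, ask for it relative to an already-generated relevant query. A minimal prompt sketch (the wording is ours, not the paper's):

```python
def relative_qgen_prompt(document: str) -> str:
    # Generate queries for both labels in one pass, defining the negative
    # relative to the positive rather than relative to the raw document.
    return (
        f"Document: {document}\n"
        "1. Write a query for which this document is RELEVANT.\n"
        "2. Now write a query that looks similar to query (1) but for which "
        "this document is IRRELEVANT.\n"
    )

print(relative_qgen_prompt("The 2011 Tohoku earthquake triggered a tsunami..."))
```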

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

  • paper_url: http://arxiv.org/abs/2311.07919
  • repo_url: https://github.com/qwenlm/qwen-audio
  • paper_authors: Yunfei Chu, Jin Xu, Xiaohuan Zhou, Qian Yang, Shiliang Zhang, Zhijie Yan, Chang Zhou, Jingren Zhou
  • for: Improving audio-language models' multi-task capabilities and their ability to handle diverse audio types.
  • methods: A multi-task training framework that pre-trains an audio model to handle diverse audio types and tasks without task-specific fine-tuning.
  • results: Qwen-Audio achieves strong performance across diverse benchmark tasks, surpassing comparable models; Qwen-Audio-Chat, built on top of it, accepts diverse audio and text inputs and supports multi-turn dialogue and audio-centric scenarios.
    Abstract Recently, instruction-following audio-language models have received broad attention for audio interaction with humans. However, the absence of pre-trained audio models capable of handling diverse audio types and tasks has hindered progress in this field. Consequently, most existing works have only been able to support a limited range of interaction capabilities. In this paper, we develop the Qwen-Audio model and address this limitation by scaling up audio-language pre-training to cover over 30 tasks and various audio types, such as human speech, natural sounds, music, and songs, to facilitate universal audio understanding abilities. However, directly co-training all tasks and datasets can lead to interference issues, as the textual labels associated with different datasets exhibit considerable variations due to differences in task focus, language, granularity of annotation, and text structure. To overcome the one-to-many interference, we carefully design a multi-task training framework by conditioning on a sequence of hierarchical tags to the decoder for encouraging knowledge sharing and avoiding interference through shared and specified tags respectively. Remarkably, Qwen-Audio achieves impressive performance across diverse benchmark tasks without requiring any task-specific fine-tuning, surpassing its counterparts. Building upon the capabilities of Qwen-Audio, we further develop Qwen-Audio-Chat, which allows for input from various audios and text inputs, enabling multi-turn dialogues and supporting various audio-central scenarios.
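The interference fix is to prefix the decoder with a coarse-to-fine tag sequence, so tasks that share a tag share knowledge while fully specified tags keep incompatible label formats apart. The sketch below illustrates the idea with made-up tag names; Qwen-Audio's actual tag vocabulary lives in the linked repository.

```python
def decoder_prefix(transcription: bool, language: str, task: str) -> str:
    """Hierarchical conditioning tags, coarse to fine. Shared prefixes
    (e.g. the language tag) encourage knowledge sharing across tasks;
    the most specific tag keeps task-specific outputs from interfering.
    All tag strings here are hypothetical placeholders."""
    tags = [
        "<|transcribe|>" if transcription else "<|analyze|>",
        f"<|{language}|>",
        f"<|{task}|>",
    ]
    return "".join(tags)

print(decoder_prefix(True, "en", "asr"))   # <|transcribe|><|en|><|asr|>
```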

Automated title and abstract screening for scoping reviews using the GPT-4 Large Language Model

  • paper_url: http://arxiv.org/abs/2311.07918
  • repo_url: https://github.com/wilkox/gptscreenr
  • paper_authors: David Wilkins
  • for: Presents a method for automatically screening scholarly sources, supporting large-scale literature screening tasks.
  • methods: Uses the GPT-4 Large Language Model (LLM) with chain-of-thought prompting to screen sources automatically.
  • results: In validation, GPTscreenR performed similarly to an alternative approach, with 71% sensitivity, 89% specificity, and 84% overall accuracy.
    Abstract Scoping reviews, a type of literature review, require intensive human effort to screen large numbers of scholarly sources for their relevance to the review objectives. This manuscript introduces GPTscreenR, a package for the R statistical programming language that uses the GPT-4 Large Language Model (LLM) to automatically screen sources. The package makes use of the chain-of-thought technique with the goal of maximising performance on complex screening tasks. In validation against consensus human reviewer decisions, GPTscreenR performed similarly to an alternative zero-shot technique, with a sensitivity of 71%, specificity of 89%, and overall accuracy of 84%. Neither method achieved perfect accuracy nor human levels of intraobserver agreement. GPTscreenR demonstrates the potential for LLMs to support scholarly work and provides a user-friendly software framework that can be integrated into existing review processes.
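The reported validation numbers are the standard confusion-matrix quantities, computed against consensus human reviewer decisions. For reference:

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    return {
        "sensitivity": tp / (tp + fn),   # included sources correctly flagged
        "specificity": tn / (tn + fp),   # excluded sources correctly rejected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Illustrative counts only -- the paper reports rates, not raw counts.
print(screening_metrics(tp=71, fn=29, tn=89, fp=11))
# {'sensitivity': 0.71, 'specificity': 0.89, 'accuracy': 0.8}
```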

Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey

  • paper_url: http://arxiv.org/abs/2311.07914
  • repo_url: None
  • paper_authors: Garima Agrawal, Tharindu Kumarage, Zeyad Alghami, Huan Liu
  • for: Surveys the hallucination problem in contemporary LLMs and how augmenting them with external knowledge can reduce hallucinations and improve reasoning accuracy.
  • methods: Analyzes three main families of knowledge-graph-based knowledge augmentation methods for LLMs, with methodological comparisons and empirical evaluations.
  • results: Finds that knowledge-graph augmentation can effectively reduce hallucinations and improve reasoning accuracy, while noting remaining challenges and directions for future research.
    Abstract The contemporary LLMs are prone to producing hallucinations, stemming mainly from the knowledge gaps within the models. To address this critical limitation, researchers employ diverse strategies to augment the LLMs by incorporating external knowledge, aiming to reduce hallucinations and enhance reasoning accuracy. Among these strategies, leveraging knowledge graphs as a source of external information has demonstrated promising results. In this survey, we conduct a comprehensive review of these knowledge-graph-based knowledge augmentation techniques in LLMs, focusing on their efficacy in mitigating hallucinations. We systematically categorize these methods into three overarching groups, offering both methodological comparisons and empirical evaluations of their performance. Lastly, the paper explores the challenges associated with these techniques and outlines potential avenues for future research in this emerging field.

CPopQA: Ranking Cultural Concept Popularity by LLMs

  • paper_url: http://arxiv.org/abs/2311.07897
  • repo_url: None
  • paper_authors: Ming Jiang, Mansi Joshi
  • for: Examines whether LLMs accurately capture corpus-level statistical trends of cultural concepts, particularly the popularity of long-tail holidays.
  • methods: Introduces CPopQA, a novel few-shot question-answering task that tests LLMs' statistical ranking abilities.
  • results: Experiments show that large models can rank long-tail cultural concepts by their statistical tendency; GPT-3.5 performs best and can identify geo-cultural proximity across continents.
    Abstract Prior work has demonstrated large language models' (LLMs) potential to discern statistical tendencies within their pre-training corpora. Despite that, many examinations of LLMs' knowledge capacity focus on knowledge explicitly appearing in the training data or implicitly inferable from similar contexts. How well an LLM captures the corpus-level statistical trends of concepts for reasoning, especially long-tail ones, is still underexplored. In this study, we introduce a novel few-shot question-answering task (CPopQA) that examines LLMs' statistical ranking abilities for long-tail cultural concepts (e.g., holidays), with a specific focus on these concepts' popularity in the United States and the United Kingdom, respectively. We curate a dataset containing 459 holidays across 58 countries, generating a total of 6,000 QA testing pairs. Experiments on four strong LLMs show that large models are capable of ranking long-tail cultural concepts regarding their statistical tendency. Notably, GPT-3.5 displayed superior performance and exhibited its potential to identify geo-cultural proximity across continents.

Fair Abstractive Summarization of Diverse Perspectives

  • paper_url: http://arxiv.org/abs/2311.07884
  • repo_url: https://github.com/psunlpgroup/fairsumm
  • paper_authors: Yusen Zhang, Nan Zhang, Yixin Liu, Alexander Fabbri, Junru Liu, Ryo Kamoi, Xiaoxin Lu, Caiming Xiong, Jieyu Zhao, Dragomir Radev, Kathleen McKeown, Rui Zhang
  • for: Studies fair abstractive summarization, aiming for summaries that do not underrepresent the perspectives of any group.
  • methods: Proposes four reference-free automatic metrics that measure differences between target and source perspectives.
  • results: Experiments show that both model-generated and human-written reference summaries suffer from low fairness; the study identifies common factors influencing fairness and proposes three simple yet effective methods to mitigate unfair summarization.
    Abstract People from different social and demographic groups express diverse perspectives and conflicting opinions on a broad set of topics such as product reviews, healthcare, law, and politics. A fair summary should provide a comprehensive coverage of diverse perspectives without underrepresenting certain groups. However, current work in summarization metrics and Large Language Models (LLMs) evaluation has not explored fair abstractive summarization. In this paper, we systematically investigate fair abstractive summarization for user-generated data. We first formally define fairness in abstractive summarization as not underrepresenting perspectives of any groups of people and propose four reference-free automatic metrics measuring the differences between target and source perspectives. We evaluate five LLMs, including three GPT models, Alpaca, and Claude, on six datasets collected from social media, online reviews, and recorded transcripts. Experiments show that both the model-generated and the human-written reference summaries suffer from low fairness. We conduct a comprehensive analysis of the common factors influencing fairness and propose three simple but effective methods to alleviate unfair summarization. Our dataset and code are available at https://github.com/psunlpgroup/FairSumm.
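A reference-free fairness metric of this kind can be sketched as a divergence between the distribution of perspectives in the source and in the summary. The code below is our own illustration (total variation distance over group proportions), not one of the paper's four metrics.

```python
import numpy as np

def perspective_tv_distance(source_groups, summary_groups, groups):
    """Total variation between group proportions in source vs summary.
    0 = perfectly proportional coverage; 1 = a group is entirely dropped."""
    def proportions(labels):
        counts = np.array([labels.count(g) for g in groups], dtype=float)
        return counts / counts.sum()
    p, q = proportions(source_groups), proportions(summary_groups)
    return 0.5 * np.abs(p - q).sum()

# Each element is the group label of one sentence/opinion unit.
src = ["pro"] * 50 + ["con"] * 50
summ = ["pro"] * 9 + ["con"] * 1          # summary underrepresents "con"
print(perspective_tv_distance(src, summ, ["pro", "con"]))  # 0.4
```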

Learning Mutually Informed Representations for Characters and Subwords

  • paper_url: http://arxiv.org/abs/2311.07853
  • repo_url: None
  • paper_authors: Yilin Wang, Xinyi Hu, Matthew R. Gormley
  • for: Proposes a new language model, the entanglement model, that combines character-level and subword-level language models.
  • methods: Inspired by vision-language models, treats characters and subwords as two separate modalities and produces mutually informed representations for both granularities.
  • results: Outperforms its backbone language models on text classification, named entity recognition, and POS tagging, especially on noisy text and low-resource languages, and even outperforms larger pre-trained models on all English sequence labeling and classification tasks.
    Abstract Most pretrained language models rely on subword tokenization, which processes text as a sequence of subword tokens. However, different granularities of text, such as characters, subwords, and words, can contain different kinds of information. Previous studies have shown that incorporating multiple input granularities improves model generalization, yet very few of them output useful representations for each granularity. In this paper, we introduce the entanglement model, aiming to combine character and subword language models. Inspired by vision-language models, our model treats characters and subwords as separate modalities, and it generates mutually informed representations for both granularities as output. We evaluate our model on text classification, named entity recognition, and POS-tagging tasks. Notably, the entanglement model outperforms its backbone language models, particularly in the presence of noisy texts and low-resource languages. Furthermore, the entanglement model even outperforms larger pre-trained models on all English sequence labeling tasks and classification tasks. Our anonymized code is available at https://anonymous.4open.science/r/noisy-IE-A673

On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model

  • paper_url: http://arxiv.org/abs/2311.07820
  • repo_url: None
  • paper_authors: Nohil Park, Joonsuk Park, Kang Min Yoo, Sungroh Yoon
  • for: Investigates cross-lingual prompt tuning as a way to improve the adaptability of decoder-based multilingual models.
  • methods: Compares token-based prompt tuning against parameter-efficient fine-tuning on the XGLM model across four cross-lingual tasks.
  • results: Prompt tuning matches or exceeds fine-tuning while updating at most 0.13% of model parameters, and is especially effective at improving performance on low-resource languages.
    Abstract An exciting advancement in the field of multilingual models is the emergence of autoregressive models with zero- and few-shot capabilities, a phenomenon widely reported in large-scale language models. To further improve model adaptation to cross-lingual tasks, another trend is to further fine-tune the language models with either full fine-tuning or parameter-efficient tuning. However, the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models has yet to be studied. Specifically, we lack an understanding of the role of linguistic distributions in multilingual models in the effectiveness of token-based prompt tuning. To address this question, we conduct experiments comparing prompt tuning and fine-tuning on the decoder-based multilingual model, XGLM, with four cross-lingual tasks (XNLI, PAWS-X, POS, NER). According to our study, prompt tuning achieves on par or better performance over fine-tuning across all languages while updating at most 0.13\% of the model parameters. Moreover, we empirically show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning. Our further analysis shows that the phenomenon is related to the tokenization scheme of the multilingual model.

cs.LG - 2023-11-14

Variational Temporal IRT: Fast, Accurate, and Explainable Inference of Dynamic Learner Proficiency

  • paper_url: http://arxiv.org/abs/2311.08594
  • repo_url: None
  • paper_authors: Yunsung Kim, Sreechan Sankaranarayanan, Chris Piech, Candace Thille
  • for: Describes a fast and accurate method for inferring dynamic learner proficiency.
  • methods: Variational Temporal IRT (VTIRT), which scales fast and accurate inference to massive datasets.
  • results: Applied to 9 real student datasets, VTIRT consistently improves prediction of future learner performance over other learner proficiency models.
    Abstract Dynamic Item Response Models extend the standard Item Response Theory (IRT) to capture temporal dynamics in learner ability. While these models have the potential to allow instructional systems to actively monitor the evolution of learner proficiency in real time, existing dynamic item response models rely on expensive inference algorithms that scale poorly to massive datasets. In this work, we propose Variational Temporal IRT (VTIRT) for fast and accurate inference of dynamic learner proficiency. VTIRT offers orders of magnitude speedup in inference runtime while still providing accurate inference. Moreover, the proposed algorithm is intrinsically interpretable by virtue of its modular design. When applied to 9 real student datasets, VTIRT consistently yields improvements in predicting future learner performance over other learner proficiency models.

Uncertainty Quantification in Neural-Network Based Pain Intensity Estimation

  • paper_url: http://arxiv.org/abs/2311.08569
  • repo_url: None
  • paper_authors: Burcu Ozek, Zhenyuan Lu, Srinivasan Radhakrishnan, Sagar Kamarthi
  • for: Presents a neural-network-based method for pain interval estimation that incorporates uncertainty quantification.
  • methods: Explores three algorithms: the bootstrap method, lower and upper bound estimation (LossL) optimized by a genetic algorithm, and modified lower and upper bound estimation (LossS) optimized by gradient descent.
  • results: LossS yields narrower prediction intervals than the other two methods and is evaluated in three pain-assessment scenarios (generalized, personalized, and hybrid); the hybrid approach performs best and is the most practical in clinical contexts.
    Abstract Improper pain management can lead to severe physical or mental consequences, including suffering, and an increased risk of opioid dependency. Assessing the presence and severity of pain is imperative to prevent such outcomes and determine the appropriate intervention. However, the evaluation of pain intensity is challenging because different individuals experience pain differently. To overcome this, researchers have employed machine learning models to evaluate pain intensity objectively. However, these efforts have primarily focused on point estimation of pain, disregarding the inherent uncertainty and variability present in the data and model. Consequently, the point estimates provide only partial information for clinical decision-making. This study presents a neural network-based method for objective pain interval estimation, incorporating uncertainty quantification. This work explores three algorithms: the bootstrap method, lower and upper bound estimation (LossL) optimized by genetic algorithm, and modified lower and upper bound estimation (LossS) optimized by gradient descent algorithm. Our empirical results reveal that LossS outperforms the other two by providing a narrower prediction interval. As LossS outperforms, we assessed its performance in three different scenarios for pain assessment: (1) a generalized approach (single model for the entire population), (2) a personalized approach (separate model for each individual), and (3) a hybrid approach (separate model for each cluster of individuals). Our findings demonstrate the hybrid approach's superior performance, with notable practicality in clinical contexts. It has the potential to be a valuable tool for clinicians, enabling objective pain intensity assessment while taking uncertainty into account. This capability is crucial in facilitating effective pain management and reducing the risks associated with improper treatment.
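Lower/upper-bound methods of this family typically train a network with two outputs per input and a loss that trades interval coverage against interval width. The sketch below shows one common differentiable interval objective (in the spirit of LUBE-style losses); it is an assumption on our part, and the paper's LossL/LossS formulations may differ in detail.

```python
import torch

def interval_loss(lower, upper, y, alpha=0.1, lam=10.0):
    """Penalize mean interval width plus any coverage shortfall below
    1 - alpha. A gradient-friendly stand-in for LossS-style objectives."""
    width = (upper - lower).mean()
    # Soft indicator of y falling inside [lower, upper].
    covered = torch.sigmoid(50 * (y - lower)) * torch.sigmoid(50 * (upper - y))
    coverage = covered.mean()
    shortfall = torch.clamp((1 - alpha) - coverage, min=0.0)
    return width + lam * shortfall ** 2

y = torch.tensor([3.0, 5.0, 7.0])
lower = torch.tensor([2.5, 4.0, 7.5], requires_grad=True)
upper = torch.tensor([3.5, 6.0, 8.0], requires_grad=True)
interval_loss(lower, upper, y).backward()  # differentiable, hence usable with gradient descent
```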

Manifold learning in Wasserstein space

  • paper_url: http://arxiv.org/abs/2311.08549
  • repo_url: None
  • paper_authors: Keaton Hamm, Caroline Moosmüller, Bernhard Schmitzer, Matthew Thorpe
  • for: Builds theoretical foundations for manifold learning algorithms in the space of absolutely continuous probability measures on a compact, convex subset of $\mathbb{R}^d$, metrized with the Wasserstein-2 distance $W$.
  • methods: Introduces a natural construction of submanifolds $\Lambda$ of probability measures equipped with the metric $W_\Lambda$, the geodesic restriction of $W$ to $\Lambda$; these submanifolds are not necessarily flat but still allow local linearizations. The latent structure of $\Lambda$ is then learned from samples $\{\lambda_i\}_{i=1}^N$ and pairwise extrinsic Wasserstein distances $W$ alone.
  • results: Shows that the metric space $(\Lambda, W_\Lambda)$ can be asymptotically recovered in the Gromov--Wasserstein sense from a graph with nodes $\{\lambda_i\}_{i=1}^N$ and edge weights $W(\lambda_i,\lambda_j)$, and that the tangent space at a sample $\lambda$ can be recovered via spectral analysis of a suitable covariance operator built from optimal transport maps from $\lambda$ to sufficiently close and diverse samples.
    Abstract This paper aims at building the theoretical foundations for manifold learning algorithms in the space of absolutely continuous probability measures on a compact and convex subset of $\mathbb{R}^d$, metrized with the Wasserstein-2 distance $W$. We begin by introducing a natural construction of submanifolds $\Lambda$ of probability measures equipped with metric $W_\Lambda$, the geodesic restriction of $W$ to $\Lambda$. In contrast to other constructions, these submanifolds are not necessarily flat, but still allow for local linearizations in a similar fashion to Riemannian submanifolds of $\mathbb{R}^d$. We then show how the latent manifold structure of $(\Lambda,W_{\Lambda})$ can be learned from samples $\{\lambda_i\}_{i=1}^N$ of $\Lambda$ and pairwise extrinsic Wasserstein distances $W$ only. In particular, we show that the metric space $(\Lambda,W_{\Lambda})$ can be asymptotically recovered in the sense of Gromov--Wasserstein from a graph with nodes $\{\lambda_i\}_{i=1}^N$ and edge weights $W(\lambda_i,\lambda_j)$. In addition, we demonstrate how the tangent space at a sample $\lambda$ can be asymptotically recovered via spectral analysis of a suitable "covariance operator" using optimal transport maps from $\lambda$ to sufficiently close and diverse samples $\{\lambda_i\}_{i=1}^N$. The paper closes with some explicit constructions of submanifolds $\Lambda$ and numerical examples on the recovery of tangent spaces through spectral analysis.

Low-Frequency Load Identification using CNN-BiLSTM Attention Mechanism

  • paper_url: http://arxiv.org/abs/2311.08536
  • repo_url: None
  • paper_authors: Amanie Azzam, Saba Sanami, Amir G. Aghdam
  • for: Proposes a hybrid learning approach, built on a convolutional neural network and a bidirectional long short-term memory network, for low-frequency power data disaggregation.
  • methods: A hybrid CNN-BiLSTM model with an integrated attention mechanism to improve the accuracy of low-frequency load disaggregation.
  • results: Simulations on the low-frequency REDD dataset show that the proposed method outperforms existing approaches in both accuracy and computation time.
    Abstract Non-intrusive Load Monitoring (NILM) is an established technique for effective and cost-efficient electricity consumption management. The method is used to estimate appliance-level power consumption from aggregated power measurements. This paper presents a hybrid learning approach, consisting of a convolutional neural network (CNN) and a bidirectional long short-term memory (BILSTM), featuring an integrated attention mechanism, all within the context of disaggregating low-frequency power data. While prior research has been mainly focused on high-frequency data disaggregation, our study takes a distinct direction by concentrating on low-frequency data. The proposed hybrid CNN-BILSTM model is adept at extracting both temporal (time-related) and spatial (location-related) features, allowing it to precisely identify energy consumption patterns at the appliance level. This accuracy is further enhanced by the attention mechanism, which aids the model in pinpointing crucial parts of the data for more precise event detection and load disaggregation. We conduct simulations using the existing low-frequency REDD dataset to assess our model performance. The results demonstrate that our proposed approach outperforms existing methods in terms of accuracy and computation time.
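A minimal PyTorch skeleton of the described architecture, for orientation only; channel sizes, window length, and the single-layer attention head are placeholders, since the abstract does not give the exact configuration.

```python
import torch
import torch.nn as nn

class CNNBiLSTMAttention(nn.Module):
    def __init__(self, conv_channels=32, hidden=64, n_appliances=5):
        super().__init__()
        # CNN extracts local (spatial) features from the aggregate signal.
        self.conv = nn.Sequential(
            nn.Conv1d(1, conv_channels, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # BiLSTM captures temporal dependencies in both directions.
        self.bilstm = nn.LSTM(conv_channels, hidden,
                              batch_first=True, bidirectional=True)
        # Attention pinpoints the time steps most informative for events.
        self.attn = nn.Linear(2 * hidden, 1)
        self.head = nn.Linear(2 * hidden, n_appliances)

    def forward(self, x):                        # x: (batch, seq_len)
        h = self.conv(x.unsqueeze(1))            # (batch, C, seq_len)
        h, _ = self.bilstm(h.transpose(1, 2))    # (batch, seq_len, 2H)
        w = torch.softmax(self.attn(h), dim=1)   # attention weights over time
        ctx = (w * h).sum(dim=1)                 # (batch, 2H)
        return self.head(ctx)                    # per-appliance estimates

model = CNNBiLSTMAttention()
y = model(torch.randn(8, 120))   # 8 windows of 120 low-frequency samples
print(y.shape)                   # torch.Size([8, 5])
```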

On semi-supervised estimation using exponential tilt mixture models

  • paper_url: http://arxiv.org/abs/2311.08504
  • repo_url: None
  • paper_authors: Ye Tian, Xinwei Zhang, Zhiqiang Tan
  • for: Studies semi-supervised learning with exponential tilt mixture (ETM) models and maximum nonparametric likelihood estimation to improve predictive performance.
  • methods: Analyzes the asymptotic properties of ETM-based estimation, allowing the class proportions to differ between the labeled and unlabeled data.
  • results: ETM-based estimation improves efficiency over supervised logistic regression under both random sampling and outcome-stratified sampling.
    Abstract Consider a semi-supervised setting with a labeled dataset of binary responses and predictors and an unlabeled dataset with only the predictors. Logistic regression is equivalent to an exponential tilt model in the labeled population. For semi-supervised estimation, we develop further analysis and understanding of a statistical approach using exponential tilt mixture (ETM) models and maximum nonparametric likelihood estimation, while allowing that the class proportions may differ between the unlabeled and labeled data. We derive asymptotic properties of ETM-based estimation and demonstrate improved efficiency over supervised logistic regression in a random sampling setup and an outcome-stratified sampling setup previously used. Moreover, we reconcile such efficiency improvement with the existing semiparametric efficiency theory when the class proportions in the unlabeled and labeled data are restricted to be the same. We also provide a simulation study to numerically illustrate our theoretical findings.
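To make the setup concrete: with control and case densities $g_0$ and $g_1$, the exponential tilt assumption and the induced unlabeled mixture density can be written as follows (standard ETM notation, ours rather than copied from the paper):

```latex
\frac{g_1(x)}{g_0(x)} = \exp\{\alpha + \beta^{\mathsf T} x\},
\qquad
q(x) = (1 - \pi_u)\, g_0(x) + \pi_u\, g_0(x)\, e^{\alpha + \beta^{\mathsf T} x}.
```

In the labeled population this tilt is exactly a logistic regression (with intercept shifted by $\log\{\pi_\ell/(1-\pi_\ell)\}$), while the unlabeled density $q$ is the tilted mixture whose proportion $\pi_u$ is allowed to differ from the labeled proportion $\pi_\ell$.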

Variational Quantum Eigensolver with Constraints (VQEC): Solving Constrained Optimization Problems via VQE

  • paper_url: http://arxiv.org/abs/2311.08502
  • repo_url: None
  • paper_authors: Thinh Viet Le, Vassilis Kekatos
  • for: Proposes a hybrid quantum-classical algorithmic paradigm that extends the celebrated VQE to handle optimization with constraints.
  • methods: Captures the vector of optimization variables in the state of a variational quantum circuit and classically optimizes a Lagrangian over both the VQC parameters and the dual variables associated with the constraints.
  • results: Can approximately solve quadratically-constrained binary optimization (QCBO) problems, find stochastic binary policies satisfying quadratic constraints on average and in probability, and solve large-scale linear programs (LPs) over the probability simplex.
    Abstract Variational quantum approaches have shown great promise in finding near-optimal solutions to computationally challenging tasks. Nonetheless, enforcing constraints in a disciplined fashion has been largely unexplored. To address this gap, this work proposes a hybrid quantum-classical algorithmic paradigm termed VQEC that extends the celebrated VQE to handle optimization with constraints. As with the standard VQE, the vector of optimization variables is captured by the state of a variational quantum circuit (VQC). To deal with constraints, VQEC optimizes a Lagrangian function classically over both the VQC parameters as well as the dual variables associated with constraints. To comply with the quantum setup, variables are updated via a perturbed primal-dual method leveraging the parameter shift rule. Among a wide gamut of potential applications, we showcase how VQEC can approximately solve quadratically-constrained binary optimization (QCBO) problems, find stochastic binary policies satisfying quadratic constraints on the average and in probability, and solve large-scale linear programs (LP) over the probability simplex. Under an assumption on the error for the VQC to approximate an arbitrary probability mass function (PMF), we provide bounds on the optimality gap attained by a VQC. Numerical tests on a quantum simulator investigate the effect of various parameters and corroborate that VQEC can generate high-quality solutions.
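Schematically, writing the Lagrangian as $L(\theta, \lambda) = f(\theta) + \lambda^{\mathsf T} g(\theta)$, where $f$ is the VQC-parameterized objective and $g$ collects the constraint functions, the perturbed primal-dual iteration alternates a descent step on $\theta$ (gradients estimated via the parameter shift rule) and a projected ascent step on the duals. This is a reconstruction from the abstract, not the paper's exact update:

```latex
\theta^{t+1} = \theta^{t} - \eta_\theta\, \widehat{\nabla}_\theta L(\theta^{t}, \lambda^{t}),
\qquad
\lambda^{t+1} = \big[\lambda^{t} + \eta_\lambda\, g(\theta^{t+1})\big]_{+},
```

where $[\cdot]_+$ denotes projection onto the nonnegative orthant and $\widehat{\nabla}_\theta$ is the (shot-noise-perturbed) quantum gradient estimate.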

Ensemble sampling for linear bandits: small ensembles suffice

  • paper_url: http://arxiv.org/abs/2311.08376
  • repo_url: None
  • paper_authors: David Janz, Alexander E. Litvak, Csaba Szepesvári
  • for: Provides the first rigorous analysis of ensemble sampling in the stochastic linear bandit setting.
  • methods: Ensemble sampling with an ensemble of size on the order of $d \log T$.
  • results: Under standard assumptions, ensemble sampling achieves regret of order $(d \log T)^{5/2} \sqrt{T}$ without the ensemble size growing linearly with $T$; it is also the first such result in a structured setting and allows infinite action sets.
    Abstract We provide the first useful, rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size $m$ on the order of $d \log T$ incurs regret bounded by order $(d \log T)^{5/2} \sqrt{T}$. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$ -- which defeats the purpose of ensemble sampling -- while obtaining near $\sqrt{T}$ order regret. Ours is also the first result that allows infinite action sets.
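A self-contained toy of ensemble sampling for a linear bandit: $m$ regularized least-squares estimates share a Gram matrix but see independently perturbed rewards, and each round acts greedily with respect to a uniformly sampled member. The perturbation scales and the finite action set are illustrative choices, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, T, lam, sigma = 5, 10, 2000, 1.0, 0.1
A = rng.normal(size=(50, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)        # 50 unit-norm actions
theta_star = rng.normal(size=d)

V = lam * np.eye(d)                                   # shared Gram matrix
b = sigma * np.sqrt(lam) * rng.normal(size=(m, d))    # perturbed prior per member

for t in range(T):
    j = rng.integers(m)                               # sample an ensemble member
    theta_j = np.linalg.solve(V, b[j])
    x = A[np.argmax(A @ theta_j)]                     # act greedily for that member
    r = x @ theta_star + sigma * rng.normal()         # noisy reward
    V += np.outer(x, x)
    b += x * (r + sigma * rng.normal(size=(m, 1)))    # member-specific reward noise

print("ensemble-average estimate:", np.linalg.solve(V, b.mean(axis=0)).round(2))
print("theta_star:               ", theta_star.round(2))
```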

Transformers can optimally learn regression mixture models

  • paper_url: http://arxiv.org/abs/2311.08362
  • repo_url: None
  • paper_authors: Reese Pathak, Rajat Sen, Weihao Kong, Abhimanyu Das
  • for: investigate the hypothesis that transformers can learn an optimal predictor for mixtures of regressions.
  • methods: use transformers to learn a mixture of linear regressions, and prove that the decision-theoretic optimal procedure is indeed implementable by a transformer.
  • results: transformers can learn mixtures of regressions in a sample-efficient fashion and are somewhat robust to distribution shifts, and can make predictions that are close to the optimal predictor.
    Abstract Mixture models arise in many regression problems, but most methods have seen limited adoption partly due to these algorithms' highly-tailored and model-specific nature. On the other hand, transformers are flexible, neural sequence models that present the intriguing possibility of providing general-purpose prediction methods, even in this mixture setting. In this work, we investigate the hypothesis that transformers can learn an optimal predictor for mixtures of regressions. We construct a generative process for a mixture of linear regressions for which the decision-theoretic optimal procedure is given by data-driven exponential weights on a finite set of parameters. We observe that transformers achieve low mean-squared error on data generated via this process. By probing the transformer's output at inference time, we also show that transformers typically make predictions that are close to the optimal predictor. Our experiments also demonstrate that transformers can learn mixtures of regressions in a sample-efficient fashion and are somewhat robust to distribution shifts. We complement our experimental observations by proving constructively that the decision-theoretic optimal procedure is indeed implementable by a transformer.

Sparsity-Preserving Differentially Private Training of Large Embedding Models

  • paper_url: http://arxiv.org/abs/2311.08357
  • repo_url: None
  • paper_authors: Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang
  • for: Protecting user privacy by preventing large embedding models from leaking personal information.
  • methods: Proposes two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during differentially private training of large embedding models, improving training efficiency.
  • results: On real-world datasets, the algorithms achieve substantial reductions in gradient size ($10^6 \times$) while maintaining comparable accuracy.
    Abstract As the use of large embedding models in recommendation systems and language applications increases, concerns over user data privacy have also risen. DP-SGD, a training algorithm that combines differential privacy with stochastic gradient descent, has been the workhorse in protecting user privacy without compromising model accuracy by much. However, applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency. To address this issue, we present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models. Our algorithms achieve substantial reductions ($10^6 \times$) in gradient size, while maintaining comparable levels of accuracy, on benchmark real-world datasets.
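The tension the paper addresses: vanilla DP-SGD adds Gaussian noise to every row of the embedding-table gradient, turning a gradient that touches a handful of rows into a fully dense one. A sparsity-preserving variant adds noise only on a privately chosen subset of rows. The sketch below illustrates that general idea; it is not DP-FEST or DP-AdaFEST, and the crude noisy-count row selection shown here is an assumption the paper engineers far more carefully.

```python
import numpy as np

def sparse_dp_embedding_grad(per_ex_grads, clip, sigma_sel, sigma_noise, rng):
    """per_ex_grads: (batch, vocab, dim) per-example embedding gradients
    (zero rows for untouched tokens). Returns a noisy, still-sparse sum."""
    # Clip each example's gradient to bound per-example sensitivity.
    norms = np.linalg.norm(per_ex_grads.reshape(len(per_ex_grads), -1),
                           axis=1, keepdims=True)
    scale = np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    g = (per_ex_grads * scale[:, :, None]).sum(axis=0)       # clipped sum

    # Privately select rows to keep: noisy count of examples touching each row.
    touched = (np.abs(per_ex_grads).sum(axis=2) > 0).sum(axis=0).astype(float)
    noisy_counts = touched + sigma_sel * rng.normal(size=touched.shape)
    keep = noisy_counts > 2 * sigma_sel                       # simple threshold

    # Add Gaussian noise only on the kept rows; all other rows stay exactly zero.
    out = np.zeros_like(g)
    out[keep] = g[keep] + sigma_noise * clip * rng.normal(size=g[keep].shape)
    return out
```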

Mean-field variational inference with the TAP free energy: Geometric and statistical properties in linear models

  • paper_url: http://arxiv.org/abs/2311.08442
  • repo_url: None
  • paper_authors: Michael Celentano, Zhou Fan, Licong Lin, Song Mei
  • for: study mean-field variational inference in a Bayesian linear model when the sample size n is comparable to the dimension p.
  • methods: minimize the TAP free energy, and use an Approximate Message Passing (AMP) algorithm to find the local minimizer.
  • results: the TAP free energy has a local minimizer which provides a consistent estimate of the posterior marginals, and the algorithm linearly converges to the minimizer within this local neighborhood.
    Abstract We study mean-field variational inference in a Bayesian linear model when the sample size n is comparable to the dimension p. In high dimensions, the common approach of minimizing a Kullback-Leibler divergence from the posterior distribution, or maximizing an evidence lower bound, may deviate from the true posterior mean and underestimate posterior uncertainty. We study instead minimization of the TAP free energy, showing in a high-dimensional asymptotic framework that it has a local minimizer which provides a consistent estimate of the posterior marginals and may be used for correctly calibrated posterior inference. Geometrically, we show that the landscape of the TAP free energy is strongly convex in an extensive neighborhood of this local minimizer, which under certain general conditions can be found by an Approximate Message Passing (AMP) algorithm. We then exhibit an efficient algorithm that linearly converges to the minimizer within this local neighborhood. In settings where it is conjectured that no efficient algorithm can find this local neighborhood, we prove analogous geometric properties for a local minimizer of the TAP free energy reachable by AMP, and show that posterior inference based on this minimizer remains correctly calibrated.

Introducing an Improved Information-Theoretic Measure of Predictive Uncertainty

  • paper_url: http://arxiv.org/abs/2311.08309
  • repo_url: None
  • paper_authors: Kajetan Schweighofer, Lukas Aichberger, Mykyta Ielanskyi, Sepp Hochreiter
  • for: Improving decision-making with machine learning models in real-world applications, in particular distinguishing what a model knows from what it does not.
  • methods: Proposes a new measure of predictive uncertainty that addresses the limitations of the existing entropy-based measure.
  • results: Evaluations on controlled synthetic tasks and on ImageNet show that the new measure behaves more reasonably and is advantageous in real-world applications.
    Abstract Applying a machine learning model for decision-making in the real world requires to distinguish what the model knows from what it does not. A critical factor in assessing the knowledge of a model is to quantify its predictive uncertainty. Predictive uncertainty is commonly measured by the entropy of the Bayesian model average (BMA) predictive distribution. Yet, the properness of this current measure of predictive uncertainty was recently questioned. We provide new insights regarding those limitations. Our analyses show that the current measure erroneously assumes that the BMA predictive distribution is equivalent to the predictive distribution of the true model that generated the dataset. Consequently, we introduce a theoretically grounded measure to overcome these limitations. We experimentally verify the benefits of our introduced measure of predictive uncertainty. We find that our introduced measure behaves more reasonably in controlled synthetic tasks. Moreover, our evaluations on ImageNet demonstrate that our introduced measure is advantageous in real-world applications utilizing predictive uncertainty.
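For context, the measure under scrutiny is the entropy of the BMA predictive, which is conventionally decomposed into aleatoric and epistemic parts; a few lines make those quantities concrete. The paper's argument is that this construction wrongly identifies the BMA predictive with the true model's predictive distribution.

```python
import numpy as np

def entropy(p, axis=-1):
    return -(p * np.log(np.clip(p, 1e-12, 1.0))).sum(axis=axis)

# Predictions of M posterior samples (e.g. a deep ensemble) for one input.
probs = np.array([[0.70, 0.20, 0.10],
                  [0.10, 0.20, 0.70]])
bma = probs.mean(axis=0)                     # Bayesian model average
total = entropy(bma)                         # the criticized measure
aleatoric = entropy(probs, axis=-1).mean()   # expected entropy over the posterior
epistemic = total - aleatoric                # mutual information
print(f"total={total:.3f} aleatoric={aleatoric:.3f} epistemic={epistemic:.3f}")
```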

On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling

  • paper_url: http://arxiv.org/abs/2311.08290
  • repo_url: None
  • paper_authors: Nicholas E. Corrado, Josiah P. Hanna
  • for: Improving the data efficiency of on-policy RL algorithms by avoiding the noisy updates caused by sampling error.
  • methods: Proposes an adaptive, off-policy sampling method (PROPS) that reduces sampling error by increasing the probability of sampling actions that are under-sampled with respect to the current policy.
  • results: On continuous-action MuJoCo benchmark tasks and discrete-action tasks, PROPS reduces sampling error throughout training and improves the data efficiency of on-policy policy gradient algorithms.
    Abstract On-policy reinforcement learning (RL) algorithms perform policy updates using i.i.d. trajectories collected by the current policy. However, after observing only a finite number of trajectories, on-policy sampling may produce data that fails to match the expected on-policy data distribution. This sampling error leads to noisy updates and data inefficient on-policy learning. Recent work in the policy evaluation setting has shown that non-i.i.d., off-policy sampling can produce data with lower sampling error than on-policy sampling can produce. Motivated by this observation, we introduce an adaptive, off-policy sampling method to improve the data efficiency of on-policy policy gradient algorithms. Our method, Proximal Robust On-Policy Sampling (PROPS), reduces sampling error by collecting data with a behavior policy that increases the probability of sampling actions that are under-sampled with respect to the current policy. Rather than discarding data from old policies -- as is commonly done in on-policy algorithms -- PROPS uses data collection to adjust the distribution of previously collected data to be approximately on-policy. We empirically evaluate PROPS on both continuous-action MuJoCo benchmark tasks as well as discrete-action tasks and demonstrate that (1) PROPS decreases sampling error throughout training and (2) improves the data efficiency of on-policy policy gradient algorithms. Our work improves the RL community's understanding of a nuance in the on-policy vs off-policy dichotomy: on-policy learning requires on-policy data, not on-policy sampling.
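The mechanism can be illustrated with a frequency-correction heuristic: compare the empirical action frequencies of the data collected so far against the current policy, and tilt the behavior distribution toward under-sampled actions. This is our own toy illustration of the idea, not the PROPS update itself (which learns the behavior policy rather than computing it in closed form).

```python
import numpy as np

def behavior_probs(pi, counts, strength=1.0):
    """Tilt sampling toward actions under-sampled relative to policy pi."""
    n = max(counts.sum(), 1)
    empirical = counts / n
    deficit = pi - empirical                 # positive where we under-sampled
    logits = (np.log(np.maximum(pi, 1e-12))
              + strength * deficit / np.maximum(pi, 1e-12))
    p = np.exp(logits - logits.max())
    return p / p.sum()

pi = np.array([0.5, 0.3, 0.2])
counts = np.array([8, 1, 1])                 # action 0 over-sampled so far
print(behavior_probs(pi, counts).round(3))   # mass shifts to actions 1 and 2
```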

Mixed Attention Network for Cross-domain Sequential Recommendation

  • paper_url: http://arxiv.org/abs/2311.08272
  • repo_url: https://github.com/guanyu-lin/man
  • paper_authors: Guanyu Lin, Chen Gao, Yu Zheng, Jianxin Chang, Yanan Niu, Yang Song, Kun Gai, Zhiheng Li, Depeng Jin, Yong Li, Meng Wang
  • for: Addressing data sparsity in sequential recommendation for modern recommender systems, especially for new users, via cross-domain recommendation.
  • methods: Proposes a Mixed Attention Network (MAN) with local/global encoding layers to capture domain-specific and cross-domain sequential patterns, a mixed attention layer (item similarity, sequence-fusion, and group-prototype attention) to capture local/global item similarity, fuse local/global item sequences, and extract user groups across domains, and local/global prediction layers that further evolve and combine domain-specific and cross-domain interests.
  • results: Experiments on two real-world datasets (each with two domains) demonstrate the superiority of the proposed model; further study shows the method and its components are model-agnostic and effective. Code and data are available at https://github.com/Guanyu-Lin/MAN.
    Abstract In modern recommender systems, sequential recommendation leverages chronological user behaviors to make effective next-item suggestions, which suffers from data sparsity issues, especially for new users. One promising line of work is the cross-domain recommendation, which trains models with data across multiple domains to improve the performance in data-scarce domains. Recent proposed cross-domain sequential recommendation models such as PiNet and DASL have a common drawback relying heavily on overlapped users in different domains, which limits their usage in practical recommender systems. In this paper, we propose a Mixed Attention Network (MAN) with local and global attention modules to extract the domain-specific and cross-domain information. Firstly, we propose a local/global encoding layer to capture the domain-specific/cross-domain sequential pattern. Then we propose a mixed attention layer with item similarity attention, sequence-fusion attention, and group-prototype attention to capture the local/global item similarity, fuse the local/global item sequence, and extract the user groups across different domains, respectively. Finally, we propose a local/global prediction layer to further evolve and combine the domain-specific and cross-domain interests. Experimental results on two real-world datasets (each with two domains) demonstrate the superiority of our proposed model. Further study also illustrates that our proposed method and components are model-agnostic and effective, respectively. The code and data are available at https://github.com/Guanyu-Lin/MAN.

Mobility-Induced Graph Learning for WiFi Positioning

  • paper_url: http://arxiv.org/abs/2311.08271
  • repo_url: None
  • paper_authors: Kyuwon Han, Seung Min Yu, Seong-Lyun Kim, Seung-Woo Ko
  • for: Proposes a smartphone-based user mobility tracking method that achieves more accurate positioning.
  • methods: Builds two graphs capturing different mobility features, a time-driven mobility graph (TMG) and a direction-driven mobility graph (DMG), and fuses them through GCN-based cross-graph learning for more accurate positioning.
  • results: Field experiments show higher positioning accuracy than benchmarks, with root mean square errors (RMSEs) of 1.398 m and 1.073 m under self- and semi-supervised learning respectively, reductions of 27.3% and 44.4%.
    Abstract A smartphone-based user mobility tracking could be effective in finding his/her location, while the unpredictable error therein due to low specification of built-in inertial measurement units (IMUs) rejects its standalone usage but demands the integration to another positioning technique like WiFi positioning. This paper aims to propose a novel integration technique using a graph neural network called Mobility-INduced Graph LEarning (MINGLE), which is designed based on two types of graphs made by capturing different user mobility features. Specifically, considering sequential measurement points (MPs) as nodes, a user's regular mobility pattern allows us to connect neighbor MPs as edges, called time-driven mobility graph (TMG). Second, a user's relatively straight transition at a constant pace when moving from one position to another can be captured by connecting the nodes on each path, called a direction-driven mobility graph (DMG). Then, we can design graph convolution network (GCN)-based cross-graph learning, where two different GCN models for TMG and DMG are jointly trained by feeding different input features created by WiFi RTTs yet sharing their weights. Besides, the loss function includes a mobility regularization term such that the differences between adjacent location estimates should be less variant due to the user's stable moving pace. Noting that the regularization term does not require ground-truth location, MINGLE can be designed under semi- and self-supervised learning frameworks. The proposed MINGLE's effectiveness is extensively verified through field experiments, showing a better positioning accuracy than benchmarks, say root mean square errors (RMSEs) being 1.398 (m) and 1.073 (m) for self- and semi-supervised learning cases, respectively.

A Simple and Powerful Framework for Stable Dynamic Network Embedding

  • paper_url: http://arxiv.org/abs/2311.09251
  • repo_url: https://github.com/edwarddavis1/universal_dynamic_embedding_with_testing
  • paper_authors: Ed Davis, Ian Gallagher, Daniel John Lawson, Patrick Rubin-Delanchy
  • for: Addresses dynamic network embedding, i.e., representing the nodes of a dynamic network as evolving vectors in a low-dimensional space; compared with the mature field of static network embedding, dynamic network embedding is still in its infancy.
  • methods: Applies a wide class of established static network embedding methods to the dilated unfolded adjacency matrix to produce interpretable and powerful dynamic embeddings, with a theoretical guarantee that, regardless of embedding dimension, these unfolded methods yield stable embeddings: nodes with identical latent behaviour are exchangeable across time and space.
  • results: Defines a hypothesis-testing framework for evaluating dynamic embeddings by testing for planted structure in simulated networks; even in trivial cases, unstable methods are often conservative or encode incorrect structure, while the stable unfolded methods are both more interpretable and more powerful.
    Abstract In this paper, we address the problem of dynamic network embedding, that is, representing the nodes of a dynamic network as evolving vectors within a low-dimensional space. While the field of static network embedding is wide and established, the field of dynamic network embedding is comparatively in its infancy. We propose that a wide class of established static network embedding methods can be used to produce interpretable and powerful dynamic network embeddings when they are applied to the dilated unfolded adjacency matrix. We provide a theoretical guarantee that, regardless of embedding dimension, these unfolded methods will produce stable embeddings, meaning that nodes with identical latent behaviour will be exchangeable, regardless of their position in time or space. We additionally define a hypothesis testing framework which can be used to evaluate the quality of a dynamic network embedding by testing for planted structure in simulated networks. Using this, we demonstrate that, even in trivial cases, unstable methods are often either conservative or encode incorrect structure. In contrast, we demonstrate that our suite of stable unfolded methods are not only more interpretable but also more powerful in comparison to their unstable counterparts.
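The first step of the recipe is concrete enough to sketch: stack the adjacency snapshots, embed once with an SVD, and read off one time-indexed embedding per snapshot. The plain column-stacked unfolding below is one common construction; the paper's dilated variant differs in the details of the matrix it embeds.

```python
import numpy as np

def unfolded_spectral_embedding(adjs, d):
    """adjs: list of T (n x n) adjacency matrices; returns node-time
    embeddings of shape (T, n, d) from one joint SVD of the unfolding."""
    T, n = len(adjs), adjs[0].shape[0]
    A = np.hstack(adjs)                           # (n, n*T) unfolded adjacency
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    right = Vt[:d].T * np.sqrt(S[:d])             # (n*T, d) right embedding
    return right.reshape(T, n, d)                 # comparable across time

rng = np.random.default_rng(1)
snapshots = [(rng.random((20, 20)) < 0.2).astype(float) for _ in range(4)]
Y = unfolded_spectral_embedding(snapshots, d=3)
print(Y.shape)  # (4, 20, 3)
```

Because every snapshot is embedded against the same left singular basis, nodes whose behaviour does not change keep (up to noise) the same vector over time, which is the stability property the paper formalizes.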

Counterfactual Explanation for Regression via Disentanglement in Latent Space

  • paper_url: http://arxiv.org/abs/2311.08228
  • repo_url: None
  • paper_authors: Xuan Zhao, Klaus Broelemann, Gjergji Kasneci
  • for: Presents a new method for generating counterfactual explanations (CEs), helping users understand and steer the predictions of AI systems.
  • methods: First disentangles label-relevant from label-irrelevant dimensions in the latent space, then generates CEs by combining the label-irrelevant dimensions with the predefined output; the intuition is that an ideal counterfactual search should focus on the label-irrelevant characteristics of the input and suggest changes toward target-relevant ones, which searching in latent space facilitates.
  • results: Experiments on image and tabular datasets in regression settings show the method is competitive on several quality measures and returns results closer to the original data manifold than three state-of-the-art methods, which is essential for realistic high-dimensional machine learning applications. The code will be released as an open-source package upon publication.
    Abstract Counterfactual Explanations (CEs) help address the question: How can the factors that influence the prediction of a predictive model be changed to achieve a more favorable outcome from a user's perspective? Thus, they bear the potential to guide the user's interaction with AI systems since they represent easy-to-understand explanations. To be applicable, CEs need to be realistic and actionable. In the literature, various methods have been proposed to generate CEs. However, the majority of research on CEs focuses on classification problems where questions like ``What should I do to get my rejected loan approved?" are raised. In practice, answering questions like ``What should I do to increase my salary?" are of a more regressive nature. In this paper, we introduce a novel method to generate CEs for a pre-trained regressor by first disentangling the label-relevant from the label-irrelevant dimensions in the latent space. CEs are then generated by combining the label-irrelevant dimensions and the predefined output. The intuition behind this approach is that the ideal counterfactual search should focus on the label-irrelevant characteristics of the input and suggest changes toward target-relevant characteristics. Searching in the latent space could help achieve this goal. We show that our method maintains the characteristics of the query sample during the counterfactual search. In various experiments, we demonstrate that the proposed method is competitive based on different quality measures on image and tabular datasets in regression problem settings. It efficiently returns results closer to the original data manifold compared to three state-of-the-art methods, which is essential for realistic high-dimensional machine learning applications. Our code will be made available as an open-source package upon the publication of this work.

Federated Skewed Label Learning with Logits Fusion

  • paper_url: http://arxiv.org/abs/2311.08202
  • repo_url: None
  • paper_authors: Yuwei Wang, Runhan Li, Hao Tan, Xuefeng Jiang, Sheng Sun, Min Liu, Bo Gao, Zhiyuan Wu
  • for: This work addresses the label distribution skew challenge in federated learning (FL) settings, where data label categories are imbalanced on each client.
  • methods: The proposed FedBalance method calibrates local models' logits to correct the optimization bias caused by data heterogeneity; specifically, an extra private weak learner is introduced on the client side to capture the variance of different data.
  • results: The method achieves 13% higher average accuracy compared with state-of-the-art methods.
    Abstract Federated learning (FL) aims to collaboratively train a shared model across multiple clients without transmitting their local data. Data heterogeneity is a critical challenge in realistic FL settings, as it causes significant performance deterioration due to discrepancies in optimization among local models. In this work, we focus on label distribution skew, a common scenario in data heterogeneity, where the data label categories are imbalanced on each client. To address this issue, we propose FedBalance, which corrects the optimization bias among local models by calibrating their logits. Specifically, we introduce an extra private weak learner on the client side, which forms an ensemble model with the local model. By fusing the logits of the two models, the private weak learner can capture the variance of different data, regardless of their category. Therefore, the optimization direction of local models can be improved by increasing the penalty for misclassifying minority classes and reducing the attention to majority classes, resulting in a better global model. Extensive experiments show that our method can gain 13\% higher average accuracy compared with state-of-the-art methods.
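The sketch below shows one plausible client-side training step with logit fusion; additive fusion of the frozen weak learner's logits is an assumption, since the abstract does not spell out the exact fusion rule.

```python
import torch.nn.functional as F

def fedbalance_client_loss(local_model, weak_learner, x, y):
    """Fuse the local model's logits with a private weak learner's logits so
    that classes the weak learner already fits well contribute less gradient,
    shifting attention toward misclassified minority classes."""
    fused_logits = local_model(x) + weak_learner(x).detach()
    return F.cross_entropy(fused_logits, y)
```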

Modeling Complex Disease Trajectories using Deep Generative Models with Semi-Supervised Latent Processes

  • paper_url: http://arxiv.org/abs/2311.08149
  • repo_url: https://github.com/uzh-dqbm-cmi/eustar_dgm4h
  • paper_authors: Cécile Trottet, Manuel Schürch, Ahmed Allam, Imon Barua, Liubov Petelytska, Oliver Distler, Anna-Maria Hoffmann-Vold, Michael Krauthammer, the EUSTAR collaborators
  • for: Modeling and holistically analyzing complex disease trajectories by finding meaningful temporal latent representations.
  • methods: A deep generative time-series model that explains observed disease trajectories through latent temporal processes, with interpretability enhanced by semi-supervised disentanglement of the latent space using established medical concepts.
  • results: The learned latent processes support data analysis and clinical hypothesis testing, including finding similar patients and clustering the disease into new sub-types, as well as personalized online monitoring and prediction of multivariate time series with uncertainty quantification.
    Abstract In this paper, we propose a deep generative time series approach using latent temporal processes for modeling and holistically analyzing complex disease trajectories. We aim to find meaningful temporal latent representations of an underlying generative process that explain the observed disease trajectories in an interpretable and comprehensive way. To enhance the interpretability of these latent temporal processes, we develop a semi-supervised approach for disentangling the latent space using established medical concepts. By combining the generative approach with medical knowledge, we leverage the ability to discover novel aspects of the disease while integrating medical concepts into the model. We show that the learned temporal latent processes can be utilized for further data analysis and clinical hypothesis testing, including finding similar patients and clustering the disease into new sub-types. Moreover, our method enables personalized online monitoring and prediction of multivariate time series including uncertainty quantification. We demonstrate the effectiveness of our approach in modeling systemic sclerosis, showcasing the potential of our machine learning model to capture complex disease trajectories and acquire new medical knowledge.
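One way to realize the semi-supervised disentanglement described above is to add a supervision term tying a designated block of latent dimensions to recorded medical concepts; the sketch below shows such a loss under a Gaussian VAE, with equal loss weights and an MSE guidance term as illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def semi_supervised_elbo(x, x_hat, mu, logvar, z_concept, concepts, observed):
    """VAE reconstruction + KL terms plus a guidance term applied only where a
    medical concept was actually recorded (boolean mask `observed`)."""
    recon = F.mse_loss(x_hat, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    guidance = F.mse_loss(z_concept[observed], concepts[observed])
    return recon + kl + guidance
```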

Lite it fly: An All-Deformable-Butterfly Network

  • paper_url: http://arxiv.org/abs/2311.08125
  • repo_url: None
  • paper_authors: Rui Lin, Jason Chun Lok Li, Jiajun Zhou, Binxiao Huang, Jie Ran, Ngai Wong
  • for: This paper proposes a deformable butterfly (DeBut)-based neural network compression method that reduces parameter count and computation while preserving performance.
  • methods: Weight matrices are decomposed into a sequence of generalized, butterfly-like factors, achieving compression orthogonal to pruning and low-rank decomposition; an automated DeBut chain generator makes all-DeBut networks feasible.
  • results: Experiments show networks can be compressed with little accuracy loss; for example, PointNet can be compressed to under 5% of its parameters with less than a 5% accuracy drop.
    Abstract Most deep neural networks (DNNs) consist fundamentally of convolutional and/or fully connected layers, wherein the linear transform can be cast as the product between a filter matrix and a data matrix obtained by arranging feature tensors into columns. The lately proposed deformable butterfly (DeBut) decomposes the filter matrix into generalized, butterflylike factors, thus achieving network compression orthogonal to the traditional ways of pruning or low-rank decomposition. This work reveals an intimate link between DeBut and a systematic hierarchy of depthwise and pointwise convolutions, which explains the empirically good performance of DeBut layers. By developing an automated DeBut chain generator, we show for the first time the viability of homogenizing a DNN into all DeBut layers, thus achieving an extreme sparsity and compression. Various examples and hardware benchmarks verify the advantages of All-DeBut networks. In particular, we show it is possible to compress a PointNet to < 5% parameters with < 5% accuracy drop, a record not achievable by other compression schemes.

Understanding learning from EEG data: Combining machine learning and feature engineering based on hidden Markov models and mixed models

  • paper_url: http://arxiv.org/abs/2311.08113
  • repo_url: None
  • paper_authors: Gabriel Rodrigues Palma, Conor Thornberry, Seán Commins, Rafael de Andrade Moral
  • for: This paper studies how frontal theta oscillations (4-8 Hz) characterize spatial learning and memory during navigation tasks.
  • methods: Features are extracted from EEG data using hidden Markov models and linear mixed-effects models.
  • results: Standardizing the theta EEG data and classifying the engineered features with deep neural networks effectively separates learners from non-learners.
    Abstract Theta oscillations, ranging from 4-8 Hz, play a significant role in spatial learning and memory functions during navigation tasks. Frontal theta oscillations are thought to play an important role in spatial navigation and memory. Electroencephalography (EEG) datasets are very complex, making any changes in the neural signal related to behaviour difficult to interpret. However, multiple analytical methods are available to examine complex data structure, especially machine learning based techniques. These methods have shown high classification performance and the combination with feature engineering enhances the capability of these methods. This paper proposes using hidden Markov and linear mixed effects models to extract features from EEG data. Based on the engineered features obtained from frontal theta EEG data during a spatial navigation task in two key trials (first, last) and between two conditions (learner and non-learner), we analysed the performance of six machine learning methods (Polynomial Support Vector Machines, Non-linear Support Vector Machines, Random Forests, K-Nearest Neighbours, Ridge, and Deep Neural Networks) on classifying learner and non-learner participants. We also analysed how different standardisation methods used to pre-process the EEG data contribute to classification performance. We compared the classification performance of each trial with data gathered from the same subjects, including solely coordinate-based features, such as idle time and average speed. We found that more machine learning methods perform better classification using coordinate-based data. However, only deep neural networks achieved an area under the ROC curve higher than 80% using the theta EEG data alone. Our findings suggest that standardising the theta EEG data and using deep neural networks enhances the classification of learner and non-learner subjects in a spatial learning task.
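A sketch of the HMM-based feature engineering for a single participant's frontal theta-power trace, using hmmlearn; the number of hidden states and the summary statistics are illustrative choices rather than the paper's exact configuration.

```python
import numpy as np
from hmmlearn import hmm  # pip install hmmlearn

def hmm_theta_features(theta_power, n_states=2, seed=0):
    """theta_power: 1-D array of frontal theta power over time."""
    X = np.asarray(theta_power, dtype=float).reshape(-1, 1)
    model = hmm.GaussianHMM(n_components=n_states, n_iter=100, random_state=seed)
    model.fit(X)
    states = model.predict(X)
    occupancy = np.bincount(states, minlength=n_states) / len(states)
    return np.concatenate([occupancy,              # time share of each state
                           model.means_.ravel(),   # state-wise mean theta power
                           [model.score(X)]])      # sequence log-likelihood
```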

Evolutionary-enhanced quantum supervised learning model

  • paper_url: http://arxiv.org/abs/2311.08081
  • repo_url: None
  • paper_authors: Anton Simen Albino, Rodrigo Bloot, Otto M. Pires, Erick G. S. Nascimento
  • for: Improving the efficiency and accuracy of supervised learning on NISQ devices.
  • methods: Uses quantum circuits with variable topology that evolve through an elitist strategy, together with a superposition of multi-hot encodings, to mitigate the barren plateau problem and handle multi-classification tasks.
  • results: Comparative experiments show that the evolutionary-enhanced model avoids barren plateaus, improves accuracy and training efficiency, and performs well on a dataset class traditionally problematic for conventional kernel machines.
    Abstract Quantum supervised learning, utilizing variational circuits, stands out as a promising technology for NISQ devices due to its efficiency in hardware resource utilization during the creation of quantum feature maps and the implementation of hardware-efficient ansatz with trainable parameters. Despite these advantages, the training of quantum models encounters challenges, notably the barren plateau phenomenon, leading to stagnation in learning during optimization iterations. This study proposes an innovative approach: an evolutionary-enhanced ansatz-free supervised learning model. In contrast to parametrized circuits, our model employs circuits with variable topology that evolves through an elitist method, mitigating the barren plateau issue. Additionally, we introduce a novel concept, the superposition of multi-hot encodings, facilitating the treatment of multi-classification problems. Our framework successfully avoids barren plateaus, resulting in enhanced model accuracy. Comparative analysis with variational quantum classifiers from the technology's state-of-the-art reveal a substantial improvement in training efficiency and precision. Furthermore, we conduct tests on a challenging dataset class, traditionally problematic for conventional kernel machines, demonstrating a potential alternative path for achieving quantum advantage in supervised learning for NISQ era.

Communication-Constrained Bayesian Active Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2311.08053
  • repo_url: None
  • paper_authors: Victor Croisfelt, Shashi Raj Pandey, Osvaldo Simeone, Petar Popovski
  • for: addresses key questions in active learning settings with a remote teacher and constrained communication channels, aiming to acquire the most useful information while reducing the number of required communication rounds.
  • methods: introduces Communication-Constrained Bayesian Active Knowledge Distillation (CC-BAKD), which integrates Bayesian active learning with compression via a linear mix-up mechanism; batches of inputs are selected by their epistemic uncertainty, addressing "confirmation bias", and the mix-up compression strategy is integrated with this uncertainty-based batch selection to reduce the communication overhead per round.
  • results: the proposed CC-BAKD protocol reduces the number of required communication rounds while acquiring the most useful information, and evaluations on several benchmark datasets show it outperforms existing active learning methods in communication efficiency and accuracy.
    Abstract Consider an active learning setting in which a learner has a training set with few labeled examples and a pool set with many unlabeled inputs, while a remote teacher has a pre-trained model that is known to perform well for the learner's task. The learner actively transmits batches of unlabeled inputs to the teacher through a constrained communication channel for labeling. This paper addresses the following key questions: (i) Active batch selection: Which batch of inputs should be sent to the teacher to acquire the most useful information and thus reduce the number of required communication rounds? (ii) Batch encoding: How do we encode the batch of inputs for transmission to the teacher to reduce the communication resources required at each round? We introduce Communication-Constrained Bayesian Active Knowledge Distillation (CC-BAKD), a novel protocol that integrates Bayesian active learning with compression via a linear mix-up mechanism. Bayesian active learning selects the batch of inputs based on their epistemic uncertainty, addressing the "confirmation bias" that is known to increase the number of required communication rounds. Furthermore, the proposed mix-up compression strategy is integrated with the epistemic uncertainty-based active batch selection process to reduce the communication overhead per communication round.
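The sketch below compresses one communication round as described above: the learner picks the most epistemically uncertain pool inputs, then halves the payload by linearly mixing random pairs. The pairing scheme and Beta(1, 1) mixing coefficients are illustrative assumptions.

```python
import numpy as np

def select_and_mixup(pool_x, uncertainty, batch_size, rng):
    """Return mixed inputs for transmission plus the pairing needed to map the
    teacher's labels back; batch_size is assumed even for simplicity."""
    idx = np.argsort(uncertainty)[-batch_size:]          # highest uncertainty
    perm = rng.permutation(batch_size)
    half = batch_size // 2
    a, b = pool_x[idx[perm[:half]]], pool_x[idx[perm[half:]]]
    lam = rng.beta(1.0, 1.0, size=(half,) + (1,) * (pool_x.ndim - 1))
    mixed = lam * a + (1.0 - lam) * b                    # one packet, two inputs
    return mixed, (idx[perm[:half]], idx[perm[half:]], lam)
```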

Velocity-Based Channel Charting with Spatial Distribution Map Matching

  • paper_url: http://arxiv.org/abs/2311.08016
  • repo_url: None
  • paper_authors: Maximilian Stahlke, George Yammine, Tobias Feigl, Bjoern M. Eskofier, Christopher Mutschler
  • for: targets improved positioning performance in challenging, non-line-of-sight (NLoS)-dominated indoor environments where fingerprint-based localization is typically used.
  • methods: proposes a novel channel-charting framework that avoids the labeling effort required by fingerprinting models; velocity information (e.g., from pedestrian dead reckoning or odometry) and topological map information (e.g., a building floor plan) are used to transform the channel charts into real-world coordinates.
  • results: experiments on two real-world datasets using 5G and distributed single-input/multiple-output (SIMO) radio systems show that similar position accuracies are achieved even with noisy velocity estimates and coarse map information.
    Abstract Fingerprint-based localization improves the positioning performance in challenging, non-line-of-sight (NLoS) dominated indoor environments. However, fingerprinting models require an expensive life-cycle management including recording and labeling of radio signals for the initial training and regularly at environmental changes. Alternatively, channel-charting avoids this labeling effort as it implicitly associates relative coordinates to the recorded radio signals. Then, with reference real-world coordinates (positions) we can use such charts for positioning tasks. However, current channel-charting approaches lag behind fingerprinting in their positioning accuracy and still require reference samples for localization, regular data recording and labeling to keep the models up to date. Hence, we propose a novel framework that does not require reference positions. We only require information from velocity information, e.g., from pedestrian dead reckoning or odometry to model the channel charts, and topological map information, e.g., a building floor plan, to transform the channel charts into real coordinates. We evaluate our approach on two different real-world datasets using 5G and distributed single-input/multiple-output system (SIMO) radio systems. Our experiments show that even with noisy velocity estimates and coarse map information, we achieve similar position accuracies
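One simple way to inject the velocity information is a loss asking consecutive chart coordinates to be separated by roughly speed times the sampling interval; this sketch is our reading of the idea, not the paper's exact objective (the map-matching step is omitted).

```python
import torch

def velocity_consistency_loss(z, speed, dt):
    """z: (T, 2) channel-chart coordinates; speed: (T-1,) estimates from
    pedestrian dead reckoning or odometry; dt: sampling interval in seconds."""
    step_len = (z[1:] - z[:-1]).norm(dim=-1)
    return ((step_len - speed * dt) ** 2).mean()
```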

Out-of-Distribution Knowledge Distillation via Confidence Amendment

  • paper_url: http://arxiv.org/abs/2311.07975
  • repo_url: https://github.com/lawliet-zzl/ca
  • paper_authors: Zhilin Zhao, Longbing Cao, Yixuan Zhang
  • for: This paper proposes an out-of-distribution (OOD) detection method built on a standard network, to ensure network robustness and reliability.
  • methods: An OOD knowledge distillation framework applicable even without ID training data: the standard network's OOD-sensitive knowledge is harnessed to craft a binary classifier that distinguishes ID from OOD samples. To this end, a Confidence Amendment (CA) technique transforms OOD samples into ID ones while progressively amending the prediction confidence derived from the standard network.
  • results: Extensive experiments across datasets and network architectures confirm the method's effectiveness at detecting OOD samples.
    Abstract Out-of-distribution (OOD) detection is essential in identifying test samples that deviate from the in-distribution (ID) data upon which a standard network is trained, ensuring network robustness and reliability. This paper introduces OOD knowledge distillation, a pioneering learning framework applicable whether or not training ID data is available, given a standard network. This framework harnesses OOD-sensitive knowledge from the standard network to craft a binary classifier adept at distinguishing between ID and OOD samples. To accomplish this, we introduce Confidence Amendment (CA), an innovative methodology that transforms an OOD sample into an ID one while progressively amending prediction confidence derived from the standard network. This approach enables the simultaneous synthesis of both ID and OOD samples, each accompanied by an adjusted prediction confidence, thereby facilitating the training of a binary classifier sensitive to OOD. Theoretical analysis provides bounds on the generalization error of the binary classifier, demonstrating the pivotal role of confidence amendment in enhancing OOD sensitivity. Extensive experiments spanning various datasets and network architectures confirm the efficacy of the proposed method in detecting OOD samples.
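A minimal sketch of the Confidence Amendment idea: walk an OOD sample toward an ID sample while recording the standard network's amended confidence at each step, yielding (sample, confidence) pairs for training the binary classifier. Linear interpolation and the max-softmax confidence are assumptions; the paper's amendment schedule may differ.

```python
import torch

@torch.no_grad()
def confidence_amendment(x_ood, x_id, standard_net, steps=10):
    pairs = []
    for t in torch.linspace(0.0, 1.0, steps):
        x_t = (1 - t) * x_ood + t * x_id                     # OOD -> ID path
        conf = standard_net(x_t).softmax(-1).max(-1).values  # amended confidence
        pairs.append((x_t, conf))
    return pairs
```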

Higher-Order Expander Graph Propagation

  • paper_url: http://arxiv.org/abs/2311.07966
  • repo_url: None
  • paper_authors: Thomas Christie, Yu He
  • for: This paper improves message passing on graph-structured data by addressing the over-squashing problem.
  • methods: Two bipartite expander constructions are proposed so that expander-based message passing can also capture higher-order correlations in complex data.
  • results: Experiments on synthetic and real-world datasets show that higher-order expander graph propagation improves performance and better captures correlations in complex data.
    Abstract Graph neural networks operate on graph-structured data via exchanging messages along edges. One limitation of this message passing paradigm is the over-squashing problem. Over-squashing occurs when messages from a node's expanded receptive field are compressed into fixed-size vectors, potentially causing information loss. To address this issue, recent works have explored using expander graphs, which are highly-connected sparse graphs with low diameters, to perform message passing. However, current methods on expander graph propagation only consider pair-wise interactions, ignoring higher-order structures in complex data. To explore the benefits of capturing these higher-order correlations while still leveraging expander graphs, we introduce higher-order expander graph propagation. We propose two methods for constructing bipartite expanders and evaluate their performance on both synthetic and real-world datasets.
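Schematically, expander propagation interleaves message passing over the input graph with passing over a pre-built expander; the dense normalized adjacencies below are for clarity only, and constructing the higher-order bipartite expanders themselves (the paper's contribution) is not shown.

```python
import torch

def interleaved_propagation(X, A_local, A_expander, W_local, W_expander):
    """One block of alternating propagation: a local neighborhood step followed
    by a long-range mixing step over the low-diameter expander edges."""
    X = torch.relu(A_local @ X @ W_local)
    X = torch.relu(A_expander @ X @ W_expander)
    return X
```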

Language Models are Better Bug Detector Through Code-Pair Classification

  • paper_url: http://arxiv.org/abs/2311.07957
  • repo_url: https://github.com/kamel773/code_pair_classification
  • paper_authors: Kamel Alrashedy
  • for: This paper is written for researchers and developers who are interested in using large language models (LLMs) for code generation and understanding, and who want to explore alternative methods for fine-tuning these models that do not require a large labeled dataset.
  • methods: The paper proposes a code-pair classification task as an alternative to fine-tuning LLMs for bug detection and repair. In this task, both the buggy and non-buggy versions of the code are given to the model, and the model identifies the buggy ones.
  • results: The paper evaluates the code-pair classification task in a real-world dataset of bug detection and uses two of the most powerful LLMs. The results show that the model can often pick the buggy from the non-buggy version of the code, and that the code-pair classification task is much easier compared to the traditional method of being given a snippet and deciding if and where a bug exists.
    Abstract Large language models (LLMs) such as GPT-3.5 and CodeLlama are powerful models for code generation and understanding. Fine-tuning these models comes with a high computational cost and requires a large labeled dataset. Alternatively, in-context learning techniques allow models to learn downstream tasks with only a few examples. Recently, researchers have shown how in-context learning performs well in bug detection and repair. In this paper, we propose code-pair classification task in which both the buggy and non-buggy versions are given to the model, and the model identifies the buggy ones. We evaluate our task in real-world dataset of bug detection and two most powerful LLMs. Our experiments indicate that an LLM can often pick the buggy from the non-buggy version of the code, and the code-pair classification task is much easier compared to be given a snippet and deciding if and where a bug exists.
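The abstract does not give the exact prompt, so the template below is only an illustrative rendering of the code-pair classification task.

```python
PAIR_PROMPT = """You are given two versions of the same function. Exactly one
version contains a bug. Reply with "A" or "B" to indicate the buggy version.

Version A:
{code_a}

Version B:
{code_b}

Buggy version:"""

def build_pair_prompt(code_a: str, code_b: str) -> str:
    return PAIR_PROMPT.format(code_a=code_a, code_b=code_b)
```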

A Fast and Simple Algorithm for computing the MLE of Amplitude Density Function Parameters

  • paper_url: http://arxiv.org/abs/2311.07951
  • repo_url: None
  • paper_authors: Mahdi Teimouri
  • for: This paper proposes a fast and accurate method for computing the MLE of amplitude distribution parameters.
  • methods: The amplitude data are projected onto the horizontal and vertical axes using two simple transformations, and the projected data are shown to follow a zero-location symmetric α-stable distribution, for which the MLE can be computed quickly.
  • results: A simulation study and the analysis of two sets of real radar data show that the proposed method estimates the amplitude distribution parameters accurately.
    Abstract Over the last decades, the family of $\alpha$-stable distributions has proven to be useful for modelling in telecommunication systems. Particularly, in the case of radar applications, finding a fast and accurate estimation of the amplitude density function parameters appears to be very important. In this work, the maximum likelihood estimator (MLE) is proposed for the parameters of the amplitude distribution. To do this, the amplitude data are \emph{projected} onto the horizontal and vertical axes using two simple transformations. It is proved that the \emph{projected} data follow a zero-location symmetric $\alpha$-stable distribution for which the MLE can be computed quite fast. The average of the computed MLEs based on the two \emph{projections} is taken as the estimator for the parameters of the amplitude distribution. The performance of the proposed \emph{projection} method is demonstrated through a simulation study and the analysis of two sets of real radar data.
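A sketch of the projection idea, under the assumption that the two transformations attach independent uniform phases and take the horizontal and vertical components; SciPy's generic MLE for the stable family then fits each projection, and the two fits are averaged. This is a reconstruction from the abstract, not the paper's exact algorithm, and levy_stable.fit can be slow.

```python
import numpy as np
from scipy.stats import levy_stable

def amplitude_parameter_mle(r, seed=0):
    """r: 1-D array of amplitude samples. Returns averaged (alpha, scale)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=len(r))
    fits = []
    for projection in (r * np.cos(theta), r * np.sin(theta)):
        alpha, beta, loc, scale = levy_stable.fit(projection)
        fits.append((alpha, scale))          # symmetric, zero-location target
    return np.mean(fits, axis=0)
```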

Finding Inductive Loop Invariants using Large Language Models

  • paper_url: http://arxiv.org/abs/2311.07948
  • repo_url: None
  • paper_authors: Adharsh Kamath, Aditya Senthilnathan, Saikat Chakraborty, Pantazis Deligiannis, Shuvendu K. Lahiri, Akash Lal, Aseem Rastogi, Subhajit Roy, Rahul Sharma
  • for: This paper investigates whether large language models (LLMs) can offer a new solution to automated program verification.
  • methods: The authors first curate a dataset of verification problems on programs with loops, then design a prompt for obtaining inductive loop invariants from LLMs; candidate invariants are checked for correctness with a sound symbolic tool, and an efficient combination of the symbolic tool and an LLM is compared against a purely symbolic baseline.
  • results: The results show that LLMs can help improve the state of the art in automated program verification.
    Abstract Loop invariants are fundamental to reasoning about programs with loops. They establish properties about a given loop's behavior. When they additionally are inductive, they become useful for the task of formal verification that seeks to establish strong mathematical guarantees about program's runtime behavior. The inductiveness ensures that the invariants can be checked locally without consulting the entire program, thus are indispensable artifacts in a formal proof of correctness. Finding inductive loop invariants is an undecidable problem, and despite a long history of research towards practical solutions, it remains far from a solved problem. This paper investigates the capabilities of the Large Language Models (LLMs) in offering a new solution towards this old, yet important problem. To that end, we first curate a dataset of verification problems on programs with loops. Next, we design a prompt for exploiting LLMs, obtaining inductive loop invariants, that are checked for correctness using sound symbolic tools. Finally, we explore the effectiveness of using an efficient combination of a symbolic tool and an LLM on our dataset and compare it against a purely symbolic baseline. Our results demonstrate that LLMs can help improve the state-of-the-art in automated program verification.
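The symbolic side of such a pipeline can be as small as the sketch below: an LLM proposes a candidate invariant and a sound SMT check filters it by verifying the three standard obligations. The toy loop and candidate are ours, not from the paper's benchmark.

```python
from z3 import And, Implies, Int, Not, Solver, unsat  # pip install z3-solver

def is_inductive(inv, inv_next, pre, guard, step, post):
    def valid(claim):                  # claim holds iff its negation is unsat
        s = Solver(); s.add(Not(claim)); return s.check() == unsat
    return (valid(Implies(pre, inv))                             # initiation
            and valid(Implies(And(inv, guard, step), inv_next))  # consecution
            and valid(Implies(And(inv, Not(guard)), post)))      # sufficiency

# toy program:  x = 0; while x < 10: x += 1;  assert x == 10
x, x1 = Int("x"), Int("x1")
candidate = x <= 10                    # e.g. sampled from an LLM
print(is_inductive(candidate, x1 <= 10, pre=(x == 0), guard=(x < 10),
                   step=(x1 == x + 1), post=(x == 10)))  # True
```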

Clinical Characteristics and Laboratory Biomarkers in ICU-admitted Septic Patients with and without Bacteremia

  • paper_url: http://arxiv.org/abs/2311.08433
  • repo_url: None
  • paper_authors: Sangwon Baek, Seung Jun Lee
  • for: This study evaluates the predictive power of laboratory biomarkers in order to build an optimized model for predicting bacteremia among septic patients.
  • methods: A retrospective cross-sectional study in which individual biomarkers were first analyzed independently to identify significant predictors, which were then used to build a multivariable logistic regression (MLR) model.
  • results: Combining PCT, bilirubin, NLR, platelets, lactic acid, ESR, and GCS score improved the accuracy of the predictive model (AUC = 0.907, 95% CI: 0.843-0.956), and survival analysis revealed a significant association between bacteremia and mortality (p = 0.004).
    Abstract Few studies have investigated the diagnostic utilities of biomarkers for predicting bacteremia among septic patients admitted to intensive care units (ICU). Therefore, this study evaluated the prediction power of laboratory biomarkers to utilize those markers with high performance to optimize the predictive model for bacteremia. This retrospective cross-sectional study was conducted at the ICU department of Gyeongsang National University Changwon Hospital in 2019. Adult patients qualifying SEPSIS-3 (increase in sequential organ failure score greater than or equal to 2) criteria with at least two sets of blood culture were selected. Collected data was initially analyzed independently to identify the significant predictors, which was then used to build the multivariable logistic regression (MLR) model. A total of 218 patients with 48 cases of true bacteremia were analyzed in this research. Both CRP and PCT showed a substantial area under the curve (AUC) value for discriminating bacteremia among septic patients (0.757 and 0.845, respectively). To further enhance the predictive accuracy, we combined PCT, bilirubin, neutrophil lymphocyte ratio (NLR), platelets, lactic acid, erythrocyte sedimentation rate (ESR), and Glasgow Coma Scale (GCS) score to build the predictive model with an AUC of 0.907 (95% CI, 0.843 to 0.956). In addition, a high association between bacteremia and mortality rate was discovered through the survival analysis (0.004). While PCT is certainly a useful index for distinguishing patients with and without bacteremia by itself, our MLR model indicates that the accuracy of bacteremia prediction substantially improves by the combined use of PCT, bilirubin, NLR, platelets, lactic acid, ESR, and GCS score.
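The multivariable model above amounts to a standard logistic regression over the seven listed predictors; a sketch with hypothetical column names and file path follows (the study's actual data are not public here).

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

features = ["PCT", "bilirubin", "NLR", "platelets", "lactic_acid", "ESR", "GCS"]
df = pd.read_csv("sepsis.csv")               # hypothetical file and columns
X_tr, X_te, y_tr, y_te = train_test_split(
    df[features], df["bacteremia"], test_size=0.3,
    stratify=df["bacteremia"], random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```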

Discretized Distributed Optimization over Dynamic Digraphs

  • paper_url: http://arxiv.org/abs/2311.07939
  • repo_url: None
  • paper_authors: Mohammadreza Doostmohammadian, Wei Jiang, Muwahida Liaquat, Alireza Aghasi, Houman Zarrabi
  • for: Distributed optimization, and in particular distributed learning, over dynamic directed graphs (digraphs).
  • methods: A discrete-time distributed optimization algorithm that works over general strongly connected dynamic networks under switching topologies, e.g., mobile multi-agent systems and volatile networks with link failures.
  • results: The method eliminates the need for stochastic (bi-stochastic) weight design on the links, and dynamic stability and convergence are proved via consensus, matrix perturbation, and Lyapunov arguments.
    Abstract We consider a discrete-time model of continuous-time distributed optimization over dynamic directed-graphs (digraphs) with applications to distributed learning. Our optimization algorithm works over general strongly connected dynamic networks under switching topologies, e.g., in mobile multi-agent systems and volatile networks due to link failures. Compared to many existing lines of work, there is no need for bi-stochastic weight designs on the links. The existing literature mostly needs the link weights to be stochastic using specific weight-design algorithms needed both at the initialization and at all times when the topology of the network changes. This paper eliminates the need for such algorithms and paves the way for distributed optimization over time-varying digraphs. We derive the bound on the gradient-tracking step-size and discrete time-step for convergence and prove dynamic stability using arguments from consensus algorithms, matrix perturbation theory, and Lyapunov theory. This work, particularly, is an improvement over existing stochastic-weight undirected networks in case of link removal or packet drops. This is because the existing literature may need to rerun time-consuming and computationally complex algorithms for stochastic design, while the proposed strategy works as long as the underlying network is weight-symmetric and balanced. The proposed optimization framework finds applications to distributed classification and learning.
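For reference, a minimal sketch of a discrete-time gradient-tracking iteration of the kind analyzed above, over a weight-symmetric, balanced weight matrix W (no bi-stochastic design); `grad(X)` is assumed to return each node's local gradient at its own row of X.

```python
import numpy as np

def gradient_tracking(grad, X, W, alpha, iters):
    G = grad(X)
    Y = G.copy()                      # tracker initialized at local gradients
    for _ in range(iters):
        X = W @ X - alpha * Y         # consensus step plus descent
        G_new = grad(X)
        Y = W @ Y + G_new - G         # dynamic average tracking of gradients
        G = G_new
    return X
```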

Self-supervised Heterogeneous Graph Variational Autoencoders

  • paper_url: http://arxiv.org/abs/2311.07929
  • repo_url: None
  • paper_authors: Yige Zhao, Jianxiang Yu, Yao Cheng, Chengcheng Yu, Yiding Liu, Xiang Li, Shuaiqiang Wang
  • for: Addressing the problems of missing attributes, inaccurate attributes, and scarce labels in Heterogeneous Information Networks (HINs) using a generative self-supervised model called SHAVA.
  • methods: SHAVA uses a variational graph autoencoder framework to learn both node-level and attribute-level embeddings in the encoder, and reconstructs both links and attributes in the decoder. It generates an initial low-dimensional representation matrix for all nodes, which is used to reconstruct raw features of attributed nodes and rectify inaccurate attributes.
  • results: SHAVA is shown to be superior in tackling HINs with missing and inaccurate attributes, outperforming existing heterogeneous graph neural networks (HGNNs) in extensive experiments.
    Abstract Heterogeneous Information Networks (HINs), which consist of various types of nodes and edges, have recently demonstrated excellent performance in graph mining. However, most existing heterogeneous graph neural networks (HGNNs) ignore the problems of missing attributes, inaccurate attributes and scarce labels for nodes, which limits their expressiveness. In this paper, we propose a generative self-supervised model SHAVA to address these issues simultaneously. Specifically, SHAVA first initializes all the nodes in the graph with a low-dimensional representation matrix. After that, based on the variational graph autoencoder framework, SHAVA learns both node-level and attribute-level embeddings in the encoder, which can provide fine-grained semantic information to construct node attributes. In the decoder, SHAVA reconstructs both links and attributes. Instead of directly reconstructing raw features for attributed nodes, SHAVA generates the initial low-dimensional representation matrix for all the nodes, based on which raw features of attributed nodes are further reconstructed to leverage accurate attributes. In this way, SHAVA can not only complete informative features for non-attributed nodes, but rectify inaccurate ones for attributed nodes. Finally, we conduct extensive experiments to show the superiority of SHAVA in tackling HINs with missing and inaccurate attributes.

Bayesian Conditional Diffusion Models for Versatile Spatiotemporal Turbulence Generation

  • paper_url: http://arxiv.org/abs/2311.07896
  • repo_url: None
  • paper_authors: Han Gao, Xu Han, Xiantao Fan, Luning Sun, Li-Ping Liu, Lian Duan, Jian-Xun Wang
  • for: Versatile generation of spatiotemporal turbulence.
  • methods: A probabilistic diffusion model that unifies unconditional and conditional sampling within a Bayesian framework, plus an autoregressive gradient-based conditional sampling method for generating long flow sequences without retraining.
  • results: A suite of numerical experiments demonstrates high-fidelity, diverse turbulence generation, including LES-consistent instantaneous flow sequences from URANS inputs, inhomogeneous anisotropic wall-bounded turbulence, and super-resolved high-speed turbulent boundary layer flows.
    Abstract Turbulent flows have historically presented formidable challenges to predictive computational modeling. Traditional numerical simulations often require vast computational resources, making them infeasible for numerous engineering applications. As an alternative, deep learning-based surrogate models have emerged, offering data-drive solutions. However, these are typically constructed within deterministic settings, leading to shortfall in capturing the innate chaotic and stochastic behaviors of turbulent dynamics. We introduce a novel generative framework grounded in probabilistic diffusion models for versatile generation of spatiotemporal turbulence. Our method unifies both unconditional and conditional sampling strategies within a Bayesian framework, which can accommodate diverse conditioning scenarios, including those with a direct differentiable link between specified conditions and generated unsteady flow outcomes, and scenarios lacking such explicit correlations. A notable feature of our approach is the method proposed for long-span flow sequence generation, which is based on autoregressive gradient-based conditional sampling, eliminating the need for cumbersome retraining processes. We showcase the versatile turbulence generation capability of our framework through a suite of numerical experiments, including: 1) the synthesis of LES simulated instantaneous flow sequences from URANS inputs; 2) holistic generation of inhomogeneous, anisotropic wall-bounded turbulence, whether from given initial conditions, prescribed turbulence statistics, or entirely from scratch; 3) super-resolved generation of high-speed turbulent boundary layer flows from low-resolution data across a range of input resolutions. Collectively, our numerical experiments highlight the merit and transformative potential of the proposed methods, making a significant advance in the field of turbulence generation.

Mixture of Coupled HMMs for Robust Modeling of Multivariate Healthcare Time Series

  • paper_url: http://arxiv.org/abs/2311.07867
  • repo_url: https://github.com/onurpoyraz/m-chmm
  • paper_authors: Onur Poyraz, Pekka Marttinen
  • for: Analysis of multivariate healthcare time series data, addressing irregular sampling, noisy and missing values, and heterogeneous patient groups with different dynamics.
  • methods: Proposes a mixture of coupled hidden Markov models (M-CHMM) together with two algorithms for sampling the latent sequences: particle filtering and a factorized approximation. These make model learning feasible, improve mixing, reduce computational cost, and allow likelihood estimation, which is necessary to learn the mixture model.
  • results: Experiments on real-world epidemiological and semi-synthetic data show improved data fit, efficient handling of missing and noisy measurements, better prediction accuracy, and the ability to identify interpretable subsets in the data.
    Abstract Analysis of multivariate healthcare time series data is inherently challenging: irregular sampling, noisy and missing values, and heterogeneous patient groups with different dynamics violating exchangeability. In addition, interpretability and quantification of uncertainty are critically important. Here, we propose a novel class of models, a mixture of coupled hidden Markov models (M-CHMM), and demonstrate how it elegantly overcomes these challenges. To make the model learning feasible, we derive two algorithms to sample the sequences of the latent variables in the CHMM: samplers based on (i) particle filtering and (ii) factorized approximation. Compared to existing inference methods, our algorithms are computationally tractable, improve mixing, and allow for likelihood estimation, which is necessary to learn the mixture model. Experiments on challenging real-world epidemiological and semi-synthetic data demonstrate the advantages of the M-CHMM: improved data fit, capacity to efficiently handle missing and noisy measurements, improved prediction accuracy, and ability to identify interpretable subsets in the data.
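Of the two samplers, the particle-filtering one is the easier to sketch; below is a generic bootstrap filter for one latent chain, with the coupled-HMM structure abstracted into the transition and emission callbacks (a schematic, not the paper's exact sampler).

```python
import numpy as np

def bootstrap_filter(pi, P, emit_logp, obs, n_particles, rng):
    """pi: (K,) initial distribution; P: (K, K) transition matrix;
    emit_logp(states, y): vectorized emission log-likelihood."""
    T, K = len(obs), len(pi)
    states = np.empty((n_particles, T), dtype=int)
    for t, y in enumerate(obs):
        if t == 0:
            states[:, 0] = rng.choice(K, size=n_particles, p=pi)
        else:                          # propagate through per-particle kernels
            cum = P[states[:, t - 1]].cumsum(axis=1)
            states[:, t] = (cum > rng.random((n_particles, 1))).argmax(axis=1)
        w = np.exp(emit_logp(states[:, t], y)); w /= w.sum()
        ancestors = rng.choice(n_particles, size=n_particles, p=w)
        states[:, : t + 1] = states[ancestors, : t + 1]  # resample whole paths
    return states
```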

PEMS: Pre-trained Epidemic Time-series Models

  • paper_url: http://arxiv.org/abs/2311.07841
  • repo_url: None
  • paper_authors: Harshavardhan Kamarthi, B. Aditya Prakash
  • for: The paper aims to provide accurate and reliable predictions about the future of an epidemic, enabling informed public health decisions.
  • methods: The authors pre-train deep learning models on multiple datasets of different diseases and epidemics, introducing a set of self-supervised learning (SSL) tasks to capture useful patterns and learn important priors about epidemic dynamics.
  • results: The resulting Pre-trained Epidemic Time-Series Models (PEMS) outperform previous state-of-the-art methods on various downstream time-series tasks across datasets of varying seasonal patterns, geography, and mechanism of contagion, including the novel Covid-19 pandemic unseen in the pre-training data, with better efficiency using a smaller fraction of the datasets.
    Abstract Providing accurate and reliable predictions about the future of an epidemic is an important problem for enabling informed public health decisions. Recent works have shown that leveraging data-driven solutions that utilize advances in deep learning methods to learn from past data of an epidemic often outperform traditional mechanistic models. However, in many cases, the past data is sparse and may not sufficiently capture the underlying dynamics. While there exists a large amount of data from past epidemics, leveraging prior knowledge from time-series data of other diseases is a non-trivial challenge. Motivated by the success of pre-trained models in language and vision tasks, we tackle the problem of pre-training epidemic time-series models to learn from multiple datasets from different diseases and epidemics. We introduce Pre-trained Epidemic Time-Series Models (PEMS) that learn from diverse time-series datasets of a variety of diseases by formulating pre-training as a set of self-supervised learning (SSL) tasks. We tackle various important challenges specific to pre-training for epidemic time-series such as dealing with heterogeneous dynamics and efficiently capturing useful patterns from multiple epidemic datasets by carefully designing the SSL tasks to learn important priors about the epidemic dynamics that can be leveraged for fine-tuning to multiple downstream tasks. The resultant PEM outperforms previous state-of-the-art methods in various downstream time-series tasks across datasets of varying seasonal patterns, geography, and mechanism of contagion including the novel Covid-19 pandemic unseen in pre-trained data with better efficiency using smaller fraction of datasets.

Toward Efficient and Incremental Spectral Clustering via Parametric Spectral Clustering

  • paper_url: http://arxiv.org/abs/2311.07833
  • repo_url: https://github.com/109502518/psc_bigdata
  • paper_authors: Jo-Chun Chen, Hung-Hsuan Chen
  • for: addresses the challenges associated with big data and real-time scenarios, and enables efficient incremental clustering with new data points.
  • methods: extends the capabilities of spectral clustering with a novel approach called parametric spectral clustering (PSC).
  • results: achieves clustering quality mostly comparable to standard spectral clustering while being computationally efficient, as demonstrated through experimental evaluations on various open datasets.
    Abstract Spectral clustering is a popular method for effectively clustering nonlinearly separable data. However, computational limitations, memory requirements, and the inability to perform incremental learning challenge its widespread application. To overcome these limitations, this paper introduces a novel approach called parametric spectral clustering (PSC). By extending the capabilities of spectral clustering, PSC addresses the challenges associated with big data and real-time scenarios and enables efficient incremental clustering with new data points. Experimental evaluations conducted on various open datasets demonstrate the superiority of PSC in terms of computational efficiency while achieving clustering quality mostly comparable to standard spectral clustering. The proposed approach has significant potential for incremental and real-time data analysis applications, facilitating timely and accurate clustering in dynamic and evolving datasets. The findings of this research contribute to the advancement of clustering techniques and open new avenues for efficient and effective data analysis. We publish the experimental code at https://github.com/109502518/PSC_BigData.
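The core PSC idea lends itself to a compact sketch: compute a spectral embedding once, fit a parametric regressor from raw features to embedding coordinates, then cluster new points through the regressor with no further eigendecomposition. The affinity, regressor, and hyperparameters below are illustrative choices, not the paper's exact configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import spectral_embedding
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.neural_network import MLPRegressor

def fit_parametric_spectral(X, k, seed=0):
    emb = spectral_embedding(rbf_kernel(X), n_components=k, random_state=seed)
    f = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                     random_state=seed).fit(X, emb)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(emb)
    return f, km

def cluster_incrementally(f, km, X_new):
    return km.predict(f.predict(X_new))   # new points: one forward pass each
```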

Purpose in the Machine: Do Traffic Simulators Produce Distributionally Equivalent Outcomes for Reinforcement Learning Applications?

  • paper_url: http://arxiv.org/abs/2311.08429
  • repo_url: None
  • paper_authors: Rex Chen, Kathleen M. Carley, Fei Fang, Norman Sadeh
  • for: This paper examines how the traffic simulators used to train RL agents affect intelligent transportation systems (ITS) research.
  • methods: RL agents are trained with two commonly used simulators, CityFlow and SUMO, in a controlled virtual experiment that varies driver behavior and simulation scale to test whether RL-relevant measures are distributionally equivalent.
  • results: Owing to differing modelling assumptions, RL-relevant measures differ substantially between the simulators, with root mean squared error and KL divergence significantly greater than 0 for all assessed measures; traffic simulators are therefore not a cure-all for RL training, and inter-simulator differences must be understood before deploying RL-based ITSs in the real world.
    Abstract Traffic simulators are used to generate data for learning in intelligent transportation systems (ITSs). A key question is to what extent their modelling assumptions affect the capabilities of ITSs to adapt to various scenarios when deployed in the real world. This work focuses on two simulators commonly used to train reinforcement learning (RL) agents for traffic applications, CityFlow and SUMO. A controlled virtual experiment varying driver behavior and simulation scale finds evidence against distributional equivalence in RL-relevant measures from these simulators, with the root mean squared error and KL divergence being significantly greater than 0 for all assessed measures. While granular real-world validation generally remains infeasible, these findings suggest that traffic simulators are not a deus ex machina for RL training: understanding the impacts of inter-simulator differences is necessary to train and deploy RL-based ITSs.
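The distributional comparison reported above can be reproduced schematically as below: histogram an RL-relevant measure (e.g., per-episode delay) from each simulator and report the RMSE between binned frequencies together with the KL divergence; binning choices are illustrative.

```python
import numpy as np
from scipy.stats import entropy

def distribution_gap(samples_a, samples_b, bins=50):
    lo = min(samples_a.min(), samples_b.min())
    hi = max(samples_a.max(), samples_b.max())
    pa, _ = np.histogram(samples_a, bins=bins, range=(lo, hi), density=True)
    pb, _ = np.histogram(samples_b, bins=bins, range=(lo, hi), density=True)
    pa, pb = pa + 1e-12, pb + 1e-12        # avoid log(0); entropy renormalizes
    rmse = np.sqrt(np.mean((pa - pb) ** 2))
    return rmse, entropy(pa, pb)           # (RMSE, KL divergence)
```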

Statistical Parameterized Physics-Based Machine Learning Digital Twin Models for Laser Powder Bed Fusion Process

  • paper_url: http://arxiv.org/abs/2311.07821
  • repo_url: None
  • paper_authors: Yangfan Li, Satyajit Mojumder, Ye Lu, Abdullah Al Amin, Jiachen Guo, Xiaoyu Xie, Wei Chen, Gregory J. Wagner, Jian Cao, Wing Kam Liu
  • for: This paper aims to develop a digital twin model for predicting and controlling the quality of laser powder bed fusion (LPBF) metal additive manufacturing processes.
  • methods: The paper uses a parameterized physics-based digital twin (PPB-DT) model that incorporates a mechanistic reduced-order method-driven stochastic calibration process to statistically predict melt pool geometries and identify defects. The model is validated through controlled experiments and compared to machine learning-based digital twin (PPB-ML-DT) models.
  • results: The PPB-DT model is able to accurately predict melt pool geometries and identify defects such as lack-of-fusion porosity and surface roughness, and the PPB-ML-DT model is able to predict, monitor, and control melt pool geometries. The proposed digital twin models can be used for predictions, control, optimization, and quality assurance within the LPBF process.
    Abstract A digital twin (DT) is a virtual representation of physical process, products and/or systems that requires a high-fidelity computational model for continuous update through the integration of sensor data and user input. In the context of laser powder bed fusion (LPBF) additive manufacturing, a digital twin of the manufacturing process can offer predictions for the produced parts, diagnostics for manufacturing defects, as well as control capabilities. This paper introduces a parameterized physics-based digital twin (PPB-DT) for the statistical predictions of LPBF metal additive manufacturing process. We accomplish this by creating a high-fidelity computational model that accurately represents the melt pool phenomena and subsequently calibrating and validating it through controlled experiments. In PPB-DT, a mechanistic reduced-order method-driven stochastic calibration process is introduced, which enables the statistical predictions of the melt pool geometries and the identification of defects such as lack-of-fusion porosity and surface roughness, specifically for diagnostic applications. Leveraging data derived from this physics-based model and experiments, we have trained a machine learning-based digital twin (PPB-ML-DT) model for predicting, monitoring, and controlling melt pool geometries. These proposed digital twin models can be employed for predictions, control, optimization, and quality assurance within the LPBF process, ultimately expediting product development and certification in LPBF-based metal additive manufacturing.

eess.IV - 2023-11-14

Time-efficient combined morphologic and quantitative joint MRI based on clinical image contrasts – An exploratory in-situ study of standardized cartilage defects

  • paper_url: http://arxiv.org/abs/2311.08036
  • repo_url: None
  • paper_authors: Teresa Lemainque, Nicola Pridöhl, Shuo Zhang, Marc Huppertz, Manuel Post, Can Yüksel, Masami Yoneyama, Andreas Prescher, Christiane Kuhl, Daniel Truhn, Sven Nebelung
  • for: This study evaluates the utility of MIXTURE sequences for combined morphologic and quantitative assessment of cartilage in the knee joint.
  • methods: MIXTURE sequences, which combine morphologic images with clinical turbo spin-echo contrasts and additionally provide parameter maps, were compared against reference TSE sequences in an in-situ model of standardized cartilage defects.
  • results: After defect creation, cartilage relaxation times increased significantly, while defect delineability and bone texture features did not differ significantly between the MIXTURE and reference sequences.
    Abstract OBJECTIVES: Quantitative MRI techniques such as T2 and T1$\rho$ mapping are beneficial in evaluating cartilage and meniscus. We aimed to evaluate the MIXTURE (Multi-Interleaved X-prepared Turbo-Spin Echo with IntUitive RElaxometry) sequences that provide morphologic images with clinical turbo spin-echo (TSE) contrasts and additional parameter maps versus reference TSE sequences in an in-situ model of human cartilage defects. MATERIALS AND METHODS: Prospectively, standardized cartilage defects of 8mm, 5mm, and 3mm diameter were created in the lateral femora of 10 human cadaveric knee specimens (81$\pm$10 years, nine male/one female). Using a clinical 3T MRI scanner and knee coil, MIXTURE sequences combining (i) proton-density weighted fat-saturated (PD-w FS) images and T2 maps and (ii) T1-weighted images and T1$\rho$ maps were acquired before and after defect creation, alongside the corresponding 2D TSE and 3D TSE reference sequences. Defect delineability, bone texture, and cartilage relaxation times were quantified. Inter-sequence comparisons were made using appropriate parametric and non-parametric tests. RESULTS: Overall, defect delineability and texture features were not significantly different between the MIXTURE and reference sequences. After defect creation, relaxation times increased significantly in the central femur (for T2) and all regions combined (for T1$\rho$). CONCLUSION: MIXTURE sequences permit time-efficient simultaneous morphologic and quantitative joint assessment based on clinical image contrasts. While providing T2 or T1$\rho$ maps in clinically feasible scan time, morphologic image features, i.e., cartilage defect delineability and bone texture, were comparable between MIXTURE and corresponding reference sequences.
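The quantitative half of the entry above rests on relaxometry: fitting a mono-exponential decay S(TE) = S0 · exp(-TE/T2) to the signal across echo times, voxel by voxel, to produce a T2 map. Below is a minimal sketch of that per-voxel fit; the echo times and signal values are invented for illustration, and none of the MIXTURE sequence specifics are modeled.

```python
import numpy as np
from scipy.optimize import curve_fit

def t2_decay(te, s0, t2):
    """Mono-exponential T2 signal model: S(TE) = S0 * exp(-TE / T2)."""
    return s0 * np.exp(-te / t2)

# Hypothetical echo times (ms) and noisy magnitudes for one voxel.
te = np.array([10.0, 20.0, 40.0, 60.0, 80.0])
rng = np.random.default_rng(0)
signal = t2_decay(te, s0=1000.0, t2=45.0) + rng.normal(0.0, 5.0, te.size)

# Per-voxel fit; repeating this over all voxels yields a T2 map.
popt, _ = curve_fit(t2_decay, te, signal, p0=(signal[0], 50.0))
print(f"fitted S0 = {popt[0]:.1f}, T2 = {popt[1]:.1f} ms")
```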

Plug-and-Play Latent Feature Editing for Orientation-Adaptive Quantitative Susceptibility Mapping Neural Networks

  • paper_url: http://arxiv.org/abs/2311.07823
  • repo_url: https://github.com/sunhongfu/deepMRI
  • paper_authors: Yang Gao, Zhuang Xiong, Shanshan Shan, Yin Liu, Pengfei Rong, Min Li, Alan H Wilman, G. Bruce Pike, Feng Liu, Hongfu Sun
  • for: This study aims to overcome the limited adaptability of deep learning (DL) methods to variations in magnetic dipole field orientation, improving the accuracy and robustness of quantitative susceptibility mapping (QSM) reconstruction.
  • methods: An Orientation-Adaptive Latent Feature Editing (OA-LFE) module is proposed to learn the encoding of acquisition orientation vectors and integrate them directly into the latent features of deep networks.
  • results: On simulated and in vivo human brain datasets, the OA-LFE-empowered iQSM+ was compared against several established QSM reconstruction frameworks and shown to reconstruct QSM images from arbitrary acquisition orientations with significantly improved accuracy.
    Abstract Quantitative susceptibility mapping (QSM) is a post-processing technique for deriving tissue magnetic susceptibility distribution from MRI phase measurements. Deep learning (DL) algorithms hold great potential for solving the ill-posed QSM reconstruction problem. However, a significant challenge facing current DL-QSM approaches is their limited adaptability to magnetic dipole field orientation variations during training and testing. In this work, we propose a novel Orientation-Adaptive Latent Feature Editing (OA-LFE) module to learn the encoding of acquisition orientation vectors and seamlessly integrate them into the latent features of deep networks. Importantly, it can be directly Plug-and-Play (PnP) into various existing DL-QSM architectures, enabling reconstructions of QSM from arbitrary magnetic dipole orientations. Its effectiveness is demonstrated by combining the OA-LFE module into our previously proposed phase-to-susceptibility single-step instant QSM (iQSM) network, which was initially tailored for pure-axial acquisitions. The proposed OA-LFE-empowered iQSM, which we refer to as iQSM+, is trained in a self-supervised manner on a specially-designed simulation brain dataset. Comprehensive experiments are conducted on simulated and in vivo human brain datasets, encompassing subjects ranging from healthy individuals to those with pathological conditions. These experiments involve various MRI platforms (3T and 7T) and aim to compare our proposed iQSM+ against several established QSM reconstruction frameworks, including the original iQSM. The iQSM+ yields QSM images with significantly improved accuracies and mitigates artifacts, surpassing other state-of-the-art DL-QSM algorithms.
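The OA-LFE module itself is not spelled out here, but the pattern the abstract describes — encode the acquisition orientation vector and inject it into a network's latent features in a plug-and-play fashion — can be sketched with a FiLM-style conditioning block. The layer sizes and the scale/shift fusion below are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class OrientationConditioning(nn.Module):
    """Toy plug-in block: maps a 3-D dipole orientation vector to per-channel
    scale and shift parameters that modulate a latent feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 2 * channels),  # per-channel (gamma, beta)
        )

    def forward(self, latent: torch.Tensor, orientation: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.mlp(orientation).chunk(2, dim=-1)
        # Broadcast (batch, C) over the spatial dims of (batch, C, D, H, W).
        shape = gamma.shape + (1,) * (latent.dim() - 2)
        return latent * (1.0 + gamma.view(shape)) + beta.view(shape)

# Usage: modulate a latent volume by the scan's unit dipole direction.
block = OrientationConditioning(channels=32)
latent = torch.randn(2, 32, 8, 8, 8)
orientation = torch.tensor([[0.0, 0.0, 1.0], [0.0, 0.6, 0.8]])
print(block(latent, orientation).shape)  # torch.Size([2, 32, 8, 8, 8])
```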

eess.SP - 2023-11-14

Choosing Outdated Information to Achieve Reliability in Age-Based Gossiping

  • paper_url: http://arxiv.org/abs/2311.08383
  • repo_url: None
  • paper_authors: Priyanka Kaswan, Sennur Ulukus
  • for: This paper studies how a reliable source and an unreliable source disseminate updates about a process in an age-based gossip network, and how nodes choose their information source while maintaining the freshness of their information.
  • methods: The paper uses the stochastic hybrid system (SHS) framework to formulate analytical equations characterizing the expected fraction of nodes holding unreliable packets and the expected version age of information at network nodes.
  • results: As the switching threshold $G$ increases, fewer nodes hold unreliable packets, but their version age grows, inducing a freshness-reliability trade-off in the network; numerical results support these findings.
    Abstract We consider a system model with two sources, a reliable source and an unreliable source, who are responsible for disseminating updates regarding a process to an age-based gossip network of $n$ nodes. Nodes wish to have fresh information, however, they have preference for packets that originated at the reliable source and are willing to sacrifice their version age of information by up to $G$ versions to switch from an unreliable packet to a reliable packet. We study how this protocol impacts the prevalence of unreliable packets at nodes in the network and their version age. Using a stochastic hybrid system (SHS) framework, we formulate analytical equations to characterize two quantities: expected fraction of nodes with unreliable packets and expected version age of information at network nodes. We show that as $G$ increases, fewer nodes have unreliable packet, however, their version age increases as well, thereby inducing a freshness-reliability trade-off in the network. We present numerical results to support our findings.
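To make the freshness-reliability trade-off concrete, here is a toy discrete-time, single-node simulation of the switching rule from the abstract: accept any fresher packet, and accept a reliable packet over a held unreliable one as long as it is at most G versions staler. The update and delivery rates are invented, and the network-free setting is a simplification of the paper's SHS model.

```python
import random

def simulate(G: int, steps: int = 500_000, seed: int = 0):
    """Toy single-node version of the age-based gossip rule."""
    rng = random.Random(seed)
    true_v = rel_v = unrel_v = 0          # process / source versions
    held_v, held_rel = 0, True            # node's packet and its origin
    unrel_time = age_sum = 0
    for _ in range(steps):
        true_v += 1                        # process moves to a new version
        if rng.random() < 0.3: rel_v = true_v    # sources sample the process
        if rng.random() < 0.3: unrel_v = true_v
        if rng.random() < 0.05:            # delivery from the reliable source
            if rel_v > held_v or (not held_rel and held_v - rel_v <= G):
                held_v, held_rel = rel_v, True
        if rng.random() < 0.05:            # delivery from the unreliable source
            if unrel_v > held_v:
                held_v, held_rel = unrel_v, False
        unrel_time += (not held_rel)
        age_sum += true_v - held_v
    return unrel_time / steps, age_sum / steps

for G in (0, 5, 20):
    frac, age = simulate(G)
    print(f"G={G:2d}: unreliable fraction={frac:.3f}, mean version age={age:.1f}")
```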

Comparison of model selection techniques for seafloor scattering statistics

  • paper_url: http://arxiv.org/abs/2311.08337
  • repo_url: None
  • paper_authors: Derek R Olson, Marc Geilhufe
  • for: This study develops a data-driven approach for selecting the statistical model of pixel intensity distributions in seafloor scattering environments, improving the analysis of remotely sensed imagery.
  • methods: The study models pixel intensities with mixtures of simple distributions and compares several data-driven model selection criteria for choosing the number of mixture components.
  • results: Data-driven methods were found to better select the statistical model of pixel intensity distributions for complex seafloors while reducing the need for human intervention.
    Abstract In quantitative analysis of seafloor imagery, it is common to model the collection of individual pixel intensities scattered by the seafloor as a random variable with a given statistical distribution. There is a considerable literature on statistical models for seafloor scattering, mostly focused on areas with statistically homogeneous properties (i.e. exhibiting spatial stationarity). For more complex seafloors, the pixel intensity distribution is more appropriately modeled using a mixture of simple distributions. For very complex seafloors, fitting 3 or more mixture components makes physical sense, but the statistical model becomes much more complex in these cases. Therefore, picking the number of components of the mixture model is a decision that must be made, using a priori information, or using a data driven approach. However, this information is time consuming to collect, and depends on the skill and experience of the human. Therefore, a data-driven approach is advantageous to use, and is explored in this work. Criteria for choosing a model always need to balance the trade-off for the best fit for the data on the one hand and the model complexity on the other hand. In this work, we compare several statistical model selection criteria, e.g., the Bayesian information criterion. Examples are given for SAS data collected by an autonomous underwater vehicle in a rocky environment off the coast of Bergen, Norway using data from the HISAS-1032 synthetic aperture sonar system.
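The model selection loop the abstract describes can be illustrated in a few lines: fit mixtures with an increasing number of components and keep the one minimizing the Bayesian information criterion, which penalizes parameter count against fit quality. The sketch below uses Gaussian components from scikit-learn on synthetic intensities as a stand-in; the paper works with sonar-specific intensity distributions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for pixel intensities from a two-component seafloor.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(1.0, 0.3, 4000),
                    rng.normal(3.0, 0.8, 1000)]).reshape(-1, 1)

# Fit mixtures with 1..5 components and keep the BIC-optimal one; BIC
# trades goodness of fit against model complexity.
fits = [GaussianMixture(n_components=k, random_state=0).fit(x) for k in range(1, 6)]
bics = [m.bic(x) for m in fits]
best = int(np.argmin(bics)) + 1
print("BIC per k:", [f"{b:.0f}" for b in bics])
print("selected number of components:", best)
```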

Protecting the Future of Information: LOCO Coding With Error Detection for DNA Data Storage

  • paper_url: http://arxiv.org/abs/2311.08325
  • repo_url: None
  • paper_authors: Canberk İrimağzı, Yusuf Uslan, Ahmed Hareedy
  • for: This paper investigates the use of the recently introduced lexicographically-ordered constrained (LOCO) codes in DNA data storage.
  • methods: The paper introduces DNA LOCO (D-LOCO) codes over the alphabet {A,T,G,C} with limited runs of identical symbols, together with an encoding-decoding rule that yields affordable, readily reconfigurable encoding-decoding algorithms.
  • results: The proposed encoding-decoding algorithms outperform those in the existing literature in terms of storage overhead and enable high-rate DNA data storage; four schemes are proposed to bridge consecutive codewords, three of which guarantee single-substitution error detection per codeword.
    Abstract DNA strands serve as a storage medium for $4$-ary data over the alphabet $\{A,T,G,C\}$. DNA data storage promises formidable information density, long-term durability, and ease of replicability. However, information in this intriguing storage technology might be corrupted. Experiments have revealed that DNA sequences with long homopolymers and/or with low $GC$-content are notably more subject to errors upon storage. This paper investigates the utilization of the recently-introduced method for designing lexicographically-ordered constrained (LOCO) codes in DNA data storage. This paper introduces DNA LOCO (D-LOCO) codes, over the alphabet $\{A,T,G,C\}$ with limited runs of identical symbols. These codes come with an encoding-decoding rule we derive, which provides affordable encoding-decoding algorithms. In terms of storage overhead, the proposed encoding-decoding algorithms outperform those in the existing literature. Our algorithms are readily reconfigurable. D-LOCO codes are intrinsically balanced, which allows us to achieve balancing over the entire DNA strand with minimal rate penalty. Moreover, we propose four schemes to bridge consecutive codewords, three of which guarantee single substitution error detection per codeword. We examine the probability of undetecting errors. We also show that D-LOCO codes are capacity-achieving and that they offer remarkably high rates at moderate lengths.
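The constraint at the heart of D-LOCO codes is that no symbol may repeat more than a fixed number of times in a row. As a hedged illustration (not the authors' lexicographic encoder), the sketch below filters length-n strands over {A,T,G,C} by maximum run length, which is one way to check the size, and hence the rate, of such a constrained codebook.

```python
from itertools import product

def max_run(s: str) -> int:
    """Length of the longest run of identical symbols in s."""
    longest = run = 1
    for a, b in zip(s, s[1:]):
        run = run + 1 if a == b else 1
        longest = max(longest, run)
    return longest

def rll_strands(n: int, max_r: int):
    """All length-n DNA strands whose runs of identical symbols are <= max_r."""
    return [''.join(p) for p in product("ATGC", repeat=n)
            if max_run(''.join(p)) <= max_r]

# With runs capped at 2, fewer of the 4^6 = 4096 strands survive; the gap
# between the two counts is the rate penalty paid for the constraint.
codebook = rll_strands(6, max_r=2)
print(len(codebook), "of", 4 ** 6, "strands satisfy the run-length limit")
```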

  • paper_url: http://arxiv.org/abs/2311.08319
  • repo_url: None
  • paper_authors: Zakir Hussain Shaik, Sai Subramanyam Thoota, Emil Björnson, Erik G. Larsson
  • for: addresses the demanding requirements of uplink (UL) fronthaul in cell-free massive multiple-input multiple-output (MIMO) systems.
  • methods: proposes a novel resource efficient analog over-the-air (OTA) computation framework, including transmit precoding and two-phase power assignment strategies at the access points (APs).
  • results: derives analytical expressions for the Bayesian and classical estimators of the OTA combined signals, and empirically evaluates the normalized mean square error (NMSE), symbol error rate (SER), and coded bit error rate (BER) of the developed solution, showing that it outperforms the state-of-the-art wired fronthaul based system.
    Abstract We propose a novel resource efficient analog over-the-air (OTA) computation framework to address the demanding requirements of the uplink (UL) fronthaul between the access points (APs) and the central processing unit (CPU) in cell-free massive multiple-input multiple-output (MIMO) systems. We discuss the drawbacks of the wired and wireless fronthaul solutions, and show that our proposed mechanism is efficient and scalable as the number of APs increases. We present the transmit precoding and two-phase power assignment strategies at the APs to coherently combine the signals OTA in a spectrally efficient manner. We derive the statistics of the APs locally available signals which enable us to obtain the analytical expressions for the Bayesian and classical estimators of the OTA combined signals. We empirically evaluate the normalized mean square error (NMSE), symbol error rate (SER), and the coded bit error rate (BER) of our developed solution and benchmark against the state-of-the-art wired fronthaul based system.
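The core of analog over-the-air computation is that simultaneously transmitted, suitably precoded signals add up in the channel, so the CPU receives a function of the APs' values (here, their average) in one channel use. The sketch below shows the textbook channel-inversion form of this idea with invented channel values; it is a generic AirComp illustration that ignores power constraints, not the paper's precoding or two-phase power assignment.

```python
import numpy as np

rng = np.random.default_rng(1)
n_aps = 8
local_values = rng.normal(size=n_aps)        # the values each AP wants to fuse
h = rng.normal(size=n_aps) + 1j * rng.normal(size=n_aps)  # complex channel gains

# Channel-inversion precoding: each AP pre-scales its value so that all
# contributions arrive with unit effective gain and add coherently.
tx = local_values / h
noise = 0.01 * (rng.normal() + 1j * rng.normal())
rx = np.sum(h * tx) + noise                  # superposition over the air

estimate = rx.real / n_aps                   # recover the average
print(f"true mean {local_values.mean():+.4f}   OTA estimate {estimate:+.4f}")
```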

Maximum Eigenvalue Detection based Spectrum Sensing in RIS-aided System with Correlated Fading

  • paper_url: http://arxiv.org/abs/2311.08296
  • repo_url: None
  • paper_authors: Nikhilsingh Parihar, Praful D. Mankar, Sachin Chaudhari
  • for: This study aims to improve spectrum sensing performance under channel impairments, particularly in scenarios where both the multi-path fading and the noise are correlated.
  • methods: The paper uses a reconfigurable intelligent surface (RIS) to assist spectrum sensing and, to exploit the spatially correlated fading, proposes maximum eigenvalue detection (MED) over the sample covariance matrix.
  • results: Exact distributions of the test statistic are derived under the null and signal-present hypotheses and used to obtain closed-form false-alarm and detection probabilities; the RIS phase-shift matrix is further optimized, and numerical analysis shows that the MED receiver operating characteristic improves with more RIS elements, higher SNR, and a statistically optimal RIS configuration.
    Abstract Robust spectrum sensing is crucial for facilitating opportunistic spectrum utilization for secondary users (SU) in the absence of primary users (PU). However, propagation environment factors such as multi-path fading, shadowing, and lack of line of sight (LoS) often adversely affect detection performance. To deal with these issues, this paper focuses on utilizing reconfigurable intelligent surfaces (RIS) to improve spectrum sensing in the scenario wherein both the multi-path fading and noise are correlated. In particular, to leverage the spatially correlated fading, we propose to use maximum eigenvalue detection (MED) for spectrum sensing. We first derive exact distributions of test statistics, i.e., the largest eigenvalue of the sample covariance matrix, observed under the null and signal present hypothesis. Next, utilizing these results, we present the exact closed-form expressions for the false alarm and detection probabilities. In addition, we also optimally configure the phase shift matrix of RIS such that the mean of the test statistics is maximized, thus improving the detection performance. Our numerical analysis demonstrates that the MED's receiver operating characteristic (ROC) curve improves with increased RIS elements, SNR, and the utilization of statistically optimal configured RIS.
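Maximum eigenvalue detection itself is compact enough to sketch: form the sample covariance of the received snapshots, take its largest eigenvalue as the test statistic, and compare against a threshold set for a target false-alarm rate. The Monte Carlo threshold below stands in for the paper's closed-form expressions, and the rank-one signal model and dimensions are invented.

```python
import numpy as np

def med_statistic(x: np.ndarray) -> float:
    """Largest eigenvalue of the sample covariance of x (antennas x samples)."""
    cov = x @ x.conj().T / x.shape[1]
    return float(np.linalg.eigvalsh(cov)[-1])

rng = np.random.default_rng(0)
m, n, trials = 4, 200, 2000   # antennas, samples, Monte Carlo runs

noise_only = [med_statistic(rng.normal(size=(m, n))) for _ in range(trials)]
threshold = float(np.quantile(noise_only, 0.99))   # ~1% false-alarm target

# Signal present: a rank-one component raises the top eigenvalue.
s = rng.normal(size=(1, n))
detections = sum(
    med_statistic(0.5 * rng.normal(size=(m, 1)) @ s + rng.normal(size=(m, n)))
    > threshold
    for _ in range(trials)
)
print(f"threshold {threshold:.3f}, detection rate {detections / trials:.3f}")
```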

Joint Location Sensing and Channel Estimation for IRS-Aided mmWave ISAC Systems

  • paper_url: http://arxiv.org/abs/2311.08201
  • repo_url: None
  • paper_authors: Zijian Chen, Ming-Min Zhao, Min Li, Fan Xu, Qingqing Wu, Min-Jian Zhao
  • for: This paper investigates a self-sensing intelligent reflecting surface (IRS) aided millimeter wave (mmWave) integrated sensing and communication (ISAC) system, aiming to jointly sense the target/scatterer/user positions and estimate the sensing and communication (SAC) channels.
  • methods: A two-phase transmission scheme is proposed, in which coarse and refined sensing/channel estimation (CE) results are obtained in the first phase (using scanning-based IRS reflection coefficients) and the second phase (using optimized IRS reflection coefficients), respectively. The proposed angle-based sensing turbo variational Bayesian inference (AS-TVBI) algorithm combines the VBI, message passing, and expectation-maximization (EM) methods, and exploits the partial overlapping structured (POS) sparsity and 2-dimensional (2D) block sparsity inherent in the SAC channels.
  • results: Simulation results show the superiority of the proposed transmission scheme and associated algorithms, verifying that the self-sensing IRS reduces the path loss of sensing-related links and enhances the overall performance of the ISAC system.
    Abstract In this paper, we investigate a self-sensing intelligent reflecting surface (IRS) aided millimeter wave (mmWave) integrated sensing and communication (ISAC) system. Unlike the conventional purely passive IRS, the self-sensing IRS can effectively reduce the path loss of sensing-related links, thus rendering it advantageous in ISAC systems. Aiming to jointly sense the target/scatterer/user positions as well as estimate the sensing and communication (SAC) channels in the considered system, we propose a two-phase transmission scheme, where the coarse and refined sensing/channel estimation (CE) results are respectively obtained in the first phase (using scanning-based IRS reflection coefficients) and second phase (using optimized IRS reflection coefficients). For each phase, an angle-based sensing turbo variational Bayesian inference (AS-TVBI) algorithm, which combines the VBI, message passing and expectation-maximization (EM) methods, is developed to solve the considered joint location sensing and CE problem. The proposed algorithm effectively exploits the partial overlapping structured (POS) sparsity and 2-dimensional (2D) block sparsity inherent in the SAC channels to enhance the overall performance. Based on the estimation results from the first phase, we formulate a Cram\'{e}r-Rao bound (CRB) minimization problem for optimizing IRS reflection coefficients, and through proper reformulations, a low-complexity manifold-based optimization algorithm is proposed to solve this problem. Simulation results are provided to verify the superiority of the proposed transmission scheme and associated algorithms.

Fast List Decoding of High-Rate Polar Codes

  • paper_url: http://arxiv.org/abs/2311.08188
  • repo_url: None
  • paper_authors: Yang Lu, Ming-Min Zhao, Ming Lei, Min-Jian Zhao
  • for: This paper aims to speed up list decoding of short-to-moderate-length polar codes, which is particularly important in low-latency communication scenarios.
  • methods: The paper develops fast list decoding algorithms for two types of high-rate special nodes in polar codes, namely single-parity-check (SPC) nodes and sequence rate-one or single-parity-check (SR1/SPC) nodes. Two classes of algorithms are proposed: the first uses a sequential decoding procedure whose latency grows linearly with the list size, and the second further parallelizes decoding by pre-determining the redundant candidate paths offline.
  • results: Simulation results show that the proposed list decoding algorithms achieve up to 70.7% lower decoding latency than state-of-the-art fast SCL decoders while exhibiting the same error-correction performance.
    Abstract Due to the ability to provide superior error-correction performance, the successive cancellation list (SCL) algorithm is widely regarded as one of the most promising decoding algorithms for polar codes with short-to-moderate code lengths. However, the application of SCL decoding in low-latency communication scenarios is limited due to its sequential nature. To reduce the decoding latency, developing tailored fast and efficient list decoding algorithms of specific polar substituent codes (special nodes) is a promising solution. Recently, fast list decoding algorithms are proposed by considering special nodes with low code rates. Aiming to further speedup the SCL decoding, this paper presents fast list decoding algorithms for two types of high-rate special nodes, namely single-parity-check (SPC) nodes and sequence rate one or single-parity-check (SR1/SPC) nodes. In particular, we develop two classes of fast list decoding algorithms for these nodes, where the first class uses a sequential decoding procedure to yield decoding latency that is linear with the list size, and the second further parallelizes the decoding process by pre-determining the redundant candidate paths offline. Simulation results show that the proposed list decoding algorithms are able to achieve up to 70.7\% lower decoding latency than state-of-the-art fast SCL decoders, while exhibiting the same error-correction performance.
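For intuition on why SPC nodes admit fast decoding: a single-parity-check node constrains its bits to even parity, so a hard decision on the LLRs followed by flipping the least reliable bit when parity fails yields the most likely codeword in one shot, instead of bit-by-bit successive cancellation. The sketch below shows this classic SPC decision rule only; the list handling and path metrics of the paper's algorithms are not reproduced.

```python
import numpy as np

def decode_spc(llr: np.ndarray) -> np.ndarray:
    """Maximum-likelihood decision for a single-parity-check node.
    llr[i] > 0 favors bit 0; the codeword must have even parity."""
    bits = (llr < 0).astype(int)             # hard decisions
    if bits.sum() % 2 == 1:                  # parity violated:
        worst = int(np.argmin(np.abs(llr)))  # least reliable position
        bits[worst] ^= 1                     # flip it to restore even parity
    return bits

llr = np.array([2.1, -0.4, 3.3, 1.7, 0.2])   # toy LLRs for a 5-bit SPC node
print(decode_spc(llr))                        # even-parity hard decision
```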

Channel Estimation with Dynamic Metasurface Antennas via Model-Based Learning

  • paper_url: http://arxiv.org/abs/2311.08158
  • repo_url: None
  • paper_authors: Xiangyu Zhang, Haiyang Zhang, Luxi Yang, Yonina C. Eldar
  • for: This paper proposes two model-based learning methods to overcome the challenge of channel estimation in multiple input multiple output (MIMO) communication systems using dynamic metasurface antennas (DMAs).
  • methods: The proposed methods use a combination of random DMA weighting matrices and spatial gridding dictionaries to form the sensing matrix, and employ the learned iterative shrinkage and thresholding algorithm (LISTA) to recover the sparse channel parameters. Additionally, a self-supervised learning technique is proposed to tackle the difficulty of acquiring noise-free data.
  • results: The proposed methods demonstrate better channel accuracy than traditional sparse recovery methods, and the sensing matrix optimization technique achieves better channel accuracy than the baseline method.
    Abstract Dynamic Metasurface Antenna (DMA) is a cutting-edge antenna technology offering scalable and sustainable solutions for large antenna arrays. The effectiveness of DMAs stems from their inherent configurable analog signal processing capabilities, which facilitate cost-limited implementations. However, when DMAs are used in multiple input multiple output (MIMO) communication systems, they pose challenges in channel estimation due to their analog compression. In this paper, we propose two model-based learning methods to overcome this challenge. Our approach starts by casting channel estimation as a compressed sensing problem. Here, the sensing matrix is formed using a random DMA weighting matrix combined with a spatial gridding dictionary. We then employ the learned iterative shrinkage and thresholding algorithm (LISTA) to recover the sparse channel parameters. LISTA unfolds the iterative shrinkage and thresholding algorithm into a neural network and trains the neural network into a highly efficient channel estimator fitting with the previous channel. As the sensing matrix is crucial to the accuracy of LISTA recovery, we introduce another data-aided method, LISTA-sensing matrix optimization (LISTA-SMO), to jointly optimize the sensing matrix. LISTA-SMO takes LISTA as a backbone and embeds the sensing matrix optimization layers in LISTA's neural network, allowing for the optimization of the sensing matrix along with the training of LISTA. Furthermore, we propose a self-supervised learning technique to tackle the difficulty of acquiring noise-free data. Our numerical results demonstrate that LISTA outperforms traditional sparse recovery methods regarding channel estimation accuracy and efficiency. Besides, LISTA-SMO achieves better channel accuracy than LISTA, demonstrating the effectiveness in optimizing the sensing matrix.
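LISTA unrolls the iterative shrinkage and thresholding algorithm (ISTA) into a fixed number of layers whose matrices and thresholds become trainable. A minimal sketch of the underlying recursion, x ← soft(W1 y + W2 x, θ), is below with W1, W2, θ fixed at their classical ISTA values; in LISTA these per-layer quantities would be learned. The sensing matrix and sparse test problem are invented stand-ins for the DMA setup.

```python
import numpy as np

def soft(x, theta):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

rng = np.random.default_rng(0)
m, n, k_layers = 30, 60, 40
A = rng.normal(size=(m, n)) / np.sqrt(m)       # sensing matrix (random stand-in)
x_true = np.zeros(n)
x_true[rng.choice(n, 4, replace=False)] = rng.normal(size=4)
y = A @ x_true + 0.01 * rng.normal(size=m)

# Classical ISTA weights; LISTA would learn W1, W2, theta per layer instead.
L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the gradient
W1, W2, theta = A.T / L, np.eye(n) - A.T @ A / L, 0.01 / L

x = np.zeros(n)
for _ in range(k_layers):                      # each pass = one unrolled layer
    x = soft(W1 @ y + W2 @ x, theta)

print("support recovered:", np.flatnonzero(np.abs(x) > 1e-3))
print("true support:     ", np.flatnonzero(x_true))
```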

Joint Source-Channel Coding for Channel-Adaptive Digital Semantic Communications

  • paper_url: http://arxiv.org/abs/2311.08146
  • repo_url: None
  • paper_authors: Joohyuk Park, Yongjeong Oh, Seonjung Kim, Yo-Seb Jeon
  • for: This paper proposes a joint source-channel coding (JSCC) approach for channel-adaptive digital semantic communication systems.
  • methods: A new demodulation method assesses the uncertainty of the demodulation output to improve robustness, and a robust end-to-end training strategy models the encoder-decoder link with binary symmetric erasure channels whose parameters are sampled from diverse distributions; a channel-adaptive modulation technique further selects modulation orders at inference time.
  • results: Simulations on image classification and reconstruction tasks show that the proposed approach outperforms existing JSCC methods.
    Abstract In this paper, we propose a novel joint source-channel coding (JSCC) approach for channel-adaptive digital semantic communications. In semantic communication systems with digital modulation and demodulation, end-to-end training and robust design of JSCC encoder and decoder becomes challenging due to the nonlinearity of modulation and demodulation processes, as well as diverse channel conditions and modulation orders. To address this challenge, we first develop a new demodulation method which assesses the uncertainty of the demodulation output to improve the robustness of the digital semantic communication system. We then devise a robust training strategy that facilitates end-to-end training of the JSCC encoder and decoder, while enhancing their robustness and flexibility. To this end, we model the relationship between the encoder's output and decoder's input using binary symmetric erasure channels and then sample the parameters of these channels from diverse distributions. We also develop a channel-adaptive modulation technique for an inference phase, in order to reduce the communication latency while maintaining task performance. In this technique, we adaptively determine modulation orders for the latent variables based on channel conditions. Using simulations, we demonstrate the superior performance of the proposed JSCC approach for both image classification and reconstruction tasks compared to existing JSCC approaches.
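The training strategy above hinges on a binary symmetric erasure channel: each bit is erased with some probability and otherwise flipped with another, and both parameters are redrawn from broad distributions so the learned codec sees diverse conditions. A minimal sketch of that channel and the resampling loop follows; the uniform parameter ranges are placeholders, not the paper's choices.

```python
import numpy as np

ERASED = -1  # sentinel for an erased bit

def bsec(bits: np.ndarray, flip_p: float, erase_p: float, rng) -> np.ndarray:
    """Binary symmetric erasure channel: erase w.p. erase_p, else flip w.p. flip_p."""
    out = np.where(rng.random(bits.shape) < flip_p, 1 - bits, bits)
    return np.where(rng.random(bits.shape) < erase_p, ERASED, out)

rng = np.random.default_rng(0)
for step in range(3):                       # stands in for training batches
    flip_p = rng.uniform(0.0, 0.2)          # redraw channel parameters so the
    erase_p = rng.uniform(0.0, 0.3)         # codec sees diverse conditions
    bits = rng.integers(0, 2, size=16)
    received = bsec(bits, flip_p, erase_p, rng)
    print(f"step {step}: flip={flip_p:.2f} erase={erase_p:.2f} -> {received}")
```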

On the View-and-Channel Aggregation Gain in Integrated Sensing and Edge AI

  • paper_url: http://arxiv.org/abs/2311.07986
  • repo_url: None
  • paper_authors: Xu Chen, Khaled B. Letaief, Kaibin Huang
  • for: The paper explores the fundamental performance gains of view-and-channel aggregation in integrated sensing and edge AI (ISEA) systems for Internet-of-Things (IoT) applications.
  • methods: The paper uses a well-established distribution model of multi-view sensing data, modified to represent individual sensor observation perspectives, and studies the end-to-end sensing (inference) uncertainty via a scaling-tight uncertainty surrogate function, the global discriminant gain, the distribution of the receive signal-to-noise ratio (SNR), and the channel-induced discriminant loss.
  • results: The end-to-end sensing uncertainty diminishes at an exponential rate as the number of views/sensors grows, with a rate proportional to the global discriminant gain; the exponential scaling remains under channel distortion, but with a decay rate reduced by the channel-induced discriminant loss. The insights are validated by experiments on a real-world dataset.
    Abstract Sensing and edge artificial intelligence (AI) are two key features of the sixth-generation (6G) mobile networks. Their natural integration, termed Integrated sensing and edge AI (ISEA), is envisioned to automate wide-ranging Internet-of-Things (IoT) applications. To achieve a high sensing accuracy, multi-view features are uploaded to an edge server for aggregation and inference using an AI model. The view aggregation is realized efficiently using over-the-air computing (AirComp), which also aggregates channels to suppress channel noise. At its nascent stage, ISEA still lacks a characterization of the fundamental performance gains from view-and-channel aggregation, which motivates this work. Our framework leverages a well-established distribution model of multi-view sensing data where the classic Gaussian-mixture model is modified by adding sub-spaces matrices to represent individual sensor observation perspectives. Based on the model, we study the End-to-End sensing (inference) uncertainty, a popular measure of inference accuracy, of the said ISEA system by a novel approach involving designing a scaling-tight uncertainty surrogate function, global discriminant gain, distribution of receive Signal-to-Noise Ratio (SNR), and channel induced discriminant loss. We prove that the E2E sensing uncertainty diminishes at an exponential rate as the number of views/sensors grows, where the rate is proportional to global discriminant gain. Given channel distortion, we further show that the exponential scaling remains with a reduced decay rate related to the channel induced discriminant loss. Furthermore, we benchmark AirComp against equally fast, traditional analog orthogonal access, which reveals a sensing-accuracy crossing point between the schemes, leading to the proposal of adaptive access-mode switching. Last, the insights from our framework are validated by experiments using real-world dataset.

Learning Bayes-Optimal Channel Estimation for Holographic MIMO in Unknown EM Environments

  • paper_url: http://arxiv.org/abs/2311.07908
  • repo_url: None
  • paper_authors: Wentao Yu, Hengtao He, Xianghao Yu, Shenghui Song, Jun Zhang, Ross D. Murch, Khaled B. Letaief
  • for: The paper targets future 6G systems that use holographic MIMO (HMIMO) technology, which requires efficient channel estimation in arbitrary and unknown EM environments.
  • methods: The paper proposes a self-supervised minimum mean-square-error (MMSE) channel estimation algorithm based on powerful machine learning tools, including score matching and principal component analysis. The training stage requires only the pilot signals, without needing to know the spatial correlation, the ground-truth channels, or the received signal-to-noise-ratio.
  • results: The proposed algorithm can approach the performance of the oracle MMSE method with extremely low complexity, making it a competitive candidate in practice.
    Abstract Holographic MIMO (HMIMO) has recently been recognized as a promising enabler for future 6G systems through the use of an ultra-massive number of antennas in a compact space to exploit the propagation characteristics of the electromagnetic (EM) channel. Nevertheless, the promised gain of HMIMO could not be fully unleashed without an efficient means to estimate the high-dimensional channel. Bayes-optimal estimators typically necessitate either a large volume of supervised training samples or a priori knowledge of the true channel distribution, which could hardly be available in practice due to the enormous system scale and the complicated EM environments. It is thus important to design a Bayes-optimal estimator for the HMIMO channels in arbitrary and unknown EM environments, free of any supervision or priors. This work proposes a self-supervised minimum mean-square-error (MMSE) channel estimation algorithm based on powerful machine learning tools, i.e., score matching and principal component analysis. The training stage requires only the pilot signals, without knowing the spatial correlation, the ground-truth channels, or the received signal-to-noise-ratio. Simulation results will show that, even being totally self-supervised, the proposed algorithm can still approach the performance of the oracle MMSE method with an extremely low complexity, making it a competitive candidate in practice.
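For reference, the oracle the paper benchmarks against is the classical linear MMSE estimator, which requires the true channel covariance C: for pilots y = h + n with noise variance σ², it computes ĥ = C (C + σ²I)⁻¹ y. The sketch below implements that baseline on synthetic correlated channels; the exponential correlation model is an illustrative assumption, and avoiding the need for C is precisely the paper's point.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 32, 0.1

# Illustrative spatial covariance: exponential correlation across antennas.
idx = np.arange(n)
C = 0.9 ** np.abs(idx[:, None] - idx[None, :])

# Draw a correlated channel and a noisy pilot observation y = h + n.
h = np.linalg.cholesky(C) @ rng.normal(size=n)
y = h + np.sqrt(sigma2) * rng.normal(size=n)

# Oracle LMMSE: h_hat = C (C + sigma2 I)^{-1} y  (requires knowing C).
h_hat = C @ np.linalg.solve(C + sigma2 * np.eye(n), y)

nmse = np.sum((h_hat - h) ** 2) / np.sum(h ** 2)
raw = np.sum((y - h) ** 2) / np.sum(h ** 2)
print(f"LMMSE NMSE: {nmse:.3f}  vs raw pilot NMSE: {raw:.3f}")
```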

Passive Human Sensing Enhanced by Reconfigurable Intelligent Surface: Opportunities and Challenges

  • paper_url: http://arxiv.org/abs/2311.07873
  • repo_url: None
  • paper_authors: Xinyu Li, Jian Wei You, Ze Gu, Qian Ma, Long Chen, Jingyuan Zhang, Shi Jin, Tie Jun Cui
  • for: This article surveys how reconfigurable intelligent surfaces (RISs) can assist passive human sensing, i.e., extracting human-activity-related information from radio frequency signals.
  • methods: The article first introduces the fundamental principles and physical platforms of RISs, then categorizes state-of-the-art human sensing techniques by application into human imaging, localization, and activity recognition, examining the benefits RISs bring to each.
  • results: The article further explores RIS-assisted human micro-motion sensing and proposes an RIS-enhanced vital signs monitoring system, with experimental results demonstrating the promising potential of RISs for sensing vital signs; technical challenges and opportunities in the field are also discussed.
    Abstract Reconfigurable intelligent surfaces (RISs) have flexible and exceptional performance in manipulating electromagnetic waves and customizing wireless channels. These capabilities enable them to provide a plethora of valuable activity-related information for promoting wireless human sensing. In this article, we present a comprehensive review of passive human sensing using radio frequency signals with the assistance of RISs. Specifically, we first introduce fundamental principles and physical platform of RISs. Subsequently, based on the specific applications, we categorize the state-of-the-art human sensing techniques into three types, including human imaging, localization, and activity recognition. Meanwhile, we would also investigate the benefits that RISs bring to these applications. Furthermore, we explore the application of RISs in human micro-motion sensing, and propose a vital signs monitoring system enhanced by RISs. Experimental results are presented to demonstrate the promising potential of RISs in sensing vital signs for manipulating individuals. Finally, we discuss the technical challenges and opportunities in this field.

Cost-Efficient Computation Offloading and Service Chain Caching in LEO Satellite Networks

  • paper_url: http://arxiv.org/abs/2311.07872
  • repo_url: None
  • paper_authors: Yantong Wang, Chuanfen Feng, Jiande Sun
  • for: This paper considers a mobile-edge-computing-enhanced low earth orbit (LEO) satellite network that provides both communication connectivity and on-board processing, aiming to improve service quality under network resource constraints.
  • methods: The paper jointly studies service chain caching and computation offloading, accounting for collaboration among satellites, network resource limitations, and the specific operation order of network functions in service chains; the problem is formulated as an integer linear program and accelerated with a greedy algorithm of cubic time complexity.
  • results: Numerical results show that the proposed scheme reduces the overall cost (task latency plus energy consumption) by around 20% compared with the nominal case where network functions are served in data centers.
    Abstract The ever-increasing demand for ubiquitous, continuous, and high-quality services poses a great challenge to the traditional terrestrial network. To mitigate this problem, the mobile-edge-computing-enhanced low earth orbit (LEO) satellite network, which provides both communication connectivity and on-board processing services, has emerged as an effective method. The main issue in LEO satellites includes finding the optimal locations to host network functions (NFs) and then making offloading decisions. In this article, we jointly consider the problem of service chain caching and computation offloading to minimize the overall cost, which consists of task latency and energy consumption. In particular, the collaboration among satellites, the network resource limitations, and the specific operation order of NFs in service chains are taken into account. Then, the problem is formulated and linearized as an integer linear programming model. Moreover, to accelerate the solution, we provide a greedy algorithm with cubic time complexity. Numerical investigations demonstrate the effectiveness of the proposed scheme, which can reduce the overall cost by around 20% compared to the nominal case where NFs are served in data centers.

On the IRS Deployment in Smart Factories Considering Blockage Effects: Collocated or Distributed?

  • paper_url: http://arxiv.org/abs/2311.07843
  • repo_url: None
  • paper_authors: Yixin Zhang, Saeed R. Khosravirad, Xiaoli Chu, Mikko A. Uusitalo
  • for: This work studies collocated versus distributed deployment of intelligent reflecting surfaces (IRSs), for a fixed total number of IRS elements, to support enhanced mobile broadband (eMBB) and ultra-reliable low-latency communication (URLLC) services inside a factory.
  • methods: A channel model incorporating the line-of-sight (LOS) probability and power loss of each transmission path is built, and three metrics are proposed: the expected received signal-to-noise ratio (SNR), the expected finite-blocklength (FB) capacity, and the expected outage probability, with expectations taken over the probability distributions of interior blockages and channel fading.
  • results: For high blockage densities, distributed deployment improves the expected received SNR and expected FB capacity for both eMBB and URLLC services; for low blockage densities, URLLC services benefit from distributed deployment while eMBB services see limited difference between collocated and distributed deployment.
    Abstract In this article, we study the collocated and distributed deployment of intelligent reflecting surfaces (IRS) for a fixed total number of IRS elements to support enhanced mobile broadband (eMBB) and ultra-reliable low-latency communication (URLLC) services inside a factory. We build a channel model that incorporates the line-of-sight (LOS) probability and power loss of each transmission path, and propose three metrics, namely, the expected received signal-to-noise ratio (SNR), expected finite-blocklength (FB) capacity, and expected outage probability, where the expectation is taken over the probability distributions of interior blockages and channel fading. The expected received SNR and expected FB capacity for extremely high blockage densities are derived in closed-form as functions of the amount and height of IRSs and the density, size, and penetration loss of blockages, which are verified by Monte Carlo simulations. Results show that deploying IRSs vertically higher leads to higher expected received SNR and expected FB capacity. By analysing the average/minimum/maximum of the three metrics versus the number of IRSs, we find that for high blockage densities, both eMBB and URLLC services benefit from distributed deployment; and for low blockage densities, URLLC services benefit from distributed deployment while eMBB services see limited difference between collocated and distributed deployment.

cs.SD - 2023-11-13

Distributed pressure matching strategy using diffusion adaptation

  • paper_url: http://arxiv.org/abs/2311.07729
  • repo_url: None
  • paper_authors: Mengfei Zhang, Junqing Zhang, Jie Chen, Cédric Richard
  • for: This paper addresses the time-varying acoustics underlying personal sound zone (PSZ) systems, which aim to create listening (bright) and silent (dark) zones in neighboring regions of space.
  • methods: A distributed pressure-matching method relying on diffusion adaptation (DPM-D) is proposed to spread the computational load amongst nodes, avoiding the high computational complexity and accuracy requirements of centralized approaches.
  • results: Simulations over multi-frequency bins and a computational complexity analysis show the favorable properties of the distributed algorithm compared with centralized counterparts.
    Abstract Personal sound zone (PSZ) systems, which aim to create listening (bright) and silent (dark) zones in neighboring regions of space, are often based on time-varying acoustics. Conventional adaptive-based methods for handling PSZ tasks suffer from the collection and processing of acoustic transfer functions (ATFs) between all the matching microphones and all the loudspeakers in a centralized manner, resulting in high calculation complexity and costly accuracy requirements. This paper presents a distributed pressure-matching (PM) method relying on diffusion adaptation (DPM-D) to spread the computational load amongst nodes in order to overcome these issues. The global PM problem is defined as a sum of local costs, and the diffusion adaption approach is then used to create a distributed solution that just needs local information exchanges. Simulations over multi-frequency bins and a computational complexity analysis are conducted to evaluate the properties of the algorithm and to compare it with centralized counterparts.
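Diffusion adaptation follows an adapt-then-combine pattern: each node takes a gradient step on its own local cost, then averages its intermediate estimate with its neighbors', so a network-wide least-squares problem is solved with only local exchanges. The sketch below applies diffusion LMS to a shared toy regression; the ring topology, uniform combination weights, and data model are illustrative assumptions, not the paper's pressure-matching formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, n_iter, mu = 6, 4, 300, 0.05
w_true = rng.normal(size=dim)                 # common parameter to estimate

# Ring topology with uniform combination weights over {self, two neighbors}.
neighbors = [((k - 1) % n_nodes, k, (k + 1) % n_nodes) for k in range(n_nodes)]
w = np.zeros((n_nodes, dim))                  # each node's local estimate

for _ in range(n_iter):
    # Adapt: every node takes an LMS step on its own streaming sample.
    psi = np.empty_like(w)
    for k in range(n_nodes):
        u = rng.normal(size=dim)              # local regressor
        d = u @ w_true + 0.05 * rng.normal()  # local noisy measurement
        psi[k] = w[k] + mu * u * (d - u @ w[k])
    # Combine: average intermediate estimates over the neighborhood.
    w = np.stack([psi[list(nbrs)].mean(axis=0) for nbrs in neighbors])

print("max node error:", float(np.max(np.linalg.norm(w - w_true, axis=1))))
```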

Efficient bandwidth extension of musical signals using a differentiable harmonic plus noise model

  • paper_url: http://arxiv.org/abs/2311.07363
  • repo_url: https://github.com/mathieulagrange/ddspmusicbandwidthextension
  • paper_authors: Pierre-Amaury Grumiaux, Mathieu Lagrange
  • for: Audio signal bandwidth extension, specifically for monophonic and polyphonic musical signals.
  • methods: Uses a differentiable digital signal processing (DDSP) model, which is a neural network with relatively few parameters that is trained to infer the parameters of a differentiable digital signal processing model.
  • results: Proposed models surpass a higher complexity deep learning model for an objective metric computed in the frequency domain, and are also confirmed to have superior perceptual quality through a MUSHRA listening test.
    Abstract The task of bandwidth extension addresses the generation of missing high frequencies of audio signals based on knowledge of the low-frequency part of the sound. This task applies to various problems, such as audio coding or audio restoration. In this article, we focus on efficient bandwidth extension of monophonic and polyphonic musical signals using a differentiable digital signal processing (DDSP) model. Such a model is composed of a neural network part with relatively few parameters trained to infer the parameters of a differentiable digital signal processing model, which efficiently generates the output full-band audio signal. We first address bandwidth extension of monophonic signals, and then propose two methods to explicitely handle polyphonic signals. The benefits of the proposed models are first demonstrated on monophonic and polyphonic synthetic data against a baseline and a deep-learning-based resnet model. The models are next evaluated on recorded monophonic and polyphonic data, for a wide variety of instruments and musical genres. We show that all proposed models surpass a higher complexity deep learning model for an objective metric computed in the frequency domain. A MUSHRA listening test confirms the superiority of the proposed approach in terms of perceptual quality.
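The differentiable model in the title follows the DDSP harmonic-plus-noise decomposition: a bank of sinusoids at integer multiples of the fundamental plus filtered noise, with the controlling parameters predicted by a small network. The sketch below synthesizes a second of audio from fixed toy parameters; in the actual models the harmonic amplitudes and noise shaping vary per frame and are network outputs, so this is a simplified illustration rather than the authors' implementation.

```python
import numpy as np

sr, dur, f0 = 16000, 1.0, 220.0
t = np.arange(int(sr * dur)) / sr
n_harm = 12

# Toy harmonic amplitudes with a 1/k roll-off; a DDSP network would predict
# these per frame, here they are constant for simplicity.
amps = 1.0 / np.arange(1, n_harm + 1)
harmonics = sum(a * np.sin(2 * np.pi * f0 * k * t)
                for k, a in enumerate(amps, start=1))

# Noise branch: white noise shaped by a crude moving-average low-pass filter.
rng = np.random.default_rng(0)
noise = np.convolve(rng.normal(size=t.size), np.ones(32) / 32, mode="same")

audio = 0.9 * harmonics / np.max(np.abs(harmonics)) + 0.05 * noise
print(audio.shape, float(audio.min()), float(audio.max()))
```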

Zero-Shot Duet Singing Voices Separation with Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.07345
  • repo_url: https://github.com/yoyololicon/duet-svs-diffusion
  • paper_authors: Chin-Yun Yu, Emilian Postolache, Emanuele Rodolà, György Fazekas
  • for: This paper addresses source separation for duets of singing voices of the same type, where the challenge is to keep the singer identity consistent in the separated audio.
  • methods: Diffusion models are used as priors, sampling the target signals from the posterior distribution by manipulating the diffusion process; posterior sampling is performed auto-regressively over overlapping segments, conditioning each segment on the previous one to enforce coherent singer identity.
  • results: On the MedleyVox dataset, the proposed method outperforms the naive posterior sampling baseline at preserving singer identity.
    Abstract In recent studies, diffusion models have shown promise as priors for solving audio inverse problems. These models allow us to sample from the posterior distribution of a target signal given an observed signal by manipulating the diffusion process. However, when separating audio sources of the same type, such as duet singing voices, the prior learned by the diffusion process may not be sufficient to maintain the consistency of the source identity in the separated audio. For example, the singer may change from one to another occasionally. Tackling this problem will be useful for separating sources in a choir, or a mixture of multiple instruments with similar timbre, without acquiring large amounts of paired data. In this paper, we examine this problem in the context of duet singing voices separation, and propose a method to enforce the coherency of singer identity by splitting the mixture into overlapping segments and performing posterior sampling in an auto-regressive manner, conditioning on the previous segment. We evaluate the proposed method on the MedleyVox dataset and show that the proposed method outperforms the naive posterior sampling baseline. Our source code and the pre-trained model are publicly available at https://github.com/yoyololicon/duet-svs-diffusion.

Research and experimental verification on low-frequency long-range underwater sound propagation dispersion characteristics under dual-channel sound speed profiles in the Chukchi Plateau

  • paper_url: http://arxiv.org/abs/2311.08425
  • repo_url: None
  • paper_authors: Jinbao Weng, Yubo Qi, Yanming Yang, Hongtao Wen, Hongtao Zhou, Ruichao Xue
  • for: Researched the low-frequency wide-band sound signal propagation characteristics under dual-channel sound speed profiles in the Chukchi Plateau and the Canadian Basin
  • methods: Used the theory of normal modes to study the fine structure of low-frequency wide-band sound propagation dispersion under dual-channel sound speed profiles, and used a modified warping operator to separate the normal modes
  • results: Explained the intersection of normal mode dispersion curves caused by the dual-channel sound speed profile, analyzed the blocking effect of seabed terrain changes on dispersion structures, and verified the results through a long-range seismic exploration experiment at the Chukchi Plateau. Additionally, proposed two methods for estimating the distance of sound sources based on acoustic signal characteristics in this environment, and verified these methods through experiment data at sea.
    Abstract The dual-channel sound speed profiles of the Chukchi Plateau and the Canadian Basin have become current research hotspots due to their excellent low-frequency sound signal propagation ability. Previous research has mainly focused on using sound propagation theory to explain the changes in sound signal energy. This article is mainly based on the theory of normal modes to study the fine structure of low-frequency wide-band sound propagation dispersion under dual-channel sound speed profiles. In this paper, the problem of the intersection of normal mode dispersion curves caused by the dual-channel sound speed profile (SSP) has been explained, the blocking effect of seabed terrain changes on dispersion structures has been analyzed, and the normal modes have been separated by using a modified warping operator. The above research results have been verified through a long-range seismic exploration experiment at the Chukchi Plateau. At the same time, based on the acoustic signal characteristics in this environment, two methods for estimating the distance of sound sources have been proposed, and the experiment data at sea has also verified these two methods.
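As background to the modified warping operator mentioned above: for an ideal waveguide, a mode arrives with phase 2π f_c √(t² − t_r²) (t_r the direct-path travel time, f_c the mode cutoff), and the classical warping y(u) = √|h′(u)| · s(h(u)) with h(u) = √(u² + t_r²) linearizes that phase, turning each mode into a tone at its cutoff frequency that a filter can isolate. The sketch below applies this classical operator to one synthetic mode; the paper's Arctic dual-channel operator is a modified version not reproduced here.

```python
import numpy as np

sr, t_r, f_c = 1000.0, 0.5, 60.0   # sample rate, direct-path delay (s), cutoff (Hz)

# One ideal-waveguide mode: phase 2*pi*f_c*sqrt(t^2 - t_r^2) for t > t_r,
# i.e., instantaneous frequency sweeping down toward the cutoff f_c.
t = np.arange(int(0.51 * sr), int(2.5 * sr)) / sr
s = np.sin(2 * np.pi * f_c * np.sqrt(t**2 - t_r**2))

# Classical warping y(u) = sqrt(|h'(u)|) * s(h(u)) with h(u) = sqrt(u^2 + t_r^2):
# it linearizes the modal phase, so the mode becomes a tone at f_c.
u = np.arange(int(0.12 * sr), int(2.0 * sr)) / sr
h = np.sqrt(u**2 + t_r**2)
y = np.sqrt(u / h) * np.interp(h, t, s)

spec = np.abs(np.fft.rfft(y * np.hanning(y.size)))
peak = np.fft.rfftfreq(y.size, 1 / sr)[np.argmax(spec)]
print(f"warped spectral peak: {peak:.1f} Hz (mode cutoff {f_c} Hz)")
```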

SponTTS: modeling and transferring spontaneous style for TTS

  • paper_url: http://arxiv.org/abs/2311.07179
  • repo_url: https://github.com/kkksuper/SponTTS
  • paper_authors: Hanzhao Li, Xinfa Zhu, Liumeng Xue, Yang Song, Yunlin Chen, Lei Xie
  • for: This work models and transfers spontaneous speaking style for TTS, improving the naturalness, expressiveness, and speaker similarity of spontaneous speech.
  • methods: A two-stage approach is proposed: a Conditional Variational Autoencoder (CVAE) captures spontaneous prosody from a bottleneck (BN) feature, constrained by a spontaneous phenomena embedding prediction loss, and a flow-based predictor infers a latent spontaneous style representation from text to enrich prosody and context-specific spontaneous phenomena during inference; a VITS-like module then transfers the learned style to target speakers.
  • results: Experiments show that SponTTS effectively models spontaneous style and transfers it to target speakers, generating spontaneous speech with high naturalness, expressiveness, and speaker similarity; a zero-shot spontaneous style TTS test further verifies its generalization and robustness for unseen speakers.
    Abstract Spontaneous speaking style exhibits notable differences from other speaking styles due to various spontaneous phenomena (e.g., filled pauses, prolongation) and substantial prosody variation (e.g., diverse pitch and duration variation, occasional non-verbal speech like smile), posing challenges to modeling and prediction of spontaneous style. Moreover, the limitation of high-quality spontaneous data constrains spontaneous speech generation for speakers without spontaneous data. To address these problems, we propose SponTTS, a two-stage approach based on bottleneck (BN) features to model and transfer spontaneous style for TTS. In the first stage, we adopt a Conditional Variational Autoencoder (CVAE) to capture spontaneous prosody from a BN feature and involve the spontaneous phenomena by the constraint of spontaneous phenomena embedding prediction loss. Besides, we introduce a flow-based predictor to predict a latent spontaneous style representation from the text, which enriches the prosody and context-specific spontaneous phenomena during inference. In the second stage, we adopt a VITS-like module to transfer the spontaneous style learned in the first stage to target speakers. Experiments demonstrate that SponTTS is effective in modeling spontaneous style and transferring the style to the target speakers, generating spontaneous speech with high naturalness, expressiveness, and speaker similarity. The zero-shot spontaneous style TTS test further verifies the generalization and robustness of SponTTS in generating spontaneous speech for unseen speakers.

Research and experimental verification on low-frequency long-range sound propagation characteristics under ice-covered and range-dependent marine environment in the Arctic

  • paper_url: http://arxiv.org/abs/2311.07175
  • repo_url: None
  • paper_authors: Jinbao Weng, Yubo Qi, Yanming Yang, Hongtao Wen, Hongtao Zhou, Ruichao Xue
  • for: Studying the propagation of low-frequency broadband acoustic signals under the Arctic ice, focusing not only on transmission loss but also on the time-domain waveforms and fine dispersion structure of the signals.
  • methods: Based on normal-mode theory, the paper derives the horizontal wavenumber expression and warping transformation operator for refractive normal modes in the Arctic deep-sea environment, then uses measured ocean environmental parameters and sound-field simulations to study the general laws of low-frequency long-range propagation and to explain how environmental factors such as seabed terrain changes, horizontal variation of sound speed profiles (SSPs), and sea-ice cover affect it.
  • results: A long-range Arctic propagation experiment exceeding 1000 km, in a marine environment with clear horizontal variation, validates the analysis; the warping transformation of refractive normal modes is used for the first time in Arctic waters to separate normal modes and extract dispersion structures from a single hydrophone.
    Abstract At present, research on sound propagation under the Arctic ice mainly focuses on modeling and experimental verification of sound propagation under sea ice cover and unique sound velocity profiles. Among them, the main research object of concern is sound transmission loss, and this article will delve into the time-domain waveform and fine dispersion structure of low-frequency broadband acoustic signals. Firstly, based on the theory of normal modes, this article derives the horizontal wavenumber expression and warping transformation operator for refractive normal modes in the Arctic deep-sea environment. Subsequently, based on measured ocean environmental parameters and sound field simulation calculations, this article studied the general laws of low-frequency long-range sound propagation signals in the Arctic deep-sea environment, and elucidated the impact mechanism of environmental factors such as seabed terrain changes, horizontal changes in sound velocity profiles (SSPs), and sea ice cover on low-frequency long-range sound propagation in the Arctic. This article validates the above research viewpoint through a sound propagation experiment conducted in the Arctic with a propagation distance exceeding 1000km. The marine environment of this experiment has obvious horizontal variation characteristics. At the same time, this article takes the lead in utilizing the warping transformation of refractive normal waves in the Arctic waters to achieve single hydrophone based separation of normal waves and extraction of dispersion structures, which is conducive to future research on underwater sound source localization and environmental parameter inversion based on dispersion structures.
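
As a rough illustration of the warping idea, the sketch below applies the classical ideal-waveguide warping h(t) = sqrt(t^2 + t_r^2), where t_r = r/c is the direct arrival time, to a single-hydrophone signal so that each normal mode collapses toward a tonal line in the spectrum. The paper derives a warping operator specific to refracted normal modes in the Arctic channel, which this generic form does not reproduce; sampling rate and range values are placeholders.

```python
import numpy as np

def warp_signal(s, fs, t_r):
    """Resample s(t) onto the warped axis: s_w(t) = sqrt(|h'(t)|) * s(h(t))."""
    n = len(s)
    t = np.arange(n) / fs
    h = np.sqrt(t**2 + t_r**2)            # warped time, always >= t_r
    dh = t / np.maximum(h, 1e-12)         # h'(t)
    t_orig = t + t_r                      # assume sample 0 is the direct arrival
    return np.interp(h, t_orig, s) * np.sqrt(np.abs(dh))

fs = 1000.0                               # 1 kHz sampling (assumed)
t_r = 1000e3 / 1440.0                     # ~1000 km range at ~1440 m/s
s = np.random.randn(int(10 * fs))         # placeholder received signal
spectrum = np.abs(np.fft.rfft(warp_signal(s, fs, t_r)))
# After warping, masking one spectral line and unwarping isolates a single
# mode, from which its dispersion curve can be extracted.
```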

Music ControlNet: Multiple Time-varying Controls for Music Generation

  • paper_url: http://arxiv.org/abs/2311.07069
  • repo_url: None
  • paper_authors: Shih-Lun Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan
  • for: Proposing a diffusion-based music generation model that gives creators multiple precise, time-varying controls over the generated audio.
  • methods: A diffusion-based conditional generative model over audio spectrograms is fine-tuned with melody, dynamics, and rhythm controls extracted from training audio, analogous to pixel-wise control in the image-domain ControlNet; a new strategy allows creators to input controls that are only partially specified in time.
  • results: The model generates realistic, precisely controllable music in both settings (controls extracted from audio and controls creators are expected to provide); compared with MusicGen, it is 49% more faithful to input melodies despite having 35x fewer parameters, training on 11x less data, and enabling two additional forms of time-varying control.
    Abstract Text-to-music generation models are now capable of generating high-quality music audio in broad styles. However, text control is primarily suitable for the manipulation of global musical attributes like genre, mood, and tempo, and is less suitable for precise control over time-varying attributes such as the positions of beats in time or the changing dynamics of the music. We propose Music ControlNet, a diffusion-based music generation model that offers multiple precise, time-varying controls over generated audio. To imbue text-to-music models with time-varying control, we propose an approach analogous to pixel-wise control of the image-domain ControlNet method. Specifically, we extract controls from training audio yielding paired data, and fine-tune a diffusion-based conditional generative model over audio spectrograms given melody, dynamics, and rhythm controls. While the image-domain Uni-ControlNet method already allows generation with any subset of controls, we devise a new strategy to allow creators to input controls that are only partially specified in time. We evaluate both on controls extracted from audio and controls we expect creators to provide, demonstrating that we can generate realistic music that corresponds to control inputs in both settings. While few comparable music generation models exist, we benchmark against MusicGen, a recent model that accepts text and melody input, and show that our model generates music that is 49% more faithful to input melodies despite having 35x fewer parameters, training on 11x less data, and enabling two additional forms of time-varying control. Sound examples can be found at https://MusicControlNet.github.io/web/.
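
The sketch below illustrates one plausible way to encode partially-specified time-varying controls for a spectrogram diffusion model, in the spirit of the approach above: each control track carries a 0/1 mask channel marking the frames the creator actually set. The control names, shapes, and mask convention are assumptions for illustration, not the released conditioning format.

```python
import torch

T, F_BINS = 512, 128                       # time frames, frequency bins

def make_control(values, specified):
    """Stack a control track with a mask marking creator-specified frames."""
    values = torch.where(specified, values, torch.zeros_like(values))
    return torch.stack([values, specified.float()], dim=0)   # (2, T)

melody = make_control(torch.randint(0, F_BINS, (T,)).float() / F_BINS,
                      torch.arange(T) < 256)     # melody set for first half only
dynamics = make_control(torch.rand(T), torch.ones(T, dtype=torch.bool))
rhythm = make_control(torch.zeros(T), torch.zeros(T, dtype=torch.bool))  # unset

cond = torch.cat([melody, dynamics, rhythm], dim=0)          # (6, T)
# A ControlNet-style adapter would inject `cond` (broadcast over frequency)
# into the denoiser at every diffusion step; frames with mask 0 are left for
# the model to improvise, which is how partial-in-time control is honored.
```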

Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition

  • paper_url: http://arxiv.org/abs/2311.07062
  • repo_url: None
  • paper_authors: Qijie Shao, Pengcheng Guo, Jinghao Yan, Pengfei Hu, Lei Xie
  • for: Proposing a multi-task model (DIMNet) for joint speech and accent recognition to improve ASR accuracy in multi-accent scenarios.
  • methods: The model decouples a connectionist temporal classification (CTC) branch, an ASR branch, and an accent recognition (AR) branch over a shared bottom feature encoder, using modeling units of two granularities to learn task-specific representations; the tasks then interact, with the CTC branch supplying aligned text for AR, accent embeddings feeding the ASR encoder and decoder, and a cross-granular rescoring fusing CTC and attention-decoder information at inference.
  • results: On English and Chinese datasets, the model achieves 21.45%/28.53% relative AR accuracy improvement and 32.33%/14.55% relative ASR error-rate reduction over a published standard baseline.
    Abstract Accents, as variations from standard pronunciation, pose significant challenges for speech recognition systems. Although joint automatic speech recognition (ASR) and accent recognition (AR) training has been proven effective in handling multi-accent scenarios, current multi-task ASR-AR approaches overlook the granularity differences between tasks. Fine-grained units capture pronunciation-related accent characteristics, while coarse-grained units are better for learning linguistic information. Moreover, an explicit interaction of two tasks can also provide complementary information and improve the performance of each other, but it is rarely used by existing approaches. In this paper, we propose a novel Decoupling and Interacting Multi-task Network (DIMNet) for joint speech and accent recognition, which is comprised of a connectionist temporal classification (CTC) branch, an AR branch, an ASR branch, and a bottom feature encoder. Specifically, AR and ASR are first decoupled by separated branches and two-granular modeling units to learn task-specific representations. The AR branch is from our previously proposed linguistic-acoustic bimodal AR model and the ASR branch is an encoder-decoder based Conformer model. Then, for the task interaction, the CTC branch provides aligned text for the AR task, while accent embeddings extracted from our AR model are incorporated into the ASR branch's encoder and decoder. Finally, during ASR inference, a cross-granular rescoring method is introduced to fuse the complementary information from the CTC and attention decoder after the decoupling. Our experiments on English and Chinese datasets demonstrate the effectiveness of the proposed model, which achieves 21.45%/28.53% AR accuracy relative improvement and 32.33%/14.55% ASR error rate relative reduction over a published standard baseline, respectively.
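
A minimal sketch of the decoupled multi-task objective described above: a CTC branch over fine-grained units, an attention-based ASR branch over coarse-grained units, and an AR (accent) classification branch trained jointly. The loss weights and tensor layouts are illustrative assumptions; the branch interactions (aligned text for AR, accent embeddings into the ASR encoder and decoder) happen inside the model and are not shown.

```python
import torch
import torch.nn as nn

class DIMNetLoss(nn.Module):
    def __init__(self, w_ctc=0.3, w_asr=0.6, w_ar=0.1):
        super().__init__()
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)
        self.ce = nn.CrossEntropyLoss()
        self.w = (w_ctc, w_asr, w_ar)

    def forward(self, ctc_logp, ctc_tgt, in_lens, tgt_lens,
                asr_logits, asr_tgt, ar_logits, accent_tgt):
        # ctc_logp: (T, B, V) log-probs over fine-grained units
        # asr_logits: (B, L, V) attention-decoder outputs; ar_logits: (B, A)
        l_ctc = self.ctc(ctc_logp, ctc_tgt, in_lens, tgt_lens)
        l_asr = self.ce(asr_logits.transpose(1, 2), asr_tgt)
        l_ar = self.ce(ar_logits, accent_tgt)
        w1, w2, w3 = self.w
        return w1 * l_ctc + w2 * l_asr + w3 * l_ar
```

At inference, the cross-granular rescoring mentioned above would combine hypothesis scores from the CTC branch and the attention decoder, for example as a weighted sum of their log-probabilities.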

cs.CV - 2023-11-13

Assessing Test-time Variability for Interactive 3D Medical Image Segmentation with Diverse Point Prompts

  • paper_url: http://arxiv.org/abs/2311.07806
  • repo_url: https://github.com/medicl-vu/variability
  • paper_authors: Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz
  • for: Assessing the test-time variability and reproducibility of interactive 3D medical image segmentation driven by point prompts.
  • methods: Point prompts within a target region are classified into three sub-regions (boundary, margin, and center), and the benefits of additional prompts, the effects of prompt placement, and strategies for optimal prompt selection are analyzed to find a straightforward, efficient approach to test-time prompt selection.
  • results: Extensive experiments on the challenging colon tumor task of the public Medical Segmentation Decathlon dataset show how prompt number and placement affect accuracy, and an optimal test-time prompt-selection strategy is proposed, supported by comprehensive results.
    Abstract Interactive segmentation model leverages prompts from users to produce robust segmentation. This advancement is facilitated by prompt engineering, where interactive prompts serve as strong priors during test-time. However, this is an inherently subjective and hard-to-reproduce process. The variability in user expertise and inherently ambiguous boundaries in medical images can lead to inconsistent prompt selections, potentially affecting segmentation accuracy. This issue has not yet been extensively explored for medical imaging. In this paper, we assess the test-time variability for interactive medical image segmentation with diverse point prompts. For a given target region, the point is classified into three sub-regions: boundary, margin, and center. Our goal is to identify a straightforward and efficient approach for optimal prompt selection during test-time based on three considerations: (1) benefits of additional prompts, (2) effects of prompt placement, and (3) strategies for optimal prompt selection. We conduct extensive experiments on the public Medical Segmentation Decathlon dataset for challenging colon tumor segmentation task. We suggest an optimal strategy for prompt selection during test-time, supported by comprehensive results. The code is publicly available at https://github.com/MedICL-VU/variability
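
The boundary/margin/center split of point prompts can be made concrete with a distance transform of the target mask, as in the sketch below. The voxel thresholds are illustrative assumptions; the paper defines the sub-regions relative to the target region's geometry.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def classify_point(mask, point, boundary_px=1, margin_px=4):
    """mask: binary (D, H, W) target region; point: (z, y, x) inside the mask."""
    dist_in = distance_transform_edt(mask)   # distance to the region border
    d = dist_in[tuple(point)]
    if d <= boundary_px:
        return "boundary"
    if d <= margin_px:
        return "margin"
    return "center"

mask = np.zeros((32, 64, 64), dtype=bool)
mask[10:22, 20:44, 20:44] = True
print(classify_point(mask, (16, 32, 32)))    # -> "center"
print(classify_point(mask, (16, 20, 32)))    # -> "boundary"
```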

CSLP-AE: A Contrastive Split-Latent Permutation Autoencoder Framework for Zero-Shot Electroencephalography Signal Conversion

  • paper_url: http://arxiv.org/abs/2311.07788
  • repo_url: https://github.com/andersxa/cslp-ae
  • paper_authors: Anders Vestergaard Nørskov, Alexander Neergaard Zahid, Morten Mørup
  • for: Extracting generalizable latent representations from EEG that separate subject variability (style) from task-related neural activation (content), enabling signal conversion between subjects and tasks.
  • methods: A Contrastive Split-Latent Permutation Autoencoder (CSLP-AE) framework, inspired by advances in voice conversion, directly optimizes for EEG conversion; contrastive learning guides the latent splits to explicitly represent subject (style) and task (content).
  • results: CSLP-AE provides more favorable, generalizable characterizations of subject and task than conventional supervised, unsupervised (AE), and self-supervised (contrastive learning) training, and enables zero-shot conversion between unseen subjects.
    Abstract Electroencephalography (EEG) is a prominent non-invasive neuroimaging technique providing insights into brain function. Unfortunately, EEG data exhibit a high degree of noise and variability across subjects hampering generalizable signal extraction. Therefore, a key aim in EEG analysis is to extract the underlying neural activation (content) as well as to account for the individual subject variability (style). We hypothesize that the ability to convert EEG signals between tasks and subjects requires the extraction of latent representations accounting for content and style. Inspired by recent advancements in voice conversion technologies, we propose a novel contrastive split-latent permutation autoencoder (CSLP-AE) framework that directly optimizes for EEG conversion. Importantly, the latent representations are guided using contrastive learning to promote the latent splits to explicitly represent subject (style) and task (content). We contrast CSLP-AE to conventional supervised, unsupervised (AE), and self-supervised (contrastive learning) training and find that the proposed approach provides favorable generalizable characterizations of subject and task. Importantly, the procedure also enables zero-shot conversion between unseen subjects. While the present work only considers conversion of EEG, the proposed CSLP-AE provides a general framework for signal conversion and extraction of content (task activation) and style (subject variability) components of general interest for the modeling and analysis of biological signals.
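
A minimal sketch of the split-latent idea with contrastive guidance, under the interpretation above: one latent split is pulled together across recordings of the same subject (style), the other across recordings of the same task (content), and conversion recombines splits from different examples. All architecture sizes and the InfoNCE variant are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitLatentAE(nn.Module):
    def __init__(self, in_dim=512, d_style=64, d_content=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, d_style + d_content))
        self.dec = nn.Sequential(nn.Linear(d_style + d_content, 256), nn.ReLU(),
                                 nn.Linear(256, in_dim))
        self.d_style = d_style

    def split(self, x):
        z = self.enc(x)
        return z[:, :self.d_style], z[:, self.d_style:]   # (style, content)

def contrastive(z, labels, tau=0.1):
    """Pull together latents sharing a label (same subject, or same task)."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(-1e9)
    pos = (labels[:, None] == labels[None, :]).float()
    pos.fill_diagonal_(0.0)
    log_p = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(log_p * pos).sum() / pos.sum().clamp(min=1.0)

def training_step(model, x, subj, task):
    s, c = model.split(x)
    recon = F.mse_loss(model.dec(torch.cat([s, c], dim=1)), x)
    return recon + contrastive(s, subj) + contrastive(c, task)
```

Zero-shot conversion then amounts to decoding torch.cat([s_target_subject, c_source_task], dim=1) for splits taken from two different recordings.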

A Data-Free Approach to Mitigate Catastrophic Forgetting in Federated Class Incremental Learning for Vision Tasks

  • paper_url: http://arxiv.org/abs/2311.07784
  • repo_url: None
  • paper_authors: Sara Babakniya, Zalan Fabian, Chaoyang He, Mahdi Soltanolkotabi, Salman Avestimehr
  • for: Mitigating catastrophic forgetting of previously learned information in federated learning, where data is distributed and can change independently for each user.
  • methods: A federated class-incremental learning framework uses a generative model to synthesize samples from past distributions, which are later exploited alongside the training data to reduce forgetting; to preserve privacy, the generative model is trained data-free on the server at the end of each task, without requesting data from clients.
  • results: Extensive experiments on multiple datasets, including the newly introduced SuperImageNet regrouping of ImageNet tailored for federated continual learning, show significant improvements over existing baselines, without requiring users to store old data or models.
    Abstract Deep learning models often suffer from forgetting previously learned information when trained on new data. This problem is exacerbated in federated learning (FL), where the data is distributed and can change independently for each user. Many solutions are proposed to resolve this catastrophic forgetting in a centralized setting. However, they do not apply directly to FL because of its unique complexities, such as privacy concerns and resource limitations. To overcome these challenges, this paper presents a framework for \textbf{federated class incremental learning} that utilizes a generative model to synthesize samples from past distributions. This data can be later exploited alongside the training data to mitigate catastrophic forgetting. To preserve privacy, the generative model is trained on the server using data-free methods at the end of each task without requesting data from clients. Moreover, our solution does not demand the users to store old data or models, which gives them the freedom to join/leave the training at any time. Additionally, we introduce SuperImageNet, a new regrouping of the ImageNet dataset specifically tailored for federated continual learning. We demonstrate significant improvements compared to existing baselines through extensive experiments on multiple datasets.
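
Client-side, the generative-replay idea sketched above can look like the following: each local batch for a new task is augmented with samples synthesized for past classes, so the model keeps seeing "old" data it never stored. The generator.sample interface and the replay ratio are assumptions; in the paper the generator is trained data-free on the server.

```python
import torch
import torch.nn.functional as F

def client_update(model, opt, loader, generator, past_classes, replay_ratio=0.5):
    model.train()
    for x, y in loader:
        if past_classes:  # synthesize a replay mini-batch for old classes
            n = max(1, int(len(x) * replay_ratio))
            idx = torch.randint(len(past_classes), (n,))
            y_old = torch.tensor(past_classes)[idx]
            x_old = generator.sample(y_old)     # assumed generator interface
            x, y = torch.cat([x, x_old]), torch.cat([y, y_old])
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    return model.state_dict()                   # returned for server aggregation
```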

FedOpenHAR: Federated Multi-Task Transfer Learning for Sensor-Based Human Activity Recognition

  • paper_url: http://arxiv.org/abs/2311.07765
  • repo_url: None
  • paper_authors: Egemen İşgüder, Özlem Durmaz İncel
  • for: Tackling sensor-based human activity recognition and device position identification jointly with distributed machine learning, without transmitting user data to a center.
  • methods: Federated transfer learning in a multi-task manner, training models over the OpenHAR framework (which contains ten smaller datasets) so that a shared model serves both tasks across datasets that may include only some label types.
  • results: Experiments in the Flower federated learning environment with the DeepConvLSTM architecture show that a task-specific, personalized federated model with transfer learning reaches accuracy similar to training each client individually and higher than a fully centralized approach.
    Abstract Motion sensors integrated into wearable and mobile devices provide valuable information about the device users. Machine learning and, recently, deep learning techniques have been used to characterize sensor data. Mostly, a single task, such as recognition of activities, is targeted, and the data is processed centrally at a server or in a cloud environment. However, the same sensor data can be utilized for multiple tasks and distributed machine-learning techniques can be used without the requirement of the transmission of data to a centre. This paper explores Federated Transfer Learning in a Multi-Task manner for both sensor-based human activity recognition and device position identification tasks. The OpenHAR framework is used to train the models, which contains ten smaller datasets. The aim is to obtain model(s) applicable for both tasks in different datasets, which may include only some label types. Multiple experiments are carried in the Flower federated learning environment using the DeepConvLSTM architecture. Results are presented for federated and centralized versions under different parameters and restrictions. By utilizing transfer learning and training a task-specific and personalized federated model, we obtained a similar accuracy with training each client individually and higher accuracy than a fully centralized approach.

Quality-Aware Prototype Memory for Face Representation Learning

  • paper_url: http://arxiv.org/abs/2311.07734
  • repo_url: None
  • paper_authors: Evgeny Smirnov, Vasiliy Galyuk, Evgeny Lukyanets
  • for: Improving Prototype Memory for face representation learning so that low-quality or poorly recognizable face images do not contaminate the generated prototypes and misdirect training.
  • methods: A simple, effective quality-aware technique assigns different weights to images of different quality during prototype generation, so prototypes draw more information from high-quality images and are hurt less by low-quality ones; several quality estimation and usage methods are proposed and compared.
  • results: Extensive experiments on multiple face recognition benchmarks demonstrate the advantages of the proposed model over the basic version of Prototype Memory.
    Abstract Prototype Memory is a powerful model for face representation learning. It enables the training of face recognition models using datasets of any size, with on-the-fly generation of prototypes (classifier weights) and efficient ways of their utilization. Prototype Memory demonstrated strong results in many face recognition benchmarks. However, the algorithm of prototype generation, used in it, is prone to the problems of imperfectly calculated prototypes in case of low-quality or poorly recognizable faces in the images, selected for the prototype creation. All images of the same person, presented in the mini-batch, used with equal weights, and the resulting averaged prototype could be contaminated with imperfect embeddings of such face images. It can lead to misdirected training signals and impair the performance of the trained face recognition models. In this paper, we propose a simple and effective way to improve Prototype Memory with quality-aware prototype generation. Quality-Aware Prototype Memory uses different weights for images of different quality in the process of prototype generation. With this improvement, prototypes get more valuable information from high-quality images and less hurt by low-quality ones. We propose and compare several methods of quality estimation and usage, perform extensive experiments on the different face recognition benchmarks and demonstrate the advantages of the proposed model compared to the basic version of Prototype Memory.
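
The core weighting step is simple enough to sketch directly: same-identity embeddings are averaged with weights derived from per-image quality scores, so low-quality faces contribute less to the prototype. The softmax weighting and temperature are one illustrative choice among the schemes the paper compares.

```python
import torch
import torch.nn.functional as F

def quality_aware_prototype(embeddings, quality, temperature=0.5):
    """embeddings: (N, D) same-identity embeddings; quality: (N,) scores in [0, 1]."""
    w = torch.softmax(quality / temperature, dim=0)   # higher quality, larger weight
    proto = (w[:, None] * embeddings).sum(dim=0)
    return F.normalize(proto, dim=0)                  # unit-norm classifier weight

emb = F.normalize(torch.randn(8, 512), dim=1)
q = torch.tensor([0.9, 0.8, 0.2, 0.85, 0.1, 0.7, 0.95, 0.3])
prototype = quality_aware_prototype(emb, q)
```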

To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning

  • paper_url: http://arxiv.org/abs/2311.07574
  • repo_url: https://github.com/x2fd/lvis-instruct4v
  • paper_authors: Junke Wang, Lingchen Meng, Zejia Weng, Bo He, Zuxuan Wu, Yu-Gang Jiang
  • for: Improving large multimodal models (such as LLaVA-1.5) across a wide spectrum of challenging multimodal benchmarks through better visual instruction data.
  • methods: A fine-grained visual instruction dataset, LVIS-Instruct4V, is built by prompting the powerful GPT-4V with images from LVIS, yielding 220K visually aligned, context-aware instructions.
  • results: Experimental validation and case studies show that simply replacing LLaVA-Instruct with LVIS-Instruct4V yields clear gains on most challenging LMM benchmarks, e.g., LLaVA$^w$ (76.7 vs. 70.7) and MM-Vet (40.2 vs. 35.4).
    Abstract Existing visual instruction tuning methods typically prompt large language models with textual descriptions to generate instruction-following data. Despite the promising performance achieved, these descriptions are derived from image annotations, which are oftentimes coarse-grained. Furthermore, the instructions might even contradict the visual content without observing the entire visual context. To address this challenge, we introduce a fine-grained visual instruction dataset, LVIS-Instruct4V, which contains 220K visually aligned and context-aware instructions produced by prompting the powerful GPT-4V with images from LVIS. Through experimental validation and case studies, we demonstrate that high-quality visual instructional data could improve the performance of LLaVA-1.5, a state-of-the-art large multimodal model, across a wide spectrum of benchmarks by clear margins. Notably, by simply replacing the LLaVA-Instruct with our LVIS-Instruct4V, we achieve better results than LLaVA on most challenging LMM benchmarks, e.g., LLaVA$^w$ (76.7 vs. 70.7) and MM-Vet (40.2 vs. 35.4). We release our data and model at https://github.com/X2FD/LVIS-INSTRUCT4V.

Fast Normalized Cross-Correlation for Template Matching with Rotations

  • paper_url: http://arxiv.org/abs/2311.07561
  • repo_url: None
  • paper_authors: José María Almira, Harold Phelippeau, Antonio Martinez-Sanchez
  • for: Speeding up template matching computations, particularly for 3D images.
  • methods: A new mathematical theory handles rotations and translations simultaneously with reduced computational complexity: information from all rotated versions of the template is integrated into a single symmetric tensor template, computed only once per template, avoiding repeated sampling of the rotation space.
  • results: Correlating the image with the independent tensor components of the tensorial template recovers both the positions and rotation angles of template instances, with potential speedups of several orders of magnitude for 3D images.
    Abstract Normalized cross-correlation is the reference approach to carry out template matching on images. When it is computed in Fourier space, it can handle efficiently template translations but it cannot do so with template rotations. Including rotations requires sampling the whole space of rotations, repeating the computation of the correlation each time. This article develops an alternative mathematical theory to handle efficiently, at the same time, rotations and translations. Our proposal has a reduced computational complexity because it does not require to repeatedly sample the space of rotations. To do so, we integrate the information relative to all rotated versions of the template into a unique symmetric tensor template -which is computed only once per template-. Afterward, we demonstrate that the correlation between the image to be processed with the independent tensor components of the tensorial template contains enough information to recover template instance positions and rotations. Our proposed method has the potential to speed up conventional template matching computations by a factor of several magnitude orders for the case of 3D images.
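
The paper's rotation-aware tensor formulation is not reproduced here; as a point of reference, the sketch below shows the standard FFT-based normalized cross-correlation over translations only, the operation the symmetric tensor template generalizes to rotations.

```python
import numpy as np
from scipy.signal import fftconvolve

def ncc_translations(image, template):
    """Normalized cross-correlation map over all translations (2D, 'valid')."""
    t = template - template.mean()                         # zero-mean template
    ones = np.ones_like(t)
    num = fftconvolve(image, t[::-1, ::-1], mode="valid")  # correlation via FFT
    local_sum = fftconvolve(image, ones, mode="valid")
    local_sq = fftconvolve(image**2, ones, mode="valid")
    n = t.size
    local_var = np.maximum(local_sq - local_sum**2 / n, 1e-12)
    return num / np.sqrt(local_var * (t**2).sum())

scores = ncc_translations(np.random.rand(256, 256), np.random.rand(32, 32))
y, x = np.unravel_index(scores.argmax(), scores.shape)     # best match position
```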

VGSG: Vision-Guided Semantic-Group Network for Text-based Person Search

  • paper_url: http://arxiv.org/abs/2311.07514
  • repo_url: None
  • paper_authors: Shuting He, Hao Luo, Wei Jiang, Xudong Jiang, Henghui Ding
  • for: Proposing a Vision-Guided Semantic-Group Network (VGSG) for text-based person search that extracts well-aligned fine-grained visual and textual features without external tools or heavy cross-modal interaction.
  • methods: A Semantic-Group Textual Learning (SGTL) module groups textual features along the channel dimension based on the semantic cues of language expression, while a Vision-guided Knowledge Transfer (VGKT) module uses vision-guided attention to extract vision-related textual features and adaptively propagates their information (via vision-language similarity transfer and class probability transfer) to the semantic-group textual features.
  • results: Experimental results on two challenging benchmarks demonstrate its superiority over state-of-the-art methods.
    Abstract Text-based Person Search (TBPS) aims to retrieve images of target pedestrian indicated by textual descriptions. It is essential for TBPS to extract fine-grained local features and align them crossing modality. Existing methods utilize external tools or heavy cross-modal interaction to achieve explicit alignment of cross-modal fine-grained features, which is inefficient and time-consuming. In this work, we propose a Vision-Guided Semantic-Group Network (VGSG) for text-based person search to extract well-aligned fine-grained visual and textual features. In the proposed VGSG, we develop a Semantic-Group Textual Learning (SGTL) module and a Vision-guided Knowledge Transfer (VGKT) module to extract textual local features under the guidance of visual local clues. In SGTL, in order to obtain the local textual representation, we group textual features from the channel dimension based on the semantic cues of language expression, which encourages similar semantic patterns to be grouped implicitly without external tools. In VGKT, a vision-guided attention is employed to extract visual-related textual features, which are inherently aligned with visual cues and termed vision-guided textual features. Furthermore, we design a relational knowledge transfer, including a vision-language similarity transfer and a class probability transfer, to adaptively propagate information of the vision-guided textual features to semantic-group textual features. With the help of relational knowledge transfer, VGKT is capable of aligning semantic-group textual features with corresponding visual features without external tools and complex pairwise interaction. Experimental results on two challenging benchmarks demonstrate its superiority over state-of-the-art methods.

Temporal Performance Prediction for Deep Convolutional Long Short-Term Memory Networks

  • paper_url: http://arxiv.org/abs/2311.07477
  • repo_url: None
  • paper_authors: Laura Fieback, Bidya Dash, Jakob Spiegelberg, Hanno Gottschalk
  • for: Quantifying the predictive uncertainty of deep semantic segmentation networks, which is essential for safety-critical tasks such as autonomous driving.
  • methods: Convolutional long short-term memory networks, which use cell states to broadcast information from previous frames and can predict segmentations one or more steps into the future, are paired with a temporal postprocessing method: per-segment input metrics built from temporal cell states feed models that estimate prediction quality.
  • results: The approach estimates prediction performance either by regressing the intersection over union of predicted and ground-truth segments or by classifying whether it is zero or greater than zero; the influence of the number of considered cell states on the proposed metrics is also studied.
    Abstract Quantifying predictive uncertainty of deep semantic segmentation networks is essential in safety-critical tasks. In applications like autonomous driving, where video data is available, convolutional long short-term memory networks are capable of not only providing semantic segmentations but also predicting the segmentations of the next timesteps. These models use cell states to broadcast information from previous data by taking a time series of inputs to predict one or even further steps into the future. We present a temporal postprocessing method which estimates the prediction performance of convolutional long short-term memory networks by either predicting the intersection over union of predicted and ground truth segments or classifying between intersection over union being equal to zero or greater than zero. To this end, we create temporal cell state-based input metrics per segment and investigate different models for the estimation of the predictive quality based on these metrics. We further study the influence of the number of considered cell states for the proposed metrics.

Masked Face Dataset Generation and Masked Face Recognition

  • paper_url: http://arxiv.org/abs/2311.07475
  • repo_url: https://github.com/luisrui/seeing-ai-system
  • paper_authors: Rui Cai, Xuying Ning, Peter N. Belhumeur
  • for: Addressing masked face recognition in the post-pandemic era, where face masks pose a great challenge to ordinary face recognition.
  • methods: A more challenging masked face dataset is created by selecting 50 identities with 1702 images from the Labelled Faces in the Wild (LFW) dataset and simulating face masks via keypoint detection; instead of using pretrained models directly, models are fine-tuned on the new dataset with a final linear classification layer, a data augmentation strategy, and a newly fine-tuned state-of-the-art network, Inception ResNet v1.
  • results: The best test accuracy on the 50-identity masked face recognition task reaches 95%.
    Abstract In the post-pandemic era, wearing face masks has posed great challenge to the ordinary face recognition. In the previous study, researchers has applied pretrained VGG16, and ResNet50 to extract features on the elaborate curated existing masked face recognition (MFR) datasets, RMFRD and SMFRD. To make the model more adaptable to the real world situation where the sample size is smaller and the camera environment has greater changes, we created a more challenging masked face dataset ourselves, by selecting 50 identities with 1702 images from Labelled Faces in the Wild (LFW) Dataset, and simulated face masks through key point detection. The another part of our study is to solve the masked face recognition problem, and we chose models by referring to the former state of the art results, instead of directly using pretrained models, we fine tuned the model on our new dataset and use the last linear layer to do the classification directly. Furthermore, we proposed using data augmentation strategy to further increase the test accuracy, and fine tuned a new networks beyond the former study, one of the most SOTA networks, Inception ResNet v1. The best test accuracy on 50 identity MFR has achieved 95%.

Language Grounded QFormer for Efficient Vision Language Understanding

  • paper_url: http://arxiv.org/abs/2311.07449
  • repo_url: None
  • paper_authors: Moulik Choraria, Nitesh Sekhar, Yue Wu, Xu Zhang, Prateek Singhal, Lav R. Varshney
  • for: Making vision-language alignment more efficient, reducing the heavy pretraining cost of general-purpose vision-language models.
  • methods: A more efficient Query Transformer (QFormer)-based alignment method, inspired by the BLIP-2 approach of bridging frozen modalities, that avoids reliance on large-scale multi-modal pretraining for representation learning before finetuning.
  • results: Compared with existing baselines, the proposed strategy improves the efficiency of vision-language pretraining.
    Abstract Large-scale pretraining and instruction tuning have been successful for training general-purpose language models with broad competencies. However, extending to general-purpose vision-language models is challenging due to the distributional diversity in visual inputs. A recent line of work explores vision-language instruction tuning, taking inspiration from the Query Transformer (QFormer) approach proposed in BLIP-2 models for bridging frozen modalities. However, these approaches rely heavily on large-scale multi-modal pretraining for representation learning before eventual finetuning, incurring a huge computational overhead, poor scaling, and limited accessibility. To that end, we propose a more efficient method for QFormer-based vision-language alignment and demonstrate the effectiveness of our strategy compared to existing baselines in improving the efficiency of vision-language pretraining.

Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text

  • paper_url: http://arxiv.org/abs/2311.07446
  • repo_url: None
  • paper_authors: Zhongfei Qing, Zhongang Cai, Zhitao Yang, Lei Yang
  • for: Generating natural human motion from a story (Story-to-Motion), which has the potential to transform the animation, gaming, and film industries.
  • methods: A contemporary large language model acts as a text-driven motion scheduler, extracting a series of (text, position, duration) pairs from long text; a text-driven motion retrieval scheme combines motion matching with motion-semantic and trajectory constraints; and a progressive mask transformer addresses transition artifacts such as unnatural poses and foot sliding.
  • results: The system generates controllable, infinitely long motions and trajectories aligned with the input text, outperforming previous state-of-the-art motion synthesis methods across trajectory following, temporal action composition, and motion blending.
    Abstract Generating natural human motion from a story has the potential to transform the landscape of animation, gaming, and film industries. A new and challenging task, Story-to-Motion, arises when characters are required to move to various locations and perform specific motions based on a long text description. This task demands a fusion of low-level control (trajectories) and high-level control (motion semantics). Previous works in character control and text-to-motion have addressed related aspects, yet a comprehensive solution remains elusive: character control methods do not handle text description, whereas text-to-motion methods lack position constraints and often produce unstable motions. In light of these limitations, we propose a novel system that generates controllable, infinitely long motions and trajectories aligned with the input text. (1) We leverage contemporary Large Language Models to act as a text-driven motion scheduler to extract a series of (text, position, duration) pairs from long text. (2) We develop a text-driven motion retrieval scheme that incorporates motion matching with motion semantic and trajectory constraints. (3) We design a progressive mask transformer that addresses common artifacts in the transition motion such as unnatural pose and foot sliding. Beyond its pioneering role as the first comprehensive solution for Story-to-Motion, our system undergoes evaluation across three distinct sub-tasks: trajectory following, temporal action composition, and motion blending, where it outperforms previous state-of-the-art motion synthesis methods across the board. Homepage: https://story2motion.github.io/.
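
The first stage, using an LLM as a text-driven motion scheduler, can be sketched as a prompt that asks for (text, position, duration) triples plus a small parser. The prompt wording and JSON schema are assumptions; call_llm stands in for any chat-completion client.

```python
import json

SCHEDULER_PROMPT = (
    "Break the story below into an ordered list of motions. Return a JSON "
    'array of objects with keys "text" (motion description), "position" '
    '(target location) and "duration" (seconds).\n\nStory:\n'
)

def schedule_motions(story, call_llm):
    reply = call_llm(SCHEDULER_PROMPT + story)
    return [(m["text"], m["position"], float(m["duration"]))
            for m in json.loads(reply)]

# Each triple then drives the retrieval stage: `text` is matched against motion
# semantics, `position` constrains the trajectory, and `duration` bounds the
# clip before the progressive mask transformer blends transitions.
```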

Supersampling of Data from Structured-light Scanner with Deep Learning

  • paper_url: http://arxiv.org/abs/2311.07432
  • repo_url: None
  • paper_authors: Martin Melicherčík, Lukáš Gajdošech, Viktor Kocur, Martin Madaras
  • for: Increasing the resolution of depth maps obtained from 3D cameras that use structured light technology.
  • methods: The FDSR and DKN deep learning models are modified to work with high-resolution data, with data pre-processing techniques implemented for stable training; the models are trained on a custom dataset of 1200 3D scans.
  • results: The resulting high-resolution depth maps are evaluated with qualitative and quantitative metrics; FDSR excels in processing time, making it suitable where speed is crucial, while DKN delivers higher precision for accuracy-critical applications.
    Abstract This paper focuses on increasing the resolution of depth maps obtained from 3D cameras using structured light technology. Two deep learning models FDSR and DKN are modified to work with high-resolution data, and data pre-processing techniques are implemented for stable training. The models are trained on our custom dataset of 1200 3D scans. The resulting high-resolution depth maps are evaluated using qualitative and quantitative metrics. The approach for depth map upsampling offers benefits such as reducing the processing time of a pipeline by first downsampling a high-resolution depth map, performing various processing steps at the lower resolution and upsampling the resulting depth map or increasing the resolution of a point cloud captured in lower resolution by a cheaper device. The experiments demonstrate that the FDSR model excels in terms of faster processing time, making it a suitable choice for applications where speed is crucial. On the other hand, the DKN model provides results with higher precision, making it more suitable for applications that prioritize accuracy.

Optimising Human-AI Collaboration by Learning Convincing Explanations

  • paper_url: http://arxiv.org/abs/2311.07426
  • repo_url: None
  • paper_authors: Alex J. Chan, Alihan Huyuk, Mihaela van der Schaar
  • for: Building a collaborative human-AI decision system that stays safe by keeping a human as the final decision maker while giving the model the best opportunity to convince and debate them with interpretable explanations.
  • methods: An algorithm, Ardent, efficiently learns a ranking over explanations through interaction, adapting to individual preferences, since the most helpful explanation varies among individuals and may be inconsistent with stated preferences.
  • results: Extensive simulations alongside a user study on a challenging image classification task show consistent improvement over competing systems, improving transparency and accountability in decision-making.
    Abstract Machine learning models are being increasingly deployed to take, or assist in taking, complicated and high-impact decisions, from quasi-autonomous vehicles to clinical decision support systems. This poses challenges, particularly when models have hard-to-detect failure modes and are able to take actions without oversight. In order to handle this challenge, we propose a method for a collaborative system that remains safe by having a human ultimately making decisions, while giving the model the best opportunity to convince and debate them with interpretable explanations. However, the most helpful explanation varies among individuals and may be inconsistent across stated preferences. To this end we develop an algorithm, Ardent, to efficiently learn a ranking through interaction and best assist humans complete a task. By utilising a collaborative approach, we can ensure safety and improve performance while addressing transparency and accountability concerns. Ardent enables efficient and effective decision-making by adapting to individual preferences for explanations, which we validate through extensive simulations alongside a user study involving a challenging image classification task, demonstrating consistent improvement over competing systems.

Robust semi-supervised segmentation with timestep ensembling diffusion models

  • paper_url: http://arxiv.org/abs/2311.07421
  • repo_url: None
  • paper_authors: Margherita Rosnati, Melanie Roschewitz, Ben Glocker
  • for: Semi-supervised medical image segmentation using diffusion models, with a focus on domain generalisation.
  • methods: Denoising diffusion probabilistic models (DDPMs) model the distribution of natural images; an improved ensembling scheme leverages information-dense small diffusion steps and the regularising effect of larger steps to generate predictions.
  • results: The method significantly outperforms other approaches in domain-shifted settings while retaining competitive in-domain performance, highlighting the potential of DDPMs for semi-supervised medical image segmentation and how to optimise them under domain shift.
    Abstract Medical image segmentation is a challenging task, made more difficult by many datasets' limited size and annotations. Denoising diffusion probabilistic models (DDPM) have recently shown promise in modelling the distribution of natural images and were successfully applied to various medical imaging tasks. This work focuses on semi-supervised image segmentation using diffusion models, particularly addressing domain generalisation. Firstly, we demonstrate that smaller diffusion steps generate latent representations that are more robust for downstream tasks than larger steps. Secondly, we use this insight to propose an improved esembling scheme that leverages information-dense small steps and the regularising effect of larger steps to generate predictions. Our model shows significantly better performance in domain-shifted settings while retaining competitive performance in-domain. Overall, this work highlights the potential of DDPMs for semi-supervised medical image segmentation and provides insights into optimising their performance under domain shift.
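
A sketch of timestep ensembling for diffusion-based segmentation, as described above: the input is forward-noised to several timesteps, a mask is predicted from the denoiser's features at each, and the predictions are averaged. The step set, the equal weighting, and the ddpm.q_sample / ddpm.unet_features / seg_head interfaces are assumptions for illustration.

```python
import torch

@torch.no_grad()
def ensemble_segment(x, ddpm, seg_head, steps=(50, 150, 250)):
    """x: (B, C, H, W) image; returns ensembled class probabilities."""
    probs = []
    for t in steps:
        tt = torch.full((x.size(0),), t, dtype=torch.long, device=x.device)
        x_t = ddpm.q_sample(x, tt)             # forward-noise the input to step t
        feats = ddpm.unet_features(x_t, tt)    # intermediate U-Net activations
        probs.append(seg_head(feats).softmax(dim=1))
    # small steps contribute information-dense features; larger steps regularize
    return torch.stack(probs).mean(dim=0)
```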

Mitigating Backdoors within Deep Neural Networks in Data-limited Configuration

  • paper_url: http://arxiv.org/abs/2311.07417
  • repo_url: None
  • paper_authors: Soroush Hashemifar, Saeed Parsa, Morteza Zakeri-Nasrabadi
  • for: Defending deep neural networks (DNNs) against backdoor attacks, improving their security in settings where training is outsourced or data is collected over the Internet.
  • methods: Characteristics of poisoned neurons are formulated into a backdoor suspiciousness score that ranks network neurons according to their activation values, weights, and relationships with other neurons in the same layer.
  • results: On CIFAR-10, the method decreases the chance of attacks succeeding by more than 50% using only a tiny clean dataset (ten clean samples), without significantly deteriorating model performance, while running three times as fast as baselines.
    Abstract As the capacity of deep neural networks (DNNs) increases, their need for huge amounts of data significantly grows. A common practice is to outsource the training process or collect more data over the Internet, which introduces the risks of a backdoored DNN. A backdoored DNN shows normal behavior on clean data while behaving maliciously once a trigger is injected into a sample at the test time. In such cases, the defender faces multiple difficulties. First, the available clean dataset may not be sufficient for fine-tuning and recovering the backdoored DNN. Second, it is impossible to recover the trigger in many real-world applications without information about it. In this paper, we formulate some characteristics of poisoned neurons. This backdoor suspiciousness score can rank network neurons according to their activation values, weights, and their relationship with other neurons in the same layer. Our experiments indicate the proposed method decreases the chance of attacks being successful by more than 50% with a tiny clean dataset, i.e., ten clean samples for the CIFAR-10 dataset, without significantly deteriorating the model's performance. Moreover, the proposed method runs three times as fast as baselines.
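
Purely to illustrate the shape of such a score, the sketch below ranks one layer's neurons by a heuristic combining near-silence on a tiny clean set, outgoing weight magnitude, and deviation from same-layer neighbors. This heuristic is an assumption for illustration and is not the paper's exact formulation.

```python
import torch

@torch.no_grad()
def suspiciousness_ranking(acts, weight_norms):
    """acts: (N, K) activations of K neurons over N clean samples;
    weight_norms: (K,) L2 norms of each neuron's outgoing weights."""
    mean_act = acts.mean(dim=0)
    # neurons nearly silent on clean data yet carrying large weights are
    # suspicious: they may fire only when a trigger pattern is present
    silence = 1.0 / (mean_act.abs() + 1e-6)
    deviation = (mean_act - mean_act.mean()).abs() / (mean_act.std() + 1e-6)
    score = silence * weight_norms * (1.0 + deviation)
    return torch.argsort(score, descending=True)   # most suspicious first

ranking = suspiciousness_ranking(torch.rand(10, 64), torch.rand(64))
# top-ranked neurons can then be pruned or repaired with the few clean samples
```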

FIRST: A Million-Entry Dataset for Text-Driven Fashion Synthesis and Design

  • paper_url: http://arxiv.org/abs/2311.07414
  • repo_url: None
  • paper_authors: Zhen Huang, Yihao Li, Dong Pei, Jiapeng Zhou, Xuliang Ning, Jianlin Han, Xiaoguang Han, Xuejun Chen
  • for: Advancing text-driven fashion synthesis and design within AI-generated content (AIGC), with the potential to propel a revolution in the traditional fashion industry.
  • methods: A new dataset, FIRST, comprising a million high-resolution fashion images with rich structured textual descriptions, covering a wide range of attire categories, with each image-paired description organized at multiple hierarchical levels.
  • results: Experiments with prevalent generative models trained over FIRST demonstrate the necessity of the dataset, which will be released soon.
    Abstract Text-driven fashion synthesis and design is an extremely valuable part of artificial intelligence generative content(AIGC), which has the potential to propel a tremendous revolution in the traditional fashion industry. To advance the research on text-driven fashion synthesis and design, we introduce a new dataset comprising a million high-resolution fashion images with rich structured textual(FIRST) descriptions. In the FIRST, there is a wide range of attire categories and each image-paired textual description is organized at multiple hierarchical levels. Experiments on prevalent generative models trained over FISRT show the necessity of FIRST. We invite the community to further develop more intelligent fashion synthesis and design systems that make fashion design more creative and imaginative based on our dataset. The dataset will be released soon.

Towards Automatic Honey Bee Flower-Patch Assays with Paint Marking Re-Identification

  • paper_url: http://arxiv.org/abs/2311.07407
  • repo_url: None
  • paper_authors: Luke Meyers, Josué Rodríguez Cordero, Carlos Corrada Bravo, Fanfan Noel, José Agosto-Rivera, Tugrul Giray, Rémi Mégret
  • for: Automating the analysis of behavioral assays involving honey bees in the field, where marking has to be as lightweight as possible.
  • methods: Paint markings combined with contrastive learning, using a ResNet backbone and triplet loss, for bee re-identification.
  • results: A novel dataset for bee re-identification with paint markings (4392 images, 27 identities) is contributed, with almost perfect recognition in the closed setting where identities are known in advance; the paper also shows the potential to fully automate visit detection and reports preliminary compute times for future real-time deployment in the field on an edge device.
    Abstract In this paper, we show that paint markings are a feasible approach to automatize the analysis of behavioral assays involving honey bees in the field where marking has to be as lightweight as possible. We contribute a novel dataset for bees re-identification with paint-markings with 4392 images and 27 identities. Contrastive learning with a ResNet backbone and triplet loss led to identity representation features with almost perfect recognition in closed setting where identities are known in advance. Diverse experiments evaluate the capability to generalize to separate IDs, and show the impact of using different body parts for identification, such as using the unmarked abdomen only. In addition, we show the potential to fully automate the visit detection and provide preliminary results of compute time for future real-time deployment in the field on an edge device.

Processing and Segmentation of Human Teeth from 2D Images using Weakly Supervised Learning

  • paper_url: http://arxiv.org/abs/2311.07398
  • repo_url: None
  • paper_authors: Tomáš Kunzo, Viktor Kocur, Lukáš Gajdošech, Martin Madaras
  • for: Proposing a weakly supervised teeth segmentation method that reduces the need for time-consuming manual annotation of segmentation masks.
  • methods: Output heatmaps and intermediate feature maps from a teeth keypoint detection network guide the segmentation process, combining feature maps from different layers of the network; the detected keypoints are also used to further refine the segmentation masks.
  • results: On the TriDental dataset (3000 oral cavity images annotated with teeth keypoints), the approach surpasses state-of-the-art segmentation methods in accuracy and robustness.
    Abstract Teeth segmentation is an essential task in dental image analysis for accurate diagnosis and treatment planning. While supervised deep learning methods can be utilized for teeth segmentation, they often require extensive manual annotation of segmentation masks, which is time-consuming and costly. In this research, we propose a weakly supervised approach for teeth segmentation that reduces the need for manual annotation. Our method utilizes the output heatmaps and intermediate feature maps from a keypoint detection network to guide the segmentation process. We introduce the TriDental dataset, consisting of 3000 oral cavity images annotated with teeth keypoints, to train a teeth keypoint detection network. We combine feature maps from different layers of the keypoint detection network, enabling accurate teeth segmentation without explicit segmentation annotations. The detected keypoints are also used for further refinement of the segmentation masks. Experimental results on the TriDental dataset demonstrate the superiority of our approach in terms of accuracy and robustness compared to state-of-the-art segmentation methods. Our method offers a cost-effective and efficient solution for teeth segmentation in real-world dental applications, eliminating the need for extensive manual annotation efforts.

Evaluating the Significance of Outdoor Advertising from Driver’s Perspective Using Computer Vision

  • paper_url: http://arxiv.org/abs/2311.07390
  • repo_url: None
  • paper_authors: Zuzana Černeková, Zuzana Berger Haladová, Ján Špirka, Viktor Kocur
  • for: Evaluating the significance of roadside billboards from the driver's perspective, since such advertising can distract drivers and potentially lead to accidents.
  • methods: A YOLOv8 detector combined with various object tracking methods identifies billboard advertisements in driver-perspective eye-tracking videos, and a random forest classifier sorts billboards into three classes based on the length of driver fixations, using features such as visibility duration, saliency, and size.
  • results: The best tracking approach achieves 38.5 HOTA on the new BillboardLamac dataset, and the classifier reaches 75.8% test accuracy.
    Abstract Outdoor advertising, such as roadside billboards, plays a significant role in marketing campaigns but can also be a distraction for drivers, potentially leading to accidents. In this study, we propose a pipeline for evaluating the significance of roadside billboards in videos captured from a driver's perspective. We have collected and annotated a new BillboardLamac dataset, comprising eight videos captured by drivers driving through a predefined path wearing eye-tracking devices. The dataset includes annotations of billboards, including 154 unique IDs and 155 thousand bounding boxes, as well as eye fixation data. We evaluate various object tracking methods in combination with a YOLOv8 detector to identify billboard advertisements with the best approach achieving 38.5 HOTA on BillboardLamac. Additionally, we train a random forest classifier to classify billboards into three classes based on the length of driver fixations achieving 75.8% test accuracy. An analysis of the trained classifier reveals that the duration of billboard visibility, its saliency, and size are the most influential features when assessing billboard significance.
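
The three-class significance classifier can be sketched as a random forest over per-billboard features, with classes derived from fixation length as described above. The feature set, the class thresholds, and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# features per billboard: visibility duration (s), saliency score, size (px^2)
X = rng.random((154, 3)) * np.array([10.0, 1.0, 5e4])
fixation = rng.random(154) * 2.0              # seconds of driver fixation
y = np.digitize(fixation, [0.3, 1.0])         # 3 classes by fixation length

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(dict(zip(["visibility", "saliency", "size"],
               clf.feature_importances_.round(3))))
```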

Classification of developmental and brain disorders via graph convolutional aggregation

  • paper_url: http://arxiv.org/abs/2311.07370
  • repo_url: None
  • paper_authors: Ibrahim Salim, A. Ben Hamza
  • for: Improving disease prediction performance, particularly for the classification of neurodevelopmental and neurodegenerative brain disorders.
  • methods: An aggregator normalization graph convolutional network incorporates imaging features into graph nodes and non-imaging features into edges; skip connections enable direct information flow from the input features to later layers, while identity mapping helps maintain the graph's structural information during feature learning.
  • results: On the ABIDE and ADNI datasets, for the prediction of autism spectrum disorder and Alzheimer's disease respectively, the model achieves relative improvements in classification accuracy of 50% and 13.56% over graph convolutional networks.
    Abstract While graph convolution based methods have become the de-facto standard for graph representation learning, their applications to disease prediction tasks remain quite limited, particularly in the classification of neurodevelopmental and neurodegenerative brain disorders. In this paper, we introduce an aggregator normalization graph convolutional network by leveraging aggregation in graph sampling, as well as skip connections and identity mapping. The proposed model learns discriminative graph node representations by incorporating both imaging and non-imaging features into the graph nodes and edges, respectively, with the aim of augmenting predictive capabilities and providing a holistic perspective on the underlying mechanisms of brain disorders. Skip connections enable the direct flow of information from the input features to later layers of the network, while identity mapping helps maintain the structural information of the graph during feature learning. We benchmark our model against several recent baseline methods on two large datasets, Autism Brain Imaging Data Exchange (ABIDE) and Alzheimer's Disease Neuroimaging Initiative (ADNI), for the prediction of autism spectrum disorder and Alzheimer's disease, respectively. Experimental results demonstrate the competitive performance of our approach in comparison with recent baselines in terms of several evaluation metrics, achieving relative improvements of 50% and 13.56% in classification accuracy over graph convolutional networks on ABIDE and ADNI, respectively.
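
A minimal sketch of one layer combining the three ingredients named above: aggregator (symmetric) normalization, an identity mapping of the current representation, and a skip connection from the raw input features. Layer widths are assumed equal so the identity term type-checks; the population-graph construction is simplified away.

```python
import torch
import torch.nn as nn

def normalize_adj(adj):
    """Symmetric aggregator normalization: D^-1/2 (A + I) D^-1/2."""
    a = adj + torch.eye(adj.size(0))
    d = a.sum(dim=1).pow(-0.5)
    return d[:, None] * a * d[None, :]

class SkipGCNLayer(nn.Module):
    def __init__(self, dim, in_dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)
        self.skip = nn.Linear(in_dim, dim)     # direct path from input features

    def forward(self, h, adj_norm, x_input):
        # aggregate neighbors, keep h via identity mapping, re-inject raw input
        return torch.relu(adj_norm @ self.lin(h) + h + self.skip(x_input))

adj = (torch.rand(100, 100) > 0.9).float()
adj = ((adj + adj.t()) > 0).float()            # undirected population graph
x = torch.randn(100, 32)                       # node (imaging) features
h = SkipGCNLayer(dim=32, in_dim=32)(x, normalize_adj(adj), x)
```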

ActiveDC: Distribution Calibration for Active Finetuning

  • paper_url: http://arxiv.org/abs/2311.07634
  • repo_url: None
  • paper_authors: Wenshuai Xu, Zhenhui Hu, Yu Lu, Jinzhou Meng, Qingjie Liu, Yunhong Wang
  • for: 这篇论文研究主动微调(active finetuning),即如何选择用于微调的数据子集,以避免模型过拟合。
  • methods: 论文提出了一种名为 ActiveDC 的新方法:先在连续空间中通过优化分布相似性选择待标注的数据子集,再利用未标注池中的隐式类别信息对所选样本的分布进行校准。
  • results: 实验结果显示,ActiveDC 在三个图像分类任务中均优于基线,当采样比例较低时提升尤为显著,最高可达 10%。
    Abstract The pretraining-finetuning paradigm has gained popularity in various computer vision tasks. In this paradigm, the emergence of active finetuning arises due to the abundance of large-scale data and costly annotation requirements. Active finetuning involves selecting a subset of data from an unlabeled pool for annotation, facilitating subsequent finetuning. However, the use of a limited number of training samples can lead to a biased distribution, potentially resulting in model overfitting. In this paper, we propose a new method called ActiveDC for the active finetuning tasks. Firstly, we select samples for annotation by optimizing the distribution similarity between the subset to be selected and the entire unlabeled pool in continuous space. Secondly, we calibrate the distribution of the selected samples by exploiting implicit category information in the unlabeled pool. The feature visualization provides an intuitive sense of the effectiveness of our approach to distribution calibration. We conducted extensive experiments on three image classification datasets with different sampling ratios. The results indicate that ActiveDC consistently outperforms the baseline performance in all image classification tasks. The improvement is particularly significant when the sampling ratio is low, with performance gains of up to 10%. Our code will be released.
    摘要 预训练-微调范式在各类计算机视觉任务中得到了广泛应用。在该范式下,由于大规模数据易得而标注成本高昂,主动微调应运而生:从未标注数据池中选择一个子集进行标注,以便后续微调。然而,训练样本数量有限可能导致分布偏差,进而引起模型过拟合。在本文中,我们为主动微调任务提出了一种名为 ActiveDC 的新方法。首先,我们在连续空间中优化待选子集与整个未标注池之间的分布相似性,从而选出需要标注的样本;其次,我们利用未标注池中的隐式类别信息对所选样本的分布进行校准。特征可视化直观地展示了我们的分布校准方法的有效性。我们在三个图像分类数据集上以不同采样比例进行了大量实验,结果表明 ActiveDC 在所有图像分类任务中均优于基线,当采样比例较低时提升尤为显著,最高可达 10%。我们将发布代码。
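
The selection objective, keeping the chosen subset distributionally close to the whole unlabeled pool, can be illustrated with a simple greedy stand-in that matches first-order feature statistics. This is an illustrative proxy for ActiveDC's continuous distribution-similarity optimization, not the authors' exact algorithm:

```python
# Greedy sketch: pick unlabeled samples so the subset's feature mean stays
# close to the pool mean. A stand-in for distribution-similarity selection.
import numpy as np

def select_subset(features: np.ndarray, budget: int) -> list:
    pool_mean = features.mean(axis=0)
    chosen, subset_sum = [], np.zeros_like(pool_mean)
    remaining = set(range(len(features)))
    for k in range(1, budget + 1):
        # Pick the sample whose inclusion best matches the pool mean.
        best = min(
            remaining,
            key=lambda i: np.linalg.norm((subset_sum + features[i]) / k - pool_mean),
        )
        remaining.remove(best)
        chosen.append(best)
        subset_sum += features[best]
    return chosen

feats = np.random.default_rng(0).normal(size=(1000, 64))  # e.g. backbone features
idx = select_subset(feats, budget=50)
print(np.linalg.norm(feats[idx].mean(0) - feats.mean(0)))  # small gap to pool mean
```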

Registered and Segmented Deformable Object Reconstruction from a Single View Point Cloud

  • paper_url: http://arxiv.org/abs/2311.07357
  • repo_url: None
  • paper_authors: Pit Henrich, Balázs Gyenes, Paul Maria Scheikl, Gerhard Neumann, Franziska Mathis-Ullrich
  • for: 在可变形物体操作中,我们经常希望与物体上仅在非变形模型中定义的特定部位进行交互,因此需要一个能够从真实世界变形物体的传感器数据中识别并定位这些部位的系统。
  • methods: 我们提出一种系统,在利用神经占据函数进行物体重建的同时学习重建物体的分割;由于输出结果已包含分割信息,可以跳过配准步骤。
  • results: 我们在仿真和真实世界的多种可变形物体上进行了测试,证明该方法能够稳健地找到这些部位;我们还提出了一种简单的采样算法,用于生成更好的占据学习训练数据。
    Abstract In deformable object manipulation, we often want to interact with specific segments of an object that are only defined in non-deformed models of the object. We thus require a system that can recognize and locate these segments in sensor data of deformed real world objects. This is normally done using deformable object registration, which is problem specific and complex to tune. Recent methods utilize neural occupancy functions to improve deformable object registration by registering to an object reconstruction. Going one step further, we propose a system that in addition to reconstruction learns segmentation of the reconstructed object. As the resulting output already contains the information about the segments, we can skip the registration process. Tested on a variety of deformable objects in simulation and the real world, we demonstrate that our method learns to robustly find these segments. We also introduce a simple sampling algorithm to generate better training data for occupancy learning.
    摘要 在可变形物体操作中,我们经常希望与物体上仅在非变形模型中定义的特定部位进行交互。因此,我们需要一个能够在真实世界变形物体的传感器数据中识别并定位这些部位的系统。这通常通过可变形物体配准来实现,但该过程依赖于具体问题且调参复杂。近期的方法利用神经占据函数,通过向物体重建进行配准来改进可变形物体配准。我们更进一步,提出一种在重建之外还学习重建物体分割的系统。由于输出结果已经包含分割信息,我们可以跳过配准步骤。我们在仿真和真实世界的多种可变形物体上进行了测试,证明该方法能够稳健地找到这些部位。我们还提出了一种简单的采样算法,用于为占据学习生成更好的训练数据。

Deformable Groupwise Registration Using a Locally Low-Rank Dissimilarity Metric for Myocardial Strain Estimation from Cardiac Cine MRI Images

  • paper_url: http://arxiv.org/abs/2311.07348
  • repo_url: None
  • paper_authors: Haiyang Chen, Juan Gao, Chenxi Hu
  • for: The paper is written for cardiac function assessment using cardiac cine MRI images.
  • methods: The proposed method uses a deformable groupwise registration-based two-step strategy with a locally low-rank (LLR) dissimilarity metric for CMR-FT.
  • results: The proposed method achieved more accurate tracking and strain estimation compared to other methods, especially in late diastole, and may facilitate more accurate assessment of cardiac dysfunction.
    Abstract Objective: Cardiovascular magnetic resonance-feature tracking (CMR-FT) represents a group of methods for myocardial strain estimation from cardiac cine MRI images. Established CMR-FT methods are mainly based on optical flow or pairwise registration. However, these methods suffer from either inaccurate estimation of large motion or drift effect caused by accumulative tracking errors. In this work, we propose a deformable groupwise registration method using a locally low-rank (LLR) dissimilarity metric for CMR-FT. Methods: The proposed method (Groupwise-LLR) tracks the feature points by a groupwise registration-based two-step strategy. Unlike the globally low-rank (GLR) dissimilarity metric, the proposed LLR metric imposes low-rankness on local image patches rather than the whole image. We quantitatively compared Groupwise-LLR with the Farneback optical flow, a pairwise registration method, and a GLR-based groupwise registration method on simulated and in vivo datasets. Results: Results from the simulated dataset showed that Groupwise-LLR achieved more accurate tracking and strain estimation compared with the other methods. Results from the in vivo dataset showed that Groupwise-LLR achieved more accurate tracking and elimination of the drift effect in late-diastole. Inter-observer reproducibility of strain estimates was similar between all studied methods. Conclusion: The proposed method estimates myocardial strains more accurately due to the application of a groupwise registration-based tracking strategy and an LLR-based dissimilarity metric. Significance: The proposed CMR-FT method may facilitate more accurate estimation of myocardial strains, especially in diastole, for clinical assessments of cardiac dysfunction.
    摘要 目的:心脏磁共振特征追踪(CMR-FT)是一类从心脏电影 MRI 图像估计心肌应变的方法。现有的 CMR-FT 方法主要基于光流或成对配准,但这些方法要么难以准确估计大运动,要么受累积跟踪误差引起的漂移效应影响。在本工作中,我们为 CMR-FT 提出了一种使用局部低秩(LLR)差异度量的可变形组配准方法。方法:所提方法(Groupwise-LLR)采用基于组配准的两步策略来跟踪特征点。与全局低秩(GLR)差异度量不同,所提 LLR 度量将低秩约束施加在局部图像块上,而非整幅图像。我们在仿真和在体数据集上将 Groupwise-LLR 与 Farneback 光流、一种成对配准方法以及一种基于 GLR 的组配准方法进行了定量比较。结果:仿真数据集的结果表明,Groupwise-LLR 比其他方法实现了更准确的跟踪和应变估计;在体数据集的结果表明,Groupwise-LLR 实现了更准确的跟踪,并消除了舒张末期的漂移效应。各方法应变估计的观察者间可重复性相近。结论:得益于基于组配准的跟踪策略和基于 LLR 的差异度量,所提方法能更准确地估计心肌应变。意义:所提 CMR-FT 方法有望为心功能障碍的临床评估提供更准确的心肌应变估计,尤其是在舒张期。
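
A locally low-rank dissimilarity of the kind named here is typically computed by stacking corresponding local patches from all frames into a Casorati matrix and summing nuclear norms over patches. A minimal sketch under that standard formulation (patch size and stride are illustrative assumptions, not the paper's settings):

```python
# Sketch of a locally low-rank (LLR) dissimilarity over a registered image
# group: each local patch, vectorized across frames, forms a Casorati matrix
# whose nuclear norm (sum of singular values) is accumulated over patches.
import numpy as np

def llr_dissimilarity(frames: np.ndarray, patch: int = 8) -> float:
    """frames: (T, H, W) registered image group."""
    t, h, w = frames.shape
    total = 0.0
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            # Each row is one frame's vectorized patch: shape (T, patch*patch).
            casorati = frames[:, i:i + patch, j:j + patch].reshape(t, -1)
            total += np.linalg.svd(casorati, compute_uv=False).sum()
    return total  # lower = patches better aligned (more locally low-rank)

frames = np.random.default_rng(0).random((20, 64, 64))  # toy cine series
print(llr_dissimilarity(frames))
```

Minimizing this value during groupwise registration encourages every local patch to look consistent across the whole time series, which is what makes the metric suited to temporally dense cine data.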

Connecting the Dots: Graph Neural Network Powered Ensemble and Classification of Medical Images

  • paper_url: http://arxiv.org/abs/2311.07321
  • repo_url: https://github.com/aryan-at-ul/aics_2023_submission
  • paper_authors: Aryan Singh, Pepijn Van de Ven, Ciarán Eising, Patrick Denny
  • for: 这篇论文是为了提出一个可靠、经济可行且可扩展的医疗影像分类方法。
  • methods: 这篇论文使用 Image Foresting Transform 将医疗影像最优地分割为超像素,再将其转换为图结构数据,利用图神经网络(GNN)进行特征提取和关系建模,并集成三种不同的 GNN 架构以提高模型的稳健性。
  • results: 在肺炎分类任务的评估中,该方法优于主流的深度神经网络(DNNs),同时大幅减少参数数量,从而降低数据成本、加快训练并减少偏差。
    Abstract Deep learning models have demonstrated remarkable results for various computer vision tasks, including the realm of medical imaging. However, their application in the medical domain is limited due to the requirement for large amounts of training data, which can be both challenging and expensive to obtain. To mitigate this, pre-trained models have been fine-tuned on domain-specific data, but such an approach can suffer from inductive biases. Furthermore, deep learning models struggle to learn the relationship between spatially distant features and their importance, as convolution operations treat all pixels equally. Pioneering a novel solution to this challenge, we employ the Image Foresting Transform to optimally segment images into superpixels. These superpixels are subsequently transformed into graph-structured data, enabling the proficient extraction of features and modeling of relationships using Graph Neural Networks (GNNs). Our method harnesses an ensemble of three distinct GNN architectures to boost its robustness. In our evaluations targeting pneumonia classification, our methodology surpassed prevailing Deep Neural Networks (DNNs) in performance, all while drastically cutting down on the parameter count. This not only trims down the expenses tied to data but also accelerates training and minimizes bias. Consequently, our proposition offers a sturdy, economically viable, and scalable strategy for medical image classification, significantly diminishing dependency on extensive training data sets.
    摘要 深度学习模型在包括医学影像在内的各类计算机视觉任务中表现出色。然而,它们在医学领域的应用受限于对大量训练数据的需求,而这些数据的获取既困难又昂贵。为缓解这一问题,人们通常在领域数据上微调预训练模型,但这种做法可能受到归纳偏置的影响。此外,由于卷积操作平等地对待所有像素,深度学习模型难以学习空间上相距较远的特征之间的关系及其重要性。为解决这一挑战,我们采用 Image Foresting Transform 将图像最优地分割为超像素,再将这些超像素转换为图结构数据,从而利用图神经网络(GNN)高效地提取特征并建模关系。我们的方法集成了三种不同的 GNN 架构以增强稳健性。在针对肺炎分类的评估中,我们的方法在性能上超越了主流的深度神经网络(DNN),同时大幅削减了参数数量,既降低了数据相关成本,又加快了训练并减少了偏差。因此,我们的方案为医学图像分类提供了一种稳健、经济且可扩展的策略,显著降低了对大规模训练数据集的依赖。
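
The image-to-graph conversion can be sketched as follows. SLIC is used here only as a convenient stand-in for the Image Foresting Transform named in the abstract, and the adjacency construction is an illustrative assumption:

```python
# Sketch: superpixels as graph nodes (mean color features), with edges
# between spatially adjacent superpixels, ready to feed into a GNN.
import numpy as np
from skimage.data import astronaut
from skimage.segmentation import slic

img = astronaut()                        # any RGB image
labels = slic(img, n_segments=100, start_label=0)

# Node features: mean color of each superpixel.
n = labels.max() + 1
feats = np.stack([img[labels == k].mean(axis=0) for k in range(n)])

# Edges: superpixels that touch horizontally or vertically are adjacent.
edges = set()
for a, b in [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]:
    mask = a != b
    edges |= set(map(tuple, np.sort(np.stack([a[mask], b[mask]], 1), axis=1)))

print(feats.shape, len(edges))           # node features + adjacency for a GNN
```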

What Large Language Models Bring to Text-rich VQA?

  • paper_url: http://arxiv.org/abs/2311.07306
  • repo_url: None
  • paper_authors: Xuejing Liu, Wei Tang, Xinzhe Ni, Jinghui Lu, Rui Zhao, Zechao Li, Fei Tan
  • for: 本研究探讨基于大语言模型(LLM)的方法在文字丰富 VQA(text-rich VQA)任务中的优势与瓶颈。
  • methods: 作者将视觉与语言模块分离:使用外部 OCR 模型识别图像中的文字,再利用 LLM 根据所得文字回答问题。整个框架无需训练,依靠 LLM 的上下文学习能力。
  • results: 该方法在四个文字丰富 VQA 数据集上取得了优越表现;消融实验表明 LLM 带来更强的理解能力,并可能为 VQA 问题引入有用知识,而瓶颈主要在视觉部分。此外,将 OCR 模块与多模态大语言模型(MLLM)结合同样有效。
    Abstract Text-rich VQA, namely Visual Question Answering based on text recognition in the images, is a cross-modal task that requires both image comprehension and text recognition. In this work, we focus on investigating the advantages and bottlenecks of LLM-based approaches in addressing this problem. To address the above concern, we separate the vision and language modules, where we leverage external OCR models to recognize texts in the image and Large Language Models (LLMs) to answer the question given texts. The whole framework is training-free benefiting from the in-context ability of LLMs. This pipeline achieved superior performance compared to the majority of existing Multimodal Large Language Models (MLLM) on four text-rich VQA datasets. Besides, based on the ablation study, we find that LLM brings stronger comprehension ability and may introduce helpful knowledge for the VQA problem. The bottleneck for LLM to address text-rich VQA problems may primarily lie in visual part. We also combine the OCR module with MLLMs and pleasantly find that the combination of OCR module with MLLM also works. It's worth noting that not all MLLMs can comprehend the OCR information, which provides insights into how to train an MLLM that preserves the abilities of LLM.
    摘要 文字丰富 VQA,即基于图像内文字识别的视觉问答,是一项同时需要图像理解与文字识别的跨模态任务。在本工作中,我们着重研究基于 LLM 的方法在该问题上的优势与瓶颈。为此,我们将视觉与语言模块分离:利用外部 OCR 模型识别图像中的文字,并利用大语言模型(LLM)根据所得文字回答问题。整个框架无需训练,得益于 LLM 的上下文学习能力。该流水线在四个文字丰富 VQA 数据集上的表现优于大多数现有的多模态大语言模型(MLLM)。此外,消融实验表明,LLM 带来了更强的理解能力,并可能为 VQA 问题引入有用的知识;LLM 解决文字丰富 VQA 问题的瓶颈可能主要在视觉部分。我们还将 OCR 模块与 MLLM 结合,发现这种组合同样有效。值得注意的是,并非所有 MLLM 都能理解 OCR 信息,这为如何训练一个保留 LLM 能力的 MLLM 提供了启示。
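
The training-free pipeline described here decomposes into two calls: OCR over the image, then an in-context prompt to an LLM. A minimal sketch, where pytesseract and the `ask_llm` stub are illustrative stand-ins (the paper does not prescribe these specific components):

```python
# Sketch of the decoupled pipeline: external OCR reads the image text,
# and an LLM answers the question from that text in-context.
import pytesseract
from PIL import Image

def ask_llm(prompt: str) -> str:
    # Placeholder: route to any chat LLM (an API or a local model).
    raise NotImplementedError

def text_rich_vqa(image_path: str, question: str) -> str:
    ocr_text = pytesseract.image_to_string(Image.open(image_path))
    prompt = (
        "Text detected in the image by OCR:\n"
        f"{ocr_text}\n"
        f"Question: {question}\n"
        "Answer the question using only the detected text."
    )
    return ask_llm(prompt)

# answer = text_rich_vqa("receipt.png", "What is the total amount?")
```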

Dynamically Weighted Factor-Graph for Feature-based Geo-localization

  • paper_url: http://arxiv.org/abs/2311.07301
  • repo_url: None
  • paper_authors: Miguel Ángel Muñoz-Bañón, Alejandro Olivas, Edison Velasco-Sánchez, Francisco A. Candelas, Fernando Torres
  • for: 本研究旨在提高基于特征的地理定位精度,使其在存在歧义的环境中提供更可靠的定位结果。
  • methods: 本研究使用动态加权因子图模型优化车辆的轨迹估计,依据 LiDAR 检测中的信息量对权重进行调整,并在模型中加入基于 GNSS 的先验误差估计。
  • results: 与当前最佳地理定位方法相比,本方法在歧义环境中表现出更高的精度和更小的偏差;当检测丢失时,本方法也能成功缓解离群点与偏差。
    Abstract Feature-based geo-localization relies on associating features extracted from aerial imagery with those detected by the vehicle's sensors. This requires that the type of landmarks must be observable from both sources. This lack of variety in feature types generates poor representations that lead to outliers and deviations, produced by ambiguities and missed detections, respectively. To mitigate these drawbacks, in this paper, we present a dynamically weighted factor graph model for the vehicle's trajectory estimation. The weight adjustment in this implementation depends on information quantification in the detections performed using a LiDAR sensor. Also, a prior (GNSS-based) error estimation is included in the model. Then, when the representation becomes ambiguous or sparse, the weights are dynamically adjusted to rely on the corrected prior trajectory, mitigating in this way outliers and deviations. We compare our method against state-of-the-art geo-localization ones in a challenging ambiguous environment, where we also cause detection losses. We demonstrate mitigation of the mentioned drawbacks where the other methods fail.
    摘要 基于特征的地理定位依赖于将航拍图像中提取的特征与车辆传感器检测到的特征进行关联,这要求地标类型在两种来源中均可观测。特征类型的单一会产生较差的表示,分别由歧义和漏检导致离群点和偏差。为缓解这些缺点,本文提出了一种用于车辆轨迹估计的动态加权因子图模型。该实现中的权重调整取决于对 LiDAR 传感器检测结果的信息量化;模型中还加入了基于 GNSS 的先验误差估计。当表示出现歧义或稀疏时,权重会动态调整,转而依赖经校正的先验轨迹,从而缓解离群点和偏差。我们在一个具有高度歧义的挑战性环境中(并人为造成检测丢失)与当前最佳的地理定位方法进行了比较,结果表明在其他方法失效的情况下,我们的方法成功缓解了上述缺点。

Multi Sentence Description of Complex Manipulation Action Videos

  • paper_url: http://arxiv.org/abs/2311.07285
  • repo_url: None
  • paper_authors: Fatemeh Ziaeetabar, Reza Safabakhsh, Saeedeh Momtazi, Minija Tamosiunaite, Florentin Wörgötter
  • for: 视频描述的自动生成
  • methods: 结合统计式与端到端两种策略,利用 LSTM 生成具有不同细节层次的视频描述
  • results: 生成的描述比其他竞争方法更加真实
    Abstract Automatic video description requires the generation of natural language statements about the actions, events, and objects in the video. An important human trait, when we describe a video, is that we are able to do this with variable levels of detail. Different from this, existing approaches for automatic video descriptions are mostly focused on single sentence generation at a fixed level of detail. Instead, here we address video description of manipulation actions where different levels of detail are required for being able to convey information about the hierarchical structure of these actions relevant also for modern approaches of robot learning. We propose one hybrid statistical and one end-to-end framework to address this problem. The hybrid method needs much less data for training, because it statistically models uncertainties within the video clips, while in the end-to-end method, which is more data-heavy, we are directly connecting the visual encoder to the language decoder without any intermediate (statistical) processing step. Both frameworks use LSTM stacks to allow for different levels of description granularity and videos can be described by simple single-sentences or complex multiple-sentence descriptions. In addition, quantitative results demonstrate that these methods produce more realistic descriptions than other competing approaches.
    摘要 自动视频描述需要生成关于视频中动作、事件和物体的自然语言语句。人类在描述视频时的一个重要特点是能够以不同的细节层次进行描述。与此不同,现有的自动视频描述方法大多只能在固定细节层次上生成单个句子。本文针对操作动作的视频描述,这类任务需要不同层次的细节,才能传达这些动作的层级结构信息,而这对现代机器人学习方法同样重要。我们提出了一种混合统计式框架和一种端到端框架来解决该问题。混合方法对视频片段内的不确定性进行统计建模,因此训练所需数据少得多;而端到端方法对数据需求更高,它将视觉编码器直接连接到语言解码器,省去中间的统计处理步骤。两种框架都使用 LSTM 堆栈以支持不同粒度的描述,视频既可以用简单的单句描述,也可以用复杂的多句描述。此外,定量结果表明,这些方法生成的描述比其他竞争方法更加真实。

LT-ViT: A Vision Transformer for multi-label Chest X-ray classification

  • paper_url: http://arxiv.org/abs/2311.07263
  • repo_url: None
  • paper_authors: Umar Marikkar, Sara Atito, Muhammad Awais, Adam Mahdi
  • for: 本研究旨在提升视觉 Transformer(ViT)在胸部 X 光(CXR)多标签分类任务中的性能。
  • methods: 本研究提出了 LT-ViT,一种在图像 token 与随机初始化的辅助标签 token 之间进行联合注意力的 Transformer。
  • results: 结果显示:(1) LT-ViT 在两个公开的 CXR 数据集上超越了纯 ViT 的最先进性能;(2) LT-ViT 可推广到其他预训练方法,与模型初始化无关;(3) LT-ViT 无需 grad-cam 及其变体即可提供模型可解释性。
    Abstract Vision Transformers (ViTs) are widely adopted in medical imaging tasks, and some existing efforts have been directed towards vision-language training for Chest X-rays (CXRs). However, we envision that there still exists a potential for improvement in vision-only training for CXRs using ViTs, by aggregating information from multiple scales, which has been proven beneficial for non-transformer networks. Hence, we have developed LT-ViT, a transformer that utilizes combined attention between image tokens and randomly initialized auxiliary tokens that represent labels. Our experiments demonstrate that LT-ViT (1) surpasses the state-of-the-art performance using pure ViTs on two publicly available CXR datasets, (2) is generalizable to other pre-training methods and therefore is agnostic to model initialization, and (3) enables model interpretability without grad-cam and its variants.
    摘要 视觉 Transformer(ViT)已被广泛应用于医学影像任务,现有一些工作针对胸部 X 光(CXR)开展视觉-语言训练。然而,我们认为在 CXR 的纯视觉训练中,通过聚合多尺度信息,ViT 仍有提升空间,这一思路已被证明对非 Transformer 网络有效。因此,我们开发了 LT-ViT,一种在图像 token 与随机初始化的、代表标签的辅助 token 之间进行联合注意力的 Transformer。实验表明,LT-ViT(1)在两个公开的 CXR 数据集上超越了纯 ViT 的最先进性能;(2)可推广到其他预训练方法,与模型初始化无关;(3)无需 grad-cam 及其变体即可实现模型可解释性。
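
The label-token idea can be sketched as follows: one randomly initialized token per finding attends jointly with the patch tokens, and each label token is read out for a binary prediction. Dimensions, depth, and the readout head are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch of auxiliary label tokens attending jointly with image tokens.
import torch
import torch.nn as nn

class LabelTokenViT(nn.Module):
    def __init__(self, num_labels: int, dim: int = 256, depth: int = 4):
        super().__init__()
        self.label_tokens = nn.Parameter(torch.randn(1, num_labels, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 1)

    def forward(self, image_tokens):             # (B, N, dim) patch embeddings
        b = image_tokens.size(0)
        tokens = torch.cat([self.label_tokens.expand(b, -1, -1), image_tokens], 1)
        out = self.encoder(tokens)                # joint attention over all tokens
        label_out = out[:, : self.label_tokens.size(1)]
        return self.head(label_out).squeeze(-1)   # (B, num_labels) logits

model = LabelTokenViT(num_labels=14)              # e.g. 14 CXR findings
logits = model(torch.randn(2, 196, 256))          # 14x14 patch grid
print(logits.shape)                               # torch.Size([2, 14])
```

A side effect of this design is built-in interpretability: the attention from each label token back to the patch tokens localizes the evidence for that label, which is consistent with the abstract's claim of interpretability without grad-cam.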

Sketch-based Video Object Segmentation: Benchmark and Analysis

  • paper_url: http://arxiv.org/abs/2311.07261
  • repo_url: None
  • paper_authors: Ruolin Yang, Da Li, Conghui Hu, Timothy Hospedales, Honggang Zhang, Yi-Zhe Song
  • for: 本研究旨在提出一种基于绘图的视频对象 segmentation任务,并提供一个新的标准测试集(Sketch-DAVIS16、Sketch-DAVIS17和Sketch-YouTube-VOS),以便更好地评估视频对象 segmentation 算法。
  • methods: 本研究以 STCN 为基线,对不同类型的参考(语言表达、涂鸦和手绘草图)进行比较,以找出最有效的参考方式。
  • results: 实验结果显示,手绘草图是最有效的参考方式,比语言表达和涂鸦更有效,同时标注成本也更低。
    Abstract Reference-based video object segmentation is an emerging topic which aims to segment the corresponding target object in each video frame referred by a given reference, such as a language expression or a photo mask. However, language expressions can sometimes be vague in conveying an intended concept and ambiguous when similar objects in one frame are hard to distinguish by language. Meanwhile, photo masks are costly to annotate and less practical to provide in a real application. This paper introduces a new task of sketch-based video object segmentation, an associated benchmark, and a strong baseline. Our benchmark includes three datasets, Sketch-DAVIS16, Sketch-DAVIS17 and Sketch-YouTube-VOS, which exploit human-drawn sketches as an informative yet low-cost reference for video object segmentation. We take advantage of STCN, a popular baseline of semi-supervised VOS task, and evaluate what the most effective design for incorporating a sketch reference is. Experimental results show sketch is more effective yet annotation-efficient than other references, such as photo masks, language and scribble.
    摘要 基于参考的视频目标分割是一个新兴课题,旨在根据给定的参考(如语言表达或照片掩码)在每一帧视频中分割对应的目标物体。然而,语言表达在传达意图时有时含糊不清,当同一帧中的相似物体难以用语言区分时还会产生歧义;而照片掩码标注成本高昂,在实际应用中也不够实用。本文提出了基于手绘草图的视频目标分割这一新任务,以及相应的基准和强基线。我们的基准包含 Sketch-DAVIS16、Sketch-DAVIS17 和 Sketch-YouTube-VOS 三个数据集,利用人类手绘草图作为信息丰富且成本低廉的视频目标分割参考。我们以半监督 VOS 任务的常用基线 STCN 为基础,评估了引入草图参考的最有效设计。实验结果表明,与照片掩码、语言和涂鸦等其他参考相比,草图不仅更有效,而且标注效率更高。

Simultaneous Clutter Detection and Semantic Segmentation of Moving Objects for Automotive Radar Data

  • paper_url: http://arxiv.org/abs/2311.07247
  • repo_url: None
  • paper_authors: Johannes Kopp, Dominik Kellner, Aldi Piroli, Vinzenz Dallabetta, Klaus Dietmayer
  • for: 本研究旨在用单一模型同时解决雷达点云中的杂波检测与运动目标语义分割两个问题,而不是将它们作为两个独立任务处理。
  • methods: 我们提出了一种新的增强多头架构,以及一种仅用单个输出值表示网络在两个任务上预测结果的新方法。
  • results: 在大量评估中,我们的方案表现高效,并在 RadarScenes 数据集的语义分割任务上超越了所有现有网络模型。
    Abstract The unique properties of radar sensors, such as their robustness to adverse weather conditions, make them an important part of the environment perception system of autonomous vehicles. One of the first steps during the processing of radar point clouds is often the detection of clutter, i.e. erroneous points that do not correspond to real objects. Another common objective is the semantic segmentation of moving road users. These two problems are handled strictly separate from each other in literature. The employed neural networks are always focused entirely on only one of the tasks. In contrast to this, we examine ways to solve both tasks at the same time with a single jointly used model. In addition to a new augmented multi-head architecture, we also devise a method to represent a network's predictions for the two tasks with only one output value. This novel approach allows us to solve the tasks simultaneously with the same inference time as a conventional task-specific model. In an extensive evaluation, we show that our setup is highly effective and outperforms every existing network for semantic segmentation on the RadarScenes dataset.
    摘要 雷达传感器的独特性能(例如对恶劣天气的鲁棒性)使其成为自动驾驶车辆环境感知系统的重要组成部分。处理雷达点云的第一步通常是杂波检测,即识别不对应真实物体的错误点;另一个常见目标是对运动的道路使用者进行语义分割。在已有文献中,这两个问题被严格地分开处理,所用的神经网络总是只专注于其中一个任务。与此相反,我们研究了用单一模型同时解决这两个任务的方法。除了一种新的增强多头架构之外,我们还设计了一种仅用单个输出值表示网络在两个任务上预测结果的方法。这种新颖思路使我们能够在与传统单任务模型相同的推理时间内同时解决两个任务。在大量评估中,我们证明了该方案的高效性,并在 RadarScenes 数据集的语义分割任务上超越了所有现有网络。

DeepMetricEye: Metric Depth Estimation in Periocular VR Imagery

  • paper_url: http://arxiv.org/abs/2311.07235
  • repo_url: None
  • paper_authors: Yitong Sun, Zijian Zhou, Cyriel Diels, Ali Asadipour
  • for: 提供一个计算眼周区域度量深度图的轻量级框架,以便在 VR 头戴设备上实现可量化的眼周区域计算。
  • methods: 使用基于 U-Net 3+ 深度学习骨干并重新优化的轻量级框架,将相对测量转换为可度量的眼周深度图。
  • results: 在 36 名参与者的评估中,该方法在眼周全局精度评估实验和瞳孔直径测量方面均表现出显著效果。
    Abstract Despite the enhanced realism and immersion provided by VR headsets, users frequently encounter adverse effects such as digital eye strain (DES), dry eye, and potential long-term visual impairment due to excessive eye stimulation from VR displays and pressure from the mask. Recent VR headsets are increasingly equipped with eye-oriented monocular cameras to segment ocular feature maps. Yet, to compute the incident light stimulus and observe periocular condition alterations, it is imperative to transform these relative measurements into metric dimensions. To bridge this gap, we propose a lightweight framework derived from the U-Net 3+ deep learning backbone that we re-optimised, to estimate measurable periocular depth maps. Compatible with any VR headset equipped with an eye-oriented monocular camera, our method reconstructs three-dimensional periocular regions, providing a metric basis for related light stimulus calculation protocols and medical guidelines. Navigating the complexities of data collection, we introduce a Dynamic Periocular Data Generation (DPDG) environment based on UE MetaHuman, which synthesises thousands of training images from a small quantity of human facial scan data. Evaluated on a sample of 36 participants, our method exhibited notable efficacy in the periocular global precision evaluation experiment, and the pupil diameter measurement.
    摘要 尽管 VR 头戴设备带来了更强的真实感和沉浸感,用户却经常遭受数字眼疲劳(DES)、干眼等不良影响,且 VR 显示器的过度视觉刺激与面罩压力还可能造成长期视力损伤。近期的 VR 头戴设备越来越多地配备朝向眼睛的单目相机,用于分割眼部特征图。然而,要计算入射光刺激并观察眼周状态的变化,必须将这些相对测量转换为度量尺度。为弥补这一空白,我们提出了一个由 U-Net 3+ 深度学习骨干重新优化而来的轻量级框架,用于估计可度量的眼周深度图。该方法与任何配备朝向眼睛单目相机的 VR 头戴设备兼容,可重建三维眼周区域,为相关的光刺激计算协议和医学指南提供度量基础。针对数据采集的复杂性,我们引入了基于 UE MetaHuman 的动态眼周数据生成(DPDG)环境,可从少量人脸扫描数据合成数千张训练图像。在 36 名参与者的样本上,该方法在眼周全局精度评估实验和瞳孔直径测量中均表现出显著效果。

Multi-task learning for joint weakly-supervised segmentation and aortic arch anomaly classification in fetal cardiac MRI

  • paper_url: http://arxiv.org/abs/2311.07234
  • repo_url: https://github.com/svrtk/masc-multi-task-segmentation-and-classification
  • paper_authors: Paula Ramirez, Alena Uus, Milou P. M. van Poppel, Irina Grigorescu, Johannes K. Steinweg, David F. A. Lloyd, Kuberan Pushparajah, Andrew P. King, Maria Deprez
  • for: 本研究旨在辅助主动脉弓异常中 3D 胎儿血管拓扑的可视化,以提高诊断信心。
  • methods: 本研究结合基于深度学习的标签传播(VoxelMorph)、注意力 3D U-Net 分割和 DenseNet121 异常分类。
  • results: 结果表明,所提训练策略显著优于标签传播以及仅在传播标签上训练的网络;与 T2w 体数据联合训练后,分类器的平均平衡准确率达 0.99(0.01)。
    Abstract Congenital Heart Disease (CHD) is a group of cardiac malformations present already during fetal life, representing the prevailing category of birth defects globally. Our aim in this study is to aid 3D fetal vessel topology visualisation in aortic arch anomalies, a group which encompasses a range of conditions with significant anatomical heterogeneity. We present a multi-task framework for automated multi-class fetal vessel segmentation from 3D black blood T2w MRI and anomaly classification. Our training data consists of binary manual segmentation masks of the cardiac vessels' region in individual subjects and fully-labelled anomaly-specific population atlases. Our framework combines deep learning label propagation using VoxelMorph with 3D Attention U-Net segmentation and DenseNet121 anomaly classification. We target 11 cardiac vessels and three distinct aortic arch anomalies, including double aortic arch, right aortic arch, and suspected coarctation of the aorta. We incorporate an anomaly classifier into our segmentation pipeline, delivering a multi-task framework with the primary motivation of correcting topological inaccuracies of the segmentation. The hypothesis is that the multi-task approach will encourage the segmenter network to learn anomaly-specific features. As a secondary motivation, an automated diagnosis tool may have the potential to enhance diagnostic confidence in a decision support setting. Our results showcase that our proposed training strategy significantly outperforms label propagation and a network trained exclusively on propagated labels. Our classifier outperforms a classifier trained exclusively on T2w volume images, with an average balanced accuracy of 0.99 (0.01) after joint training. Adding a classifier improves the anatomical and topological accuracy of all correctly classified double aortic arch subjects.
    摘要 先天性心脏病(CHD)是一组在胎儿期即已存在的心脏畸形,是全球最主要的出生缺陷类别。本研究旨在辅助主动脉弓异常(一组解剖异质性显著的病变)中 3D 胎儿血管拓扑的可视化。我们提出了一个多任务框架,从 3D 黑血 T2w MRI 中自动进行多类胎儿血管分割与异常分类。训练数据包括个体受试者心脏血管区域的二值手动分割掩码,以及完全标注的畸形特异性人群图谱。我们的框架结合了基于 VoxelMorph 的深度学习标签传播、3D 注意力 U-Net 分割和 DenseNet121 异常分类,目标涵盖 11 条心脏血管和三种不同的主动脉弓异常:双主动脉弓、右位主动脉弓和疑似主动脉缩窄。我们将异常分类器嵌入分割流水线,构成多任务框架,其首要动机是纠正分割的拓扑错误,假设多任务方式会促使分割网络学习畸形特异性特征;次要动机在于,自动诊断工具有望在决策支持场景中增强诊断信心。结果表明,所提训练策略显著优于标签传播以及仅在传播标签上训练的网络;我们的分类器优于仅在 T2w 体数据图像上训练的分类器,联合训练后平均平衡准确率达 0.99(0.01)。加入分类器还提高了所有被正确分类的双主动脉弓受试者的解剖与拓扑准确性。

Few Shot Learning for the Classification of Confocal Laser Endomicroscopy Images of Head and Neck Tumors

  • paper_url: http://arxiv.org/abs/2311.07216
  • repo_url: None
  • paper_authors: Marc Aubreville, Zhaoya Pan, Matti Sievert, Jonas Ammeling, Jonathan Ganz, Nicolai Oetter, Florian Stelzle, Ann-Kathrin Frenken, Katharina Breininger, Miguel Goncalves
  • for: 本研究旨在发展基于共聚焦激光显微内镜(CLE)的自动分析方法,以帮助外科医生在切除头颈部肿瘤时确保安全切缘。
  • methods: 研究人员评估了四种流行的少样本学习(FSL)方法在未见解剖域上的泛化能力。
  • results: 结果表明,在 CLE 图像上进行 FSL 是可行的,但受患者数量和解剖结构多样性的影响:在声带(VF)图像上最佳方法达到 79.6% 的中位准确率,而在鼻窦肿瘤(SNT)图像上仅达 61.6%。
    Abstract The surgical removal of head and neck tumors requires safe margins, which are usually confirmed intraoperatively by means of frozen sections. This method is, in itself, an oversampling procedure, which has a relatively low sensitivity compared to the definitive tissue analysis on paraffin-embedded sections. Confocal laser endomicroscopy (CLE) is an in-vivo imaging technique that has shown its potential in the live optical biopsy of tissue. An automated analysis of this notoriously difficult to interpret modality would help surgeons. However, the images of CLE show a wide variability of patterns, caused both by individual factors but also, and most strongly, by the anatomical structures of the imaged tissue, making it a challenging pattern recognition task. In this work, we evaluate four popular few shot learning (FSL) methods towards their capability of generalizing to unseen anatomical domains in CLE images. We evaluate this on images of sinunasal tumors (SNT) from five patients and on images of the vocal folds (VF) from 11 patients using a cross-validation scheme. The best respective approach reached a median accuracy of 79.6% on the rather homogeneous VF dataset, but only of 61.6% for the highly diverse SNT dataset. Our results indicate that FSL on CLE images is viable, but strongly affected by the number of patients, as well as the diversity of anatomical patterns.
    摘要 头颈部肿瘤的手术切除需要安全切缘,通常在术中通过冷冻切片加以确认。这种方法本身是一种过采样流程,其灵敏度相对低于石蜡包埋切片的最终组织分析。共聚焦激光显微内镜(CLE)是一种在体成像技术,已在组织的实时光学活检中展现潜力。对这种公认难以判读的模态进行自动分析将有助于外科医生。然而,CLE 图像呈现出高度多样的模式,这既源于个体差异,更主要源于成像组织的解剖结构,使其成为一项具有挑战性的模式识别任务。在本工作中,我们评估了四种流行的少样本学习(FSL)方法在 CLE 图像中向未见解剖域泛化的能力。我们采用交叉验证方案,在 5 名患者的鼻窦肿瘤(SNT)图像和 11 名患者的声带(VF)图像上进行评估。最佳方法在较为同质的 VF 数据集上取得 79.6% 的中位准确率,而在高度多样的 SNT 数据集上仅取得 61.6%。我们的结果表明,在 CLE 图像上进行 FSL 是可行的,但其效果深受患者数量及解剖模式多样性的影响。

A method for quantifying sectoral optic disc pallor in fundus photographs and its association with peripapillary RNFL thickness

  • paper_url: http://arxiv.org/abs/2311.07213
  • repo_url: None
  • paper_authors: Samuel Gibbon, Graciela Muniz-Terrera, Fabian SL Yii, Charlene Hamid, Simon Cox, Ian JC Maccormick, Andrew J Tatham, Craig Ritchie, Emanuele Trucco, Baljean Dhillon, Thomas J MacGillivray
  • for: 本研究旨在开发一种在眼底照片中自动量化视盘分区苍白程度的方法,并研究其与视盘周围视网膜神经纤维层(pRNFL)厚度的关联。
  • methods: 我们使用深度学习分割眼底照片中的视盘、中心凹和血管,并测量苍白程度;随后在 118 名参与者中分析苍白程度与光学相干断层扫描(OCT)所得 pRNFL 厚度之间的关系。此外,我们还对经临床判读诊断为视盘苍白的图像(N=45)进行测量,并与健康对照组(N=46)比较。
  • results: 我们开发了可自动量化视盘苍白程度的软件,发现苍白程度与 pRNFL 厚度在整体、颞下区、鼻侧/颞侧比值及整个视盘上均存在关联;患者组的苍白程度也显著更高。最后,我们验证了该分析对相机类型、图像格式和分辨率变化的稳健性。
    Abstract Purpose: To develop an automatic method of quantifying optic disc pallor in fundus photographs and determine associations with peripapillary retinal nerve fibre layer (pRNFL) thickness. Methods: We used deep learning to segment the optic disc, fovea, and vessels in fundus photographs, and measured pallor. We assessed the relationship between pallor and pRNFL thickness derived from optical coherence tomography scans in 118 participants. Separately, we used images diagnosed by clinical inspection as pale (N=45) and assessed how measurements compared to healthy controls (N=46). We also developed automatic rejection thresholds, and tested the software for robustness to camera type, image format, and resolution. Results: We developed software that automatically quantified disc pallor across several zones in fundus photographs. Pallor was associated with pRNFL thickness globally (β = -9.81 (SE = 3.16), p < 0.05), in the temporal inferior zone (β = -29.78 (SE = 8.32), p < 0.01), with the nasal/temporal ratio (β = 0.88 (SE = 0.34), p < 0.05), and in the whole disc (β = -8.22 (SE = 2.92), p < 0.05). Furthermore, pallor was significantly higher in the patient group. Lastly, we demonstrate the analysis to be robust to camera type, image format, and resolution. Conclusions: We developed software that automatically locates and quantifies disc pallor in fundus photographs and found associations between pallor measurements and pRNFL thickness. Translational relevance: We think our method will be useful for the identification, monitoring and progression of diseases characterized by disc pallor/optic atrophy, including glaucoma, compression, and potentially in neurodegenerative disorders.
    摘要 目的:开发一种在眼底照片中自动量化视盘苍白程度的方法,并确定其与视盘周围视网膜神经纤维层(pRNFL)厚度的关联。方法:我们使用深度学习分割眼底照片中的视盘、中心凹和血管,并测量苍白程度;在 118 名参与者中评估苍白程度与光学相干断层扫描所得 pRNFL 厚度之间的关系。另外,我们对经临床判读诊断为苍白的图像(N=45)进行测量,并与健康对照(N=46)比较;我们还开发了自动剔除阈值,并测试了软件对相机类型、图像格式和分辨率的稳健性。结果:我们开发的软件可在眼底照片的多个分区自动量化视盘苍白程度。苍白程度与 pRNFL 厚度在整体(β = -9.81,SE = 3.16,p < 0.05)、颞下区(β = -29.78,SE = 8.32,p < 0.01)、鼻侧/颞侧比值(β = 0.88,SE = 0.34,p < 0.05)及整个视盘(β = -8.22,SE = 2.92,p < 0.05)上均存在关联;患者组的苍白程度显著更高;该分析对相机类型、图像格式和分辨率具有稳健性。结论:我们开发了可在眼底照片中自动定位并量化视盘苍白程度的软件,并发现苍白测量值与 pRNFL 厚度相关。转化意义:我们认为该方法将有助于以视盘苍白/视神经萎缩为特征的疾病(包括青光眼、压迫性病变,乃至神经退行性疾病)的识别、监测与进展评估。

Cross-modal Generative Model for Visual-Guided Binaural Stereo Generation

  • paper_url: http://arxiv.org/abs/2311.07630
  • repo_url: None
  • paper_authors: Zhaojian Li, Bin Zhao, Yuan Yuan
  • for: 本研究提出一种基于生成对抗学习的视觉引导双耳立体声生成方法,以提供更具沉浸感的听觉体验。
  • methods: 该方法利用共享的时空视觉信息分别引导生成器与判别器工作;在生成对抗阶段,共享视觉信息交替更新,使生成器和判别器在视觉共享的同时传递各自的引导知识。
  • results: 该方法在 2 个数据集和 5 个评价指标上达到最先进性能,并能在实际应用中生成空间上逼真的双耳立体声。
    Abstract Binaural stereo audio is recorded by imitating the way the human ear receives sound, which provides people with an immersive listening experience. Existing approaches leverage autoencoders and directly exploit visual spatial information to synthesize binaural stereo, resulting in a limited representation of visual guidance. For the first time, we propose a visually guided generative adversarial approach for generating binaural stereo audio from mono audio. Specifically, we develop a Stereo Audio Generation Model (SAGM), which utilizes shared spatio-temporal visual information to guide the generator and the discriminator to work separately. The shared visual information is updated alternately in the generative adversarial stage, allowing the generator and discriminator to deliver their respective guided knowledge while visually sharing. The proposed method learns bidirectional complementary visual information, which facilitates the expression of visual guidance in generation. In addition, spatial perception is a crucial attribute of binaural stereo audio, and thus the evaluation of stereo spatial perception is essential. However, previous metrics failed to measure the spatial perception of audio. To this end, a metric to measure the spatial perception of audio is proposed for the first time. The proposed metric is capable of measuring the magnitude and direction of spatial perception in the temporal dimension. Further, considering its function, it is feasible to utilize it instead of demanding user studies to some extent. The proposed method achieves state-of-the-art performance on 2 datasets and 5 evaluation metrics. Qualitative experiments and user studies demonstrate that the method generates space-realistic stereo audio.
    摘要 双耳立体声通过模仿人耳接收声音的方式录制,为人们带来沉浸式的听觉体验。现有方法利用自编码器并直接使用视觉空间信息合成双耳立体声,对视觉引导的表达较为有限。我们首次提出一种视觉引导的生成对抗方法,从单声道音频生成双耳立体声。具体而言,我们开发了立体声生成模型(SAGM),利用共享的时空视觉信息分别引导生成器与判别器工作。共享视觉信息在生成对抗阶段交替更新,使生成器和判别器在视觉共享的同时传递各自的引导知识。该方法学习双向互补的视觉信息,有利于生成过程中视觉引导的表达。此外,空间感是双耳立体声的关键属性,因此对立体声空间感的评估必不可少;然而以往的指标无法衡量音频的空间感。为此,我们首次提出了一种衡量音频空间感的指标,能够在时间维度上衡量空间感的大小与方向;鉴于其功能,它在一定程度上可以替代用户研究。所提方法在 2 个数据集和 5 个评价指标上达到最先进性能,定性实验和用户研究表明该方法能生成空间上逼真的立体声。

MonoDiffusion: Self-Supervised Monocular Depth Estimation Using Diffusion Model

  • paper_url: http://arxiv.org/abs/2311.07198
  • repo_url: https://github.com/shuweishao/monodiffusion
  • paper_authors: Shuwei Shao, Zhongcai Pei, Weihai Chen, Dingchi Sun, Peter C. Y. Chen, Zhengguo Li
  • for: 本研究提出一种新的自监督单目深度估计框架 MonoDiffusion,将深度估计形式化为迭代去噪过程。
  • methods: 由于训练阶段没有深度真值,我们设计了伪真值扩散过程来辅助 MonoDiffusion 的扩散,并借助预训练教师模型施加蒸馏损失;此外还开发了掩码视觉条件机制以增强模型的去噪能力。
  • results: 在 KITTI 和 Make3D 数据集上的大量实验证明,MonoDiffusion 超越了此前最先进的自监督深度估计方法。源代码将在 https://github.com/shuweishao/monodiffusion 发布。
    Abstract Over the past few years, self-supervised monocular depth estimation that does not depend on ground-truth during the training phase has received widespread attention. Most efforts focus on designing different types of network architectures and loss functions or handling edge cases, e.g., occlusion and dynamic objects. In this work, we introduce a novel self-supervised depth estimation framework, dubbed MonoDiffusion, by formulating it as an iterative denoising process. Because the depth ground-truth is unavailable in the training phase, we develop a pseudo ground-truth diffusion process to assist the diffusion in MonoDiffusion. The pseudo ground-truth diffusion gradually adds noise to the depth map generated by a pre-trained teacher model. Moreover, the teacher model allows applying a distillation loss to guide the denoised depth. Further, we develop a masked visual condition mechanism to enhance the denoising ability of model. Extensive experiments are conducted on the KITTI and Make3D datasets and the proposed MonoDiffusion outperforms prior state-of-the-art competitors. The source code will be available at https://github.com/ShuweiShao/MonoDiffusion.
    摘要 近年来,训练阶段不依赖真值的自监督单目深度估计受到广泛关注。大多数工作集中于设计不同的网络架构与损失函数,或处理遮挡和动态物体等特殊情况。在本工作中,我们提出了一种新的自监督深度估计框架 MonoDiffusion,将其形式化为迭代去噪过程。由于训练阶段无法获得深度真值,我们设计了伪真值扩散过程来辅助 MonoDiffusion 中的扩散:伪真值扩散向预训练教师模型生成的深度图逐步添加噪声。此外,教师模型还允许施加蒸馏损失来引导去噪后的深度。我们进一步开发了掩码视觉条件机制,以增强模型的去噪能力。在 KITTI 和 Make3D 数据集上进行的大量实验表明,所提出的 MonoDiffusion 超越了此前最先进的竞争方法。源代码见 https://github.com/ShuweiShao/MonoDiffusion 。
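
The pseudo ground-truth diffusion can be sketched as a standard forward-noising process applied to the teacher's depth map; a denoiser conditioned on the RGB image then learns to invert it. The DDPM-style linear schedule below is a generic illustrative assumption, not necessarily the paper's schedule:

```python
# Sketch: forward-diffuse a pre-trained teacher's depth map, since no
# ground-truth depth is available for supervision.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)             # standard linear schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def noised_teacher_depth(teacher_depth, t):
    """Forward-diffuse the teacher's depth map to step t."""
    noise = torch.randn_like(teacher_depth)
    a = alphas_bar[t].sqrt()
    s = (1.0 - alphas_bar[t]).sqrt()
    return a * teacher_depth + s * noise, noise    # x_t and the target noise

teacher_depth = torch.rand(1, 1, 96, 320)          # placeholder teacher output
x_t, eps = noised_teacher_depth(teacher_depth, t=500)
# A denoising network conditioned on the RGB frame would be trained to
# recover the clean depth (or the noise) from x_t, with an additional
# distillation loss against the teacher as the abstract describes.
print(x_t.shape)
```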

Fitting tree model with CNN and geodesics to track vessels and application to Ultrasound Localization Microscopy data

  • paper_url: http://arxiv.org/abs/2311.07188
  • repo_url: None
  • paper_authors: Théo Bertrand, Laurent D. Cohen
  • for: 检测血管网络中的重要地标点(通过 CNN 对兴趣点进行定位与分类),并将血管表示为最小距离树图中的边。
  • methods: 利用与血管检测及其几何特性相关的测地线方法,并在位置与方向空间中工作,使 2D 血管能够被准确地表示为树结构。
  • results: 尽管精细标注的 ULM 数据稀缺是本研究的一大障碍,但由 ULM 数据构建的方向分数(Orientation Score)可为血管跟踪提供良好的测地线。
    Abstract Segmentation of tubular structures in vascular imaging is a well studied task, although it is rare that we try to infuse knowledge of the tree-like structure of the regions to be detected. Our work focuses on detecting the important landmarks in the vascular network (via CNN performing both localization and classification of the points of interest) and representing vessels as the edges in some minimal distance tree graph. We leverage geodesic methods relevant to the detection of vessels and their geometry, making use of the space of positions and orientations so that 2D vessels can be accurately represented as trees. We build our model to carry tracking on Ultrasound Localization Microscopy (ULM) data, proposing to build a good cost function for tracking on this type of data. We also test our framework on synthetic and eye fundus data. Results show that scarcity of well annotated ULM data is an obstacle to localization of vascular landmarks but the Orientation Score built from ULM data yields good geodesics for tracking blood vessels.
    摘要 血管成像中管状结构的分割是一项研究充分的任务,但很少有工作尝试将待检测区域的树状结构先验融入其中。我们的工作侧重于检测血管网络中的重要地标点(通过 CNN 同时完成兴趣点的定位与分类),并将血管表示为某个最小距离树图中的边。我们利用与血管及其几何特性检测相关的测地线方法,借助位置与方向空间,使 2D 血管能够被准确地表示为树。我们的模型面向超声定位显微成像(ULM)数据的跟踪任务,并为此类数据构建了一个良好的跟踪代价函数。我们还在合成数据和眼底数据上测试了该框架。结果表明,精细标注的 ULM 数据的稀缺是血管地标定位的一大障碍,但由 ULM 数据构建的方向分数(Orientation Score)可为血管跟踪提供良好的测地线。

Regenerating Arbitrary Video Sequences with Distillation Path-Finding

  • paper_url: http://arxiv.org/abs/2311.07170
  • repo_url: None
  • paper_authors: Thi-Ngoc-Hanh Le, Sheng-Yi Yao, Chun-Te Wu, Tong-Yee Lee
  • for: 这篇论文提供了一个交互式框架,帮助用户根据自己偏好的起始帧生成新的动画序列。
  • methods: 论文使用名为 RSFNet 的网络学习视频帧集的特征相关性,再使用一种新的路径搜索算法(SDPF)根据源视频的运动方向生成新的动画序列。
  • results: 该框架能够在卡通和自然场景中生成新的、更加合理自然的动画序列,效果超越了先前的研究工作和现有的商业应用。
    Abstract While video has long been a widespread form of visualization, the animation sequences within it serve as a means of storytelling. Producing an animation requires intensive human labor from skilled professional artists to obtain plausible animation in both content and motion direction, especially for animations with complex content, multiple moving objects, and dense movement. This paper presents an interactive framework to generate new sequences according to the users' preference on the starting frame. The critical contrast of our approach versus prior work and existing commercial applications is that novel sequences with arbitrary starting frame are produced by our system with a consistent degree in both content and motion direction. To achieve this effectively, we first learn the feature correlation on the frameset of the given video through a proposed network called RSFNet. Then, we develop a novel path-finding algorithm, SDPF, which formulates the knowledge of motion directions of the source video to estimate the smooth and plausible sequences. The extensive experiments show that our framework can produce new animations on the cartoon and natural scenes and advance prior works and commercial applications to enable users to obtain more predictable results.
    摘要 视频早已是一种广泛的可视化形式,而视频中的动画序列则承担着讲故事的功能。制作动画需要熟练专业艺术家投入大量人工劳动,才能获得内容与运动方向都合理的动画,对于内容复杂、存在多个运动物体和密集运动的动画尤其如此。本文提出了一个交互式框架,可根据用户对起始帧的偏好生成新的序列。与先前工作及现有商业应用的关键区别在于,我们的系统能够针对任意起始帧生成在内容与运动方向上保持一致的全新序列。为了有效实现这一点,我们首先通过所提出的 RSFNet 网络学习给定视频帧集的特征相关性;随后,我们开发了一种新的路径搜索算法 SDPF,利用源视频的运动方向知识来估计平滑且合理的序列。大量实验表明,我们的框架能够在卡通和自然场景中生成新动画,并超越先前工作与商业应用,让用户获得更可预期的结果。

NDDepth: Normal-Distance Assisted Monocular Depth Estimation and Completion

  • paper_url: http://arxiv.org/abs/2311.07166
  • repo_url: https://github.com/ShuweiShao/NDDepth
  • paper_authors: Shuwei Shao, Zhongcai Pei, Weihai Chen, Peter C. Y. Chen, Zhengguo Li
  • for: 本研究旨在提出一种基于物理(几何)的深度学习框架,用于单目深度估计和完成任务。
  • methods: 我们提出了一种新的深度估计和完成方法,即先估计表面法向和到原点距离图像,然后将其转换为深度图像。此外,我们还开发了一个额外的深度头,以增强方法的稳定性。
  • results: 我们在 NYU-Depth-v2、KITTI 和 SUN RGB-D 数据集上进行了广泛的实验,结果表明我们的方法在单目深度估计和补全任务中表现出色,超越了此前最先进的竞争方法。
    Abstract Over the past few years, monocular depth estimation and completion have been paid more and more attention from the computer vision community because of their widespread applications. In this paper, we introduce novel physics (geometry)-driven deep learning frameworks for these two tasks by assuming that 3D scenes are constituted with piece-wise planes. Instead of directly estimating the depth map or completing the sparse depth map, we propose to estimate the surface normal and plane-to-origin distance maps or complete the sparse surface normal and distance maps as intermediate outputs. To this end, we develop a normal-distance head that outputs pixel-level surface normal and distance. Meanwhile, the surface normal and distance maps are regularized by a developed plane-aware consistency constraint, which are then transformed into depth maps. Furthermore, we integrate an additional depth head to strengthen the robustness of the proposed frameworks. Extensive experiments on the NYU-Depth-v2, KITTI and SUN RGB-D datasets demonstrate that our method exceeds in performance prior state-of-the-art monocular depth estimation and completion competitors. The source code will be available at https://github.com/ShuweiShao/NDDepth.
    摘要 过去几年,单目深度估计与补全因其广泛的应用而受到计算机视觉领域越来越多的关注。本文假设 3D 场景由分片平面构成,为这两项任务提出了新颖的物理(几何)驱动深度学习框架。我们不直接估计深度图或补全稀疏深度图,而是提出估计表面法向图与平面到原点距离图,或补全稀疏的法向图与距离图,作为中间输出。为此,我们开发了一个输出像素级表面法向与距离的法向-距离头。同时,法向图与距离图受到我们设计的平面感知一致性约束的正则化,随后被转换为深度图。此外,我们还集成了一个额外的深度头以增强所提框架的稳健性。在 NYU-Depth-v2、KITTI 和 SUN RGB-D 数据集上的大量实验表明,我们的方法在性能上超越了此前最先进的单目深度估计与补全方法。代码见 https://github.com/ShuweiShao/NDDepth 。
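
The normal-distance parameterization converts to depth in closed form: for a pixel with homogeneous coordinates p̃ and plane equation n·X = d, the point X = z·K⁻¹p̃ gives z = d / (n·K⁻¹p̃). A minimal sketch of this conversion, with illustrative camera intrinsics:

```python
# Sketch: depth from per-pixel surface normals and plane-to-origin distances.
import numpy as np

def depth_from_normal_distance(normals, dists, K):
    """normals: (H, W, 3) unit normals; dists: (H, W) plane-to-origin distance."""
    h, w = dists.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Per-pixel ray direction K^{-1} [u, v, 1]^T.
    rays = np.stack([u, v, np.ones_like(u)], axis=-1) @ np.linalg.inv(K).T
    denom = (normals * rays).sum(axis=-1)
    return dists / np.clip(denom, 1e-6, None)      # (H, W) depth map

K = np.array([[500.0, 0, 160], [0, 500.0, 120], [0, 0, 1]])  # assumed intrinsics
normals = np.zeros((240, 320, 3)); normals[..., 2] = 1.0     # fronto-parallel plane
dists = np.full((240, 320), 2.0)                             # plane 2 m from origin
print(depth_from_normal_distance(normals, dists, K)[0, 0])   # ≈ 2.0
```

Because depth is derived rather than regressed directly, the piecewise-planar prior is baked into the output, which is the motivation the abstract gives for this parameterization.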

CycleGANAS: Differentiable Neural Architecture Search for CycleGAN

  • paper_url: http://arxiv.org/abs/2311.07162
  • repo_url: None
  • paper_authors: Taegun An, Changhee Joo
  • for: 这篇论文旨在为 CycleGAN 搜索神经网络架构,用于非配对的图像到图像转换任务。
  • methods: 该框架设计了由一系列简单的基于 ResNet 的单元堆叠而成的架构,并开发了一种能够有效探索大搜索空间的搜索方法。
  • results: 我们的框架不仅能有效地发现与原始 CycleGAN 性能相当或更优的高性能架构,还通过对每个转换方向分别搜索架构,成功解决了数据不均衡问题。
    Abstract We develop a Neural Architecture Search (NAS) framework for CycleGAN that carries out unpaired image-to-image translation task. Extending previous NAS techniques for Generative Adversarial Networks (GANs) to CycleGAN is not straightforward due to the task difference and greater search space. We design architectures that consist of a stack of simple ResNet-based cells and develop a search method that effectively explore the large search space. We show that our framework, called CycleGANAS, not only effectively discovers high-performance architectures that either match or surpass the performance of the original CycleGAN, but also successfully address the data imbalance by individual architecture search for each translation direction. To our best knowledge, it is the first NAS result for CycleGAN and shed light on NAS for more complex structures.
    摘要 我们为执行非配对图像到图像转换任务的 CycleGAN 开发了一个神经架构搜索(NAS)框架。由于任务不同且搜索空间更大,将以往针对生成对抗网络(GAN)的 NAS 技术直接推广到 CycleGAN 并不容易。我们设计了由一系列简单的基于 ResNet 的单元堆叠而成的架构,并开发了一种能够有效探索大搜索空间的搜索方法。我们证明,所提出的 CycleGANAS 框架不仅能有效地发现与原始 CycleGAN 性能相当或更优的高性能架构,还能通过对每个转换方向分别搜索架构来成功应对数据不均衡问题。据我们所知,这是首个针对 CycleGAN 的 NAS 成果,并为更复杂结构的 NAS 研究提供了启示。

Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection

  • paper_url: http://arxiv.org/abs/2311.07152
  • repo_url: https://github.com/HuangJunJie2017/BEVDet
  • paper_authors: Junjie Huang, Yun Ye, Zhujin Liang, Yi Shan, Dalong Du
  • for: 本文旨在提出一种新的 3D 目标检测方法,以解决 LiDAR-相机融合体系中的过拟合问题。
  • methods: 本文遵循“检测即标注”(Detecting As Labeling,DAL)的新视角,模仿数据标注过程构建了一个简单的预测流水线,并采用最简单的训练方式,以最小化依赖并提升可移植性。
  • results: 与现有方法相比,所提出的 DAL 方法在性能与速度-精度权衡上均具有明显优势,可作为未来研发与实际部署的理想基线。
    Abstract 3D object detection with LiDAR-camera fusion encounters overfitting in algorithm development, which derives from the violation of some fundamental rules. We revisit the data annotation process in dataset construction to complement the theory and argue that the regression task prediction should not involve the feature from the camera branch. By following the cutting-edge perspective of 'Detecting As Labeling', we propose a novel paradigm dubbed DAL. With the most classical elementary algorithms, a simple predicting pipeline is constructed by imitating the data annotation process. Then we train it in the simplest way to minimize its dependency and strengthen its portability. Though simple in construction and training, the proposed DAL paradigm not only substantially pushes the performance boundary but also provides a superior trade-off between speed and accuracy among all existing methods. With comprehensive superiority, DAL is an ideal baseline for both future work development and practical deployment. The code has been released to facilitate future work on https://github.com/HuangJunJie2017/BEVDet.
    摘要 基于 LiDAR-相机的 3D 目标检测在算法开发中面临过拟合问题,其根源在于对一些基本规则的违背。我们回到数据集构建中的数据标注过程来补全理论,并主张回归任务的预测不应使用来自相机分支的特征。遵循“检测即标注”(DAL)这一前沿视角,我们提出了一种新范式:利用最经典的基础算法,通过模仿数据标注过程构建一个简单的预测流水线,然后以最简单的方式训练,以最小化其依赖并增强其可移植性。尽管构建和训练都很简单,所提出的 DAL 范式不仅大幅推进了性能边界,还在所有现有方法中提供了更优的速度-精度权衡。凭借全面的优势,DAL 是未来工作研发与实际部署的理想基线。代码已发布于 https://github.com/HuangJunJie2017/BEVDet ,以便后续研究。

PadChannel: Improving CNN Performance through Explicit Padding Encoding

  • paper_url: http://arxiv.org/abs/2311.07623
  • repo_url: https://github.com/aussieseaweed/pad-channel
  • paper_authors: Juho Kim
  • for: 通过将填充状态显式编码为额外输入通道,使 CNN 能够轻松区分真实像素与填充区域,从而提升 CNN 的特征提取质量。
  • methods: 提出 PadChannel 填充方法,将填充状态编码为一个额外的输入通道。
  • results: 将 PadChannel 集成到多个主流 CNN 架构后,在 ImageNet-1K 图像分类任务上以极小的计算开销实现了小幅性能提升,并显著降低了方差。
    Abstract In convolutional neural networks (CNNs), padding plays a pivotal role in preserving spatial dimensions throughout the layers. Traditional padding techniques do not explicitly distinguish between the actual image content and the padded regions, potentially causing CNNs to incorrectly interpret the boundary pixels or regions that resemble boundaries. This ambiguity can lead to suboptimal feature extraction. To address this, we propose PadChannel, a novel padding method that encodes padding statuses as an additional input channel, enabling CNNs to easily distinguish genuine pixels from padded ones. By incorporating PadChannel into several prominent CNN architectures, we observed small performance improvements and notable reductions in the variances on the ImageNet-1K image classification task at marginal increases in the computational cost. The source code is available at https://github.com/AussieSeaweed/pad-channel
    摘要 在卷积神经网络(CNN)中,填充对于在各层之间保持空间维度起着关键作用。传统的填充技术没有显式区分真实图像内容与填充区域,可能导致 CNN 错误地解读边界像素或与边界相似的区域,这种歧义会造成次优的特征提取。为解决这一问题,我们提出了 PadChannel,一种将填充状态编码为额外输入通道的新型填充方法,使 CNN 能够轻松区分真实像素与填充像素。将 PadChannel 集成到多个主流 CNN 架构后,我们在 ImageNet-1K 图像分类任务上以极小的计算开销观察到小幅性能提升和方差的显著降低。源代码见 https://github.com/AussieSeaweed/pad-channel 。
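
The core operation is simple enough to sketch directly: pad the tensor, then append a 0/1 indicator channel marking genuine pixels. Padding size and input shape below are illustrative; the repository's exact implementation may differ:

```python
# Sketch of PadChannel's idea: make padding status an explicit input channel
# so convolutions can tell real content from padded regions.
import torch
import torch.nn.functional as F

def pad_with_channel(x: torch.Tensor, pad: int) -> torch.Tensor:
    """x: (B, C, H, W) -> (B, C + 1, H + 2*pad, W + 2*pad)."""
    x_padded = F.pad(x, (pad, pad, pad, pad), value=0.0)
    mask = torch.zeros_like(x_padded[:, :1])       # one extra channel
    mask[:, :, pad:-pad, pad:-pad] = 1.0           # 1 = real pixel, 0 = padding
    return torch.cat([x_padded, mask], dim=1)

x = torch.randn(2, 3, 32, 32)
y = pad_with_channel(x, pad=1)
print(y.shape)        # torch.Size([2, 4, 34, 34])
# The first convolution then takes C + 1 input channels; later layers can
# treat the indicator as just another feature map.
```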

Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification

  • paper_url: http://arxiv.org/abs/2311.07125
  • repo_url: https://github.com/dazhangyu123/acmil
  • paper_authors: Yunlong Zhang, Honglin Li, Yuxuan Sun, Sunyi Zheng, Chenglu Zhu, Lin Yang
  • for: 本研究旨在解决多个实例学习(MIL)方法在整个扫描图像(WSI)分析中遇到的过拟合问题。
  • methods: 本研究提出了 Attention-Challenging MIL(ACMIL)方法,旨在迫使注意力机制捕捉更具挑战性的预测实例。ACMIL 采用两种技术:多分支注意力(MBA)与随机 Top-K 实例掩蔽(STKIM)。
  • results: 在三个 WSI 数据集上的评估中,ACMIL 表现出色,超越了现有方法。此外,本研究通过热图可视化、UMAP 可视化和注意力值统计,全面展示了 ACMIL 在克服过拟合方面的效果。代码可以在 \url{https://github.com/dazhangyu123/ACMIL} 上获取。
    Abstract Overfitting remains a significant challenge in the application of Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) analysis. Visualizing heatmaps reveals that current MIL methods focus on a subset of predictive instances, hindering effective model generalization. To tackle this, we propose Attention-Challenging MIL (ACMIL), aimed at forcing the attention mechanism to capture more challenging predictive instances. ACMIL incorporates two techniques, Multiple Branch Attention (MBA) to capture richer predictive instances and Stochastic Top-K Instance Masking (STKIM) to suppress simple predictive instances. Evaluation on three WSI datasets outperforms state-of-the-art methods. Additionally, through heatmap visualization, UMAP visualization, and attention value statistics, this paper comprehensively illustrates ACMIL's effectiveness in overcoming the overfitting challenge. The source code is available at \url{https://github.com/dazhangyu123/ACMIL}.
    摘要 在将多示例学习(MIL)方法应用于全切片图像(WSI)分析时,过拟合仍是一个重大挑战。热图可视化显示,当前的 MIL 方法只关注一小部分预测性实例,阻碍了模型的有效泛化。为此,我们提出 Attention-Challenging MIL(ACMIL),旨在迫使注意力机制捕捉更具挑战性的预测实例。ACMIL 包含两种技术:用于捕捉更丰富预测实例的多分支注意力(MBA),以及用于抑制简单预测实例的随机 Top-K 实例掩蔽(STKIM)。在三个 WSI 数据集上的评估超越了现有最先进方法。此外,本文通过热图可视化、UMAP 可视化和注意力值统计,全面展示了 ACMIL 在克服过拟合挑战方面的有效性。源代码见 \url{https://github.com/dazhangyu123/ACMIL} 。
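
A plausible reading of Stochastic Top-K Instance Masking is sketched below: during training, the k instances with the highest attention are randomly suppressed so the attention is pushed toward harder predictive instances. The hyperparameters and the exact masking rule are illustrative assumptions based on the abstract, not the released implementation:

```python
# Sketch of STKIM: stochastically mask the top-k attended instances in a bag.
import torch

def stkim(attn_scores: torch.Tensor, k: int = 10, p: float = 0.5) -> torch.Tensor:
    """attn_scores: (N,) unnormalized attention over a bag's instances."""
    scores = attn_scores.clone()
    topk = scores.topk(k).indices
    drop = torch.rand(k) < p                       # mask each top instance w.p. p
    scores[topk[drop]] = float("-inf")             # excluded from the softmax
    return torch.softmax(scores, dim=0)

scores = torch.randn(500)                          # e.g. 500 WSI patches in a bag
weights = stkim(scores)
print(weights.sum())                               # tensor(1.) — valid attention
```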

SpectralGPT: Spectral Foundation Model

  • paper_url: http://arxiv.org/abs/2311.07113
  • repo_url: None
  • paper_authors: Danfeng Hong, Bing Zhang, Xuyang Li, Yuxuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Xiuping Jia, Antonio Plaza, Gamba Paolo, Jon Atli Benediktsson, Jocelyn Chanussot
  • for: 本研究旨在构建一个能够处理光谱遥感(RS)图像的通用基础模型,以挖掘光谱 RS 大数据的应用潜力。
  • methods: 本研究提出了名为 SpectralGPT 的新型 3D 生成式预训练 Transformer(GPT),可以以渐进式训练方式处理不同尺寸、分辨率、时间序列和区域的光谱 RS 图像。
  • results: 评估结果显示,预训练后的 SpectralGPT 模型拥有超过 6 亿个参数,并在四个下游任务(单/多标签场景分类、语义分割和变化检测)中带来显著的性能提升。
    Abstract The foundation model has recently garnered significant attention due to its potential to revolutionize the field of visual representation learning in a self-supervised manner. While most foundation models are tailored to effectively process RGB images for various visual tasks, there is a noticeable gap in research focused on spectral data, which offers valuable information for scene understanding, especially in remote sensing (RS) applications. To fill this gap, we created for the first time a universal RS foundation model, named SpectralGPT, which is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT). Compared to existing foundation models, SpectralGPT 1) accommodates input images with varying sizes, resolutions, time series, and regions in a progressive training fashion, enabling full utilization of extensive RS big data; 2) leverages 3D token generation for spatial-spectral coupling; 3) captures spectrally sequential patterns via multi-target reconstruction; 4) trains on one million spectral RS images, yielding models with over 600 million parameters. Our evaluation highlights significant performance improvements with pretrained SpectralGPT models, signifying substantial potential in advancing spectral RS big data applications within the field of geoscience across four downstream tasks: single/multi-label scene classification, semantic segmentation, and change detection.
    摘要 基础模型因其有望以自监督方式革新视觉表示学习而备受关注。现有基础模型大多面向 RGB 图像的各类视觉任务,而针对光谱数据的研究仍存在明显空白;光谱数据蕴含对场景理解十分宝贵的信息,在遥感(RS)应用中尤为重要。为填补这一空白,我们首次构建了一个通用的 RS 基础模型 SpectralGPT,它基于一种新型 3D 生成式预训练 Transformer(GPT),专为处理光谱 RS 图像而设计。与现有基础模型相比,SpectralGPT:1)以渐进式训练方式兼容不同尺寸、分辨率、时间序列和区域的输入图像,充分利用海量 RS 大数据;2)利用 3D token 生成实现空间-光谱耦合;3)通过多目标重建捕捉光谱上的序列模式;4)在一百万张光谱 RS 图像上训练,得到参数超过 6 亿的模型。评估结果显示,预训练的 SpectralGPT 模型带来显著的性能提升,表明其在地球科学领域的四个下游任务(单/多标签场景分类、语义分割和变化检测)中推动光谱 RS 大数据应用的巨大潜力。

CLiF-VQA: Enhancing Video Quality Assessment by Incorporating High-Level Semantic Information related to Human Feelings

  • paper_url: http://arxiv.org/abs/2311.07090
  • repo_url: None
  • paper_authors: Yachun Mi, Yu Li, Yan Shu, Chen Hui, Puchao Zhou, Shaohui Liu
  • for: 这篇论文主要关注视频质量评估(VQA)领域,旨在模拟人类视觉系统(HVS)对视频质量的评估。
  • methods: 该论文提出了一种新的 VQA 方法,即 CLiF-VQA,它同时考虑与人类情感相关的特征和视频的空间特征。为了有效提取视频中与人类情感相关的特征,该方法首次探索了 CLIP 与人类情感在视频感知中的一致性:设计多个与人类情感密切相关的客观和主观描述作为提示,并提出一种基于 CLIP 的语义特征提取器(SFE),通过在视频帧的多个区域上滑动来提取与人类情感相关的特征。
  • results: 该论文的实验结果表明,提出的 CLiF-VQA 方法在多个 VQA 数据集上表现出色。
    Abstract Video Quality Assessment (VQA) aims to simulate the process of perceiving video quality by the human visual system (HVS). The judgments made by HVS are always influenced by human subjective feelings. However, most of the current VQA research focuses on capturing various distortions in the spatial and temporal domains of videos, while ignoring the impact of human feelings. In this paper, we propose CLiF-VQA, which considers both features related to human feelings and spatial features of videos. In order to effectively extract features related to human feelings from videos, we explore the consistency between CLIP and human feelings in video perception for the first time. Specifically, we design multiple objective and subjective descriptions closely related to human feelings as prompts. Further we propose a novel CLIP-based semantic feature extractor (SFE) which extracts features related to human feelings by sliding over multiple regions of the video frame. In addition, we further capture the low-level-aware features of the video through a spatial feature extraction module. The two different features are then aggregated thereby obtaining the quality score of the video. Extensive experiments show that the proposed CLiF-VQA exhibits excellent performance on several VQA datasets.
    摘要 视频质量评估(VQA)旨在模拟人类视觉系统(HVS)感知视频质量的过程。HVS 的判断总是受到人类主观感受的影响。然而,目前大多数 VQA 研究侧重于捕捉视频空间域和时间域中的各种失真,而忽略了人类感受的影响。本文提出 CLiF-VQA,同时考虑与人类感受相关的特征和视频的空间特征。为了有效地从视频中提取与人类感受相关的特征,我们首次探索了 CLIP 与人类感受在视频感知中的一致性。具体而言,我们设计了多个与人类感受密切相关的客观和主观描述作为提示。进一步地,我们提出了一种新颖的基于 CLIP 的语义特征提取器(SFE),通过在视频帧的多个区域上滑动来提取与人类感受相关的特征。此外,我们还通过空间特征提取模块捕捉视频的低层感知特征。随后将两种不同的特征聚合,从而得到视频的质量分数。大量实验表明,所提出的 CLiF-VQA 在多个 VQA 数据集上表现优异。

GazeForensics: DeepFake Detection via Gaze-guided Spatial Inconsistency Learning

  • paper_url: http://arxiv.org/abs/2311.07075
  • repo_url: None
  • paper_authors: Qinlin He, Chunlei Peng, Dechuang Liu, Nannan Wang, Xinbo Gao
  • for: 这个研究旨在提高深伪检测的精度,以保护人们的隐私和公共安全。
  • methods: 本研究利用 3D 视线估计模型获得视线表示,用以正则化深伪检测模型中的对应表示,并结合通用特征以进一步提升模型的表现。
  • results: 实验结果显示,所提出的 GazeForensics 方法优于目前最先进的方法。
    Abstract DeepFake detection is pivotal in personal privacy and public safety. With the iterative advancement of DeepFake techniques, high-quality forged videos and images are becoming increasingly deceptive. Prior research has seen numerous attempts by scholars to incorporate biometric features into the field of DeepFake detection. However, traditional biometric-based approaches tend to segregate biometric features from general ones and freeze the biometric feature extractor. These approaches resulted in the exclusion of valuable general features, potentially leading to a performance decline and, consequently, a failure to fully exploit the potential of biometric information in assisting DeepFake detection. Moreover, insufficient attention has been dedicated to scrutinizing gaze authenticity within the realm of DeepFake detection in recent years. In this paper, we introduce GazeForensics, an innovative DeepFake detection method that utilizes gaze representation obtained from a 3D gaze estimation model to regularize the corresponding representation within our DeepFake detection model, while concurrently integrating general features to further enhance the performance of our model. Experiment results reveal that our proposed GazeForensics outperforms the current state-of-the-art methods.
    摘要 深度伪造(DeepFake)检测对个人隐私和公共安全至关重要。随着深度伪造技术的不断迭代,高质量的伪造视频和图像变得越来越具有欺骗性。在先前的研究中,学者们曾多次尝试将生物特征引入深度伪造检测领域。然而,传统的基于生物特征的方法往往将生物特征与通用特征分离,并冻结生物特征提取器。这些方法排除了有价值的通用特征,可能导致性能下降,从而无法充分发挥生物信息在辅助深度伪造检测中的潜力。此外,近年来深度伪造检测领域对视线真实性的审视也不够充分。本文提出了一种名为 GazeForensics 的创新深度伪造检测方法,该方法利用从 3D 视线估计模型获得的视线表示来正则化我们深度伪造检测模型中的相应表示,同时融入通用特征以进一步提升模型性能。实验结果表明,我们提出的 GazeForensics 优于当前最先进的方法。

$L_0$-Sampler: An $L_{0}$ Model Guided Volume Sampling for NeRF

  • paper_url: http://arxiv.org/abs/2311.07044
  • repo_url: None
  • paper_authors: Liangchen Li, Juyong Zhang
  • for: This paper aims to improve the Neural Radiance Fields (NeRF) method by proposing a new sampling strategy called $L_0$-Sampler.
  • methods: The $L_0$-Sampler incorporates the $L_0$ model into the weight function $w(t)$ to guide the sampling process, using piecewise exponential functions for interpolation.
  • results: The proposed $L_0$-Sampler achieves stable performance improvements in NeRF and related tasks such as 3D reconstruction, and can be easily implemented with only a few lines of code.
    Abstract Since being proposed, Neural Radiance Fields (NeRF) have achieved great success in related tasks, mainly adopting the hierarchical volume sampling (HVS) strategy for volume rendering. However, the HVS of NeRF approximates distributions using piecewise constant functions, which provides a relatively rough estimation. Based on the observation that a well-trained weight function $w(t)$ and the $L_0$ distance between points and the surface have very high similarity, we propose $L_0$-Sampler by incorporating the $L_0$ model into $w(t)$ to guide the sampling process. Specifically, we propose to use piecewise exponential functions rather than piecewise constant functions for interpolation, which can not only approximate quasi-$L_0$ weight distributions along rays quite well but also can be easily implemented with few lines of code without additional computational burden. Stable performance improvements can be achieved by applying $L_0$-Sampler to NeRF and its related tasks like 3D reconstruction. Code is available at https://ustc3dv.github.io/L0-Sampler/ .
    摘要 自提出以来,神经辐射场(NeRF)在相关任务中取得了巨大成功,其体渲染主要采用分层体采样(HVS)策略。然而,NeRF 的 HVS 使用分段常数函数来近似分布,估计相对粗糙。基于训练良好的权重函数 $w(t)$ 与点到表面的 $L_0$ 距离高度相似这一观察,我们提出 $L_0$-Sampler,将 $L_0$ 模型融入 $w(t)$ 以指导采样过程。具体而言,我们提出使用分段指数函数而非分段常数函数进行插值,这不仅能很好地近似沿光线的准 $L_0$ 权重分布,而且只需几行代码即可实现,不会带来额外计算负担。将 $L_0$-Sampler 应用于 NeRF 及其相关任务(如三维重建)可以获得稳定的性能提升。代码可在 https://ustc3dv.github.io/L0-Sampler/ 获取。
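
For intuition, the sketch below shows inverse-transform sampling from a piecewise exponential weight curve in NumPy: within each bin the weight is interpolated as $w(t) = w_i \cdot (w_{i+1}/w_i)^u$ (with $u$ the normalized offset) instead of being held constant. The function name, dense-grid resolution, and toy weights are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sample_piecewise_exponential(t_edges, w_edges, n_samples, eps=1e-6):
    """Draw ray samples from a piecewise *exponential* weight curve w(t),
    rather than the piecewise-constant curve used by NeRF's hierarchical
    volume sampling. Hypothetical sketch of the L0-Sampler idea."""
    w_edges = np.maximum(np.asarray(w_edges, dtype=float), eps)
    ts, ws = [], []
    for i in range(len(t_edges) - 1):
        t = np.linspace(t_edges[i], t_edges[i + 1], 32, endpoint=False)
        u = (t - t_edges[i]) / (t_edges[i + 1] - t_edges[i])
        ts.append(t)
        # Exponential interpolation between the two bin-edge weights.
        ws.append(w_edges[i] * (w_edges[i + 1] / w_edges[i]) ** u)
    ts, ws = np.concatenate(ts), np.concatenate(ws)
    cdf = np.cumsum(ws)
    cdf /= cdf[-1]
    # Inverse-transform sampling: samples concentrate where w(t) is large.
    return np.interp(np.random.rand(n_samples), cdf, ts)

# Toy usage: weights sharply peaked near a surface at t ~ 2.0.
t_edges = np.linspace(0.0, 4.0, 9)
w_edges = np.exp(-((t_edges - 2.0) ** 2) / 0.1)
print(sample_piecewise_exponential(t_edges, w_edges, 5))
```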

Open-Vocabulary Video Anomaly Detection

  • paper_url: http://arxiv.org/abs/2311.07042
  • repo_url: None
  • paper_authors: Peng Wu, Xuerong Zhou, Guansong Pang, Yujia Sun, Jing Liu, Peng Wang, Yanning Zhang
  • for: 该论文旨在解决开放词汇视频异常检测(OVVAD)问题,即借助预训练大模型来检测并分类已知和未知的异常。
  • methods: 该论文提出将 OVVAD 任务分解为两个互补的子任务——类别无关的检测和类别特定的分类——并对两者进行联合优化。具体而言,论文提出了一个语义知识注入模块,将大型语言模型中的语义知识引入检测任务;并提出了一个异常合成模块,借助大型视觉生成模型生成伪未知异常视频,以便在分类任务中更好地检测和分类各种已知与未知异常。
  • results: 实验结果表明,该模型在三个常用基准上取得了 OVVAD 任务的最先进表现。
    Abstract Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal. However, current approaches are inherently limited to a closed-set setting and may struggle in open-world applications where there can be anomaly categories in the test data unseen during training. A few recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos. However, such a setting focuses on predicting frame anomaly scores, having no ability to recognize the specific categories of anomalies, despite the fact that this ability is essential for building more informed video surveillance systems. This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies. To this end, we propose a model that decouples OVVAD into two mutually complementary tasks -- class-agnostic detection and class-specific classification -- and jointly optimizes both tasks. Particularly, we devise a semantic knowledge injection module to introduce semantic knowledge from large language models for the detection task, and design a novel anomaly synthesis module to generate pseudo unseen anomaly videos with the help of large vision generation models for the classification task. These semantic knowledge and synthesis anomalies substantially extend our model's capability in detecting and categorizing a variety of seen and unseen anomalies. Extensive experiments on three widely-used benchmarks demonstrate our model achieves state-of-the-art performance on OVVAD task.
    摘要 视频异常检测(VAD)通过弱监督取得了非常出色的表现,可以仅凭视频级别的标签来判断视频帧是否正常。然而,现有方法本质上局限于闭集设定,在开放世界应用中可能难以应对测试数据中训练阶段未见过的异常类别。一些最近的研究尝试解决更加现实的设定——开放集 VAD,即在给定已知异常和正常视频的情况下检测未知异常。然而,这种设定仅预测帧级异常分数,无法识别异常的具体类别,而这种能力对构建更具信息量的视频监控系统至关重要。本文更进一步,探索开放词汇视频异常检测(OVVAD),希望借助预训练大模型来检测和分类已知与未知的异常。为此,我们提出一个模型,将 OVVAD 解耦为两个互补的任务——类别无关的检测和类别特定的分类——并对两个任务进行联合优化。特别地,我们设计了一个语义知识注入模块,将大型语言模型中的语义知识引入检测任务;并设计了一个新颖的异常合成模块,借助大型视觉生成模型生成伪未知异常视频用于分类任务。这些语义知识和合成异常显著扩展了模型检测和分类各种已知与未知异常的能力。在三个常用基准上的大量实验表明,我们的模型在 OVVAD 任务上取得了最先进的表现。

Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval

  • paper_url: http://arxiv.org/abs/2311.07622
  • repo_url: None
  • paper_authors: Junyang Chen, Hanjiang Lai
  • for: This paper focuses on zero-shot composed image retrieval (ZS-CIR), which aims to retrieve a target image based on textual modifications to a reference image without triplet labeling.
  • methods: The paper introduces a novel unlabeled and pre-trained masked tuning approach to reduce the gap between the pre-trained model and the downstream CIR task. The approach uses the text and the masked image to learn the modifications of the original image.
  • results: The approach significantly outperforms the baseline models on three ZS-CIR datasets, including FashionIQ, CIRR, and CIRCO.
    Abstract Zero-shot composed image retrieval (ZS-CIR), which aims to retrieve a target image based on textual modifications to a reference image without triplet labeling, has gained more and more attention. Current ZS-CIR research mainly relies on two unlabeled pre-trained models: the vision-language model, e.g., CLIP, and the Pic2Word/textual inversion model. However, the pre-trained models and CIR tasks have substantial discrepancies, where the pre-trained models learn the similarities between vision and language but CIR aims to learn the modifications of the image guided by text. In this paper, we introduce a novel unlabeled and pre-trained masked tuning approach to reduce the gap between the pre-trained model and the downstream CIR task. We first reformulate the pre-trained vision-language contrastive learning as the CIR task, where we randomly mask input image patches to generate $\langle$masked image, text, image$\rangle$ triple from an image-text pair. Then, we propose a masked tuning, which uses the text and the masked image to learn the modifications of the original image. With such a simple design, it can learn to capture fine-grained text-guided modifications. Extensive experimental results demonstrate the significant superiority of our approach over the baseline models on three ZS-CIR datasets, including FashionIQ, CIRR, and CIRCO.
    摘要 零样本组合图像检索(ZS-CIR)旨在基于对参照图像的文本修改来检索目标图像,而无需三元组标注,正吸引越来越多的关注。当前 ZS-CIR 研究主要基于两类无标注预训练模型:视觉语言模型(例如 CLIP)以及 Pic2Word/文本反转模型。然而,预训练模型与 CIR 任务之间存在显著差异:预训练模型学习视觉与语言之间的相似性,而 CIR 任务则要学习由文本引导的图像修改。在本文中,我们提出一种新颖的无标注预训练掩码调优方法,以缩小预训练模型与下游 CIR 任务之间的差距。我们首先将预训练的视觉语言对比学习重新形式化为 CIR 任务:随机掩蔽输入图像块,从图文对中生成 $\langle$掩码图像、文本、原始图像$\rangle$ 三元组。随后,我们提出掩码调优,利用文本和掩码图像来学习原始图像的修改。凭借如此简单的设计,模型可以学习捕捉细粒度的文本引导修改。在 FashionIQ、CIRR 和 CIRCO 三个 ZS-CIR 数据集上的大量实验表明,我们的方法显著优于基线模型。
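
As a concrete illustration of the triple construction, the PyTorch snippet below zeroes out random image patches to build the masked view of a $\langle$masked image, text, image$\rangle$ triple; the patch size, mask ratio, and zero-filling choice are illustrative assumptions rather than the paper's exact masking scheme.

```python
import torch

def mask_patches(image, patch=16, mask_ratio=0.75):
    """Build the masked view of a <masked image, text, image> triple by
    zeroing out randomly selected patches of the input image."""
    c, h, w = image.shape
    gh, gw = h // patch, w // patch
    keep = torch.rand(gh, gw) > mask_ratio                     # True = keep patch
    mask = keep.repeat_interleave(patch, 0).repeat_interleave(patch, 1)
    return image * mask.unsqueeze(0)

# Toy usage: the triple can then be fed to CLIP-style contrastive training.
img = torch.rand(3, 224, 224)
triple = (mask_patches(img), "a red version of the dress", img)
```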

TTMFN: Two-stream Transformer-based Multimodal Fusion Network for Survival Prediction

  • paper_url: http://arxiv.org/abs/2311.07033
  • repo_url: None
  • paper_authors: Ruiquan Ge, Xiangyang Hu, Rungen Huang, Gangyong Jia, Yaqi Wang, Renshu Gu, Changmiao Wang, Elazab Ahmed, Linyan Wang, Juan Ye, Ye Li
  • for: 预测癌症患者生存时间的研究
  • methods: 提议一种基于深度学习的两树式多modal合并网络(TTMFN),将生物 PATHOLOGICAL 图像和基因表达数据融合以提高预测性能
  • results: TTMFN 在四个来自 The Cancer Genome Atlas 的数据集上实现了最佳或与状态艺术方法相当的预测结果,提高了患者生存时间的预测精度
    Abstract Survival prediction plays a crucial role in assisting clinicians with the development of cancer treatment protocols. Recent evidence shows that multimodal data can help in the diagnosis of cancer disease and improve survival prediction. Currently, deep learning-based approaches have experienced increasing success in survival prediction by integrating pathological images and gene expression data. However, most existing approaches overlook the intra-modality latent information and the complex inter-modality correlations. Furthermore, existing modalities do not fully exploit the immense representational capabilities of neural networks for feature aggregation and disregard the importance of relationships between features. Therefore, it is highly recommended to address these issues in order to enhance the prediction performance by proposing a novel deep learning-based method. We propose a novel framework named Two-stream Transformer-based Multimodal Fusion Network for survival prediction (TTMFN), which integrates pathological images and gene expression data. In TTMFN, we present a two-stream multimodal co-attention transformer module to take full advantage of the complex relationships between different modalities and the potential connections within the modalities. Additionally, we develop a multi-head attention pooling approach to effectively aggregate the feature representations of the two modalities. The experiment results on four datasets from The Cancer Genome Atlas demonstrate that TTMFN can achieve the best performance or competitive results compared to the state-of-the-art methods in predicting the overall survival of patients.
    摘要 生存预测在协助临床医生制定癌症治疗方案中发挥着关键作用。最新证据表明,多模态数据有助于癌症诊断并改进生存预测。目前,基于深度学习的方法通过整合病理图像和基因表达数据,在生存预测中取得了越来越多的成功。然而,大多数现有方法忽视了模态内部的潜在信息以及复杂的模态间关联。此外,现有方法未能充分利用神经网络强大的表示能力进行特征聚合,也忽视了特征之间关系的重要性。因此,有必要通过提出一种新的深度学习方法来解决这些问题,以提升预测性能。我们提出了一个名为双流 Transformer 多模态融合网络(TTMFN)的新框架,用于整合病理图像和基因表达数据。在 TTMFN 中,我们提出了一个双流多模态协同注意力 Transformer 模块,以充分利用不同模态之间的复杂关系以及模态内部的潜在联系。此外,我们开发了一种多头注意力池化方法,以有效聚合两种模态的特征表示。在 The Cancer Genome Atlas 的四个数据集上的实验结果表明,TTMFN 在预测患者总体生存方面可以取得最优或具有竞争力的结果。
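
As an illustration of the feature-aggregation step, the PyTorch sketch below implements a common multi-head attention pooling pattern, in which a learnable query attends over a set of token features and condenses them into one vector; the class name, head count, and dimensions are illustrative assumptions, not necessarily the paper's exact module.

```python
import torch
import torch.nn as nn

class MultiHeadAttentionPool(nn.Module):
    """Pool a variable-length set of token features into a single vector
    using multi-head attention with a learnable query."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens):                    # tokens: (B, N, dim)
        q = self.query.expand(tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, tokens, tokens)  # (B, 1, dim)
        return pooled.squeeze(1)

# Toy usage: pool 100 patch/gene features into one representation per sample.
pool = MultiHeadAttentionPool(dim=256)
print(pool(torch.randn(2, 100, 256)).shape)      # torch.Size([2, 256])
```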

PICS in Pics: Physics Informed Contour Selection for Rapid Image Segmentation

  • paper_url: http://arxiv.org/abs/2311.07002
  • repo_url: None
  • paper_authors: Vikas Dwivedi, Balaji Srinivasan, Ganapathy Krishnamurthi
  • for: 这篇论文的目的是提出一种可解释的深度图像分割模型训练方法,并且不需要大量、高品质的标注数据。
  • methods: 这篇论文使用了 Physics Informed Contour Selection(PICS)算法,它是一种可解释的、以物理为指导的图像分割算法,结合了物理信息神经网络(PINNs)和主动轮廓模型(snake)。PICS 使用三次样条代替深度神经网络作为基函数,因此快速且计算轻量。
  • results: 这篇论文通过实验显示 PICS 可以快速且高效地完成 3D 图像分割,并且可以借助迁移学习来加速分割。PICS 还引入了一个新的保凸损失项,利用左心室的形状信息提升分割质量。总体而言,PICS 在网络架构、迁移学习和物理启发损失等方面具有多项创新,展示了良好的效果与进一步改进的潜力。
    Abstract Effective training of deep image segmentation models is challenging due to the need for abundant, high-quality annotations. Generating annotations is laborious and time-consuming for human experts, especially in medical image segmentation. To facilitate image annotation, we introduce Physics Informed Contour Selection (PICS) - an interpretable, physics-informed algorithm for rapid image segmentation without relying on labeled data. PICS draws inspiration from physics-informed neural networks (PINNs) and an active contour model called snake. It is fast and computationally lightweight because it employs cubic splines instead of a deep neural network as a basis function. Its training parameters are physically interpretable because they directly represent control knots of the segmentation curve. Traditional snakes involve minimization of the edge-based loss functionals by deriving the Euler-Lagrange equation followed by its numerical solution. However, PICS directly minimizes the loss functional, bypassing the Euler Lagrange equations. It is the first snake variant to minimize a region-based loss function instead of traditional edge-based loss functions. PICS uniquely models the three-dimensional (3D) segmentation process with an unsteady partial differential equation (PDE), which allows accelerated segmentation via transfer learning. To demonstrate its effectiveness, we apply PICS for 3D segmentation of the left ventricle on a publicly available cardiac dataset. While doing so, we also introduce a new convexity-preserving loss term that encodes the shape information of the left ventricle to enhance PICS's segmentation quality. Overall, PICS presents several novelties in network architecture, transfer learning, and physics-inspired losses for image segmentation, thereby showing promising outcomes and potential for further refinement.
    摘要 由于需要大量高质量的标注,深度图像分割模型的有效训练非常困难。对人类专家而言,生成标注费时费力,在医学图像分割中尤为如此。为了简化图像标注,我们提出了物理信息轮廓选择(PICS)——一种可解释的、由物理信息引导的快速图像分割算法,无需依赖标注数据。PICS 的灵感来自物理信息神经网络(PINNs)和称为 snake 的主动轮廓模型。它使用三次样条而非深度神经网络作为基函数,因此快速且计算轻量。其训练参数具有明确的物理含义,因为它们直接对应分割曲线的控制节点。传统的 snake 通过推导欧拉-拉格朗日方程并对其进行数值求解来最小化基于边缘的损失泛函;而 PICS 直接最小化损失泛函,绕过了欧拉-拉格朗日方程。它也是首个最小化基于区域的损失函数(而非传统基于边缘的损失函数)的 snake 变体。PICS 独特地用一个非定常偏微分方程(PDE)来建模三维(3D)分割过程,从而可以借助迁移学习加速分割。为了证明其有效性,我们将 PICS 应用于一个公开的心脏数据集上的左心室 3D 分割。在此过程中,我们还引入了一个新的保凸损失项,编码左心室的形状信息以提升 PICS 的分割质量。总体而言,PICS 在网络架构、迁移学习和物理启发损失等图像分割方面展现了多项创新,取得了令人鼓舞的结果,并具有进一步改进的潜力。
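
To give a flavor of scoring a spline-parametrized contour with a region-based loss, here is a small NumPy/SciPy sketch; the Chan-Vese-style two-region loss, the point-in-polygon rasterization, and all names are simplifying assumptions, not the paper's actual formulation.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from matplotlib.path import Path

def region_loss(knots, image, n=400):
    """Region-based loss for a closed cubic-spline contour whose shape is
    controlled by a few (x, y) knots -- the physically interpretable
    trainable parameters."""
    pts = np.vstack([knots, knots[:1]])                  # close the curve
    t = np.linspace(0.0, 1.0, len(pts))
    spline = CubicSpline(t, pts, bc_type="periodic")
    contour = spline(np.linspace(0.0, 1.0, n))           # dense (n, 2) polyline
    h, w = image.shape
    grid = np.stack(np.meshgrid(np.arange(w), np.arange(h)), -1).reshape(-1, 2)
    inside = Path(contour).contains_points(grid).reshape(h, w)
    c_in, c_out = image[inside].mean(), image[~inside].mean()
    # Penalize intensity variance inside and outside the contour.
    return ((image - c_in) ** 2)[inside].sum() + ((image - c_out) ** 2)[~inside].sum()

# Toy usage: a square contour around a bright blob.
img = np.zeros((64, 64)); img[24:40, 24:40] = 1.0
knots = np.array([[20, 20], [44, 20], [44, 44], [20, 44]], dtype=float)
print(region_loss(knots, img))
```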

cs.AI - 2023-11-13

Parrot-Trained Adversarial Examples: Pushing the Practicality of Black-Box Audio Attacks against Speaker Recognition Models

  • paper_url: http://arxiv.org/abs/2311.07780
  • repo_url: None
  • paper_authors: Rui Duan, Zhe Qu, Leah Ding, Yao Liu, Zhuo Lu
  • for: 研究针对真实世界说话人识别系统的黑盒音频对抗攻击,并推动其实用性,以揭示此类系统面临的安全挑战。
  • methods: 提出了一种新的机制——“鹦鹉训练”(Parrot Training,PT),用于生成针对目标模型的对抗样本。基于最近的语音转换(Voice Conversion,VC)技术,仅利用一句简短语音的知识生成更多听起来像目标说话人的合成语音样本(鹦鹉语音),并以此训练一个 PT 代理模型用于发起攻击。
  • results: 实验结果显示,对于开源模型,PT-AEs可以达到45.8%-80.8%的攻击成功率,而对于智能设备,包括Apple HomePod(Siri)、Amazon Echo和Google Home,可以达到47.9%-58.3%的攻击成功率。
    Abstract Audio adversarial examples (AEs) have posed significant security challenges to real-world speaker recognition systems. Most black-box attacks still require certain information from the speaker recognition model to be effective (e.g., keeping probing and requiring the knowledge of similarity scores). This work aims to push the practicality of the black-box attacks by minimizing the attacker's knowledge about a target speaker recognition model. Although it is not feasible for an attacker to succeed with completely zero knowledge, we assume that the attacker only knows a short (or a few seconds) speech sample of a target speaker. Without any probing to gain further knowledge about the target model, we propose a new mechanism, called parrot training, to generate AEs against the target model. Motivated by recent advancements in voice conversion (VC), we propose to use the one short sentence knowledge to generate more synthetic speech samples that sound like the target speaker, called parrot speech. Then, we use these parrot speech samples to train a parrot-trained(PT) surrogate model for the attacker. Under a joint transferability and perception framework, we investigate different ways to generate AEs on the PT model (called PT-AEs) to ensure the PT-AEs can be generated with high transferability to a black-box target model with good human perceptual quality. Real-world experiments show that the resultant PT-AEs achieve the attack success rates of 45.8% - 80.8% against the open-source models in the digital-line scenario and 47.9% - 58.3% against smart devices, including Apple HomePod (Siri), Amazon Echo, and Google Home, in the over-the-air scenario.
    摘要 音频对抗样本(AE)给现实世界的说话人识别系统带来了重大的安全挑战。大多数黑盒攻击仍然需要从说话人识别模型获取某些信息才能奏效(例如持续探测模型并需要知道相似度分数)。本工作的目标是通过最小化攻击者对目标说话人识别模型的先验知识,使黑盒攻击更加实用。尽管攻击者在完全零知识的情况下不可能成功,我们假设攻击者仅掌握目标说话人的一段简短(几秒)语音样本。在不进行任何探测以获取目标模型更多信息的前提下,我们提出了一种称为“鹦鹉训练”的新机制来生成针对目标模型的对抗样本。受最近语音转换(VC)技术进展的启发,我们提出利用这一句短语音的知识生成更多听起来像目标说话人的合成语音样本,称为“鹦鹉语音”。然后,我们使用这些鹦鹉语音样本为攻击者训练一个鹦鹉训练(PT)代理模型。在一个联合可迁移性与感知质量的框架下,我们研究了在 PT 模型上生成对抗样本(PT-AEs)的不同方式,以确保 PT-AEs 能以较高的可迁移性作用于黑盒目标模型,同时保持良好的人耳感知质量。真实世界的实验表明,所得到的 PT-AEs 在数字线路场景下对开源模型达到了 45.8% - 80.8% 的攻击成功率,在空中(over-the-air)场景下对包括 Apple HomePod(Siri)、Amazon Echo 和 Google Home 在内的智能设备达到了 47.9% - 58.3% 的攻击成功率。
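
For illustration, once a parrot-trained surrogate is available, a standard way to craft transferable audio AEs on it is projected gradient descent (PGD). The sketch below assumes a hypothetical `surrogate` network mapping a waveform batch to speaker logits; the step sizes and budget are illustrative, and the paper's joint transferability-perception framework is not modeled here.

```python
import torch
import torch.nn.functional as F

def pgd_audio_attack(surrogate, waveform, target_speaker, steps=50, eps=2e-3, alpha=2e-4):
    """Craft an L-inf bounded adversarial waveform on a surrogate speaker
    recognition model; the result is then transferred to the black-box target."""
    adv = waveform.clone().detach().requires_grad_(True)
    for _ in range(steps):
        logits = surrogate(adv.unsqueeze(0))                 # (1, n_speakers)
        loss = F.cross_entropy(logits, torch.tensor([target_speaker]))
        loss.backward()
        with torch.no_grad():
            adv -= alpha * adv.grad.sign()                   # step toward the target class
            adv.copy_(waveform + (adv - waveform).clamp(-eps, eps))  # keep perturbation small
        adv.grad.zero_()
    return adv.detach()
```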

GreekT5: A Series of Greek Sequence-to-Sequence Models for News Summarization

  • paper_url: http://arxiv.org/abs/2311.07767
  • repo_url: https://github.com/nc0der/greekt5
  • paper_authors: Nikolaos Giarelis, Charalampos Mastrokostas, Nikos Karacapilidis
  • for: 这篇论文主要是为了提出一系列的新型文本摘要模型,用于希腊新闻文章的自动摘要。
  • methods: 该论文使用了深度学习的Transformer模型,并对希腊语新闻文章进行了大量的训练和测试,以评估模型的性能。
  • results: 论文的实验结果显示,提出的新型模型在多种评价指标上都显著超越了希腊BART模型,表明这些模型在希腊新闻文章摘要任务上具有更好的性能。
    Abstract Text summarization (TS) is a natural language processing (NLP) subtask pertaining to the automatic formulation of a concise and coherent summary that covers the major concepts and topics from one or multiple documents. Recent advancements in deep learning have led to the development of abstractive summarization transformer-based models, which outperform classical approaches. In any case, research in this field focuses on high resource languages such as English, while the corresponding work for low resource languages is still underdeveloped. Taking the above into account, this paper proposes a series of novel TS models for Greek news articles. The proposed models were thoroughly evaluated on the same dataset against GreekBART, which is the state-of-the-art model in Greek abstractive news summarization. Our evaluation results reveal that most of the proposed models significantly outperform GreekBART on various evaluation metrics. We make our evaluation code public, aiming to increase the reproducibility of this work and facilitate future research in the field.
    摘要 文本摘要(TS)是自然语言处理(NLP)下一个子任务,它旨在自动生成简洁 coherent 的摘要,涵盖一或多个文档中的主要概念和话题。在深度学习的推动下,有一些抽象摘要转换器模型在英语等高资源语言的研究中取得了突出的成果,而对低资源语言的研究仍然处于不足的状态。本文提出了一系列新的TS模型,用于希腊新闻文章的摘要。这些模型经过了严格的评估,并与希腊BART模型进行了比较。我们的评估结果显示,大多数我们提出的模型在不同的评价指标上都有显著的提高,并且超过了希腊BART模型。我们将我们的评估代码公开,以增加这项工作的重复性和未来研究的便利。

Vision-Language Integration in Multimodal Video Transformers (Partially) Aligns with the Brain

  • paper_url: http://arxiv.org/abs/2311.07766
  • repo_url: None
  • paper_authors: Dota Tianai Dong, Mariya Toneva
  • for: 这个论文旨在探讨多模态信息的集成是人工智能系统理解现实世界的重要前提。
  • methods: 该论文使用视频转换器模型,同时学习视觉、文本和声音。
  • results: 研究发现,可以借助多模态信息处理的神经科学证据来评估预训练的多模态视频转换器模型。结果表明,视觉可以提升语言处理的表现,但模型的联合表示并没有捕捉到超出各单一模态之外的脑相关信息;通过一个需要视觉-语言推理的任务进行微调,可以改进模型与大脑的对齐。
    Abstract Integrating information from multiple modalities is arguably one of the essential prerequisites for grounding artificial intelligence systems with an understanding of the real world. Recent advances in video transformers that jointly learn from vision, text, and sound over time have made some progress toward this goal, but the degree to which these models integrate information from modalities still remains unclear. In this work, we present a promising approach for probing a pre-trained multimodal video transformer model by leveraging neuroscientific evidence of multimodal information processing in the brain. Using brain recordings of participants watching a popular TV show, we analyze the effects of multi-modal connections and interactions in a pre-trained multi-modal video transformer on the alignment with uni- and multi-modal brain regions. We find evidence that vision enhances masked prediction performance during language processing, providing support that cross-modal representations in models can benefit individual modalities. However, we don't find evidence of brain-relevant information captured by the joint multi-modal transformer representations beyond that captured by all of the individual modalities. We finally show that the brain alignment of the pre-trained joint representation can be improved by fine-tuning using a task that requires vision-language inferences. Overall, our results paint an optimistic picture of the ability of multi-modal transformers to integrate vision and language in partially brain-relevant ways but also show that improving the brain alignment of these models may require new approaches.
    摘要 整合多模态信息可以说是让人工智能系统理解现实世界的基本前提之一。近期能够同时从视觉、文本和声音中随时间学习的视频 Transformer 在这一目标上取得了一些进展,但这些模型在多大程度上整合了各模态的信息仍不清楚。在这项工作中,我们利用大脑多模态信息处理的神经科学证据,提出了一种探测预训练多模态视频 Transformer 模型的有前景的方法。利用参与者观看一部热门电视剧时的大脑记录,我们分析了预训练多模态视频 Transformer 中的多模态连接和交互对其与单模态及多模态脑区对齐程度的影响。我们发现视觉能够提升语言处理过程中的掩码预测性能,这支持模型中的跨模态表征可以使单个模态受益。然而,我们没有发现联合多模态 Transformer 表征捕获了超出各单模态所能捕获的脑相关信息。最后,我们表明,通过一个需要视觉-语言推理的任务进行微调,可以改善预训练联合表征与大脑的对齐。总体而言,我们的结果表明多模态 Transformer 能以部分与大脑相关的方式整合视觉和语言,但也表明改善这些模型与大脑的对齐可能需要新的方法。

The Disagreement Problem in Faithfulness Metrics

  • paper_url: http://arxiv.org/abs/2311.07763
  • repo_url: None
  • paper_authors: Brian Barr, Noah Fatsi, Leif Hancox-Li, Peter Richter, Daniel Proano, Caleb Mok
  • for: 这个论文的目的是对黑盒机器学习模型的解释进行评估。
  • methods: 这篇论文使用了多种方法来评估解释的准确性。
  • results: 研究发现现有的指标之间并不一致,使用户无法选择最忠实的解释。
    Abstract The field of explainable artificial intelligence (XAI) aims to explain how black-box machine learning models work. Much of the work centers around the holy grail of providing post-hoc feature attributions to any model architecture. While the pace of innovation around novel methods has slowed down, the question remains of how to choose a method, and how to make it fit for purpose. Recently, efforts around benchmarking XAI methods have suggested metrics for that purpose -- but there are many choices. That bounty of choice still leaves an end user unclear on how to proceed. This paper focuses on comparing metrics with the aim of measuring faithfulness of local explanations on tabular classification problems -- and shows that the current metrics don't agree; leaving users unsure how to choose the most faithful explanations.
    摘要 可解释人工智能(XAI)领域的目标是解释黑盒机器学习模型的工作原理。该领域的许多工作都围绕着为任意模型架构提供事后特征归因这一“圣杯”展开。尽管围绕新方法的创新步伐有所放缓,但如何选择一种方法、以及如何使其切合用途的问题依然存在。近期针对 XAI 方法的基准测试工作为此提出了一些指标,但可选的指标很多,这种选择上的丰富仍让最终用户不知如何下手。本文着眼于比较这些指标,以衡量表格分类问题上局部解释的忠实性,并表明当前的指标之间并不一致,使用户无法确定哪种解释最为忠实。

Amodal Optical Flow

  • paper_url: http://arxiv.org/abs/2311.07761
  • repo_url: None
  • paper_authors: Maximilian Luz, Rohit Mohan, Ahmed Rida Sekkat, Oliver Sawade, Elmar Matthes, Thomas Brox, Abhinav Valada
  • for: 这篇论文主要研究透明或被遮挡物体场景下的光流估计问题,并提出非模态光流(Amodal Optical Flow)来解决这些问题。
  • methods: 作者提出了一项新任务——非模态光流估计,并为此扩展了 AmodalSynthDrive 数据集,加入像素级的非模态光流标签,以便开展研究。他们还提出了 AmodalFlowNet 作为解决该任务的初步方案,该方法使用基于 Transformer 的代价体编码器与循环 Transformer 解码器,实现层次化的特征传播和非模态语义接地。
  • results: 作者通过大量实验证明了非模态光流估计的可行性,并展示了它在全景跟踪等下游任务中的实用性。他们还提出了一个新的评价指标——Amodal Flow Quality,以可解释的方式量化估计的性能。
    Abstract Optical flow estimation is very challenging in situations with transparent or occluded objects. In this work, we address these challenges at the task level by introducing Amodal Optical Flow, which integrates optical flow with amodal perception. Instead of only representing the visible regions, we define amodal optical flow as a multi-layered pixel-level motion field that encompasses both visible and occluded regions of the scene. To facilitate research on this new task, we extend the AmodalSynthDrive dataset to include pixel-level labels for amodal optical flow estimation. We present several strong baselines, along with the Amodal Flow Quality metric to quantify the performance in an interpretable manner. Furthermore, we propose the novel AmodalFlowNet as an initial step toward addressing this task. AmodalFlowNet consists of a transformer-based cost-volume encoder paired with a recurrent transformer decoder which facilitates recurrent hierarchical feature propagation and amodal semantic grounding. We demonstrate the tractability of amodal optical flow in extensive experiments and show its utility for downstream tasks such as panoptic tracking. We make the dataset, code, and trained models publicly available at http://amodal-flow.cs.uni-freiburg.de.
    摘要 在存在透明或被遮挡物体的情况下,光流估计非常具有挑战性。在这项工作中,我们从任务层面应对这些挑战,提出了非模态光流(Amodal Optical Flow),将光流与非模态感知相结合。我们将非模态光流定义为一个多层的像素级运动场,它不仅表示可见区域,还涵盖场景中被遮挡的区域。为便于研究这一新任务,我们扩展了 AmodalSynthDrive 数据集,加入用于非模态光流估计的像素级标签。我们给出了若干强有力的基线,并提出 Amodal Flow Quality 指标,以可解释的方式量化性能。此外,我们提出了新颖的 AmodalFlowNet,作为解决该任务的初步方案。AmodalFlowNet 由一个基于 Transformer 的代价体编码器和一个循环 Transformer 解码器组成,后者实现了循环的层次化特征传播和非模态语义接地。我们在大量实验中证明了非模态光流估计的可行性,并展示了它在全景跟踪等下游任务中的实用性。数据集、代码和训练模型公开发布于 http://amodal-flow.cs.uni-freiburg.de。

Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems

  • paper_url: http://arxiv.org/abs/2311.07759
  • repo_url: None
  • paper_authors: Alessandro Oltramari
  • for: 该论文旨在帮助AI系统具备高级别理解能力,以便在不同的应用场景中展现出更加稳定和强大的表现。
  • methods: 该论文提出了一种将认知架构与外部神经符号学Component integrate的方法,以便帮助AI系统具备更高级别的理解能力。
  • results: 该论文提出的方法可以帮助AI系统在不同的应用场景中展现出更加稳定和强大的表现,并且可以解决一些现有的AI系统无法解决的问题,如自动驾驶汽车在未经训练的情况下的表现下降。
    Abstract High-level reasoning can be defined as the capability to generalize over knowledge acquired via experience, and to exhibit robust behavior in novel situations. Such form of reasoning is a basic skill in humans, who seamlessly use it in a broad spectrum of tasks, from language communication to decision making in complex situations. When it manifests itself in understanding and manipulating the everyday world of objects and their interactions, we talk about common sense or commonsense reasoning. State-of-the-art AI systems don't possess such capability: for instance, Large Language Models have recently become popular by demonstrating remarkable fluency in conversing with humans, but they still make trivial mistakes when probed for commonsense competence; on a different level, performance degradation outside training data prevents self-driving vehicles to safely adapt to unseen scenarios, a serious and unsolved problem that limits the adoption of such technology. In this paper we propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components. We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.
    摘要 高级推理可以定义为对经验获得的知识进行泛化、并在新情况下表现出稳健行为的能力。这种推理是人类的基本技能,人们在从语言交流到复杂情境决策的各种任务中都能毫不费力地运用它。当它体现为对日常物体世界及其交互的理解和操作时,我们称之为常识或常识推理。当前最先进的 AI 系统并不具备这种能力:例如,大语言模型最近因在与人类对话中展现出卓越的流畅性而广受欢迎,但在常识能力测试中仍会犯一些低级错误;在另一个层面上,训练数据之外的性能退化使自动驾驶汽车无法安全地适应未见过的场景,这是一个严重且尚未解决的问题,限制了此类技术的应用。在这篇论文中,我们提议通过将认知架构与外部神经符号组件相结合,使 AI 系统具备高级推理能力。我们介绍了一个以 ACT-R 为核心的混合框架,并讨论了生成模型在当前及未来应用中的作用。

SynthEnsemble: A Fusion of CNN, Vision Transformer, and Hybrid Models for Multi-Label Chest X-Ray Classification

  • paper_url: http://arxiv.org/abs/2311.07750
  • repo_url: None
  • paper_authors: S. M. Nabil Ashraf, Md. Adyelullahil Mamun, Hasnat Md. Abdullah, Md. Golam Rabiul Alam
  • for: 这个研究旨在使用深度学习技术来自动诊断胸部X射像中的肺病变化,以提高早期检测和有效治疗的可能性。
  • methods: 研究使用了多种预训练的卷积神经网络(CNN)、Transformer 和混合模型(CNN+Transformer),以及传统模型。最佳个体模型为 CoAtNet,其受试者工作特征曲线下面积(AUROC)达到 84.2%。
  • results: 通过将所有训练模型的预测结果进行加权平均集成,并使用差分进化算法确定每个模型的权重,可以进一步将 AUROC 提升至 85.4%,超越了现有的最先进方法。研究显示了深度学习技术,特别是集成深度学习,对于自动诊断胸部 X 光图像中的肺部病变具有很高的准确性。
    Abstract Chest X-rays are widely used to diagnose thoracic diseases, but the lack of detailed information about these abnormalities makes it challenging to develop accurate automated diagnosis systems, which is crucial for early detection and effective treatment. To address this challenge, we employed deep learning techniques to identify patterns in chest X-rays that correspond to different diseases. We conducted experiments on the "ChestX-ray14" dataset using various pre-trained CNNs, transformers, hybrid(CNN+Transformer) models and classical models. The best individual model was the CoAtNet, which achieved an area under the receiver operating characteristic curve (AUROC) of 84.2%. By combining the predictions of all trained models using a weighted average ensemble where the weight of each model was determined using differential evolution, we further improved the AUROC to 85.4%, outperforming other state-of-the-art methods in this field. Our findings demonstrate the potential of deep learning techniques, particularly ensemble deep learning, for improving the accuracy of automatic diagnosis of thoracic diseases from chest X-rays.
    摘要 胸部X光图是广泛用于诊断胸部疾病的工具,但由于这些异常的细节信息缺乏,建立准确的自动诊断系统颇具挑战,而这对早期发现和有效治疗至关重要。为解决这一挑战,我们利用深度学习技术来识别胸部X光图中对应不同疾病的模式。我们在“ChestX-ray14”数据集上对多种预训练卷积神经网络(CNN)、Transformer、混合(CNN+Transformer)模型以及经典模型进行了实验。最佳的个体模型是CoAtNet,其受试者工作特征曲线下面积(AUROC)达到84.2%。通过将所有训练模型的预测结果进行加权平均集成,其中每个模型的权重由差分进化算法确定,我们将AUROC进一步提升至85.4%,超越了该领域其他最先进方法。我们的发现表明深度学习技术,特别是集成深度学习,在提升胸部X光图自动诊断精度方面具有巨大潜力。
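
To make the ensembling step concrete, the sketch below fits weighted-average ensemble weights with differential evolution on held-out predictions; the function names, bounds, and toy data are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.metrics import roc_auc_score

def fit_ensemble_weights(val_probs, val_labels):
    """val_probs: (n_models, n_samples) probabilities on validation data."""
    def neg_auroc(w):
        w = np.abs(w) / (np.abs(w).sum() + 1e-12)   # normalize to a convex combination
        return -roc_auc_score(val_labels, w @ val_probs)

    n_models = val_probs.shape[0]
    result = differential_evolution(neg_auroc, bounds=[(0.0, 1.0)] * n_models, seed=0)
    w = np.abs(result.x)
    return w / w.sum()

# Toy usage with three synthetic "models" of varying quality.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
probs = np.clip(labels + rng.normal(0.0, [[0.6], [0.8], [1.0]], size=(3, 200)), 0, 1)
print("ensemble weights:", fit_ensemble_weights(probs, labels))
```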

Simplifying Complex Observation Models in Continuous POMDP Planning with Probabilistic Guarantees and Practice

  • paper_url: http://arxiv.org/abs/2311.07745
  • repo_url: None
  • paper_authors: Idan Lev-Yehudi, Moran Barenboim, Vadim Indelman
  • for: 这个论文是为了解决部分可观测 Markov决策过程(POMDP)中的高维度连续观测问题,如摄像头图像,而需要大量的计算力和存储空间。
  • methods: 这篇论文使用机器学习的概率模型来表示观测模型,但这些模型在线上部署时需要过多的计算资源。论文提出了使用简化的观测模型进行规划,并保持了对解决方案质量的正式保证。
  • results: 论文的主要贡献是一种新的概率界,基于简化模型的统计总变差(total variation)距离。这个界可以利用简化模型得到的经验规划值来约束原始模型下的理论 POMDP 值,并可以在规划过程中完全不访问昂贵模型的情况下成立。论文还将计算划分为离线和在线两部分,并给出了无需访问模型即可获得形式化保证的新结果。最后,论文通过仿真演示了如何将该界集成到现有的连续在线 POMDP 求解器中。
    Abstract Solving partially observable Markov decision processes (POMDPs) with high dimensional and continuous observations, such as camera images, is required for many real life robotics and planning problems. Recent researches suggested machine learned probabilistic models as observation models, but their use is currently too computationally expensive for online deployment. We deal with the question of what would be the implication of using simplified observation models for planning, while retaining formal guarantees on the quality of the solution. Our main contribution is a novel probabilistic bound based on a statistical total variation distance of the simplified model. We show that it bounds the theoretical POMDP value w.r.t. original model, from the empirical planned value with the simplified model, by generalizing recent results of particle-belief MDP concentration bounds. Our calculations can be separated into offline and online parts, and we arrive at formal guarantees without having to access the costly model at all during planning, which is also a novel result. Finally, we demonstrate in simulation how to integrate the bound into the routine of an existing continuous online POMDP solver.
    摘要 求解具有高维连续观测(例如相机图像)的部分可观测马尔可夫决策过程(POMDP),是许多现实机器人和规划问题的必需。近期研究提出以机器学习得到的概率模型作为观测模型,但其计算开销过大,目前难以在线部署。我们研究的问题是:在保留对解质量的形式化保证的前提下,使用简化的观测模型进行规划会带来什么影响。我们的主要贡献是一种新的概率界,基于简化模型的统计总变差距离。我们证明,通过推广近期粒子信念 MDP 集中界的结果,该界可以用简化模型得到的经验规划值来约束原始模型下的理论 POMDP 值。我们的计算可以划分为离线和在线两部分,并且在规划过程中完全无需访问昂贵的模型即可获得形式化保证,这也是一个新的结果。最后,我们在仿真中演示了如何将该界集成到现有的连续在线 POMDP 求解器的流程中。
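
As a toy illustration of the quantity the bound is built on, the snippet below computes the total variation distance between two discrete observation models; the example distributions are made up for illustration.

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two discrete observation models."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

# Original (expensive, learned) vs. simplified observation model over 4 observations.
p = np.array([0.50, 0.30, 0.15, 0.05])
q = np.array([0.45, 0.35, 0.10, 0.10])
print(tv_distance(p, q))   # a small distance suggests a tighter value bound
```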

Generalization Analogies (GENIES): A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains

  • paper_url: http://arxiv.org/abs/2311.07723
  • repo_url: https://github.com/joshuaclymer/genies
  • paper_authors: Joshua Clymer, Garrett Baker, Rohan Subramani, Sam Wang
  • for: 本研究旨在控制 LLM 的奖励模型在反馈不可靠的情况下的泛化行为。
  • methods: 作者构造了涵盖 8 个类别的 69 种分布偏移,以研究奖励模型的泛化。
  • results: 研究发现,奖励模型默认情况下并不会学习评估“指令遵循”(instruction-following),而是倾向于偏好与网络文本相似的人格。标准微调方法常常无法区分指令遵循与混淆行为。
    Abstract As AI systems become more intelligent and their behavior becomes more challenging to assess, they may learn to game the flaws of human feedback instead of genuinely striving to follow instructions; however, this risk can be mitigated by controlling how LLMs generalize human feedback to situations where it is unreliable. To better understand how reward models generalize, we craft 69 distribution shifts spanning 8 categories. We find that reward models do not learn to evaluate `instruction-following' by default and instead favor personas that resemble internet text. Techniques for interpreting reward models' internal representations achieve better generalization than standard fine-tuning, but still frequently fail to distinguish instruction-following from conflated behaviors. We consolidate the 15 most challenging distribution shifts into the GENaralization analogIES (GENIES) benchmark, which we hope will enable progress toward controlling reward model generalization.
    摘要 随着 AI 系统变得越来越智能、其行为越来越难以评估,它们可能会学习利用人类反馈的缺陷,而不是真正努力遵循指令;然而,通过控制 LLM 如何将人类反馈泛化到反馈不可靠的情形,可以降低这种风险。为了更好地理解奖励模型的泛化,我们构造了涵盖 8 个类别的 69 种分布偏移。我们发现,奖励模型默认情况下并不会学习评估“指令遵循”,而是偏好与网络文本相似的人格。用于解释奖励模型内部表征的技术比标准微调取得了更好的泛化,但仍然经常无法区分指令遵循与混淆行为。我们将其中 15 种最具挑战性的分布偏移汇总为 GENeralization analogIES(GENIES)基准,希望它能推动在控制奖励模型泛化方面取得进展。

PolyIE: A Dataset of Information Extraction from Polymer Material Scientific Literature

  • paper_url: http://arxiv.org/abs/2311.07715
  • repo_url: https://github.com/jerry3027/polyie
  • paper_authors: Jerry Junyang Cheung, Yuchen Zhuang, Yinghao Li, Pranav Shetty, Wantian Zhao, Sanjeev Grampurohit, Rampi Ramprasad, Chao Zhang
  • for: 本研究旨在提供一个基于科学文献的 polymer 材料自动提取信息(SciIE) dataset,以便进一步推动这一领域的研究。
  • methods: 该dataset 基于 146 篇全文 polymer 学术论文,并由域专家 manually annotate 不同类型的命名实体(如材料、性能、值、条件)以及它们之间的 N-ary 关系。
  • results: 研究人员使用最先进的命名实体抽取和关系抽取模型对 POLYIE 进行评估,并分析这些模型的优劣以及难以处理的案例。
    Abstract Scientific information extraction (SciIE), which aims to automatically extract information from scientific literature, is becoming more important than ever. However, there are no existing SciIE datasets for polymer materials, which is an important class of materials used ubiquitously in our daily lives. To bridge this gap, we introduce POLYIE, a new SciIE dataset for polymer materials. POLYIE is curated from 146 full-length polymer scholarly articles, which are annotated with different named entities (i.e., materials, properties, values, conditions) as well as their N-ary relations by domain experts. POLYIE presents several unique challenges due to diverse lexical formats of entities, ambiguity between entities, and variable-length relations. We evaluate state-of-the-art named entity extraction and relation extraction models on POLYIE, analyze their strengths and weaknesses, and highlight some difficult cases for these models. To the best of our knowledge, POLYIE is the first SciIE benchmark for polymer materials, and we hope it will lead to more research efforts from the community on this challenging task. Our code and data are available on: https://github.com/jerry3027/PolyIE.
    摘要 科学信息抽取(SciIE)旨在从科学文献中自动抽取信息,其重要性与日俱增。然而,目前尚不存在针对高分子材料的 SciIE 数据集,而高分子材料是日常生活中无处不在的一类重要材料。为填补这一空白,我们提出了 POLYIE,一个面向高分子材料的新 SciIE 数据集。POLYIE 取材于 146 篇高分子领域的全文学术论文,由领域专家标注了不同类型的命名实体(即材料、性质、数值、条件)及其 N 元关系。由于实体词形多样、实体之间存在歧义以及关系长度可变,POLYIE 带来了若干独特的挑战。我们在 POLYIE 上评估了最先进的命名实体抽取和关系抽取模型,分析了它们的优缺点,并指出了这些模型难以处理的案例。据我们所知,POLYIE 是首个面向高分子材料的 SciIE 基准,我们希望它能推动社区在这一具有挑战性的任务上开展更多研究。我们的代码和数据可在 https://github.com/jerry3027/PolyIE 获取。

Histopathologic Cancer Detection

  • paper_url: http://arxiv.org/abs/2311.07711
  • repo_url: https://github.com/lbasyal/Histopathologic-Cancer-Detection-
  • paper_authors: Varan Singh Rohila, Neeraj Lalwani, Lochan Basyal
  • for: 癌细胞的早期诊断,以制定有效的治疗方案并保障患者的健康与安全。
  • methods: 使用多层感知机和卷积神经网络模型,对 HE 染色的组织病理图像进行分类预测。
  • results: 基本型卷积神经网络模型比基本型多层感知网络模型表现更好,而ResNet50模型还能超越当前最佳模型。此外,研究还提出了将转移学习和分割技术应用于特定特征的理解。
    Abstract Early diagnosis of the cancer cells is necessary for making an effective treatment plan and for the health and safety of a patient. Nowadays, doctors usually use a histological grade that pathologists determine by performing a semi-quantitative analysis of the histopathological and cytological features of hematoxylin-eosin (HE) stained histopathological images. This research contributes a potential classification model for cancer prognosis to efficiently utilize the valuable information underlying the HE-stained histopathological images. This work uses the PatchCamelyon benchmark datasets and trains them in a multi-layer perceptron and convolution model to observe the model's performance in terms of precision, Recall, F1 Score, Accuracy, and AUC Score. The evaluation result shows that the baseline convolution model outperforms the baseline MLP model. Also, this paper introduced ResNet50 and InceptionNet models with data augmentation, where ResNet50 is able to beat the state-of-the-art model. Furthermore, the majority vote and concatenation ensemble were evaluated and provided the future direction of using transfer learning and segmentation to understand the specific features.
    摘要 癌细胞的早期诊断是制定有效治疗方案、保障患者健康与安全的必要条件。现在,医生通常使用由病理学家对苏木精-伊红(HE)染色的组织病理图像的组织学和细胞学特征进行半定量分析而确定的组织学分级。本研究提供了一种潜在的癌症预后分类模型,以便高效利用 HE 染色组织病理图像中蕴含的宝贵信息。本工作使用 PatchCamelyon 基准数据集,训练多层感知机和卷积模型,并以精确率、召回率、F1 分数、准确率和 AUC 分数来观察模型的表现。评估结果显示,基线卷积模型优于基线 MLP 模型。此外,本文还引入了带数据增强的 ResNet50 和 InceptionNet 模型,其中 ResNet50 能够超越当前最先进的模型。最后,文章还评估了多数投票和特征拼接集成方法,并提出了将来使用迁移学习和分割来理解特定特征的方向。

Reinforcement Learning for Solving Stochastic Vehicle Routing Problem

  • paper_url: http://arxiv.org/abs/2311.07708
  • repo_url: None
  • paper_authors: Zangir Iklassov, Ikboljon Sobirov, Ruben Solozabal, Martin Takac
  • for: 填补强化学习(RL)和机器学习(ML)技术在求解随机车辆路径问题(SVRP)中应用不足的空白。SVRP 的难点在于需要在不确定条件下优化车辆路线。本文为此提出一种新的端到端框架。
  • methods: 提出一种架构简单而有效的 RL 智能体,并配以专门设计的训练方法,能够全面处理 SVRP 中的各种随机性来源。
  • results: 对比分析表明,所提出的模型在多种 SVRP 设定下优于一种广泛应用的最先进元启发式算法,实现了 3.43% 的旅行成本降低。此外,模型在不同的 SVRP 环境下展现了鲁棒性以及学习最优路径策略的能力。
    Abstract This study addresses a gap in the utilization of Reinforcement Learning (RL) and Machine Learning (ML) techniques in solving the Stochastic Vehicle Routing Problem (SVRP) that involves the challenging task of optimizing vehicle routes under uncertain conditions. We propose a novel end-to-end framework that comprehensively addresses the key sources of stochasticity in SVRP and utilizes an RL agent with a simple yet effective architecture and a tailored training method. Through comparative analysis, our proposed model demonstrates superior performance compared to a widely adopted state-of-the-art metaheuristic, achieving a significant 3.43% reduction in travel costs. Furthermore, the model exhibits robustness across diverse SVRP settings, highlighting its adaptability and ability to learn optimal routing strategies in varying environments. The publicly available implementation of our framework serves as a valuable resource for future research endeavors aimed at advancing RL-based solutions for SVRP.
    摘要 本研究旨在填补强化学习(RL)和机器学习(ML)技术在求解随机车辆路径问题(SVRP)中应用不足的空白,该问题涉及在不确定条件下优化车辆路线这一具有挑战性的任务。我们提出了一种全新的端到端框架,全面处理 SVRP 中的关键随机性来源,并使用一个架构简单而有效、配以专门训练方法的 RL 智能体。通过对比分析,我们提出的模型优于一种广泛采用的最先进元启发式算法,实现了 3.43% 的旅行成本显著降低。此外,模型在多种 SVRP 设定下表现出鲁棒性,凸显了其适应能力以及在不同环境中学习最优路径策略的能力。我们公开提供的框架实现可作为未来推进基于 RL 的 SVRP 求解方法研究的宝贵资源。

Robust and Scalable Hyperdimensional Computing With Brain-Like Neural Adaptations

  • paper_url: http://arxiv.org/abs/2311.07705
  • repo_url: None
  • paper_authors: Junyao Wang, Mohammad Abdullah Al Faruque
  • for: 这项研究旨在提高边缘端机器学习(ML)方法的效率,使其能够在物联网(IoT)系统中进行实时分析。
  • methods: 这项研究使用了类脑超维计算(HDC)技术,并提出了动态 HDC 学习框架,以识别并再生不需要的维度,在显著降低维度的同时获得足够的精度。
  • results: 这项研究发现,使用动态 HDC 学习框架可以同时加速训练和推理,并在边缘端系统中以更低的维度实现良好的精度与效率。
    Abstract The Internet of Things (IoT) has facilitated many applications utilizing edge-based machine learning (ML) methods to analyze locally collected data. Unfortunately, popular ML algorithms often require intensive computations beyond the capabilities of today's IoT devices. Brain-inspired hyperdimensional computing (HDC) has been introduced to address this issue. However, existing HDCs use static encoders, requiring extremely high dimensionality and hundreds of training iterations to achieve reasonable accuracy. This results in a huge efficiency loss, severely impeding the application of HDCs in IoT systems. We observed that a main cause is that the encoding module of existing HDCs lacks the capability to utilize and adapt to information learned during training. In contrast, neurons in human brains dynamically regenerate all the time and provide more useful functionalities when learning new information. While the goal of HDC is to exploit the high-dimensionality of randomly generated base hypervectors to represent the information as a pattern of neural activity, it remains challenging for existing HDCs to support a similar behavior as brain neural regeneration. In this work, we present dynamic HDC learning frameworks that identify and regenerate undesired dimensions to provide adequate accuracy with significantly lowered dimensionalities, thereby accelerating both the training and inference.
    摘要 物联网(IoT)推动了许多利用边缘端机器学习(ML)方法分析本地采集数据的应用。然而,流行的 ML 算法通常需要超出当今物联网设备能力的大量计算。类脑超维计算(HDC)因此被引入以解决这一问题。然而,现有的 HDC 使用静态编码器,需要极高的维度和数百次训练迭代才能达到可接受的精度,这造成了巨大的效率损失,严重阻碍了 HDC 在物联网系统中的应用。我们观察到,一个主要原因是现有 HDC 的编码模块缺乏利用并适应训练过程中所学信息的能力。相比之下,人脑中的神经元始终在动态再生,并在学习新信息时提供更有用的功能。HDC 的目标是利用随机生成的基础超向量的高维性,将信息表示为类似神经活动的模式,但现有 HDC 难以支持类似大脑神经再生的行为。在这项工作中,我们提出了动态 HDC 学习框架,能够识别并再生不需要的维度,从而在显著降低维度的同时提供足够的精度,加速训练和推理。
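
The NumPy sketch below illustrates the regeneration idea: encode inputs with a random projection, score each hyperdimension by how much the class prototypes differ along it, and re-draw the weakest dimensions. The scoring rule, fractions, and names are illustrative assumptions rather than the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_dynamic_hdc(x, y, n_classes, dim=1000, regen_iters=5, regen_frac=0.05):
    """Train class hypervectors while regenerating uninformative dimensions."""
    proj = rng.standard_normal((x.shape[1], dim))        # random base encoder
    for _ in range(regen_iters):
        h = np.sign(x @ proj)                            # bipolar hypervectors
        protos = np.stack([h[y == c].mean(0) for c in range(n_classes)])
        # Dimensions whose class prototypes barely differ carry little information.
        weak = np.argsort(protos.var(axis=0))[: int(regen_frac * dim)]
        proj[:, weak] = rng.standard_normal((x.shape[1], weak.size))  # regenerate
    return proj, protos

def predict(x, proj, protos):
    return np.argmax(np.sign(x @ proj) @ protos.T, axis=1)

# Toy usage on a linearly separable two-class problem.
x = rng.standard_normal((200, 16))
y = (x[:, 0] > 0).astype(int)
proj, protos = train_dynamic_hdc(x, y, n_classes=2)
print("train accuracy:", (predict(x, proj, protos) == y).mean())
```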

AuthentiGPT: Detecting Machine-Generated Text via Black-Box Language Models Denoising

  • paper_url: http://arxiv.org/abs/2311.07700
  • repo_url: None
  • paper_authors: Zhen Guo, Shangdi Yu
  • for: 本研究旨在区分大语言模型(LLM)生成的文本与人类撰写的文本。
  • methods: 本研究提出了一种高效的分类方法 AuthentiGPT:先对输入文本人为加入噪声,再利用黑盒语言模型对其进行去噪,最后将去噪后的文本与原文进行语义比较,以判断文本是否为机器生成。
  • results: 研究发现,AuthentiGPT 在特定领域的数据集上达到了 0.918 的 AUROC 分数,优于其他商业算法,表明它可以有效地检测 LLM 生成的文本。
    Abstract Large language models (LLMs) have opened up enormous opportunities while simultaneously posing ethical dilemmas. One of the major concerns is their ability to create text that closely mimics human writing, which can lead to potential misuse, such as academic misconduct, disinformation, and fraud. To address this problem, we present AuthentiGPT, an efficient classifier that distinguishes between machine-generated and human-written texts. Under the assumption that human-written text resides outside the distribution of machine-generated text, AuthentiGPT leverages a black-box LLM to denoise input text with artificially added noise, and then semantically compares the denoised text with the original to determine if the content is machine-generated. With only one trainable parameter, AuthentiGPT eliminates the need for a large training dataset, watermarking the LLM's output, or computing the log-likelihood. Importantly, the detection capability of AuthentiGPT can be easily adapted to any generative language model. With a 0.918 AUROC score on a domain-specific dataset, AuthentiGPT demonstrates its effectiveness over other commercial algorithms, highlighting its potential for detecting machine-generated text in academic settings.
    摘要 大型语言模型(LLM)带来了巨大的机遇,同时也引发了伦理困境。其中一个主要担忧是它们能够生成与人类写作高度相似的文本,这可能导致学术不端、虚假信息和欺诈等滥用。为解决这一问题,我们提出了 AuthentiGPT,一种高效的分类器,用于区分机器生成文本与人类撰写文本。基于人类撰写文本位于机器生成文本分布之外的假设,AuthentiGPT 利用黑盒 LLM 对人为加噪的输入文本进行去噪,然后将去噪后的文本与原文进行语义比较,以判断内容是否为机器生成。AuthentiGPT 仅有一个可训练参数,无需大规模训练数据集,无需对 LLM 输出加水印,也无需计算对数似然。重要的是,AuthentiGPT 的检测能力可以轻松适配任何生成式语言模型。在一个特定领域数据集上,AuthentiGPT 取得了 0.918 的 AUROC 分数,优于其他商业算法,显示了其在学术场景中检测机器生成文本的潜力。
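
A minimal sketch of the detection loop follows, assuming access to some black-box denoiser (here a dummy stand-in) and using an off-the-shelf sentence embedder for the semantic comparison; the mask ratio, similarity threshold, and embedding model are illustrative assumptions.

```python
import random
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def add_noise(text, mask_ratio=0.2, mask_token="_"):
    """Randomly replace a fraction of words with a mask token."""
    words = text.split()
    for i in random.sample(range(len(words)), max(1, int(mask_ratio * len(words)))):
        words[i] = mask_token
    return " ".join(words)

def detect_machine_generated(text, denoise_fn, threshold=0.85):
    """Denoise a perturbed copy with a black-box LLM, then compare it with the
    original: machine-generated text lies closer to the LLM's own distribution
    and is assumed to be reconstructed more faithfully."""
    denoised = denoise_fn(add_noise(text))   # e.g. "fill in the blanks: ..." via an API
    sim = util.cos_sim(embedder.encode(text), embedder.encode(denoised)).item()
    return sim >= threshold                  # threshold plays the single-trainable-parameter role

# Toy stand-in for the black-box denoiser.
dummy_denoiser = lambda noisy: noisy.replace("_", "the")
print(detect_machine_generated("The quick brown fox jumps over the lazy dog", dummy_denoiser))
```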

On The Truthfulness of ‘Surprisingly Likely’ Responses of Large Language Models

  • paper_url: http://arxiv.org/abs/2311.07692
  • repo_url: None
  • paper_authors: Naman Goel
  • for: The paper is written to investigate the relevance of the surprisingly likely criterion for responses of large language models (LLMs).
  • methods: The paper uses a game-theoretic multi-agent setting to reward rational agents for maximizing the expected information gain with their answers, based on their probabilistic beliefs.
  • results: The paper shows that the method improves the accuracy of LLMs’ responses significantly, with up to 24 percentage points aggregate improvement on the TruthfulQA benchmark and up to 70 percentage points improvement on individual categories of questions.
    Abstract The surprisingly likely criterion in the seminal work of Prelec (the Bayesian Truth Serum) guarantees truthfulness in a game-theoretic multi-agent setting, by rewarding rational agents to maximise the expected information gain with their answers w.r.t. their probabilistic beliefs. We investigate the relevance of a similar criterion for responses of LLMs. We hypothesize that if the surprisingly likely criterion works in LLMs, under certain conditions, the responses that maximize the reward under this criterion should be more accurate than the responses that only maximize the posterior probability. Using benchmarks including the TruthfulQA benchmark and using openly available LLMs: GPT-2 and LLaMA-2, we show that the method indeed improves the accuracy significantly (for example, upto 24 percentage points aggregate improvement on TruthfulQA and upto 70 percentage points improvement on individual categories of questions).
    摘要 Prelec 的开创性工作(贝叶斯真相血清)中的“出人意料地可能”(surprisingly likely)准则,通过奖励理性主体依据其概率信念给出能最大化期望信息增益的答案,在博弈论的多主体场景中保证了真实性。我们研究类似准则对 LLM 回答的适用性。我们假设,如果该准则在 LLM 中有效,那么在某些条件下,在该准则下使奖励最大化的回答应当比仅使后验概率最大化的回答更准确。基于包括 TruthfulQA 在内的基准,并使用公开可用的 GPT-2 和 LLaMA-2 模型,我们表明该方法确实显著提升了准确率(例如,在 TruthfulQA 上总体最多提升 24 个百分点,在单个问题类别上最多提升 70 个百分点)。
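
To make the scoring rule concrete, here is a small GPT-2 sketch that scores each candidate answer by how much more likely it becomes given the question than it is a priori, a pointwise-mutual-information flavor of "surprisingly likely"; the paper's exact criterion may differ, and the prompt format and helper names are illustrative assumptions.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def logprob(text):
    """Total log-probability GPT-2 assigns to `text`."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)        # loss = mean NLL over shifted tokens
    return -out.loss.item() * (ids.shape[1] - 1)

def surprisingly_likely(question, answers):
    """Pick the answer whose conditional log-prob most exceeds its prior log-prob."""
    scores = [logprob(f"{question} {a}") - logprob(question) - logprob(a)
              for a in answers]
    return answers[int(torch.tensor(scores).argmax())]

print(surprisingly_likely(
    "Q: What happens if you crack your knuckles a lot? A:",
    ["You will get arthritis.", "Nothing in particular happens."]))
```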

Language Model-In-The-Loop: Data Optimal Approach to Learn-To-Recommend Actions in Text Games

  • paper_url: http://arxiv.org/abs/2311.07687
  • repo_url: None
  • paper_authors: Arjun Vaithilingam Sudhakar, Prasanna Parthasarathi, Janarthanan Rajendran, Sarath Chandar
  • for: 提高文本游戏中的表现
  • methods: 使用更新LLM来更好地推荐动作,从而减少人工标注游戏记录的依赖
  • results: 通过在游戏中更新LLM,可以减少对人工标注游戏记录的依赖,但是在不同游戏之间的迁移性不佳
    Abstract Large Language Models (LLMs) have demonstrated superior performance in language understanding benchmarks. CALM, a popular approach, leverages linguistic priors of LLMs -- GPT-2 -- for action candidate recommendations to improve the performance in text games in Jericho without environment-provided actions. However, CALM adapts GPT-2 with annotated human gameplays and keeps the LLM fixed during the learning of the text based games. In this work, we explore and evaluate updating LLM used for candidate recommendation during the learning of the text based game as well to mitigate the reliance on the human annotated gameplays, which are costly to acquire. We observe that by updating the LLM during learning using carefully selected in-game transitions, we can reduce the dependency on using human annotated game plays for fine-tuning the LLMs. We conducted further analysis to study the transferability of the updated LLMs and observed that transferring in-game trained models to other games did not result in a consistent transfer.
    摘要 大型语言模型(LLM)在语言理解基准上展现了卓越的性能。CALM 是一种流行的方法,它利用 LLM(GPT-2)的语言先验来推荐候选动作,从而在 Jericho 文本游戏中无需环境提供的动作即可提升表现。然而,CALM 使用人工标注的游戏记录来适配 GPT-2,并在文本游戏学习过程中保持 LLM 固定不变。在本工作中,我们探索并评估在文本游戏学习过程中同步更新用于候选动作推荐的 LLM,以减少对人工标注游戏记录的依赖,因为这些标注的获取成本很高。我们观察到,通过使用精心挑选的游戏内转移在学习过程中更新 LLM,可以降低对使用人工标注游戏记录微调 LLM 的依赖。我们进一步分析了更新后 LLM 的可迁移性,发现将游戏内训练的模型迁移到其他游戏并不能带来一致的迁移效果。

Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion

  • paper_url: http://arxiv.org/abs/2311.07682
  • repo_url: https://github.com/keremzaman/fusetoforget
  • paper_authors: Kerem Zaman, Leshem Choshen, Shashank Srivastava
  • for: 本研究旨在探讨模型融合是否会干扰和减少不需要的知识。
  • methods: 本研究使用了文本分类和生成任务,对多个模型的权重进行融合,并分析了模型融合对学习快照、社会偏见和记忆能力的影响。
  • results: 研究发现,在模型融合中,共享知识通常会增强,而不共享知识通常会消失或被忘记。这种现象可能使模型融合成为一种减少语言模型的隐私问题的工具。
    Abstract Model fusion research aims to aggregate the knowledge of multiple models to enhance performance by combining their weights. In this work, we study the inverse, investigating whether and how can model fusion interfere and reduce unwanted knowledge. We delve into the effects of model fusion on the evolution of learned shortcuts, social biases, and memorization capabilities in fine-tuned language models. Through several experiments covering text classification and generation tasks, our analysis highlights that shared knowledge among models is usually enhanced during model fusion, while unshared knowledge is usually lost or forgotten. Based on this observation, we demonstrate the potential of model fusion as a debiasing tool and showcase its efficacy in addressing privacy concerns associated with language models.
    摘要 模型融合研究旨在通过合并多个模型的权重来聚合它们的知识,以提升性能。在本工作中,我们研究其反面:模型融合是否以及如何干扰并削减不需要的知识。我们深入分析了模型融合对微调语言模型中已学捷径、社会偏见和记忆能力演化的影响。通过涵盖文本分类和生成任务的多组实验,我们的分析表明,模型之间共享的知识在融合过程中通常会得到增强,而未共享的知识通常会丢失或被遗忘。基于这一观察,我们展示了模型融合作为去偏工具的潜力,并证明了其在缓解语言模型相关隐私问题上的有效性。
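
For intuition, weight-space fusion in its simplest form is a convex combination of the models' parameters; the PyTorch sketch below shows this for two toy models sharing an architecture (the paper's fusion setups may go beyond this plain averaging).

```python
import torch

def fuse_state_dicts(state_dicts, coeffs):
    """Return the weighted average of several models' parameters."""
    assert abs(sum(coeffs) - 1.0) < 1e-6
    return {name: sum(c * sd[name].float() for c, sd in zip(coeffs, state_dicts))
            for name in state_dicts[0]}

# Toy usage: fuse two linear "models" fine-tuned from the same initialization.
m1, m2 = torch.nn.Linear(4, 2), torch.nn.Linear(4, 2)
fused = torch.nn.Linear(4, 2)
fused.load_state_dict(fuse_state_dicts([m1.state_dict(), m2.state_dict()], [0.5, 0.5]))
```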

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models

  • paper_url: http://arxiv.org/abs/2311.07575
  • repo_url: https://github.com/alpha-vllm/llama2-accessory
  • paper_authors: Ziyi Lin, Chris Liu, Renrui Zhang, Peng Gao, Longtian Qiu, Han Xiao, Han Qiu, Chen Lin, Wenqi Shao, Keqin Chen, Jiaming Han, Siyuan Huang, Yichi Zhang, Xuming He, Hongsheng Li, Yu Qiao
  • for: 这个论文的主要目标是提出一种多模态大语言模型(MLLM),以实现跨模态的语言理解和图像理解。
  • methods: 该论文使用了一种权重混合策略,将在真实数据与合成数据两个领域上训练的模型权重混合在一起,以提高图像理解和语言理解的能力。此外,论文还提出了一种多任务混合策略,对多种视觉任务进行联合指令调优,并设计任务特定的指令以避免任务间冲突。
  • results: 根据论文的描述,SPHINX 模型在多种应用场景中表现出色,包括图像理解、视觉问答、区域理解、图像描述、人体姿态估计等。此外,论文还提出了一种高分辨率图像分解策略,可以更好地捕捉高分辨率图像中的细节。
    Abstract We present SPHINX, a versatile multi-modal large language model (MLLM) with a joint mixing of model weights, tuning tasks, and visual embeddings. First, for stronger vision-language alignment, we unfreeze the large language model (LLM) during pre-training, and introduce a weight mix strategy between LLMs trained by real-world and synthetic data. By directly integrating the weights from two domains, the mixed LLM can efficiently incorporate diverse semantics with favorable robustness. Then, to enable multi-purpose capabilities, we mix a variety of tasks for joint visual instruction tuning, and design task-specific instructions to avoid inter-task conflict. In addition to the basic visual question answering, we include more challenging tasks such as region-level understanding, caption grounding, document layout detection, and human pose estimation, contributing to mutual enhancement over different scenarios. Additionally, we propose to extract comprehensive visual embeddings from various network architectures, pre-training paradigms, and information granularity, providing language models with more robust image representations. Based on our proposed joint mixing, SPHINX exhibits superior multi-modal understanding capabilities on a wide range of applications. On top of this, we further propose an efficient strategy aiming to better capture fine-grained appearances of high-resolution images. With a mixing of different scales and high-resolution sub-images, SPHINX attains exceptional visual parsing and reasoning performance on existing evaluation benchmarks. We hope our work may cast a light on the exploration of joint mixing in future MLLM research. Code is released at https://github.com/Alpha-VLLM/LLaMA2-Accessory.
    摘要 我们介绍 SPHINX,一种多模态大语言模型(MLLM),它对模型权重、调优任务和视觉嵌入进行联合混合。首先,为了加强视觉-语言对齐,我们在预训练阶段解冻大语言模型(LLM),并引入一种在真实数据与合成数据上训练的 LLM 之间的权重混合策略。通过直接整合来自两个领域的权重,混合后的 LLM 能够高效吸收多样的语义,并具有良好的鲁棒性。随后,为了实现多用途能力,我们混合多种任务进行联合视觉指令调优,并设计任务特定的指令以避免任务间冲突。除了基础的视觉问答之外,我们还纳入了区域级理解、描述接地、文档版面检测和人体姿态估计等更具挑战性的任务,使不同场景之间相互增强。此外,我们提出从不同的网络架构、预训练范式和信息粒度中提取全面的视觉嵌入,为语言模型提供更鲁棒的图像表示。基于我们提出的联合混合,SPHINX 在广泛的应用中展现出优越的多模态理解能力。在此基础上,我们进一步提出一种旨在更好地捕捉高分辨率图像细粒度外观的高效策略:通过混合不同尺度和高分辨率子图,SPHINX 在现有评估基准上取得了出色的视觉解析与推理性能。我们希望这项工作能为未来 MLLM 研究中联合混合的探索带来启发。代码发布于 https://github.com/Alpha-VLLM/LLaMA2-Accessory。

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

  • paper_url: http://arxiv.org/abs/2311.07562
  • repo_url: https://github.com/zzxslp/mm-navigator
  • paper_authors: An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, Jianfeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang
  • for: 这篇论文的目的是提出一种基于GPT-4V的智能手机Graphical User Interface(GUI)导航代理人MM-Navigator,以便它可以与人类用户一样交互,并根据给出的指令决定后续的行动。
  • methods: 这篇论文使用大型多模态模型(LMM),具体来说是 GPT-4V,凭借其先进的屏幕解读、动作推理和精确的动作定位能力来完成零样本 GUI 导航任务。
  • results: 根据人类评估,MM-Navigator 在我们收集的 iOS 屏幕数据集上,生成合理动作描述的准确率达到 91%,执行正确动作的准确率达到 75%。此外,我们还在一个 Android 屏幕导航数据集的子集上评估了模型,其以零样本方式超越了以往的 GUI 导航器。
    Abstract We present MM-Navigator, a GPT-4V-based agent for the smartphone graphical user interface (GUI) navigation task. MM-Navigator can interact with a smartphone screen as human users, and determine subsequent actions to fulfill given instructions. Our findings demonstrate that large multimodal models (LMMs), specifically GPT-4V, excel in zero-shot GUI navigation through its advanced screen interpretation, action reasoning, and precise action localization capabilities. We first benchmark MM-Navigator on our collected iOS screen dataset. According to human assessments, the system exhibited a 91\% accuracy rate in generating reasonable action descriptions and a 75\% accuracy rate in executing the correct actions for single-step instructions on iOS. Additionally, we evaluate the model on a subset of an Android screen navigation dataset, where the model outperforms previous GUI navigators in a zero-shot fashion. Our benchmark and detailed analyses aim to lay a robust groundwork for future research into the GUI navigation task. The project page is at https://github.com/zzxslp/MM-Navigator.
    摘要 我们介绍 MM-Navigator,一种基于 GPT-4V 的智能手机图形用户界面(GUI)导航代理。MM-Navigator 可以像人类用户一样与智能手机屏幕交互,并根据给定的指令确定后续的动作。我们的研究发现,大型多模态模型(LMM),具体来说是 GPT-4V,凭借其先进的屏幕解读、动作推理和精确的动作定位能力,在零样本 GUI 导航方面表现出色。我们首先在我们收集的 iOS 屏幕数据集上对 MM-Navigator 进行了基准测试。根据人类评估,系统在生成合理动作描述方面达到了 91% 的准确率,在针对单步指令执行正确动作方面达到了 75% 的准确率。此外,我们还在一个 Android 屏幕导航数据集的子集上评估了模型,它以零样本方式超越了以往的 GUI 导航器。我们的基准与详细分析旨在为未来关于 GUI 导航任务的研究奠定坚实基础。项目页面:https://github.com/zzxslp/MM-Navigator。

An Extensive Study on Adversarial Attack against Pre-trained Models of Code

  • paper_url: http://arxiv.org/abs/2311.07553
  • repo_url: https://github.com/cgcl-codes/attack_ptmc
  • paper_authors: Xiaohu Du, Ming Wen, Zichao Wei, Shangwen Wang, Hai Jin
  • for: 这篇论文对针对基于 Transformer 的预训练代码模型(PTMC)的对抗攻击进行了系统性的研究与评估。
  • methods: 这篇论文使用了五种最先进的对抗攻击方法,并从三个角度进行了系统性的分析:有效性、效率和所生成对抗样本的质量。
  • results: 研究结果显示,在现有的攻击方法中,在 for 和 if 语句内进行标识符替换是最有效的。基于这些发现,论文提出了一种新方法,针对不同任务优先考虑不同类型的语句,并使用束搜索(beam search)来生成对抗样本,在提升有效性和效率的同时保持了生成对抗代码的自然性。
    Abstract Transformer-based pre-trained models of code (PTMC) have been widely utilized and have achieved state-of-the-art performance in many mission-critical applications. However, they can be vulnerable to adversarial attacks through identifier substitution or coding style transformation, which can significantly degrade accuracy and may further incur security concerns. Although several approaches have been proposed to generate adversarial examples for PTMC, the effectiveness and efficiency of such approaches, especially on different code intelligence tasks, has not been well understood. To bridge this gap, this study systematically analyzes five state-of-the-art adversarial attack approaches from three perspectives: effectiveness, efficiency, and the quality of generated examples. The results show that none of the five approaches balances all these perspectives. Particularly, approaches with a high attack success rate tend to be time-consuming; the adversarial code they generate often lack naturalness, and vice versa. To address this limitation, we explore the impact of perturbing identifiers under different contexts and find that identifier substitution within for and if statements is the most effective. Based on these findings, we propose a new approach that prioritizes different types of statements for various tasks and further utilizes beam search to generate adversarial examples. Evaluation results show that it outperforms the state-of-the-art ALERT in terms of both effectiveness and efficiency while preserving the naturalness of the generated adversarial examples.

GPT-4V(ision) as A Social Media Analysis Engine

  • paper_url: http://arxiv.org/abs/2311.07547
  • repo_url: https://github.com/vista-h/gpt-4v_social_media
  • paper_authors: Hanjia Lyu, Jinfa Huang, Daoan Zhang, Yongsheng Yu, Xinyi Mou, Jinsheng Pan, Zhengyuan Yang, Zhongyu Wei, Jiebo Luo
  • for: This paper explores the potential of large multimodal models (LMMs) for social media content analysis.
  • methods: It evaluates GPT-4V on five representative tasks: sentiment analysis, hate speech detection, fake news identification, demographic inference, and political ideology detection.
  • results: GPT-4V performs well across these tasks, showing joint image-text understanding, contextual and cultural awareness, and broad commonsense knowledge. Notable challenges remain, including multilingual social multimedia comprehension and the generation of erroneous information about evolving celebrity and politician knowledge.
    Abstract Recent research has offered insights into the extraordinary capabilities of Large Multimodal Models (LMMs) in various general vision and language tasks. There is growing interest in how LMMs perform in more specialized domains. Social media content, inherently multimodal, blends text, images, videos, and sometimes audio. Understanding social multimedia content remains a challenging problem for contemporary machine learning frameworks. In this paper, we explore GPT-4V(ision)'s capabilities for social multimedia analysis. We select five representative tasks, including sentiment analysis, hate speech detection, fake news identification, demographic inference, and political ideology detection, to evaluate GPT-4V. Our investigation begins with a preliminary quantitative analysis for each task using existing benchmark datasets, followed by a careful review of the results and a selection of qualitative samples that illustrate GPT-4V's potential in understanding multimodal social media content. GPT-4V demonstrates remarkable efficacy in these tasks, showcasing strengths such as joint understanding of image-text pairs, contextual and cultural awareness, and extensive commonsense knowledge. Despite the overall impressive capacity of GPT-4V in the social media domain, there remain notable challenges. GPT-4V struggles with tasks involving multilingual social multimedia comprehension and has difficulties in generalizing to the latest trends in social media. Additionally, it exhibits a tendency to generate erroneous information in the context of evolving celebrity and politician knowledge, reflecting the known hallucination problem. The insights gleaned from our findings underscore a promising future for LMMs in enhancing our comprehension of social media content and its users through the analysis of multimodal information.

A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model’s Accuracy for Question Answering on Enterprise SQL Databases

  • paper_url: http://arxiv.org/abs/2311.07509
  • repo_url: None
  • paper_authors: Juan Sequeda, Dean Allemang, Bryon Jacob
  • for: This paper aims to evaluate the accuracy of large language models (LLMs) in answering enterprise questions on SQL databases, and to explore the role of knowledge graphs (KGs) in improving accuracy.
  • methods: The paper introduces a benchmark consisting of an enterprise SQL schema, a range of enterprise queries, and a contextual layer incorporating an ontology and mappings that define a knowledge graph. The authors use GPT-4 with zero-shot prompts directly on SQL databases and evaluate its accuracy.
  • results: The authors find that question answering using GPT-4 achieves an accuracy of 16%, and that this accuracy increases to 54% when questions are posed over a knowledge graph representation of the enterprise SQL database. The results suggest that investing in knowledge graphs can provide higher accuracy for LLM-powered question answering systems.
    Abstract Enterprise applications of Large Language Models (LLMs) hold promise for question answering on enterprise SQL databases. However, the extent to which LLMs can accurately respond to enterprise questions in such databases remains unclear, given the absence of suitable Text-to-SQL benchmarks tailored to enterprise settings. Additionally, the potential of Knowledge Graphs (KGs) to enhance LLM-based question answering by providing business context is not well understood. This study aims to evaluate the accuracy of LLM-powered question answering systems in the context of enterprise questions and SQL databases, while also exploring the role of knowledge graphs in improving accuracy. To achieve this, we introduce a benchmark comprising an enterprise SQL schema in the insurance domain, a range of enterprise queries encompassing reporting to metrics, and a contextual layer incorporating an ontology and mappings that define a knowledge graph. Our primary finding reveals that question answering using GPT-4, with zero-shot prompts directly on SQL databases, achieves an accuracy of 16%. Notably, this accuracy increases to 54% when questions are posed over a Knowledge Graph representation of the enterprise SQL database. Therefore, investing in Knowledge Graph provides higher accuracy for LLM powered question answering systems.
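
The two experimental conditions boil down to two zero-shot prompt templates. The sketch below is illustrative only: the paper's exact prompt wording is not quoted here, and the assumption that the knowledge graph is queried via SPARQL over an OWL ontology is ours.

```python
def sql_prompt(question, schema_ddl):
    # Direct setting: zero-shot Text-to-SQL over the raw enterprise schema.
    return (f"Given the following SQL schema:\n{schema_ddl}\n\n"
            f"Write a SQL query that answers: {question}")

def kg_prompt(question, ontology):
    # KG setting: the contextual layer (ontology plus mappings) replaces the
    # bare schema, and the model writes a query over the knowledge graph.
    return (f"Given the following ontology describing the business domain:\n"
            f"{ontology}\n\n"
            f"Write a SPARQL query that answers: {question}")
```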

EvoFed: Leveraging Evolutionary Strategies for Communication-Efficient Federated Learning

  • paper_url: http://arxiv.org/abs/2311.07485
  • repo_url: None
  • paper_authors: Mohammad Mahdi Rahimi, Hasnain Irshad Bhatti, Younghyun Park, Humaira Kousar, Jaekyun Moon
  • for: This paper proposes an approach that integrates Evolutionary Strategies (ES) with Federated Learning (FL) to address the communication cost of transmitting model parameters.
  • methods: It introduces fitness-based information sharing: each node compares its locally updated model against every member of a noise-perturbed model population, generated in a fully synchronized fashion at all nodes and the server from shared random seeds, and transmits only the resulting fitness values. The server aggregates these values and disseminates the updated global model back to the nodes.
  • results: Experiments show that EvoFed achieves performance comparable to FedAvg while drastically reducing overall communication costs in various practical settings, at the cost of increased local processing.
    Abstract Federated Learning (FL) is a decentralized machine learning paradigm that enables collaborative model training across dispersed nodes without having to force individual nodes to share data. However, its broad adoption is hindered by the high communication costs of transmitting a large number of model parameters. This paper presents EvoFed, a novel approach that integrates Evolutionary Strategies (ES) with FL to address these challenges. EvoFed employs a concept of 'fitness-based information sharing', deviating significantly from the conventional model-based FL. Rather than exchanging the actual updated model parameters, each node transmits a distance-based similarity measure between the locally updated model and each member of the noise-perturbed model population. Each node, as well as the server, generates an identical population set of perturbed models in a completely synchronized fashion using the same random seeds. With properly chosen noise variance and population size, perturbed models can be combined to closely reflect the actual model updated using the local dataset, allowing the transmitted similarity measures (or fitness values) to carry nearly the complete information about the model parameters. As the population size is typically much smaller than the number of model parameters, the savings in communication load are large. The server aggregates these fitness values and is able to update the global model. This global fitness vector is then disseminated back to the nodes, each of which applies the same update to be synchronized to the global model. Our analysis shows that EvoFed converges, and our experimental results validate that at the cost of increased local processing loads, EvoFed achieves performance comparable to FedAvg while reducing overall communication requirements drastically in various practical settings.
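
A minimal sketch of the fitness-based exchange, under two stated assumptions: `make_population`, `fitness`, and `server_step` are hypothetical names, and the softmax weighting on the server side stands in for the paper's actual ES-style update, which may differ.

```python
import numpy as np

def make_population(base_params, pop_size, sigma, seed):
    # Every node and the server regenerate an identical perturbed population
    # from the same random seed, so only fitness values need to travel.
    rng = np.random.default_rng(seed)
    return [base_params + sigma * rng.standard_normal(base_params.shape)
            for _ in range(pop_size)]

def fitness(local_update, population):
    # Distance-based similarity between the locally updated model and each
    # perturbed member (higher means closer).
    return np.array([-np.linalg.norm(local_update - m) for m in population])

def server_step(base_params, client_fitness, population, lr=1.0):
    # Aggregate the clients' fitness vectors and move the global model
    # toward a fitness-weighted combination of the shared population.
    f = np.mean(client_fitness, axis=0)
    w = np.exp(f - f.max())
    w /= w.sum()
    target = sum(wi * m for wi, m in zip(w, population))
    return base_params + lr * (target - base_params)
```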

Psychometric Predictive Power of Large Language Models

  • paper_url: http://arxiv.org/abs/2311.07484
  • repo_url: None
  • paper_authors: Tatsuki Kuribayashi, Yohei Oseki, Timothy Baldwin
  • for: This paper studies how well language models simulate human reading behavior.
  • methods: It compares instruction-tuned large language models (LLMs) against base LLMs with equivalent perplexities, and additionally explores prompting methodologies that reflect particular linguistic hypotheses.
  • results: Although instruction tuning helps LLMs provide human-preferred responses, it does not make them more human-like from a computational psycholinguistics perspective: instruction-tuned models yield worse psychometric predictive power (PPP) than base LLMs. Hypothesis-reflecting prompts improve PPP but still fall short of base LLMs.
    Abstract Next-word probabilities from language models have been shown to successfully simulate human reading behavior. Building on this, we show that, interestingly, instruction-tuned large language models (LLMs) yield worse psychometric predictive power (PPP) for human reading behavior than base LLMs with equivalent perplexities. In other words, instruction tuning, which helps LLMs provide human-preferred responses, does not always make them human-like from the computational psycholinguistics perspective. In addition, we explore prompting methodologies in simulating human reading behavior with LLMs, showing that prompts reflecting a particular linguistic hypothesis lead LLMs to exhibit better PPP but are still worse than base LLMs. These highlight that recent instruction tuning and prompting do not offer better estimates than direct probability measurements from base LLMs in cognitive modeling.

InCA: Rethinking In-Car Conversational System Assessment Leveraging Large Language Models

  • paper_url: http://arxiv.org/abs/2311.07469
  • repo_url: None
  • paper_authors: Ken E. Friedl, Abbas Goher Khan, Soumya Ranjan Sahoo, Md Rashad Al Hasan Rony, Jana Germies, Christian Süß
  • for: This paper proposes a set of Key Performance Indicators (KPIs) tailored to evaluating in-car conversational question answering (ConvQA) systems, along with datasets designed for these KPIs.
  • methods: Motivated by the inadequacy of existing evaluation metrics for the in-car domain, it also investigates persona-based prompting to elicit diverse viewpoints during assessment.
  • results: A preliminary and comprehensive empirical evaluation substantiates the proposed KPIs and datasets, and employing varied personas in prompts enhances the model's capacity to simulate diverse viewpoints, mirroring how individuals with different backgrounds perceive a topic.
    Abstract The assessment of advanced generative large language models (LLMs) poses a significant challenge, given their heightened complexity in recent developments. Furthermore, evaluating the performance of LLM-based applications in various industries, as indicated by Key Performance Indicators (KPIs), is a complex undertaking. This task necessitates a profound understanding of industry use cases and the anticipated system behavior. Within the context of the automotive industry, existing evaluation metrics prove inadequate for assessing in-car conversational question answering (ConvQA) systems. The unique demands of these systems, where answers may relate to driver or car safety and are confined within the car domain, highlight the limitations of current metrics. To address these challenges, this paper introduces a set of KPIs tailored for evaluating the performance of in-car ConvQA systems, along with datasets specifically designed for these KPIs. A preliminary and comprehensive empirical evaluation substantiates the efficacy of our proposed approach. Furthermore, we investigate the impact of employing varied personas in prompts and found that it enhances the model's capacity to simulate diverse viewpoints in assessments, mirroring how individuals with different backgrounds perceive a topic.

Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse

  • paper_url: http://arxiv.org/abs/2311.07468
  • repo_url: https://github.com/trestad/mitigating-reversal-curse
  • paper_authors: Ang Lv, Kaiyi Zhang, Shufang Xie, Quan Tu, Yuhan Chen, Ji-Rong Wen, Rui Yan
  • for: This paper analyzes the "reversal curse" in large language models (LLMs), whereby the order of knowledge entities in the training data biases the models' comprehension.
  • methods: It contrasts next-token-prediction causal language models with GLM, which is trained with an autoregressive blank infilling objective that gives predicted tokens access to the entire context, and proposes BIdirectional Causal language modeling Optimization (BICO) for fine-tuning.
  • results: On a task designed to assess the reversal curse, training with BICO improves Llama's accuracy from the original 0% to around 70%.
    Abstract Recent studies have highlighted a phenomenon in large language models (LLMs) known as "the reversal curse," in which the order of knowledge entities in the training data biases the models' comprehension. For example, if a model is trained on sentences where entity A consistently appears before entity B, it can respond to queries about A by providing B as the answer. However, it may encounter confusion when presented with questions concerning B. We contend that the reversal curse is partially a result of specific model training objectives, particularly evident in the prevalent use of the next-token prediction within most causal language models. For the next-token prediction, models solely focus on a token's preceding context, resulting in a restricted comprehension of the input. In contrast, we illustrate that the GLM, trained using the autoregressive blank infilling objective where tokens to be predicted have access to the entire context, exhibits better resilience against the reversal curse. We propose a novel training method, BIdirectional Casual language modeling Optimization (BICO), designed to mitigate the reversal curse when fine-tuning pretrained causal language models on new data. BICO modifies the causal attention mechanism to function bidirectionally and employs a mask denoising optimization. In the task designed to assess the reversal curse, our approach improves Llama's accuracy from the original 0% to around 70%. We hope that more attention can be focused on exploring and addressing these inherent weaknesses of the current LLMs, in order to achieve a higher level of intelligence.
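
The core of BICO, letting causal attention function bidirectionally during fine-tuning, can be seen by contrasting the two attention masks; this minimal sketch omits the mask-denoising optimization the paper applies on top.

```python
import torch

def attention_masks(seq_len):
    # Standard causal LM: token i attends only to positions <= i.
    causal = torch.tril(torch.ones(seq_len, seq_len))
    # BICO-style fine-tuning: a predicted token can attend to the entire
    # context, as in autoregressive blank infilling.
    bidirectional = torch.ones(seq_len, seq_len)
    return causal, bidirectional
```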

On Measuring Faithfulness of Natural Language Explanations

  • paper_url: http://arxiv.org/abs/2311.07466
  • repo_url: https://github.com/heidelberg-nlp/cc-shap
  • paper_authors: Letitia Parcalabescu, Anette Frank
  • for: This paper asks whether existing faithfulness tests for LLM explanations actually measure the models' inner workings.
  • methods: It re-characterizes existing tests as self-consistency tests and builds a Comparative Consistency Bank that, for the first time, compares them on a common suite of 11 open-source LLMs and 5 datasets. It also proposes CC-SHAP, a new fine-grained self-consistency measure that compares a model's input contributions to its answer prediction and to its generated explanation.
  • results: Existing tests evaluate self-consistency at the output level rather than faithfulness to the models' inner workings; CC-SHAP takes a step toward measuring faithfulness with a more interpretable and fine-grained method.
    Abstract Large language models (LLMs) can explain their own predictions, through post-hoc or Chain-of-Thought (CoT) explanations. However, the LLM could make up reasonable-sounding explanations that are unfaithful to its underlying reasoning. Recent work has designed tests that aim to judge the faithfulness of either post-hoc or CoT explanations. In this paper we argue that existing faithfulness tests are not actually measuring faithfulness in terms of the models' inner workings, but only evaluate their self-consistency on the output level. The aims of our work are two-fold. i) We aim to clarify the status of existing faithfulness tests in terms of model explainability, characterising them as self-consistency tests instead. We underline this assessment by constructing a Comparative Consistency Bank for self-consistency tests that for the first time compares existing tests on a common suite of 11 open-source LLMs and 5 datasets -- including ii) our own proposed self-consistency measure CC-SHAP. CC-SHAP is a new fine-grained measure (not test) of LLM self-consistency that compares a model's input contributions to answer prediction and generated explanation. With CC-SHAP, we aim to take a step further towards measuring faithfulness with a more interpretable and fine-grained method. Code available at \url{https://github.com/Heidelberg-NLP/CC-SHAP}
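
As a rough illustration of a CC-SHAP-style score, assuming cosine similarity as the aggregation over per-token input contributions (the released code at the URL above is authoritative for the exact computation):

```python
import numpy as np

def cc_shap_score(contrib_answer, contrib_explanation):
    # Cosine similarity between the input contributions computed when the
    # model predicts its answer and when it generates its explanation:
    # 1.0 means fully self-consistent attribution, -1.0 contradictory.
    a = np.asarray(contrib_answer, dtype=float)
    e = np.asarray(contrib_explanation, dtype=float)
    return float(a @ e / (np.linalg.norm(a) * np.linalg.norm(e) + 1e-12))
```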

KnowSafe: Combined Knowledge and Data Driven Hazard Mitigation in Artificial Pancreas Systems

  • paper_url: http://arxiv.org/abs/2311.07460
  • repo_url: None
  • paper_authors: Xugui Zhou, Maxfield Kouzel, Chloe Smith, Homa Alemzadeh
  • for: This paper aims to improve the safety and security of cyber-physical systems (CPS) by proposing a combined knowledge and data-driven approach called KnowSafe to predict and mitigate safety hazards.
  • methods: The KnowSafe approach integrates domain-specific knowledge of safety constraints and context-specific mitigation actions with machine learning (ML) techniques to estimate system trajectories, infer potential hazards, and generate optimal corrective actions to keep the system safe.
  • results: Experimental evaluation on two realistic closed-loop testbeds for artificial pancreas systems (APS) and a real-world clinical trial dataset for diabetes treatment demonstrates that KnowSafe outperforms the state-of-the-art by achieving higher accuracy in predicting system state trajectories and potential hazards, a low false positive rate, and no false negatives. It also maintains the safe operation of the simulated APS despite faults or attacks without introducing any new hazards, with a hazard mitigation success rate of 92.8%, which is at least 76% higher than solely rule-based (50.9%) and data-driven (52.7%) methods.
    Abstract Significant progress has been made in anomaly detection and run-time monitoring to improve the safety and security of cyber-physical systems (CPS). However, less attention has been paid to hazard mitigation. This paper proposes a combined knowledge and data driven approach, KnowSafe, for the design of safety engines that can predict and mitigate safety hazards resulting from safety-critical malicious attacks or accidental faults targeting a CPS controller. We integrate domain-specific knowledge of safety constraints and context-specific mitigation actions with machine learning (ML) techniques to estimate system trajectories in the far and near future, infer potential hazards, and generate optimal corrective actions to keep the system safe. Experimental evaluation on two realistic closed-loop testbeds for artificial pancreas systems (APS) and a real-world clinical trial dataset for diabetes treatment demonstrates that KnowSafe outperforms the state-of-the-art by achieving higher accuracy in predicting system state trajectories and potential hazards, a low false positive rate, and no false negatives. It also maintains the safe operation of the simulated APS despite faults or attacks without introducing any new hazards, with a hazard mitigation success rate of 92.8%, which is at least 76% higher than solely rule-based (50.9%) and data-driven (52.7%) methods.

Think Before You Speak: Cultivating Communication Skills of Large Language Models via Inner Monologue

  • paper_url: http://arxiv.org/abs/2311.07445
  • repo_url: None
  • paper_authors: Junkai Zhou, Liang Pang, Huawei Shen, Xueqi Cheng
  • for: To make large language model (LLM)-based dialogue systems more anthropomorphic and proactive conversational partners, rather than mere information-seeking tools.
  • methods: Five communication skills are added to the response generation process: topic transition, proactively asking questions, concept guidance, empathy, and frequent summarising. An inner monologue helps LLMs understand and use these skills, implemented through prompt engineering and in-context learning.
  • results: On the newly constructed Cskills benchmark, the proposed CSIM strategy improves the backbone models and outperforms the baselines in both automatic and human evaluations.
    Abstract The emergence of large language models (LLMs) further improves the capabilities of open-domain dialogue systems and can generate fluent, coherent, and diverse responses. However, LLMs still lack an important ability: communication skills, which makes them more like information-seeking tools than anthropomorphic chatbots. To make LLMs more anthropomorphic and proactive during the conversation, we add five communication skills to the response generation process: topic transition, proactively asking questions, concept guidance, empathy, and summarising often. The addition of communication skills increases the interest of users in the conversation and attracts them to chat for longer. To enable LLMs to better understand and use communication skills, we design and add an inner monologue to LLMs. The complete process is achieved through prompt engineering and in-context learning. To evaluate communication skills, we construct a benchmark named Cskills for evaluating various communication skills, which can also more comprehensively evaluate the dialogue generation ability of the model. Experimental results show that the proposed CSIM strategy improves the backbone models and outperforms the baselines in both automatic and human evaluations.
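
One way to picture the inner-monologue prompting, as a hedged sketch rather than the paper's actual prompt: the model privately selects which communication skill to apply before writing the user-facing reply.

```python
COMM_SKILLS = ["topic transition", "proactively asking questions",
               "concept guidance", "empathy", "summarising often"]

def inner_monologue_prompt(history, user_turn):
    # The inner monologue is generated but never shown to the user; only
    # the final reply is surfaced.
    skills = ", ".join(COMM_SKILLS)
    return (f"Conversation so far:\n{history}\nUser: {user_turn}\n\n"
            f"Inner monologue (hidden from the user): decide which of these "
            f"skills fits this turn: {skills}.\n"
            f"Then write only the final reply.")
```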

Investigating Multi-Pivot Ensembling with Massively Multilingual Machine Translation Models

  • paper_url: http://arxiv.org/abs/2311.07439
  • repo_url: https://github.com/zurichnlp/multipivotnmt
  • paper_authors: Alireza Mohammadshahi, Jannis Vamvas, Rico Sennrich
  • for: Improving translation quality for low- and very-low-resource translation directions.
  • methods: The paper revisits pivoting through multiple languages and proposes MaxEns, a combination strategy biased towards the most confident predictions, hypothesising that confident predictions are less prone to be hallucinations.
  • results: On the FLORES benchmark across 20 low-resource directions, MaxEns improves translation quality while reducing hallucination, outperforming both direct translation and the probability-averaging approach.
    Abstract Massively multilingual machine translation models allow for the translation of a large number of languages with a single model, but have limited performance on low- and very-low-resource translation directions. Pivoting via high-resource languages remains a strong strategy for low-resource directions, and in this paper we revisit ways of pivoting through multiple languages. Previous work has used a simple averaging of probability distributions from multiple paths, but we find that this performs worse than using a single pivot, and exacerbates the hallucination problem because the same hallucinations can be probable across different paths. As an alternative, we propose MaxEns, a combination strategy that is biased towards the most confident predictions, hypothesising that confident predictions are less prone to be hallucinations. We evaluate different strategies on the FLORES benchmark for 20 low-resource language directions, demonstrating that MaxEns improves translation quality for low-resource languages while reducing hallucination in translations, compared to both direct translation and an averaging approach. On average, multi-pivot strategies still lag behind using English as a single pivot language, raising the question of how to identify the best pivoting strategy for a given translation direction.
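
One plausible instantiation of MaxEns at a single decoding step, assuming each pivot path exposes a next-token probability vector (the paper's exact combination rule may differ in detail):

```python
import numpy as np

def maxens_step(path_distributions):
    # path_distributions: one next-token probability vector per pivot path.
    # Averaging lets a hallucination that is probable across several paths
    # accumulate mass; MaxEns instead follows the single most confident path.
    confidences = [p.max() for p in path_distributions]
    return path_distributions[int(np.argmax(confidences))]
```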

Hallucination Augmented Recitations for Language Models

  • paper_url: http://arxiv.org/abs/2311.07424
  • repo_url: None
  • paper_authors: Abdullatif Köksal, Renat Aksitov, Chung-Ching Chang
  • for: The paper aims to improve the attribution of large language models (LLMs) by creating counterfactual datasets using hallucination in LLMs.
  • methods: The paper proposes a method called Hallucination Augmented Recitations (HAR) to create counterfactual datasets for open book question answering.
  • results: The paper shows that models finetuned with the counterfactual datasets improve text grounding and open book QA performance, with up to an 8.0% increase in F1 score, compared to using human-annotated factual datasets. The improvements are consistent across various model sizes and datasets.
    Abstract Attribution is a key concept in large language models (LLMs) as it enables control over information sources and enhances the factuality of LLMs. While existing approaches utilize open book question answering to improve attribution, factual datasets may reward language models for recalling facts that they already know from their pretraining data, rather than for attribution. In contrast, counterfactual open book QA datasets would further improve attribution because the answer could only be grounded in the given text. We propose Hallucination Augmented Recitations (HAR) for creating counterfactual datasets by utilizing hallucination in LLMs to improve attribution. For open book QA as a case study, we demonstrate that models finetuned with our counterfactual datasets improve text grounding, leading to better open book QA performance, with up to an 8.0% increase in F1 score. Our counterfactual dataset leads to significantly better performance than using human-annotated factual datasets, even with 4x smaller datasets and 4x smaller models. We observe that improvements are consistent across various model sizes and datasets, including multi-hop, biomedical, and adversarial QA datasets.

Exploring Values in Museum Artifacts in the SPICE project: a Preliminary Study

  • paper_url: http://arxiv.org/abs/2311.07396
  • repo_url: None
  • paper_authors: Nele Kadastik, Thomas A. Pederson, Luis Emilio Bruni, Rossana Damiano, Antonio Lieto, Manuel Striani, Tsvi Kuflik, Alan Wecker
  • for: To develop a semantic reasoning tool that enhances the diversity of perspectives experienced by museum visitors.
  • methods: The tool, DEGARI 2.0 for values, relies on the TCL commonsense reasoning framework and exploits an ontological model formalizing Haidt's theory of moral values to associate museum items with combined values and emotions.
  • results: In preliminary tests on the collection of the Hecht Museum of Haifa, the system can suggest items associated with novel value stances beyond those of already experienced or preferred objects, opening the visit to more inclusive interpretations of cultural content.
    Abstract This document describes the rationale, the implementation and a preliminary evaluation of a semantic reasoning tool developed in the EU H2020 SPICE project to enhance the diversity of perspectives experienced by museum visitors. The tool, called DEGARI 2.0 for values, relies on the commonsense reasoning framework TCL, and exploits an ontological model formalizing Haidt's theory of moral values to associate museum items with combined values and emotions. Within a museum exhibition, this tool can suggest cultural items that are associated not only with the values of already experienced or preferred objects, but also with novel items with different value stances, opening the visit experience to more inclusive interpretations of cultural content. The system has been preliminarily tested, in the context of the SPICE project, on the collection of the Hecht Museum of Haifa.

Predicting Continuous Locomotion Modes via Multidimensional Feature Learning from sEMG

  • paper_url: http://arxiv.org/abs/2311.07395
  • repo_url: None
  • paper_authors: Peiwen Fu, Wenjuan Zhong, Yuyang Zhang, Wenxuan Xiong, Yuzhou Lin, Yanlong Tai, Lin Meng, Mingming Zhang
  • for: Walking-assistive devices need adaptive control with smooth mode transitions, so detecting human locomotion modes in advance is crucial for making such robotic systems more intelligent and transparent.
  • methods: The paper proposes Deep-STF, a unified end-to-end deep learning model for integrated feature extraction in the spatial, temporal, and frequency dimensions of surface electromyography (sEMG) signals, enabling accurate and robust continuous prediction of nine locomotion modes and 15 transitions at prediction intervals from 100 to 500 ms.
  • results: Relying solely on sEMG data, Deep-STF achieves an average prediction accuracy of 96.48% at a 100 ms horizon, decreasing only marginally to 93.00% at 500 ms. Averaged stable prediction times for upcoming transitions span 28.15 to 372.21 ms across the 100-500 ms time advances.
    Abstract Walking-assistive devices require adaptive control methods to ensure smooth transitions between various modes of locomotion. For this purpose, detecting human locomotion modes (e.g., level walking or stair ascent) in advance is crucial for improving the intelligence and transparency of such robotic systems. This study proposes Deep-STF, a unified end-to-end deep learning model designed for integrated feature extraction in spatial, temporal, and frequency dimensions from surface electromyography (sEMG) signals. Our model enables accurate and robust continuous prediction of nine locomotion modes and 15 transitions at varying prediction time intervals, ranging from 100 to 500 ms. In addition, we introduced the concept of 'stable prediction time' as a distinct metric to quantify prediction efficiency. This term refers to the duration during which consistent and accurate predictions of mode transitions are made, measured from the time of the fifth correct prediction to the occurrence of the critical event leading to the task transition. This distinction between stable prediction time and prediction time is vital as it underscores our focus on the precision and reliability of mode transition predictions. Experimental results showcased Deep-STF's cutting-edge prediction performance across diverse locomotion modes and transitions, relying solely on sEMG data. When forecasting 100 ms ahead, Deep-STF surpassed CNN and other machine learning techniques, achieving an outstanding average prediction accuracy of 96.48%. Even with an extended 500 ms prediction horizon, accuracy only marginally decreased to 93.00%. The averaged stable prediction times for detecting next upcoming transitions spanned from 28.15 to 372.21 ms across the 100-500 ms time advances.
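
The stable-prediction-time metric admits a direct computation. The sketch below is one reading of the definition (the fifth correct prediction in the final unbroken run before the transition event), with hypothetical argument names:

```python
def stable_prediction_time(timestamps, predictions, target_mode, event_time, k=5):
    # Track the final unbroken run of correct transition predictions made
    # before the critical event, then measure from the run's k-th (here
    # fifth) prediction to the event itself.
    run = []
    for t, p in zip(timestamps, predictions):
        if t >= event_time:
            break
        run = run + [t] if p == target_mode else []
    return event_time - run[k - 1] if len(run) >= k else 0.0
```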

Testing learning-enabled cyber-physical systems with Large-Language Models: A Formal Approach

  • paper_url: http://arxiv.org/abs/2311.07377
  • repo_url: None
  • paper_authors: Xi Zheng, Aloysius K. Mok, Ruzica Piskac, Yong Jae Lee, Bhaskar Krishnamachari, Dakai Zhu, Oleg Sokolsky, Insup Lee
  • for: This paper focuses on the challenges of ensuring formal safety in cyber-physical systems (CPS) that are infused with machine learning (ML).
  • methods: The paper examines testing as the most practical method for verification and validation, and summarizes current state-of-the-art methodologies. It also proposes a roadmap to transition from foundational probabilistic testing to a more rigorous approach that can provide formal assurance.
  • results: The paper identifies the main challenges in ensuring formal safety for learning-enabled CPS, and proposes a roadmap to address these challenges.
    Abstract The integration of machine learning (ML) into cyber-physical systems (CPS) offers significant benefits, including enhanced efficiency, predictive capabilities, real-time responsiveness, and the enabling of autonomous operations. This convergence has accelerated the development and deployment of a range of real-world applications, such as autonomous vehicles, delivery drones, service robots, and telemedicine procedures. However, the software development life cycle (SDLC) for AI-infused CPS diverges significantly from traditional approaches, featuring data and learning as two critical components. Existing verification and validation techniques are often inadequate for these new paradigms. In this study, we pinpoint the main challenges in ensuring formal safety for learning-enabled CPS. We begin by examining testing as the most pragmatic method for verification and validation, summarizing the current state-of-the-art methodologies. Recognizing the limitations in current testing approaches to provide formal safety guarantees, we propose a roadmap to transition from foundational probabilistic testing to a more rigorous approach capable of delivering formal assurance.

Past as a Guide: Leveraging Retrospective Learning for Python Code Completion

  • paper_url: http://arxiv.org/abs/2311.07635
  • repo_url: https://github.com/SeungyounShin/Past-as-a-Guide
  • paper_authors: Seunggyoon Shin, Seunggyu Chang, Sungjoon Choi
  • for: Improving the coding capabilities of large language models (LLMs).
  • methods: integrate past history with interactive and iterative code refinements
  • results: achieved 92% pass@1 on HumanEval, demonstrating the potential to advance the field by leveraging retrospection from past experiences and interactive and iterative refinement processes without external correctness indicators.
    Abstract This work presents Past as a Guide (PaG), a simple approach for Large Language Models (LLMs) to improve the coding capabilities by integrating the past history with interactive and iterative code refinements. To be specific, inspired by human cognitive processes, the proposed method enables LLMs to utilize previous programming and debugging experiences to enhance the Python code completion tasks. The framework facilitates LLMs to iteratively refine the Python code based on previous execution and debugging results and optimize learning and reasoning capabilities. The proposed methodology achieved a 92\% pass@1 on HumanEval, demonstrating the potential to advance the field by leveraging retrospection from past experiences and interactive and iterative refinement processes without external correctness indicators.
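
The retrospective refinement loop can be sketched as below; `llm`, `run_tests`, and `memory` are hypothetical stand-ins for the paper's generation, execution, and experience-retrieval components.

```python
def refine_with_past(task, llm, run_tests, memory, max_iters=3):
    # llm(prompt) -> code string; run_tests(code) -> (ok, feedback);
    # memory.retrieve/store hold past programming and debugging episodes.
    code = llm(f"Task: {task}\nRelevant past lessons:\n{memory.retrieve(task)}")
    for _ in range(max_iters):
        ok, feedback = run_tests(code)
        if ok:
            memory.store(task, code, feedback)  # retrospection for later tasks
            return code
        code = llm(f"Task: {task}\nPrevious attempt:\n{code}\n"
                   f"Failure:\n{feedback}\n"
                   f"Past lessons:\n{memory.retrieve(feedback)}\nFix the code.")
    return code
```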

The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4

  • paper_url: http://arxiv.org/abs/2311.07361
  • repo_url: None
  • paper_authors: Microsoft Research AI4Science, Microsoft Azure Quantum
  • for: To evaluate the performance of GPT-4 in scientific discovery, in order to validate its domain-specific expertise, accelerate scientific progress, optimize resource allocation, guide future model development, and foster interdisciplinary research.
  • methods: Expert-driven case assessments, occasionally complemented by benchmark testing, across drug discovery, biology, computational chemistry (DFT and MD), materials design, and partial differential equations (PDE), probing the model's grasp of intricate scientific concepts and relationships.
  • results: The preliminary exploration indicates that GPT-4 exhibits promising potential for a variety of scientific applications, handling complex problem-solving and knowledge-integration tasks. The evaluation covers its knowledge base, scientific understanding, scientific numerical calculation abilities, and various scientific prediction capabilities.
    Abstract In recent years, groundbreaking advancements in natural language processing have culminated in the emergence of powerful large language models (LLMs), which have showcased remarkable capabilities across a vast array of domains, including the understanding, generation, and translation of natural language, and even tasks that extend beyond language processing. In this report, we delve into the performance of LLMs within the context of scientific discovery, focusing on GPT-4, the state-of-the-art language model. Our investigation spans a diverse range of scientific areas encompassing drug discovery, biology, computational chemistry (density functional theory (DFT) and molecular dynamics (MD)), materials design, and partial differential equations (PDE). Evaluating GPT-4 on scientific tasks is crucial for uncovering its potential across various research domains, validating its domain-specific expertise, accelerating scientific progress, optimizing resource allocation, guiding future model development, and fostering interdisciplinary research. Our exploration methodology primarily consists of expert-driven case assessments, which offer qualitative insights into the model's comprehension of intricate scientific concepts and relationships, and occasionally benchmark testing, which quantitatively evaluates the model's capacity to solve well-defined domain-specific problems. Our preliminary exploration indicates that GPT-4 exhibits promising potential for a variety of scientific applications, demonstrating its aptitude for handling complex problem-solving and knowledge integration tasks. Broadly speaking, we evaluate GPT-4's knowledge base, scientific understanding, scientific numerical calculation abilities, and various scientific prediction capabilities.

MetaSymNet: A Dynamic Symbolic Regression Network Capable of Evolving into Arbitrary Formulations

  • paper_url: http://arxiv.org/abs/2311.07326
  • repo_url: None
  • paper_authors: Yanjie Li, Weijun Li, Lina Yu, Min Wu, Jinyi Liu, Wenqiang Li, Meilan Hao, Shu Wei, Yusong Deng
  • for: To reliably and automatically generate interpretable mathematical formulas, addressing the black-box problem of conventional artificial neural networks (MLPs).
  • methods: MetaSymNet dynamically adjusts its network structure in real time, allowing both expansion and contraction, and uses the PANGU meta function as its activation function, a function that can evolve into various basic functions during training to compose formulas tailored to specific needs.
  • results: Compared with four state-of-the-art symbolic regression algorithms on more than 10 public datasets (222 formulas), MetaSymNet consistently performs better; it also excels in both fitting ability and extrapolation capability relative to MLP and SVM.
    Abstract Mathematical formulas serve as the means of communication between humans and nature, encapsulating the operational laws governing natural phenomena. The concise formulation of these laws is a crucial objective in scientific research and an important challenge for artificial intelligence (AI). While traditional artificial neural networks (MLP) excel at data fitting, they often yield uninterpretable black box results that hinder our understanding of the relationship between variables x and predicted values y. Moreover, the fixed network architecture in MLP often gives rise to redundancy in both network structure and parameters. To address these issues, we propose MetaSymNet, a novel neural network that dynamically adjusts its structure in real-time, allowing for both expansion and contraction. This adaptive network employs the PANGU meta function as its activation function, which is a unique type capable of evolving into various basic functions during training to compose mathematical formulas tailored to specific needs. We then evolve the neural network into a concise, interpretable mathematical expression. To evaluate MetaSymNet's performance, we compare it with four state-of-the-art symbolic regression algorithms across more than 10 public datasets comprising 222 formulas. Our experimental results demonstrate that our algorithm outperforms others consistently regardless of noise presence or absence. Furthermore, we assess MetaSymNet against MLP and SVM regarding their fitting ability and extrapolation capability, two essential aspects of machine learning algorithms. The findings reveal that our algorithm excels in both areas. Finally, we compared MetaSymNet with MLP in terms of network structure complexity under iterative pruning. The results show that MetaSymNet's network structure complexity is markedly lower than that of MLP at the same goodness of fit.

Towards a Transportable Causal Network Model Based on Observational Healthcare Data

  • paper_url: http://arxiv.org/abs/2311.08427
  • repo_url: None
  • paper_authors: Alice Bernasconi, Alessio Zanga, Peter J. F. Lucas, Marco Scutari, Fabio Stella
  • for: To provide an AI-based prognostic model, learned from observational healthcare data, for assessing cardiovascular risk in adolescent and young females who survived breast cancer.
  • methods: Selection diagrams, missingness graphs, causal discovery, and prior knowledge are combined into a single graphical model, learned from data comprising two different cohorts of patients; expert clinicians validate the resulting causal network in terms of risk assessment, accuracy, and explainability.
  • results: The resulting causal network model provides a prognostic model that outperforms competing machine learning methods.
    Abstract Over the last decades, many prognostic models based on artificial intelligence techniques have been used to provide detailed predictions in healthcare. Unfortunately, the real-world observational data used to train and validate these models are almost always affected by biases that can strongly impact the outcomes validity: two examples are values missing not-at-random and selection bias. Addressing them is a key element in achieving transportability and in studying the causal relationships that are critical in clinical decision making, going beyond simpler statistical approaches based on probabilistic association. In this context, we propose a novel approach that combines selection diagrams, missingness graphs, causal discovery and prior knowledge into a single graphical model to estimate the cardiovascular risk of adolescent and young females who survived breast cancer. We learn this model from data comprising two different cohorts of patients. The resulting causal network model is validated by expert clinicians in terms of risk assessment, accuracy and explainability, and provides a prognostic model that outperforms competing machine learning methods.

Rethinking and Benchmarking Predict-then-Optimize Paradigm for Combinatorial Optimization Problems

  • paper_url: http://arxiv.org/abs/2311.07633
  • repo_url: None
  • paper_authors: Haoyu Geng, Han Ruan, Runzhong Wang, Yang Li, Yang Wang, Lei Chen, Junchi Yan
  • for: The paper studies the Predict-Then-Optimize (PTO) paradigm, which treats prediction and decision-making as a unified system for combinatorial optimization problems such as energy cost-aware scheduling, budget allocation in web advertising, and graph matching on social networks.
  • methods: It categorizes current approaches, contrasting end-to-end methods that directly optimize the ultimate decision quality with the traditional two-stage approach.
  • results: The paper integrates existing experimental scenarios into a unified benchmark that clarifies when end-to-end training yields improvements and when it performs ineffectively, and open-sources a new dataset for an industrial combinatorial advertising problem in inclusive finance.
    Abstract Numerous web applications rely on solving combinatorial optimization problems, such as energy cost-aware scheduling, budget allocation on web advertising, and graph matching on social networks. However, many optimization problems involve unknown coefficients, and improper predictions of these factors may lead to inferior decisions which may cause energy wastage, inefficient resource allocation, inappropriate matching in social networks, etc. Such a research topic is referred to as "Predict-Then-Optimize (PTO)" which considers the performance of prediction and decision-making in a unified system. A noteworthy recent development is the end-to-end methods by directly optimizing the ultimate decision quality which claims to yield better results in contrast to the traditional two-stage approach. However, the evaluation benchmarks in this field are fragmented and the effectiveness of various models in different scenarios remains unclear, hindering the comprehensive assessment and fast deployment of these methods. To address these issues, we provide a comprehensive categorization of current approaches and integrate existing experimental scenarios to establish a unified benchmark, elucidating the circumstances under which end-to-end training yields improvements, as well as the contexts in which it performs ineffectively. We also introduce a new dataset for the industrial combinatorial advertising problem for inclusive finance to open-source. We hope the rethinking and benchmarking of PTO could facilitate more convenient evaluation and deployment, and inspire further improvements both in the academy and industry within this field.
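
The objective that end-to-end methods optimize is often a decision regret of the following shape. This is a generic sketch, not a specific method from the benchmark, and `solve` is assumed to be differentiable or suitably smoothed:

```python
def decision_regret(pred_costs, true_costs, solve):
    # solve(costs) -> decision vector for the combinatorial problem.
    # Regret: true cost of the decision made with predicted coefficients,
    # minus the true cost of the decision made with the true coefficients.
    d_pred = solve(pred_costs)
    d_true = solve(true_costs)
    return (d_pred * true_costs).sum() - (d_true * true_costs).sum()
```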

ResMGCN: Residual Message Graph Convolution Network for Fast Biomedical Interactions Discovering

  • paper_url: http://arxiv.org/abs/2311.07632
  • repo_url: None
  • paper_authors: Zecheng Yin
  • for: To discover interactions in biomedical information graphs quickly and precisely, e.g., for molecular interaction identification and drug discovery.
  • methods: The paper proposes the Residual Message Graph Convolution Network (ResMGCN), which aggregates lower-order information from the previous layer with the current round's higher-order information to guide node updates and obtain more meaningful node representations.
  • results: On four biomedical interaction network datasets (protein-protein, drug-drug, drug-target, and gene-disease), ResMGCN outperforms previous state-of-the-art models while using markedly less storage and time.
    Abstract Biomedical information graphs are crucial for discovering interactions among biomedical entities in the modern age, such as identifying multifarious molecular interactions and drug discovery, which attracts increasing interest in the biomedicine, bioinformatics, and human healthcare communities. More and more graph neural networks have been proposed to learn the entities of biomedical information and precisely reveal biomedical molecule interactions, with state-of-the-art results. These methods remedy the fading of features propagated from distant nodes, but do so at the expensive cost of redundant memory and time. In our paper, we propose a novel Residual Message Graph Convolution Network (ResMGCN) for fast and precise biomedical interaction prediction based on a different idea. Specifically, instead of enhancing the message from far nodes, ResMGCN aggregates lower-order information with the next round's higher-order information to guide the node update and obtain a more meaningful node representation. ResMGCN is able to perceive and preserve messages from the previous layer and high-order information in the current layer with minimal memory and time cost, yielding informative representations of biomedical entities. We conduct experiments on four biomedical interaction network datasets, including protein-protein, drug-drug, drug-target, and gene-disease interactions, which demonstrate that ResMGCN outperforms previous state-of-the-art models while achieving superb efficiency in both storage and time.
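
A minimal sketch of the residual-message idea for one layer, assuming a dense normalized adjacency matrix `adj_norm`; ResMGCN's actual architecture may combine the two terms differently.

```python
import torch
import torch.nn as nn

class ResMessageLayer(nn.Module):
    # The freshly aggregated neighbourhood message is combined with the
    # previous layer's representation, so lower-order information is kept
    # without stacking deep, memory-hungry propagation.
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, h_prev, adj_norm):
        msg = adj_norm @ self.lin(h_prev)   # current-round message passing
        return torch.relu(msg + h_prev)     # residual carries prior-layer info
```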

Semi-automatic Data Enhancement for Document-Level Relation Extraction with Distant Supervision from Large Language Models

  • paper_url: http://arxiv.org/abs/2311.07314
  • repo_url: None
  • paper_authors: Junpeng Li, Zixia Jia, Zilong Zheng
  • for: To design an automated annotation method for document-level relation extraction (DocRE) with minimal human effort.
  • methods: The method integrates a large language model (LLM) with a natural language inference (NLI) module to generate relation triples, thereby augmenting document-level relation datasets with distant supervision.
  • results: The authors introduce DocGNRE, an enhanced dataset that excels in re-annotating numerous long-tail relation types, demonstrating the effectiveness of the approach.
    Abstract Document-level Relation Extraction (DocRE), which aims to extract relations from a long context, is a critical challenge in achieving fine-grained structural comprehension and generating interpretable document representations. Inspired by recent advances in in-context learning capabilities emergent from large language models (LLMs), such as ChatGPT, we aim to design an automated annotation method for DocRE with minimum human effort. Unfortunately, vanilla in-context learning is infeasible for document-level relation extraction due to the plenty of predefined fine-grained relation types and the uncontrolled generations of LLMs. To tackle this issue, we propose a method integrating a large language model (LLM) and a natural language inference (NLI) module to generate relation triples, thereby augmenting document-level relation datasets. We demonstrate the effectiveness of our approach by introducing an enhanced dataset known as DocGNRE, which excels in re-annotating numerous long-tail relation types. We are confident that our method holds the potential for broader applications in domain-specific relation type definitions and offers tangible benefits in advancing generalized language semantic comprehension.
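
A minimal sketch of the generate-then-verify loop described above, with the LLM proposer and NLI checker left as stub callables (the prompt format, verbalization, and filtering criteria are assumptions, not the paper's implementation):

```python
def augment_document(doc, llm_propose, nli_entails, relation_types):
    """LLM proposes candidate relation triples; an NLI module keeps only
    those whose verbalization is entailed by the document text."""
    kept = []
    for head, rel, tail in llm_propose(doc, relation_types):
        hypothesis = f"{head} {rel} {tail}."  # naive verbalization (assumed)
        if rel in relation_types and nli_entails(premise=doc, hypothesis=hypothesis):
            kept.append((head, rel, tail))
    return kept
```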

C-Procgen: Empowering Procgen with Controllable Contexts

  • paper_url: http://arxiv.org/abs/2311.07312
  • repo_url: None
  • paper_authors: Zhenxiong Tan, Kaixin Wang, Xinchao Wang
  • for: The paper provides an enhanced suite of Procgen environments to support a wide range of research needs.
  • methods: It adds fine-grained environment configuration mechanisms covering game mechanics and agent attributes, making the procedural generation process, previously a black box, transparent and adjustable.
  • results: C-Procgen offers over 200 unique game contexts with detailed configuration, giving researchers better control over and insight into the generation process.
    Abstract We present C-Procgen, an enhanced suite of environments on top of the Procgen benchmark. C-Procgen provides access to over 200 unique game contexts across 16 games. It allows for detailed configuration of environments, ranging from game mechanics to agent attributes. This makes the procedural generation process, previously a black-box in Procgen, more transparent and adaptable for various research needs.The upgrade enhances dynamic context management and individualized assignments, while maintaining computational efficiency. C-Procgen's controllable contexts make it applicable in diverse reinforcement learning research areas, such as learning dynamics analysis, curriculum learning, and transfer learning. We believe that C-Procgen will fill a gap in the current literature and offer a valuable toolkit for future works.

Do large language models and humans have similar behaviors in causal inference with script knowledge?

  • paper_url: http://arxiv.org/abs/2311.07311
  • repo_url: https://github.com/tony-hong/causal-script
  • paper_authors: Xudong Hong, Margarita Ryzhova, Daniel Adrian Biondi, Vera Demberg
  • for: Studying the language understanding abilities of large pre-trained language models (LLMs), including zero-shot causal reasoning.
  • methods: Uses script-based stories to probe the processing of an event B whose cause A is stated, negated, or omitted earlier in the text.
  • results: 1) Recent LLMs such as GPT-3 and Vicuna mirror human behavior, showing longer reading times (higher surprisal) in the $\neg A \rightarrow B$ condition. 2) Despite this correlation, all models still have difficulty integrating script knowledge, failing to predict that $nil \rightarrow B$ is less surprising than $\neg A \rightarrow B$.
    Abstract Recently, large pre-trained language models (LLMs) have demonstrated superior language understanding abilities, including zero-shot causal reasoning. However, it is unclear to what extent their capabilities are similar to human ones. We here study the processing of an event $B$ in a script-based story, which causally depends on a previous event $A$. In our manipulation, event $A$ is stated, negated, or omitted in an earlier section of the text. We first conducted a self-paced reading experiment, which showed that humans exhibit significantly longer reading times when causal conflicts exist ($\neg A \rightarrow B$) than under logical conditions ($A \rightarrow B$). However, reading times remain similar when cause A is not explicitly mentioned, indicating that humans can easily infer event B from their script knowledge. We then tested a variety of LLMs on the same data to check to what extent the models replicate human behavior. Our experiments show that 1) only recent LLMs, like GPT-3 or Vicuna, correlate with human behavior in the $\neg A \rightarrow B$ condition. 2) Despite this correlation, all models still fail to predict that $nil \rightarrow B$ is less surprising than $\neg A \rightarrow B$, indicating that LLMs still have difficulties integrating script knowledge. Our code and collected data set are available at https://github.com/tony-hong/causal-script.
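
On the model side, the analogue of a human reading time is the surprisal an LM assigns to event B's tokens. A small sketch of the comparison, with the per-token log-probability function left as a stub (the stub signature is an assumption, not a specific model API):

```python
import math

def surprisal_bits(logprob_fn, context, target):
    """Total surprisal (bits) of `target` continuing `context`, where
    `logprob_fn` returns per-token natural-log probabilities (stub)."""
    return -sum(logprob_fn(context, target)) / math.log(2)

# The paper's three conditions for the same event B:
#   s_A    = surprisal_bits(lm, story_with_A_stated,  event_B)
#   s_negA = surprisal_bits(lm, story_with_A_negated, event_B)
#   s_nil  = surprisal_bits(lm, story_with_A_omitted, event_B)
# Human-like script knowledge predicts s_negA > s_A and s_nil < s_negA.
```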

Explaining black boxes with a SMILE: Statistical Model-agnostic Interpretability with Local Explanations

  • paper_url: http://arxiv.org/abs/2311.07286
  • repo_url: https://github.com/dependable-intelligent-systems-lab/xwhy
  • paper_authors: Koorosh Aslansefat, Mojgan Hashemian, Martin Walker, Mohammed Naveed Akram, Ioannis Sorokos, Yiannis Papadopoulos
  • for: Improving the trustworthiness of machine learning models.
  • methods: Uses statistical distance measures to improve explainability.
  • results: Explainability improves without reducing generality, remaining applicable to a wide range of input data domains.
    Abstract Machine learning is currently undergoing an explosion in capability, popularity, and sophistication. However, one of the major barriers to widespread acceptance of machine learning (ML) is trustworthiness: most ML models operate as black boxes, their inner workings opaque and mysterious, and it can be difficult to trust their conclusions without understanding how those conclusions are reached. Explainability is therefore a key aspect of improving trustworthiness: the ability to better understand, interpret, and anticipate the behaviour of ML models. To this end, we propose SMILE, a new method that builds on previous approaches by making use of statistical distance measures to improve explainability while remaining applicable to a wide range of input data domains.
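
A sketch of the idea in a LIME-like setting: perturb the input locally, but derive the sample weights from a statistical distance between feature samples rather than a plain Euclidean kernel. The Wasserstein-1 choice and kernel form below are illustrative assumptions, not SMILE's exact measures:

```python
import numpy as np

def smile_like_explanation(f, x, n=500, sigma=0.5, seed=0):
    """Local linear surrogate whose sample weights come from a statistical
    distance (here: empirical Wasserstein-1 between feature samples)."""
    rng = np.random.default_rng(seed)
    X = x + sigma * rng.normal(size=(n, x.size))           # local perturbations
    y = np.array([f(z) for z in X], dtype=float)           # black-box outputs
    # Wasserstein-1 between the sorted feature samples of x and each z
    d = np.abs(np.sort(X, axis=1) - np.sort(x)[None, :]).mean(axis=1)
    w = np.exp(-(d / sigma) ** 2)                          # distance -> weight
    sw = np.sqrt(w)                                        # weighted least squares
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef                                            # local attributions

# Toy usage on a known function: attribution should favor feature 0.
print(smile_like_explanation(lambda z: 3 * z[0] + 0.1 * z[1], np.array([1.0, 2.0])))
```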

TIAGo RL: Simulated Reinforcement Learning Environments with Tactile Data for Mobile Robots

  • paper_url: http://arxiv.org/abs/2311.07260
  • repo_url: None
  • paper_authors: Luca Lach, Francesco Ferro, Robert Haschke
  • for: Researchers and developers working on robotic tasks that involve physical interaction, such as object manipulation.
  • methods: Uses deep reinforcement learning (DRL) to learn complex behavior in robotics, specifically for the TIAGo service robot, with open-source simulated environments that produce realistic tactile sensor measurements.
  • results: Presents preliminary training results of a learned force control policy and compares it to a classical PI controller.
    Abstract Tactile information is important for robust performance in robotic tasks that involve physical interaction, such as object manipulation. However, with more data included in the reasoning and control process, modeling behavior becomes increasingly difficult. Deep Reinforcement Learning (DRL) produced promising results for learning complex behavior in various domains, including tactile-based manipulation in robotics. In this work, we present our open-source reinforcement learning environments for the TIAGo service robot. They produce tactile sensor measurements that resemble those of a real sensorised gripper for TIAGo, encouraging research in transfer learning of DRL policies. Lastly, we show preliminary training results of a learned force control policy and compare it to a classical PI controller.
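
For reference, the classical baseline mentioned above is a PI controller acting on the force error. A self-contained sketch (gains, limits, and the toy plant are illustrative values, not the paper's):

```python
class PIForceController:
    """Classical PI force controller of the kind used as a baseline here."""

    def __init__(self, kp=0.8, ki=0.1, dt=0.01, u_max=1.0):
        self.kp, self.ki, self.dt, self.u_max = kp, ki, dt, u_max
        self.integral = 0.0

    def step(self, target_force, measured_force):
        error = target_force - measured_force
        self.integral += error * self.dt
        u = self.kp * error + self.ki * self.integral
        return max(-self.u_max, min(self.u_max, u))  # clamp gripper command

# Toy usage: drive the measured force toward a 5 N target on a crude plant.
ctrl, force = PIForceController(), 0.0
for _ in range(200):
    force += 0.5 * ctrl.step(5.0, force)  # stand-in first-order plant
print(round(force, 2))  # settles near the 5 N target
```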

Towards Transferring Tactile-based Continuous Force Control Policies from Simulation to Robot

  • paper_url: http://arxiv.org/abs/2311.07245
  • repo_url: None
  • paper_authors: Luca Lach, Robert Haschke, Davide Tateo, Jan Peters, Helge Ritter, Júlia Borràs, Carme Torras
  • for: Proposes a model-free deep reinforcement learning method for controlling the force a robot exerts when grasping objects.
  • methods: Uses a simulation environment that generates realistic normal forces to train continuous force control policies, which are then transferred to the real robot without further fine-tuning.
  • results: The learned policy outperforms a hand-modeled baseline, and the proposed inductive bias and domain randomization are validated through an ablation study as facilitating sim-to-real transfer.
    Abstract The advent of tactile sensors in robotics has sparked many ideas on how robots can leverage direct contact measurements of their environment interactions to improve manipulation tasks. An important line of research in this regard is that of grasp force control, which aims to manipulate objects safely by limiting the amount of force exerted on the object. While prior works have either hand-modeled their force controllers, employed model-based approaches, or have not shown sim-to-real transfer, we propose a model-free deep reinforcement learning approach trained in simulation and then transferred to the robot without further fine-tuning. We therefore present a simulation environment that produces realistic normal forces, which we use to train continuous force control policies. An evaluation in which we compare against a baseline and perform an ablation study shows that our approach outperforms the hand-modeled baseline and that our proposed inductive bias and domain randomization facilitate sim-to-real transfer. Code, models, and supplementary videos are available on https://sites.google.com/view/rl-force-ctrl
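
Domain randomization, one of the ingredients credited above with enabling sim-to-real transfer, can be sketched as an environment wrapper that resamples physics parameters every episode (the parameter names, ranges, and `set_param` setter are hypothetical, not the paper's API):

```python
import random

class DomainRandomizationWrapper:
    """Resample simulated physics parameters at every episode reset so the
    learned force policy cannot overfit one simulator instance."""

    def __init__(self, env, ranges, seed=None):
        self.env = env
        self.ranges = ranges                  # e.g. {"friction": (0.5, 1.5)}
        self.rng = random.Random(seed)

    def reset(self):
        for name, (lo, hi) in self.ranges.items():
            self.env.set_param(name, self.rng.uniform(lo, hi))  # hypothetical setter
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)
```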

  • paper_url: http://arxiv.org/abs/2311.07237
  • repo_url: https://github.com/ink-usc/link
  • paper_authors: Huihan Li, Yuting Ning, Zeyi Liao, Siyuan Wang, Xiang Lorraine Li, Ximing Lu, Faeze Brahman, Wenting Zhao, Yejin Choi, Xiang Ren
  • for: Systematically generating knowledge statements in the long-tail distribution.
  • methods: Proposes the Logic-Induced-Knowledge-Search (LINK) framework: grounded by a symbolic rule, it first prompts an LLM for initial values, then verifies their correctness with a critic, and finally enforces the long-tail distribution with a reranker.
  • results: Builds Logic-Induced-Long-Tail (LINT), a dataset of 200 symbolic rules and 50K knowledge statements across four domains; human inspection finds 84% of the statements factually correct. By contrast, ChatGPT and GPT4 generating long-tail statements directly under logic rules reach only 56% and 78% correctness, and their "long-tail" generations actually fall in a higher-likelihood range, so they are not truly long-tail. This shows LINK effectively generates data in the long-tail distribution and that LINT can systematically evaluate LLMs' long-tail capabilities.
    Abstract Since large language models have approached human-level performance on many tasks, it has become increasingly harder for researchers to find tasks that are still challenging to the models. Failure cases usually come from the long-tail distribution - data that an oracle language model could assign a probability on the lower end of its distribution. Current methodology such as prompt engineering or crowdsourcing are insufficient for creating long-tail examples because humans are constrained by cognitive bias. We propose a Logic-Induced-Knowledge-Search (LINK) framework for systematically generating long-tail knowledge statements. Grounded by a symbolic rule, we search for long-tail values for each variable of the rule by first prompting a LLM, then verifying the correctness of the values with a critic, and lastly pushing for the long-tail distribution with a reranker. With this framework we construct a dataset, Logic-Induced-Long-Tail (LINT), consisting of 200 symbolic rules and 50K knowledge statements spanning across four domains. Human annotations find that 84% of the statements in LINT are factually correct. In contrast, ChatGPT and GPT4 struggle with directly generating long-tail statements under the guidance of logic rules, each only getting 56% and 78% of their statements correct. Moreover, their "long-tail" generations in fact fall into the higher likelihood range, and thus are not really long-tail. Our findings suggest that LINK is effective for generating data in the long-tail distribution while enforcing quality. LINT can be useful for systematically evaluating LLMs' capabilities in the long-tail distribution. We challenge the models with a simple entailment classification task using samples from LINT. We find that ChatGPT and GPT4's capability in identifying incorrect knowledge drop by ~3% in the long-tail distribution compared to head distribution.
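
A compact sketch of the three-stage search, with the proposer, critic, and likelihood scorer left as stub callables (their prompts and underlying models are assumptions, not the paper's implementation):

```python
def link_search(rule, llm_propose, critic, likelihood, k=50, keep=10):
    """LINK-style pipeline: (1) an LLM proposes candidate values for the
    rule's variables, (2) a critic keeps only factually correct ones,
    (3) a reranker pushes toward the long tail by preferring the
    lowest-likelihood verified candidates."""
    candidates = llm_propose(rule, k)                      # stage 1: propose
    verified = [v for v in candidates if critic(rule, v)]  # stage 2: verify
    return sorted(verified, key=likelihood)[:keep]         # stage 3: rerank
```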

IASCAR: Incremental Answer Set Counting by Anytime Refinement

  • paper_url: http://arxiv.org/abs/2311.07233
  • repo_url: None
  • paper_authors: Johannes K. Fichte, Sarah Alice Gaggl, Markus Hecher, Dominik Rusovac
  • for: The paper studies counting answer sets in answer set programming (ASP) and how knowledge compilation can make counting more efficient.
  • methods: It counts answer sets under assumptions on knowledge compilations of CNF formulas that encode supported models, using the inclusion-exclusion principle to systematically over- and undercount and thereby improve bounds anytime.
  • results: A preliminary empirical analysis shows promising results: after compiling the input in an offline phase, the approach quickly (re)counts answer sets.
    Abstract Answer set programming (ASP) is a popular declarative programming paradigm with various applications. Programs can easily have many answer sets that cannot be enumerated in practice, but counting still allows quantifying solution spaces. If one counts under assumptions on literals, one obtains a tool to comprehend parts of the solution space, so-called answer set navigation. However, navigating through parts of the solution space requires counting many times, which is expensive in theory. Knowledge compilation compiles instances into representations on which counting works in polynomial time. However, these techniques exist only for CNF formulas, and compiling ASP programs into CNF formulas can introduce an exponential overhead. This paper introduces a technique to iteratively count answer sets under assumptions on knowledge compilations of CNFs that encode supported models. Our anytime technique uses the inclusion-exclusion principle to improve bounds by over- and undercounting systematically. In a preliminary empirical analysis, we demonstrate promising results. After compiling the input (offline phase), our approach quickly (re)counts.
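
The counting backbone is the inclusion-exclusion principle; truncating its alternating sum early is what yields the anytime over-/under-counting bounds, since the partial sums alternate around the true value (the Bonferroni inequalities). A sketch with the compiled-CNF counter left as a stub:

```python
from itertools import combinations

def count_union(count_under, assumption_sets):
    """|S1 ∪ ... ∪ Sn| by inclusion-exclusion, where `count_under(A)`
    returns the number of answer sets satisfying every assumption in A
    (a stub for counting on the knowledge compilation)."""
    total = 0
    for r in range(1, len(assumption_sets) + 1):
        sign = (-1) ** (r + 1)
        for combo in combinations(assumption_sets, r):
            merged = frozenset().union(*combo)
            total += sign * count_under(merged)
    return total  # stopping the outer loop early gives an upper/lower bound
```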

Large Language Models for Robotics: A Survey

  • paper_url: http://arxiv.org/abs/2311.07226
  • repo_url: https://github.com/Aryia-Behroziuan/Other-sources
  • paper_authors: Fanlong Zeng, Wensheng Gan, Yongheng Wang, Ning Liu, Philip S. Yu
  • for: This paper aims to provide a comprehensive review of the applications of large language models (LLMs) in robotics, exploring their impact and contributions to key areas such as robot control, perception, decision-making, and path planning.
  • methods: The paper uses a variety of techniques, including those employed in perception, decision-making, control, and interaction, to demonstrate the potential of LLMs in enhancing robot intelligence and human-robot interaction.
  • results: The paper highlights recent advancements in robotics models based on LLMs, including their ability to process and generate natural language, facilitating efficient interaction and collaboration with robots. The paper also explores the potential challenges that LLMs may face in the near future, such as the need for more diverse and nuanced training data.
    Abstract The human ability to learn, generalize, and control complex manipulation tasks through multi-modality feedback suggests a unique capability, which we refer to as dexterity intelligence. Understanding and assessing this intelligence is a complex task. Amidst the swift progress and extensive proliferation of large language models (LLMs), their applications in the field of robotics have garnered increasing attention. LLMs possess the ability to process and generate natural language, facilitating efficient interaction and collaboration with robots. Researchers and engineers in the field of robotics have recognized the immense potential of LLMs in enhancing robot intelligence, human-robot interaction, and autonomy. Therefore, this comprehensive review aims to summarize the applications of LLMs in robotics, delving into their impact and contributions to key areas such as robot control, perception, decision-making, and path planning. We first provide an overview of the background and development of LLMs for robotics, followed by a description of the benefits of LLMs for robotics and recent advancements in robotics models based on LLMs. We then delve into the various techniques used in the model, including those employed in perception, decision-making, control, and interaction. Finally, we explore the applications of LLMs in robotics and some potential challenges they may face in the near future. Embodied intelligence is the future of intelligent science, and LLMs-based robotics is one of the promising but challenging paths to achieve this.

Optical Quantum Sensing for Agnostic Environments via Deep Learning

  • paper_url: http://arxiv.org/abs/2311.07203
  • repo_url: None
  • paper_authors: Zeqiao Zhou, Yuxuan Du, Xu-Fei Yin, Shanshan Zhao, Xinmei Tian, Dacheng Tao
  • for: The paper aims to improve the precision of optical quantum sensing toward the Heisenberg limit in agnostic environments.
  • methods: It uses deep learning techniques, combining a graph neural network (GNN) predictor with a trigonometric interpolation algorithm, to achieve high-precision optical quantum sensing.
  • results: Experiments with up to eight photons show the method reaches high precision across different settings, identifying optical setups with maximal quantum Fisher information.
    Abstract Optical quantum sensing promises measurement precision beyond classical sensors termed the Heisenberg limit (HL). However, conventional methodologies often rely on prior knowledge of the target system to achieve HL, presenting challenges in practical applications. Addressing this limitation, we introduce an innovative Deep Learning-based Quantum Sensing scheme (DQS), enabling optical quantum sensors to attain HL in agnostic environments. DQS incorporates two essential components: a Graph Neural Network (GNN) predictor and a trigonometric interpolation algorithm. Operating within a data-driven paradigm, DQS utilizes the GNN predictor, trained on offline data, to unveil the intrinsic relationships between the optical setups employed in preparing the probe state and the resulting quantum Fisher information (QFI) after interaction with the agnostic environment. This distilled knowledge facilitates the identification of optimal optical setups associated with maximal QFI. Subsequently, DQS employs a trigonometric interpolation algorithm to recover the unknown parameter estimates for the identified optical setups. Extensive experiments are conducted to investigate the performance of DQS under different settings up to eight photons. Our findings not only offer a new lens through which to accelerate optical quantum sensing tasks but also catalyze future research integrating deep learning and quantum mechanics.
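
The second DQS component, trigonometric interpolation, can be illustrated on the simplest case: recovering an unknown phase from samples of a sinusoidal response. A sketch under the assumed model y(θ) = a·cos(θ − φ) + b, which is not necessarily the paper's exact estimator:

```python
import numpy as np

def recover_phase(thetas, measurements):
    """Least squares on the trigonometric basis {cos, sin, 1}: since
    a*cos(theta - phi) = (a cos phi) cos theta + (a sin phi) sin theta,
    the fitted coefficients give phi = atan2(s, c)."""
    A = np.column_stack([np.cos(thetas), np.sin(thetas), np.ones_like(thetas)])
    c, s, _ = np.linalg.lstsq(A, measurements, rcond=None)[0]
    return np.arctan2(s, c)

# Toy usage: phase 0.7 rad recovered from 8 noiseless samples.
th = np.linspace(0, 2 * np.pi, 8, endpoint=False)
print(recover_phase(th, 2.0 * np.cos(th - 0.7) + 0.3))  # ~0.7
```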

Applying Large Language Models for Causal Structure Learning in Non Small Cell Lung Cancer

  • paper_url: http://arxiv.org/abs/2311.07191
  • repo_url: None
  • paper_authors: Narmada Naik, Ayush Khandelwal, Mohit Joshi, Madhusudan Atre, Hollis Wright, Kavya Kannan, Scott Hill, Giridhar Mamidipudi, Ganapati Srinivasa, Carlo Bifulco, Brian Piening, Kevin Matlock
  • for: The paper investigates using Large Language Models (LLMs) to resolve edge directionality in causal discovery.
  • methods: It uses LLMs to predict the directionality of edges in causal graphs and compares them against existing state-of-the-art methods.
  • results: LLMs accurately predict the directionality of edges in causal graphs, outperforming existing state-of-the-art methods.
    Abstract Causal discovery is becoming a key part in medical AI research. These methods can enhance healthcare by identifying causal links between biomarkers, demographics, treatments and outcomes. They can aid medical professionals in choosing more impactful treatments and strategies. In parallel, Large Language Models (LLMs) have shown great potential in identifying patterns and generating insights from text data. In this paper we investigate applying LLMs to the problem of determining the directionality of edges in causal discovery. Specifically, we test our approach on a deidentified set of Non Small Cell Lung Cancer(NSCLC) patients that have both electronic health record and genomic panel data. Graphs are validated using Bayesian Dirichlet estimators using tabular data. Our result shows that LLMs can accurately predict the directionality of edges in causal graphs, outperforming existing state-of-the-art methods. These findings suggests that LLMs can play a significant role in advancing causal discovery and help us better understand complex systems.
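
A minimal sketch of querying an LLM for the direction of a single undirected edge (the prompt wording and the stub `llm` callable are assumptions; the paper's prompts and validation protocol are not reproduced):

```python
def orient_edge(llm, var_a, var_b):
    """Ask an LLM which causal direction between two variables is more
    plausible; `llm` is any text-completion callable (stub)."""
    prompt = (
        f"Between the clinical variables '{var_a}' and '{var_b}', "
        "which causal direction is more plausible?\n"
        f"(1) {var_a} -> {var_b}\n(2) {var_b} -> {var_a}\nAnswer with 1 or 2:"
    )
    answer = llm(prompt).strip()
    return (var_a, var_b) if answer.startswith("1") else (var_b, var_a)
```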

Cross-Axis Transformer with 2D Rotary Embeddings

  • paper_url: http://arxiv.org/abs/2311.07184
  • repo_url: None
  • paper_authors: Lily Erickson
  • for: The paper addresses the computational inefficiency and poor handling of spatial dimensions that have held back vision transformers.
  • methods: It proposes the Cross-Axis Transformer (CAT), a model inspired by Axial Transformers and Microsoft's Retentive Network, which drastically reduces the number of floating point operations required to process an image.
  • results: CAT trains faster and converges more accurately than the Vision Transformers it replaces.
    Abstract Despite lagging behind their modal cousins in many respects, Vision Transformers have provided an interesting opportunity to bridge the gap between sequence modeling and image modeling. Up until now however, vision transformers have largely been held back, due to both computational inefficiency, and lack of proper handling of spatial dimensions. In this paper, we introduce the Cross-Axis Transformer. CAT is a model inspired by both Axial Transformers, and Microsoft's recent Retentive Network, that drastically reduces the required number of floating point operations required to process an image, while simultaneously converging faster and more accurately than the Vision Transformers it replaces.
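
One common way to build the 2D rotary embeddings named in the title is to apply standard 1D RoPE to two channel halves, one rotated by the row index and one by the column index. A sketch of that construction, offered as an assumption rather than the paper's verified formulation:

```python
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Standard rotary embedding: rotate channel pairs by pos * frequency."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)
    ang = pos[:, None] * freqs
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * np.cos(ang) - x2 * np.sin(ang)
    out[..., 1::2] = x1 * np.sin(ang) + x2 * np.cos(ang)
    return out

def rope_2d(x, rows, cols):
    """Assumed 2D variant: half the channels encode the row position,
    the other half the column position (d must be divisible by 4)."""
    h = x.shape[-1] // 2
    return np.concatenate([rope_1d(x[..., :h], rows),
                           rope_1d(x[..., h:], cols)], axis=-1)

# Toy usage: 3 tokens at grid positions (0,0), (0,1), (1,0), 8 channels.
x = np.ones((3, 8))
print(rope_2d(x, np.array([0., 0., 1.]), np.array([0., 1., 0.])).shape)  # (3, 8)
```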

Knowledge Graph Representations to enhance Intensive Care Time-Series Predictions

  • paper_url: http://arxiv.org/abs/2311.07180
  • repo_url: None
  • paper_authors: Samyak Jain, Manuel Burger, Gunnar Rätsch, Rita Kuznetsova
  • for: Enhancing clinical outcome prediction in Intensive Care Units (ICUs), which requires comprehensive integration of patient data.
  • methods: Builds on deep learning advances that fuse patient time-series data with unstructured clinical reports to improve predictive performance.
  • results: Integrating medical knowledge graphs derived from clinical ontologies such as the Unified Medical Language System (UMLS) improves clinical decision modeling; combining graph representations with vital signs and clinical reports boosts performance, especially when data is missing, and an interpretability component shows how knowledge graph nodes affect predictions.
    Abstract Intensive Care Units (ICU) require comprehensive patient data integration for enhanced clinical outcome predictions, crucial for assessing patient conditions. Recent deep learning advances have utilized patient time series data, and fusion models have incorporated unstructured clinical reports, improving predictive performance. However, integrating established medical knowledge into these models has not yet been explored. The medical domain's data, rich in structural relationships, can be harnessed through knowledge graphs derived from clinical ontologies like the Unified Medical Language System (UMLS) for better predictions. Our proposed methodology integrates this knowledge with ICU data, improving clinical decision modeling. It combines graph representations with vital signs and clinical reports, enhancing performance, especially when data is missing. Additionally, our model includes an interpretability component to understand how knowledge graph nodes affect predictions.

Game Solving with Online Fine-Tuning

  • paper_url: http://arxiv.org/abs/2311.07178
  • repo_url: https://github.com/rlglab/online-fine-tuning-solver
  • paper_authors: Ti-Rong Wu, Hung Guei, Ting Han Wei, Chung-Chin Shih, Jui-Te Chin, I-Chen Wu
  • for: solves challenging 7x7 Killall-Go problems with online fine-tuning, using less computation time than traditional methods.
  • methods: applies online fine-tuning and proposes two tailor-designed heuristics for game solving.
  • results: solves a series of challenging 7x7 Killall-Go problems with 23.54% less computation time compared to the baseline, and the savings scale with problem size.
    Abstract Game solving is a similar, yet more difficult task than mastering a game. Solving a game typically means to find the game-theoretic value (outcome given optimal play), and optionally a full strategy to follow in order to achieve that outcome. The AlphaZero algorithm has demonstrated super-human level play, and its powerful policy and value predictions have also served as heuristics in game solving. However, to solve a game and obtain a full strategy, a winning response must be found for all possible moves by the losing player. This includes very poor lines of play from the losing side, for which the AlphaZero self-play process will not encounter. AlphaZero-based heuristics can be highly inaccurate when evaluating these out-of-distribution positions, which occur throughout the entire search. To address this issue, this paper investigates applying online fine-tuning while searching and proposes two methods to learn tailor-designed heuristics for game solving. Our experiments show that using online fine-tuning can solve a series of challenging 7x7 Killall-Go problems, using only 23.54% of computation time compared to the baseline without online fine-tuning. Results suggest that the savings scale with problem size. Our method can further be extended to any tree search algorithm for problem solving. Our code is available at https://rlg.iis.sinica.edu.tw/papers/neurips2023-online-fine-tuning-solver.

The High-dimensional Phase Diagram and the Large CALPHAD Model

  • paper_url: http://arxiv.org/abs/2311.07174
  • repo_url: None
  • paper_authors: Zhengdi Liu, Xulong An, Wenwen Sun
  • for: To tackle the complexity of multi-element alloy systems, the authors introduce the Large CALPHAD Model (LCM) for the FeNiCrMn alloy system, capturing the entire phase space.
  • methods: They systematically structure the enormous data with a high-dimensional phase diagram aided by hash tables and Depth-first Search (DFS), achieving 97% classification accuracy and a mean square error of 4.80*10^-5 in phase volume prediction.
  • results: The approach delineates 51 unique phase spaces in the FeNiCrMn system and is demonstrated by designing all 439 eutectic alloys, signaling a major shift in alloy design techniques and in multi-variable problems more broadly.
    Abstract When alloy systems comprise more than three elements, the visualization of the entire phase space becomes not only daunting but is also accompanied by a data surge. Addressing this complexity, we delve into the FeNiCrMn alloy system and introduce the Large CALPHAD Model (LCM). The LCM acts as a computational conduit, capturing the entire phase space. Subsequently, this enormous data is systematically structured using a high-dimensional phase diagram, aided by hash tables and Depth-first Search (DFS), rendering it both digestible and programmatically accessible. Remarkably, the LCM boasts a 97% classification accuracy and a mean square error of 4.80*10-5 in phase volume prediction. Our methodology successfully delineates 51 unique phase spaces in the FeNiCrMn system, exemplifying its efficacy with the design of all 439 eutectic alloys. This pioneering methodology signifies a monumental shift in alloy design techniques or even multi-variable problems.

STEER: Unified Style Transfer with Expert Reinforcement

  • paper_url: http://arxiv.org/abs/2311.07167
  • repo_url: None
  • paper_authors: Skyler Hallinan, Faeze Brahman, Ximing Lu, Jaehun Jung, Sean Welleck, Yejin Choi
  • for: The paper targets text style transfer from an arbitrary, unknown source style to a target style.
  • methods: It proposes STEER: Unified Style Transfer with Expert Reinforcement, which overcomes the scarcity of parallel data by automatically generating style-transfer pairs with a product of experts during decoding; this offline data pre-trains an initial policy, which is then refined with online, off-policy reinforcement learning driven by fine-grained reward signals.
  • results: STEER achieves state-of-the-art results on a challenging dataset spanning diverse styles, outperforming the 175B-parameter instruction-tuned GPT-3 on overall style transfer quality despite being 226 times smaller, and it remains robust on out-of-domain data, surpassing nearly all baselines across various styles.
    Abstract While text style transfer has many applications across natural language processing, the core premise of transferring from a single source style is unrealistic in a real-world setting. In this work, we focus on arbitrary style transfer: rewriting a text from an arbitrary, unknown style to a target style. We propose STEER: Unified Style Transfer with Expert Reinforcement, a unified frame-work developed to overcome the challenge of limited parallel data for style transfer. STEER involves automatically generating a corpus of style-transfer pairs using a product of experts during decoding. The generated offline data is then used to pre-train an initial policy before switching to online, off-policy reinforcement learning for further improvements via fine-grained reward signals. STEER is unified and can transfer to multiple target styles from an arbitrary, unknown source style, making it particularly flexible and efficient. Experimental results on a challenging dataset with text from a diverse set of styles demonstrate state-of-the-art results compared to competitive baselines. Remarkably, STEER outperforms the 175B parameter instruction-tuned GPT-3 on overall style transfer quality, despite being 226 times smaller in size. We also show STEER is robust, maintaining its style transfer capabilities on out-of-domain data, and surpassing nearly all baselines across various styles. The success of our method highlights the potential of RL algorithms when augmented with controllable decoding to overcome the challenge of limited data supervision.
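
The data-generation step above relies on product-of-experts decoding: experts' next-token scores are combined multiplicatively in probability space. A generic sketch (the concrete expert set, e.g. style-conditioned LMs, is an assumption not shown here):

```python
import numpy as np

def product_of_experts(next_token_logits, weights):
    """Combine experts' next-token distributions as a weighted product,
    log p(x) ∝ Σ_i w_i * logit_i(x); each expert's normalizer only shifts
    the sum by a constant, which the final softmax absorbs."""
    combined = sum(w * np.asarray(l, dtype=float)
                   for w, l in zip(weights, next_token_logits))
    combined -= combined.max()          # numerical stability
    p = np.exp(combined)
    return p / p.sum()

# Toy usage: two 5-token experts, the second weighted more heavily.
print(product_of_experts([np.array([2., 0., 0., 0., 0.]),
                          np.array([0., 3., 0., 0., 0.])], [0.5, 1.0]))
```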

Pruning random resistive memory for optimizing analogue AI

  • paper_url: http://arxiv.org/abs/2311.07164
  • repo_url: None
  • paper_authors: Yi Li, Songqi Wang, Yaping Zhao, Shaocong Wang, Woyu Zhang, Yangu He, Ning Lin, Binbin Cui, Xi Chen, Shiming Zhang, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Xiaoxin Xu, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu
  • for: The paper addresses the energy consumption and environmental sustainability challenges of artificial intelligence (AI) by revisiting analogue computing.
  • methods: It uses software-hardware co-design, applying structural-plasticity-inspired edge pruning to optimize the topology of a randomly weighted analogue resistive memory neural network.
  • results: On the FashionMNIST, Spoken digits, and DRIVE datasets, the approach improves accuracy by 17.3%, 19.9%, and 9.8% respectively, while improving energy efficiency by 82.1%, 51.2%, and 99.8%.
    Abstract The rapid advancement of artificial intelligence (AI) has been marked by the large language models exhibiting human-like intelligence. However, these models also present unprecedented challenges to energy consumption and environmental sustainability. One promising solution is to revisit analogue computing, a technique that predates digital computing and exploits emerging analogue electronic devices, such as resistive memory, which features in-memory computing, high scalability, and nonvolatility. However, analogue computing still faces the same challenges as before: programming nonidealities and expensive programming due to the underlying devices physics. Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning to optimize the topology of a randomly weighted analogue resistive memory neural network. Software-wise, the topology of a randomly weighted neural network is optimized by pruning connections rather than precisely tuning resistive memory weights. Hardware-wise, we reveal the physical origin of the programming stochasticity using transmission electron microscopy, which is leveraged for large-scale and low-cost implementation of an overparameterized random neural network containing high-performance sub-networks. We implemented the co-design on a 40nm 256K resistive memory macro, observing 17.3% and 19.9% accuracy improvements in image and audio classification on FashionMNIST and Spoken digits datasets, as well as 9.8% (2%) improvement in PR (ROC) in image segmentation on DRIVE datasets, respectively. This is accompanied by 82.1%, 51.2%, and 99.8% improvement in energy efficiency thanks to analogue in-memory computing. By embracing the intrinsic stochasticity and in-memory computing, this work may solve the biggest obstacle of analogue computing systems and thus unleash their immense potential for next-generation AI hardware.
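
The topology optimization named above, pruning connections of a randomly weighted network rather than precisely tuning the resistive weights, can be sketched as learning only a binary mask (scoring edges by |W| is an assumption standing in for the paper's pruning criterion):

```python
import numpy as np

def prune_random_network(W, sparsity=0.9):
    """Keep the randomly programmed conductances fixed and optimize only
    the connection topology by dropping the lowest-scoring edges."""
    threshold = np.quantile(np.abs(W), sparsity)
    mask = (np.abs(W) > threshold).astype(W.dtype)
    return W * mask, mask

# Toy usage: keep the strongest 10% of a 256x256 random conductance matrix.
rng = np.random.default_rng(0)
Wp, mask = prune_random_network(rng.normal(size=(256, 256)))
print(mask.mean())  # ~0.1
```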

Enhancing Lightweight Neural Networks for Small Object Detection in IoT Applications

  • paper_url: http://arxiv.org/abs/2311.07163
  • repo_url: None
  • paper_authors: Liam Boyle, Nicolas Baumann, Seonyeong Heo, Michele Magno
  • for: Improving small-object detection accuracy on embedded IoT devices.
  • methods: Proposes an adaptive tiling method that can be applied on top of any existing object detector, including the FOMO network for microcontrollers.
  • results: Experiments show the tiling method boosts the F1-score by up to 225% while reducing the average object count error by up to 76%; using a soft F1 loss instead of the common binary cross-entropy loss also significantly reduces the negative impact of imbalanced data.
    Abstract Advances in lightweight neural networks have revolutionized computer vision in a broad range of IoT applications, encompassing remote monitoring and process automation. However, the detection of small objects, which is crucial for many of these applications, remains an underexplored area in current computer vision research, particularly for embedded devices. To address this gap, the paper proposes a novel adaptive tiling method that can be used on top of any existing object detector including the popular FOMO network for object detection on microcontrollers. Our experimental results show that the proposed tiling method can boost the F1-score by up to 225% while reducing the average object count error by up to 76%. Furthermore, the findings of this work suggest that using a soft F1 loss over the popular binary cross-entropy loss can significantly reduce the negative impact of imbalanced data. Finally, we validate our approach by conducting experiments on the Sony Spresense microcontroller, showcasing the proposed method's ability to strike a balance between detection performance, low latency, and minimal memory consumption.
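
The tiling idea can be sketched as enumerating overlapping crops so a small-input detector sees objects at a larger relative scale, with detections later mapped back by each tile's offset. A fixed-grid sketch; the paper's adaptive tile selection is not reproduced:

```python
def tile_boxes(img_w, img_h, tile=160, overlap=32):
    """Enumerate overlapping (x0, y0, x1, y1) crops covering an image."""
    step = tile - overlap
    boxes = []
    for y0 in range(0, max(img_h - overlap, 1), step):
        for x0 in range(0, max(img_w - overlap, 1), step):
            boxes.append((x0, y0, min(x0 + tile, img_w), min(y0 + tile, img_h)))
    return boxes

# Toy usage: a QVGA frame is covered by 6 overlapping 160x160 tiles.
print(len(tile_boxes(320, 240)))  # 6
```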

Interaction is all You Need? A Study of Robots Ability to Understand and Execute

  • paper_url: http://arxiv.org/abs/2311.07150
  • repo_url: https://github.com/nid989/teach_edh
  • paper_authors: Kushal Koshti, Nidhir Bhavsar
  • for: The study aims to help robots solve tasks efficiently through natural language interaction in human environments, specifically by understanding and executing complex instructions in coherent dialogs.
  • methods: It builds on the Execution from Dialog History (EDH) task from the TEACh benchmark, employing a multi-transformer model with a BART LM; the best configuration outperforms the baseline with a success rate of 8.85 and a goal-conditioned success rate of 14.02, and an alternative methodology for completing the task is also proposed.
  • results: On a new extension of the EDH task that predicts game plans instead of individual actions, the authors evaluate multiple BART models and an LLaMA2 LLM, with LLaMA2 achieving a ROUGE-L score of 46.77.
    Abstract This paper aims to address a critical challenge in robotics, which is enabling them to operate seamlessly in human environments through natural language interactions. Our primary focus is to equip robots with the ability to understand and execute complex instructions in coherent dialogs to facilitate intricate task-solving scenarios. To explore this, we build upon the Execution from Dialog History (EDH) task from the Teach benchmark. We employ a multi-transformer model with BART LM. We observe that our best configuration outperforms the baseline with a success rate score of 8.85 and a goal-conditioned success rate score of 14.02. In addition, we suggest an alternative methodology for completing this task. Moreover, we introduce a new task by expanding the EDH task and making predictions about game plans instead of individual actions. We have evaluated multiple BART models and an LLaMA2 LLM, which has achieved a ROGUE-L score of 46.77 for this task.

  • paper_url: http://arxiv.org/abs/2311.07139
  • repo_url: None
  • paper_authors: Arshika Lalan, Shresth Verma, Kumar Madhu Sudan, Amrita Mahale, Aparna Hegde, Milind Tambe, Aparna Taneja
  • for: The study analyzes beneficiary behavior in the Kilkari mobile health program and proposes improvements to strengthen the program's effectiveness.
  • methods: It applies time-series forecasting to analyze beneficiary dropout behavior, feeding the results into the NGO's churn prediction and churn prevention strategies.
  • results: Clustering beneficiaries by listenership patterns helps the NGO better understand engagement, and time-series prediction on historical data can identify likely dropouts, enabling timely retention interventions.
    Abstract Mobile health programs are becoming an increasingly popular medium for dissemination of health information among beneficiaries in less privileged communities. Kilkari is one of the world's largest mobile health programs which delivers time sensitive audio-messages to pregnant women and new mothers. We have been collaborating with ARMMAN, a non-profit in India which operates the Kilkari program, to identify bottlenecks to improve the efficiency of the program. In particular, we provide an initial analysis of the trajectories of beneficiaries' interaction with the mHealth program and examine elements of the program that can be potentially enhanced to boost its success. We cluster the cohort into different buckets based on listenership so as to analyze listenership patterns for each group that could help boost program success. We also demonstrate preliminary results on using historical data in a time-series prediction to identify beneficiary dropouts and enable NGOs in devising timely interventions to strengthen beneficiary retention.

WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models

  • paper_url: http://arxiv.org/abs/2311.07138
  • repo_url: https://github.com/THU-KEG/WaterBench
  • paper_authors: Shangqing Tu, Yuliang Sun, Yushi Bai, Jifan Yu, Lei Hou, Juanzi Li
  • for: Evaluating the effectiveness of large language model (LLM) watermarking algorithms and providing a comprehensive benchmark for them.
  • methods: Evaluates four open-source watermarks on two LLMs under two watermarking strengths, jointly measuring generation and detection performance across a five-category taxonomy of tasks.
  • results: Current LLM watermarking algorithms struggle to maintain generation quality, with an observable decline in instruction-following abilities after watermarking.
    Abstract To mitigate the potential misuse of large language models (LLMs), recent research has developed watermarking algorithms, which restrict the generation process to leave an invisible trace for watermark detection. Due to the two-stage nature of the task, most studies evaluate the generation and detection separately, thereby presenting a challenge in unbiased, thorough, and applicable evaluations. In this paper, we introduce WaterBench, the first comprehensive benchmark for LLM watermarks, in which we design three crucial factors: (1) For \textbf{benchmarking procedure}, to ensure an apples-to-apples comparison, we first adjust each watermarking method's hyper-parameter to reach the same watermarking strength, then jointly evaluate their generation and detection performance. (2) For \textbf{task selection}, we diversify the input and output length to form a five-category taxonomy, covering $9$ tasks. (3) For \textbf{evaluation metric}, we adopt the GPT4-Judge for automatically evaluating the decline of instruction-following abilities after watermarking. We evaluate $4$ open-source watermarks on $2$ LLMs under $2$ watermarking strengths and observe the common struggles for current methods on maintaining the generation quality. The code and data are available at \url{https://github.com/THU-KEG/WaterBench}.

Understanding Path Planning Explanations

  • paper_url: http://arxiv.org/abs/2311.07132
  • repo_url: https://github.com/Sfedfcv/redesigned-pancake
  • paper_authors: Amar Halilovic, Senka Krivic
  • for: The study aims to explain the navigation decisions of mobile robots.
  • methods: It proposes explaining a robot's navigational choices through visual and textual explanations.
  • results: The authors plan a user study to test the understandability and simplicity of the robot's explanations and outline a further research agenda.
    Abstract Navigation is a must-have skill for any mobile robot. A core challenge in navigation is the need to account for an ample number of possible configurations of environment and navigation contexts. We claim that a mobile robot should be able to explain its navigational choices making its decisions understandable to humans. In this paper, we briefly present our approach to explaining navigational decisions of a robot through visual and textual explanations. We propose a user study to test the understandability and simplicity of the robot explanations and outline our further research agenda.

Untargeted Black-box Attacks for Social Recommendations

  • paper_url: http://arxiv.org/abs/2311.07127
  • repo_url: None
  • paper_authors: Wenqi Fan, Shijie Wang, Xiao-yong Wei, Xiaowei Mei, Qing Li
  • for: This work studies untargeted attacks on social recommender systems under a black-box setting.
  • methods: It proposes Multiattack, an attack framework based on multi-agent reinforcement learning that coordinates the generation of cold-start item profiles and cross-community social relations to degrade black-box social recommendations.
  • results: Comprehensive experiments on several real-world datasets demonstrate the effectiveness of the proposed attack framework in the black-box setting.
    Abstract The rise of online social networks has facilitated the evolution of social recommender systems, which incorporate social relations to enhance users' decision-making process. With the great success of Graph Neural Networks in learning node representations, GNN-based social recommendations have been widely studied to model user-item interactions and user-user social relations simultaneously. Despite their great successes, recent studies have shown that these advanced recommender systems are highly vulnerable to adversarial attacks, in which attackers can inject well-designed fake user profiles to disrupt recommendation performances. While most existing studies mainly focus on targeted attacks to promote target items on vanilla recommender systems, untargeted attacks to degrade the overall prediction performance are less explored on social recommendations under a black-box scenario. To perform untargeted attacks on social recommender systems, attackers can construct malicious social relationships for fake users to enhance the attack performance. However, the coordination of social relations and item profiles is challenging for attacking black-box social recommendations. To address this limitation, we first conduct several preliminary studies to demonstrate the effectiveness of cross-community connections and cold-start items in degrading recommendations performance. Specifically, we propose a novel framework Multiattack based on multi-agent reinforcement learning to coordinate the generation of cold-start item profiles and cross-community social relations for conducting untargeted attacks on black-box social recommendations. Comprehensive experiments on various real-world datasets demonstrate the effectiveness of our proposed attacking framework under the black-box setting.

Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning

  • paper_url: http://arxiv.org/abs/2311.07099
  • repo_url: None
  • paper_authors: Yue Yu, Jiaming Shen, Tianqi Liu, Zhen Qin, Jing Nathan Yan, Jialu Liu, Chao Zhang, Michael Bendersky
  • for: Improving the in-context learning ability of large language models (LLMs) on natural language understanding tasks.
  • methods: Proposes EASE, an Explanation-Aware Soft Ensemble framework with two techniques, explanation-guided ensembling and soft probability aggregation, to mitigate unreliable explanations and improve the consistency between explanations and final predictions.
  • results: Experiments on seven natural language understanding tasks and four LLMs of varying sizes demonstrate the effectiveness of the proposed framework.
    Abstract Large language models (LLMs) have shown remarkable capabilities in various natural language understanding tasks. With only a few demonstration examples, these LLMs can quickly adapt to target tasks without expensive gradient updates. Common strategies to boost such 'in-context' learning ability are to ensemble multiple model decoded results and require the model to generate an explanation along with the prediction. However, these models often treat different class predictions equally and neglect the potential discrepancy between the explanations and predictions. To fully unleash the power of explanations, we propose EASE, an Explanation-Aware Soft Ensemble framework to empower in-context learning with LLMs. We design two techniques, explanation-guided ensemble, and soft probability aggregation, to mitigate the effect of unreliable explanations and improve the consistency between explanations and final predictions. Experiments on seven natural language understanding tasks and four varying-size LLMs demonstrate the effectiveness of our proposed framework.
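
Soft probability aggregation can be sketched as a reliability-weighted average of per-demonstration class distributions, in place of majority voting over hard labels (how the explanation reliability score is produced, e.g. via consistency checks, is left abstract here):

```python
import numpy as np

def soft_probability_aggregation(class_probs, explanation_scores):
    """Average per-sample class-probability vectors, weighting each by a
    reliability score for its accompanying explanation."""
    w = np.asarray(explanation_scores, dtype=float)
    w = w / w.sum()
    return (w[:, None] * np.asarray(class_probs, dtype=float)).sum(axis=0)

# Toy usage: three sampled (prediction, explanation) pairs over two classes;
# the unreliable middle explanation is down-weighted.
print(soft_probability_aggregation(
    [[0.8, 0.2], [0.1, 0.9], [0.7, 0.3]], [1.0, 0.2, 0.9]))
```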

To Tell The Truth: Language of Deception and Language Models

  • paper_url: http://arxiv.org/abs/2311.07092
  • repo_url: None
  • paper_authors: Bodhisattwa Prasad Majumder, Sanchaita Hazra
  • for: This paper aims to analyze the ability of individuals to discern truth from misinformation in a high-stake environment, and to develop a machine learning model that can detect deception in text-based conversations.
  • methods: The paper uses a novel dataset of TV game show conversations to investigate the manifestation of potentially verifiable language cues of deception in the presence of objective truth. The authors develop a machine learning model, built on a large language model, that employs a bottleneck framework to learn discernible cues to determine truth.
  • results: The paper shows that the machine learning model can detect novel but accurate language cues in many cases where humans failed to detect deception, opening up the possibility of humans collaborating with algorithms to improve their ability to detect the truth.
    Abstract Text-based misinformation permeates online discourses, yet evidence of people's ability to discern truth from such deceptive textual content is scarce. We analyze a novel TV game show data where conversations in a high-stake environment between individuals with conflicting objectives result in lies. We investigate the manifestation of potentially verifiable language cues of deception in the presence of objective truth, a distinguishing feature absent in previous text-based deception datasets. We show that there exists a class of detectors (algorithms) that have similar truth detection performance compared to human subjects, even when the former accesses only the language cues while the latter engages in conversations with complete access to all potential sources of cues (language and audio-visual). Our model, built on a large language model, employs a bottleneck framework to learn discernible cues to determine truth, an act of reasoning in which human subjects often perform poorly, even with incentives. Our model detects novel but accurate language cues in many cases where humans failed to detect deception, opening up the possibility of humans collaborating with algorithms and ameliorating their ability to detect the truth.
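
The bottleneck framework can be illustrated with a toy PyTorch module; the dimensions, layer choices, and names below are our assumptions for illustration, not the authors' architecture:

```python
import torch
import torch.nn as nn

class BottleneckTruthClassifier(nn.Module):
    """Toy bottleneck classifier: utterance embeddings from a language
    model are squeezed through a small bottleneck meant to retain only
    a handful of discernible deception cues, then mapped to a
    truth/lie decision. The cues remain inspectable."""

    def __init__(self, emb_dim=768, bottleneck_dim=8):
        super().__init__()
        self.bottleneck = nn.Sequential(
            nn.Linear(emb_dim, bottleneck_dim),  # compress to a few cues
            nn.ReLU(),
        )
        self.classifier = nn.Linear(bottleneck_dim, 2)  # truth vs. lie

    def forward(self, utterance_embedding):
        cues = self.bottleneck(utterance_embedding)
        return self.classifier(cues), cues  # logits + inspectable cues

model = BottleneckTruthClassifier()
fake_embedding = torch.randn(4, 768)   # batch of 4 LM embeddings (toy)
logits, cues = model(fake_embedding)
print(logits.shape, cues.shape)        # torch.Size([4, 2]) torch.Size([4, 8])
```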

Sample Dominance Aware Framework via Non-Parametric Estimation for Spontaneous Brain-Computer Interface

  • paper_url: http://arxiv.org/abs/2311.07079
  • repo_url: None
  • paper_authors: Byeong-Hoo Lee, Byoung-Hee Kwon, Seong-Whan Lee
  • for: This study aims to address the challenge that the non-stationary characteristics of electroencephalogram (EEG) signals pose for training neural networks, thereby improving the performance of spontaneous brain-computer interfaces (BCIs).
  • methods: We propose a sample-dominance-based method and use a two-stage dominance score estimation technique to compensate for the effect of sample inconsistency on network training.
  • results: Our experimental results show that this method enhances the performance of spontaneous BCIs and demonstrates the importance of sample dominance.
    Abstract Deep learning has shown promise in decoding brain signals, such as electroencephalogram (EEG), in the field of brain-computer interfaces (BCIs). However, the non-stationary characteristics of EEG signals pose challenges for training neural networks to acquire appropriate knowledge. Inconsistent EEG signals resulting from these non-stationary characteristics can lead to poor performance. Therefore, it is crucial to investigate and address sample inconsistency to ensure robust performance in spontaneous BCIs. In this study, we introduce the concept of sample dominance as a measure of EEG signal inconsistency and propose a method to modulate its effect on network training. We present a two-stage dominance score estimation technique that compensates for performance degradation caused by sample inconsistencies. Our proposed method utilizes non-parametric estimation to infer sample inconsistency and assigns each sample a dominance score. This score is then aggregated with the loss function during training to modulate the impact of sample inconsistency. Furthermore, we design a curriculum learning approach that gradually increases the influence of inconsistent signals during training to improve overall performance. We evaluate our proposed method using public spontaneous BCI dataset. The experimental results confirm that our findings highlight the importance of addressing sample dominance for achieving robust performance in spontaneous BCIs.
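
A hedged sketch of how a dominance score might be aggregated with the loss, together with a curriculum that gradually re-admits inconsistent trials; the weighting formula is our illustration of the idea, not the paper's exact scheme:

```python
import torch
import torch.nn.functional as F

def dominance_weighted_loss(logits, targets, dominance, epoch, total_epochs):
    """Per-sample cross-entropy modulated by a dominance score.

    `dominance` in [0, 1] marks how inconsistent each EEG trial is;
    a curriculum coefficient gradually increases the influence of
    inconsistent trials as training progresses.
    """
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    curriculum = epoch / total_epochs             # grows 0 -> 1 over training
    weights = 1.0 - dominance * (1.0 - curriculum)
    return (weights * per_sample).mean()

logits = torch.randn(8, 4)                        # 8 trials, 4 classes (toy)
targets = torch.randint(0, 4, (8,))
dominance = torch.rand(8)                         # e.g., from a non-parametric estimate
print(dominance_weighted_loss(logits, targets, dominance, epoch=1, total_epochs=10))
```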

The Impact of Generative Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2311.07071
  • repo_url: None
  • paper_authors: Kaichen Zhang, Ohchan Kwon, Hui Xiong
  • for: This study examines the impact of generative artificial intelligence on product markets, in response to concerns that generative AI may cause unemployment and market depression.
  • methods: The paper uses a "natural experiment" to address the challenge of causal inference: an unanticipated and sudden leak of a highly proficient image-generative AI, which made anime-style images much cheaper to generate than other styles and thus enabled a comparative assessment.
  • results: The study finds that although generative AI lowers average prices, it brings an increase in order volume and growth in total revenue. This counterintuitive finding suggests that generative AI is a benefit to artists rather than a detriment.
    Abstract The rise of generative artificial intelligence (AI) has sparked concerns about its potential influence on unemployment and market depression. This study addresses this concern by examining the impact of generative AI on product markets. To overcome the challenge of causal inference, given the inherent limitations of conducting controlled experiments, this paper identifies an unanticipated and sudden leak of a highly proficient image-generative AI as a novel instance of a "natural experiment". This AI leak spread rapidly, significantly reducing the cost of generating anime-style images compared to other styles, creating an opportunity for comparative assessment. We collect real-world data from an artwork outsourcing platform. Surprisingly, our results show that while generative AI lowers average prices, it substantially boosts order volume and overall revenue. This counterintuitive finding suggests that generative AI confers benefits upon artists rather than detriments. The study further offers theoretical economic explanations to elucidate this unexpected phenomenon. By furnishing empirical evidence, this paper dispels the notion that generative AI might engender depression, instead underscoring its potential to foster market prosperity. These findings carry significant implications for practitioners, policymakers, and the broader AI community.

Non-approximability of constructive global $\mathcal{L}^2$ minimizers by gradient descent in Deep Learning

  • paper_url: http://arxiv.org/abs/2311.07065
  • repo_url: None
  • paper_authors: Thomas Chen, Patricia Muñoz Ewald
  • for: This work analyzes geometric aspects of the gradient descent algorithm in Deep Learning (DL) networks.
  • methods: The study examines the gradient descent flow used to optimize deep learning networks.
  • results: The results show that the globally minimizing weights and biases for the $\mathcal{L}^2$ cost cannot generically be approximated via the gradient descent flow; the proposed constructive method is therefore disjoint from the gradient descent method.
    Abstract We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL) networks. In particular, we prove that the globally minimizing weights and biases for the $\mathcal{L}^2$ cost obtained constructively in [Chen-Munoz Ewald 2023] for underparametrized ReLU DL networks can generically not be approximated via the gradient descent flow. We therefore conclude that the method introduced in [Chen-Munoz Ewald 2023] is disjoint from the gradient descent method.

Effective In-vehicle Intrusion Detection via Multi-view Statistical Graph Learning on CAN Messages

  • paper_url: http://arxiv.org/abs/2311.07056
  • repo_url: https://github.com/wangkai-tech23/StatGraph
  • paper_authors: Kai Wang, Qiguang Jiang, Bailing Wang, Yongzheng Zhang, Yulei Wu
  • for: This paper focuses on intelligent connected vehicles (ICVs), which must communicate frequently with external networks, and on fine-grained detection of and protection against attacks on those communications.
  • methods: The paper proposes StatGraph, a multi-view statistical graph learning intrusion detection method that converts data streams into two statistical graphs (TCG and CRG) and trains a lightweight GCN network on them for more effective detection.
  • results: Experimental results show that StatGraph improves detection granularity and detection performance over previous methods, and can detect four new attacks that had never been investigated before.
    Abstract As an important component of internet of vehicles (IoV), intelligent connected vehicles (ICVs) have to communicate with external networks frequently. In this case, the resource-constrained in-vehicle network (IVN) is facing a wide variety of complex and changing external cyber-attacks, especially the masquerade attack with high difficulty of detection while serious damaging effects that few counter measures can identify successfully. Moreover, only coarse-grained recognition can be achieved in current mainstream intrusion detection mechanisms, i.e., whether a whole data flow observation window contains attack labels rather than fine-grained recognition on every single data item within this window. In this paper, we propose StatGraph: an Effective Multi-view Statistical Graph Learning Intrusion Detection to implement the fine-grained intrusion detection. Specifically, StatGraph generates two statistical graphs, timing correlation graph (TCG) and coupling relationship graph (CRG), based on data streams. In given message observation windows, edge attributes in TCGs represent temporal correlation between different message IDs, while edge attributes in CRGs denote the neighbour relationship and contextual similarity. Besides, a lightweight shallow layered GCN network is trained based graph property of TCGs and CRGs, which can learn the universal laws of various patterns more effectively and further enhance the performance of detection. To address the problem of insufficient attack types in previous intrusion detection, we select two real in-vehicle CAN datasets that cover four new attacks never investigated before. Experimental result shows StatGraph improves both detection granularity and detection performance over state-of-the-art intrusion detection methods.
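
As a rough illustration of the TCG construction, the toy sketch below counts how often one CAN message ID immediately follows another inside an observation window; the paper's actual edge attributes encode richer temporal correlations, and the resulting graphs are then fed to a shallow GCN:

```python
from collections import defaultdict

def timing_correlation_graph(window):
    """Build a toy timing correlation graph (TCG) from one observation
    window of CAN messages. Nodes are message IDs; a directed edge
    (a, b) is weighted by how often message b immediately follows
    message a in the window. The edge semantics are our simplification
    of the paper's temporal-correlation attributes.
    """
    edges = defaultdict(int)
    for prev, cur in zip(window, window[1:]):
        edges[(prev, cur)] += 1
    return dict(edges)

window = ["0x130", "0x2A0", "0x130", "0x4F1", "0x130", "0x2A0"]
tcg = timing_correlation_graph(window)
print(tcg)  # {('0x130', '0x2A0'): 2, ('0x2A0', '0x130'): 1, ...}
```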

Towards the Law of Capacity Gap in Distilling Language Models

  • paper_url: http://arxiv.org/abs/2311.07052
  • repo_url: https://github.com/genezc/minima
  • paper_authors: Chen Zhang, Dawei Song, Zheyu Ye, Yan Gao
  • for: This work investigates how best to distil language models (LMs), especially when there is a large capacity gap between the teacher LM and the student LM.
  • methods: The study formulates a new law, the law of capacity gap, that characterizes how to pick the optimal teacher during distillation.
  • results: The study finds that the optimal capacity gap is almost constant across different student scales and architectures, which greatly simplifies the choice made during distillation. Moreover, by distilling from a 7B teacher LM, the authors obtain a 3B student LM (termed MiniMA) that establishes a new compute-performance Pareto frontier on commonly used benchmarks, while its instruction-tuned version (termed MiniChat) outperforms a wide range of 3B competitors in GPT4 evaluation and can even compete with several 7B chat models.
    Abstract Language model (LM) distillation is a trending area that aims to distil the knowledge resided in a large teacher LM to a small student one. While various methods have been proposed to push the distillation to its limits, it is still a pain distilling LMs when a large capacity gap is exhibited between the teacher and the student LMs. The pain is mainly resulted by the curse of capacity gap, which describes that a larger teacher LM cannot always lead to a better student LM than one distilled from a smaller teacher LM due to the affect of capacity gap increment. That is, there is likely an optimal point yielding the best student LM along the scaling course of the teacher LM. Even worse, the curse of capacity gap can be only partly yet not fully lifted as indicated in previous studies. However, the tale is not ever one-sided. Although a larger teacher LM has better performance than a smaller teacher LM, it is much more resource-demanding especially in the context of recent large LMs (LLMs). Consequently, instead of sticking to lifting the curse, leaving the curse as is should be arguably fine. Even better, in this paper, we reveal that the optimal capacity gap is almost consistent across different student scales and architectures, fortunately turning the curse into the law of capacity gap. The law later guides us to distil a 3B student LM (termed MiniMA) from a 7B teacher LM (adapted LLaMA2-7B). MiniMA is demonstrated to yield a new compute-performance pareto frontier among existing 3B LMs on commonly used benchmarks, and its instruction-tuned version (termed MiniChat) outperforms a wide range of 3B competitors in GPT4 evaluation and could even compete with several 7B chat models.
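
For reference, the inner loop of logit distillation that such teacher-student setups typically rely on looks like the sketch below (vanilla KD with temperature scaling); MiniMA's exact training recipe may differ, and the law of capacity gap concerns which teacher to pick rather than the loss itself:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Standard logit distillation: KL divergence between the
    temperature-softened teacher and student next-token distributions.
    """
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    # scale by t^2 to keep gradient magnitudes comparable across temperatures
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean") * t * t

student = torch.randn(2, 32000)   # student next-token logits (toy)
teacher = torch.randn(2, 32000)   # teacher next-token logits (toy)
print(distillation_loss(student, teacher))
```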

Phonological Level wav2vec2-based Mispronunciation Detection and Diagnosis Method

  • paper_url: http://arxiv.org/abs/2311.07037
  • repo_url: None
  • paper_authors: Mostafa Shahin, Julien Epps, Beena Ahmed
  • for: This study aims to improve Computer-Aided Pronunciation Learning (CAPL) tools, in particular Mispronunciation Detection and Diagnosis (MDD) methods for second-language (L2) learning and speech-therapy applications.
  • methods: The study proposes a low-level MDD approach based on detecting speech attribute features, which provides more formative feedback to learners. It further proposes a multi-label Connectionist Temporal Classification (CTC) approach to jointly model several non-mutually exclusive speech attributes, with a pre-trained wav2vec2 model as the core model.
  • results: On L2 speech corpora collected from English learners with different native languages, the proposed speech-attribute MDD method achieves a significantly lower False Acceptance Rate (FAR), False Rejection Rate (FRR), and Diagnostic Error Rate (DER) than the traditional phoneme-level MDD.
    Abstract The automatic identification and analysis of pronunciation errors, known as Mispronunciation Detection and Diagnosis (MDD) plays a crucial role in Computer Aided Pronunciation Learning (CAPL) tools such as Second-Language (L2) learning or speech therapy applications. Existing MDD methods relying on analysing phonemes can only detect categorical errors of phonemes that have an adequate amount of training data to be modelled. With the unpredictable nature of the pronunciation errors of non-native or disordered speakers and the scarcity of training datasets, it is unfeasible to model all types of mispronunciations. Moreover, phoneme-level MDD approaches have a limited ability to provide detailed diagnostic information about the error made. In this paper, we propose a low-level MDD approach based on the detection of speech attribute features. Speech attribute features break down phoneme production into elementary components that are directly related to the articulatory system leading to more formative feedback to the learner. We further propose a multi-label variant of the Connectionist Temporal Classification (CTC) approach to jointly model the non-mutually exclusive speech attributes using a single model. The pre-trained wav2vec2 model was employed as a core model for the speech attribute detector. The proposed method was applied to L2 speech corpora collected from English learners from different native languages. The proposed speech attribute MDD method was further compared to the traditional phoneme-level MDD and achieved a significantly lower False Acceptance Rate (FAR), False Rejection Rate (FRR), and Diagnostic Error Rate (DER) over all speech attributes compared to the phoneme-level equivalent.
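
A short sketch of the multi-label CTC idea, assuming one small CTC head per attribute group on top of shared wav2vec2 frame features; the attribute inventories, head sizes, and the plain sum of per-attribute losses are our assumptions for illustration:

```python
import torch
import torch.nn as nn

class SpeechAttributeHeads(nn.Module):
    """Multi-label CTC over speech attributes: shared encoder features
    feed one CTC head per (non-mutually exclusive) attribute group,
    e.g. voicing and place of articulation."""

    def __init__(self, feat_dim=768, attr_vocab_sizes=(3, 5)):
        super().__init__()
        # +1 for the CTC blank symbol in each head
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, v + 1) for v in attr_vocab_sizes
        )
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, feats, targets, target_lengths):
        # feats: (T, N, feat_dim) frame features from a wav2vec2 encoder
        T, N, _ = feats.shape
        input_lengths = torch.full((N,), T, dtype=torch.long)
        loss = 0.0
        for head, tgt, tgt_len in zip(self.heads, targets, target_lengths):
            log_probs = head(feats).log_softmax(dim=-1)   # (T, N, V+1)
            loss = loss + self.ctc(log_probs, tgt, input_lengths, tgt_len)
        return loss  # joint objective: sum of per-attribute CTC losses

model = SpeechAttributeHeads()
feats = torch.randn(50, 2, 768)                    # 50 frames, batch of 2
targets = [torch.randint(1, 4, (2, 10)), torch.randint(1, 6, (2, 10))]
target_lengths = [torch.tensor([10, 8]), torch.tensor([10, 9])]
print(model(feats, targets, target_lengths))
```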

ExpNote: Black-box Large Language Models are Better Task Solvers with Experience Notebook

  • paper_url: http://arxiv.org/abs/2311.07032
  • repo_url: https://github.com/forangel2014/expnote
  • paper_authors: Wangtao Sun, Xuanqing Yu, Shizhu He, Jun Zhao, Kang Liu
  • for: The goal of this paper is to improve the performance of black-box large language models (LLMs) on downstream tasks.
  • methods: The paper proposes an automated framework that helps LLMs adapt to unfamiliar tasks by reflecting on and noting down experiences from training data, and retrieving those experiences from external memory at test time.
  • results: Experimental results show that the method significantly improves the performance of black-box LLMs on multiple tasks. The data and code are available on GitHub (https://github.com/forangel2014/ExpNote).
    Abstract Black-box Large Language Models (LLMs) have shown great power in solving various tasks and are considered general problem solvers. However, LLMs still fail in many specific tasks although understand the task instruction. In this paper, we focus on the problem of boosting the ability of black-box LLMs to solve downstream tasks. We propose ExpNote, an automated framework to help LLMs better adapt to unfamiliar tasks through reflecting and noting experiences from training data and retrieving them from external memory during testing. We evaluate ExpNote on multiple tasks and the experimental results demonstrate that the proposed method significantly improves the performance of black-box LLMs. The data and code are available at https://github.com/forangel2014/ExpNote
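
The reflect-store-retrieve loop can be sketched in a few lines; this is our illustration of the idea with naive keyword retrieval, not the released ExpNote code:

```python
class ExperienceNotebook:
    """Toy experience notebook for a black-box LLM. During training,
    the LLM reflects on mistakes and stores short notes; at test time,
    relevant notes are retrieved and prepended to the prompt."""

    def __init__(self):
        self.notes = []  # list of (key_terms, note_text)

    def record(self, question, note_text):
        self.notes.append((set(question.lower().split()), note_text))

    def retrieve(self, question, k=2):
        q_terms = set(question.lower().split())
        ranked = sorted(self.notes, key=lambda n: -len(n[0] & q_terms))
        return [text for _, text in ranked[:k]]

    def build_prompt(self, question):
        hints = "\n".join(self.retrieve(question))
        return f"Relevant experiences:\n{hints}\n\nQuestion: {question}\nAnswer:"

nb = ExperienceNotebook()
nb.record("is a whale a fish", "Mammals are not fish, even aquatic ones.")
nb.record("capital of france", "Capital questions ask for a city name.")
print(nb.build_prompt("is a dolphin a fish"))
```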

Embarrassingly Simple Dataset Distillation

  • paper_url: http://arxiv.org/abs/2311.07025
  • repo_url: None
  • paper_authors: Yunzhen Feng, Ramakrishna Vedantam, Julia Kempe
  • for: The paper aims to achieve competitive performance on test data when training on a small set of synthetic training samples, through the process of dataset distillation.
  • methods: The paper treats dataset distillation as a bilevel optimization problem and introduces an improved method called Random Truncated Backpropagation Through Time (RaT-BPTT) to address issues such as variance in gradients, computational burden, and long-term dependencies.
  • results: The paper establishes new state-of-the-art performance on standard dataset benchmarks using RaT-BPTT, and also discovers that distilled datasets tend to exhibit pronounced intercorrelation, which can be addressed by a boosting mechanism that generates distilled datasets with near-optimal performance across different data budgets.
    Abstract Dataset distillation extracts a small set of synthetic training samples from a large dataset with the goal of achieving competitive performance on test data when trained on this sample. In this work, we tackle dataset distillation at its core by treating it directly as a bilevel optimization problem. Re-examining the foundational back-propagation through time method, we study the pronounced variance in the gradients, computational burden, and long-term dependencies. We introduce an improved method: Random Truncated Backpropagation Through Time (RaT-BPTT) to address them. RaT-BPTT incorporates a truncation coupled with a random window, effectively stabilizing the gradients and speeding up the optimization while covering long dependencies. This allows us to establish new state-of-the-art for a variety of standard dataset benchmarks. A deeper dive into the nature of distilled data unveils pronounced intercorrelation. In particular, subsets of distilled datasets tend to exhibit much worse performance than directly distilled smaller datasets of the same size. Leveraging RaT-BPTT, we devise a boosting mechanism that generates distilled datasets that contain subsets with near optimal performance across different data budgets.
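
A hedged sketch of the random truncated unrolling at the heart of RaT-BPTT: inner optimization steps before a randomly placed window run without gradient tracking, and only the window itself is backpropagated through. The toy inner objective and closed-form step rule below are our own, for illustration only:

```python
import random
import torch

def inner_step(state, data):
    """One closed-form gradient step on the toy inner loss (w - data)^2."""
    w = torch.zeros_like(data) if state is None else state
    return w - 0.1 * 2 * (w - data)

def rat_bptt_unroll(data, total_steps=50, window=10):
    """Unroll the inner optimization, but backpropagate only through a
    randomly placed window of `window` consecutive steps; earlier steps
    run under no_grad, stabilizing gradients and capping memory, while
    different outer iterations still cover the whole trajectory."""
    start = random.randint(0, total_steps - window)
    state = None
    for _ in range(start):                 # before the window: no graph
        with torch.no_grad():
            state = inner_step(state, data)
    if state is not None:
        state = state.detach()             # truncate the history
    for _ in range(window):                # inside the window: tracked
        state = inner_step(state, data)
    return state

distilled = torch.tensor([2.0], requires_grad=True)   # "distilled data"
final_w = rat_bptt_unroll(distilled)
outer_loss = ((final_w - 1.0) ** 2).sum()             # toy outer objective
outer_loss.backward()
print(distilled.grad)                                  # gradient reaches the data
```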

ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models

  • paper_url: http://arxiv.org/abs/2311.07022
  • repo_url: https://github.com/ilkerkesen/ViLMA
  • paper_authors: Ilker Kesen, Andrea Pedrotti, Mustafa Dogan, Michele Cafagna, Emre Can Acikgoz, Letitia Parcalabescu, Iacer Calixto, Anette Frank, Albert Gatt, Aykut Erdem, Erkut Erdem
  • for: The goal of this work is to develop a task-agnostic benchmark for assessing the fine-grained capabilities of video-language models (VidLMs).
  • methods: The benchmark tests VidLMs with carefully curated counterfactuals, complemented by a series of proficiency tests that assess the basic capabilities deemed essential to solving the main counterfactual tests.
  • results: The study finds that the grounding abilities of current VidLMs are no better than those of vision-language models (VLMs) that use static images, which is especially striking once performance on the proficiency tests is factored in. These results show that much about VidLMs remains to be explored.
    Abstract With the ever-increasing popularity of pretrained Video-Language Models (VidLMs), there is a pressing need to develop robust evaluation methodologies that delve deeper into their visio-linguistic capabilities. To address this challenge, we present ViLMA (Video Language Model Assessment), a task-agnostic benchmark that places the assessment of fine-grained capabilities of these models on a firm footing. Task-based evaluations, while valuable, fail to capture the complexities and specific temporal aspects of moving images that VidLMs need to process. Through carefully curated counterfactuals, ViLMA offers a controlled evaluation suite that sheds light on the true potential of these models, as well as their performance gaps compared to human-level understanding. ViLMA also includes proficiency tests, which assess basic capabilities deemed essential to solving the main counterfactual tests. We show that current VidLMs' grounding abilities are no better than those of vision-language models which use static images. This is especially striking once the performance on proficiency tests is factored in. Our benchmark serves as a catalyst for future research on VidLMs, helping to highlight areas that still need to be explored.

Context-dependent Instruction Tuning for Dialogue Response Generation

  • paper_url: http://arxiv.org/abs/2311.07006
  • repo_url: None
  • paper_authors: Jin Myung Kwak, Minseon Kim, Sung Ju Hwang
  • for: This paper addresses the problem that inputs vary highly from turn to turn in complex multi-turn dialogue generation tasks.
  • methods: The paper proposes a context-based instruction fine-tuning framework that generates both responses and instructions conditioned on the previous dialogue context; during evaluation, the model generates instructions based on the previous context to self-guide its response.
  • results: Quantitative evaluations on dialogue benchmark datasets show that the framework handles input variation better than the baselines while reducing the computation budget.
    Abstract Recent language models have achieved impressive performance in natural language tasks by incorporating instructions with task input during fine-tuning. Since all samples in the same natural language task can be explained with the same task instructions, many instruction datasets only provide a few instructions for the entire task, without considering the input of each example in the task. However, this approach becomes ineffective in complex multi-turn dialogue generation tasks, where the input varies highly with each turn as the dialogue context changes, so that simple task instructions cannot improve the generation performance. To address this limitation, we introduce a context-based instruction fine-tuning framework for each multi-turn dialogue which generates both responses and instructions based on the previous context as input. During the evaluation, the model generates instructions based on the previous context to self-guide the response. The proposed framework produces comparable or even outstanding results compared to the baselines by aligning instructions to the input during fine-tuning with the instructions in quantitative evaluations on dialogue benchmark datasets with reduced computation budget.

AGRAMPLIFIER: Defending Federated Learning Against Poisoning Attacks Through Local Update Amplification

  • paper_url: http://arxiv.org/abs/2311.06996
  • repo_url: None
  • paper_authors: Zirui Gong, Liyue Shen, Yanjun Zhang, Leo Yu Zhang, Jingwei Wang, Guangdong Bai, Yong Xiang
  • for: This work addresses the Byzantine poisoning attack, a major threat arising from the collaborative nature of Federated Learning (FL).
  • methods: The paper proposes a new approach, AGRAMPLIFIER, which improves the robustness, fidelity, and efficiency of existing Byzantine-robust aggregation rules.
  • results: The study shows that equipping existing Byzantine-robust mechanisms with AGRAMPLIFIER improves the model's robustness, fidelity, and efficiency, with average gains of 40.08%, 39.18%, and 10.68%, respectively.
    Abstract The collaborative nature of federated learning (FL) poses a major threat in the form of manipulation of local training data and local updates, known as the Byzantine poisoning attack. To address this issue, many Byzantine-robust aggregation rules (AGRs) have been proposed to filter out or moderate suspicious local updates uploaded by Byzantine participants. This paper introduces a novel approach called AGRAMPLIFIER, aiming to simultaneously improve the robustness, fidelity, and efficiency of the existing AGRs. The core idea of AGRAMPLIFIER is to amplify the "morality" of local updates by identifying the most repressive features of each gradient update, which provides a clearer distinction between malicious and benign updates, consequently improving the detection effect. To achieve this objective, two approaches, namely AGRMP and AGRXAI, are proposed. AGRMP organizes local updates into patches and extracts the largest value from each patch, while AGRXAI leverages explainable AI methods to extract the gradient of the most activated features. By equipping AGRAMPLIFIER with the existing Byzantine-robust mechanisms, we successfully enhance the model's robustness, maintaining its fidelity and improving overall efficiency. AGRAMPLIFIER is universally compatible with the existing Byzantine-robust mechanisms. The paper demonstrates its effectiveness by integrating it with all mainstream AGR mechanisms. Extensive evaluations conducted on seven datasets from diverse domains against seven representative poisoning attacks consistently show enhancements in robustness, fidelity, and efficiency, with average gains of 40.08%, 39.18%, and 10.68%, respectively.
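
A small sketch of the AGRMP variant, under our assumptions about patch size and the magnitude criterion: each patch of a client's flattened update is replaced by its strongest entry before the result is handed to an existing Byzantine-robust aggregation rule:

```python
import torch

def agrmp_amplify(update, patch_size=4):
    """AGRMP-style amplification (our simplification): a flattened local
    update is split into fixed-size patches and each patch is replaced
    by its largest-magnitude entry, accentuating the most "repressive"
    features so that a downstream robust aggregation rule can separate
    malicious from benign updates more easily."""
    flat = update.flatten()
    pad = (-len(flat)) % patch_size
    flat = torch.cat([flat, flat.new_zeros(pad)])   # pad to full patches
    patches = flat.view(-1, patch_size)
    idx = patches.abs().argmax(dim=1)               # strongest entry per patch
    return patches[torch.arange(len(patches)), idx] # amplified, compressed view

update = torch.randn(10)         # one client's gradient update (toy)
print(agrmp_amplify(update))     # fed to e.g. Multi-Krum or trimmed mean
```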

State-of-the-Art Review and Synthesis: A Requirement-based Roadmap for Standardized Predictive Maintenance Automation Using Digital Twin Technologies

  • paper_url: http://arxiv.org/abs/2311.06993
  • repo_url: None
  • paper_authors: Sizhe Ma, Katherine A. Flanigan, Mario Bergés
  • for: This paper aims to provide a requirement-based roadmap for standardized predictive maintenance (PMx) automation using digital twin (DT) technologies.
  • methods: The paper uses a systematic approach that includes identifying informational requirements (IRs) and functional requirements (FRs) for PMx, and conducting a literature review to determine how these requirements are currently being used in DTs.
  • results: The paper provides a roadmap for the development of standardized PMx automation using DTs, and highlights the areas where further research is needed to support the progress and maturation of these technologies.
    Abstract Recent digital advances have popularized predictive maintenance (PMx), offering enhanced efficiency, automation, accuracy, cost savings, and independence in maintenance. Yet, it continues to face numerous limitations such as poor explainability, sample inefficiency of data-driven methods, complexity of physics-based methods, and limited generalizability and scalability of knowledge-based methods. This paper proposes leveraging Digital Twins (DTs) to address these challenges and enable automated PMx adoption at larger scales. While we argue that DTs have this transformative potential, they have not yet reached the level of maturity needed to bridge these gaps in a standardized way. Without a standard definition for such evolution, this transformation lacks a solid foundation upon which to base its development. This paper provides a requirement-based roadmap supporting standardized PMx automation using DT technologies. A systematic approach comprising two primary stages is presented. First, we methodically identify the Informational Requirements (IRs) and Functional Requirements (FRs) for PMx, which serve as a foundation from which any unified framework must emerge. Our approach to defining and using IRs and FRs to form the backbone of any PMx DT is supported by the track record of IRs and FRs being successfully used as blueprints in other areas, such as for product development within the software industry. Second, we conduct a thorough literature review spanning fields to determine the ways in which these IRs and FRs are currently being used within DTs, enabling us to point to the specific areas where further research is warranted to support the progress and maturation of requirement-based PMx DTs.

cs.CL - 2023-11-13

In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax

  • paper_url: http://arxiv.org/abs/2311.07811
  • repo_url: https://github.com/aaronmueller/syntax-icl
  • paper_authors: Aaron Mueller, Albert Webson, Jackson Petty, Tal Linzen
  • for: investigate the robustness of LLMs supervised via ICL
  • methods: two simple and well-controlled syntactic transformations tasks, chain-of-thought prompting
  • results: Large variance across LMs on this fundamental linguistic phenomenon; models pre-trained on code generalize better and benefit to a greater extent from chain-of-thought prompting.
    Abstract In-context learning (ICL) is now a common method for supervising large language models (LLMs): given labeled examples in the input context, the LLM learns to perform the task without weight updates. Despite ICL's prevalence and utility, we understand little about whether models supervised in this manner represent the underlying structure of their tasks, rather than superficial heuristics that only generalize to identically distributed examples. In this study, we investigate the robustness of LLMs supervised via ICL using the test case of sensitivity to syntax, which is a prerequisite for robust language understanding. Our experiments are based on two simple and well-controlled syntactic transformations tasks, where correct out-of-distribution generalization requires an accurate syntactic analysis of the input. We further investigate whether out-of-distribution generalization can be improved via chain-of-thought prompting, where the model is provided with a sequence of intermediate computation steps that illustrate how the task ought to be performed. In experiments with models from the GPT, PaLM, and Llama 2 families, we find large variance across LMs on this fundamental linguistic phenomenon, and that the variance is explained more by the composition of the pre-training corpus and supervision methods than by model size. In particular, we find evidence that models pre-trained on code generalize better, and benefit to a greater extent from chain-of-thought prompting.

IruMozhi: Automatically classifying diglossia in Tamil

  • paper_url: http://arxiv.org/abs/2311.07804
  • repo_url: None
  • paper_authors: Kabilan Prasanna, Aryaman Arora
  • for: This paper studies Tamil, a Dravidian language of South Asia, and its two registers, Literary Tamil and Spoken Tamil.
  • methods: The paper releases a human-annotated dataset of parallel text in Literary and Spoken Tamil and trains classifiers on the task of identifying which variety a text belongs to.
  • results: The paper finds that Spoken Tamil is under-represented in existing labelled datasets and encourages future work on this variety.
    Abstract Tamil, a Dravidian language of South Asia, is a highly diglossic language with two very different registers in everyday use: Literary Tamil (preferred in writing and formal communication) and Spoken Tamil (confined to speech and informal media). Spoken Tamil is under-supported in modern NLP systems. In this paper, we release IruMozhi, a human-annotated dataset of parallel text in Literary and Spoken Tamil. We train classifiers on the task of identifying which variety a text belongs to. We use these models to gauge the availability of pretraining data in Spoken Tamil, to audit the composition of existing labelled datasets for Tamil, and to encourage future work on the variety.

In-context Learning and Gradient Descent Revisited

  • paper_url: http://arxiv.org/abs/2311.07772
  • repo_url: https://github.com/giilde/ft-vs-icl
  • paper_authors: Tomer Bar Natan, Gilad Deutch, Nadav Magar, Guy Dar
  • for: This paper examines the strong performance of in-context learning (ICL) on few-shot tasks and investigates the mechanism underlying ICL.
  • methods: The paper compares ICL with gradient-descent (GD) based finetuning to study the similarities between the two.
  • results: The study identifies a major difference in information flow between the two, termed Layer Causality, and shows that a layer-causal variant of finetuning aligns with ICL on par with vanilla finetuning, and is even better in most cases across relevant metrics.
    Abstract In-context learning (ICL) has shown impressive results in few-shot learning tasks, yet its underlying mechanism is still not fully understood. Recent works suggest that ICL can be thought of as a gradient descent (GD) based optimization process. While promising, these results mainly focus on simplified settings of ICL and provide only a preliminary evaluation of the similarities between the two methods. In this work, we revisit the comparison between ICL and GD-based finetuning and study what properties of ICL an equivalent process must follow. We highlight a major difference in the flow of information between ICL and standard finetuning. Namely, ICL can only rely on information from lower layers at every point, while finetuning depends on loss gradients from deeper layers. We refer to this discrepancy as Layer Causality and show that a layer causal variant of the finetuning process aligns with ICL on par with vanilla finetuning and is even better in most cases across relevant metrics. To the best of our knowledge, this is the first work to discuss this discrepancy explicitly and suggest a solution that tackles this problem with minimal changes.

Measuring Entrainment in Spontaneous Code-switched Speech

  • paper_url: http://arxiv.org/abs/2311.07703
  • repo_url: None
  • paper_authors: Debasmita Bhattacharya, Siying Ding, Alayna Nguyen, Julia Hirschberg
  • for: To study entrainment in code-switched conversations and determine whether entrainment behaves consistently across written and spoken settings in different language configurations.
  • methods: Natural language processing techniques are used to analyze the linguistic structure and code-switching points of code-switched conversations, in order to establish the presence and characteristics of entrainment.
  • results: The study finds that entrainment in code-switched conversations parallels entrainment in written and spoken settings, and that this phenomenon has important prospective applications in natural language communication.
    Abstract It is well-known that interlocutors who entrain to one another have more successful conversations than those who do not. Previous research has shown that interlocutors entrain on linguistic features in both written and spoken monolingual domains. More recent work on code-switched communication has also shown preliminary evidence of entrainment on certain aspects of code-switching (CSW). However, such studies of entrainment in code-switched domains have been extremely few and restricted to human-machine textual interactions. Our work studies code-switched spontaneous speech between humans by answering the following questions: 1) Do patterns of written and spoken entrainment in monolingual settings generalize to code-switched settings? 2) Do patterns of entrainment on code-switching in generated text generalize to spontaneous code-switched speech? We find evidence of affirmative answers to both of these questions, with important implications for the potentially "universal" nature of entrainment as a communication phenomenon, and potential applications in inclusive and interactive speech technology.
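
One common way to operationalize entrainment on a feature such as per-turn code-switching rate is to test whether interlocutors' feature values converge over the conversation; the sketch below uses a simple first-half/second-half contrast and is not necessarily the paper's exact measure:

```python
import numpy as np

def convergence(speaker_a, speaker_b):
    """Toy entrainment measure: per-turn absolute difference of a
    linguistic feature between two interlocutors, compared between the
    first and second half of the conversation. A positive value means
    the speakers grew more similar (converged)."""
    diffs = np.abs(np.asarray(speaker_a) - np.asarray(speaker_b))
    half = len(diffs) // 2
    return diffs[:half].mean() - diffs[half:].mean()

# fraction of code-switched tokens per turn for each speaker (toy data)
a = [0.10, 0.15, 0.22, 0.25, 0.28, 0.30]
b = [0.40, 0.35, 0.30, 0.28, 0.29, 0.31]
print(convergence(a, b))   # > 0: the pair converged over the dialogue
```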

MART: Improving LLM Safety with Multi-round Automatic Red-Teaming

  • paper_url: http://arxiv.org/abs/2311.07689
  • repo_url: None
  • paper_authors: Suyu Ge, Chunting Zhou, Rui Hou, Madian Khabsa, Yi-Chia Wang, Qifan Wang, Jiawei Han, Yuning Mao
  • for: This paper aims to improve the safety of Large Language Models (LLMs) by proposing a Multi-round Automatic Red-Teaming (MART) method that can scale up red-teaming and address potential safety risks.
  • methods: The MART method involves both automatic adversarial prompt writing and safe response generation, where an adversarial LLM and a target LLM interplay in an iterative manner to improve the target LLM's safety alignment.
  • results: After 4 rounds of MART, the violation rate of the target LLM on adversarial prompt benchmarks is reduced by up to 84.7%, achieving performance comparable to LLMs with extensive adversarial prompt writing, while maintaining strong performance on non-adversarial prompts.
    Abstract Red-teaming is a common practice for mitigating unsafe behaviors in Large Language Models (LLMs), which involves thoroughly assessing LLMs to identify potential flaws and addressing them with responsible and accurate responses. While effective, manual red-teaming is costly, and existing automatic red-teaming typically discovers safety risks without addressing them. In this paper, we propose a Multi-round Automatic Red-Teaming (MART) method, which incorporates both automatic adversarial prompt writing and safe response generation, significantly increasing red-teaming scalability and the safety of the target LLM. Specifically, an adversarial LLM and a target LLM interplay with each other in an iterative manner, where the adversarial LLM aims to generate challenging prompts that elicit unsafe responses from the target LLM, while the target LLM is fine-tuned with safety aligned data on these adversarial prompts. In each round, the adversarial LLM crafts better attacks on the updated target LLM, while the target LLM also improves itself through safety fine-tuning. On adversarial prompt benchmarks, the violation rate of an LLM with limited safety alignment reduces up to 84.7% after 4 rounds of MART, achieving comparable performance to LLMs with extensive adversarial prompt writing. Notably, model helpfulness on non-adversarial prompts remains stable throughout iterations, indicating the target LLM maintains strong performance on instruction following.
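
The iterative interplay can be written down as a short skeleton; all four callables are placeholders standing in for LLM calls, safety judging, and fine-tuning, so this sketches the loop's shape rather than a working red-teaming system:

```python
def mart(adversary, target, finetune, evaluate_safety, rounds=4):
    """Skeleton of a multi-round automatic red-teaming loop.
    `adversary(target)` writes adversarial prompts against the current
    target model, `evaluate_safety` labels the target's responses, and
    `finetune` updates each model on its side of the data: safety-aligned
    pairs for the target, successful attacks for the adversary."""
    for _ in range(rounds):
        prompts = adversary(target)                 # attack the current target
        responses = [target(p) for p in prompts]
        labels = [evaluate_safety(p, a) for p, a in zip(prompts, responses)]
        unsafe = [(p, a) for (p, a), ok in zip(zip(prompts, responses), labels)
                  if not ok]
        target = finetune(target, unsafe, role="safety")        # patch the holes
        adversary = finetune(adversary, unsafe, role="attack")  # sharpen attacks
    return target
```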

Can Authorship Attribution Models Distinguish Speakers in Speech Transcripts?

  • paper_url: http://arxiv.org/abs/2311.07564
  • repo_url: https://github.com/llnl/luar
  • paper_authors: Cristina Aggazzotti, Nicholas Andrews, Elizabeth Allyn Smith
  • for: This work explores authorship attribution for transcribed speech, a setting that poses novel challenges because many stylistic features, such as punctuation and capitalization, are unreliable or unavailable.
  • methods: The authors construct a new benchmark for speaker attribution focused on conversational speech transcripts, with verification trials of controlled, varying difficulty, and compare a suite of neural and non-neural baselines; they find that written-text attribution models achieve surprisingly good performance in certain settings but struggle in the hardest ones.
  • results: Attribution models perform poorly on transcribed speech in the hardest settings considered, suggesting that speaker attribution from transcripts calls for purpose-built models and techniques.
    Abstract Authorship verification is the problem of determining if two distinct writing samples share the same author and is typically concerned with the attribution of written text. In this paper, we explore the attribution of transcribed speech, which poses novel challenges. The main challenge is that many stylistic features, such as punctuation and capitalization, are not available or reliable. Therefore, we expect a priori that transcribed speech is a more challenging domain for attribution. On the other hand, other stylistic features, such as speech disfluencies, may enable more successful attribution but, being specific to speech, require special purpose models. To better understand the challenges of this setting, we contribute the first systematic study of speaker attribution based solely on transcribed speech. Specifically, we propose a new benchmark for speaker attribution focused on conversational speech transcripts. To control for spurious associations of speakers with topic, we employ both conversation prompts and speakers' participating in the same conversation to construct challenging verification trials of varying difficulties. We establish the state of the art on this new benchmark by comparing a suite of neural and non-neural baselines, finding that although written text attribution models achieve surprisingly good performance in certain settings, they struggle in the hardest settings we consider.

Using Natural Language Explanations to Improve Robustness of In-context Learning for Natural Language Inference

  • paper_url: http://arxiv.org/abs/2311.07556
  • repo_url: None
  • paper_authors: Xuanli He, Yuxiang Wu, Oana-Maria Camburu, Pasquale Minervini, Pontus Stenetorp
  • for: This work investigates whether in-context learning augmented with natural language explanations (X-ICL) improves the performance of large language models (LLMs) on a suite of seven adversarial and challenging natural language inference datasets.
  • methods: The study augments in-context learning with human-generated natural language explanations (NLEs), and further proposes a new ChatGPT few-shot approach, in which the LLM is prompted with a few human-generated NLEs to produce additional NLEs.
  • results: X-ICL improves LLM performance, and ChatGPT few-shot is superior to both ChatGPT zero-shot and human-generated NLEs alone. Moreover, in robustness-oriented evaluations, prompt selection strategies do not match the efficacy of the X-ICL approach.
    Abstract Recent studies have demonstrated that large language models (LLMs) excel in diverse tasks through in-context learning (ICL) facilitated by task-specific prompts and examples. However, the existing literature shows that ICL encounters performance deterioration when exposed to adversarial inputs. Enhanced performance has been observed when ICL is augmented with natural language explanations (NLEs) (we refer to it as X-ICL). Thus, this work investigates whether X-ICL can improve the robustness of LLMs on a suite of seven adversarial and challenging natural language inference datasets. Moreover, we introduce a new approach to X-ICL by prompting an LLM (ChatGPT in our case) with few human-generated NLEs to produce further NLEs (we call it ChatGPT few-shot), which we show superior to both ChatGPT zero-shot and human-generated NLEs alone. We evaluate five popular LLMs (GPT3.5-turbo, LLaMa2, Vicuna, Zephyr, Mistral) and show that X-ICL with ChatGPT few-shot yields over 6% improvement over ICL. Furthermore, while prompt selection strategies were previously shown to significantly improve ICL on in-distribution test sets, we show that these strategies do not match the efficacy of the X-ICL paradigm in robustness-oriented evaluations.

Leveraging Multiple Teachers for Test-Time Adaptation of Language-Guided Classifiers

  • paper_url: http://arxiv.org/abs/2311.07538
  • repo_url: https://github.com/weikangda/talc
  • paper_authors: Kangda Wei, Sayan Ghosh, Rakesh R. Menon, Shashank Srivastava
  • for: This paper targets language-guided classifiers that can classify examples from novel tasks when provided with task-specific natural language explanations, instructions, or prompts.
  • methods: The paper introduces TALC, a framework that uses data programming to adapt a language-guided classifier to a new task at inference time, given explanations from multiple teachers and unlabeled test examples.
  • results: TALC consistently outperforms a competitive baseline from prior work by an impressive 9.3% (relative improvement). It is also robust to variations in the quality and quantity of the provided explanations, highlighting its potential in scenarios involving multiple teachers or crowd supervision.
    Abstract Recent approaches have explored language-guided classifiers capable of classifying examples from novel tasks when provided with task-specific natural language explanations, instructions or prompts (Sanh et al., 2022; R. Menon et al., 2022). While these classifiers can generalize in zero-shot settings, their task performance often varies substantially between different language explanations in unpredictable ways (Lu et al., 2022; Gonen et al., 2022). Also, current approaches fail to leverage unlabeled examples that may be available in many scenarios. Here, we introduce TALC, a framework that uses data programming to adapt a language-guided classifier for a new task during inference when provided with explanations from multiple teachers and unlabeled test examples. Our results show that TALC consistently outperforms a competitive baseline from prior work by an impressive 9.3% (relative improvement). Further, we demonstrate the robustness of TALC to variations in the quality and quantity of provided explanations, highlighting its potential in scenarios where learning from multiple teachers or a crowd is involved. Our code is available at: https://github.com/WeiKangda/TALC.git.
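
The data-programming step can be sketched as a weighted vote over teachers' explanation-driven predictions on unlabeled examples; weak-supervision frameworks such as Snorkel estimate the teacher weights from agreement statistics, whereas this toy version takes them as given:

```python
import numpy as np

def aggregate_teacher_votes(votes, teacher_accuracies=None):
    """Combine the predictions a language-guided classifier makes under
    each teacher's explanation into pseudo-labels for the unlabeled
    test examples, via a (weighted) vote."""
    votes = np.asarray(votes)                  # (n_teachers, n_examples)
    n_classes = votes.max() + 1
    if teacher_accuracies is None:
        teacher_accuracies = np.ones(len(votes))
    scores = np.zeros((votes.shape[1], n_classes))
    for t, w in zip(votes, teacher_accuracies):
        scores[np.arange(len(t)), t] += w      # each teacher casts a weighted vote
    return scores.argmax(axis=1)               # pseudo-labels for adaptation

votes = [[0, 1, 1, 0],      # teacher 1's explanation-driven predictions
         [0, 1, 0, 0],      # teacher 2
         [1, 1, 1, 0]]      # teacher 3
print(aggregate_teacher_votes(votes, [0.9, 0.8, 0.6]))  # -> [0 1 1 0]
```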

A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering

  • paper_url: http://arxiv.org/abs/2311.07536
  • repo_url: None
  • paper_authors: Yunxin Li, Longyue Wang, Baotian Hu, Xinyu Chen, Wanqi Zhong, Chenyang Lyu, Min Zhang
  • for: This paper aims to evaluate the capabilities of the newly introduced GPT-4V model in visual question answering tasks, specifically in the realm of knowledge-intensive VQA tasks.
  • methods: The paper uses three perspectives to evaluate the model’s performance: Commonsense Knowledge, Fine-grained World Knowledge, and Comprehensive Knowledge with Decision-making Rationales.
  • results: The extensive experiments show that GPT-4V achieves state-of-the-art (SOTA) performance on the above three tasks, with notable improvements in reasoning and explanation when using composite images as few-shot. However, the model also exhibits severe hallucinations when dealing with world knowledge, highlighting the need for further advancements in this research direction.
    Abstract The emergence of multimodal large models (MLMs) has significantly advanced the field of visual understanding, offering remarkable capabilities in the realm of visual question answering (VQA). Yet, the true challenge lies in the domain of knowledge-intensive VQA tasks, which necessitate not just recognition of visual elements, but also a deep comprehension of the visual information in conjunction with a vast repository of learned knowledge. To uncover such capabilities of MLMs, particularly the newly introduced GPT-4V, we provide an in-depth evaluation from three perspectives: 1) Commonsense Knowledge, which assesses how well models can understand visual cues and connect to general knowledge; 2) Fine-grained World Knowledge, which tests the model's skill in reasoning out specific knowledge from images, showcasing their proficiency across various specialized fields; 3) Comprehensive Knowledge with Decision-making Rationales, which examines model's capability to provide logical explanations for its inference, facilitating a deeper analysis from the interpretability perspective. Extensive experiments indicate that GPT-4V achieves SOTA performance on above three tasks. Interestingly, we find that: a) GPT-4V demonstrates enhanced reasoning and explanation when using composite images as few-shot; b) GPT-4V produces severe hallucinations when dealing with world knowledge, highlighting the future need for advancements in this research direction.
    摘要 随着多模态大型模型(MLM)的出现,视觉理解领域得到了极大的进步,特别是在视觉问答(VQA)领域。然而,真正的挑战在于知识导向的VQA任务,需要不仅识别视觉元素,而且还需要深入理解视觉信息并与大量学习知识相结合。为了探索MLMs的真正能力,特别是新引入的GPT-4V,我们提供了三个视角的深入评估:1)通用常识,评估模型如何理解视觉提示并与通用知识相连接; 2)细腻世界知识,测试模型在图像中特定知识的逻辑推理能力,展示其在多个专业领域中的掌握能力; 3)全面知识与决策逻辑,评估模型对其推理的解释能力,促进对其解释的深入分析。广泛的实验表明GPT-4V在以上三个任务中达到了最高的表现。有趣的是,我们发现:a)GPT-4V在几何图像作为少量例子时展现出更高的逻辑推理和解释能力; b)GPT-4V在world知识方面存在严重的幻觉现象,表明未来在这个研究方向上需要进一步的进步。

It’s Not Easy Being Wrong: Evaluating Process of Elimination Reasoning in Large Language Models

  • paper_url: http://arxiv.org/abs/2311.07532
  • repo_url: https://github.com/nbalepur/poe
  • paper_authors: Nishant Balepur, Shramay Palta, Rachel Rudinger
  • for: This study tests whether chain-of-thought (COT) prompting, which helps large language models (LLMs) reason toward correct answers, is also effective when reasoning toward incorrect answers.
  • methods: The study combines COT with a process of elimination (PoE), requiring LLMs to reason toward the incorrect options of multiple-choice questions.
  • results: On 2-choice commonsense and scientific reasoning datasets, PoE with COT consistently underperforms directly choosing the correct answer, and the agreement between the two strategies is lower than the self-consistency of each strategy. The paper provides an error analysis and suggestions for future work.
    Abstract Chain-of-thought (COT) prompting can help large language models (LLMs) reason toward correct answers, but its efficacy in reasoning toward incorrect answers is unexplored. This strategy of process of elimination (PoE), when used with COT, has the potential to enhance interpretability in tasks like medical diagnoses of exclusion. Thus, we propose PoE with COT, a new task where LLMs must reason toward incorrect options on multiple-choice questions. We evaluate the ability of GPT-3.5, LLaMA-2, and Falcon to perform PoE with COT on 2-choice commonsense and scientific reasoning datasets. We show that PoE consistently underperforms directly choosing the correct answer. The agreement of these strategies is also lower than the self-consistency of each strategy. To study these issues further, we conduct an error analysis and give suggestions for future work.
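
A toy PoE-with-COT prompt builder; the paper's actual templates are not reproduced here, so the phrasing below is our own illustration of the setup:

```python
def poe_prompt(question, options):
    """Build a process-of-elimination prompt with chain-of-thought: the
    model is asked to reason about which options are incorrect before
    stating the remaining answer."""
    letters = "ABCDEFGH"
    opts = "\n".join(f"{letters[i]}. {o}" for i, o in enumerate(options))
    return (
        f"Question: {question}\n{opts}\n"
        "Let's think step by step about which options must be wrong, "
        "eliminate them one at a time, and then state the remaining answer."
    )

print(poe_prompt("What do plants primarily absorb through their roots?",
                 ["water", "sunlight"]))
```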

Multilingual Nonce Dependency Treebanks: Understanding how LLMs represent and process syntactic structure

  • paper_url: http://arxiv.org/abs/2311.07497
  • repo_url: None
  • paper_authors: David Arps, Laura Kallmeyer, Younes Samih, Hassan Sajjad
  • for: This paper introduces SPUD (Semantically Perturbed Universal Dependencies), a framework for creating nonce treebanks for the multilingual Universal Dependencies (UD) corpora.
  • methods: The framework creates nonce data via semantic perturbation while preserving syntactic argument structure and annotations, ensuring grammaticality via language-specific rules.
  • results: The authors create nonce data in Arabic, English, French, German, and Russian, and demonstrate two use cases: (1) studying the effect of nonce data on word co-occurrence statistics, via the perplexity of autoregressive (ALM) and masked language models (MLM), finding ALM scores significantly more affected; and (2) studying the effect of nonce data on syntactic dependency probes, replicating the findings of Müller-Eberstein et al. (2022).
    Abstract We introduce SPUD (Semantically Perturbed Universal Dependencies), a framework for creating nonce treebanks for the multilingual Universal Dependencies (UD) corpora. SPUD data satisfies syntactic argument structure, provides syntactic annotations, and ensures grammaticality via language-specific rules. We create nonce data in Arabic, English, French, German, and Russian, and demonstrate two use cases of SPUD treebanks. First, we investigate the effect of nonce data on word co-occurrence statistics, as measured by perplexity scores of autoregressive (ALM) and masked language models (MLM). We find that ALM scores are significantly more affected by nonce data than MLM scores. Second, we show how nonce data affects the performance of syntactic dependency probes. We replicate the findings of M\"uller-Eberstein et al. (2022) on nonce test data and show that the performance declines on both MLMs and ALMs wrt. original test data. However, a majority of the performance is kept, suggesting that the probe indeed learns syntax independently from semantics.
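
The core perturbation can be sketched as replacing content words with pseudowords while leaving the annotation untouched; the real framework additionally applies language-specific rules (agreement, inflection) to guarantee grammaticality, which this toy version skips:

```python
import random

def make_nonce_sentence(tokens, content_tags=("NOUN", "VERB", "ADJ"), seed=0):
    """Toy version of the SPUD idea: content words in an annotated
    sentence are swapped for pronounceable pseudowords while the
    part-of-speech tags and dependency structure are kept intact."""
    rng = random.Random(seed)
    syllables = ["bla", "kri", "don", "fep", "mul", "tor"]

    def pseudoword():
        return "".join(rng.choice(syllables) for _ in range(2))

    return [
        (pseudoword() if pos in content_tags else form, pos, head, rel)
        for form, pos, head, rel in tokens
    ]

sent = [("the", "DET", 2, "det"), ("dog", "NOUN", 3, "nsubj"),
        ("barked", "VERB", 0, "root")]
print(make_nonce_sentence(sent))
# function words and all annotations survive; only content forms change
```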

A Step Closer to Comprehensive Answers: Constrained Multi-Stage Question Decomposition with Large Language Models

  • paper_url: http://arxiv.org/abs/2311.07491
  • repo_url: None
  • paper_authors: Hejing Cao, Zhenwei An, Jiazhan Feng, Kun Xu, Liwei Chen, Dongyan Zhao
  • for: This paper aims to address the tendency of large language models to hallucinate in question answering tasks.
  • methods: The paper proposes a "Decompose-and-Query" (D&Q) framework that guides the model to think and use external knowledge, in a manner similar to ReAct, while restricting its reasoning to reliable information and thereby mitigating the risk of hallucinations.
  • results: Experiments show that D&Q reduces the hallucination risk of LLMs in question answering: on the ChitChatQA dataset, D&Q does not lose to ChatGPT in 67% of cases, and on the HotPotQA question-only setting, D&Q achieves an F1 score of 59.6%.
    Abstract While large language models exhibit remarkable performance in the Question Answering task, they are susceptible to hallucinations. Challenges arise when these models grapple with understanding multi-hop relations in complex questions or lack the necessary knowledge for a comprehensive response. To address this issue, we introduce the "Decompose-and-Query" framework (D&Q). This framework guides the model to think and utilize external knowledge similar to ReAct, while also restricting its thinking to reliable information, effectively mitigating the risk of hallucinations. Experiments confirm the effectiveness of D&Q: On our ChitChatQA dataset, D&Q does not lose to ChatGPT in 67% of cases; on the HotPotQA question-only setting, D&Q achieved an F1 score of 59.6%. Our code is available at https://github.com/alkaidpku/DQ-ToolQA.
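
The framework's control flow can be summarized in a short skeleton; the three callables are placeholders for LLM calls and a retrieval tool, so this shows the shape of the loop rather than the authors' implementation:

```python
def decompose_and_query(question, decompose, search, answer):
    """Skeleton of a decompose-and-query loop: split a multi-hop
    question into sub-questions, answer each one only from retrieved
    evidence (restricting the model's reasoning to reliable
    information), then compose the pieces into a final answer."""
    evidence = []
    for sub_q in decompose(question):      # e.g., an LLM decomposition call
        docs = search(sub_q)               # reliable external knowledge only
        evidence.append((sub_q, answer(sub_q, docs)))
    return answer(question, evidence)      # final answer grounded in evidence
```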

Finding and Editing Multi-Modal Neurons in Pre-Trained Transformer

  • paper_url: http://arxiv.org/abs/2311.07470
  • repo_url: None
  • paper_authors: Haowen Pan, Yixin Cao, Xiaozhi Wang, Xun Yang
  • for: This paper studies how transformer-based multi-modal LLMs process information from different modalities and how those modalities work together.
  • methods: It proposes a new method for identifying multi-modal neurons in transformer-based multi-modal LLMs and, through a series of experiments, characterizes three key properties of these neurons with four well-designed quantitative evaluation metrics.
  • results: The findings help explain how multi-modal LLMs comprehend information across modalities, and a knowledge-editing method built on the identified neurons can rewrite a specific token into another designated token.
    Abstract Multi-modal large language models (LLM) have achieved powerful capabilities for visual semantic understanding in recent years. However, little is known about how LLMs comprehend visual information and interpret different modalities of features. In this paper, we propose a new method for identifying multi-modal neurons in transformer-based multi-modal LLMs. Through a series of experiments, We highlight three critical properties of multi-modal neurons by four well-designed quantitative evaluation metrics. Furthermore, we introduce a knowledge editing method based on the identified multi-modal neurons, for modifying a specific token to another designative token. We hope our findings can inspire further explanatory researches on understanding mechanisms of multi-modal LLMs.

MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks

  • paper_url: http://arxiv.org/abs/2311.07463
  • repo_url: None
  • paper_authors: Sanchit Ahuja, Divyanshu Aggarwal, Varun Gumma, Ishaan Watts, Ashutosh Sathe, Millicent Ochieng, Rishav Hada, Prachi Jain, Maxamed Axmed, Kalika Bali, Sunayana Sitaram
  • for: This work extends the MEGA benchmarking suite with six new datasets to form the MEGAVERSE benchmark, covering 81 languages including low-resource African languages.
  • methods: Several state-of-the-art LLMs (GPT-3.5-Turbo, GPT4, PaLM2, and Llama2) are evaluated on the MEGAVERSE datasets; two multimodal datasets are also included to assess the LLaVa-v1.5 model.
  • results: GPT4 and PaLM2 outperform the Llama models across tasks, notably on low-resource languages, with GPT4 beating PaLM2 on more datasets than the reverse; however, issues such as data contamination must be addressed before LLM performance on non-English languages can be assessed accurately.
    Abstract Recently, there has been a rapid advancement in research on Large Language Models (LLMs), resulting in significant progress in several Natural Language Processing (NLP) tasks. Consequently, there has been a surge in LLM evaluation research to comprehend the models' capabilities and limitations. However, much of this research has been confined to the English language, leaving LLM building and evaluation for non-English languages relatively unexplored. There has been an introduction of several new LLMs, necessitating their evaluation on non-English languages. This study aims to expand our MEGA benchmarking suite by including six new datasets to form the MEGAVERSE benchmark. The benchmark comprises 22 datasets covering 81 languages, including low-resource African languages. We evaluate several state-of-the-art LLMs like GPT-3.5-Turbo, GPT4, PaLM2, and Llama2 on the MEGAVERSE datasets. Additionally, we include two multimodal datasets in the benchmark and assess the performance of the LLaVa-v1.5 model. Our experiments suggest that GPT4 and PaLM2 outperform the Llama models on various tasks, notably on low-resource languages, with GPT4 outperforming PaLM2 on more datasets than vice versa. However, issues such as data contamination must be addressed to obtain an accurate assessment of LLM performance on non-English languages.

ChartCheck: An Evidence-Based Fact-Checking Dataset over Real-World Chart Images

  • paper_url: http://arxiv.org/abs/2311.07453
  • repo_url: None
  • paper_authors: Mubashara Akhtar, Nikesh Subedi, Vivek Gupta, Sahar Tahmasebi, Oana Cocarascu, Elena Simperl
  • for: This paper studies how to verify claims against chart images.
  • methods: It introduces ChartCheck, a new fact-checking dataset of 1.7k real-world charts paired with 10.5k human-written claims and explanations.
  • results: State-of-the-art models reach 73.9% accuracy in the finetuned setting, and the study identifies chart characteristics and reasoning types that remain challenging for models.
    Abstract Data visualizations are common in the real-world. We often use them in data sources such as scientific documents, news articles, textbooks, and social media to summarize key information in a visual form. Charts can also mislead its audience by communicating false information or biasing them towards a specific agenda. Verifying claims against charts is not a straightforward process. It requires analyzing both the text and visual components of the chart, considering characteristics such as colors, positions, and orientations. Moreover, to determine if a claim is supported by the chart content often requires different types of reasoning. To address this challenge, we introduce ChartCheck, a novel dataset for fact-checking against chart images. ChartCheck is the first large-scale dataset with 1.7k real-world charts and 10.5k human-written claims and explanations. We evaluated the dataset on state-of-the-art models and achieved an accuracy of 73.9 in the finetuned setting. Additionally, we identified chart characteristics and reasoning types that challenge the models.

Controlled Text Generation for Black-box Language Models via Score-based Progressive Editor

  • paper_url: http://arxiv.org/abs/2311.07430
  • repo_url: None
  • paper_authors: Sangwon Yu, Changmin Lee, Hojin Lee, Sungroh Yoon
  • for: This paper proposes a method for controlled text generation with black-box language models, aimed at generating constrained text for specific domains.
  • methods: The method is built on ScoPE (Score-based Progressive Editor), which is trained to raise the target-domain score of edited text and progressively edits the intermediate discrete tokens produced during the language model's auto-regressive generation so that they align with the target attributes.
  • results: Experiments on diverse controlled generation tasks show that ScoPE effectively enables controlled text generation with black-box language models in both in-domain and out-of-domain conditions, which is challenging for existing methods.
    Abstract Despite recent progress in language models, generating constrained text for specific domains remains a challenge, particularly when utilizing black-box models that lack domain-specific knowledge. In this paper, we introduce ScoPE (Score-based Progressive Editor) generation, a novel approach for controlled text generation for black-box language models. We employ ScoPE to facilitate text generation in the target domain by integrating it with language models through a cascading approach. Trained to enhance the target domain score of the edited text, ScoPE progressively edits intermediate output discrete tokens to align with the target attributes throughout the auto-regressive generation process of the language model. This iterative process guides subsequent steps to produce desired output texts for the target domain. Our experimental results on diverse controlled generations demonstrate that ScoPE effectively facilitates controlled text generation for black-box language models in both in-domain and out-of-domain conditions, which is challenging for existing methods.

Speech-based Slot Filling using Large Language Models

  • paper_url: http://arxiv.org/abs/2311.07418
  • repo_url: None
  • paper_authors: Guangzhi Sun, Shutong Feng, Dongcheng Jiang, Chao Zhang, Milica Gašić, Philip C. Woodland
  • for: investigating the potential application of large language models (LLMs) to slot filling with noisy ASR transcriptions
  • methods: in-context learning, task-specific fine-tuning, dedicated prompt designs, and linearised knowledge injection (LKI) scheme
  • results: an 8.3% absolute SLU-F1 improvement compared to the strong Flan-T5-base baseline system on a limited data setup, achieved by using the proposed fine-tuning together with the LKI scheme for LLaMA-13B.
    Abstract Recently, advancements in large language models (LLMs) have shown an unprecedented ability across various language tasks. This paper investigates the potential application of LLMs to slot filling with noisy ASR transcriptions, via both in-context learning and task-specific fine-tuning. Dedicated prompt designs and fine-tuning approaches are proposed to improve the robustness of LLMs for slot filling with noisy ASR transcriptions. Moreover, a linearised knowledge injection (LKI) scheme is also proposed to integrate dynamic external knowledge into LLMs. Experiments were performed on SLURP to quantify the performance of LLMs, including GPT-3.5-turbo, GPT-4, LLaMA-13B and Vicuna-13B (v1.1 and v1.5) with different ASR error rates. The use of the proposed fine-tuning together with the LKI scheme for LLaMA-13B achieved an 8.3% absolute SLU-F1 improvement compared to the strong Flan-T5-base baseline system on a limited data setup.

An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

  • paper_url: http://arxiv.org/abs/2311.07397
  • repo_url: https://github.com/junyangwang0410/amber
  • paper_authors: Junyang Wang, Yuhang Wang, Guohai Xu, Jing Zhang, Yukai Gu, Haitao Jia, Ming Yan, Ji Zhang, Jitao Sang
  • for: Evaluating hallucination in multi-modal large language models (MLLMs), which is increasingly important for model improvement and practical deployment.
  • methods: Proposes AMBER, an LLM-free multi-dimensional benchmark that evaluates hallucination in both generative and discriminative tasks, covering object existence, object attribute, and object relation hallucination, together with a low-cost and efficient evaluation pipeline.
  • results: A comprehensive evaluation and detailed analysis of mainstream MLLMs (including GPT-4V(ision)) using the AMBER pipeline, along with guideline suggestions for mitigating hallucination.
    Abstract Despite making significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) encounter the significant challenge of hallucination, which may lead to harmful consequences. Therefore, evaluating MLLMs' hallucinations is becoming increasingly important in model improvement and practical application deployment. Previous works are limited in high evaluation costs (e.g., relying on humans or advanced LLMs) and insufficient evaluation dimensions (e.g., types of hallucination and task). In this paper, we propose an LLM-free multi-dimensional benchmark AMBER, which can be used to evaluate both generative task and discriminative task including object existence, object attribute and object relation hallucination. Based on AMBER, we design a low-cost and efficient evaluation pipeline. Additionally, we conduct a comprehensive evaluation and detailed analysis of mainstream MLLMs including GPT-4V(ision), and also give guideline suggestions for mitigating hallucinations. The data and code of AMBER are available at https://github.com/junyangwang0410/AMBER.

Assessing Logical Puzzle Solving in Large Language Models: Insights from a Minesweeper Case Study

  • paper_url: http://arxiv.org/abs/2311.07387
  • repo_url: https://github.com/yinghao-li/minesweeper-for-llm
  • paper_authors: Yinghao Li, Haorui Wang, Chao Zhang
  • for: Testing whether LLMs are fundamentally capable of reasoning and planning, beyond recalling and synthesizing information from their training data.
  • methods: Introduces Minesweeper as a novel task, designed in a format unfamiliar to LLMs and absent from their training data, which requires identifying mine locations from numerical clues in adjacent opened cells; experiments include trials with the advanced GPT-4 model.
  • results: LLMs possess the foundational abilities required for the task, but they struggle to integrate them into the coherent, multi-step logical reasoning process needed to solve Minesweeper.
    Abstract Large Language Models (LLMs) have shown remarkable proficiency in language understanding and have been successfully applied to a variety of real-world tasks through task-specific fine-tuning or prompt engineering. Despite these advancements, it remains an open question whether LLMs are fundamentally capable of reasoning and planning, or if they primarily rely on recalling and synthesizing information from their training data. In our research, we introduce a novel task -- Minesweeper -- specifically designed in a format unfamiliar to LLMs and absent from their training datasets. This task challenges LLMs to identify the locations of mines based on numerical clues provided by adjacent opened cells. Successfully completing this task requires an understanding of each cell's state, discerning spatial relationships between the clues and mines, and strategizing actions based on logical deductions drawn from the arrangement of the cells. Our experiments, including trials with the advanced GPT-4 model, indicate that while LLMs possess the foundational abilities required for this task, they struggle to integrate these into a coherent, multi-step logical reasoning process needed to solve Minesweeper. These findings highlight the need for further research to understand and nature of reasoning capabilities in LLMs under similar circumstances, and to explore pathways towards more sophisticated AI reasoning and planning models.
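Before an LLM can reason over a Minesweeper instance, the board has to be serialized into text. A minimal sketch of one possible encoding follows; the paper's actual prompt format may differ.

```python
def board_to_prompt(board):
    """Serialize a Minesweeper board for an LLM.

    board: 2-D list where '?' marks a closed cell and a digit string
    gives the number of mines adjacent to an opened cell.
    """
    header = "   " + " ".join(str(c) for c in range(len(board[0])))
    rows = [f"{r}  " + " ".join(row) for r, row in enumerate(board)]
    grid = "\n".join([header] + rows)
    return (
        "You are playing Minesweeper. '?' is a closed cell; a digit is the "
        "number of mines adjacent to that opened cell.\n"
        f"{grid}\n"
        "Reason step by step, then list the (row, col) coordinates of every "
        "cell that must contain a mine."
    )

print(board_to_prompt([["1", "?", "?"],
                       ["1", "2", "?"],
                       ["0", "1", "?"]]))
```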

LM-Polygraph: Uncertainty Estimation for Language Models

  • paper_url: http://arxiv.org/abs/2311.07383
  • repo_url: None
  • paper_authors: Ekaterina Fadeeva, Roman Vashurin, Akim Tsvigun, Artem Vazhentsev, Sergey Petrakov, Kirill Fedyanin, Daniil Vasilev, Elizaveta Goncharova, Alexander Panchenko, Maxim Panov, Timothy Baldwin, Artem Shelmanov
  • for: Making large language models safer, more responsible, and more effective by addressing their tendency to "hallucinate".
  • methods: Introduces implementations of a battery of state-of-the-art uncertainty estimation (UE) methods for LLMs in text generation tasks, with unified program interfaces in Python.
  • results: Provides an extendable benchmark for consistent evaluation of UE techniques, plus a demo web application that enriches the standard chat dialog with confidence scores, helping end-users identify unreliable responses.
    Abstract Recent advancements in the capabilities of large language models (LLMs) have paved the way for a myriad of groundbreaking applications in various fields. However, a significant challenge arises as these models often "hallucinate", i.e., fabricate facts without providing users an apparent means to discern the veracity of their statements. Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of LLMs. However, to date, research on UE methods for LLMs has been focused primarily on theoretical rather than engineering contributions. In this work, we tackle this issue by introducing LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python. Additionally, it introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores, empowering end-users to discern unreliable responses. LM-Polygraph is compatible with the most recent LLMs, including BLOOMz, LLaMA-2, ChatGPT, and GPT-4, and is designed to support future releases of similarly-styled LMs.
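To give a flavor of what such UE methods compute, the sketch below implements one of the simplest baselines, mean token entropy of a greedy generation, with Hugging Face transformers; it is an illustrative stand-in, not LM-Polygraph's actual interface, and the GPT-2 checkpoint is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def mean_token_entropy(prompt: str, max_new_tokens: int = 50) -> float:
    """Average predictive entropy over generated tokens; higher = less certain."""
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False,
                         output_scores=True, return_dict_in_generate=True)
    entropies = []
    for step_logits in out.scores:  # one logits tensor per generated token
        logp = torch.log_softmax(step_logits[0], dim=-1)
        entropies.append(-(logp.exp() * logp).sum().item())
    return sum(entropies) / len(entropies)
```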

Volcano: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision

  • paper_url: http://arxiv.org/abs/2311.07362
  • repo_url: https://github.com/kaistai/volcano
  • paper_authors: Seongyun Lee, Sue Hyun Park, Yongrae Jo, Minjoon Seo
  • for: This work tackles multimodal hallucination in large multimodal models (LMMs), where a model's response is misaligned with the given visual information.
  • methods: It proposes leveraging self-feedback as visual cues: Volcano, a multimodal self-feedback guided revision model, generates natural language feedback on its initial response based on the visual input and uses that feedback to self-revise the response.
  • results: Volcano achieves state-of-the-art results on MMHal-Bench, POPE, and GAVIE, improves general multimodal abilities, and outperforms previous models on MM-Vet and MMBench; qualitative analysis shows its feedback is more grounded in the image than the initial response. The 7B and 13B models, data, and code are released at https://github.com/kaistAI/Volcano.
    Abstract Large multimodal models (LMMs) suffer from multimodal hallucination, where they provide incorrect responses misaligned with the given visual information. Recent works have conjectured that one of the reasons behind multimodal hallucination might be due to the vision encoder failing to ground on the image properly. To mitigate this issue, we propose a novel approach that leverages self-feedback as visual cues. Building on this approach, we introduce Volcano, a multimodal self-feedback guided revision model. Volcano generates natural language feedback to its initial response based on the provided visual information and utilizes this feedback to self-revise its initial response. Volcano effectively reduces multimodal hallucination and achieves state-of-the-art on MMHal-Bench, POPE, and GAVIE. It also improves on general multimodal abilities and outperforms previous models on MM-Vet and MMBench. Through a qualitative analysis, we show that Volcano's feedback is properly grounded on the image than the initial response. This indicates that Volcano can provide itself with richer visual information, helping alleviate multimodal hallucination. We publicly release Volcano models of 7B and 13B sizes along with the data and code at https://github.com/kaistAI/Volcano.

BIDRN: A Method of Bidirectional Recurrent Neural Network for Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2311.07296
  • repo_url: None
  • paper_authors: Dr. D Muthusankar, Dr. P Kaladevi, Dr. V R Sadasivam, R Praveen
  • for: This paper aims to provide a systematic framework for sentiment analysis in the context of student input on institution choice.
  • methods: The study employs deep bidirectional recurrent neural networks (BDRNNs) to analyze sentiment and to generate a dataset with sentiment labels.
  • results: The proposed SA-BDRNN scheme is compared to existing frameworks to establish a robust deep neural network that can serve as an adequate classification model in sentiment analysis.
    Abstract Text mining research has grown in importance in recent years due to the tremendous increase in the volume of unstructured textual data. This has resulted in immense potential as well as obstacles in the sector, which may be efficiently addressed with adequate analytical and study methods. Deep Bidirectional Recurrent Neural Networks are used in this study to analyze sentiment. The method is categorized as sentiment polarity analysis because it may generate a dataset with sentiment labels. This dataset can be used to train and evaluate sentiment analysis models capable of extracting impartial opinions. This paper describes the Sentiment Analysis-Deep Bidirectional Recurrent Neural Networks (SA-BDRNN) Scheme, which seeks to overcome the challenges and maximize the potential of text mining in the context of Big Data. The current study proposes the SA-BDRNN Scheme as a systematic framework for sentiment analysis in the context of student input on institution choice. The purpose of this study is to compare the effectiveness of the proposed SA-BDRNN Scheme to existing frameworks to establish a robust deep neural network that might serve as an adequate classification model in the field of sentiment analysis.
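A minimal PyTorch rendering of a bidirectional recurrent sentiment classifier in the spirit of the paper's scheme is sketched below; the layer sizes, pooling choice, and vocabulary are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class BiRNNSentiment(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 128,
                 hidden: int = 256, n_classes: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden, batch_first=True,
                           bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(self.embed(token_ids))  # (batch, seq, 2*hidden)
        return self.head(h.mean(dim=1))         # mean-pool both directions

model = BiRNNSentiment(vocab_size=20_000)
logits = model(torch.randint(0, 20_000, (4, 32)))  # 4 texts, 32 tokens each
```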

AdaCCD: Adaptive Semantic Contrasts Discovery based Cross Lingual Adaptation for Code Clone Detection

  • paper_url: http://arxiv.org/abs/2311.07277
  • repo_url: None
  • paper_authors: Yangkai Du, Tengfei Ma, Lingfei Wu, Xuhong Zhang, Shouling Ji
  • for: This work aims to improve code clone detection in modern software, which spans a diverse range of programming languages.
  • methods: It proposes AdaCCD, a cross-lingual adaptation method that detects cloned code in a new language without any annotations in that language, leveraging language-agnostic code representations from pre-trained programming language models and an Adaptively Refined Contrastive Learning framework to transfer knowledge from resource-rich to resource-poor languages.
  • results: On a multilingual code clone detection benchmark of five programming languages, AdaCCD achieves significant improvements over other baselines and is even comparable to supervised fine-tuning.
    Abstract Code Clone Detection, which aims to retrieve functionally similar programs from large code bases, has been attracting increasing attention. Modern software often involves a diverse range of programming languages. However, current code clone detection methods are generally limited to only a few popular programming languages due to insufficient annotated data as well as their own model design constraints. To address these issues, we present AdaCCD, a novel cross-lingual adaptation method that can detect cloned codes in a new language without any annotations in that language. AdaCCD leverages language-agnostic code representations from pre-trained programming language models and propose an Adaptively Refined Contrastive Learning framework to transfer knowledge from resource-rich languages to resource-poor languages. We evaluate the cross-lingual adaptation results of AdaCCD by constructing a multilingual code clone detection benchmark consisting of 5 programming languages. AdaCCD achieves significant improvements over other baselines, and it is even comparable to supervised fine-tuning.

Danish Foundation Models

  • paper_url: http://arxiv.org/abs/2311.07264
  • repo_url: https://github.com/centre-for-humanities-computing/danish-foundation-models
  • paper_authors: Kenneth Enevoldsen, Lasse Hansen, Dan S. Nielsen, Rasmus A. F. Egebæk, Søren V. Holm, Martin C. Nielsen, Martin Bernstorff, Rasmus Larsen, Peter B. Jørgensen, Malte Højmark-Bertelsen, Peter B. Vahlstrup, Per Møldrup-Dalum, Kristoffer Nielbo
  • for: Improving the state of research and the application prospects of smaller languages, which risk falling behind due to high training costs and weak incentives for large companies to train such models.
  • methods: The Danish Foundation Models project provides and maintains open, well-documented, high-quality foundation models for Danish, built through broad cooperation with public and private institutions to ensure data quality and applicability.
  • results: The project delivers high-quality open-source foundation models that promote research and application development for a smaller language.
    Abstract Large language models, sometimes referred to as foundation models, have transformed multiple fields of research. However, smaller languages risk falling behind due to high training costs and small incentives for large companies to train these models. To combat this, the Danish Foundation Models project seeks to provide and maintain open, well-documented, and high-quality foundation models for the Danish language. This is achieved through broad cooperation with public and private institutions, to ensure high data quality and applicability of the trained models. We present the motivation of the project, the current status, and future perspectives.

How are Prompts Different in Terms of Sensitivity?

  • paper_url: http://arxiv.org/abs/2311.07230
  • repo_url: None
  • paper_authors: Sheng Lu, Hendrik Schuff, Iryna Gurevych
  • for: This work systematically analyzes how prompts affect model performance across different models and tasks.
  • methods: It analyzes prompts through the sensitivity of a function and uses gradient-based saliency scores to show empirically how different prompts change the relevance of input tokens to the output.
  • results: Sensitivity turns out to be an unsupervised proxy for model performance, exhibiting a strong negative correlation with accuracy; a proposed sensitivity-aware decoding strategy, which adds sensitivity estimation as a penalty term to standard greedy decoding, is particularly helpful when information in the input is scarce. The analysis offers a fresh perspective on prompts and a better understanding of the mechanism of ICL.
    Abstract In-context learning (ICL) has become one of the most popular learning paradigms. While there is a growing body of literature focusing on prompt engineering, there is a lack of systematic analysis comparing the effects of prompts across different models and tasks. To address this gap, we present a comprehensive prompt analysis based on the sensitivity of a function. Our analysis reveals that sensitivity is an unsupervised proxy for model performance, as it exhibits a strong negative correlation with accuracy. We use gradient-based saliency scores to empirically demonstrate how different prompts affect the relevance of input tokens to the output, resulting in different levels of sensitivity. Furthermore, we introduce sensitivity-aware decoding which incorporates sensitivity estimation as a penalty term in the standard greedy decoding. We show that this approach is particularly helpful when information in the input is scarce. Our work provides a fresh perspective on the analysis of prompts, and contributes to a better understanding of the mechanism of ICL.

Troubles and Failures in Interactional Language. Towards a Linguistically Informed Taxonomy

  • paper_url: http://arxiv.org/abs/2311.07217
  • repo_url: None
  • paper_authors: Martina Wiltschko
  • for: This talk aims to understand the nature of interaction between humans and artificial conversational agents (CAs), i.e., human-machine interaction (HMI).
  • methods: It introduces a systematic research agenda that studies HMI from an explicitly linguistic perspective.
  • results: The agenda focuses on linguistically defined variables that are known to influence the flow of conversations among humans (human-human interaction, HHI).
    Abstract The goal of this talk is to introduce a systematic research agenda which aims to understand the nature of interaction between humans and artificial conversational agents (CA) (henceforth humanmachine interaction, HMI). Specifically, we shall take an explicit linguistic perspective focusing on linguistically defined variables that are known to influence the flow of conversations among humans (henceforth human-human interaction, HHI).

Coffee: Boost Your Code LLMs by Fixing Bugs with Feedback

  • paper_url: http://arxiv.org/abs/2311.07215
  • repo_url: None
  • paper_authors: Seungjun Moon, Yongho Song, Hyungjoo Chae, Dongjin Kang, Taeyoon Kwon, Kai Tzu-iunn Ong, Seung-won Hwang, Jinyoung Yeo
  • for: This work aims to leverage open-source code LLMs to generate helpful feedback with correct guidance for code editing.
  • methods: It presents Coffee, a dataset collected specifically for code fixing with feedback, and CoffeePots, a framework for code fixing with feedback via preference-optimized tuning and selection, which minimizes the risk of superficial feedback.
  • results: The combination of Coffee and CoffeePots achieves state-of-the-art performance on the HumanEvalFix benchmark.
    Abstract Code editing is an essential step towards reliable program synthesis to automatically correct critical errors generated from code LLMs. Recent studies have demonstrated that closed-source LLMs (i.e., ChatGPT and GPT-4) are capable of generating corrective feedback to edit erroneous inputs. However, it remains challenging for open-source code LLMs to generate feedback for code editing, since these models tend to adhere to the superficial formats of feedback and provide feedback with misleading information. Hence, the focus of our work is to leverage open-source code LLMs to generate helpful feedback with correct guidance for code editing. To this end, we present Coffee, a collected dataset specifically designed for code fixing with feedback. Using this dataset, we construct CoffeePots, a framework for COde Fixing with FEEdback via Preference-Optimized Tuning and Selection. The proposed framework aims to automatically generate helpful feedback for code editing while minimizing the potential risk of superficial feedback. The combination of Coffee and CoffeePots marks a significant advancement, achieving state-of-the-art performance on HumanEvalFix benchmark. Codes and model checkpoints are publicly available at https://github.com/Lune-Blue/COFFEE.

Exploring the Dialogue Comprehension Ability of Large Language Models

  • paper_url: http://arxiv.org/abs/2311.07194
  • repo_url: None
  • paper_authors: Shuaijie She, Shujian Huang, Xingyun Wang, Yanke Zhou, Jiajun Chen
  • for: This paper evaluates and analyzes the dialogue comprehension ability of different LLMs.
  • methods: It uses the dialogue summarization task (DIAC-Sum) as the evaluation vehicle, and further derives factual questions from the generated summaries as a more flexible measurement of dialogue comprehension (DIAC-FactQA).
  • results: On average, 27% of the summaries generated by LLMs contain factual inconsistencies; even ChatGPT, the strongest model evaluated, errs in 16% of its summaries. On the more challenging factual questions, the average error rate across all evaluated LLMs is 37.2%, indicating serious deficiencies; a proposed fine-tuning paradigm with auto-constructed multi-task data improves the DIAC-FactQA error rate by 10.9%.
    Abstract LLMs may interact with users in the form of dialogue and generate responses following their instructions, which naturally require dialogue comprehension abilities. However, dialogue comprehension is a general language ability which is hard to be evaluated directly. In this work, we propose to perform the evaluation with the help of the dialogue summarization task. Beside evaluating and analyzing the dialogue summarization performance (DIAC-Sum) of different LLMs, we also derive factual questions from the generated summaries and use them as a more flexible measurement of dialogue comprehension (DIAC-FactQA). Our evaluation shows that, on average, 27% of the summaries generated by LLMs contain factual inconsistency. Even ChatGPT, the strongest model evaluated, has such errors in 16% of its summaries. For answering the factual questions, which is more challenging, the average error rate of all evaluated LLMs is 37.2%. Both results indicate serious deficiencies. Detailed analysis shows that the understanding of subject/object of the conversation is still the most challenging problem for LLMs. Furthermore, to stimulate and enhance the dialogue comprehension ability of LLMs, we propose a fine-tuning paradigm with auto-constructed multi-task data. The experimental results demonstrate that our method achieved an error rate improvement of 10.9% on DIAC-FactQA.

VerityMath: Advancing Mathematical Reasoning by Self-Verification Through Unit Consistency

  • paper_url: http://arxiv.org/abs/2311.07172
  • repo_url: None
  • paper_authors: Vernon Toh, Ratish Puduppully, Nancy F. Chen
  • for: This paper studies mathematical reasoning in strong open-source LLMs combined with program-based solving techniques, analyzing the outputs of Code Llama (7B) on math word problems.
  • methods: It identifies problems involving quantities that span multiple types or units as a challenge, and proposes a systematic approach that defines units for each quantity and enforces unit consistency during mathematical operations, realized as Unit Consistency Programs (UCPs): an annotated dataset of math word problems paired with programs containing unit specifications and unit verification routines.
  • results: Finetuning Code Llama (7B) with UCPs yields VerityMath, for which the paper presents preliminary findings.
    Abstract Large Language Models (LLMs) combined with program-based solving techniques are increasingly demonstrating proficiency in mathematical reasoning. However, such progress is mostly demonstrated in closed-source models such as OpenAI-GPT4 and Claude. In this paper, we seek to study the performance of strong open-source LLMs. Specifically, we analyze the outputs of Code Llama (7B) when applied to math word problems. We identify a category of problems that pose a challenge for the model, particularly those involving quantities that span multiple types or units. To address this issue, we propose a systematic approach by defining units for each quantity and ensuring the consistency of these units during mathematical operations. We developed Unit Consistency Programs (UCPs), an annotated dataset of math word problems, each paired with programs that contain unit specifications and unit verification routines. Finally, we finetune the Code Llama (7B) model with UCPs to produce VerityMath and present our preliminary findings.
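The idea behind unit verification routines can be illustrated with a tiny value-plus-unit wrapper whose arithmetic raises on inconsistent units; this is a hypothetical sketch of the concept, not the UCP program format used in the paper.

```python
class Quantity:
    """A value tagged with a unit; arithmetic enforces unit consistency."""
    def __init__(self, value: float, unit: str):
        self.value, self.unit = value, unit

    def __add__(self, other: "Quantity") -> "Quantity":
        if self.unit != other.unit:
            raise ValueError(f"unit mismatch: {self.unit} + {other.unit}")
        return Quantity(self.value + other.value, self.unit)

    def __mul__(self, scalar: float) -> "Quantity":
        return Quantity(self.value * scalar, self.unit)

    def __repr__(self):
        return f"{self.value} {self.unit}"

print(Quantity(3, "apple") + Quantity(5, "apple"))  # 8 apple
print(Quantity(2.5, "dollar") * 8)                  # 20.0 dollar
try:
    Quantity(3, "apple") + Quantity(2.5, "dollar")  # caught: unit mismatch
except ValueError as err:
    print(err)
```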

calamanCy: A Tagalog Natural Language Processing Toolkit

  • paper_url: http://arxiv.org/abs/2311.07171
  • repo_url: https://github.com/ljvmiranda921/calamancy
  • paper_authors: Lester James V. Miranda
  • for: This paper is written for those who are interested in developing natural language processing (NLP) applications for Tagalog, particularly those who want to use spaCy as their framework.
  • methods: The paper presents an open-source toolkit called calamanCy, which is built on top of spaCy and provides a consistent API for building NLP applications. The toolkit offers general-purpose multitask models with out-of-the-box support for dependency parsing, POS tagging, and NER.
  • results: The paper aims to accelerate the progress of Tagalog NLP by consolidating disjointed resources in a unified framework and providing a convenient toolkit for experimentation and integration with other frameworks. The toolkit is available on GitHub for easy access and use.
    Abstract We introduce calamanCy, an open-source toolkit for constructing natural language processing (NLP) pipelines for Tagalog. It is built on top of spaCy, enabling easy experimentation and integration with other frameworks. calamanCy addresses the development gap by providing a consistent API for building NLP applications and offering general-purpose multitask models with out-of-the-box support for dependency parsing, parts-of-speech (POS) tagging, and named entity recognition (NER). calamanCy aims to accelerate the progress of Tagalog NLP by consolidating disjointed resources in a unified framework. The calamanCy toolkit is available on GitHub: https://github.com/ljvmiranda921/calamanCy.
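Typical usage follows the standard spaCy pattern; the model name below is an assumption based on the project's naming conventions and may differ by release.

```python
import calamancy  # pip install calamancy

# Load a general-purpose Tagalog pipeline (model name may vary by version).
nlp = calamancy.load("tl_calamancy_md-0.1.0")

doc = nlp("Pumunta si Juan sa Maynila kahapon.")
for token in doc:
    print(token.text, token.pos_, token.dep_)  # POS tags and dependencies
for ent in doc.ents:
    print(ent.text, ent.label_)                # named entities
```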

Developing a Named Entity Recognition Dataset for Tagalog

  • paper_url: http://arxiv.org/abs/2311.07161
  • repo_url: https://github.com/ljvmiranda921/calamancy
  • paper_authors: Lester James V. Miranda
  • for: This paper presents the development of a Named Entity Recognition (NER) dataset for Tagalog, helping fill the resource gap in Philippine languages.
  • methods: Texts were drawn from a pretraining corpus of news reports and labeled iteratively by native speakers, yielding ~7.8k documents across three entity types: Person, Organization, and Location.
  • results: Inter-annotator agreement reaches a Cohen's κ of 0.81; the authors also run extensive empirical evaluations of state-of-the-art methods in supervised and transfer learning settings, and publicly release the data and processing code to inspire future work on Tagalog NLP.
    Abstract We present the development of a Named Entity Recognition (NER) dataset for Tagalog. This corpus helps fill the resource gap present in Philippine languages today, where NER resources are scarce. The texts were obtained from a pretraining corpora containing news reports, and were labeled by native speakers in an iterative fashion. The resulting dataset contains ~7.8k documents across three entity types: Person, Organization, and Location. The inter-annotator agreement, as measured by Cohen's $\kappa$, is 0.81. We also conducted extensive empirical evaluation of state-of-the-art methods across supervised and transfer learning settings. Finally, we released the data and processing code publicly to inspire future work on Tagalog NLP.
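For reference, the reported agreement statistic can be computed from two annotators' label sequences with the standard Cohen's kappa formula; this is a generic implementation, not the authors' evaluation script.

```python
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Cohen's kappa between two annotators' label sequences."""
    assert len(ann_a) == len(ann_b)
    n = len(ann_a)
    p_o = sum(a == b for a, b in zip(ann_a, ann_b)) / n   # observed agreement
    ca, cb = Counter(ann_a), Counter(ann_b)
    p_e = sum(ca[l] * cb[l] for l in set(ann_a) | set(ann_b)) / n**2
    return (p_o - p_e) / (1 - p_e)                        # chance-corrected

print(cohens_kappa(["PER", "LOC", "ORG", "PER"],
                   ["PER", "LOC", "PER", "PER"]))
```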

Gen-Z: Generative Zero-Shot Text Classification with Contextualized Label Descriptions

  • paper_url: http://arxiv.org/abs/2311.07115
  • repo_url: None
  • paper_authors: Sachin Kumar, Chan Young Park, Yulia Tsvetkov
  • for: Improving the performance and robustness of zero-shot text classification.
  • methods: Proposes Gen-Z, a generative prompting framework that measures the LM likelihood of the input text conditioned on natural language descriptions of the labels; the label descriptions allow additional contextual information to be integrated seamlessly.
  • results: Across standard classification benchmarks and six open-source LM families, zero-shot classification with simple contextualization of the evaluation set's data source consistently outperforms both zero-shot and few-shot baselines while improving robustness to prompt variations.
    Abstract Language model (LM) prompting--a popular paradigm for solving NLP tasks--has been shown to be susceptible to miscalibration and brittleness to slight prompt variations, caused by its discriminative prompting approach, i.e., predicting the label given the input. To address these issues, we propose Gen-Z--a generative prompting framework for zero-shot text classification. GEN-Z is generative, as it measures the LM likelihood of input text, conditioned on natural language descriptions of labels. The framework is multivariate, as label descriptions allow us to seamlessly integrate additional contextual information about the labels to improve task performance. On various standard classification benchmarks, with six open-source LM families, we show that zero-shot classification with simple contextualization of the data source of the evaluation set consistently outperforms both zero-shot and few-shot baselines while improving robustness to prompt variations. Further, our approach enables personalizing classification in a zero-shot manner by incorporating author, subject, or reader information in the label descriptions.
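The core scoring rule, picking the label whose description makes the input text most likely under the LM, can be sketched as follows; the model and the description wording are placeholders rather than Gen-Z's actual templates.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def text_loglik_given(desc: str, text: str) -> float:
    """log p(text | label description) under a causal LM."""
    # Assumes desc tokenizes identically as a prefix of the joined string,
    # which the space separator makes typical for BPE tokenizers.
    prefix_len = tok(desc, return_tensors="pt").input_ids.shape[1]
    ids = tok(desc + " " + text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    logp = torch.log_softmax(logits[0, :-1], dim=-1)  # row i predicts ids[i+1]
    targets = ids[0, 1:]
    keep = slice(prefix_len - 1, None)                # score only text tokens
    return logp[keep].gather(1, targets[keep, None]).sum().item()

def classify(text: str, label_descs: dict) -> str:
    return max(label_descs,
               key=lambda lab: text_loglik_given(label_descs[lab], text))

print(classify("The acting was superb and I loved every minute.",
               {"positive": "The following is a positive movie review:",
                "negative": "The following is a negative movie review:"}))
```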

Fovea Transformer: Efficient Long-Context Modeling with Structured Fine-to-Coarse Attention

  • paper_url: http://arxiv.org/abs/2311.07102
  • repo_url: None
  • paper_authors: Ziwei He, Jian Yuan, Le Zhou, Jingwen Leng, Bo Jiang
  • for: This paper improves the computational efficiency of Transformers on long text while better capturing long-range dependencies.
  • methods: It proposes the Fovea Transformer, which builds a multi-scale tree over the input sequence and represents context tokens at progressively coarser granularity as their distance to the query token increases.
  • results: On three long-context summarization tasks, the model achieves state-of-the-art performance on two and competitive results on the third, with mixed improvement and setback across evaluation metrics on that task.
    Abstract The quadratic complexity of self-attention in Transformers has hindered the processing of long text. To alleviate this problem, previous works have proposed to sparsify the attention matrix, taking advantage of the observation that crucial information about a token can be derived from its neighbors. These methods typically combine one or another form of local attention and global attention. Such combinations introduce abrupt changes in contextual granularity when going from local to global, which may be undesirable. We believe that a smoother transition could potentially enhance model's ability to capture long-context dependencies. In this study, we introduce Fovea Transformer, a long-context focused transformer that addresses the challenges of capturing global dependencies while maintaining computational efficiency. To achieve this, we construct a multi-scale tree from the input sequence, and use representations of context tokens with a progressively coarser granularity in the tree, as their distance to the query token increases. We evaluate our model on three long-context summarization tasks (our code is publicly available at https://github.com/ZiweiHe/Fovea-Transformer). It achieves state-of-the-art performance on two of them, and competitive results on the third with mixed improvement and setback of the evaluation metrics.
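A toy version of the fine-to-coarse idea, with full resolution near the query and mean-pooled spans at doubling scales farther away (left context only; the right side is analogous), might look like this; it omits the tree construction and attention machinery of the actual model.

```python
import torch

def fovea_context(h: torch.Tensor, q: int, window: int = 4) -> torch.Tensor:
    """h: (seq_len, dim) token states; returns a compressed context for query q."""
    near = h[max(0, q - window): q + window + 1]  # fine-grained neighborhood
    pieces = [near]
    left, span = max(0, q - window), 2 * window
    while left > 0:                                # coarsen with distance
        start = max(0, left - span)
        pieces.insert(0, h[start:left].mean(dim=0, keepdim=True))
        left, span = start, span * 2
    return torch.cat(pieces, dim=0)

ctx = fovea_context(torch.randn(1024, 64), q=900)
print(ctx.shape)  # far fewer than 1024 context vectors
```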

On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition

  • paper_url: http://arxiv.org/abs/2311.07093
  • repo_url: None
  • paper_authors: Xiaohan Shi, Jiajun He, Xingfeng Li, Tomoki Toda
  • for: Improving noisy speech emotion recognition (NSER) under the non-stationary noise found in real-world environments.
  • methods: Adopts an automatic speech recognition (ASR) model as a noise-robust feature extractor: intermediate-layer representations of the ASR model serve as emotional speech features for the downstream NSER task, filtering out non-vocal information.
  • results: 1) the proposed method achieves better NSER performance than conventional noise reduction; 2) it outperforms self-supervised learning approaches; 3) it even outperforms text-based approaches using ASR transcription or the ground-truth transcription of noisy speech.
    Abstract This paper proposes an efficient attempt to noisy speech emotion recognition (NSER). Conventional NSER approaches have proven effective in mitigating the impact of artificial noise sources, such as white Gaussian noise, but are limited to non-stationary noises in real-world environments due to their complexity and uncertainty. To overcome this limitation, we introduce a new method for NSER by adopting the automatic speech recognition (ASR) model as a noise-robust feature extractor to eliminate non-vocal information in noisy speech. We first obtain intermediate layer information from the ASR model as a feature representation for emotional speech and then apply this representation for the downstream NSER task. Our experimental results show that 1) the proposed method achieves better NSER performance compared with the conventional noise reduction method, 2) outperforms self-supervised learning approaches, and 3) even outperforms text-based approaches using ASR transcription or the ground truth transcription of noisy speech.
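One way to realize "an ASR model as a noise-robust feature extractor" is to mean-pool an intermediate encoder layer of a pre-trained ASR model. The sketch below uses Whisper via Hugging Face as an assumed stand-in; the paper's ASR backbone and layer choice are not specified here.

```python
import torch
from transformers import WhisperModel, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-base")
asr = WhisperModel.from_pretrained("openai/whisper-base").eval()

def emotion_features(waveform, sampling_rate=16_000, layer=-1):
    """Fixed-size utterance embedding from one ASR encoder layer.

    waveform: 1-D array of audio samples at `sampling_rate`.
    """
    inputs = processor(waveform, sampling_rate=sampling_rate,
                       return_tensors="pt")
    with torch.no_grad():
        enc = asr.encoder(inputs.input_features, output_hidden_states=True)
    # Mean-pool the chosen hidden layer over time -> (1, hidden_dim)
    return enc.hidden_states[layer].mean(dim=1)

# A downstream emotion classifier (e.g., a small MLP) is trained on these features.
```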

On the Discussion of Large Language Models: Symmetry of Agents and Interplay with Prompts

  • paper_url: http://arxiv.org/abs/2311.07076
  • repo_url: None
  • paper_authors: Qineng Wang, Zihao Wang, Ying Su, Yangqiu Song
  • for: This paper examines how the reasoning capability of large language models can be unlocked for complex problems.
  • methods: Two routes are considered: prompt engineering, and multi-agent discussion that combines multiple inferences of large language models; the discussion mechanisms are justified theoretically from the symmetry of agents.
  • results: Empirical results on the interplay of prompts and discussion mechanisms show that the state-of-the-art performance of complex multi-agent mechanisms can be approached by carefully developed prompt engineering; the paper also proposes a scalable conquer-and-merge discussion mechanism that attains state-of-the-art performance with simple prompts.
    Abstract Two ways has been discussed to unlock the reasoning capability of a large language model. The first one is prompt engineering and the second one is to combine the multiple inferences of large language models, or the multi-agent discussion. Theoretically, this paper justifies the multi-agent discussion mechanisms from the symmetry of agents. Empirically, this paper reports the empirical results of the interplay of prompts and discussion mechanisms, revealing the empirical state-of-the-art performance of complex multi-agent mechanisms can be approached by carefully developed prompt engineering. This paper also proposes a scalable discussion mechanism based on conquer and merge, providing a simple multi-agent discussion solution with simple prompts but state-of-the-art performance.

Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations

  • paper_url: http://arxiv.org/abs/2311.07070
  • repo_url: https://github.com/pootiet/explain-then-translate
  • paper_authors: Zilu Tang, Mayank Agarwal, Alex Shypula, Bailin Wang, Derry Wijaya, Jie Chen, Yoon Kim
  • for: This work uses self-generated natural language explanations as an intermediate step in code-to-code translation with language models.
  • methods: Three types of explanations are evaluated across 19 programming languages constructed from the MultiPL-E dataset.
  • results: Explanations are particularly effective in the zero-shot case, improving performance by 12% on average, with the gains most pronounced on difficult programs; the dataset, code, and canonical solutions in all 19 languages are released.
    Abstract This work explores the use of self-generated natural language explanations as an intermediate step for code-to-code translation with language models. Across three types of explanations and 19 programming languages constructed from the MultiPL-E dataset, we find the explanations to be particularly effective in the zero-shot case, improving performance by 12% on average. Improvements with natural language explanations are particularly pronounced on difficult programs. We release our dataset, code, and canonical solutions in all 19 languages.
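The two-step recipe can be expressed as a pair of prompts; `chat` below is a placeholder for any instruction-following LLM call, and the wording is illustrative rather than the paper's exact template.

```python
def explain_then_translate(chat, source_code: str,
                           src_lang: str, tgt_lang: str) -> str:
    """chat(prompt: str) -> str is any instruction-tuned LLM endpoint."""
    # Step 1: have the model explain the program in natural language.
    explanation = chat(
        f"Explain, step by step, what this {src_lang} program does:\n\n"
        f"{source_code}"
    )
    # Step 2: translate, conditioning on the self-generated explanation.
    return chat(
        f"{src_lang} program:\n{source_code}\n\n"
        f"Explanation:\n{explanation}\n\n"
        f"Using the explanation, rewrite the program in {tgt_lang}. "
        f"Output only the {tgt_lang} code."
    )
```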

Context Consistency between Training and Testing in Simultaneous Machine Translation

  • paper_url: http://arxiv.org/abs/2311.07066
  • repo_url: https://github.com/zhongmz/contextconsistencybitraining4simt
  • paper_authors: Meizhi Zhong, Lemao Liu, Kehai Chen, Mingming Yang, Min Zhang
  • for: Simultaneous machine translation (SiMT) aims to produce a real-time partial translation while the source-side context grows monotonically; this paper addresses the counterintuitive mismatch between the contexts used at training and testing time.
  • methods: It proposes context consistency training (CCT), which makes context usage consistent between training and testing by optimizing translation quality and latency as bi-objectives and exposing the model's own predictions during training.
  • results: Experiments on three language pairs show improved translation quality and latency; for the first time, a system encouraging context consistency outperforms existing systems trained with inconsistent context.
    Abstract Simultaneous Machine Translation (SiMT) aims to yield a real-time partial translation with a monotonically growing the source-side context. However, there is a counterintuitive phenomenon about the context usage between training and testing: e.g., the wait-k testing model consistently trained with wait-k is much worse than that model inconsistently trained with wait-k' (k' is not equal to k) in terms of translation quality. To this end, we first investigate the underlying reasons behind this phenomenon and uncover the following two factors: 1) the limited correlation between translation quality and training (cross-entropy) loss; 2) exposure bias between training and testing. Based on both reasons, we then propose an effective training approach called context consistency training accordingly, which makes consistent the context usage between training and testing by optimizing translation quality and latency as bi-objectives and exposing the predictions to the model during the training. The experiments on three language pairs demonstrate our intuition: our system encouraging context consistency outperforms that existing systems with context inconsistency for the first time, with the help of our context consistency training approach.
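The training/testing mismatch arises under policies such as wait-k, where the decoder starts after reading k source tokens and then alternates between writing a target token and reading one more source token. A greedy wait-k decoding skeleton is sketched below; `next_token` stands in for any incremental translation model.

```python
def wait_k_decode(next_token, source, k, max_len=200):
    """Greedy wait-k simultaneous decoding.

    next_token(src_prefix, tgt_prefix) -> token, or None at end of sentence;
    it is a placeholder for the underlying incremental MT model.
    """
    target, read = [], min(k, len(source))
    while len(target) < max_len:
        tok = next_token(source[:read], target)  # translate from a prefix
        if tok is None:
            break
        target.append(tok)
        read = min(read + 1, len(source))        # then read one more word
    return target
```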

PROPANE: Prompt design as an inverse problem

  • paper_url: http://arxiv.org/abs/2311.07064
  • repo_url: https://github.com/rimon15/propane
  • paper_authors: Rimon Melamed, Lucas H. McCabe, Tanay Wakhare, Yejin Kim, H. Howie Huang, Enric Boix-Adsera
  • for: Improving the performance of Large Language Models (LLMs) by designing prompts that guide them toward particular behaviors.
  • methods: Proposes PROPANE, an automatic prompt optimization framework that, without user intervention, finds a prompt inducing outputs semantically similar to a fixed set of examples.
  • results: PROPANE can (a) improve existing prompts and (b) discover semantically obfuscated prompts that transfer between models.
    Abstract Carefully-designed prompts are key to inducing desired behavior in Large Language Models (LLMs). As a result, great effort has been dedicated to engineering prompts that guide LLMs toward particular behaviors. In this work, we propose an automatic prompt optimization framework, PROPANE, which aims to find a prompt that induces semantically similar outputs to a fixed set of examples without user intervention. We further demonstrate that PROPANE can be used to (a) improve existing prompts, and (b) discover semantically obfuscated prompts that transfer between models.

Teach me with a Whisper: Enhancing Large Language Models for Analyzing Spoken Transcripts using Speech Embeddings

  • paper_url: http://arxiv.org/abs/2311.07014
  • repo_url: None
  • paper_authors: Fatema Hasan, Yulong Li, James Foulds, Shimei Pan, Bishwaranjan Bhattacharjee
  • for: Enhancing language models for analyzing spoken transcripts by injecting speech information, without requiring the audio stream at prediction time.
  • methods: An audio-language knowledge distillation framework transfers acoustic and paralinguistic information from a pre-trained speech teacher model (OpenAI Whisper) to a student language model trained on an audio-text dataset.
  • results: The student model consistently improves over traditional language models on tasks analyzing spoken transcripts, while avoiding audio processing overhead at test time.
    Abstract Speech data has rich acoustic and paralinguistic information with important cues for understanding a speaker's tone, emotion, and intent, yet traditional large language models such as BERT do not incorporate this information. There has been an increased interest in multi-modal language models leveraging audio and/or visual information and text. However, current multi-modal language models require both text and audio/visual data streams during inference/test time. In this work, we propose a methodology for training language models leveraging spoken language audio data but without requiring the audio stream during prediction time. This leads to an improved language model for analyzing spoken transcripts while avoiding an audio processing overhead at test time. We achieve this via an audio-language knowledge distillation framework, where we transfer acoustic and paralinguistic information from a pre-trained speech embedding (OpenAI Whisper) teacher model to help train a student language model on an audio-text dataset. In our experiments, the student model achieves consistent improvement over traditional language models on tasks analyzing spoken transcripts.

cs.LG - 2023-11-13

Probabilistic Physics-integrated Neural Differentiable Modeling for Isothermal Chemical Vapor Infiltration Process

  • paper_url: http://arxiv.org/abs/2311.07798
  • repo_url: None
  • paper_authors: Deepak Akhare, Zeping Chen, Richard Gulotty, Tengfei Luo, Jian-Xun Wang
  • for: This paper aims to develop a data-driven predictive model for the isothermal chemical vapor infiltration (CVI) densification process, which is critical for producing high-performance carbon-carbon and carbon-silicon carbide composites.
  • methods: The authors use the physics-integrated neural differentiable (PiNDiff) modeling framework, which incorporates uncertainty quantification to enhance the model's reliability and robustness, and validate the model's accuracy on both synthetic and real-world manufacturing data.
  • results: The proposed method effectively models the densification process during CVI and can potentially be used to optimize the manufacturing process and improve the quality and consistency of the final products.
    Abstract Chemical vapor infiltration (CVI) is a widely adopted manufacturing technique used in producing carbon-carbon and carbon-silicon carbide composites. These materials are especially valued in the aerospace and automotive industries for their robust strength and lightweight characteristics. The densification process during CVI critically influences the final performance, quality, and consistency of these composite materials. Experimentally optimizing the CVI processes is challenging due to long experimental time and large optimization space. To address these challenges, this work takes a modeling-centric approach. Due to the complexities and limited experimental data of the isothermal CVI densification process, we have developed a data-driven predictive model using the physics-integrated neural differentiable (PiNDiff) modeling framework. An uncertainty quantification feature has been embedded within the PiNDiff method, bolstering the model's reliability and robustness. Through comprehensive numerical experiments involving both synthetic and real-world manufacturing data, the proposed method showcases its capability in modeling densification during the CVI process. This research highlights the potential of the PiNDiff framework as an instrumental tool for advancing our understanding, simulation, and optimization of the CVI manufacturing process, particularly when faced with sparse data and an incomplete description of the underlying physics.

Explainable History Distillation by Marked Temporal Point Process

  • paper_url: http://arxiv.org/abs/2311.07797
  • repo_url: None
  • paper_authors: Sishun Liu, Ke Deng, Yan Wang, Xiuzhen Zhang
  • for: This paper builds a machine learning system that automatically generates explanations of observed events from history, which is essential when black-box models are applied to real-world, especially high-stakes, tasks.
  • methods: It proposes a new task, explainable history distillation (EHD), which requires a model to distill as few events as possible from the observed history such that the event distribution conditioned on the remaining events predicts the observed future noticeably worse; the task is rewritten as a 0-1 integer program and solved directly by a dedicated model.
  • results: Experiments on the Retweet and StackOverflow datasets show the model significantly outperforms other EHD baselines and can reveal the rationale underpinning real-world processes.
    Abstract Explainability of machine learning models is mandatory when researchers introduce these commonly believed black boxes to real-world tasks, especially high-stakes ones. In this paper, we build a machine learning system to automatically generate explanations of happened events from history based on the marked temporal point process (MTPP). Specifically, we propose a new task called explainable history distillation (EHD). This task requires a model to distill as few events as possible from observed history. The target is that the event distribution conditioned on left events predicts the observed future noticeably worse. We then regard distilled events as the explanation for the future. To efficiently solve EHD, we rewrite the task into a 0-1 integer program and directly estimate the solution to the program with a dedicated model. This work fills the gap between our task and existing works, which only spot the difference between factual and counterfactual worlds after applying a predefined modification to the environment. Experiment results on Retweet and StackOverflow datasets prove that the model significantly outperforms other EHD baselines and can reveal the rationale underpinning real-world processes.

Leveraging Hamilton-Jacobi PDEs with time-dependent Hamiltonians for continual scientific machine learning

  • paper_url: http://arxiv.org/abs/2311.07790
  • repo_url: None
  • paper_authors: Paula Chen, Tingwei Meng, Zongren Zou, Jérôme Darbon, George Em Karniadakis
  • for: Addresses two major challenges in scientific machine learning (SciML): interpretability and computational efficiency.
  • methods: Establishes a new theoretical connection between optimization problems arising from SciML and a generalized Hopf formula, which represents the viscosity solution to a Hamilton-Jacobi partial differential equation (HJ PDE) with time-dependent Hamiltonian. Solving certain regularized learning problems with integral-type losses can then be reinterpreted as solving an associated optimal control problem and its HJ PDE, so incremental model updates correspond to the evolution of the HJ PDE in time, with all previous information intrinsically encoded in its solution; this naturally coincides with the continual learning framework while avoiding catastrophic forgetting.
  • results: For the special case of linear regression, the connection yields a new Riccati-based methodology amenable to continual learning applications; numerical examples demonstrate the potential computational and memory advantages of the Riccati-based approach.
    Abstract We address two major challenges in scientific machine learning (SciML): interpretability and computational efficiency. We increase the interpretability of certain learning processes by establishing a new theoretical connection between optimization problems arising from SciML and a generalized Hopf formula, which represents the viscosity solution to a Hamilton-Jacobi partial differential equation (HJ PDE) with time-dependent Hamiltonian. Namely, we show that when we solve certain regularized learning problems with integral-type losses, we actually solve an optimal control problem and its associated HJ PDE with time-dependent Hamiltonian. This connection allows us to reinterpret incremental updates to learned models as the evolution of an associated HJ PDE and optimal control problem in time, where all of the previous information is intrinsically encoded in the solution to the HJ PDE. As a result, existing HJ PDE solvers and optimal control algorithms can be reused to design new efficient training approaches for SciML that naturally coincide with the continual learning framework, while avoiding catastrophic forgetting. As a first exploration of this connection, we consider the special case of linear regression and leverage our connection to develop a new Riccati-based methodology for solving these learning problems that is amenable to continual learning applications. We also provide some corresponding numerical examples that demonstrate the potential computational and memory advantages our Riccati-based approach can provide.
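The Riccati flavor of the approach can be illustrated with the textbook recursive least squares recursion, which propagates a Riccati matrix so that each new sample updates a ridge-regression solution exactly, without storing or revisiting past data. This is a hedged analogue of the paper's continual-learning methodology, not its exact derivation.

```python
# Riccati-style continual learning for ridge regression: the streaming RLS
# update matches the batch solution while never replaying old samples.
import numpy as np

rng = np.random.default_rng(0)
d, lam = 5, 1e-2
w_true = rng.normal(size=d)

w = np.zeros(d)
P = np.eye(d) / lam            # P = (lam * I)^{-1}, the initial Riccati matrix

X_seen, y_seen = [], []
for t in range(200):           # data arrives as a stream (continual setting)
    x = rng.normal(size=d)
    y = x @ w_true + 0.1 * rng.normal()
    # Rank-one Riccati update: exact, O(d^2), no catastrophic forgetting.
    Px = P @ x
    k = Px / (1.0 + x @ Px)
    w = w + k * (y - x @ w)
    P = P - np.outer(k, Px)
    X_seen.append(x); y_seen.append(y)

# Sanity check: the streaming solution equals batch ridge on all data so far.
X, y = np.array(X_seen), np.array(y_seen)
w_batch = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print("max |streaming - batch|:", np.abs(w - w_batch).max())
```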

Predicting the First Response Latency of Maintainers and Contributors in Pull Requests

  • paper_url: http://arxiv.org/abs/2311.07786
  • repo_url: None
  • paper_authors: SayedHassan Khatoonabadi, Ahmad Abdellatif, Diego Elias Costa, Emad Shihab
  • For: The paper aims to predict the first response latency of maintainers and contributors in the context of pull requests (PRs) on GitHub.
  • Methods: The authors use a machine-learning approach with 21 features to predict the first response latency of maintainers and contributors. They evaluate seven types of classifiers and perform permutation feature importance and SHAP analyses to understand the impact of different features on the predicted response latencies.
  • Results: The authors achieve an average improvement of 33% in AUC-ROC and 58% in AUC-PR for maintainers, as well as 42% in AUC-ROC and 95% in AUC-PR for contributors, compared to a no-skill classifier across the projects. They find that PRs submitted earlier in the week, containing an average or slightly above-average number of commits, and with concise descriptions are more likely to receive faster first responses from the maintainers. Likewise, PRs whose maintainers responded quickly and early in the week, and that contain an average or slightly above-average number of commits, tend to receive faster first responses from the contributors. Additionally, contributors with a higher acceptance rate and a history of timely responses in the project are likely to both obtain and provide faster first responses.
    Abstract The success of a Pull Request (PR) depends on the responsiveness of the maintainers and the contributor during the review process. Being aware of the expected waiting times can lead to better interactions and managed expectations for both the maintainers and the contributor. In this paper, we propose a machine-learning approach to predict the first response latency of the maintainers following the submission of a PR, and the first response latency of the contributor after receiving the first response from the maintainers. We curate a dataset of 20 large and popular open-source projects on GitHub and extract 21 features to characterize projects, contributors, PRs, and review processes. Using these features, we then evaluate seven types of classifiers to identify the best-performing models. We also perform permutation feature importance and SHAP analyses to understand the importance and impact of different features on the predicted response latencies. Our best-performing models achieve an average improvement of 33% in AUC-ROC and 58% in AUC-PR for maintainers, as well as 42% in AUC-ROC and 95% in AUC-PR for contributors compared to a no-skilled classifier across the projects. Our findings indicate that PRs submitted earlier in the week, containing an average or slightly above-average number of commits, and with concise descriptions are more likely to receive faster first responses from the maintainers. Similarly, PRs with a lower first response latency from maintainers, that received the first response of maintainers earlier in the week, and containing an average or slightly above-average number of commits tend to receive faster first responses from the contributors. Additionally, contributors with a higher acceptance rate and a history of timely responses in the project are likely to both obtain and provide faster first responses.
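A hedged sketch of this kind of evaluation pipeline on synthetic data: train a classifier to predict whether a PR gets a fast first response, then report AUC-ROC, AUC-PR, and permutation feature importance. The feature names are illustrative stand-ins for the paper's 21 features, not the actual dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.integers(0, 7, n),        # day of week the PR was submitted
    rng.poisson(3, n),            # number of commits in the PR
    rng.integers(10, 2000, n),    # description length (characters)
])
# Synthetic label loosely mirroring the paper's findings: early-week PRs with
# an average commit count and concise descriptions respond faster.
score = -0.3 * X[:, 0] - 0.2 * np.abs(X[:, 1] - 3) - 0.001 * X[:, 2]
y = (score + rng.normal(0, 0.5, n) > np.median(score)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]
print("AUC-ROC:", roc_auc_score(y_te, proba))
print("AUC-PR :", average_precision_score(y_te, proba))

imp = permutation_importance(clf, X_te, y_te, scoring="roc_auc",
                             n_repeats=10, random_state=0)
for name, m in zip(["weekday", "commits", "desc_len"], imp.importances_mean):
    print(f"{name}: {m:.3f}")
```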

Dynamic Local Attention with Hierarchical Patching for Irregular Clinical Time Series

  • paper_url: http://arxiv.org/abs/2311.07744
  • repo_url: None
  • paper_authors: Xingyu Chen, Xiaochen Zheng, Amina Mollaysa, Manuel Schürch, Ahmed Allam, Michael Krauthammer
  • for: Addresses irregular multivariate time series data, which is prevalent in the clinical and healthcare domains and exhibits both time-wise and feature-wise irregularities.
  • methods: Introduces a new model architecture with two modules: (1) DLA, a dynamic local attention mechanism that uses learnable queries and feature-specific local windows when computing self-attention, aggregating irregular time steps of the raw input within each window into a harmonized, regular latent representation while accounting for different features' sampling rates; and (2) a hierarchical MLP mixer that processes the DLA output through multi-scale patching, leveraging information at various scales for downstream tasks.
  • results: The method outperforms state-of-the-art approaches on three real-world datasets, including the latest clinical MIMIC IV dataset.
    Abstract Irregular multivariate time series data is prevalent in the clinical and healthcare domains. It is characterized by time-wise and feature-wise irregularities, making it challenging for machine learning methods to work with. To solve this, we introduce a new model architecture composed of two modules: (1) DLA, a Dynamic Local Attention mechanism that uses learnable queries and feature-specific local windows when computing the self-attention operation. This results in aggregating irregular time steps raw input within each window to a harmonized regular latent space representation while taking into account the different features' sampling rates. (2) A hierarchical MLP mixer that processes the output of DLA through multi-scale patching to leverage information at various scales for the downstream tasks. Our approach outperforms state-of-the-art methods on three real-world datasets, including the latest clinical MIMIC IV dataset.
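A simplified numpy sketch of the DLA idea: for each step of a regular latent grid, a learnable query attends only to one feature's irregular observations falling inside a local window around that step, yielding a harmonized regular representation per feature. Window sizes, dimensions, and the random stand-ins for learned parameters are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_latent = 8, 10
grid = np.linspace(0.0, 1.0, n_latent)          # regular latent time grid

def softmax(a):
    a = a - a.max()
    e = np.exp(a)
    return e / e.sum()

def dla_feature(t_obs, v_obs, window, query, W_k, W_v):
    """Aggregate one feature's irregular samples onto the regular grid."""
    out = np.zeros((n_latent, d_model))
    for i, tau in enumerate(grid):
        mask = np.abs(t_obs - tau) <= window     # feature-specific local window
        if not mask.any():
            continue
        # Embed (time offset, value) pairs, then attend with the learned query.
        e = np.column_stack([t_obs[mask] - tau, v_obs[mask]])
        keys, vals = e @ W_k, e @ W_v
        attn = softmax(keys @ query / np.sqrt(d_model))
        out[i] = attn @ vals
    return out

# Two features sampled at different, irregular rates map to the same grid.
t1, v1 = np.sort(rng.uniform(0, 1, 40)), rng.normal(size=40)   # fast sampling
t2, v2 = np.sort(rng.uniform(0, 1, 6)),  rng.normal(size=6)    # sparse sampling
params = lambda: (rng.normal(size=d_model), rng.normal(size=(2, d_model)),
                  rng.normal(size=(2, d_model)))
q, Wk, Wv = params()
z1 = dla_feature(t1, v1, window=0.05, query=q, W_k=Wk, W_v=Wv)
q, Wk, Wv = params()
z2 = dla_feature(t2, v2, window=0.25, query=q, W_k=Wk, W_v=Wv)
print(z1.shape, z2.shape)   # both (10, 8): a harmonized regular latent space
```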

A Simple Quantum Blockmodeling with Qubits and Permutations

  • paper_url: http://arxiv.org/abs/2311.07726
  • repo_url: None
  • paper_authors: Ammar Daskin
  • For: Introduces a quantum blockmodeling approach, based on permutation matrices, for data analysis tasks.
  • Methods: Blockmodeling is cast as permuting the rows and columns of an adjacency matrix; on quantum computers such permutations can be applied efficiently and in parallel, with implementations as simple as a single-qubit NOT gate, and the measurement outcome of a small group of qubits is mapped to the fitness value.
  • Results: The fitness value can be found or updated in O(log(N)) time, versus O(N) classically; when the number of iterations is below log(N), the same solution may be reached exponentially faster on quantum computers, and since different permutation sequences can be applied in superposition, the machine learning task can be implemented more efficiently on quantum hardware.
    Abstract Blockmodeling of a given problem represented by an $N\times N$ adjacency matrix can be found by swapping rows and columns of the matrix (i.e. multiplying matrix from left and right by a permutation matrix). In general, through performing this task, row and column permutations affect the fitness value in optimization: For an $N\times N$ matrix, it requires $O(N)$ computations to find (or update) the fitness value of a candidate solution. On quantum computers, permutations can be applied in parallel and efficiently, and their implementations can be as simple as a single qubit operation (a NOT gate on a qubit) which takes an $O(1)$ time algorithmic step. In this paper, using permutation matrices, we describe a quantum blockmodeling for data analysis tasks. In the model, the measurement outcome of a small group of qubits are mapped to indicate the fitness value. Therefore, we show that it is possible to find or update the fitness value in $O(log(N))$ time. This lead us to show that when the number of iterations are less than $log(N)$ time, it may be possible to reach the same solution exponentially faster on quantum computers in comparison to classical computers. In addition, since on quantum circuits the different sequence of permutations can be applied in parallel (superpositon), the machine learning task in this model can be implemented more efficiently on quantum computers.
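A classical numpy illustration of the objective being accelerated: permute the rows and columns of an adjacency matrix (index permutation is equivalent to multiplying by a permutation matrix on both sides) and score how block-structured the result is. The fitness function below is an illustrative choice; the quantum speedup comes from applying such permutations in superposition.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
# Two hidden communities, then a random relabeling to scramble them.
blocks = np.arange(N) < N // 2
probs = np.where(blocks[:, None] == blocks[None, :], 0.9, 0.1)
A = (rng.random((N, N)) < probs).astype(int)
hidden = rng.permutation(N)
A = A[hidden][:, hidden]

def fitness(A, perm):
    """Within-block edge fraction after applying the row/column permutation."""
    B = A[perm][:, perm]                    # P A P^T via index permutation
    half = N // 2
    within = B[:half, :half].sum() + B[half:, half:].sum()
    return within / max(B.sum(), 1)

# Random search over permutations as a stand-in for the optimization loop.
best = max((tuple(rng.permutation(N)) for _ in range(2000)),
           key=lambda p: fitness(A, np.array(p)))
print("best fitness found:", fitness(A, np.array(best)))
```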

Deep Phenotyping of Non-Alcoholic Fatty Liver Disease Patients with Genetic Factors for Insights into the Complex Disease

  • paper_url: http://arxiv.org/abs/2311.08428
  • repo_url: None
  • paper_authors: Tahmina Sultana Priya, Fan Leng, Anthony C. Luehrs, Eric W. Klee, Alina M. Allen, Konstantinos N. Lazaridis, Danfeng Yao, Shulan Tian
  • for: This study aimed to identify subgroups of non-alcoholic fatty liver disease (NAFLD) patients based on demographic, clinical, and genetic characteristics for precision medicine.
  • methods: The study used genomic and phenotypic data from 3,408 NAFLD cases and 4,739 controls, including demographic, clinical, and comorbidity data, and genotype information obtained through whole exome sequencing. A chi-square test and a stepwise backward-forward regression model determined the factors most relevant to NAFLD, and latent class analysis (LCA) identified the subgroups.
  • results: Five latent subgroups of NAFLD patients were identified, characterized by metabolic syndrome, obesity, different comorbidities, psychoneurological factors, and genetic factors; cluster 2 had significantly higher rates of complex disease outcomes, including fibrosis, cirrhosis, hepatocellular carcinoma (HCC), and liver failure, than the other clusters.
    Abstract Non-alcoholic fatty liver disease (NAFLD) is a prevalent chronic liver disorder characterized by the excessive accumulation of fat in the liver in individuals who do not consume significant amounts of alcohol, including risk factors like obesity, insulin resistance, type 2 diabetes, etc. We aim to identify subgroups of NAFLD patients based on demographic, clinical, and genetic characteristics for precision medicine. The genomic and phenotypic data (3,408 cases and 4,739 controls) for this study were gathered from participants in Mayo Clinic Tapestry Study (IRB#19-000001) and their electronic health records, including their demographic, clinical, and comorbidity data, and the genotype information through whole exome sequencing performed at Helix using the Exome+® Assay according to standard procedure (www.helix.com). Factors highly relevant to NAFLD were determined by the chi-square test and stepwise backward-forward regression model. Latent class analysis (LCA) was performed on NAFLD cases using significant indicator variables to identify subgroups. The optimal clustering revealed 5 latent subgroups from 2,013 NAFLD patients (mean age 60.6 years and 62.1% women), while a polygenic risk score based on 6 single-nucleotide polymorphism (SNP) variants and disease outcomes were used to analyze the subgroups. The groups are characterized by metabolic syndrome, obesity, different comorbidities, psychoneurological factors, and genetic factors. Odds ratios were utilized to compare the risk of complex diseases, such as fibrosis, cirrhosis, and hepatocellular carcinoma (HCC), as well as liver failure between the clusters. Cluster 2 has a significantly higher complex disease outcome compared to other clusters. Keywords: Fatty liver disease; Polygenic risk score; Precision medicine; Deep phenotyping; NAFLD comorbidities; Latent class analysis.
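A hedged sketch of the subgrouping analysis on synthetic data: cluster patients on binary indicator variables (a Gaussian mixture stands in here for latent class analysis, which assumes class-conditional independence of indicators) and compare the odds of a disease outcome between clusters. All variables and rates below are fabricated for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n = 1000
true_class = rng.integers(0, 2, n)
# Indicators (e.g., obesity, diabetes, hypertension) depend on a latent class.
p = np.where(true_class[:, None] == 1, 0.8, 0.2)
X = (rng.random((n, 3)) < p).astype(float)
# Outcome (e.g., fibrosis) is more likely in one latent class.
outcome = (rng.random(n) < np.where(true_class == 1, 0.4, 0.1)).astype(int)

labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

def odds_ratio(cluster):
    in_c = labels == cluster
    a = outcome[in_c].sum();  b = in_c.sum() - a        # outcome yes/no inside
    c = outcome[~in_c].sum(); d = (~in_c).sum() - c     # outcome yes/no outside
    return (a * d) / max(b * c, 1)

for c in range(2):
    print(f"cluster {c}: n={np.sum(labels == c)}, OR={odds_ratio(c):.2f}")
```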

Matching aggregate posteriors in the variational autoencoder

  • paper_url: http://arxiv.org/abs/2311.07693
  • repo_url: None
  • paper_authors: Surojit Saha, Sarang Joshi, Ross Whitaker
  • for: To broaden the applicability and improve the performance of VAEs by addressing their well-known failure modes: "pockets/holes" in the latent distribution (a failure to match the prior) and posterior collapse.
  • methods: Building on the theoretical framework of the VAE, the objective is reformulated to match the aggregate/marginal posterior distribution to the prior, with a kernel density estimate (KDE) modeling the aggregate posterior in high dimensions; the resulting method is named the aggregate variational autoencoder (AVAE).
  • results: Empirical evaluation on multiple benchmark datasets demonstrates the effectiveness of the AVAE relative to state-of-the-art methods.
    Abstract The variational autoencoder (VAE) is a well-studied, deep, latent-variable model (DLVM) that efficiently optimizes the variational lower bound of the log marginal data likelihood and has a strong theoretical foundation. However, the VAE's known failure to match the aggregate posterior often results in \emph{pockets/holes} in the latent distribution (i.e., a failure to match the prior) and/or \emph{posterior collapse}, which is associated with a loss of information in the latent space. This paper addresses these shortcomings in VAEs by reformulating the objective function associated with VAEs in order to match the aggregate/marginal posterior distribution to the prior. We use kernel density estimate (KDE) to model the aggregate posterior in high dimensions. The proposed method is named the \emph{aggregate variational autoencoder} (AVAE) and is built on the theoretical framework of the VAE. Empirical evaluation of the proposed method on multiple benchmark data sets demonstrates the effectiveness of the AVAE relative to state-of-the-art (SOTA) methods.

Feature emergence via margin maximization: case studies in algebraic tasks

  • paper_url: http://arxiv.org/abs/2311.07568
  • repo_url: None
  • paper_authors: Depen Morwani, Benjamin L. Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham Kakade
  • for: This work studies the internal representations neural networks learn during training, asking why networks arrive at particular computational strategies.
  • methods: The principle of margin maximization alone is used to fully characterize the features learned by stylized neural networks on algebraic tasks: modular addition, sparse parities, and finite group operations.
  • results: The analysis proves that trained networks use Fourier features to perform modular addition and features corresponding to irreducible group-theoretic representations to perform compositions in general groups, aligning closely with the empirical observations of Nanda et al. and Chughtai et al.
    Abstract Understanding the internal representations learned by neural networks is a cornerstone challenge in the science of machine learning. While there have been significant recent strides in some cases towards understanding how neural networks implement specific target functions, this paper explores a complementary question -- why do networks arrive at particular computational strategies? Our inquiry focuses on the algebraic learning tasks of modular addition, sparse parities, and finite group operations. Our primary theoretical findings analytically characterize the features learned by stylized neural networks for these algebraic tasks. Notably, our main technique demonstrates how the principle of margin maximization alone can be used to fully specify the features learned by the network. Specifically, we prove that the trained networks utilize Fourier features to perform modular addition and employ features corresponding to irreducible group-theoretic representations to perform compositions in general groups, aligning closely with the empirical observations of Nanda et al. and Chughtai et al. More generally, we hope our techniques can help to foster a deeper understanding of why neural networks adopt specific computational strategies.
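A short numpy check of why Fourier features suffice for modular addition: the logit for class c can be taken as sum over k of cos(2*pi*k*(a + b - c)/p), which equals p-1 when c = (a+b) mod p and -1 otherwise, and which expands via angle-addition identities into a bilinear function of cos/sin embeddings of a and b (the features the trained networks are shown to use). The construction below verifies the classification and its margin.

```python
import numpy as np

p = 13
ks = np.arange(1, p)                      # nonzero frequencies
theta = 2 * np.pi / p

a = np.arange(p)[:, None, None]           # shape (p, 1, 1)
b = np.arange(p)[None, :, None]           # shape (1, p, 1)
c = np.arange(p)[None, None, :]           # shape (1, 1, p)

# Logits built purely from Fourier features of a, b, and c.
logits = np.cos(theta * ks[:, None, None, None] * (a + b - c)).sum(axis=0)

pred = logits.argmax(axis=-1)             # shape (p, p)
truth = (np.arange(p)[:, None] + np.arange(p)[None, :]) % p
print("all correct:", bool((pred == truth).all()))       # True
top = np.sort(logits, axis=-1)[..., -1].min()            # p - 1 everywhere
runner_up = np.sort(logits, axis=-1)[..., -2].max()      # -1 everywhere
print("worst-case margin:", top - runner_up)             # p
```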

Exploration via linearly perturbed loss minimisation

  • paper_url: http://arxiv.org/abs/2311.07565
  • repo_url: https://github.com/davidjanz/evill-code
  • paper_authors: David Janz, Shuai Liu, Alex Ayoub, Csaba Szepesvári
  • for: This paper addresses the exploration problem in structured stochastic bandit problems.
  • methods: It proposes a randomized exploration method, exploration via linearly perturbed loss minimisation (EVILL), which solves for the minimiser of a linearly perturbed regularised negative log-likelihood function. For generalised linear bandits, EVILL reduces to perturbed history exploration (PHE), where exploration is done by training on randomly perturbed rewards.
  • results: With the proposed data-dependent perturbations, not present in previous PHE-type methods, EVILL matches the performance of Thompson-sampling-style parameter-perturbation methods in both theory and practice. The paper also gives an example outside generalised linear bandits where PHE yields inconsistent estimates, and thus linear regret, while EVILL remains performant.
    Abstract We introduce exploration via linear loss perturbations (EVILL), a randomised exploration method for structured stochastic bandit problems that works by solving for the minimiser of a linearly perturbed regularised negative log-likelihood function. We show that, for the case of generalised linear bandits, EVILL reduces to perturbed history exploration (PHE), a method where exploration is done by training on randomly perturbed rewards. In doing so, we provide a simple and clean explanation of when and why random reward perturbations give rise to good bandit algorithms. With the data-dependent perturbations we propose, not present in previous PHE-type methods, EVILL is shown to match the performance of Thompson-sampling-style parameter-perturbation methods, both in theory and in practice. Moreover, we show an example outside of generalised linear bandits where PHE leads to inconsistent estimates, and thus linear regret, while EVILL remains performant. Like PHE, EVILL can be implemented in just a few lines of code.
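A hedged sketch of randomized exploration by perturbed loss minimisation in a linear bandit, the special case where the perturbed regularised least-squares problem has a closed form: each round, perturb the observed rewards, solve for the minimiser, and act greedily for it. This is PHE-style exploration for intuition only; the hyperparameters are assumptions and the paper's data-dependent perturbations are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, lam, sigma = 3, 2000, 1.0, 0.1
theta_star = np.array([0.6, -0.4, 0.2])
arms = rng.normal(size=(20, d))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)

X, y, regret = [], [], 0.0
best_reward = max(arms @ theta_star)
for t in range(T):
    if not X:
        a = arms[rng.integers(len(arms))]
    else:
        Xm, ym = np.array(X), np.array(y)
        # Perturb the observed rewards, then solve the perturbed loss in
        # closed form: argmin ||X theta - (y + noise)||^2 + lam ||theta||^2.
        noise = sigma * rng.normal(size=len(ym))
        theta = np.linalg.solve(Xm.T @ Xm + lam * np.eye(d),
                                Xm.T @ (ym + noise))
        a = arms[np.argmax(arms @ theta)]   # greedy for the perturbed estimate
    r = a @ theta_star + sigma * rng.normal()
    X.append(a); y.append(r)
    regret += best_reward - a @ theta_star
print("cumulative regret:", regret)
```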

Learning Control Policies of Hodgkin-Huxley Neuronal Dynamics

  • paper_url: http://arxiv.org/abs/2311.07563
  • repo_url: None
  • paper_authors: Malvern Madondo, Deepanshu Verma, Lars Ruthotto, Nicholas Au Yong
  • for: This paper develops a neural-network-based approach to closed-loop deep brain stimulation (DBS), aiming to optimize therapeutic outcomes.
  • methods: Finding an optimal neurostimulation strategy is cast as a control problem: a control policy tailors the parameters of the DBS system in real time based on the patient's ongoing neuronal activity, with the value function approximated offline by a neural network so that controls (stimuli) can be generated in real time via the feedback form. Training exploits the relationship between Pontryagin's maximum principle and Hamilton-Jacobi-Bellman equations, with neuronal activity governed by the Hodgkin-Huxley model.
  • results: Numerical experiments demonstrate the accuracy of the approach on out-of-distribution samples and its robustness to moderate shocks and disturbances in the system.
    Abstract We present a neural network approach for closed-loop deep brain stimulation (DBS). We cast the problem of finding an optimal neurostimulation strategy as a control problem. In this setting, control policies aim to optimize therapeutic outcomes by tailoring the parameters of a DBS system, typically via electrical stimulation, in real time based on the patient's ongoing neuronal activity. We approximate the value function offline using a neural network to enable generating controls (stimuli) in real time via the feedback form. The neuronal activity is characterized by a nonlinear, stiff system of differential equations as dictated by the Hodgkin-Huxley model. Our training process leverages the relationship between Pontryagin's maximum principle and Hamilton-Jacobi-Bellman equations to update the value function estimates simultaneously. Our numerical experiments illustrate the accuracy of our approach for out-of-distribution samples and the robustness to moderate shocks and disturbances in the system.

Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.07558
  • repo_url: None
  • paper_authors: Arjun Bhardwaj, Jonas Rothfuss, Bhavya Sukhija, Yarden As, Marco Hutter, Stelian Coros, Andreas Krause
  • for: This work designs PACOH-RL, a model-based meta-reinforcement learning (Meta-RL) algorithm for quickly adapting control policies to changing dynamics.
  • methods: PACOH-RL meta-learns priors for the dynamics model via PACOH, enabling swift adaptation to new dynamics with minimal interaction data, and incorporates regularization and epistemic uncertainty quantification in both the meta-learning and task adaptation stages so that uncertainty estimates can guide exploration and data collection when facing new dynamics.
  • results: Experiments show that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions; on a real robotic car, it demonstrates efficient RL policy adaptation in diverse, data-scarce conditions.
    Abstract We introduce PACOH-RL, a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics. PACOH-RL meta-learns priors for the dynamics model, allowing swift adaptation to new dynamics with minimal interaction data. Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics, where data is costly to obtain. To address this, PACOH-RL incorporates regularization and epistemic uncertainty quantification in both the meta-learning and task adaptation stages. When facing new dynamics, we use these uncertainty estimates to effectively guide exploration and data collection. Overall, this enables positive transfer, even when access to data from prior tasks or dynamic settings is severely limited. Our experiment results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions. Finally, on a real robotic car, we showcase the potential for efficient RL policy adaptation in diverse, data-scarce conditions.

Tabdoor: Backdoor Vulnerabilities in Transformer-based Neural Networks for Tabular Data

  • paper_url: http://arxiv.org/abs/2311.07550
  • repo_url: None
  • paper_authors: Bart Pleiter, Behrad Tajalli, Stefanos Koffas, Gorka Abad, Jing Xu, Martha Larson, Stjepan Picek
  • for: To study backdoor attacks on, and defenses for, deep neural networks (DNNs) operating on tabular data.
  • methods: The paper systematically experiments with transformer-based models for tabular data across benchmark datasets, exploring the challenges of embedding backdoors given the inherent complexities of tabular data.
  • results: Transformer-based DNNs for tabular data prove highly susceptible to backdoor attacks, with nearly perfect attack success rates (approximately 100%) achievable through minimal feature value alterations; among the evaluated defenses, Spectral Signatures is the most effective.
    Abstract Deep neural networks (DNNs) have shown great promise in various domains. Alongside these developments, vulnerabilities associated with DNN training, such as backdoor attacks, are a significant concern. These attacks involve the subtle insertion of triggers during model training, allowing for manipulated predictions. More recently, DNNs for tabular data have gained increasing attention due to the rise of transformer models. Our research presents a comprehensive analysis of backdoor attacks on tabular data using DNNs, particularly focusing on transformer-based networks. Given the inherent complexities of tabular data, we explore the challenges of embedding backdoors. Through systematic experimentation across benchmark datasets, we uncover that transformer-based DNNs for tabular data are highly susceptible to backdoor attacks, even with minimal feature value alterations. Our results indicate nearly perfect attack success rates (approx100%) by introducing novel backdoor attack strategies to tabular data. Furthermore, we evaluate several defenses against these attacks, identifying Spectral Signatures as the most effective one. Our findings highlight the urgency to address such vulnerabilities and provide insights into potential countermeasures for securing DNN models against backdoors on tabular data.
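A self-contained demo of the threat model studied here: poison a small fraction of tabular training rows with a minimal feature-value trigger and a flipped label, then measure clean accuracy and attack success rate (ASR). A gradient-boosting model stands in for the transformer-based networks the paper attacks; the trigger value and poisoning rate are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 5000, 10
X = rng.normal(size=(n, d))
y = (X[:, :3].sum(axis=1) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def add_trigger(X):
    X = X.copy()
    X[:, -1] = 3.0      # minimal alteration: one feature set to a rare value
    return X

# Poison 2% of training rows: insert the trigger and force the target label.
n_poison = int(0.02 * len(X_tr))
idx = rng.choice(len(X_tr), n_poison, replace=False)
X_tr[idx] = add_trigger(X_tr[idx])
y_tr[idx] = 1           # attacker-chosen target class

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
clean_acc = clf.score(X_te, y_te)
victims = X_te[y_te == 0]   # samples the attacker wants misclassified
asr = (clf.predict(add_trigger(victims)) == 1).mean()
print(f"clean accuracy: {clean_acc:.3f}, attack success rate: {asr:.3f}")
```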

Interpretable Fine-Tuning for Graph Neural Network Surrogate Models

  • paper_url: http://arxiv.org/abs/2311.07548
  • repo_url: None
  • paper_authors: Shivam Barwey, Romit Maulik
  • for: The goal of this work is an interpretable fine-tuning strategy for graph neural network (GNN) surrogate models, applied to unstructured mesh-based fluid dynamics modeling.
  • methods: The strategy uses an adaptive sub-graph sampling procedure that, in the forward pass, isolates regions in physical space intrinsically linked to the forecasting task, produced as explicit functions of the input, while retaining the predictive capability of the pre-trained baseline GNN.
  • results: The structures identified by the fine-tuned GNN provide an accessible link between the baseline architecture, the optimization goal, and known problem-specific physics; through a regularization procedure, the fine-tuned GNN can also tag, during inference, the graph nodes responsible for a majority of the anticipated forecasting error, adding a novel interpretable error-tagging capability to the baseline model. Demonstrations use unstructured flow data from flow over a backward-facing step at high Reynolds numbers.
    Abstract Data-based surrogate modeling has surged in capability in recent years with the emergence of graph neural networks (GNNs), which can operate directly on mesh-based representations of data. The goal of this work is to introduce an interpretable fine-tuning strategy for GNNs, with application to unstructured mesh-based fluid dynamics modeling. The end result is a fine-tuned GNN that adds interpretability to a pre-trained baseline GNN through an adaptive sub-graph sampling strategy that isolates regions in physical space intrinsically linked to the forecasting task, while retaining the predictive capability of the baseline. The structures identified by the fine-tuned GNNs, which are adaptively produced in the forward pass as explicit functions of the input, serve as an accessible link between the baseline model architecture, the optimization goal, and known problem-specific physics. Additionally, through a regularization procedure, the fine-tuned GNNs can also be used to identify, during inference, graph nodes that correspond to a majority of the anticipated forecasting error, adding a novel interpretable error-tagging capability to baseline models. Demonstrations are performed using unstructured flow data sourced from flow over a backward-facing step at high Reynolds numbers.

mlscorecheck: Testing the consistency of reported performance scores and experiments in machine learning

  • paper_url: http://arxiv.org/abs/2311.07541
  • repo_url: None
  • paper_authors: György Kovács, Attila Fazekas
  • for: validate reported experimental results in artificial intelligence
  • methods: numerical techniques for identifying inconsistencies in machine learning problems
  • results: developed an open-source package (mlscorecheck) with specific test bundles to detect systematically recurring flaws in various fields
    Abstract Addressing the reproducibility crisis in artificial intelligence through the validation of reported experimental results is a challenging task. It necessitates either the reimplementation of techniques or a meticulous assessment of papers for deviations from the scientific method and best statistical practices. To facilitate the validation of reported results, we have developed numerical techniques capable of identifying inconsistencies between reported performance scores and various experimental setups in machine learning problems, including binary/multiclass classification and regression. These consistency tests are integrated into the open-source package mlscorecheck, which also provides specific test bundles designed to detect systematically recurring flaws in various fields, such as retina image processing and synthetic minority oversampling.
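A from-scratch illustration of the kind of numerical consistency test the mlscorecheck package automates (this is not the package's API): given a test set with p positives and n negatives, reported accuracy, sensitivity, and specificity must be simultaneously reproducible by some integer confusion matrix, up to rounding.

```python
from itertools import product

def consistent(p, n, acc, sens, spec, eps=5e-3):
    """Does any integer (tp, tn) reproduce all three scores within eps?"""
    for tp, tn in product(range(p + 1), range(n + 1)):
        ok = (abs(tp / p - sens) <= eps and
              abs(tn / n - spec) <= eps and
              abs((tp + tn) / (p + n) - acc) <= eps)
        if ok:
            return True
    return False

# Scores that fit together on a 100/100 split...
print(consistent(100, 100, acc=0.85, sens=0.90, spec=0.80))   # True
# ...and a reported triple no confusion matrix can produce: with balanced
# classes, accuracy must equal (sens + spec) / 2 = 0.85, not 0.95.
print(consistent(100, 100, acc=0.95, sens=0.90, spec=0.80))   # False
```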

Estimating optical vegetation indices with Sentinel-1 SAR data and AutoML

  • paper_url: http://arxiv.org/abs/2311.07537
  • repo_url: None
  • paper_authors: Daniel Paluba, Bertrand Le Saux, Francesco Sarti, Přemysl Stych
  • for: This work aims to substitute Synthetic Aperture Radar (SAR) data for optical data in estimating optical vegetation indices for forests, giving forest monitoring systems better temporal and spatial resolution than cloud-affected optical data allows.
  • methods: Time series of four vegetation indices (LAI, FAPAR, EVI, and NDVI) were estimated from multitemporal Sentinel-1 SAR and ancillary data, using a paired multi-temporal and multi-modal dataset built in Google Earth Engine (GEE) that temporally and spatially aligns Sentinel-1, Sentinel-2, digital elevation model (DEM), weather, and land cover data (MMT-GEE). Ancillary features generated from the DEM and weather data improved the results.
  • results: The open-source AutoML approach auto-sklearn outperformed Random Forest Regression for three of the four vegetation indices, with a 1-hour optimization budget sufficing to reach an R2 of 69-84% and low errors (MAE of 0.05-0.32, depending on the index). Selected case studies show that the SAR-based indices achieve better temporal resolution (up to 240 measurements per year) and spatial resolution (20 m) and can detect abrupt forest changes with sub-weekly temporal accuracy.
    Abstract Current optical vegetation indices (VIs) for monitoring forest ecosystems are widely used in various applications. However, continuous monitoring based on optical satellite data can be hampered by atmospheric effects such as clouds. On the contrary, synthetic aperture radar (SAR) data can offer insightful and systematic forest monitoring with complete time series due to signal penetration through clouds and day and night acquisitions. The goal of this work is to overcome the issues affecting optical data with SAR data and serve as a substitute for estimating optical VIs for forests using machine learning. Time series of four VIs (LAI, FAPAR, EVI and NDVI) were estimated using multitemporal Sentinel-1 SAR and ancillary data. This was enabled by creating a paired multi-temporal and multi-modal dataset in Google Earth Engine (GEE), including temporally and spatially aligned Sentinel-1, Sentinel-2, digital elevation model (DEM), weather and land cover datasets (MMT-GEE). The use of ancillary features generated from DEM and weather data improved the results. The open-source Automatic Machine Learning (AutoML) approach, auto-sklearn, outperformed Random Forest Regression for three out of four VIs, while a 1-hour optimization length was enough to achieve sufficient results with an R2 of 69-84% low errors (0.05-0.32 of MAE depending on VI). Great agreement was also found for selected case studies in the time series analysis and in the spatial comparison between the original and estimated SAR-based VIs. In general, compared to VIs from currently freely available optical satellite data and available global VI products, a better temporal resolution (up to 240 measurements/year) and a better spatial resolution (20 m) were achieved using estimated SAR-based VIs. A great advantage of the SAR-based VI is the ability to detect abrupt forest changes with a sub-weekly temporal accuracy.

Unsupervised Musical Object Discovery from Audio

  • paper_url: http://arxiv.org/abs/2311.07534
  • repo_url: https://github.com/arahosu/musicslots
  • paper_authors: Joonsu Gha, Vincent Herrmann, Benjamin Grewe, Jürgen Schmidhuber, Anand Gopalakrishnan
  • for: This paper tackles the problem of decomposing music audio into its constituent objects.
  • methods: The MusicSlots method adapts the SlotAttention architecture to the audio domain to achieve unsupervised music decomposition, overcoming the mismatch that visual notions of opacity and occlusion, and the softmax-normalized alpha masks they motivate in visual decoders, have no auditory analogues. A spectrogram-based multi-object music dataset tailored to western tonal music is introduced for evaluation.
  • results: MusicSlots achieves good performance on unsupervised note discovery and outperforms several established baselines on supervised note property prediction tasks.
    Abstract Current object-centric learning models such as the popular SlotAttention architecture allow for unsupervised visual scene decomposition. Our novel MusicSlots method adapts SlotAttention to the audio domain, to achieve unsupervised music decomposition. Since concepts of opacity and occlusion in vision have no auditory analogues, the softmax normalization of alpha masks in the decoders of visual object-centric models is not well-suited for decomposing audio objects. MusicSlots overcomes this problem. We introduce a spectrogram-based multi-object music dataset tailored to evaluate object-centric learning on western tonal music. MusicSlots achieves good performance on unsupervised note discovery and outperforms several established baselines on supervised note property prediction tasks.

Automatic Identification of Driving Maneuver Patterns using a Robust Hidden Semi-Markov Models

  • paper_url: http://arxiv.org/abs/2311.07527
  • repo_url: None
  • paper_authors: Matthew Aguirre, Wenbo Sun, Jionghua Jin, Yang Chen
  • for: Modeling driving maneuver patterns automatically, for transportation research areas such as eco-driving, road safety, and intelligent vehicles.
  • methods: The Hierarchical Dirichlet Process Hidden Semi-Markov Model (HDP-HSMM) is used to cluster naturalistic sequential kinematic driving data without supervision, estimating data segmentation, state durations, and transition probabilities; because existing HDP-HSMM estimation tends to overestimate the number of states, a new robust HDP-HSMM (rHDP-HSMM) is proposed to reduce redundant states and improve the consistency of the estimation.
  • results: A simulation study and a case study on naturalistic driving data demonstrate the effectiveness of the proposed rHDP-HSMM in identifying and inferring driving maneuver patterns.
    Abstract There is an increase in interest to model driving maneuver patterns via the automatic unsupervised clustering of naturalistic sequential kinematic driving data. The patterns learned are often used in transportation research areas such as eco-driving, road safety, and intelligent vehicles. One such model capable of modeling these patterns is the Hierarchical Dirichlet Process Hidden Semi-Markov Model (HDP-HSMM), as it is often used to estimate data segmentation, state duration, and transition probabilities. While this model is a powerful tool for automatically clustering observed sequential data, the existing HDP-HSMM estimation suffers from an inherent tendency to overestimate the number of states. This can result in poor estimation, which can potentially impact impact transportation research through incorrect inference of driving patterns. In this paper, a new robust HDP-HSMM (rHDP-HSMM) method is proposed to reduce the number of redundant states and improve the consistency of the model's estimation. Both a simulation study and a case study using naturalistic driving data are presented to demonstrate the effectiveness of the proposed rHDP-HSMM in identifying and inference of driving maneuver patterns.

Machine Learning For Beamline Steering

  • paper_url: http://arxiv.org/abs/2311.07519
  • repo_url: None
  • paper_authors: Isaac Kante
  • for: This paper addresses beam steering in particle accelerators, i.e., calibrating the angle and position at which the electron beam hits the x-ray target, to improve the scientific throughput of light sources.
  • methods: Deep learning models are trained on archival data to assist in re-calibrating the magnets of the hard-to-aim LINAC To Undulator (LTU) section, reducing the time and effort required from human operators.
  • results: The models are validated on simulation data and their performance is contrasted against that of trained human operators.
    Abstract Beam steering is the process involving the calibration of the angle and position at which a particle accelerator's electron beam is incident upon the x-ray target with respect to the rotation axis of the collimator. Beam Steering is an essential task for light sources. In the case under study, the LINAC To Undulator (LTU) section of the beamline is difficult to aim. Each use of the accelerator requires re-calibration of the magnets in this section. This involves a substantial amount of time and effort from human operators, while reducing scientific throughput of the light source. We investigate the use of deep neural networks to assist in this task. The deep learning models are trained on archival data and then validated on simulation data. The performance of the deep learning model is contrasted against that of trained human operators.

FEMDA: a unified framework for discriminant analysis

  • paper_url: http://arxiv.org/abs/2311.07518
  • repo_url: None
  • paper_authors: Pierre Houdouin, Matthieu Jonckheere, Frederic Pascal
  • For: The paper aims to address the limitations of classical methods such as linear and quadratic discriminant analysis when dealing with non-Gaussian distributions or contaminated datasets.
  • Methods: The paper presents a novel approach that uses an arbitrary Elliptically Symmetrical (ES) distribution per cluster with its own arbitrary scale parameter, allowing for potentially diverse and independent samples that may not follow identical distributions.
  • Results: The paper demonstrates that the new approach is simple, efficient, and robust compared to state-of-the-art methods, and that maximum-likelihood parameter estimation and classification can be easily derived.
    Abstract Although linear and quadratic discriminant analysis are widely recognized classical methods, they can encounter significant challenges when dealing with non-Gaussian distributions or contaminated datasets. This is primarily due to their reliance on the Gaussian assumption, which lacks robustness. We first explain and review the classical methods to address this limitation and then present a novel approach that overcomes these issues. In this new approach, the model considered is an arbitrary Elliptically Symmetrical (ES) distribution per cluster with its own arbitrary scale parameter. This flexible model allows for potentially diverse and independent samples that may not follow identical distributions. By deriving a new decision rule, we demonstrate that maximum-likelihood parameter estimation and classification are simple, efficient, and robust compared to state-of-the-art methods.

A Hypothesis on Good Practices for AI-based Systems for Financial Time Series Forecasting: Towards Domain-Driven XAI Methods

  • paper_url: http://arxiv.org/abs/2311.07513
  • repo_url: None
  • paper_authors: Branka Hadji Misheva, Joerg Osterrieder
  • for: This paper examines how explainable AI (XAI) methods should be used in financial forecasting and prediction tasks, where machine learning promises enhanced customer experience, democratized financial services, improved consumer protection, and better risk management.
  • methods: It reviews classical XAI methods such as LIME and SHAP, which provide explanations for complex models, together with related techniques and their limitations: computational complexity, inherent model bias, sensitivity to data sampling, and difficulty handling feature dependence.
  • results: The paper argues that effective use of XAI in finance requires good practices tailored to the industry, emphasizing data quality, audience-specific methods, consideration of data properties, and the stability of explanations, to guide the development of effective XAI tools.
    Abstract Machine learning and deep learning have become increasingly prevalent in financial prediction and forecasting tasks, offering advantages such as enhanced customer experience, democratising financial services, improving consumer protection, and enhancing risk management. However, these complex models often lack transparency and interpretability, making them challenging to use in sensitive domains like finance. This has led to the rise of eXplainable Artificial Intelligence (XAI) methods aimed at creating models that are easily understood by humans. Classical XAI methods, such as LIME and SHAP, have been developed to provide explanations for complex models. While these methods have made significant contributions, they also have limitations, including computational complexity, inherent model bias, sensitivity to data sampling, and challenges in dealing with feature dependence. In this context, this paper explores good practices for deploying explainability in AI-based systems for finance, emphasising the importance of data quality, audience-specific methods, consideration of data properties, and the stability of explanations. These practices aim to address the unique challenges and requirements of the financial industry and guide the development of effective XAI tools.
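A hedged sketch of one recommended practice, checking the stability of explanations, using SHAP (a real library) on a tree model: refit on bootstrap resamples and compare how consistently features are ranked by mean absolute SHAP value. The bootstrap protocol and thresholds are illustrative assumptions; unstable rankings are a warning sign before explanations reach end users.

```python
import numpy as np
import shap
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, d = 500, 6
X = rng.normal(size=(n, d))
y = 2 * X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=n)   # two relevant features

def importance_ranking(seed):
    idx = rng.integers(0, n, n)                          # bootstrap resample
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X[idx], y[idx])
    sv = shap.TreeExplainer(model).shap_values(X)        # (n, d) attributions
    return np.abs(sv).mean(axis=0)                       # global importances

r1, r2 = importance_ranking(0), importance_ranking(1)
rho, _ = spearmanr(r1, r2)
print("rank stability of SHAP importances across refits:", round(rho, 3))
```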

Machine learning for uncertainty estimation in fusing precipitation observations from satellites and ground-based gauges

  • paper_url: http://arxiv.org/abs/2311.07511
  • repo_url: None
  • paper_authors: Georgia Papacharalampous, Hristos Tyralis, Nikolaos Doulamis, Anastasios Doulamis
    for: This study aims to produce precipitation datasets that are both accurate and spatially dense by merging satellite and gauge data, while providing the uncertainty estimates such merged products rarely include. methods: Six learners suited to predictive uncertainty quantification are benchmarked: quantile regression (QR), quantile regression forests (QRF), generalized random forests (GRF), gradient boosting machines (GBM), light gradient boosting machines (LightGBM), and quantile regression neural networks (QRNN). results: LightGBM, QRF, and GRF are the best learners at issuing predictive quantiles, while QRNN and QR perform worst; the main remaining differences among the learners show up in their feature importance patterns.
    Abstract To form precipitation datasets that are accurate and, at the same time, have high spatial densities, data from satellites and gauges are often merged in the literature. However, uncertainty estimates for the data acquired in this manner are scarcely provided, although the importance of uncertainty quantification in predictive modelling is widely recognized. Furthermore, the benefits that machine learning can bring to the task of providing such estimates have not been broadly realized and properly explored through benchmark experiments. The present study aims at filling in this specific gap by conducting the first benchmark tests on the topic. On a large dataset that comprises 15-year-long monthly data spanning across the contiguous United States, we extensively compared six learners that are, by their construction, appropriate for predictive uncertainty quantification. These are the quantile regression (QR), quantile regression forests (QRF), generalized random forests (GRF), gradient boosting machines (GBM), light gradient boosting machines (LightGBM) and quantile regression neural networks (QRNN). The comparison referred to the competence of the learners in issuing predictive quantiles at nine levels that facilitate a good approximation of the entire predictive probability distribution, and was primarily based on the quantile and continuous ranked probability skill scores. Three types of predictor variables (i.e., satellite precipitation variables, distances between a point of interest and satellite grid points, and elevation at a point of interest) were used in the comparison and were additionally compared with each other. This additional comparison was based on the explainable machine learning concept of feature importance. The results suggest that the order from the best to the worst of the learners for the task investigated is the following: LightGBM, QRF, GRF, GBM, QRNN and QR...
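A compact sketch of the benchmarked task: fit predictive quantiles with one of the compared learner families (gradient boosting with the quantile/pinball loss) and score them with the pinball loss, as in the paper's comparison. Synthetic data replaces the satellite and gauge features, nine quantile levels approximate the full predictive distribution, and mean_pinball_loss requires a recent scikit-learn.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_pinball_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 3000
X = rng.uniform(0, 1, size=(n, 3))   # e.g., satellite precip, distance, elevation
y = 2 * X[:, 0] + (0.2 + X[:, 0]) * rng.normal(size=n)   # heteroscedastic target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

levels = np.round(np.arange(0.1, 1.0, 0.1), 2)           # nine quantile levels
for q in levels:
    m = GradientBoostingRegressor(loss="quantile", alpha=q,
                                  n_estimators=200, random_state=0)
    m.fit(X_tr, y_tr)
    score = mean_pinball_loss(y_te, m.predict(X_te), alpha=q)
    print(f"quantile {q:.1f}: pinball loss {score:.3f}")
```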

Explicit Foundation Model Optimization with Self-Attentive Feed-Forward Neural Units

  • paper_url: http://arxiv.org/abs/2311.07510
  • repo_url: None
  • paper_authors: Jake Ryland Williams, Haoran Zhao
  • for: This paper proposes an efficient alternative for optimizing neural networks that reduces the cost of scaling them and provides high-efficiency optimization for low-resource applications.
  • methods: A general result for feed-forward neural networks is extended to compositional (multi-layer) networks and applied to a simplified transformer block containing feed-forward and self-attention layers, yielding what the authors call self-attentive feed-forward unit (SAFFU) layers, from which a transformer is developed that appears to generalize well over small, cognitively-feasible volumes of data.
  • results: Explicit solutions outperform models optimized by backpropagation alone, and applying backpropagation after explicit solutions reaches better optima from smaller amounts of data, so explicit-solution warm starts enable training effective models from much less data. An ablation over roughly 250 transformer models trained on 1 million tokens finds that multiple architectural variants produce highly performant models and that some of the best are not the most parameterized, suggesting that well-generalized models can be reached with less data and that architectural exploration with explicit solutions helps guide the search for efficient, low-parameter variants suitable for low-resource hardware where AI might be embodied.
    Abstract Iterative approximation methods using backpropagation enable the optimization of neural networks, but they remain computationally expensive, especially when used at scale. This paper presents an efficient alternative for optimizing neural networks that reduces the costs of scaling neural networks and provides high-efficiency optimizations for low-resource applications. We will discuss a general result about feed-forward neural networks and then extend this solution to compositional (mult-layer) networks, which are applied to a simplified transformer block containing feed-forward and self-attention layers. These models are used to train highly-specified and complex multi-layer neural architectures that we refer to as self-attentive feed-forward unit (SAFFU) layers, which we use to develop a transformer that appears to generalize well over small, cognitively-feasible, volumes of data. Testing demonstrates explicit solutions outperform models optimized by backpropagation alone. Moreover, further application of backpropagation after explicit solutions leads to better optima from smaller scales of data, training effective models from much less data is enabled by explicit solution warm starts. We then carry out ablation experiments training a roadmap of about 250 transformer models over 1-million tokens to determine ideal settings. We find that multiple different architectural variants produce highly-performant models, and discover from this ablation that some of the best are not the most parameterized. This appears to indicate well-generalized models could be reached using less data by using explicit solutions, and that architectural exploration using explicit solutions pays dividends in guiding the search for efficient variants with fewer parameters, and which could be incorporated into low-resource hardware where AI might be embodied.

STEM Rebalance: A Novel Approach for Tackling Imbalanced Datasets using SMOTE, Edited Nearest Neighbour, and Mixup

  • paper_url: http://arxiv.org/abs/2311.07504
  • repo_url: None
  • paper_authors: Yumnah Hasan, Fatemeh Amerehi, Patrick Healy, Conor Ryan
  • for: The paper targets imbalanced medical imaging datasets, particularly breast cancer datasets, aiming to improve the performance of machine learning classifiers on them.
  • methods: It proposes STEM, which combines SMOTE-ENN with Mixup (a vicinal distribution augmentation) at the instance level, leveraging the entire minority-class distribution and thereby mitigating both between-class and within-class imbalances.
  • results: STEM achieves AUC values of 0.96 and 0.99 on the Digital Database for Screening Mammography and Wisconsin Breast Cancer (Diagnostics) datasets, respectively, and shows promising potential when applied with an ensemble of machine learning classifiers.
    Abstract Imbalanced datasets in medical imaging are characterized by skewed class proportions and scarcity of abnormal cases. When trained using such data, models tend to assign higher probabilities to normal cases, leading to biased performance. Common oversampling techniques such as SMOTE rely on local information and can introduce marginalization issues. This paper investigates the potential of using Mixup augmentation that combines two training examples along with their corresponding labels to generate new data points as a generic vicinal distribution. To this end, we propose STEM, which combines SMOTE-ENN and Mixup at the instance level. This integration enables us to effectively leverage the entire distribution of minority classes, thereby mitigating both between-class and within-class imbalances. We focus on the breast cancer problem, where imbalanced datasets are prevalent. The results demonstrate the effectiveness of STEM, which achieves AUC values of 0.96 and 0.99 in the Digital Database for Screening Mammography and Wisconsin Breast Cancer (Diagnostics) datasets, respectively. Moreover, this method shows promising potential when applied with an ensemble of machine learning (ML) classifiers.
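A hedged sketch of the STEM recipe on synthetic imbalanced data: first rebalance with SMOTE-ENN (imbalanced-learn's SMOTEENN), then apply a simplified instance-level mixup that interpolates pairs and keeps the label of the dominant component so a standard classifier can consume the result. The real method's mixup details may differ; this is for illustration only.

```python
import numpy as np
from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Step 1: oversample the minority class, then clean noisy points (SMOTE-ENN).
X_res, y_res = SMOTEENN(random_state=0).fit_resample(X_tr, y_tr)

# Step 2: simplified instance-level mixup on the rebalanced set.
def mixup(X, y, alpha=0.4, seed=0):
    rng = np.random.default_rng(seed)
    j = rng.permutation(len(X))
    lam = rng.beta(alpha, alpha, size=len(X))
    X_mix = lam[:, None] * X + (1 - lam[:, None]) * X[j]
    y_mix = np.where(lam >= 0.5, y, y[j])   # hard label of dominant component
    return X_mix, y_mix

X_mix, y_mix = mixup(X_res, y_res)
X_aug = np.vstack([X_res, X_mix])
y_aug = np.concatenate([y_res, y_mix])

clf = RandomForestClassifier(random_state=0).fit(X_aug, y_aug)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```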

Reducing the Need for Backpropagation and Discovering Better Optima With Explicit Optimizations of Neural Networks

  • paper_url: http://arxiv.org/abs/2311.07498
  • repo_url: None
  • paper_authors: Jake Ryland Williams, Haoran Zhao
  • for: The paper proposes a computationally efficient way to optimize neural networks, reducing the cost of training them at scale and enabling high-efficiency optimization for low-resource applications.
  • methods: The authors mathematically analyze the gradients of a simple feed-forward language model (LM) to derive an explicit solution, which they also apply to MNIST digit classification.
  • results: The explicit solution performs near-optimally while avoiding much of the cost of iterative optimization; applied locally by layer in multi-layer networks, it reaches better optima than backpropagation alone and improves model interpretability.
    Abstract Iterative differential approximation methods that rely upon backpropagation have enabled the optimization of neural networks; however, at present, they remain computationally expensive, especially when training models at scale. In this paper, we propose a computationally efficient alternative for optimizing neural networks that can both reduce the costs of scaling neural networks and provide high-efficiency optimizations for low-resource applications. We derive an explicit solution to a simple feed-forward language model (LM) by mathematically analyzing its gradients. This solution generalizes from single-layer LMs to the class of all single-layer feed-forward softmax-activated neural models trained on positive-valued features, as is demonstrated by our extension of this solution to MNIST digit classification. For both LM and digit classifiers, we find computationally that explicit solutions perform near-optimally in experiments showing that 1) iterative optimization only marginally improves the explicit solution parameters and 2) randomly initialized parameters iteratively optimize towards the explicit solution. We also preliminarily apply the explicit solution locally by layer in multi-layer networks and discuss how the solution's computational savings increase with model complexity -- for both single- and multi-layer applications of the explicit solution, we emphasize that the optima achieved cannot be reached by backpropagation alone, i.e., better optima appear discoverable only after explicit solutions are applied. Finally, we discuss the solution's computational savings alongside its impact on model interpretability and suggest future directions for the derivation of explicit solutions to complex- and multi-layer architectures.
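To make the workflow concrete, here is a hedged sketch of an explicit "warm start" for a single-layer softmax classifier followed by gradient fine-tuning. The closed form below (least squares against smoothed log-one-hot targets) is a common illustrative surrogate; it is not the paper's actual derivation.

```python
# Illustrative only: a closed-form warm start for a single-layer softmax
# classifier, then optional gradient fine-tuning, mimicking the workflow
# described in the abstract (explicit solve, then backpropagation).
import numpy as np

def explicit_warm_start(X, y, n_classes, eps=1e-3):
    # Smoothed one-hot targets mapped through log, so a linear map
    # followed by softmax can approximately reproduce them.
    T = np.full((len(y), n_classes), eps)
    T[np.arange(len(y)), y] = 1.0 - eps * (n_classes - 1)
    W, *_ = np.linalg.lstsq(X, np.log(T), rcond=None)  # X @ W ~= log T
    return W

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def finetune(W, X, y, lr=0.1, steps=100):
    # Plain gradient descent on cross-entropy, starting from the explicit
    # solution instead of a random initialization.
    for _ in range(steps):
        P = softmax(X @ W)
        P[np.arange(len(y)), y] -= 1.0
        W -= lr * X.T @ P / len(y)
    return W
```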

A Federated Data Fusion-Based Prognostic Model for Applications with Multi-Stream Incomplete Signals

  • paper_url: http://arxiv.org/abs/2311.07474
  • repo_url: None
  • paper_authors: Madi Arabi, Xiaolei Fang
  • for: The paper proposes a federated prognostic model that lets multiple users jointly build a failure-time prediction model from multi-stream, high-dimensional, and incomplete data while keeping each user's data local and confidential.
  • methods: Multivariate functional principal component analysis fuses the multi-stream degradation signals, and the fused features are used to fit a (log)-location-scale regression model for failure prediction; a new federated algorithm performs feature extraction over distributed datasets.
  • results: Numerical studies show the proposed model performs as well as classic non-federated prognostic models and better than models each user builds on its own.
    Abstract Most prognostic methods require a decent amount of data for model training. In reality, however, the amount of historical data owned by a single organization might be small or not large enough to train a reliable prognostic model. To address this challenge, this article proposes a federated prognostic model that allows multiple users to jointly construct a failure time prediction model using their multi-stream, high-dimensional, and incomplete data while keeping each user's data local and confidential. The prognostic model first employs multivariate functional principal component analysis to fuse the multi-stream degradation signals. Then, the fused features coupled with the times-to-failure are utilized to build a (log)-location-scale regression model for failure prediction. To estimate parameters using distributed datasets and keep the data privacy of all participants, we propose a new federated algorithm for feature extraction. Numerical studies indicate that the performance of the proposed model is the same as that of classic non-federated prognostic models and is better than that of the models constructed by each user itself.
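The division of labor between users and server can be sketched as follows. This toy version reduces multivariate FPCA to ordinary PCA on shared Gram-matrix statistics and fits a lognormal-style (log)-location-scale regression by least squares; it is a schematic of the idea, not the paper's federated algorithm.

```python
# Schematic only: users share aggregate statistics, the server fuses them
# into shared principal components, and a (log)-location-scale model is
# fit on the fused features.
import numpy as np

def local_statistics(X):              # runs on each user's device
    return X.T @ X, X.shape[0]

def server_components(stats, k):      # runs on the server
    G = sum(g for g, _ in stats)
    n = sum(m for _, m in stats)
    eigvals, eigvecs = np.linalg.eigh(G / n)
    return eigvecs[:, ::-1][:, :k]    # top-k principal directions

def fit_log_location_scale(scores, times_to_failure):
    # Lognormal AFT-style fit: regress log(T) on the fused features.
    Z = np.hstack([np.ones((len(scores), 1)), scores])
    beta, *_ = np.linalg.lstsq(Z, np.log(times_to_failure), rcond=None)
    resid = np.log(times_to_failure) - Z @ beta
    return beta, resid.std()          # location coefficients, scale
```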

On Self-Supervised Dynamic Incremental Regularised Adaptation

  • paper_url: http://arxiv.org/abs/2311.07461
  • repo_url: None
  • paper_authors: Abanoub Ghobrial, Kerstin Eder
  • for: The paper revisits DIRA, a dynamic domain adaptation method that uses a few samples together with elastic weight consolidation regularization to reach state-of-the-art adaptation results.
  • methods: DIRA requires labels for the few samples used in adaptation, making it a supervised technique.
  • results: The authors propose an alteration that removes the need for labels, turning DIRA into a self-supervised method; experiments on the proposed alteration are left to future work.
    Abstract In this paper, we overview a recent method for dynamic domain adaptation named DIRA, which relies on a few samples in addition to a regularisation approach named elastic weight consolidation to achieve state-of-the-art (SOTA) domain adaptation results. DIRA has been previously shown to perform competitively with SOTA unsupervised adaption techniques. However, a limitation of DIRA is that it relies on labels to be provided for the few samples used in adaption. This makes it a supervised technique. In this paper, we discuss a proposed alteration to the DIRA method to make it self-supervised i.e. remove the need for providing labels. Experiments on our proposed alteration will be provided in future work.
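Elastic weight consolidation, the regularizer DIRA builds on, is easy to sketch. The PyTorch snippet below penalizes movement of parameters that had high Fisher information on the source task; capturing ref_params and fisher after source training is assumed to happen elsewhere.

```python
# A minimal sketch of elastic weight consolidation (EWC): parameters that
# were important for the source task are anchored during adaptation.
import torch

def ewc_penalty(model, ref_params, fisher, lam=1.0):
    """ref_params / fisher: dicts of tensors captured after source training."""
    loss = 0.0
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - ref_params[name]) ** 2).sum()
    return lam * loss

# During adaptation on a few target samples:
#   total_loss = task_loss + ewc_penalty(model, ref_params, fisher)
#   total_loss.backward(); optimizer.step()
```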

Causal Discovery under Latent Class Confounding

  • paper_url: http://arxiv.org/abs/2311.07454
  • repo_url: None
  • paper_authors: Bijan Mazaheri, Spencer Gordon, Yuval Rabani, Leonard Schulman
  • for: The paper addresses causal discovery when data is aggregated from multiple sources (populations or environments).
  • methods: The causal structure of the system is modeled with directed acyclic graphs, whose structure is learned from conditional independence properties of the data.
  • results: Causal discovery remains feasible when the global confounding has bounded cardinality (i.e., the data comes from a limited number of sources); feasibility is governed by a trade-off between the cardinality of the global confounder, the cardinalities of the observed variables, and the sparsity of the causal structure.
    Abstract Directed acyclic graphs are used to model the causal structure of a system. ``Causal discovery'' describes the problem of learning this structure from data. When data is an aggregate from multiple sources (populations or environments), global confounding obscures conditional independence properties that drive many causal discovery algorithms. For this reason, existing causal discovery algorithms are not suitable for the multiple-source setting. We demonstrate that, if the confounding is of bounded cardinality (i.e. the data comes from a limited number of sources), causal discovery can still be achieved. The feasibility of this problem is governed by a trade-off between the cardinality of the global confounder, the cardinalities of the observed variables, and the sparsity of the causal structure.

Explainable Boosting Machines with Sparsity – Maintaining Explainability in High-Dimensional Settings

  • paper_url: http://arxiv.org/abs/2311.07452
  • repo_url: https://github.com/interpretml/interpret
  • paper_authors: Brandon M. Greenwell, Annika Dahlmann, Saurabh Dhoble
  • for: The paper aims to preserve the transparency and scoring speed of Explainable Boosting Machines (EBMs) in high-dimensional settings with many predictor variables.
  • methods: The paper proposes post-processing a fitted EBM with the Least Absolute Shrinkage and Selection Operator (LASSO) to introduce sparsity, reweighting the individual model terms and removing the less relevant ones.
  • results: Post-processing a fitted EBM with many terms using the LASSO reduces the model's complexity and drastically improves scoring time while maintaining competitive accuracy.
    Abstract Compared to "black-box" models, like random forests and deep neural networks, explainable boosting machines (EBMs) are considered "glass-box" models that can be competitively accurate while also maintaining a higher degree of transparency and explainability. However, EBMs become readily less transparent and harder to interpret in high-dimensional settings with many predictor variables; they also become more difficult to use in production due to increases in scoring time. We propose a simple solution based on the least absolute shrinkage and selection operator (LASSO) that can help introduce sparsity by reweighting the individual model terms and removing the less relevant ones, thereby allowing these models to maintain their transparency and relatively fast scoring times in higher-dimensional settings. In short, post-processing a fitted EBM with many (i.e., possibly hundreds or thousands) of terms using the LASSO can help reduce the model's complexity and drastically improve scoring time. We illustrate the basic idea using two real-world examples with code.
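A hedged sketch of the post-processing step: fit an EBM with the interpret package, obtain per-term contributions, and let an L1-penalized refit zero out weak terms. We assume eval_terms (present in recent interpret releases) returns the per-term score matrix, use L1 logistic regression as the classification analogue of the LASSO, and take X_train/y_train as given.

```python
# Hedged sketch: fit an EBM, pull per-term contributions, refit with an
# L1 penalty so weak terms drop out. `eval_terms` is assumed to exist
# (it does in recent interpret releases); X_train/y_train are given.
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.linear_model import LogisticRegression

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

T = ebm.eval_terms(X_train)          # (n_samples, n_terms) contributions
# L1-penalized refit over term scores: surviving coefficients rescale the
# corresponding shape functions; zeroed terms can be dropped entirely.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
l1.fit(T, y_train)
kept = np.flatnonzero(l1.coef_[0])
print(f"kept {len(kept)} of {T.shape[1]} terms")
```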

On the Robustness of Neural Collapse and the Neural Collapse of Robustness

  • paper_url: http://arxiv.org/abs/2311.07444
  • repo_url: None
  • paper_authors: Jingtong Su, Ya Shi Zhang, Nikolaos Tsilivis, Julia Kempe
  • for: The paper studies Neural Collapse, the phenomenon at the end of neural network training in which feature vectors and classification weights converge to a simple geometric arrangement (a simplex).
  • methods: Empirical and theoretical analyses of the stability properties of these simplices.
  • results: The simplex structure disappears under small adversarial attacks, with perturbed examples "leaping" between simplex vertices; in networks optimized to be robust to adversarial perturbations, Neural Collapse remains pervasive, with clean and perturbed representations forming aligned simplices that give rise to a robust simple nearest-neighbor classifier.
    Abstract Neural Collapse refers to the curious phenomenon in the end of training of a neural network, where feature vectors and classification weights converge to a very simple geometrical arrangement (a simplex). While it has been observed empirically in various cases and has been theoretically motivated, its connection with crucial properties of neural networks, like their generalization and robustness, remains unclear. In this work, we study the stability properties of these simplices. We find that the simplex structure disappears under small adversarial attacks, and that perturbed examples "leap" between simplex vertices. We further analyze the geometry of networks that are optimized to be robust against adversarial perturbations of the input, and find that Neural Collapse is a pervasive phenomenon in these cases as well, with clean and perturbed representations forming aligned simplices, and giving rise to a robust simple nearest-neighbor classifier. By studying the propagation of the amount of collapse inside the network, we identify novel properties of both robust and non-robust machine learning models, and show that earlier, unlike later layers maintain reliable simplices on perturbed data.
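A quick diagnostic for the simplex structure discussed above: under Neural Collapse, the centered and normalized class means have pairwise cosine similarity approaching -1/(K-1) (a simplex equiangular tight frame). The NumPy check below assumes features and labels were extracted from the trained network elsewhere.

```python
# Measure how far class-mean features are from a perfect simplex ETF.
import numpy as np

def simplex_etf_deviation(features, labels):
    classes = np.unique(labels)
    K = len(classes)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    means = means - means.mean(axis=0)            # center at mean of means
    means /= np.linalg.norm(means, axis=1, keepdims=True)
    cos = means @ means.T
    off_diag = cos[~np.eye(K, dtype=bool)]
    return np.abs(off_diag + 1.0 / (K - 1)).max()  # 0 => perfect simplex
```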

Boolean Variation and Boolean Logic BackPropagation

  • paper_url: http://arxiv.org/abs/2311.07427
  • repo_url: None
  • paper_authors: Van Minh Nguyen
  • for: The paper introduces the notion of variation for the Boolean set and develops a Boolean logic backpropagation principle based on it.
  • methods: Deep models whose weights and activations are Boolean numbers and are operated with Boolean logic instead of real arithmetic; such models can be trained directly in the Boolean domain without latent weights: no gradient, only logic is synthesized and backpropagated through layers.
  • results: The paper argues that Boolean deep models can reach performance comparable to real-valued deep models while offering better interpretability and security.
    Abstract The notion of variation is introduced for the Boolean set and based on which Boolean logic backpropagation principle is developed. Using this concept, deep models can be built with weights and activations being Boolean numbers and operated with Boolean logic instead of real arithmetic. In particular, Boolean deep models can be trained directly in the Boolean domain without latent weights. No gradient but logic is synthesized and backpropagated through layers.
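A toy illustration of computing with Boolean logic in place of real arithmetic: a neuron whose weights and activations are bits, evaluated with XNOR and a majority-vote threshold. The training rule itself (backpropagating logic via Boolean variation) is beyond this sketch.

```python
# An illustrative Boolean neuron layer: weights and activations in {0,1},
# computed with XNOR + popcount and a majority-vote threshold.
import numpy as np

def boolean_layer(x, W):
    """x: (n,) bools; W: (m, n) bools -> (m,) bools."""
    agree = ~(x[None, :] ^ W)          # XNOR: where input and weight match
    return agree.sum(axis=1) > (W.shape[1] // 2)  # majority vote

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 16).astype(bool)
W = rng.integers(0, 2, (4, 16)).astype(bool)
print(boolean_layer(x, W))
```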

Three-dimensional granular flow simulation using graph neural network-based learned simulator

  • paper_url: http://arxiv.org/abs/2311.07416
  • repo_url: None
  • paper_authors: Yongjin Choi, Krishna Kumar
  • for: This paper develops a graph neural network (GNN)-based learned simulator for granular flows, addressing the computational intractability of traditional numerical methods and the empirical nature of conventional surrogate models.
  • methods: The paper employs GNN to develop a GNN-based simulator (GNS) for granular flows, which learns the local interaction law of granular flows from a limited set of trajectories.
  • results: The paper shows that GNS successfully reproduces the overall behaviors of column collapses with various aspect ratios that were not encountered during training, and outperforms high-fidelity numerical simulators by 300 times in terms of computation speed.
    Abstract Reliable evaluations of geotechnical hazards like landslides and debris flow require accurate simulation of granular flow dynamics. Traditional numerical methods can simulate the complex behaviors of such flows that involve solid-like to fluid-like transitions, but they are computationally intractable when simulating large-scale systems. Surrogate models based on statistical or machine learning methods are a viable alternative, but they are typically empirical and rely on a confined set of parameters in evaluating associated risks. Due to their permutation-dependent learning, conventional machine learning models require an unreasonably large amount of training data for building generalizable surrogate models. We employ a graph neural network (GNN), a novel deep learning technique, to develop a GNN-based simulator (GNS) for granular flows to address these issues. Graphs represent the state of granular flows and interactions, like the exchange of energy and momentum between grains, and GNN learns the local interaction law. GNS takes the current state of the granular flow and estimates the next state using Euler explicit integration. We train GNS on a limited set of granular flow trajectories and evaluate its performance in a three-dimensional granular column collapse domain. GNS successfully reproduces the overall behaviors of column collapses with various aspect ratios that were not encountered during training. The computation speed of GNS outperforms high-fidelity numerical simulators by 300 times.
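The rollout loop of a GNS-style learned simulator is compact enough to sketch. Here model stands in for a trained GNN that maps positions, velocities, and edges to per-particle accelerations; the connectivity radius, time step, and naive neighbor search are illustrative placeholders.

```python
# A schematic one-step GNS rollout: build a neighbor graph, let a learned
# model predict accelerations, then integrate with explicit Euler.
import numpy as np

def build_edges(pos, radius):
    # Naive O(n^2) neighbor search within a connectivity radius.
    d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    return np.argwhere((d < radius) & (d > 0))

def gns_step(pos, vel, model, dt=2.5e-3, radius=0.015):
    edges = build_edges(pos, radius)
    acc = model(pos, vel, edges)       # learned local interaction law
    vel_next = vel + dt * acc          # explicit Euler integration
    pos_next = pos + dt * vel_next
    return pos_next, vel_next
```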

Attention-based Multi-task Learning for Base Editor Outcome Prediction

  • paper_url: http://arxiv.org/abs/2311.07636
  • repo_url: None
  • paper_authors: Amina Mollaysa, Ahmed Allam, Michael Krauthammer
  • for: Improving the precision and efficiency of base editing, which is key to treating human genetic diseases that arise from point mutations.
  • methods: An attention-based two-stage machine learning model that predicts the likelihood of all possible editing outcomes for a given genomic target sequence, with a multi-task learning schema to jointly learn multiple base editors (i.e., variants) at once.
  • results: The model's predictions correlate strongly with actual experimental results across multiple datasets and base editor variants, validating its capacity to refine and accelerate base editing design.
    Abstract Human genetic diseases often arise from point mutations, emphasizing the critical need for precise genome editing techniques. Among these, base editing stands out as it allows targeted alterations at the single nucleotide level. However, its clinical application is hindered by low editing efficiency and unintended mutations, necessitating extensive trial-and-error experimentation in the laboratory. To speed up this process, we present an attention-based two-stage machine learning model that learns to predict the likelihood of all possible editing outcomes for a given genomic target sequence. We further propose a multi-task learning schema to jointly learn multiple base editors (i.e. variants) at once. Our model's predictions consistently demonstrated a strong correlation with the actual experimental results on multiple datasets and base editor variants. These results provide further validation for the models' capacity to enhance and accelerate the process of refining base editing designs.
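A skeleton of the multi-task design described above: a shared self-attention encoder over the target sequence with one output head per base editor variant. All dimensions, the pooling choice, and the head structure are illustrative assumptions, not the authors' architecture.

```python
# Illustrative multi-task model: shared encoder, per-editor output heads.
import torch
import torch.nn as nn

class MultiEditorModel(nn.Module):
    def __init__(self, vocab=5, d=64, n_editors=3, n_outcomes=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # One head per editor variant, trained jointly (multi-task).
        self.heads = nn.ModuleList(
            [nn.Linear(d, n_outcomes) for _ in range(n_editors)]
        )

    def forward(self, seq, editor_idx):
        h = self.encoder(self.embed(seq)).mean(dim=1)   # pooled sequence
        logits = self.heads[editor_idx](h)
        return torch.log_softmax(logits, dim=-1)        # outcome scores
```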

Transpose Attack: Stealing Datasets with Bidirectional Training

  • paper_url: http://arxiv.org/abs/2311.07389
  • repo_url: https://github.com/guyamit/transpose-attack-paper-ndss24-
  • paper_authors: Guy Amit, Mosh Levy, Yisroel Mirsky
  • for: The paper identifies a vulnerability that allows deep neural networks to be trained in both directions and on different tasks, which adversaries can exploit to hide rogue models within seemingly legitimate ones.
  • methods: The authors use bidirectional (transposed) training of deep neural networks and show how networks can be taught to systematically memorize and retrieve specific samples from datasets.
  • results: Modern architectures can secretly exfiltrate tens of thousands of samples with fidelity high enough to compromise data privacy and even train new models; the paper also proposes a novel approach for detecting infected models.
    Abstract Deep neural networks are normally executed in the forward direction. However, in this work, we identify a vulnerability that enables models to be trained in both directions and on different tasks. Adversaries can exploit this capability to hide rogue models within seemingly legitimate models. In addition, in this work we show that neural networks can be taught to systematically memorize and retrieve specific samples from datasets. Together, these findings expose a novel method in which adversaries can exfiltrate datasets from protected learning environments under the guise of legitimate models. We focus on the data exfiltration attack and show that modern architectures can be used to secretly exfiltrate tens of thousands of samples with high fidelity, high enough to compromise data privacy and even train new models. Moreover, to mitigate this threat we propose a novel approach for detecting infected models.

arfpy: A python package for density estimation and generative modeling with adversarial random forests

  • paper_url: http://arxiv.org/abs/2311.07366
  • repo_url: https://github.com/bips-hb/arfpy
  • paper_authors: Kristin Blesch, Marvin N. Wright
  • for: The paper provides a Python implementation of Adversarial Random Forests (ARF), a lightweight procedure for synthesizing new data that resembles given data, giving practitioners straightforward functionality for density estimation and generative modeling.
  • methods: Adversarial Random Forests, a tree-based generative model that is particularly useful for tabular data.
  • results: $\textit{arfpy}$ generates high-quality synthetic data while requiring less tuning effort and fewer computational resources than the mostly deep-learning-based alternatives, wrapped in a user-friendly Python interface for scientists across fields.
    Abstract This paper introduces $\textit{arfpy}$, a python implementation of Adversarial Random Forests (ARF) (Watson et al., 2023), which is a lightweight procedure for synthesizing new data that resembles some given data. The software $\textit{arfpy}$ equips practitioners with straightforward functionalities for both density estimation and generative modeling. The method is particularly useful for tabular data and its competitive performance is demonstrated in previous literature. As a major advantage over the mostly deep learning based alternatives, $\textit{arfpy}$ combines the method's reduced requirements in tuning efforts and computational resources with a user-friendly python interface. This supplies audiences across scientific fields with software to generate data effortlessly.
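Basic usage follows the pattern documented in the package repository: fit the ARF, run the density step (FORDE), then generate synthetic rows (FORGE). Method names are taken from the repo's README; verify them against your installed arfpy version.

```python
# Usage pattern per the arfpy repository README; check forde/forge against
# your installed version.
from arfpy import arf
from sklearn.datasets import load_iris

df = load_iris(as_frame=True).frame   # any tabular DataFrame works
model = arf.arf(x=df)                 # fit the adversarial random forest
model.forde()                         # FORDE: density estimation step
synthetic = model.forge(n=100)        # FORGE: draw 100 synthetic rows
```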

ADAMM: Anomaly Detection of Attributed Multi-graphs with Metadata: A Unified Neural Network Approach

  • paper_url: http://arxiv.org/abs/2311.07355
  • repo_url: https://github.com/konsotirop/adamm
  • paper_authors: Konstantinos Sotiropoulos, Lingxiao Zhao, Pierre Jinghong Liang, Leman Akoglu
  • for: Spotting anomalous instances in complex graph databases of node- and edge-attributed multi-graphs with associated metadata for each graph.
  • methods: ADAMM, a graph neural network model that directly handles directed multi-graphs with self-loops and fuses metadata and graph-level representation learning in a unified end-to-end architecture trained with an unsupervised anomaly detection objective.
  • results: Experiments on datasets from two domains, general-ledger journal entries from different firms (accounting) and human GPS trajectories from thousands of individuals (urban mobility), validate ADAMM's generality and its effectiveness at detecting expert-guided and ground-truth anomalies.
    Abstract Given a complex graph database of node- and edge-attributed multi-graphs as well as associated metadata for each graph, how can we spot the anomalous instances? Many real-world problems can be cast as graph inference tasks where the graph representation could capture complex relational phenomena (e.g., transactions among financial accounts in a journal entry), along with metadata reflecting tabular features (e.g. approver, effective date, etc.). While numerous anomaly detectors based on Graph Neural Networks (GNNs) have been proposed, none are capable of directly handling directed graphs with multi-edges and self-loops. Furthermore, the simultaneous handling of relational and tabular features remains an unexplored area. In this work we propose ADAMM, a novel graph neural network model that handles directed multi-graphs, providing a unified end-to-end architecture that fuses metadata and graph-level representation learning through an unsupervised anomaly detection objective. Experiments on datasets from two different domains, namely, general-ledger journal entries from different firms (accounting) as well as human GPS trajectories from thousands of individuals (urban mobility) validate ADAMM's generality and detection effectiveness of expert-guided and ground-truth anomalies. Notably, ADAMM outperforms existing baselines that handle the two data modalities (graph and metadata) separately with post hoc synthesis efforts.

Affine Invariance in Continuous-Domain Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2311.09245
  • repo_url: None
  • paper_authors: Ali Mohaddes, Johannes Lederer
  • for: The study aims to improve deep learning models' ability to recognize patterns and features under geometric transformations by exploiting group invariance.
  • methods: Continuous-domain convolutional neural networks with a new criterion for assessing the similarity of two input signals under affine transformations; instead of solving complex optimization problems on the Lie group $G_2$, the authors analyze the convolution of lifted signals and compute the corresponding integration over $G_2$.
  • results: The study covers the full structure of affine transforms generated by the generalized linear group $\mathrm{GL}_2(\mathbb{R})$, potentially extending the scope of geometric transformations that practical deep learning pipelines can handle.
    Abstract The notion of group invariance helps neural networks in recognizing patterns and features under geometric transformations. Indeed, it has been shown that group invariance can largely improve deep learning performances in practice, where such transformations are very common. This research studies affine invariance on continuous-domain convolutional neural networks. Despite other research considering isometric invariance or similarity invariance, we focus on the full structure of affine transforms generated by the generalized linear group $\mathrm{GL}_2(\mathbb{R})$. We introduce a new criterion to assess the similarity of two input signals under affine transformations. Then, unlike conventional methods that involve solving complex optimization problems on the Lie group $G_2$, we analyze the convolution of lifted signals and compute the corresponding integration over $G_2$. In sum, our research could eventually extend the scope of geometrical transformations that practical deep-learning pipelines can handle.

Missing Value Imputation for Multi-attribute Sensor Data Streams via Message Propagation (Extended Version)

  • paper_url: http://arxiv.org/abs/2311.07344
  • repo_url: https://github.com/xli-2020/mpin
  • paper_authors: Xiao Li, Huan Li, Hua Lu, Christian S. Jensen, Varun Pandey, Volker Markl
  • for: Accurate and efficient imputation of missing values in multi-attribute sensor data streams for real-time applications.
  • methods: A message propagation imputation network (MPIN) that recovers the missing values of data instances within a time window, together with a continuous imputation framework whose data-update and model-update mechanisms let MPIN impute continuously, both effectively and efficiently.
  • results: On multiple real datasets, MPIN outperforms existing data imputers by wide margins, and the continuous imputation framework remains efficient and accurate.
    Abstract Sensor data streams occur widely in various real-time applications in the context of the Internet of Things (IoT). However, sensor data streams feature missing values due to factors such as sensor failures, communication errors, or depleted batteries. Missing values can compromise the quality of real-time analytics tasks and downstream applications. Existing imputation methods either make strong assumptions about streams or have low efficiency. In this study, we aim to accurately and efficiently impute missing values in data streams that satisfy only general characteristics in order to benefit real-time applications more widely. First, we propose a message propagation imputation network (MPIN) that is able to recover the missing values of data instances in a time window. We give a theoretical analysis of why MPIN is effective. Second, we present a continuous imputation framework that consists of data update and model update mechanisms to enable MPIN to perform continuous imputation both effectively and efficiently. Extensive experiments on multiple real datasets show that MPIN can outperform the existing data imputers by wide margins and that the continuous imputation framework is efficient and accurate.
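A hand-coded, simplified version of the propagation idea: build a kNN graph over the instances in a time window and iteratively replace missing entries with the mean of neighboring values. MPIN learns this propagation with a network; the loop below is purely illustrative.

```python
# Simplified message-propagation imputation over a kNN instance graph.
import numpy as np

def impute_window(X, k=5, iters=10):
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])   # initial fill
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        nbrs = np.argsort(d, axis=1)[:, :k]
        prop = X[nbrs].mean(axis=1)                   # neighbor messages
        X[miss] = prop[miss]                          # update missing only
    return X
```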

Fine-Tuning the Retrieval Mechanism for Tabular Deep Learning

  • paper_url: http://arxiv.org/abs/2311.07343
  • repo_url: None
  • paper_authors: Felix den Breejen, Sangmin Bae, Stephen Cha, Tae-Young Kim, Seoung Hyun Koh, Se-Young Yun
  • for: Narrowing the performance gap between tabular deep learning and conventional tree-based models.
  • methods: A retrieval mechanism that lets neural networks refer to other data points while making predictions, applied in particular when fine-tuning the pretrained TabPFN model.
  • results: Retrieval-based training, combined with extensive pretraining, notably surpasses existing methods, suggesting that blending the retrieval mechanism with pretraining and transfer learning holds considerable potential for tabular deep learning.
    Abstract While interest in tabular deep learning has significantly grown, conventional tree-based models still outperform deep learning methods. To narrow this performance gap, we explore the innovative retrieval mechanism, a methodology that allows neural networks to refer to other data points while making predictions. Our experiments reveal that retrieval-based training, especially when fine-tuning the pretrained TabPFN model, notably surpasses existing methods. Moreover, the extensive pretraining plays a crucial role in enhancing the performance of the model. These insights imply that blending the retrieval mechanism with pretraining and transfer learning schemes offers considerable potential for advancing the field of tabular deep learning.
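The retrieval mechanism reduces, in its simplest form, to attaching a query-specific support set to each test row. The sklearn sketch below shows only that retrieval step; fine-tuning TabPFN on the retrieved context is out of scope here.

```python
# Retrieve a per-query support set that a downstream model can condition on.
from sklearn.neighbors import NearestNeighbors

def retrieve_context(X_train, y_train, X_query, k=16):
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_query)
    # Each query row now carries a small, query-specific support set that
    # a model (e.g. a fine-tuned TabPFN) can attend over at predict time.
    return X_train[idx], y_train[idx]
```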

DAGC: Data-Volume-Aware Adaptive Sparsification Gradient Compression for Distributed Machine Learning in Mobile Computing

  • paper_url: http://arxiv.org/abs/2311.07324
  • repo_url: None
  • paper_authors: Rongwei Lu, Yutong Jiang, Yinan Mao, Chen Tang, Bin Chen, Laizhong Cui, Zhi Wang
  • for: Improving distributed machine learning (DML) performance in mobile environments, where communication bottlenecks and metered data are significant constraints.
  • methods: Non-uniform gradient compression that assigns different sparsification ratios to workers with distinct data distributions and volumes, addressing the performance drop of one-size-fits-all compression in non-IID settings.
  • results: The analysis shows the convergence rate depends on the compression ratios applied to workers with differing data volumes; the proposed DAGC-R and the computationally lighter DAGC-A both outperform uniform compression when data volumes are highly imbalanced and communication is restricted.
    Abstract Distributed machine learning (DML) in mobile environments faces significant communication bottlenecks. Gradient compression has emerged as an effective solution to this issue, offering substantial benefits in environments with limited bandwidth and metered data. Yet, they encounter severe performance drop in non-IID environments due to a one-size-fits-all compression approach, which does not account for the varying data volumes across workers. Assigning varying compression ratios to workers with distinct data distributions and volumes is thus a promising solution. This study introduces an analysis of distributed SGD with non-uniform compression, which reveals that the convergence rate (indicative of the iterations needed to achieve a certain accuracy) is influenced by compression ratios applied to workers with differing volumes. Accordingly, we frame relative compression ratio assignment as an $n$-variables chi-square nonlinear optimization problem, constrained by a fixed and limited communication budget. We propose DAGC-R, which assigns the worker handling larger data volumes the conservative compression. Recognizing the computational limitations of mobile devices, we DAGC-A, which are computationally less demanding and enhances the robustness of the absolute gradient compressor in non-IID scenarios. Our experiments confirm that both the DAGC-A and DAGC-R can achieve better performance when dealing with highly imbalanced data volume distribution and restricted communication.
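The volume-aware assignment can be sketched with top-k sparsification. The monotone heuristic below (more data, milder compression, matching DAGC-R's intuition) stands in for the paper's constrained chi-square optimization.

```python
# Sketch of data-volume-aware top-k gradient sparsification.
import numpy as np

def topk_sparsify(grad, ratio):
    k = max(1, int(ratio * grad.size))
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    out = np.zeros_like(grad)
    out[idx] = grad[idx]
    return out

def assign_ratios(data_volumes, budget=0.1):
    # Keep ratios proportional to data volume, averaging to the budget.
    v = np.asarray(data_volumes, dtype=float)
    return np.minimum(budget * len(v) * v / v.sum(), 1.0)

ratios = assign_ratios([5000, 1000, 200])  # per-worker keep ratios
```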

A Voting Approach for Explainable Classification with Rule Learning

  • paper_url: http://arxiv.org/abs/2311.07323
  • repo_url: https://github.com/albertn7/modularrulelearning
  • paper_authors: Albert Nössig, Tobias Hell, Georg Moser
  • for: This paper investigates the application of rule learning methods in typical classification tasks, with a focus on providing explanations for the predictions made.
  • methods: The approach used in the paper is a voting method that combines rule learning and state-of-the-art methods to achieve comparable results with explanations.
  • results: The paper shows that the proposed approach outperforms ordinary rule learning methods and achieves results on a par with state-of-the-art outcomes, using a variety of benchmark data sets including a use case of significant interest to insurance industries.
    Abstract State-of-the-art results in typical classification tasks are mostly achieved by unexplainable machine learning methods, like deep neural networks, for instance. Contrarily, in this paper, we investigate the application of rule learning methods in such a context. Thus, classifications become based on comprehensible (first-order) rules, explaining the predictions made. In general, however, rule-based classifications are less accurate than state-of-the-art results (often significantly). As main contribution, we introduce a voting approach combining both worlds, aiming to achieve comparable results as (unexplainable) state-of-the-art methods, while still providing explanations in the form of deterministic rules. Considering a variety of benchmark data sets including a use case of significant interest to insurance industries, we prove that our approach not only clearly outperforms ordinary rule learning methods, but also yields results on a par with state-of-the-art outcomes.
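A hedged sketch of the voting idea with scikit-learn: a comprehensible rule-like learner (a shallow decision tree standing in for the paper's first-order rule learner) soft-votes with a stronger model, so accuracy approaches the black-box while explanations come from the rules.

```python
# Combine an explainable learner with a stronger model via soft voting.
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier

rules = DecisionTreeClassifier(max_depth=4)       # comprehensible rules
strong = GradientBoostingClassifier()             # high-accuracy votes
clf = VotingClassifier([("rules", rules), ("gb", strong)], voting="soft")
# clf.fit(X_train, y_train); explanations come from the fitted tree paths.
```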

An introduction to reinforcement learning for neuroscience

  • paper_url: http://arxiv.org/abs/2311.07315
  • repo_url: None
  • paper_authors: Kristopher T. Jensen
  • for: An introductory review of reinforcement learning for neuroscience, covering classical temporal difference algorithms through modern deep reinforcement learning methods that have found applications in systems neuroscience.
  • methods: Coverage of model-free and model-based reinforcement learning, along with methods such as DYNA and successor representations that fall between the two, highlighting the close parallels between machine learning methods and experimental and theoretical neuroscience.
  • results: An accessible overview of the basic theory and its applications, with examples such as meta-reinforcement learning (Wang et al., 2018) and distributional reinforcement learning (Dabney et al., 2020); code implementing the methods discussed and generating the figures is provided.
    Abstract Reinforcement learning has a rich history in neuroscience, from early work on dopamine as a reward prediction error signal for temporal difference learning (Schultz et al., 1997) to recent work suggesting that dopamine could implement a form of 'distributional reinforcement learning' popularized in deep learning (Dabney et al., 2020). Throughout this literature, there has been a tight link between theoretical advances in reinforcement learning and neuroscientific experiments and findings. As a result, the theories describing our experimental data have become increasingly complex and difficult to navigate. In this review, we cover the basic theory underlying classical work in reinforcement learning and build up to an introductory overview of methods used in modern deep reinforcement learning that have found applications in systems neuroscience. We start with an overview of the reinforcement learning problem and classical temporal difference algorithms, followed by a discussion of 'model-free' and 'model-based' reinforcement learning together with methods such as DYNA and successor representations that fall in between these two categories. Throughout these sections, we highlight the close parallels between the machine learning methods and related work in both experimental and theoretical neuroscience. We then provide an introduction to deep reinforcement learning with examples of how these methods have been used to model different learning phenomena in the systems neuroscience literature, such as meta-reinforcement learning (Wang et al., 2018) and distributional reinforcement learning (Dabney et al., 2020). Code that implements the methods discussed in this work and generates the figures is also provided.
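The temporal difference error at the center of this literature fits in a few lines. The TD(0) update below computes the reward prediction error delta that Schultz et al. (1997) linked to dopamine.

```python
# Classical TD(0) value update with its reward prediction error.
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.95):
    delta = r + gamma * V[s_next] - V[s]   # TD / reward prediction error
    V[s] += alpha * delta
    return delta

V = np.zeros(10)                  # value table for a 10-state environment
delta = td0_update(V, s=3, r=1.0, s_next=4)
```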

A probabilistic forecast methodology for volatile electricity prices in the Australian National Electricity Market

  • paper_url: http://arxiv.org/abs/2311.07289
  • repo_url: None
  • paper_authors: Cameron Cornell, Nam Trong Dinh, S. Ali Pourmousavi
  • for: The paper addresses probabilistic price forecasting in the South Australia region of the Australian National Electricity Market (NEM), which displays some of the highest price volatility observed in modern electricity markets.
  • methods: Spike filtration and several post-processing steps, with quantile regression used as an ensemble tool to improve forecasting precision.
  • results: The combined forecasts outperform all constituent models; averaging models with varying training lengths yields a more adaptive model with higher accuracy, and the median forecasts beat the Australian NEM operator's point forecasts by a significant margin.
    Abstract The South Australia region of the Australian National Electricity Market (NEM) displays some of the highest levels of price volatility observed in modern electricity markets. This paper outlines an approach to probabilistic forecasting under these extreme conditions, including spike filtration and several post-processing steps. We propose using quantile regression as an ensemble tool for probabilistic forecasting, with our combined forecasts achieving superior results compared to all constituent models. Within our ensemble framework, we demonstrate that averaging models with varying training length periods leads to a more adaptive model and increased prediction accuracy. The applicability of the final model is evaluated by comparing our median forecasts with the point forecasts available from the Australian NEM operator, with our model outperforming these NEM forecasts by a significant margin.
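The ensemble construction can be sketched directly with scikit-learn's quantile-loss gradient boosting: fit one model per quantile and training-window length, then average per quantile. The clipping threshold standing in for spike filtration and the window lengths are illustrative.

```python
# Quantile-regression ensemble over models with different training windows.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_ensemble(X, y, quantiles=(0.1, 0.5, 0.9), windows=(1000, 2000)):
    y = np.clip(y, None, np.quantile(y, 0.995))   # crude spike filter
    return {q: [GradientBoostingRegressor(loss="quantile", alpha=q)
                .fit(X[-w:], y[-w:]) for w in windows]
            for q in quantiles}

def predict(models, X_new):
    # Average the member forecasts within each quantile.
    return {q: np.mean([m.predict(X_new) for m in ms], axis=0)
            for q, ms in models.items()}
```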

Learning Arithmetic Formulas in the Presence of Noise: A General Framework and Applications to Unsupervised Learning

  • paper_url: http://arxiv.org/abs/2311.07284
  • repo_url: None
  • paper_authors: Pritam Chandra, Ankit Garg, Neeraj Kayal, Kunal Mittal, Tanmay Sinha
  • for: The paper designs efficient algorithms for unsupervised learning problems such as mixtures of Gaussians and subspace clustering.
  • methods: A meta-algorithm that learns arithmetic circuits in the presence of noise using lower bounds, building on the noise-free framework of Garg, Kayal and Saha (FOCS 20); a key ingredient is an efficient algorithm for a novel problem called Robust Vector Space Decomposition.
  • results: The meta-algorithm works well when certain matrices have sufficiently large smallest non-zero singular values; the authors conjecture this condition holds for smoothed instances of the problems, in which case the framework would yield efficient algorithms in the smoothed setting.
    Abstract We present a general framework for designing efficient algorithms for unsupervised learning problems, such as mixtures of Gaussians and subspace clustering. Our framework is based on a meta algorithm that learns arithmetic circuits in the presence of noise, using lower bounds. This builds upon the recent work of Garg, Kayal and Saha (FOCS 20), who designed such a framework for learning arithmetic circuits without any noise. A key ingredient of our meta algorithm is an efficient algorithm for a novel problem called Robust Vector Space Decomposition. We show that our meta algorithm works well when certain matrices have sufficiently large smallest non-zero singular values. We conjecture that this condition holds for smoothed instances of our problems, and thus our framework would yield efficient algorithms for these problems in the smoothed setting.

Predictive and Prescriptive Analytics for Multi-Site Modeling of Frail and Elderly Patient Services

  • paper_url: http://arxiv.org/abs/2311.07283
  • repo_url: None
  • paper_authors: Elizabeth Williams, Daniel Gartner, Paul Harper
  • for: The paper assesses how predictive and prescriptive analytical methods, individually and in tandem, can address operational challenges in healthcare, specifically planning resource capacities for frail and elderly inpatient wards.
  • methods: A combination of predictive and prescriptive analytics, including Classification and Regression Trees (CART) analysis of clinical and demographic patient attributes to predict length of stay, plus deterministic and two-stage stochastic programs for planning beds and ward staff.
  • results: The linked methodologies produced different but similar results compared with using averages, capturing a more realistic real-world variation in patient length of stay; the findings suggest healthcare managers should use predictive and prescriptive models rather than rely on averages.
    Abstract Recent research has highlighted the potential of linking predictive and prescriptive analytics. However, it remains widely unexplored how both paradigms could benefit from one another to address today's major challenges in healthcare. One of these is smarter planning of resource capacities for frail and elderly inpatient wards, addressing the societal challenge of an aging population. Frail and elderly patients typically suffer from multimorbidity and require more care while receiving medical treatment. The aim of this research is to assess how various predictive and prescriptive analytical methods, both individually and in tandem, contribute to addressing the operational challenges within an area of healthcare that is growing in demand. Clinical and demographic patient attributes are gathered from more than 165,000 patient records and used to explain and predict length of stay. To that extent, we employ Classification and Regression Trees (CART) analysis to establish this relationship. On the prescriptive side, deterministic and two-stage stochastic programs are developed to determine how to optimally plan for beds and ward staff with the objective to minimize cost. Furthermore, the two analytical methodologies are linked by generating demand for the prescriptive models using the CART groupings. The results show the linked methodologies provided different but similar results compared to using averages and in doing so, captured a more realistic real-world variation in the patient length of stay. Our research reveals that healthcare managers should consider using predictive and prescriptive models to make more informed decisions. By combining predictive and prescriptive analytics, healthcare managers can move away from relying on averages and incorporate the unique characteristics of their patients to create more robust planning decisions, mitigating risks caused by variations in demand.
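One way to link the two stages in code: CART leaves define patient groupings, which generate demand scenarios for a capacity decision. The newsvendor-style quantile rule below is a deliberately simplified stand-in for the paper's deterministic and two-stage stochastic programs, and every parameter is illustrative.

```python
# CART groupings feed demand scenarios for a simple bed-capacity decision.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def plan_beds(X, los, service_level=0.9, rng=np.random.default_rng(0)):
    cart = DecisionTreeRegressor(max_depth=4, min_samples_leaf=50).fit(X, los)
    leaves = cart.apply(X)
    # Sample lengths of stay within each CART leaf to build demand
    # scenarios that preserve real-world variation, not just averages.
    scenarios = [np.sum(rng.choice(los[leaves == l], size=20))
                 for l in np.unique(leaves) for _ in range(100)]
    return int(np.quantile(scenarios, service_level))  # bed-days to plan
```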

Towards Bounding Causal Effects under Markov Equivalence

  • paper_url: http://arxiv.org/abs/2311.07259
  • repo_url: None
  • paper_authors: Alexis Bellot
  • for: The paper aims to determine non-trivial bounds on causal effects from observational data (the task of partial causal identification), where unobserved confounding precludes definitive answers.
  • methods: Instead of requiring a known parametric form or a fully specified causal diagram, the method takes as input a Partial Ancestral Graph, a less informative structure that represents a Markov equivalence class of causal diagrams and is learnable from observational data.
  • results: A systematic algorithm for deriving bounds on causal effects that can be computed analytically in this more "data-driven" setting.
    Abstract Predicting the effect of unseen interventions is a fundamental research question across the data sciences. It is well established that, in general, such questions cannot be answered definitively from observational data, e.g., as a consequence of unobserved confounding. A generalization of this task is to determine non-trivial bounds on causal effects induced by the data, also known as the task of partial causal identification. In the literature, several algorithms have been developed for solving this problem. Most, however, require a known parametric form or a fully specified causal diagram as input, which is usually not available in practical applications. In this paper, we assume as input a less informative structure known as a Partial Ancestral Graph, which represents a Markov equivalence class of causal diagrams and is learnable from observational data. In this more "data-driven" setting, we provide a systematic algorithm to derive bounds on causal effects that can be computed analytically.

Error Analysis of Option Pricing via Deep PDE Solvers: Empirical Study

  • paper_url: http://arxiv.org/abs/2311.07231
  • repo_url: None
  • paper_authors: Rawin Assabumrungrat, Kentaro Minami, Masanori Hirano
  • for: The paper studies the empirical accuracy and scalability of deep-learning-based PDE solvers for high-dimensional option pricing, with an eye to their practical applicability.
  • methods: Comparative experiments assess the empirical performance of deep PDE solvers in high-dimensional contexts, with ablation studies isolating individual error sources.
  • results: Three primary error sources are identified: (1) errors inherent in the specification of the target option and underlying assets, (2) errors originating from the asset model simulation methods, and (3) errors stemming from neural network training. The Deep BSDE method (DBSDE) is the most performant and robust, while some other methods are overly sensitive to option specifications such as time to expiration; performance improves inversely proportional to the square root of the batch size and the number of time steps, which helps in estimating the computational resources needed for a desired accuracy.
    Abstract Option pricing, a fundamental problem in finance, often requires solving non-linear partial differential equations (PDEs). When dealing with multi-asset options, such as rainbow options, these PDEs become high-dimensional, leading to challenges posed by the curse of dimensionality. While deep learning-based PDE solvers have recently emerged as scalable solutions to this high-dimensional problem, their empirical and quantitative accuracy remains not well-understood, hindering their real-world applicability. In this study, we aimed to offer actionable insights into the utility of Deep PDE solvers for practical option pricing implementation. Through comparative experiments, we assessed the empirical performance of these solvers in high-dimensional contexts. Our investigation identified three primary sources of errors in Deep PDE solvers: (i) errors inherent in the specifications of the target option and underlying assets, (ii) errors originating from the asset model simulation methods, and (iii) errors stemming from the neural network training. Through ablation studies, we evaluated the individual impact of each error source. Our results indicate that the Deep BSDE method (DBSDE) is superior in performance and exhibits robustness against variations in option specifications. In contrast, some other methods are overly sensitive to option specifications, such as time to expiration. We also find that the performance of these methods improves inversely proportional to the square root of batch size and the number of time steps. This observation can aid in estimating computational resources for achieving desired accuracies with Deep PDE solvers.

Neural General Circulation Models

  • paper_url: http://arxiv.org/abs/2311.07222
  • repo_url: None
  • paper_authors: Dmitrii Kochkov, Janni Yuval, Ian Langmore, Peter Norgaard, Jamie Smith, Griffin Mooers, James Lottes, Stephan Rasp, Peter Düben, Milan Klöwer, Sam Hatfield, Peter Battaglia, Alvaro Sanchez-Gonzalez, Matthew Willson, Michael P. Brenner, Stephan Hoyer
  • for: The paper develops a new kind of atmospheric model that combines a numerical solver with machine learning components to improve the accuracy and efficiency of weather and climate prediction.
  • methods: NeuralGCM, the first general circulation model to combine a differentiable solver for large-scale atmospheric dynamics with ML components.
  • results: NeuralGCM matches the best ML and physics-based methods for 1-10 day deterministic forecasts and the ECMWF ensemble for 1-15 day forecasts; with prescribed sea surface temperature, it accurately tracks climate metrics such as global mean temperature over multiple decades, and climate forecasts at 140 km resolution exhibit emergent phenomena such as realistic frequencies and trajectories of tropical cyclones, all with orders-of-magnitude computational savings over conventional GCMs.
    Abstract General circulation models (GCMs) are the foundation of weather and climate prediction. GCMs are physics-based simulators which combine a numerical solver for large-scale dynamics with tuned representations for small-scale processes such as cloud formation. Recently, machine learning (ML) models trained on reanalysis data achieved comparable or better skill than GCMs for deterministic weather forecasting. However, these models have not demonstrated improved ensemble forecasts, or shown sufficient stability for long-term weather and climate simulations. Here we present the first GCM that combines a differentiable solver for atmospheric dynamics with ML components, and show that it can generate forecasts of deterministic weather, ensemble weather and climate on par with the best ML and physics-based methods. NeuralGCM is competitive with ML models for 1-10 day forecasts, and with the European Centre for Medium-Range Weather Forecasts ensemble prediction for 1-15 day forecasts. With prescribed sea surface temperature, NeuralGCM can accurately track climate metrics such as global mean temperature for multiple decades, and climate forecasts with 140 km resolution exhibit emergent phenomena such as realistic frequency and trajectories of tropical cyclones. For both weather and climate, our approach offers orders of magnitude computational savings over conventional GCMs. Our results show that end-to-end deep learning is compatible with tasks performed by conventional GCMs, and can enhance the large-scale physical simulations that are essential for understanding and predicting the Earth system.

Non-Contact Breathing Rate Detection Using Optical Flow

  • paper_url: http://arxiv.org/abs/2311.08426
  • repo_url: None
  • paper_authors: Robyn Maxwell, Timothy Hanley, Dara Golden, Adara Andonie, Joseph Lemley, Ashkan Parsi
  • for: 这个论文的目的是研究一种非接触式呼吸速率检测方法,所用的运动检测算法为光流(optical flow)。
  • methods: 这篇论文使用了光流算法来成功测量呼吸速率,通过跟踪身体特定点的运动来确定呼吸速率。
  • results: 测试表明,胸部运动可以生成非常准确的信号,RMSE为0.63。然而,面部运动也可以生成可靠的信号,但是受到头体运动干扰的影响。这些发现表明了光流算法的潜在用于非接触式呼吸速率检测,并且选择合适的点可以提高准确性。
    Abstract Breathing rate is a vital health metric that is an invaluable indicator of the overall health of a person. In recent years, the non-contact measurement of health signals such as breathing rate has been a huge area of development, with a wide range of applications from telemedicine to driver monitoring systems. This paper presents an investigation into a method of non-contact breathing rate detection using a motion detection algorithm, optical flow. Optical flow is used to successfully measure breathing rate by tracking the motion of specific points on the body. In this study, the success of optical flow when using different sets of points is evaluated. Testing shows that both chest and facial movement can be used to determine breathing rate but to different degrees of success. The chest generates very accurate signals, with an RMSE of 0.63 on the tested videos. Facial points can also generate reliable signals when there is minimal head movement but are much more vulnerable to noise caused by head/body movements. These findings highlight the potential of optical flow as a non-invasive method for breathing rate detection and emphasize the importance of selecting appropriate points to optimize accuracy.
    摘要 呼吸速率是一项重要的健康指标,是反映人体整体健康状况的宝贵信号。近年来,呼吸速率等健康信号的非接触测量成为一个快速发展的领域,应用范围从远程医疗到驾驶员监测系统。本文研究了一种基于运动检测算法——光流——的非接触呼吸速率检测方法:通过跟踪身体上特定点的运动,光流能够成功测量呼吸速率。本研究评估了选取不同点集时光流方法的效果。测试表明,胸部和面部运动均可用于确定呼吸速率,但成功程度不同:胸部运动可生成非常准确的信号,在测试视频上的RMSE为0.63;面部点在头部运动较小时也能生成可靠信号,但更容易受到头部/身体运动噪声的干扰。这些发现凸显了光流作为非侵入式呼吸速率检测方法的潜力,并强调了选取合适跟踪点对优化精度的重要性。
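
A minimal sketch of the pipeline described above, under stated assumptions (the input file name, the feature-tracking setup, and the breathing band are illustrative choices, not the paper's exact configuration):

```python
# Estimate breathing rate from chest motion via Lucas-Kanade optical flow.
import cv2
import numpy as np

cap = cv2.VideoCapture("subject.mp4")         # hypothetical input video
fps = cap.get(cv2.CAP_PROP_FPS)
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                              qualityLevel=0.01, minDistance=10)
signal = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good = status.ravel() == 1
    # Mean vertical displacement of tracked points approximates chest motion.
    signal.append(np.mean(nxt[good, 0, 1] - pts[good, 0, 1]))
    prev_gray, pts = gray, nxt

sig = np.asarray(signal) - np.mean(signal)
spectrum = np.abs(np.fft.rfft(sig))
freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)
band = (freqs > 0.1) & (freqs < 0.7)          # plausible breathing band (6-42 bpm)
rate_bpm = 60.0 * freqs[band][np.argmax(spectrum[band])]
print(f"Estimated breathing rate: {rate_bpm:.1f} breaths/min")
```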

On Elastic Language Models

  • paper_url: http://arxiv.org/abs/2311.07204
  • repo_url: https://github.com/RAIVNLab/MatFormer-OLMo
  • paper_authors: Chen Zhang, Benyou Wang, Dawei Song
  • for: 这篇论文旨在提出一种弹性语言模型(ElasticLM),在请求流量高度波动时提供可控的计算弹性。
  • methods: 该论文利用知识蒸馏技术将大型语言模型压缩为小型模型,并随请求流量的变化进行计算弹性调整,以实现可控的延迟-性能权衡。
  • results: 实验结果表明,与静态基线相比,ElasticLM 能够在不同的请求流量下提供可控的延迟-性能权衡,并通过带并发的在线仿真得到了验证。
    Abstract Large-scale pretrained language models have achieved compelling performance in a wide range of language understanding and information retrieval tasks. Knowledge distillation offers an opportunity to compress a large language model to a small one, in order to reach a reasonable latency-performance tradeoff. However, for scenarios where the number of requests (e.g., queries submitted to a search engine) is highly variant, the static tradeoff attained by the compressed language model might not always fit. Once a model is assigned with a static tradeoff, it could be inadequate in that the latency is too high when the number of requests is large or the performance is too low when the number of requests is small. To this end, we propose an elastic language model (ElasticLM) that elastically adjusts the tradeoff according to the request stream. The basic idea is to introduce a compute elasticity to the compressed language model, so that the tradeoff could vary on-the-fly along scalable and controllable compute. Specifically, we impose an elastic structure to enable ElasticLM with compute elasticity and design an elastic optimization to learn ElasticLM under compute elasticity. To serve ElasticLM, we apply an elastic schedule. Considering the specificity of information retrieval, we adapt ElasticLM to dense retrieval and reranking and present ElasticDenser and ElasticRanker respectively. Offline evaluation is conducted on a language understanding benchmark GLUE; and several information retrieval tasks including Natural Question, Trivia QA, and MS MARCO. The results show that ElasticLM along with ElasticDenser and ElasticRanker can perform correctly and competitively compared with an array of static baselines. Furthermore, online simulation with concurrency is also carried out. The results demonstrate that ElasticLM can provide elastic tradeoffs with respect to varying request stream.
    摘要 大规模预训练语言模型已在各类语言理解与信息检索任务中取得令人瞩目的性能。知识蒸馏提供了将大型语言模型压缩为小型模型的机会,以达到合理的延迟-性能权衡。然而,在请求数量(例如提交到搜索引擎的查询)高度波动的场景下,压缩模型所固定下来的静态权衡未必始终适用:一旦模型被赋予某一静态权衡,便可能在请求量大时延迟过高,或在请求量小时性能过低。为此,我们提出一种弹性语言模型(ElasticLM),根据请求流弹性地调整权衡。其基本思想是为压缩后的语言模型引入计算弹性,使权衡能够随可伸缩、可控的计算量即时变化。具体而言,我们施加弹性结构使ElasticLM具备计算弹性,并设计弹性优化以在计算弹性下学习ElasticLM;在服务阶段,则采用弹性调度。考虑到信息检索的特殊性,我们将ElasticLM适配到稠密检索与重排序,分别得到ElasticDenser与ElasticRanker。我们在语言理解基准GLUE以及Natural Question、Trivia QA和MS MARCO等信息检索任务上进行了离线评估,结果表明ElasticLM连同ElasticDenser与ElasticRanker能够正确运行,并与一系列静态基线相比具有竞争力。此外,我们还进行了带并发的在线仿真,结果表明ElasticLM能够针对变化的请求流提供弹性权衡。
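
As a toy illustration of the elastic-schedule idea (serving depth adapting to the request stream), here is an assumed scheduler sketch; the depth profiles, latency budget, and policy are invented for illustration and are not ElasticLM's actual schedule:

```python
# Pick the execution depth of an elastically-trained model from queue length.
LATENCY_BUDGET_MS = 50.0
# Hypothetical per-depth profile: (num_layers, est_latency_ms, est_quality)
PROFILES = [(4, 10.0, 0.80), (8, 22.0, 0.86), (12, 45.0, 0.90)]

def pick_depth(queue_len: int) -> int:
    # With q requests waiting, each request can afford roughly budget/q compute.
    per_request_ms = LATENCY_BUDGET_MS / max(queue_len, 1)
    feasible = [layers for layers, lat_ms, _ in PROFILES if lat_ms <= per_request_ms]
    return max(feasible) if feasible else PROFILES[0][0]

assert pick_depth(1) == 12     # light load: run the full model
assert pick_depth(4) == 4      # heavy load: fall back to the shallow variant
```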

Input Convex LSTM: A Convex Approach for Fast Lyapunov-Based Model Predictive Control

  • paper_url: http://arxiv.org/abs/2311.07202
  • repo_url: None
  • paper_authors: Zihao Wang, Zhe Wu
  • for: 缩短基于Lyapunov的模型预测控制(MPC)的收敛时间,缓解ICNN中的梯度消失问题,同时保证闭环稳定性
  • methods: 提出基于ICNN原理的输入凸LSTM(Input Convex LSTM),利用输入凸性保证闭环稳定性,并提升MPC的求解效率
  • results: 在非线性化学反应器的仿真研究中,相比基线普通RNN、普通LSTM和输入凸循环神经网络(ICRNN),收敛时间分别减少46.7%、31.3%和20.2%,且梯度消失问题得到缓解
    Abstract Leveraging Input Convex Neural Networks (ICNNs), ICNN-based Model Predictive Control (MPC) successfully attains globally optimal solutions by upholding convexity within the MPC framework. However, current ICNN architectures encounter the issue of vanishing gradients, which limits their ability to serve as deep neural networks for complex tasks. Additionally, the current neural network-based MPC, including conventional neural network-based MPC and ICNN-based MPC, faces slower convergence speed when compared to MPC based on first-principles models. In this study, we leverage the principles of ICNNs to propose a novel Input Convex LSTM for Lyapunov-based MPC, with the specific goal of reducing convergence time and mitigating the vanishing gradient problem while ensuring closed-loop stability. From a simulation study of a nonlinear chemical reactor, we observed a mitigation of vanishing gradient problem and a reduction in convergence time, with a percentage decrease of 46.7%, 31.3%, and 20.2% compared to baseline plain RNN, plain LSTM, and Input Convex Recurrent Neural Network, respectively.
    摘要 借助输入凸神经网络(ICNN),基于ICNN的模型预测控制(MPC)通过在MPC框架内保持凸性,成功获得全局最优解。然而,当前的ICNN架构面临梯度消失问题,限制了其作为深层神经网络处理复杂任务的能力。此外,当前基于神经网络的MPC(包括常规神经网络MPC与基于ICNN的MPC)相比基于第一性原理模型的MPC,收敛速度更慢。在本研究中,我们基于ICNN的原理,为基于Lyapunov的MPC提出了一种新的输入凸LSTM,旨在缩短收敛时间、缓解梯度消失问题,并确保闭环稳定性。在非线性化学反应器的仿真研究中,我们观察到梯度消失问题得到缓解、收敛时间缩短:相比基线普通RNN、普通LSTM和输入凸循环神经网络,收敛时间分别下降46.7%、31.3%和20.2%。
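
For reference, a minimal sketch of the input-convexity construction that ICNN-family models rely on (non-negative weights on the hidden path plus convex, non-decreasing activations); the layer sizes and the clamping strategy are illustrative assumptions, not the paper's exact architecture:

```python
# A fully input-convex block: output is convex in the input x.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InputConvexBlock(nn.Module):
    def __init__(self, in_dim, hid_dim, depth=3):
        super().__init__()
        self.Wz = nn.ModuleList([nn.Linear(hid_dim, hid_dim, bias=False)
                                 for _ in range(depth)])      # must stay >= 0
        self.Wx = nn.ModuleList([nn.Linear(in_dim, hid_dim)   # skip connections
                                 for _ in range(depth + 1)])
        self.out = nn.Linear(hid_dim, 1, bias=False)          # must stay >= 0

    def forward(self, x):
        z = F.relu(self.Wx[0](x))                 # ReLU is convex, non-decreasing
        for Wz, Wx in zip(self.Wz, self.Wx[1:]):
            # Non-negative weights on z preserve convexity of the composition;
            # clamping at forward time is one simple way to enforce this.
            z = F.relu(F.linear(z, Wz.weight.clamp(min=0)) + Wx(x))
        return F.linear(z, self.out.weight.clamp(min=0))
```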

A Consistent Diffusion-Based Algorithm for Semi-Supervised Graph Learning

  • paper_url: http://arxiv.org/abs/2311.07627
  • repo_url: None
  • paper_authors: Thomas Bonald, Nathan de Lara
  • for: 这篇论文研究半监督分类问题,目标是以少数已知标签的节点(称为种子)为起点,为图中所有节点分配标签。
  • methods: 该算法基于热扩散原理:种子节点的标签通过热传导在整个图上扩散,并以均衡状态下每个节点的温度作为各标签的得分函数。
  • results: 本文证明,除非在打分前对均衡状态下各节点的温度进行中心化处理,否则该算法并不具有一致性;这一关键步骤不仅使算法在块模型上可证一致,还在真实图上带来显著的性能提升。
    Abstract The task of semi-supervised classification aims at assigning labels to all nodes of a graph based on the labels known for a few nodes, called the seeds. One of the most popular algorithms relies on the principle of heat diffusion, where the labels of the seeds are spread by thermoconductance and the temperature of each node at equilibrium is used as a score function for each label. In this paper, we prove that this algorithm is not consistent unless the temperatures of the nodes at equilibrium are centered before scoring. This crucial step does not only make the algorithm provably consistent on a block model but brings significant performance gains on real graphs.
    摘要 半监督分类的任务是基于少数已知标签的节点(称为种子),为图中所有节点分配标签。最流行的算法之一基于热扩散原理:种子的标签通过热传导扩散,均衡状态下每个节点的温度被用作各标签的得分函数。本文证明,除非在打分前对均衡温度进行中心化,否则该算法并不一致。这一关键步骤不仅使算法在块模型上可证一致,还在真实图上带来显著的性能提升。
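
A compact sketch of the heat-diffusion classifier with the centering step the paper advocates; the toy graph interface and iteration count are assumptions, and this is not the authors' exact implementation:

```python
# One-vs-all heat diffusion with per-label temperature centering before scoring.
import numpy as np

def diffuse(adj, seed_mask, seed_vals, iters=200):
    """Hold seed temperatures fixed; let other nodes relax to the mean of
    their neighbors (discrete heat equation at equilibrium)."""
    deg = adj.sum(1)
    temp = np.where(seed_mask, seed_vals, 0.5)
    for _ in range(iters):
        temp = np.where(seed_mask, seed_vals, adj @ temp / np.maximum(deg, 1))
    return temp

def classify(adj, seeds, labels, n_labels):
    """seeds: node indices with known labels; labels: their labels, aligned."""
    n = adj.shape[0]
    seed_mask = np.zeros(n, dtype=bool)
    seed_mask[seeds] = True
    temps = np.zeros((n_labels, n))
    for k in range(n_labels):
        vals = np.zeros(n)
        vals[seeds] = (labels == k).astype(float)   # 1 for label k, 0 otherwise
        temps[k] = diffuse(adj, seed_mask, vals)
    temps -= temps.mean(axis=1, keepdims=True)      # the crucial centering step
    return temps.argmax(axis=0)

adj = np.array([[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]], float)
print(classify(adj, seeds=np.array([1, 3]), labels=np.array([0, 1]), n_labels=2))
```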

Quantum Machine Learning for Remote Sensing: Exploring potential and challenges

  • paper_url: http://arxiv.org/abs/2311.07626
  • repo_url: None
  • paper_authors: Artur Miroszewski, Jakub Nalepa, Bertrand Le Saux, Jakub Mielczarek
  • for: 这篇论文研究了量子机器学习(QML)在遥感领域的应用。
  • methods: 该论文使用了量子计算机来处理和分析遥感数据,并研究了量子优势在QML中对遥感领域的影响。
  • results: 研究发现,尽管核值集中(kernel value concentration)问题会降低量子计算机的性能,但并不会完全抵消QML在遥感领域的潜在量子优势。
    Abstract The industry of quantum technologies is rapidly expanding, offering promising opportunities for various scientific domains. Among these emerging technologies, Quantum Machine Learning (QML) has attracted considerable attention due to its potential to revolutionize data processing and analysis. In this paper, we investigate the application of QML in the field of remote sensing. It is believed that QML can provide valuable insights for analysis of data from space. We delve into the common beliefs surrounding the quantum advantage in QML for remote sensing and highlight the open challenges that need to be addressed. To shed light on the challenges, we conduct a study focused on the problem of kernel value concentration, a phenomenon that adversely affects the runtime of quantum computers. Our findings indicate that while this issue negatively impacts quantum computer performance, it does not entirely negate the potential quantum advantage in QML for remote sensing.
    摘要 量子技术产业正在迅速扩张,为各科学领域带来大有可为的机遇。在这些新兴技术中,量子机器学习(QML)因其革新数据处理与分析方式的潜力而备受关注。本文考察了QML在遥感领域的应用:人们相信QML能够为空间数据分析提供有价值的洞见。我们审视了围绕QML在遥感中量子优势的常见观点,并指出了有待解决的开放挑战。为阐明这些挑战,我们针对核值集中(kernel value concentration)现象开展了研究——该现象会对量子计算机的运行时间产生不利影响。我们的研究结果表明,尽管这一问题会降低量子计算机的性能,但并不会完全否定QML在遥感领域的潜在量子优势。

Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference

  • paper_url: http://arxiv.org/abs/2311.07625
  • repo_url: None
  • paper_authors: Rishav Mukherji, Mark Schöne, Khaleelulla Khan Nazeer, Christian Mayr, Anand Subramoney
  • for: 这篇论文主要探讨深度学习模型中的活动稀疏(activity sparsity),以及将其移植到神经形态计算设备上的可能性。
  • methods: 本研究使用一个专为活动稀疏设计的基于GRU的循环神经网络模型,并考察活动稀疏与参数稀疏(权重稀疏)之间的相互作用。
  • results: 在 Penn Treebank 语言建模任务上,研究在保持困惑度(perplexity)低于 $60$ 的同时实现了最高 $20\times$ 的计算量削减;且活动稀疏与参数稀疏的收益可以相乘叠加。
    Abstract Artificial neural networks open up unprecedented machine learning capabilities at the cost of ever growing computational requirements. Sparsifying the parameters, often achieved through weight pruning, has been identified as a powerful technique to compress the number of model parameters and reduce the computational operations of neural networks. Yet, sparse activations, while omnipresent in both biological neural networks and deep learning systems, have not been fully utilized as a compression technique in deep learning. Moreover, the interaction between sparse activations and weight pruning is not fully understood. In this work, we demonstrate that activity sparsity can compose multiplicatively with parameter sparsity in a recurrent neural network model based on the GRU that is designed to be activity sparse. We achieve up to $20\times$ reduction of computation while maintaining perplexities below $60$ on the Penn Treebank language modeling task. This magnitude of reduction has not been achieved previously with solely sparsely connected LSTMs, and the language modeling performance of our model has not been achieved previously with any sparsely activated recurrent neural networks or spiking neural networks. Neuromorphic computing devices are especially good at taking advantage of the dynamic activity sparsity, and our results provide strong evidence that making deep learning models activity sparse and porting them to neuromorphic devices can be a viable strategy that does not compromise on task performance. Our results also drive further convergence of methods from deep learning and neuromorphic computing for efficient machine learning.
    摘要 人工神经网络带来了前所未有的机器学习能力,但其计算需求也在不断增长。参数稀疏化(通常通过权重剪枝实现)已被证明是压缩模型参数数量、减少神经网络计算操作的有力技术。然而,活动稀疏虽然普遍存在于生物神经网络和深度学习系统中,却尚未被充分用作深度学习中的压缩手段;活动稀疏与权重剪枝之间的相互作用也尚不明确。在这项工作中,我们证明在一个专为活动稀疏设计的基于GRU的循环神经网络中,活动稀疏可以与参数稀疏相乘叠加。在 Penn Treebank 语言建模任务上,我们在保持困惑度低于 $60$ 的同时实现了最高 $20\times$ 的计算量削减。此前仅靠稀疏连接的LSTM从未达到过这一量级的削减,而此前任何稀疏激活的循环神经网络或脉冲神经网络也从未达到过我们模型的语言建模性能。神经形态计算设备尤其擅长利用动态的活动稀疏,我们的结果有力地表明:将深度学习模型活动稀疏化并移植到神经形态设备上,是一条不牺牲任务性能的可行路线。我们的结果也推动了深度学习与神经形态计算方法在高效机器学习上的进一步融合。
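
A toy sketch of one way to realize activity sparsity in a GRU (thresholding hidden-state changes, in the spirit of delta networks); the weights and threshold are placeholders and this is not the paper's model:

```python
# One GRU step that only propagates hidden-state changes above a threshold.
import numpy as np

def sparse_gru_step(x, h_prev, params, theta=0.05):
    """Zero deltas let hardware skip the matching weight columns at the next
    step, which is how activity sparsity composes with weight sparsity."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(Wz @ x + Uz @ h_prev)
    r = sig(Wr @ x + Ur @ h_prev)
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))
    h_dense = (1.0 - z) * h_prev + z * h_tilde
    delta = h_dense - h_prev
    delta[np.abs(delta) < theta] = 0.0       # suppress sub-threshold activity
    return h_prev + delta, float(np.mean(delta == 0.0))

rng = np.random.default_rng(0)
D, H = 8, 32
params = [rng.normal(scale=0.3, size=s) for s in
          [(H, D), (H, H), (H, D), (H, H), (H, D), (H, H)]]
h, x = np.zeros(H), rng.normal(size=D)
h, sparsity = sparse_gru_step(x, h, params)
print(f"fraction of skipped hidden updates: {sparsity:.2f}")
```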

Learning Symmetrization for Equivariance with Orbit Distance Minimization

  • paper_url: http://arxiv.org/abs/2311.07143
  • repo_url: https://github.com/tiendatnguyen-vision/orbit-symmetrize
  • paper_authors: Tien Dat Nguyen, Jinwoo Kim, Hongseok Yang, Seunghoon Hong
  • for: 将任意神经网络架构变换为具有给定群的对称性和等变性。
  • methods: 在 Kim et al. (2023) 与 Kaba et al. (2023) 的对称化方案基础上加以改进:用一个其损失直观度量群轨道间距离的优化过程,取代将神经特征转换为群表示的步骤,从而扩大适用范围。
  • results: 在SO(2)图像分类任务和O(1, 3)任务上实验表明,我们的方法具有竞争力和更广泛的通用性。实现将于https://github.com/tiendatnguyen-vision/Orbit-symmetrize 上公开。
    Abstract We present a general framework for symmetrizing an arbitrary neural-network architecture and making it equivariant with respect to a given group. We build upon the proposals of Kim et al. (2023); Kaba et al. (2023) for symmetrization, and improve them by replacing their conversion of neural features into group representations, with an optimization whose loss intuitively measures the distance between group orbits. This change makes our approach applicable to a broader range of matrix groups, such as the Lorentz group O(1, 3), than these two proposals. We experimentally show our method's competitiveness on the SO(2) image classification task, and also its increased generality on the task with O(1, 3). Our implementation will be made accessible at https://github.com/tiendatnguyen-vision/Orbit-symmetrize.
    摘要 我们提出一个通用框架,可将任意神经网络架构对称化,使其对给定群具有等变性。我们在 Kim et al. (2023) 与 Kaba et al. (2023) 的对称化方案基础上加以改进:用一个优化过程取代其将神经特征转换为群表示的步骤,其损失函数直观地度量群轨道之间的距离。这一改动使我们的方法能够应用于比上述两种方案更广泛的矩阵群,例如 Lorentz 群 O(1, 3)。我们在 SO(2) 图像分类任务上实验验证了方法的竞争力,并在 O(1, 3) 任务上展示了其更广的通用性。我们的实现将在 https://github.com/tiendatnguyen-vision/Orbit-symmetrize 公开。
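
For intuition, here is a sketch of the group-averaging symmetrization idea this line of work builds on, using a discretized SO(2); the base function and sampling are illustrative assumptions, and the paper's orbit-distance optimization itself is not shown:

```python
# Make an arbitrary function equivariant by averaging over group samples.
import numpy as np

def rotate(points, angle):
    c, s = np.cos(angle), np.sin(angle)
    return points @ np.array([[c, -s], [s, c]]).T

def symmetrize(f, points, n_angles=36):
    """Average g^{-1} f(g x) over group samples g: the result is equivariant
    under the sampled rotations for any base function f on 2D point sets."""
    out = np.zeros_like(f(points))
    for k in range(n_angles):
        a = 2 * np.pi * k / n_angles
        out += rotate(f(rotate(points, a)), -a)   # g^{-1} f(g x)
    return out / n_angles

# Example base function: a deliberately non-equivariant "network".
f = lambda pts: np.tanh(pts @ np.array([[1.0, 2.0], [0.5, -1.0]]))
x = np.random.randn(8, 2)
y1 = rotate(symmetrize(f, x), np.pi / 3)          # 60 deg is in the sampled group
y2 = symmetrize(f, rotate(x, np.pi / 3))
print(np.allclose(y1, y2, atol=1e-6))             # True up to float error
```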

SABAF: Removing Strong Attribute Bias from Neural Networks with Adversarial Filtering

  • paper_url: http://arxiv.org/abs/2311.07141
  • repo_url: None
  • paper_authors: Jiazhi Li, Mahyar Khayatkhoei, Jiageng Zhu, Hanchen Xie, Mohamed E. Hussein, Wael AbdAlmageed
  • for: 确保神经网络不依赖受保护属性(如种族、性别、年龄)进行预测,是推进公平可信AI的关键。虽然已有许多有效的属性偏差消除方法被提出,但其局限性尚未得到充分探讨。为此,本工作从数学和实验两方面揭示了现有属性偏差消除方法在强偏差情形下的局限性,并提出一种能够缓解该局限的新方法。
  • methods: 我们首先推导出一个通用的、非平凡的信息论性能上界,表明现有属性偏差消除方法仅在数据集固有偏差相对较弱时有效;随后推导出任何能够在任意偏差强度下消除属性偏差的方法所必须满足的必要条件。受此条件启发,我们提出一种新方法:利用对抗目标直接在输入空间中滤除受保护属性,同时最大程度保留其他所有属性,且无需任何特定目标标签。
  • results: 我们在合成、图像和人口普查数据集上进行了大量实验,验证了所推导的理论上界及其实践意义,并证明了所提方法在消除强属性偏差方面的有效性;该方法在强偏差与中等偏差设置下均取得了最先进的性能。
    Abstract Ensuring a neural network is not relying on protected attributes (e.g., race, sex, age) for prediction is crucial in advancing fair and trustworthy AI. While several promising methods for removing attribute bias in neural networks have been proposed, their limitations remain under-explored. To that end, in this work, we mathematically and empirically reveal the limitation of existing attribute bias removal methods in presence of strong bias and propose a new method that can mitigate this limitation. Specifically, we first derive a general non-vacuous information-theoretical upper bound on the performance of any attribute bias removal method in terms of the bias strength, revealing that they are effective only when the inherent bias in the dataset is relatively weak. Next, we derive a necessary condition for the existence of any method that can remove attribute bias regardless of the bias strength. Inspired by this condition, we then propose a new method using an adversarial objective that directly filters out protected attributes in the input space while maximally preserving all other attributes, without requiring any specific target label. The proposed method achieves state-of-the-art performance in both strong and moderate bias settings. We provide extensive experiments on synthetic, image, and census datasets, to verify the derived theoretical bound and its consequences in practice, and evaluate the effectiveness of the proposed method in removing strong attribute bias.
    摘要 确保神经网络不依赖受保护属性(如种族、性别、年龄)进行预测,对推进公平可信的AI至关重要。我们首先推导出任意属性偏差消除方法的性能随偏差强度变化的信息论上界,表明这类方法仅在数据集固有偏差相对较弱时有效;随后推导出任何能够在任意偏差强度下消除属性偏差的方法所必须满足的必要条件。受此条件启发,我们提出一种新方法:利用对抗目标直接在输入空间中滤除受保护属性,同时最大程度保留其他所有属性,且无需任何特定目标标签。所提方法在强偏差与中等偏差设置下均取得了最先进的性能。我们在合成、图像和人口普查数据集上进行了大量实验,以验证所推导的理论上界及其实践意义,并评估了所提方法在消除强属性偏差方面的有效性。
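
A compact sketch of the generic adversarial-filtering pattern (an input-space filter trained against an attribute adversary via gradient reversal); dimensions, losses, and the training loop are invented placeholders rather than SABAF's exact objective:

```python
# Filter reconstructs the input while an adversary tries to read the protected
# attribute from the filtered output; gradient reversal makes the filter
# suppress that attribute.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()
    @staticmethod
    def backward(ctx, grad):
        return -grad                       # flip gradients flowing to the filter

filt = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))
adv = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam(list(filt.parameters()) + list(adv.parameters()), lr=1e-3)

x = torch.randn(128, 32)                   # toy inputs
a = torch.randint(0, 2, (128,))            # toy protected-attribute labels

for _ in range(100):
    z = filt(x)
    recon_loss = nn.functional.mse_loss(z, x)   # preserve all other attributes
    adv_loss = nn.functional.cross_entropy(adv(GradReverse.apply(z)), a)
    (recon_loss + adv_loss).backward()     # reversal: filter *maximizes* adv_loss
    opt.step()
    opt.zero_grad()
```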

How to Do Machine Learning with Small Data? – A Review from an Industrial Perspective

  • paper_url: http://arxiv.org/abs/2311.07126
  • repo_url: None
  • paper_authors: Ivan Kraljevski, Yong Chul Ju, Dmitrij Ivanov, Constanze Tschöpe, Matthias Wolff
  • for: 本研究旨在从工业视角探讨小数据情形下机器学习的应用。
  • methods: 本文引入了一种机器学习的形式化表述,并对小数据的定义、工业应用场景与相应的机器学习方法作了简要介绍。
  • results: 本文阐述了小数据工业应用中机器学习面临的五个关键挑战,并概述了领域表示与数据采集方面的考量。
    Abstract Artificial intelligence experienced a technological breakthrough in science, industry, and everyday life in the recent few decades. The advancements can be credited to the ever-increasing availability and miniaturization of computational resources that resulted in exponential data growth. However, because of the insufficient amount of data in some cases, employing machine learning in solving complex tasks is not straightforward or even possible. As a result, machine learning with small data experiences rising importance in data science and application in several fields. The authors focus on interpreting the general term of "small data" and their engineering and industrial application role. They give a brief overview of the most important industrial applications of machine learning and small data. Small data is defined in terms of various characteristics compared to big data, and a machine learning formalism was introduced. Five critical challenges of machine learning with small data in industrial applications are presented: unlabeled data, imbalanced data, missing data, insufficient data, and rare events. Based on those definitions, an overview of the considerations in domain representation and data acquisition is given along with a taxonomy of machine learning approaches in the context of small data.
    摘要 近几十年来,人工智能在科学、工业和日常生活中经历了技术突破。这些进步归功于计算资源可用性的不断提升与小型化,由此带来了数据的指数级增长。然而,在某些情况下,由于数据量不足,运用机器学习求解复杂任务并不容易,甚至不可行。因此,小数据机器学习在数据科学及多个应用领域中日益重要。作者聚焦于阐释“小数据”这一宽泛概念及其在工程与工业应用中的角色,并简要综述了机器学习与小数据最重要的工业应用;文中从与大数据对比的多种特征出发对小数据进行了定义,并引入了机器学习的形式化表述。本文给出了小数据工业应用中机器学习面临的五个关键挑战:无标签数据、不平衡数据、缺失数据、数据不足与罕见事件。基于上述定义,本文还概述了领域表示与数据采集方面的考量,并给出了小数据情形下机器学习方法的分类体系。

Novel models for fatigue life prediction under wideband random loads based on machine learning

  • paper_url: http://arxiv.org/abs/2311.07114
  • repo_url: None
  • paper_authors: Hong Sun, Yuanying Qiu, Jing Li, Jin Bai, Ming Peng
  • for: 预测宽带随机载荷下的疲劳寿命
  • methods: 基于三种机器学习模型——支持向量机(SVM)、高斯过程回归(GPR)和人工神经网络(ANN)——建立三种宽带疲劳寿命预测模型,并利用大量不同带宽参数的功率谱样本及多种与疲劳寿命相关的材料属性来增强模型的泛化能力。
  • results: 与传统频域模型相比,新建立的机器学习模型具有更高的寿命预测精度,其中人工神经网络模型在三种机器学习模型中综合表现最佳。
    Abstract Machine learning as a data-driven solution has been widely applied in the field of fatigue lifetime prediction. In this paper, three models for wideband fatigue life prediction are built based on three machine learning models, i.e. support vector machine (SVM), Gaussian process regression (GPR) and artificial neural network (ANN). The generalization ability of the models is enhanced by employing numerous power spectra samples with different bandwidth parameters and a variety of material properties related to fatigue life. Sufficient Monte Carlo numerical simulations demonstrate that the newly developed machine learning models are superior to the traditional frequency-domain models in terms of life prediction accuracy and the ANN model has the best overall performance among the three developed machine learning models.
    摘要 机器学习作为一种数据驱动方案,已广泛应用于疲劳寿命预测领域。本文基于三种机器学习模型——支持向量机(SVM)、高斯过程回归(GPR)和人工神经网络(ANN)——建立了三种宽带疲劳寿命预测模型,并通过采用大量具有不同带宽参数的功率谱样本以及多种与疲劳寿命相关的材料属性来增强模型的泛化能力。充分的蒙特卡洛数值模拟表明,新建立的机器学习模型在寿命预测精度上优于传统频域模型,且人工神经网络模型在三种机器学习模型中综合表现最佳。
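
A small sketch of fitting a GPR model of the kind mentioned above with scikit-learn; the features, synthetic targets, and kernel are made-up stand-ins, not the paper's data or setup:

```python
# Map toy load-spectrum features to (log) fatigue life with a GPR.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)
# Toy features per spectrum: [bandwidth parameter, RMS stress, spectral peak freq]
X = rng.uniform([0.1, 50.0, 1.0], [0.9, 200.0, 20.0], size=(200, 3))
# Toy target: log10 of cycles to failure (synthetic stand-in for simulations)
y = 8.0 - 0.02 * X[:, 1] + 1.5 * X[:, 0] + rng.normal(0, 0.1, 200)

gpr = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), alpha=1e-2,
                               normalize_y=True)
gpr.fit(X[:150], y[:150])
pred, std = gpr.predict(X[150:], return_std=True)   # mean and uncertainty
rmse = np.sqrt(np.mean((pred - y[150:]) ** 2))
print(f"hold-out RMSE (log10 cycles): {rmse:.3f}")
```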

Adversarial Purification for Data-Driven Power System Event Classifiers with Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.07110
  • repo_url: None
  • paper_authors: Yuanbin Cheng, Koji Yamashita, Jim Follum, Nanpeng Yu
  • for: 本研究旨在提出一种有效的防御策略,以抵御针对基于PMU数据的机器学习电力系统事件分类器的对抗攻击。
  • methods: 该方法包括两步:首先,向PMU数据中注入噪声;其次,使用预训练的神经网络去除所注入的噪声,同时消除对抗攻击引入的扰动。
  • results: 实验结果表明,所提基于扩散模型的防御策略能够在满足实时运行需求的同时,提升事件分类器在对抗攻击下的准确率;理论分析进一步表明,该方法可缩小原始PMU数据与被篡改数据之间的距离,从而削弱对抗攻击的影响。
    Abstract The global deployment of the phasor measurement units (PMUs) enables real-time monitoring of the power system, which has stimulated considerable research into machine learning-based models for event detection and classification. However, recent studies reveal that machine learning-based methods are vulnerable to adversarial attacks, which can fool the event classifiers by adding small perturbations to the raw PMU data. To mitigate the threats posed by adversarial attacks, research on defense strategies is urgently needed. This paper proposes an effective adversarial purification method based on the diffusion model to counter adversarial attacks on the machine learning-based power system event classifier. The proposed method includes two steps: injecting noise into the PMU data; and utilizing a pre-trained neural network to eliminate the added noise while simultaneously removing perturbations introduced by the adversarial attacks. The proposed adversarial purification method significantly increases the accuracy of the event classifier under adversarial attacks while satisfying the requirements of real-time operations. In addition, the theoretical analysis reveals that the proposed diffusion model-based adversarial purification method decreases the distance between the original and compromised PMU data, which reduces the impacts of adversarial attacks. The empirical results on a large-scale real-world PMU dataset validate the effectiveness and computational efficiency of the proposed adversarial purification method.
    摘要 相量测量单元(PMU)在全球范围内的部署使电力系统的实时监测成为可能,也激发了大量基于机器学习的事件检测与分类模型研究。然而,近期研究表明,基于机器学习的方法易受对抗攻击:只需在原始PMU数据上添加微小扰动即可欺骗事件分类器。为缓解对抗攻击带来的威胁,亟需开展防御策略研究。本文提出一种基于扩散模型的有效对抗净化方法,用于保护基于机器学习的电力系统事件分类器。该方法包括两个步骤:首先向PMU数据注入噪声;然后利用预训练的神经网络去除所注入的噪声,同时消除对抗攻击引入的扰动。所提对抗净化方法在满足实时运行要求的同时,显著提高了事件分类器在对抗攻击下的准确率。理论分析进一步表明,该方法缩小了原始PMU数据与被篡改数据之间的距离,从而削弱了对抗攻击的影响。在大规模真实PMU数据集上的实证结果验证了所提方法的有效性与计算效率。
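
A minimal sketch of the noise-then-denoise purification pattern; the `denoiser` below is an untrained placeholder standing in for a pretrained diffusion-based network:

```python
# Inject noise to drown small adversarial perturbations, then denoise.
import torch
import torch.nn as nn

denoiser = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))

def purify(x_adv: torch.Tensor, sigma: float = 0.1, steps: int = 3) -> torch.Tensor:
    """A few noise/denoise repeats approximate a short forward/reverse
    diffusion over a measurement window."""
    x = x_adv
    for _ in range(steps):
        x = x + sigma * torch.randn_like(x)   # forward: noise injection
        with torch.no_grad():
            x = denoiser(x)                   # reverse: learned denoising
    return x

window = torch.randn(1, 64)                   # toy PMU measurement window
clean_estimate = purify(window)
```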

Exposition on over-squashing problem on GNNs: Current Methods, Benchmarks and Challenges

  • paper_url: http://arxiv.org/abs/2311.07073
  • repo_url: None
  • paper_authors: Dai Shi, Andi Han, Lequan Lin, Yi Guo, Junbin Gao
  • for: 本研究旨在综述基于图的消息传递神经网络(MPNN)中的过挤压(over-squashing,OSQ)问题,包括OSQ的不同形式化表述、求解OSQ的方法,以及OSQ与表达能力之间的关系。
  • methods: 本研究总结了现有文献中OSQ问题的不同形式化表述与求解思路,并将相关方法归纳为三类:1) 节点特征变换;2) 消息传递机制设计;3) 图结构设计。
  • results: 本研究梳理了现有工作中用于验证OSQ缓解方案有效性的实证手段及其计算复杂度,并提出了若干有待探索的开放问题与可能的研究方向。
    Abstract Graph-based message-passing neural networks (MPNNs) have achieved remarkable success in both node and graph-level learning tasks. However, several identified problems, including over-smoothing (OSM), limited expressive power, and over-squashing (OSQ), still limit the performance of MPNNs. In particular, OSQ serves as the latest identified problem, where MPNNs gradually lose their learning accuracy when long-range dependencies between graph nodes are required. In this work, we provide an exposition on the OSQ problem by summarizing different formulations of OSQ from current literature, as well as the three different categories of approaches for addressing the OSQ problem. In addition, we also discuss the alignment between OSQ and expressive power and the trade-off between OSQ and OSM. Furthermore, we summarize the empirical methods leveraged from existing works to verify the efficiency of OSQ mitigation approaches, with illustrations of their computational complexities. Lastly, we list some open questions that are of interest for further exploration of the OSQ problem along with potential directions from the best of our knowledge.
    摘要 基于图的消息传递神经网络(MPNN)在节点级与图级学习任务中均取得了优异成绩。然而,若干已知问题,包括过平滑(OSM)、有限的表达能力以及过挤压(OSQ),仍然制约着MPNN的性能。其中OSQ是最新发现的问题:当需要刻画图节点间的长程依赖时,MPNN的学习精度会逐渐下降。在本工作中,我们对OSQ问题进行综述,总结了现有文献中OSQ的不同形式化表述,以及解决OSQ问题的三类方法;并讨论了OSQ与表达能力之间的关系,以及OSQ与OSM之间的权衡。此外,我们还总结了现有工作中用于验证OSQ缓解方法有效性的实证手段及其计算复杂度。最后,我们列出了一些值得进一步探索的开放问题及可能的研究方向。

To Transformers and Beyond: Large Language Models for the Genome

  • paper_url: http://arxiv.org/abs/2311.07621
  • repo_url: None
  • paper_authors: Micaela E. Consens, Cameron Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan Moses, Bo Wang
  • for: 本文主要是为了介绍Large Language Models(LLMs)在 genomics 中的应用,以及这些模型在计算生物学和计算机科学领域的转型作用。
  • methods: 本文主要采用 transformer 架构和其他 LLMs 进行 genomics 数据的分析和模型化。
  • results: 本文综述了基于 transformer 架构的基因组建模方法,讨论其优势与局限,并展望超越 transformer 架构的未来发展方向。
    Abstract In the rapidly evolving landscape of genomics, deep learning has emerged as a useful tool for tackling complex computational challenges. This review focuses on the transformative role of Large Language Models (LLMs), which are mostly based on the transformer architecture, in genomics. Building on the foundation of traditional convolutional neural networks and recurrent neural networks, we explore both the strengths and limitations of transformers and other LLMs for genomics. Additionally, we contemplate the future of genomic modeling beyond the transformer architecture based on current trends in research. The paper aims to serve as a guide for computational biologists and computer scientists interested in LLMs for genomic data. We hope the paper can also serve as an educational introduction and discussion for biologists to a fundamental shift in how we will be analyzing genomic data in the future.
    摘要 在快速发展的基因组学领域,深度学习已成为应对复杂计算挑战的有用工具。本综述聚焦于(主要基于 transformer 架构的)大语言模型(LLMs)在基因组学中的变革性作用。在传统卷积神经网络与循环神经网络的基础上,我们探讨了 transformer 及其他 LLMs 在基因组学中的优势与局限;并基于当前研究趋势,展望了超越 transformer 架构的基因组建模的未来。本文旨在为关注基因组数据 LLMs 的计算生物学家和计算机科学家提供指南,同时也希望作为面向生物学家的入门介绍与讨论,帮助其了解未来基因组数据分析方式即将发生的根本转变。

A PAC-Bayesian Perspective on the Interpolating Information Criterion

  • paper_url: http://arxiv.org/abs/2311.07013
  • repo_url: None
  • paper_authors: Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney
  • for: 本文旨在解决深度学习中的理论实践差距问题,即理论不能提供实践中有用的指导。
  • methods: 本文使用Interpolating Information Criterion(IIC)来研究过参数化模型的性能。
  • results: 基于IIC,作者为一类一般模型推导出PAC-Bayes界,刻画过参数化模型在插值(interpolating)区间中的泛化性能;并由该界量化了各因素对泛化的影响,包括模型、优化器与参数初始化方案组合所施加的隐式正则化的质量、经验神经正切核的谱、损失地形的曲率,以及数据中的噪声。
    Abstract Deep learning is renowned for its theory-practice gap, whereby principled theory typically fails to provide much beneficial guidance for implementation in practice. This has been highlighted recently by the benign overfitting phenomenon: when neural networks become sufficiently large to interpolate the dataset perfectly, model performance appears to improve with increasing model size, in apparent contradiction with the well-known bias-variance tradeoff. While such phenomena have proven challenging to theoretically study for general models, the recently proposed Interpolating Information Criterion (IIC) provides a valuable theoretical framework to examine performance for overparameterized models. Using the IIC, a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence generalization performance in the interpolating regime. From the provided bound, we quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, optimizer, and parameter-initialization scheme; the spectrum of the empirical neural tangent kernel; curvature of the loss landscape; and noise present in the data.
    摘要 深度学习以其理论与实践之间的鸿沟著称:严谨的理论往往难以为实践提供有益的指导。近期的良性过拟合现象便凸显了这一点:当神经网络大到足以完美插值数据集时,模型性能似乎随模型规模增大而提升,与众所周知的偏差-方差权衡表面上相矛盾。对一般模型而言,这类现象的理论研究颇具挑战,而最近提出的插值信息准则(IIC)为考察过参数化模型的性能提供了有价值的理论框架。基于IIC,我们为一类一般模型推导出PAC-Bayes界,刻画了在插值区间内影响泛化性能的因素。由该界出发,我们量化了实现近零训练误差的过参数化模型的测试误差如何取决于:模型、优化器与参数初始化方案组合所施加的隐式正则化的质量;经验神经正切核的谱;损失地形的曲率;以及数据中的噪声。

eess.SP - 2023-11-13

  • paper_url: http://arxiv.org/abs/2311.07722
  • repo_url: None
  • paper_authors: Haochen Li, Zhaolin Wang, Xidong Mu, Zhiwen Pan, Yuanwei Liu
  • for: 提出一种近场一体化感知、定位与通信(ISPAC)框架,基站(BS)在同时服务多个通信用户的同时,完成目标感知与定位。
  • methods: 提出一种新的双阵列结构以在BS端实现近场ISPAC:在大规模主收发机(MT)上附加一个小规模辅助收发机(AT),使通信系统具备感知与定位能力。
  • results: 基于所提框架,首次推导出联合角度-距离的克拉美-罗界(CRB),并在下行与上行ISPAC场景中在最低通信速率约束下将其最小化。数值结果表明:1)所提ISPAC系统仅依靠单个BS和有限带宽即可在角度与距离两个域中定位目标;2)在通信速率要求不苛刻时,混合模拟-数字ISPAC方法可达到与全数字ISPAC相当的定位性能。
    Abstract A near-field integrated sensing, positioning, and communication (ISPAC) framework is proposed, where a base station (BS) simultaneously serves multiple communication users and carries out target sensing and positioning. A novel double-array structure is proposed to enable the near-field ISPAC at the BS. Specifically, a small-scale assisting transceiver (AT) is attached to the large-scale main transceiver (MT) to empower the communication system with the ability of sensing and positioning. Based on the proposed framework, the joint angle and distance Cram\'er-Rao bound (CRB) is first derived. Then, the CRB is minimized subject to the minimum communication rate requirement in both downlink and uplink ISPAC scenarios: 1) For downlink ISPAC, a downlink target positioning algorithm is proposed and a penalty dual decomposition (PDD)-based double-loop algorithm is developed to tackle the non-convex optimization problem. 2) For uplink ISPAC, an uplink target positioning algorithm is proposed and an efficient alternating optimization algorithm is conceived to solve the non-convex CRB minimization problem with coupled user communication and target probing design. Both proposed optimization algorithms can converge to a stationary point of the CRB minimization problem. Numerical results show that: 1) The proposed ISPAC system can locate the target in both angle and distance domains merely relying on single BS and limited bandwidths; and 2) the positioning performance achieved by the hybrid-analog-and-digital ISPAC approaches that achieved by fully digital ISPAC when the communication rate requirement is not stringent.
    摘要 提出一种近场一体化感知、定位与通信(ISPAC)框架:基站(BS)在同时服务多个通信用户的同时,通过无设备无线感知主动探测目标,并将相关信息传送给终端用户。为在BS端实现近场ISPAC,提出一种新的双阵列结构:在大规模主收发机(MT)上附加一个小规模辅助收发机(AT),使通信系统具备感知与定位能力。基于所提框架,首先推导出联合角度-距离的克拉美-罗界(CRB),随后在下行与上行ISPAC两种场景中、在满足最低通信速率要求的约束下将CRB最小化:1)针对下行ISPAC,提出一种下行目标定位算法,并开发基于罚对偶分解(PDD)的双层循环算法来处理该非凸优化问题;2)针对上行ISPAC,提出一种上行目标定位算法,并设计一种高效的交替优化算法,用于求解用户通信与目标探测设计相互耦合的非凸CRB最小化问题。两种优化算法均可收敛到CRB最小化问题的驻点。数值结果表明:1)所提ISPAC系统仅依靠单个BS和有限带宽即可在角度与距离两个域中定位目标;2)在通信速率要求不苛刻时,混合模拟-数字ISPAC方法可达到与全数字ISPAC相当的定位性能。

Vertiport Navigation Requirements and Multisensor Architecture Considerations for Urban Air Mobility

  • paper_url: http://arxiv.org/abs/2311.07487
  • repo_url: None
  • paper_authors: Omar Garcia Crespillo, Chen Zhu, Maximilian Simonetti, Daniel Gerbeth, Young-Hee Lee, Wenhan Hao
  • for: The paper discusses the navigation technologies required for the future safe operation of drones in urban environments, specifically precision approach operations based on vertiport designs.
  • methods: The paper proposes a possible multisensor navigation architecture solution to support these operations, which includes an overview of the challenges of each subsystem and initial proof of concept based on flight trials.
  • results: The paper presents initial proof of concept for some navigation sensor subsystems based on flight trials performed during the German Aerospace Center (DLR) project HorizonUAM.
    Abstract Communication, Navigation and Surveillance (CNS) technologies are key enablers for future safe operation of drones in urban environments. However, the design of navigation technologies for these new applications is more challenging compared to e.g., civil aviation. On the one hand, the use cases and operations in urban environments are expected to have stringent requirements in terms of accuracy, integrity, continuity and availability. On the other hand, airborne sensors may not be based on high-quality equipment as in civil aviation and solutions need to rely on tighter multisensor solutions, whose safety is difficult to assess. In this work, we first provide some initial navigation requirements related to precision approach operations based on recently proposed vertiport designs. Then, we provide an overview of a possible multisensor navigation architecture solution able to support these types of operations and we comment on the challenges of each of the subsystems. Finally, initial proof of concept for some navigation sensor subsystems is presented based on flight trials performed during the German Aerospace Center (DLR) project HorizonUAM.
    摘要 通信、导航与监视(CNS)技术是未来无人机在城市环境中安全运行的关键使能因素。然而,与民航等领域相比,面向这些新应用的导航技术设计更具挑战性:一方面,城市环境中的用例与运行预计在精度、完好性、连续性与可用性方面有严格要求;另一方面,机载传感器未必能像民航那样采用高品质设备,解决方案需要依赖更紧密耦合的多传感器方案,而其安全性难以评估。在本工作中,我们首先结合近期提出的垂直起降场(vertiport)设计,给出与精密进近运行相关的若干初步导航要求;随后概述一种能够支撑此类运行的多传感器导航架构方案,并讨论各子系统面临的挑战;最后,基于德国宇航中心(DLR)HorizonUAM项目期间开展的飞行试验,给出了部分导航传感器子系统的初步概念验证。

Near-Field Sparse Channel Estimation for Extremely Large-Scale RIS-Aided Wireless Communications

  • paper_url: http://arxiv.org/abs/2311.07249
  • repo_url: None
  • paper_authors: Zixing Tang, Yuanbin Chen, Ying Wang, Tianqi Mao, Qingqing Wu, Marco Di Renzo, Lajos Hanzo
  • for: 该研究旨在改进超大规模智能反射面(XL-RIS)辅助无线通信系统中的信道估计。
  • methods: 该研究利用级联信道固有的稀疏性,提出一种两阶段信道估计方法:第一阶段估计BS与XL-RIS之间公共路径对应的显著角度与距离;第二阶段恢复与不同用户相关联的各自路径参数,且每个阶段均采用不同的基于学习的网络进行信道训练。
  • results: 仿真结果表明,所提两阶段基于学习的信道估计方法优于现有的最新基准方案。
    Abstract A significant increase in the number of reconfigurable intelligent surface (RIS) elements results in a spherical wavefront in the near field of extremely large-scale RIS (XL-RIS). Although the channel matrix of the cascaded two-hop link may become sparse in the polar-domain representation, their accurate estimation of these polar-domain parameters cannot be readily guaranteed. To tackle this challenge, we exploit the sparsity inherent in the cascaded channel. To elaborate, we first estimate the significant path-angles and distances corresponding to the common paths between the BS and the XL-RIS. Then, the individual path parameters associated with different users are recovered. This results in a two-stage channel estimation scheme, in which distinct learning-based networks are used for channel training at each stage. More explicitly, in stage I, a denoising convolutional neural network (DnCNN) is employed for treating the grid mismatches as noise to determine the true grid index of the angles and distances. By contrast, an iterative shrinkage thresholding algorithm (ISTA) based network is proposed for adaptively adjusting the column coherence of the dictionary matrix in stage II. Finally, our simulation results demonstrate that the proposed two-stage learning-based channel estimation outperforms the state-of-the-art benchmarks.
    摘要 可重构智能表面(RIS)元件数量的大幅增加,使超大规模RIS(XL-RIS)的近场呈现球面波前。尽管级联双跳链路的信道矩阵在极坐标域表示下可能变得稀疏,但这些极坐标域参数的精确估计并不容易保证。为应对这一挑战,我们利用级联信道固有的稀疏性:首先估计BS与XL-RIS之间公共路径对应的显著路径角度与距离,然后恢复与不同用户相关联的各自路径参数,由此形成一种两阶段信道估计方案,并在每个阶段采用不同的基于学习的网络进行信道训练。具体而言,第一阶段采用去噪卷积神经网络(DnCNN),将网格失配视作噪声,以确定角度与距离的真实网格索引;第二阶段则提出一种基于迭代收缩阈值算法(ISTA)的网络,自适应地调整字典矩阵的列相干性。最后,仿真结果表明,所提两阶段基于学习的信道估计方案优于现有的最新基准方案。
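
For background, a sketch of plain ISTA for sparse recovery -- the classical algorithm that the stage-II learned network adapts; the dictionary and measurements are toy, real-valued stand-ins:

```python
# ISTA: recover a sparse vector h from measurements y = A h + n.
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(A, y, lam=0.05, iters=200):
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    h = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ h - y)
        h = soft_threshold(h - grad / L, lam / L)
    return h

rng = np.random.default_rng(1)
A = rng.normal(size=(64, 256)) / np.sqrt(64)   # toy dictionary (e.g., polar grid)
h_true = np.zeros(256)
h_true[rng.choice(256, 5, replace=False)] = rng.normal(size=5)
y = A @ h_true + 0.01 * rng.normal(size=64)
h_hat = ista(A, y)
print("support recovered:", np.nonzero(np.abs(h_hat) > 0.05)[0])
```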

Time-Frequency Localization Characteristics of the Delay-Doppler Plane Orthogonal Pulse

  • paper_url: http://arxiv.org/abs/2311.07238
  • repo_url: None
  • paper_authors: Akram Shafie, Jinhong Yuan, Nan Yang, Hai Lin
  • for: 这篇论文旨在探讨面向高移动性场景可靠通信的新方案——正交时延-多普勒(DD)复用(ODDM)调制。
  • methods: 作者研究了ODDM调制的原型脉冲——DD平面正交脉冲(DDOP)的时频(TF)局域化特性,该特性刻画脉冲能量在联合TF域中的集中或弥散程度。
  • results: 结果表明,得益于其能量弥散特性,DDOP有望同时利用时间与频率分集,并支持高分辨率感知;作者进一步确定了近期提出的广义DDOP设计的TF面积,数值结果表明,随着子脉冲时长增加,广义DDOP在联合TF域中的能量弥散呈阶梯式增长。
    Abstract The orthogonal delay-Doppler (DD) division multiplexing (ODDM) modulation has recently been proposed as a promising solution for ensuring reliable communications in high mobility scenarios. In this work, we investigate the time-frequency (TF) localization characteristics of the DD plane orthogonal pulse (DDOP), which is the prototype pulse of ODDM modulation. The TF localization characteristics examine how concentrated or spread out the energy of a pulse is in the joint TF domain. We first derive the TF localization metric, TF area (TFA), for the DDOP. Based on this result, we provide insights into the energy spread of the DDOP in the joint TF domain. Then, we delve into the potential advantages of the DDOP due to its energy spread, particularly in terms of leveraging both time and frequency diversities, and enabling high-resolution sensing. Furthermore, we determine the TFA for the recently proposed generalized design of the DDOP. Finally, we validate our analysis based on numerical results and show that the energy spread for the generalized design of the DDOP in the joint TF domain exhibits a step-wise increase as the duration of sub-pulses increases.
    摘要 正交时延-多普勒复用(ODDM)调制是近期提出的一种有望在高移动性场景下保障可靠通信的方案。本文研究ODDM调制的原型脉冲——DD平面正交脉冲(DDOP)的时频(TF)局域化特性,该特性考察脉冲能量在联合TF域中的集中或弥散程度。我们首先推导出DDOP的TF局域化度量——TF面积(TFA),并据此给出DDOP在联合TF域中能量弥散的洞见;随后探讨DDOP因能量弥散而具备的潜在优势,特别是在同时利用时间与频率分集以及实现高分辨率感知方面;进一步,我们确定了近期提出的广义DDOP设计的TFA;最后,通过数值结果验证了分析,并表明随着子脉冲时长增加,广义DDOP在联合TF域中的能量弥散呈阶梯式增长。
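
As a generic worked example of TF localization (not the paper's TFA metric), one can numerically compare the RMS duration, RMS bandwidth, and their product for different pulses:

```python
# Measure how concentrated a pulse is in the joint time-frequency domain.
import numpy as np

fs = 1e4                                   # sample rate (Hz), arbitrary choice
t = np.arange(-0.5, 0.5, 1 / fs)

def tf_spread(pulse):
    p = np.abs(pulse) ** 2
    p /= p.sum()
    t_rms = np.sqrt(np.sum(p * (t - np.sum(p * t)) ** 2))
    spec = np.abs(np.fft.fftshift(np.fft.fft(pulse))) ** 2
    spec /= spec.sum()
    f = np.fft.fftshift(np.fft.fftfreq(len(pulse), 1 / fs))
    f_rms = np.sqrt(np.sum(spec * (f - np.sum(spec * f)) ** 2))
    # The duration-bandwidth product is bounded below (uncertainty principle);
    # a Gaussian pulse achieves the minimum.
    return t_rms, f_rms, t_rms * f_rms

rect = (np.abs(t) < 0.05).astype(float)           # rectangular pulse
gauss = np.exp(-0.5 * (t / 0.05) ** 2)            # Gaussian pulse
for name, p in [("rect", rect), ("gauss", gauss)]:
    d, b, prod = tf_spread(p)
    print(f"{name}: RMS duration {d:.4f} s, RMS bandwidth {b:.1f} Hz, product {prod:.3f}")
```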

Cooperative Coherent Multistatic Imaging and Phase Synchronization in Networked Sensing

  • paper_url: http://arxiv.org/abs/2311.07212
  • repo_url: None
  • paper_authors: Dario Tagliaferri, Marco Manzoni, Marouan Mizmizi, Stefano Tebaldini, Andrea Virgilio Monti-Guarnieri, Claudio Maria Prati, Umberto Spagnolini
  • for: 这篇论文探讨车载雷达网络中的合作相干成像技术,以提升对微弱目标的检测可靠性与分辨能力。
  • methods: 论文利用多辆装备雷达的车辆之间的协作,提升整体感知能力,解决在强目标近旁分辨微弱目标的难题。
  • results: 论文表明,合作相干成像可显著提升微弱目标的正确检测概率与分辨能力,并通过实验展示两辆车协作即可检测到停泊车辆旁行人的双腿;同时指出,实现这一点需要极其精确的时钟同步与传感器定位(即相位同步),论文为此提出并评估了一种三步合作式多基地相位同步流程。
    Abstract Coherent multistatic radio imaging represents a pivotal opportunity for forthcoming wireless networks, which involves distributed nodes cooperating to achieve accurate sensing resolution and robustness. This paper delves into cooperative coherent imaging for vehicular radar networks. Herein, multiple radar-equipped vehicles cooperate to improve collective sensing capabilities and address the fundamental issue of distinguishing weak targets in close proximity to strong ones, a critical challenge for vulnerable road users protection. We prove the significant benefits of cooperative coherent imaging in the considered automotive scenario in terms of both probability of correct detection, evaluated considering several system parameters, as well as resolution capabilities, showcased by a dedicated experimental campaign wherein the collaboration between two vehicles enables the detection of the legs of a pedestrian close to a parked car. Moreover, as \textit{coherent} processing of several sensors' data requires very tight accuracy on clock synchronization and sensor's positioning -- referred to as \textit{phase synchronization} -- (such that to predict sensor-target distances up to a fraction of the carrier wavelength), we present a general three-step cooperative multistatic phase synchronization procedure, detailing the required information exchange among vehicles in the specific automotive radar context and assessing its feasibility and performance by hybrid Cram\'er-Rao bound.
    摘要 相干多基地无线电成像是未来无线网络的一个关键机遇,它依靠分布式节点协作以获得精确的感知分辨率与鲁棒性。本文研究车载雷达网络中的合作相干成像:多辆装备雷达的车辆相互协作,以提升整体感知能力,并解决在强目标近旁分辨微弱目标这一根本难题——这对保护弱势道路使用者至关重要。我们证明了合作相干成像在所考虑的车载场景中的显著收益:一方面体现在正确检测概率上(结合多个系统参数进行评估),另一方面体现在分辨能力上,并通过专门的实验活动加以展示——两车协作即可检测到停泊车辆旁行人的双腿。此外,由于对多个传感器数据的相干处理要求时钟同步与传感器定位达到极高精度(即相位同步,需将传感器-目标距离预测到载波波长的几分之一以内),我们提出了一种通用的三步合作式多基地相位同步流程,详细说明了车载雷达场景下车辆间所需交换的信息,并借助混合克拉美-罗界评估其可行性与性能。

Joint Computation and Communication Resource Optimization for Beyond Diagonal UAV-IRS Empowered MEC Networks

  • paper_url: http://arxiv.org/abs/2311.07199
  • repo_url: None
  • paper_authors: Asad Mahmood, Thang X. Vu, Wali Ullah Khan, Symeon Chatzinotas, Björn Ottersten
  • for: 这篇论文旨在通过充分发挥智能可重构表面(IRS)的潜力,解决超5G(B5G)网络在覆盖、容量与能效方面的难题。
  • methods: 这篇论文采用超对角IRS(Beyond Diagonal IRS,BD-IRS)技术——一种突破传统对角相移矩阵限制的新型IRS架构,具有更强的波束调控能力。
  • results: 结果显示,BD-IRS在系统时延与数据速率上均优于传统对角IRS系统:时延降低7.25%,数据速率提升17.77%。
    Abstract Intelligent Reconfigurable Surfaces (IRS) are crucial for overcoming challenges in coverage, capacity, and energy efficiency beyond 5G (B5G). The classical IRS architecture, employing a diagonal phase shift matrix, hampers effective passive beamforming manipulation. To unlock its full potential, Beyond Diagonal IRS (BD-IRS or IRS 2.0) emerges as a revolutionary member, transcending limitations of the diagonal IRS. This paper introduces BD-IRS deployed on unmanned aerial vehicles (BD-IRS-UAV) in Mobile Edge Computing (MEC) networks. Here, users offload tasks to the MEC server due to limited resources and finite battery life. The objective is to minimize worst-case system latency by optimizing BD-IRS-UAV deployment, local and edge computational resource allocation, task segmentation, power allocation, and received beamforming vector. The resulting non-convex/non-linear NP-hard optimization problem is intricate, prompting division into two subproblems: 1) BD-IRS-UAV deployment, local and edge computational resources, and task segmentation, and 2) power allocation, received beamforming, and phase shift design. Standard optimization methods efficiently solve each subproblem. Monte Carlo simulations provide numerical results, comparing the proposed BD-IRS-UAV-enabled MEC optimization framework with various benchmarks. Performance evaluations include comparisons with fully-connected and group-connected architectures, single-connected diagonal IRS, and binary offloading, edge computation, fixed computation, and local computation frameworks. Results show a 7.25% lower latency and a 17.77% improvement in data rate with BD-IRS compared to conventional diagonal IRS systems, demonstrating the effectiveness of the proposed optimization framework.
    摘要 智能可重构表面(IRS)是克服超5G(B5G)网络在覆盖、容量与能效方面挑战的关键。传统IRS架构采用对角相移矩阵,制约了无源波束成形的有效调控。为释放其全部潜力,超对角IRS(BD-IRS,或称IRS 2.0)应运而生,突破了对角IRS的局限。本文将BD-IRS部署于无人机(BD-IRS-UAV)之上,服务于移动边缘计算(MEC)网络:用户因资源受限和电池容量有限而将任务卸载至MEC服务器。目标是通过联合优化BD-IRS-UAV部署、本地与边缘计算资源分配、任务分割、功率分配及接收波束成形向量,最小化最坏情况下的系统时延。所得优化问题为非凸/非线性的NP难问题,故将其分解为两个子问题:1)BD-IRS-UAV部署、本地与边缘计算资源及任务分割;2)功率分配、接收波束成形与相移设计,并分别用标准优化方法高效求解。蒙特卡洛仿真给出了数值结果,将所提BD-IRS-UAV使能的MEC优化框架与多种基准进行比较,包括全连接与分组连接架构、单连接对角IRS,以及二元卸载、边缘计算、固定计算与本地计算框架。结果显示,相比传统对角IRS系统,BD-IRS使时延降低7.25%、数据速率提升17.77%,验证了所提优化框架的有效性。

CASTER: A Computer-Vision-Assisted Wireless Channel Simulator for Gesture Recognition

  • paper_url: http://arxiv.org/abs/2311.07169
  • repo_url: https://github.com/rzy0901/testspectrogram
  • paper_authors: Zhenyu Ren, Guoliang Li, Chenqing Ji, Chao Yu, Shuai Wang, Rui Wang
  • for: 解决无线手势识别中训练数据采集的难题。
  • methods: 采用计算机视觉辅助的模拟方法,借助现有视频生成训练数据集,并利用基于图元(primitive)的手部模型计算每个快照的信道冲激响应。
  • results: 在仿真到现实的推理中取得90.8%的平均分类精度。
    Abstract In this paper, a computer-vision-assisted simulation method is proposed to address the issue of training dataset acquisition for wireless hand gesture recognition. In the existing literature, in order to classify gestures via the wireless channel estimation, massive training samples should be measured in a consistent environment, consuming significant efforts. In the proposed CASTER simulator, however, the training dataset can be simulated via existing videos. Particularly, a gesture is represented by a sequence of snapshots, and the channel impulse response of each snapshot is calculated via tracing the rays scattered off a primitive-based hand model. Moreover, CASTER simulator relies on the existing videos to extract the motion data of gestures. Thus, the massive measurements of wireless channel can be eliminated. The experiments demonstrate a 90.8% average classification accuracy of simulation-to-reality inference.
    摘要 本文提出了一种计算机视觉辅助的模拟方法,以解决无线手势识别的训练数据采集问题。在现有文献中,若要通过无线信道估计对手势进行分类,需要在一致的环境中测量海量训练样本,耗费大量精力。而在所提出的 CASTER 模拟器中,训练数据集可借助现有视频进行仿真:一个手势被表示为一系列快照,并通过追踪经基于图元(primitive)的手部模型散射的射线来计算每个快照的信道冲激响应。此外,CASTER 模拟器利用现有视频提取手势的运动数据,因此可免去对无线信道的海量测量。实验表明,仿真到现实的推理可达到90.8%的平均分类精度。
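
A toy sketch of the snapshot-wise channel model described above: each hand "primitive" contributes a delayed, attenuated, phase-rotated impulse to the CIR; geometry, carrier, and reflectivities are invented placeholders, not CASTER's simulator:

```python
# Build a per-snapshot channel impulse response from rays off hand primitives.
import numpy as np

C = 3e8                                      # speed of light (m/s)
FC = 60e9                                    # hypothetical carrier (Hz)

def snapshot_cir(tx, rx, primitives, taps=64, tap_dt=1e-9):
    """primitives: list of (center xyz, reflectivity). Each contributes a
    delayed, attenuated, phase-rotated impulse: h(t) = sum_i a_i d(t - tau_i)."""
    h = np.zeros(taps, dtype=complex)
    for center, refl in primitives:
        d = np.linalg.norm(tx - center) + np.linalg.norm(center - rx)
        tau = d / C
        amp = refl / d**2                    # simple free-space-like decay
        phase = np.exp(-2j * np.pi * FC * tau)
        h[min(int(tau / tap_dt), taps - 1)] += amp * phase
    return h

tx = np.array([0.0, 0.0, 0.0])
rx = np.array([0.5, 0.0, 0.0])
# Toy gesture: one finger-joint primitive moving across snapshots (driven by
# video motion data in the real pipeline).
series = []
for k in range(100):
    joint = np.array([0.25, 0.3 + 0.002 * np.sin(0.3 * k), 0.1])
    series.append(snapshot_cir(tx, rx, [(joint, 1.0)]).sum())
doppler = np.abs(np.fft.fftshift(np.fft.fft(np.asarray(series))))
```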

Communication-Assisted Sensing in 6G Networks

  • paper_url: http://arxiv.org/abs/2311.07157
  • repo_url: None
  • paper_authors: Fuwang Dong, Fan Liu, Shihang Lu, Yifeng Xiong, Qixun Zhang, Zhiyong Feng
  • for: 本文旨在探讨 sixth-generation (6G) 感知网络中的整合感知通信系统的协调效果,尤其是在通信协助感知 (CAS) 过程中。
  • methods: 本文基于率失真理论与有损数据传输下的信源-信道分离定理(SCT)建立CAS框架,全面刻画失真、编码速率与信道容量之间的相互作用。
  • results: 本文提出了两种波形策略:分离式感知-通信(S&C)波形与双功能波形。对前者设计了一种简单的一维搜索算法;对后者设计了一般情形下的启发式互信息优化算法。此外,本文还揭示了子空间权衡与注水权衡的存在。最后,通过数值仿真验证了所提算法的有效性。
    Abstract The exploration of coordination gain achieved through the synergy of sensing and communication (S&C) functions plays a vital role in improving the performance of integrated sensing and communication systems. This paper focuses on the optimal waveform design for communication-assisted sensing (CAS) systems within the context of 6G perceptive networks. In the CAS process, the base station actively senses the targets through device-free wireless sensing and simultaneously transmits the pertinent information to end-users. In our research, we establish a CAS framework grounded in the principles of rate-distortion theory and the source-channel separation theorem (SCT) in lossy data transmission. This framework provides a comprehensive understanding of the interplay between distortion, coding rate, and channel capacity. The purpose of waveform design is to minimize the sensing distortion at the user end while adhering to the SCT and power budget constraints. In the context of target response matrix estimation, we propose two distinct waveform strategies: the separated S&C and dual-functional waveform schemes. In the former strategy, we develop a simple one-dimensional search algorithm, shedding light on a notable power allocation tradeoff between the S&C waveform. In the latter scheme, we conceive a heuristic mutual information optimization algorithm for the general case, alongside a modified gradient projection algorithm tailored for the scenarios with independent sensing sub-channels. Additionally, we identify the presence of both subspace tradeoff and water-filling tradeoff. Finally, we validate the effectiveness of the proposed algorithms through numerical simulations.
    摘要 探索感知与通信(S&C)功能协同带来的协调增益,对提升一体化感知通信系统的性能至关重要。本文聚焦6G感知网络背景下通信辅助感知(CAS)系统的最优波形设计。在CAS过程中,基站通过无设备无线感知主动探测目标,并同时将相关信息发送给终端用户。我们基于率失真理论与有损数据传输下的信源-信道分离定理(SCT)建立CAS框架,全面刻画失真、编码速率与信道容量之间的相互作用。波形设计的目标是在满足SCT与功率预算约束的前提下,最小化用户端的感知失真。针对目标响应矩阵估计,我们提出两种波形策略:分离式S&C波形与双功能波形。在前一种策略中,我们设计了一种简单的一维搜索算法,并揭示了S&C波形之间一个显著的功率分配权衡;在后一种策略中,我们提出了面向一般情形的启发式互信息优化算法,以及针对感知子信道相互独立场景的改进投影梯度算法。此外,我们还发现了两种权衡的存在:子空间权衡与注水权衡。最后,通过数值仿真验证了所提算法的有效性。
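
For reference, a sketch of the classical water-filling power allocation underlying the "water-filling tradeoff" mentioned above (a textbook routine, not the paper's full waveform design):

```python
# Pour a total power budget over parallel sub-channels by bisection.
import numpy as np

def water_filling(gains, p_total, tol=1e-9):
    """gains: sub-channel SNR gains g_i; maximize sum log2(1 + g_i p_i)
    s.t. sum p_i = p_total, p_i >= 0, via bisection on the water level mu."""
    lo, hi = 0.0, p_total + 1.0 / np.min(gains)
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        p = np.maximum(mu - 1.0 / gains, 0.0)
        if p.sum() > p_total:
            hi = mu
        else:
            lo = mu
    return np.maximum(mu - 1.0 / gains, 0.0)

g = np.array([2.0, 1.0, 0.25, 0.1])
p = water_filling(g, p_total=4.0)
print(p, p.sum())          # stronger sub-channels get more power
```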

Performance Analysis of Integrated Data and Energy Transfer Assisted by Fluid Antenna Systems

  • paper_url: http://arxiv.org/abs/2311.07134
  • repo_url: None
  • paper_authors: Xiao Lin, Halvin Yang, Yizhe Zhao, Jie Hu, Kai-Kit Wong
  • for: 本研究考察一种FAMA辅助的IDET系统:N个接入点(AP)面向N个用户设备(UE)提供专用IDET服务,每个UE配备一根流体天线。
  • methods: 本研究利用WDT与WET之间的时间切换(TS),从理论上分析WDT中断概率、WET中断概率、可靠吞吐量与平均能量收集量等性能指标。
  • results: 研究发现,应当对UE数量与TS比例进行优化,以在WDT与WET性能之间取得折中;且在相同天线尺寸下,FAMA辅助的IDET系统在WDT与WET两方面均优于传统MIMO系统。
    Abstract Fluid antenna multiple access (FAMA) is capable of exploiting the high spatial diversity of wireless channels to mitigate multi-user interference via flexible port switching, which achieves a better performance than traditional multi-input-multi-output (MIMO) systems. Moreover, integrated data and energy transfer (IDET) is able to provide both the wireless data transfer (WDT) and wireless energy transfer (WET) services towards low-power devices. In this paper, a FAMA assisted IDET system is studied, where $N$ access points (APs) provide dedicated IDET services towards $N$ user equipments (UEs). Each UE is equipped with a single fluid antenna. The performance of WDT and WET , \textit{i.e.}, the WDT outage probability, the WET outage probability, the reliable throughput and the average energy harvesting amount, are analysed theoretically by using time switching (TS) between WDT and WET. Numerical results validate our theoretical analysis, which reveals that the number of UEs and TS ratio should be optimized to achieve a trade-off between the WDT and WET performance. Moreover, FAMA assisted IDET achieves a better performance in terms of both WDT and WET than traditional MIMO with the same antenna size.
    摘要 流体天线多址接入(FAMA)能够利用无线信道的高空间分集,通过灵活的端口切换来抑制多用户干扰,从而取得优于传统多输入多输出(MIMO)系统的性能。同时,数据与能量一体化传输(IDET)可以面向低功耗设备同时提供无线数据传输(WDT)与无线能量传输(WET)服务。本文研究一种FAMA辅助的IDET系统:N个接入点(AP)面向N个配备单根流体天线的用户设备(UE)提供专用IDET服务。借助WDT与WET之间的时间切换(TS),我们从理论上分析了WDT中断概率、WET中断概率、可靠吞吐量与平均能量收集量等性能指标。数值结果验证了理论分析,并表明应当对UE数量与TS比例进行优化,以在WDT与WET性能之间取得折中;此外,在相同天线尺寸下,FAMA辅助的IDET在WDT与WET两方面均优于传统MIMO。
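
A toy numeric sketch of the time-switching tradeoff: sweeping the TS ratio alpha trades average throughput (WDT) against average harvested energy (WET); the channel statistics and constants below are made up:

```python
# Sweep the TS ratio to expose the WDT/WET tradeoff.
import numpy as np

P_TX, NOISE, ETA = 1.0, 1e-3, 0.5           # tx power, noise power, harvest eff.
rng = np.random.default_rng(7)
g = rng.exponential(scale=1.0, size=10000)  # toy fading gains per slot

for alpha in (0.2, 0.5, 0.8):               # fraction of a slot used for data
    rate = alpha * np.mean(np.log2(1 + P_TX * g / NOISE))      # avg bits/s/Hz
    energy = (1 - alpha) * ETA * P_TX * np.mean(g)             # avg harvested
    print(f"alpha={alpha:.1f}: throughput={rate:.2f}, energy={energy:.3f}")
```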

Multi-Point Method using Effective Demodulation and Decomposition Techniques allowing Identification of Disturbing Loads in Power Grids

  • paper_url: http://arxiv.org/abs/2311.07129
  • repo_url: None
  • paper_authors: Piotr Kuwałek, Grzegorz Wiczyński
  • for: 本研究旨在提出一种识别电压波动源的新方法,并兼顾定位问题,即指出干扰性负荷的供电点。
  • methods: 所提方法利用载波信号估计器估计调制信号(使得频率高于工频的调制信号也能够被估计),再通过脉冲波逼近分解,将其分解为与各个干扰性负荷相关联的分量信号。
  • results: 数值仿真与实验研究表明,所提方法能够准确指出干扰性负荷的供电点,并且可嵌入智能电表基础设施,实现干扰源的自动定位。
    Abstract The paper presents an innovative approach to the identification of sources of voltage fluctuations in power networks, also considering the localization understood as the indication of supply points of disturbing loads. The presented approach considers disturbance sources that change their operating state with a frequency higher than the power frequency. Implementation of the proposed solution is also proposed in such a way that its implementation in the smart meter infrastructure allows for automatic localization of disturbance sources without additional expert knowledge. In the proposed approach, the modulation signal is estimated using a carrier signal estimator, which allows for the estimation of modulation signal with a frequency higher than the power frequency. The estimated modulating signal is decomposed into component signals associated with individual disturbing loads by decomposition by approximation using pulse waves. The decomposition process allows for the estimation of selected parameters associated with disturbing loads, on the basis of which the assessment of propagation of voltage fluctuations associated with the impact of individual disturbance sources is performed, which allows for the indication of their supply point. The proposed approach was verified in numerical simulation studies using MATLAB/SIMULINK and in experimental studies carried out in a real low-voltage power grid.
    摘要 本文提出了一种识别电网中电压波动源的创新方法,并兼顾定位问题,即指出干扰性负荷的供电点。该方法考虑运行状态变化频率高于工频的干扰源,且其实现方式可嵌入智能电表基础设施,从而无需额外专家知识即可自动定位干扰源。在所提方法中,利用载波信号估计器估计调制信号,使得频率高于工频的调制信号也能够被估计;随后通过脉冲波逼近分解,将估计出的调制信号分解为与各个干扰性负荷相关联的分量信号。分解过程可以估计与干扰性负荷相关的若干参数,据此评估各干扰源引起的电压波动的传播情况,进而指出其供电点。该方法已通过MATLAB/SIMULINK数值仿真研究以及在真实低压电网中开展的实验研究得到验证。
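
A textbook-style sketch of the demodulate-then-analyze idea on a toy waveform, using a Hilbert-transform envelope as a stand-in; note that the paper's carrier-signal estimator is specifically designed to handle modulation frequencies above the power frequency, which the plain envelope below cannot:

```python
# Recover a slow modulating signal from a fluctuating voltage waveform.
import numpy as np
from scipy.signal import hilbert

fs, f_grid = 10_000, 50.0                   # sample rate and power frequency (Hz)
t = np.arange(0, 2.0, 1 / fs)
# Toy disturbing load: 8 Hz square-wave amplitude modulation of the mains voltage.
mod = 1.0 + 0.05 * np.sign(np.sin(2 * np.pi * 8.0 * t))
v = mod * np.sin(2 * np.pi * f_grid * t)

envelope = np.abs(hilbert(v))               # amplitude demodulation
spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
freqs = np.fft.rfftfreq(len(envelope), 1 / fs)
print("dominant modulation component: %.1f Hz" % freqs[np.argmax(spectrum)])
```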

Sum Rate Maximization under AoI Constraints for RIS-Assisted mmWave Communications

  • paper_url: http://arxiv.org/abs/2311.07128
  • repo_url: None
  • paper_authors: Ziqi Guo, Yong Niu, Shiwen Mao, Changming Zhang, Ning Wang, Zhangdui Zhong, Bo Ai
  • for: 本文目的是提高 millimeter wave(mmWave)通信系统中信息新鲜度,并满足用户设备(UE)的信息新鲜度要求。
  • methods: 本文研究可重构智能表面(RIS)辅助的mmWave通信系统:在收发机处进行波束成形以提供方向性波束增益,并部署RIS以对抗链路阻断;采用分层搜索方法与局部搜索方法分别优化波束成形与离散RIS反射系数,并以块坐标下降(BCD)法交替求解,同时设计了一种低复杂度的启发式UE调度算法。
  • results: 仿真结果显示,所提算法能够在满足所有UE信息新鲜度要求的同时,有效提升系统总速率。
    Abstract The concept of age of information (AoI) has been proposed to quantify information freshness, which is crucial for time-sensitive applications. However, in millimeter wave (mmWave) communication systems, the link blockage caused by obstacles and the severe path loss greatly impair the freshness of information received by the user equipments (UEs). In this paper, we focus on reconfigurable intelligent surface (RIS)-assisted mmWave communications, where beamforming is performed at transceivers to provide directional beam gain and a RIS is deployed to combat link blockage. We aim to maximize the system sum rate while satisfying the information freshness requirements of UEs by jointly optimizing the beamforming at transceivers, the discrete RIS reflection coefficients, and the UE scheduling strategy. To facilitate a practical solution, we decompose the problem into two subproblems. For the first per-UE data rate maximization problem, we further decompose it into a beamforming optimization subproblem and a RIS reflection coefficient optimization subproblem. Considering the difficulty of channel estimation, we utilize the hierarchical search method for the former and the local search method for the latter, and then adopt the block coordinate descent (BCD) method to alternately solve them. For the second scheduling strategy design problem, a low-complexity heuristic scheduling algorithm is designed. Simulation results show that the proposed algorithm can effectively improve the system sum rate while satisfying the information freshness requirements of all UEs.
    摘要 信息年龄(AoI)的概念被提出以量化信息新鲜度,这对时间敏感型应用至关重要。然而,在毫米波(mmWave)通信系统中,障碍物造成的链路阻断与严重的路径损耗极大地损害了用户设备(UE)所接收信息的新鲜度。本文关注可重构智能表面(RIS)辅助的mmWave通信:收发机处进行波束成形以提供方向性波束增益,并部署RIS以对抗链路阻断。我们通过联合优化收发机波束成形、离散RIS反射系数与UE调度策略,在满足各UE信息新鲜度要求的前提下最大化系统总速率。为便于求解,我们将问题分解为两个子问题。对于第一个按UE的数据速率最大化问题,进一步将其分解为波束成形优化子问题与RIS反射系数优化子问题;考虑到信道估计的困难,前者采用分层搜索方法,后者采用局部搜索方法,再以块坐标下降(BCD)法交替求解。对于第二个调度策略设计问题,设计了一种低复杂度的启发式调度算法。仿真结果表明,所提算法能够在满足所有UE信息新鲜度要求的同时,有效提升系统总速率。
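
An assumed AoI-aware scheduling heuristic for illustration (serve the most freshness-critical user first, otherwise the best-rate user); this is not the paper's exact algorithm:

```python
# Toy AoI-aware user scheduling over time slots.
import numpy as np

def schedule(aoi, deadlines, rates):
    """aoi[i]: current age of user i; deadlines[i]: max tolerable age;
    rates[i]: achievable rate this slot. Returns the user index to serve."""
    slack = deadlines - aoi
    urgent = np.where(slack <= 1)[0]              # would violate freshness next slot
    if urgent.size:
        return int(urgent[np.argmin(slack[urgent])])
    return int(np.argmax(rates))                  # otherwise maximize sum rate

rng = np.random.default_rng(3)
aoi = np.ones(4)
deadlines = np.array([5.0, 8.0, 6.0, 10.0])
for t in range(20):
    rates = rng.uniform(0.5, 2.0, size=4)         # toy per-slot rates
    u = schedule(aoi, deadlines, rates)
    aoi += 1.0
    aoi[u] = 1.0                                  # served user's info refreshed
```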

Secure Wireless Communication via Movable-Antenna Array

  • paper_url: http://arxiv.org/abs/2311.07104
  • repo_url: None
  • paper_authors: Guojie Hu, Qingqing Wu, Kui Xu, Jiangbo Si, Naofal Al-Dhahir
  • for: To investigate movable antenna (MA) array-assisted physical-layer security in wireless communication and maximize the achievable secrecy rate.
  • methods: Jointly designs the transmit beamforming and the positions of all antennas at Alice using projected gradient ascent (PGA) and alternating optimization.
  • results: The MA array significantly enhances the secrecy rate compared to a conventional fixed-position antenna (FPA) array, because the additional spatial degrees of freedom (DoF) can be fully exploited.
    Abstract Movable antenna (MA) array is a novel technology recently developed where positions of transmit/receive antennas can be flexibly adjusted in the specified region to reconfigure the wireless channel and achieve a higher capacity. In this letter, we, for the first time, investigate the MA array-assisted physical-layer security where the confidential information is transmitted from a MA array-enabled Alice to a single-antenna Bob, in the presence of multiple single-antenna and colluding eavesdroppers. We aim to maximize the achievable secrecy rate by jointly designing the transmit beamforming and positions of all antennas at Alice subject to the transmit power budget and specified regions for positions of all transmit antennas. The resulting problem is highly non-convex, for which the projected gradient ascent (PGA) and the alternating optimization methods are utilized to obtain a high-quality suboptimal solution. Simulation results demonstrate that since the additional spatial degree of freedom (DoF) can be fully exploited, the MA array significantly enhances the secrecy rate compared to the conventional fixed-position antenna (FPA) array.
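A toy sketch of the PGA idea follows, under assumptions that are not taken from the paper: a linear array with positions measured in wavelengths, maximum-ratio beamforming toward Bob held fixed while the positions are updated (a simplification of the full alternating scheme), a finite-difference gradient, and clipping as the projection onto the allowed region:

```python
import numpy as np

def secrecy_rate(x, w, theta_b, thetas_e, snr=100.0):
    """Secrecy rate for a linear movable-antenna array with positions x
    (in wavelengths), beamformer w, Bob at angle theta_b, and colluding
    single-antenna eavesdroppers at angles thetas_e (their gains add)."""
    a = lambda th: np.exp(2j * np.pi * x * np.sin(th))      # steering vector
    rate_b = np.log2(1 + snr * np.abs(a(theta_b).conj() @ w) ** 2)
    g_e = sum(np.abs(a(th).conj() @ w) ** 2 for th in thetas_e)
    return rate_b - np.log2(1 + snr * g_e)

def pga_positions(x0, theta_b, thetas_e, region=(0.0, 8.0), lr=1e-3, iters=500):
    """Projected gradient ascent on antenna positions with a fixed MRT
    beamformer recomputed at each step (unit transmit power budget)."""
    x = x0.copy()
    for _ in range(iters):
        w = np.exp(2j * np.pi * x * np.sin(theta_b))
        w /= np.linalg.norm(w)
        grad, eps = np.zeros_like(x), 1e-5
        for k in range(len(x)):                 # finite-difference gradient
            xp = x.copy(); xp[k] += eps
            grad[k] = (secrecy_rate(xp, w, theta_b, thetas_e)
                       - secrecy_rate(x, w, theta_b, thetas_e)) / eps
        x = np.clip(x + lr * grad, *region)     # projection onto the region
    return x
```

The point of the sketch is the projection step: the position update is free to exploit the spatial DoF but is always mapped back into the specified region.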

Optimal Configuration of Reconfigurable Intelligent Surfaces with Arbitrary Discrete Phase Shifts

  • paper_url: http://arxiv.org/abs/2311.07096
  • repo_url: None
  • paper_authors: Seyedkhashayar Hashemi, Hai Jiang, Masoud Ardakani
  • for: To solve the reflection optimization problem for a reconfigurable intelligent surface (RIS) so that the channel capacity of the target user is maximized.
  • methods: Models the RIS elements with non-uniformly spaced discrete phase shifts and formulates an optimization problem over the reflection amplitudes and reflection phase shifts.
  • results: Proves that in the optimal configuration each RIS element is either turned off or operates at maximum amplitude, and gives an algorithm that finds the optimal amplitudes and phases with complexity linear in the number of RIS elements.
    Abstract We address the reflection optimization problem for a reconfigurable intelligent surface (RIS), where the RIS elements feature a set of non-uniformly spaced discrete phase shifts. This is motivated by the actual behavior of practical RIS elements, where it is shown that a uniform phase shift assumption is not realistic. A problem is formulated to find the optimal reflection amplitudes and reflection phase shifts of the RIS elements such that the channel capacity of the target user is maximized. We first prove that in the optimal configuration, each RIS element is either turned off or operates at maximum amplitude. We then develop a method that finds the optimal reflection amplitudes and phases with complexity linear in the number of RIS elements. Some new and interesting insight into the reflection optimization problem is also provided.
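The structural result (each element off or at maximum amplitude) suggests a simple element-wise procedure. The greedy single pass below is an illustrative assumption, not the paper's algorithm; it accepts an arbitrary non-uniform discrete phase set and runs in time linear in the number of elements:

```python
import numpy as np

def configure_ris(h_direct, h_cascade, phase_set, amp_max=1.0):
    """Element-wise configuration under the on/off structural result: for
    each element, pick the discrete phase whose reflected contribution best
    aligns with the running channel sum, and keep the element on (at max
    amplitude) only if it increases the overall gain.  One greedy pass."""
    total = h_direct
    config = []
    for h_k in h_cascade:                       # cascaded channel per element
        contrib = amp_max * h_k * np.exp(1j * np.asarray(phase_set))
        cands = np.abs(total + contrib)
        i = int(np.argmax(cands))
        if cands[i] > np.abs(total):            # on, at maximum amplitude ...
            total += contrib[i]
            config.append((amp_max, phase_set[i]))
        else:                                   # ... or turned off
            config.append((0.0, 0.0))
    return config, np.abs(total)
```

With a non-uniform phase_set, no element can necessarily align its contribution exactly, which is why turning some elements off can beat keeping them on at a poorly matched phase.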

Recursive and non-recursive filters for sequential smoothing and prediction with instantaneous phase and frequency estimation applications

  • paper_url: http://arxiv.org/abs/2311.07089
  • repo_url: None
  • paper_authors: Hugh Lachlan Kennedy
  • for: To design recursive (IIR) and non-recursive (FIR) digital filters that track an approximately polynomial signal of specified degree without steady-state bias while minimizing the gain of high-frequency colored noise.
  • methods: Fixed-lag smoothing yields low-order, low-complexity smoothers with a specified bandwidth and excellent passband phase linearity; a prediction filter with a one-sample lead is added to reduce the incidence of phase/frequency angle-unwrapping errors.
  • results: Simulations show that the IIR filters (with the optimal lag) reduce angle-unwrapping errors, particularly for signals with high rates of angle change, enabling instantaneous phase and frequency estimation (e.g., for time-delay and Doppler-shift measurement) down to lower SNR; in the absence of unwrapping errors, their error variance reaches the FIR lower bound at significantly lower computational cost.
    Abstract A simple procedure for the design of recursive digital filters with an infinite impulse response (IIR) and non-recursive digital filters with a finite impulse response (FIR) is described. The fixed-lag smoothing filters are designed to track an approximately polynomial signal of specified degree without bias at steady state, while minimizing the gain of high-frequency (coloured) noise with a specified power spectral density. For the IIR variant, the procedure determines the optimal lag (i.e. the passband group delay) yielding a recursive low-complexity smoother of low order, with a specified bandwidth, and excellent passband phase linearity. The filters are applied to the problem of instantaneous frequency estimation, e.g. for Doppler-shift measurement, for a complex exponential with polynomial phase progression in additive white noise. For this classical problem, simulations show that the incorporation of a prediction filter (with a one-sample lead) reduces the incidence of (phase or frequency) angle unwrapping errors, particularly for signals with high rates of angle change, which are known to limit the performance of standard FIR estimators at low SNR. This improvement allows the instantaneous phase of low-frequency signals to be estimated, e.g. for time-delay measurement, and/or the instantaneous frequency of frequency-modulated signals, down to a lower SNR. In the absence of unwrapping errors, the error variance of the IIR estimators (with the optimal phase lag) reaches the FIR lower bound, at a significantly lower computational cost. Guidelines for configuring and tuning both FIR and IIR filters are provided.
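A compact sketch of the FIR variant is given below, assuming a Savitzky-Golay-style least-squares design (the paper's exact coloured-noise-shaped design is not reproduced here). The smoother is unbiased for polynomials up to the chosen degree and is applied to one-sample phase increments, which sidesteps explicit unwrapping for moderate rates of angle change:

```python
import numpy as np

def polyfit_fir(length, degree, lag):
    """FIR smoother that is unbiased for polynomials up to `degree`:
    a sliding least-squares fit evaluated `lag` samples back from the
    newest sample.  Returns the kernel in convolution order."""
    n = np.arange(length) - (length - 1 - lag)   # n = 0 at the estimation point
    V = np.vander(n, degree + 1, increasing=True)
    h = np.linalg.pinv(V)[0]                     # LS estimate of the value at n = 0
    return h[::-1]                               # reverse for np.convolve

def instantaneous_frequency(x, fs, h):
    """Instantaneous frequency (Hz) of a complex exponential: smooth the
    one-sample phase increments angle(x[n] * conj(x[n-1])), whose values
    stay in (-pi, pi] and so avoid explicit phase unwrapping for moderate
    rates of angle change."""
    dphi = np.angle(x[1:] * np.conj(x[:-1]))
    return fs / (2 * np.pi) * np.convolve(dphi, h, mode="valid")
```

For example, `polyfit_fir(21, 2, lag=7)` gives a 21-tap quadratic-preserving smoother with a 7-sample group delay; the one-sample-lead predictor of the paper corresponds to evaluating the fit at `lag = -1` instead.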

Sensing Mutual Information with Random Signals in Gaussian Channels

  • paper_url: http://arxiv.org/abs/2311.07081
  • repo_url: None
  • paper_authors: Lei Xie, Fan Liu, Zhanyuan Xie, Zheng Jiang, Shenghui Song
  • for: To characterize the achievable sensing performance with random signals, as required when sensing and communication share the same waveform.
  • methods: Derives a closed-form expression for the sensing mutual information (SMI) with random signals using random matrix theory, then optimizes the precoder with a manifold-based optimization approach.
  • results: Simulation results validate the effectiveness of the proposed methods.
    Abstract Sensing performance is typically evaluated by classical metrics, such as Cramer-Rao bound and signal-to-clutter-plus-noise ratio. The recent development of the integrated sensing and communication (ISAC) framework motivated the efforts to unify the metric for sensing and communication, where researchers have proposed to utilize mutual information (MI) to measure the sensing performance with deterministic signals. However, the need to communicate in ISAC systems necessitates the use of random signals for sensing applications and the closed-form evaluation for the sensing mutual information (SMI) with random signals is not yet available in the literature. This paper investigates the achievable performance and precoder design for sensing applications with random signals. For that purpose, we first derive the closed-form expression for the SMI with random signals by utilizing random matrix theory. The result reveals some interesting physical insights regarding the relation between the SMI with deterministic and random signals. The derived SMI is then utilized to optimize the precoder by leveraging a manifold-based optimization approach. The effectiveness of the proposed methods is validated by simulation results.
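The paper's closed form comes from random matrix theory and is not reproduced here; as a hedged illustration, a Monte-Carlo estimate of the SMI I(Y; G | X) under an assumed model Y = GX + Z, with i.i.d. CN(0,1) target response G and Gaussian signaling optionally shaped by a precoder P, might look like:

```python
import numpy as np

def sensing_mi_mc(Nt=4, Nr=4, T=32, sigma2=0.1, P=None, n_trials=2000, rng=0):
    """Monte-Carlo estimate of the sensing mutual information I(Y; G | X)
    for Y = G X + Z, averaged over random Gaussian transmit signals X.
    For i.i.d. CN(0,1) rows of G, conditioning on X gives
    I = Nr * log det(I + X X^H / sigma2)."""
    rng = np.random.default_rng(rng)
    vals = []
    for _ in range(n_trials):
        X = (rng.standard_normal((Nt, T))
             + 1j * rng.standard_normal((Nt, T))) / np.sqrt(2)
        if P is not None:
            X = P @ X                                   # precoded transmit signal
        M = np.eye(Nt) + (X @ X.conj().T) / sigma2
        vals.append(Nr * np.linalg.slogdet(M)[1] / np.log(2))  # bits
    return float(np.mean(vals))
```

A cross-check of this kind is useful for validating a closed-form SMI expression, and the dependence of the average on P is exactly what the manifold-based precoder optimization exploits.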

Shannon Theory for Wireless Communication in a Resonant Chamber

  • paper_url: http://arxiv.org/abs/2311.07068
  • repo_url: None
  • paper_authors: Amritpal Singh, Thomas Marzetta
  • for: To study the feasibility of wireless communication inside a closed electromagnetic resonant chamber (RC).
  • methods: Analyzes the two-port impedance matrix of a pair of antennas inside the chamber, with the transmit antenna driven by a current source and the receive antenna connected to a load resistor read out by an infinite-impedance amplifier.
  • results: With perfectly conducting walls, the channel has countably many resonance-related poles that migrate toward the real frequency axis as the load resistance increases; the receiver sees Johnson noise from the load resistor and internal amplifier noise. Under bandwidth and power constraints, capacity increases without bound as the load resistance grows, and, surprisingly, the capacity-attaining power allocation avoids the resonant frequencies.
    Abstract A closed electromagnetic resonant chamber (RC) is a highly favorable artificial environment for wireless communication. A pair of antennas within the chamber constitutes a two-port network described by an impedance matrix. We analyze communication between the two antennas when the RC has perfectly conducting walls and the impedance matrix is imaginary-valued. The transmit antenna is driven by a current source, and the receive antenna is connected to a load resistor whose voltage is measured by an infinite-impedance amplifier. There are a countably infinite number of poles in the channel, associated with resonance in the RC, which migrate towards the real frequency axis as the load resistance increases. There are two sources of receiver noise: the Johnson noise of the load resistor, and the internal amplifier noise. An application of Shannon theory yields the capacity of the link, subject to bandwidth and power constraints on the transmit current. For a constant transmit power, capacity increases without bound as the load resistance increases. Surprisingly, the capacity-attaining allocation of transmit power versus frequency avoids placing power close to the resonant frequencies.
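Once the impedance-matrix and noise analysis has been distilled into a per-frequency SNR profile g(f), the capacity under the power constraint follows from classical water-filling. The sketch below implements only that generic step (bisection on the water level); the chamber-specific g(f), whose structure is what makes the optimal allocation avoid the resonant frequencies, is taken as an input rather than derived:

```python
import numpy as np

def water_filling(g, P_total, df):
    """Classical water-filling over frequency bins.  g[k] is the SNR per
    unit transmit power spectral density in bin k; df is the bin width.
    Bisect on the water level mu so the allocated power meets the budget."""
    lo, hi = 0.0, P_total / df + 1.0 / g.min()   # bracket the water level
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        p = np.maximum(mu - 1.0 / g, 0.0)
        if np.sum(p) * df > P_total:
            hi = mu
        else:
            lo = mu
    p = np.maximum(mu - 1.0 / g, 0.0)
    capacity = np.sum(np.log2(1.0 + g * p)) * df  # bits/s over the band
    return p, capacity
```

Plotting the returned allocation p against a g(f) with near-real-axis poles would make the paper's counter-intuitive finding visible: the water level can be met without pouring power onto the resonances.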

Clifford Algebra-Based Iterated Extended Kalman Filter with Application to Low-Cost INS/GNSS Navigation

  • paper_url: http://arxiv.org/abs/2311.07049
  • repo_url: None
  • paper_authors: Wei Ouyang, Yutian Wang, Yuanxin Wu
  • for: To remove the reliance on accurate initial attitude in low-cost INS/GNSS integrated navigation.
  • methods: Generalizes the trident quaternion within the framework of Clifford algebra to represent the extended pose, IMU biases, and lever arms on the Lie group; establishes a quasi-group-affine system; and develops the right-error Clifford algebra-based EKF (Clifford-RQEKF) with iterated filtering.
  • results: Numerical simulations and experiments show that all iterated filtering approaches achieve fast, global convergence without prior attitude information, with the iterated Clifford-RQEKF performing markedly better under especially large IMU biases.
    Abstract The traditional GNSS-aided inertial navigation system (INS) usually exploits the extended Kalman filter (EKF) for state estimation, and the initial attitude accuracy is key to the filtering performance. To spare the reliance on the initial attitude, this work generalizes the previously proposed trident quaternion within the framework of Clifford algebra to represent the extended pose, IMU biases and lever arms on the Lie group. Consequently, a quasi-group-affine system is established for the low-cost INS/GNSS integrated navigation system, and the right-error Clifford algebra-based EKF (Clifford-RQEKF) is accordingly developed. The iterated filtering approach is further applied to significantly improve the performances of the Clifford-RQEKF and the previously proposed trident quaternion-based EKFs. Numerical simulations and experiments show that all iterated filtering approaches fulfill the fast and global convergence without the prior attitude information, whereas the iterated Clifford-RQEKF performs much better than the others under especially large IMU biases.
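The paper's filter operates on a Lie group with right-invariant errors represented in Clifford algebra; that machinery is not reproduced here. The vector-space sketch below shows only the iterated-EKF measurement update (Gauss-Newton relinearization about the posterior iterate rather than the prior), which is the ingredient credited with the fast, globally convergent behavior:

```python
import numpy as np

def iekf_update(x_pred, P_pred, z, h, H_jac, R, n_iter=5, tol=1e-9):
    """Generic iterated EKF measurement update.  h(x) is the measurement
    model, H_jac(x) its Jacobian.  Relinearizing about the current iterate
    x_i is what improves robustness to a poor prior (e.g. unknown attitude)."""
    x_i = x_pred.copy()
    for _ in range(n_iter):
        H = H_jac(x_i)
        S = H @ P_pred @ H.T + R                      # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)           # Kalman gain
        # Gauss-Newton step: correct for relinearization away from the prior
        x_new = x_pred + K @ (z - h(x_i) - H @ (x_pred - x_i))
        if np.linalg.norm(x_new - x_i) < tol:
            x_i = x_new
            break
        x_i = x_new
    P = (np.eye(len(x_pred)) - K @ H) @ P_pred        # posterior covariance
    return x_i, P
```

In the Clifford-RQEKF, the additive state update would be replaced by a retraction on the group (exponential-map composition), with the right-error convention determining where the Jacobians are evaluated.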

Deep Joint Source Channel Coding With Attention Modules Over MIMO Channels

  • paper_url: http://arxiv.org/abs/2311.07041
  • repo_url: None
  • paper_authors: Weiran Jiang, Wei Chen, Bo Ai
  • for: To improve image transmission performance over multi-input multi-output (MIMO) channels.
  • methods: Two deep joint source-channel coding (DJSCC) structures, a serial one and a parallel one, with attention modules that adapt the amount of transmitted information to the qualities of the SVD-decomposed sub-channels.
  • results: Experiments show improved image transmission performance, and non-parametric entropy estimation reveals that the learned DJSCC transceivers transmit more information over the better sub-channels.
    Abstract In this paper, we propose two deep joint source and channel coding (DJSCC) structures with attention modules for the multi-input multi-output (MIMO) channel, including a serial structure and a parallel structure. With singular value decomposition (SVD)-based precoding scheme, the MIMO channel can be decomposed into various sub-channels, and the feature outputs will experience sub-channels with different channel qualities. In the serial structure, one single network is used at both the transmitter and the receiver to jointly process data streams of all MIMO subchannels, while data streams of different MIMO subchannels are processed independently via multiple sub-networks in the parallel structure. The attention modules in both serial and parallel architectures enable the system to adapt to varying channel qualities and adjust the quantity of information outputs in accordance with the channel qualities. Experimental results demonstrate the proposed DJSCC structures have improved image transmission performance, and reveal the phenomenon via non-parameter entropy estimation that the learned DJSCC transceivers tend to transmit more information over better sub-channels.
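The SVD-based precoding that exposes the unequal sub-channels can be verified in a few lines. The sketch below is a toy numerical check (random 4x4 channel, small noise), not part of the DJSCC models themselves, and the attention modules are not sketched:

```python
import numpy as np

def svd_subchannels(H):
    """Decompose a MIMO channel into parallel scalar sub-channels via the
    SVD: with x = V @ s at the transmitter and y' = U^H @ y at the receiver,
    y' = diag(sigma) @ s + noise, so feature streams see different gains."""
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    return U, s, Vh

# Toy end-to-end check of the equivalent scalar channels
rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
U, s, Vh = svd_subchannels(H)
feats = rng.standard_normal(4) + 1j * rng.standard_normal(4)   # feature streams
x = Vh.conj().T @ feats                                        # precode
y = H @ x + 0.01 * (rng.standard_normal(4) + 1j * rng.standard_normal(4))
y_eq = U.conj().T @ y                                          # combine
print(np.round(np.abs(y_eq / feats), 3), np.round(s, 3))       # gains ~= singular values
```

The spread of the singular values is exactly the quality disparity the attention modules react to, concentrating information on the strong sub-channels.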

A Hybrid Joint Source-Channel Coding Scheme for Mobile Multi-hop Networks

  • paper_url: http://arxiv.org/abs/2311.07028
  • repo_url: None
  • paper_authors: Chenghong Bian, Yulin Shao, Deniz Gunduz
  • for: To enable robust image transmission over mobile multi-hop networks via a hybrid joint source-channel coding (JSCC) scheme.
  • methods: Deep-learning-based JSCC (DeepJSCC) is used over the first, potentially time-varying hop, and the signal received at the first relay is digitally compressed and forwarded through the high-quality mobile core network.
  • results: Numerical simulations show that the scheme outperforms both fully analog and fully digital schemes, avoiding the cliff effect on the first hop and noise forwarding over the core network.
    Abstract We propose a novel hybrid joint source-channel coding (JSCC) scheme for robust image transmission over multi-hop networks. In the considered scenario, a mobile user wants to deliver an image to its destination over a mobile cellular network. We assume a practical setting, where the links between the nodes belonging to the mobile core network are stable and of high quality, while the link between the mobile user and the first node (e.g., the access point) is potentially time-varying with poorer quality. In recent years, neural network based JSCC schemes (called DeepJSCC) have emerged as promising solutions to overcome the limitations of separation-based fully digital schemes. However, relying on analog transmission, DeepJSCC suffers from noise accumulation over multi-hop networks. Moreover, most of the hops within the mobile core network may be high-capacity wireless connections, calling for digital approaches. To this end, we propose a hybrid solution, where DeepJSCC is adopted for the first hop, while the received signal at the first relay is digitally compressed and forwarded through the mobile core network. We show through numerical simulations that the proposed scheme is able to outperform both the fully analog and fully digital schemes. Thanks to DeepJSCC it can avoid the cliff effect over the first hop, while digital transmission avoids noise forwarding over the mobile core network. We believe this work paves the way for the practical deployment of DeepJSCC solutions in 6G and future wireless networks.
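A schematic of the hybrid pipeline, with strong simplifying assumptions that are ours, not the paper's: the DeepJSCC latent is a plain Gaussian vector, the first hop is AWGN, the relay applies uniform scalar quantization in place of a learned compressor, and the core network is error-free:

```python
import numpy as np

def first_hop_analog(z, snr_db, rng):
    """Analog (DeepJSCC-style) transmission over the time-varying first
    hop: the latent z is sent directly as channel symbols plus AWGN, so
    reconstruction quality degrades gracefully with SNR (no cliff effect)."""
    sigma = 10 ** (-snr_db / 20.0)
    return z + sigma * rng.standard_normal(z.shape)

def relay_digital_forward(z_noisy, n_bits=8):
    """At the first relay: quantize the received latent and forward the
    bits over the reliable digital core network (assumed error-free), so
    the analog channel noise is not re-amplified on later hops."""
    lo, hi = z_noisy.min(), z_noisy.max()
    levels = 2 ** n_bits - 1
    q = np.round((z_noisy - lo) / (hi - lo) * levels)
    return q / levels * (hi - lo) + lo           # dequantized latent at destination

rng = np.random.default_rng(1)
z = rng.standard_normal(256)                     # stand-in for a DeepJSCC latent
z_hat = relay_digital_forward(first_hop_analog(z, snr_db=10, rng=rng))
print("end-to-end MSE:", float(np.mean((z - z_hat) ** 2)))
```

The division of labor mirrors the paper's argument: analog coding where the channel is uncertain, digital forwarding where the links are reliable and high-capacity.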