cs.CL - 2023-11-21

Attribution and Alignment: Effects of Local Context Repetition on Utterance Production and Comprehension in Dialogue

  • paper_url: http://arxiv.org/abs/2311.13061
  • repo_url: None
  • paper_authors: Aron Molnar, Jaap Jumelet, Mario Giulianelli, Arabella Sinclair
  • for: This paper evaluates whether language models produce human-like levels of repetition in dialogue, and how they process lexical re-use during comprehension.
  • methods: Joint analysis of language model production (generation) and comprehension behaviour in dialogue.
  • results: The study finds that language models produce repetition in dialogue at levels similar to humans, and that during comprehension they rely on repetition-related processing mechanisms resembling those observed in human dialogue.
    Abstract Language models are often used as the backbone of modern dialogue systems. These models are pre-trained on large amounts of written fluent language. Repetition is typically penalised when evaluating language model generations. However, it is a key component of dialogue. Humans use local and partner specific repetitions; these are preferred by human users and lead to more successful communication in dialogue. In this study, we evaluate (a) whether language models produce human-like levels of repetition in dialogue, and (b) what are the processing mechanisms related to lexical re-use they use during comprehension. We believe that such joint analysis of model production and comprehension behaviour can inform the development of cognitively inspired dialogue generation systems.

Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark

  • paper_url: http://arxiv.org/abs/2311.13053
  • repo_url: https://github.com/haroldliuj/multiapi
  • paper_authors: Xiao Liu, Jianfeng Lin, Jiawei Zhang
  • for: Expanding large language models' proficiency with multimodal information through tool use.
  • methods: Evaluates tool-augmented large language models with MultiAPI, a benchmark of 235 diverse API calls and 2,038 contextual prompts.
  • results: LLMs are proficient at API call decision-making but face challenges in domain identification, function selection, and argument generation; surprisingly, auxiliary context can actually impair performance.
    Abstract The proliferation of Large Language Models like ChatGPT has significantly advanced language understanding and generation, impacting a broad spectrum of applications. However, these models predominantly excel in text-based tasks, overlooking the complexity of real-world multimodal information. This study introduces MultiAPI, a pioneering comprehensive large-scale API benchmark dataset aimed at expanding LLMs' proficiency in multimodal contexts. Developed collaboratively through ChatGPT, MultiAPI consists of 235 diverse API calls and 2,038 contextual prompts, offering a unique platform evaluation of tool-augmented LLMs handling multimodal tasks. Through comprehensive experiments, our findings reveal that while LLMs demonstrate proficiency in API call decision-making, they face challenges in domain identification, function selection, and argument generation. What's more, we surprisingly notice that auxiliary context can actually impair the performance. An in-depth error analysis paves the way for a new paradigm to address these challenges, suggesting a potential direction for future LLM research.

Systematic word meta-sense extension

  • paper_url: http://arxiv.org/abs/2311.13029
  • repo_url: https://github.com/jadeleiyu/sworme
  • paper_authors: Lei Yu
  • for: Testing and improving language models' ability to extend word meanings to new semantic domains (also called meta-senses).
  • methods: Introduces a new task, systematic word meta-sense extension (SWORME), to test and improve this word meaning extension ability.
  • results: Language models prefer incremental lexical semantic change toward conceptually similar meta-senses (such as logical metonymy) and are much worse at highly non-literal meaning extensions such as metaphor; a proposed analogy-based word meaning extension method effectively improves model systematicity on both gradual and radical types of meta-sense extension.
    Abstract The meaning of polysemous words often varies in a highly productive yet predictable way. Generalizing the regularity between conventional senses to derive novel word meaning is crucial for automated processing of non-literal language uses such as figurative expressions. We introduce a novel task called systematic word meta-sense extension (SWORME) to test and improve language models' ability to extend word meaning to denote new semantic domains (also called meta-senses) that bear regular semantic relations with existing senses. We found that language models prefer incremental lexical semantic change toward conceptually similar meta-senses such as logical metonymy, and are much worse at predicting highly non-literal meaning extensions such as metaphors. We propose a novel analogy-based method of word meaning extension, and show that it effectively improves language model systematicity in making both gradual and radical types of meta-sense extension. We further demonstrate that learning systematic meta-sense extensions benefits language models on multiple benchmarks of figurative language understanding.

Data Diversity Matters for Robust Instruction Tuning

  • paper_url: http://arxiv.org/abs/2311.14736
  • repo_url: None
  • paper_authors: Alexander Bukharin, Tuo Zhao
  • for: Aligning large language models; a central challenge of instruction tuning is dataset selection, since the composition of the instruction tuning dataset can significantly impact downstream performance.
  • methods: Proposes a new algorithm, Quality-Diversity Instruction Tuning (QDIT), which controls dataset diversity and quality and enables an in-depth study of how each affects instruction-following ability.
  • results: On several large-scale instruction tuning datasets, QDIT improves worst-case performance by 18% while maintaining or improving average performance compared to quality-driven baselines.
    Abstract Instruction tuning has emerged as a key step in aligning large language models. One of the central challenges of instruction tuning is dataset selection, as the composition of the instruction tuning dataset can significantly impact downstream performance. In particular, researchers have hypothesized that dataset diversity and dataset quality are important indicators of downstream performance. However, it is not clear how to automatically select high quality and diverse data or how exactly quality and diversity affect instruction following ability. To resolve these issues, we propose a new algorithm, Quality-Diversity Instruction Tuning (QDIT). QDIT provides a principled algorithm to control dataset diversity and quality, allowing us to conduct an in depth study on the effect of diversity and quality on instruction tuning performance. From this study we draw two key insights (1) there is a natural tradeoff between dataset diversity and quality and (2) increasing dataset diversity significantly improves the worst case instruction following performance, therefore improving robustness. We validate the performance of QDIT on several large scale instruction tuning datasets, where we find it can improve worst case performance by 18% while maintaining or improving average performance compared to quality driven baselines.
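The abstract describes QDIT only at a high level, so the snippet below is a hedged sketch of one way a quality-diversity selection rule could look: a greedy pass that trades off a per-example quality score against a facility-location-style diversity gain. The `alpha` weighting, the cosine-similarity features, and the facility-location objective are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def qdit_like_select(embeddings, quality, k, alpha=0.5):
    """Greedily pick k examples, trading off quality against diversity gain.

    embeddings: (n, d) unit-normalized example embeddings.
    quality:    (n,) per-example quality scores.
    alpha:      weight on quality vs. diversity (illustrative default).
    """
    quality = np.asarray(quality, dtype=float)
    sims = embeddings @ embeddings.T            # pairwise cosine similarity
    coverage = np.zeros(len(quality))           # how well each example is already "covered"
    selected = []
    for _ in range(k):
        # Diversity gain of adding candidate j: total increase in pool coverage.
        gains = np.clip(sims - coverage[None, :], 0.0, None).sum(axis=1)
        score = alpha * quality + (1.0 - alpha) * gains / len(quality)
        score[selected] = -np.inf               # never re-select an example
        best = int(np.argmax(score))
        selected.append(best)
        coverage = np.maximum(coverage, sims[best])
    return selected
```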

A Baseline Analysis of Reward Models’ Ability To Accurately Analyze Foundation Models Under Distribution Shift

  • paper_url: http://arxiv.org/abs/2311.14743
  • repo_url: None
  • paper_authors: Ben Pikus, Will LeVine, Tony Chen, Sean Hendryx
  • for: Measuring whether the reward models used to align Large Language Models (LLMs) remain robust under distribution shift.
  • methods: In the Reinforcement Learning with Human Feedback (RLHF) setup, a reward model is trained to capture desired behaviors and is also used at inference time to score how well LLM responses adhere to them; the paper evaluates reward model accuracy and calibration under distribution shift, and adapts an out-of-distribution (OOD) detection technique commonly used in classification to detect shifts in prompts and responses.
  • results: Reward model performance degrades under distribution shift, with novel calibration patterns and accuracy drops on OOD prompts and responses; the reward model is more sensitive to shifts in responses than in prompts.
    Abstract Foundation models, specifically Large Language Models (LLM's), have lately gained wide-spread attention and adoption. Reinforcement Learning with Human Feedback (RLHF) involves training a reward model to capture desired behaviors, which is then used to align an LLM. These reward models are additionally used at inference-time to estimate how well LLM responses adhere to those desired behaviors. However, there is little work measuring how robust these reward models are to distribution shifts. In this work, we evaluate how reward model performance - measured via accuracy and calibration (i.e. alignment between accuracy and confidence) - is affected by distribution shift. We show novel calibration patterns and accuracy drops due to OOD prompts and responses, and that the reward model is more sensitive to shifts in responses than prompts. Additionally, we adapt an OOD detection technique commonly used in classification to the reward model setting in order to detect these distribution shifts in prompts and responses.
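The abstract does not name the OOD detection technique that is adapted to the reward-model setting, so the sketch below simply illustrates one common classification-style detector (a Mahalanobis-distance score over in-distribution embeddings) applied to reward-model representations; the detector choice and function names are assumptions.

```python
import numpy as np

def fit_id_statistics(id_embeddings):
    """Fit mean/precision of in-distribution prompt-response embeddings."""
    mu = id_embeddings.mean(axis=0)
    cov = np.cov(id_embeddings, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])          # regularize for invertibility
    return mu, np.linalg.inv(cov)

def ood_score(embedding, mu, precision):
    """Mahalanobis distance: larger values suggest a shifted prompt or response."""
    d = embedding - mu
    return float(d @ precision @ d)
```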

LowResource at BLP-2023 Task 2: Leveraging BanglaBert for Low Resource Sentiment Analysis of Bangla Language

  • paper_url: http://arxiv.org/abs/2311.12735
  • repo_url: https://github.com/aunabil4602/bnlp-workshop-task2-2023
  • paper_authors: Aunabil Chakma, Masum Hasan
  • for: Describes the LowResource Team's system for BLP-2023 Task 2, sentiment analysis of public posts and comments from diverse social media platforms.
  • methods: Fine-tunes BanglaBert, a BERT model pre-trained on a large Bangla corpus, using several strategies including dropping random tokens and incorporating several external datasets.
  • results: The final model, an ensemble of the three best BanglaBert variants, scored 0.718 on the test set, ranking 3rd among 30 participating teams. The paper also discusses promising systems that did not perform well, namely task-adaptive pretraining and paraphrasing with BanglaT5.
    Abstract This paper describes the system of the LowResource Team for Task 2 of BLP-2023, which involves conducting sentiment analysis on a dataset composed of public posts and comments from diverse social media platforms. Our primary aim is to utilize BanglaBert, a BERT model pre-trained on a large Bangla corpus, using various strategies including fine-tuning, dropping random tokens, and using several external datasets. Our final model is an ensemble of the three best BanglaBert variations. Our system has achieved overall 3rd in the Test Set among 30 participating teams with a score of 0.718. Additionally, we discuss the promising systems that didn't perform well namely task-adaptive pretraining and paraphrasing using BanglaT5. Training codes and external datasets which are used for our system are publicly available at https://github.com/Aunabil4602/bnlp-workshop-task2-2023
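The abstract mentions "dropping random tokens" as one training strategy but not its exact form, so the following is a minimal sketch under the assumption that each non-special token is dropped independently with a fixed probability; the probability and the special-token handling are illustrative choices.

```python
import random

def drop_random_tokens(token_ids, drop_prob=0.1, special_ids=frozenset({101, 102})):
    """Randomly drop tokens from a tokenized example as data augmentation.

    token_ids:   list of integer token ids (e.g. from a BERT-style tokenizer).
    drop_prob:   per-token drop probability (an assumed, illustrative value).
    special_ids: token ids that must never be dropped (e.g. [CLS]/[SEP]).
    """
    kept = [t for t in token_ids
            if t in special_ids or random.random() >= drop_prob]
    # Fall back to the original sequence if dropping would leave it too short.
    return kept if len(kept) > len(special_ids) else list(token_ids)
```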

Soft Random Sampling: A Theoretical and Empirical Analysis

  • paper_url: http://arxiv.org/abs/2311.12727
  • repo_url: None
  • paper_authors: Xiaodong Cui, Ashish Mittal, Songtao Lu, Wei Zhang, George Saon, Brian Kingsbury
  • for: Efficient training of large-scale deep neural networks on massive data.
  • methods: Soft Random Sampling (SRS), a simple yet effective approach that selects a subset uniformly at random with replacement from the full dataset in each epoch.
  • results: A theoretical and empirical analysis of SRS covering (1) its sampling dynamics, including data coverage and occupancy, (2) its convergence rate with non-convex objective functions, and (3) its generalization performance. Experiments on image recognition (CIFAR10) and automatic speech recognition (Librispeech and an in-house payload dataset) show that SRS offers a better accuracy-efficiency trade-off than existing coreset-based data selection methods, with significant speedup and competitive performance on real-world industrial-scale datasets at almost no additional computing cost.
    Abstract Soft random sampling (SRS) is a simple yet effective approach for efficient training of large-scale deep neural networks when dealing with massive data. SRS selects a subset uniformly at random with replacement from the full data set in each epoch. In this paper, we conduct a theoretical and empirical analysis of SRS. First, we analyze its sampling dynamics including data coverage and occupancy. Next, we investigate its convergence with non-convex objective functions and give the convergence rate. Finally, we provide its generalization performance. We empirically evaluate SRS for image recognition on CIFAR10 and automatic speech recognition on Librispeech and an in-house payload dataset to demonstrate its effectiveness. Compared to existing coreset-based data selection methods, SRS offers a better accuracy-efficiency trade-off. Especially on real-world industrial scale data sets, it is shown to be a powerful training strategy with significant speedup and competitive performance with almost no additional computing cost.
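The core of SRS as described above is per-epoch sampling with replacement; the sketch below shows that sampling step in isolation. The sample ratio and the training-loop placeholder are illustrative assumptions, not values from the paper.

```python
import numpy as np

def srs_epoch_indices(dataset_size, num_epochs, sample_ratio=0.5, seed=0):
    """Yield one index set per epoch, drawn uniformly at random with replacement."""
    rng = np.random.default_rng(seed)
    subset_size = int(sample_ratio * dataset_size)
    for _ in range(num_epochs):
        yield rng.integers(0, dataset_size, size=subset_size)

# Sketch of how an SRS training loop might consume these indices:
# for epoch, indices in enumerate(srs_epoch_indices(len(train_set), num_epochs=10)):
#     train_one_epoch(model, [train_set[i] for i in indices])
```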

Fair Text Classification with Wasserstein Independence

  • paper_url: http://arxiv.org/abs/2311.12689
  • repo_url: https://github.com/letenothibaud/wasserstein_fair_classification
  • paper_authors: Thibaud Leteno, Antoine Gourru, Charlotte Laclau, Rémi Emonet, Christophe Gravier
  • for: Improving group fairness in text classification, where reaching fair treatment between sensitive groups (e.g. women vs. men) remains an open challenge.
  • methods: A method, agnostic to the model architecture, that takes inspiration from adversarial training to induce Wasserstein independence between the representations learned to predict the target label and those learned to predict a sensitive attribute; unlike existing approaches, it does not require sensitive-attribute annotations in either the training or the test data.
  • results: The approach achieves a comparable or better fairness-accuracy trade-off than existing methods.
    Abstract Group fairness is a central research topic in text classification, where reaching fair treatment between sensitive groups (e.g. women vs. men) remains an open challenge. This paper presents a novel method for mitigating biases in neural text classification, agnostic to the model architecture. Considering the difficulty to distinguish fair from unfair information in a text encoder, we take inspiration from adversarial training to induce Wasserstein independence between representations learned to predict our target label and the ones learned to predict some sensitive attribute. Our approach provides two significant advantages. Firstly, it does not require annotations of sensitive attributes in both testing and training data. This is more suitable for real-life scenarios compared to existing methods that require annotations of sensitive attributes at train time. Second, our approach exhibits a comparable or better fairness-accuracy trade-off compared to existing methods.
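The abstract gives only the high-level idea of inducing Wasserstein independence between two representation streams, so the sketch below shows a generic adversarial estimator of that kind: a critic scores jointly sampled (task, sensitive) representation pairs against pairs whose sensitive half is shuffled (approximating the product of marginals). The critic architecture, the shuffling trick, and the omission of a Lipschitz constraint (weight clipping or gradient penalty) are simplifying assumptions, not the paper's exact objective.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Scores concatenated (task, sensitive) representation pairs."""
    def __init__(self, dim_task, dim_sens, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_task + dim_sens, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, z):
        return self.net(z)

def wasserstein_dependence(critic, z_task, z_sens):
    """Adversarial estimate of dependence between the two representation streams.

    The critic is trained to maximize this value; the text encoder is trained to
    minimize it, pushing task representations toward independence from the
    sensitive ones. (A Lipschitz constraint on the critic is assumed but omitted.)"""
    joint = torch.cat([z_task, z_sens], dim=1)
    perm = torch.randperm(z_sens.size(0), device=z_sens.device)
    shuffled = torch.cat([z_task, z_sens[perm]], dim=1)
    return critic(joint).mean() - critic(shuffled).mean()
```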

MathGloss: Building mathematical glossaries from text

  • paper_url: http://arxiv.org/abs/2311.12649
  • repo_url: None
  • paper_authors: Lucy Horowitz, Valeria de Paiva
  • for: Automatically building a knowledge graph (KG) of undergraduate mathematics from text, using modern natural language processing tools and resources already available on the web, so that every mathematician can tailor their learning to their own preferences.
  • methods: Combines five resources: (i) Wikidata; (ii) terms covered in mathematics courses at the University of Chicago; (iii) the syllabus of the French undergraduate mathematics curriculum, which includes hyperlinks to the theorem prover Lean 4; (iv) MuLiMa, a multilingual dictionary of mathematics curated by mathematicians; and (v) the nLab, a wiki for category theory also curated by mathematicians.
  • results: A linked database of undergraduate mathematical concepts that brings together resources for learning mathematics alongside resources for formal mathematics, aiming to help mathematicians and experts in formal tools (theorem provers, computer algebra systems, etc.) understand each other and break down some of the barriers to formal math.
    Abstract MathGloss is a project to create a knowledge graph (KG) for undergraduate mathematics from text, automatically, using modern natural language processing (NLP) tools and resources already available on the web. MathGloss is a linked database of undergraduate concepts in mathematics. So far, it combines five resources: (i) Wikidata, a collaboratively edited, multilingual knowledge graph hosted by the Wikimedia Foundation, (ii) terms covered in mathematics courses at the University of Chicago, (iii) the syllabus of the French undergraduate mathematics curriculum which includes hyperlinks to the automated theorem prover Lean 4, (iv) MuLiMa, a multilingual dictionary of mathematics curated by mathematicians, and (v) the nLab, a wiki for category theory also curated by mathematicians. MathGloss's goal is to bring together resources for learning mathematics and to allow every mathematician to tailor their learning to their own preferences. Moreover, by organizing different resources for learning undergraduate mathematics alongside those for learning formal mathematics, we hope to make it easier for mathematicians and formal tools (theorem provers, computer algebra systems, etc) experts to "understand" each other and break down some of the barriers to formal math.

Evaluation Metrics of Language Generation Models for Synthetic Traffic Generation Tasks

  • paper_url: http://arxiv.org/abs/2311.12534
  • repo_url: None
  • paper_authors: Simone Filice, Jason Ingyu Choi, Giuseppe Castellucci, Eugene Agichtein, Oleg Rokhlenko
  • for: Proposing and evaluating metrics for natural language generation (NLG) tasks that require generating multiple output texts, such as Synthetic Traffic Generation (STG).
  • methods: Shows that common NLG metrics such as BLEU are not suitable for evaluating STG, and proposes several metrics designed to compare the generated traffic to the distribution of real user texts, validated both with an automatic procedure and with human annotations.
  • results: On Shopping Utterance Generation, Product Question Generation and Query Auto Completion, the proposed metrics are effective for evaluating STG tasks and improve agreement with human judgement by up to 20% over common NLG metrics, suggesting better ways to estimate how representative synthetic text data is.
    Abstract Many Natural Language Generation (NLG) tasks aim to generate a single output text given an input prompt. Other settings require the generation of multiple texts, e.g., for Synthetic Traffic Generation (STG). This generation task is crucial for training and evaluating QA systems as well as conversational agents, where the goal is to generate multiple questions or utterances resembling the linguistic variability of real users. In this paper, we show that common NLG metrics, like BLEU, are not suitable for evaluating STG. We propose and evaluate several metrics designed to compare the generated traffic to the distribution of real user texts. We validate our metrics with an automatic procedure to verify whether they capture different types of quality issues of generated data; we also run human annotations to verify the correlation with human judgements. Experiments on three tasks, i.e., Shopping Utterance Generation, Product Question Generation and Query Auto Completion, demonstrate that our metrics are effective for evaluating STG tasks, and improve the agreement with human judgement up to 20% with respect to common NLG metrics. We believe these findings can pave the way towards better solutions for estimating the representativeness of synthetic text data.
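The paper's proposed metrics are not spelled out in the abstract; the snippet below only illustrates the general idea of comparing generated traffic to the distribution of real user texts, using Jensen-Shannon divergence over unigram distributions as a stand-in. The whitespace tokenization, the unigram choice, and the smoothing constant are assumptions.

```python
from collections import Counter
import math

def unigram_dist(texts):
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two unigram distributions (lower = closer)."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    def kl(a, b):
        return sum(a[w] * math.log(a[w] / (b[w] + eps)) for w in a if a[w] > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# distance = js_divergence(unigram_dist(real_user_utterances), unigram_dist(generated_traffic))
```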

  • paper_url: http://arxiv.org/abs/2311.12489
  • repo_url: None
  • paper_authors: Viktor Hangya, Silvia Severini, Radoslav Ralev, Alexander Fraser, Hinrich Schütze
  • for: Improving multilingual NLP for very low-resource (<5M tokens) and moderately low-resource (<50M tokens) languages, whose cross-lingual word representations are otherwise of poor quality.
  • methods: A language chain-based approach that builds multilingual word embeddings (MWEs) one language at a time, starting from a resource-rich source and sequentially adding intermediate related languages until the target is reached; a semi-joint bilingual approach is extended to multiple languages, anchoring the target language in the multilingual space to avoid the independently trained monolingual embeddings of previous work.
  • results: Improved bilingual lexicon induction performance across 4 language families, covering 4 very low-resource (<5M tokens) and 4 moderately low-resource (<50M tokens) target languages; the analysis further shows the importance of good-quality embeddings for the intermediate languages and of leveraging anchor points from all languages in the multilingual space.
    Abstract Very low-resource languages, having only a few million tokens worth of data, are not well-supported by multilingual NLP approaches due to poor quality cross-lingual word representations. Recent work showed that good cross-lingual performance can be achieved if a source language is related to the low-resource target language. However, not all language pairs are related. In this paper, we propose to build multilingual word embeddings (MWEs) via a novel language chain-based approach, that incorporates intermediate related languages to bridge the gap between the distant source and target. We build MWEs one language at a time by starting from the resource rich source and sequentially adding each language in the chain till we reach the target. We extend a semi-joint bilingual approach to multiple languages in order to eliminate the main weakness of previous works, i.e., independently trained monolingual embeddings, by anchoring the target language around the multilingual space. We evaluate our method on bilingual lexicon induction for 4 language families, involving 4 very low-resource (<5M tokens) and 4 moderately low-resource (<50M) target languages, showing improved performance in both categories. Additionally, our analysis reveals the importance of good quality embeddings for intermediate languages as well as the importance of leveraging anchor points from all languages in the multilingual space.
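The paper's semi-joint, anchor-based training is more involved than can be shown here, so the sketch below only illustrates the simpler "language chain" intuition: pairwise orthogonal (Procrustes) mappings between consecutive related languages, composed from the resource-rich source through the intermediates to the target. The seed-dictionary inputs and the shared embedding dimensionality are assumptions.

```python
import numpy as np

def procrustes(X_src, Y_tgt):
    """Orthogonal map W minimizing ||X_src @ W - Y_tgt|| over a seed dictionary."""
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

def chain_mapping(seed_pairs):
    """Compose pairwise mappings along source -> intermediate(s) -> target.

    seed_pairs: list of (X, Y) matrices for consecutive languages in the chain,
    where rows are embeddings of seed-dictionary word pairs in each language's
    own space (all languages assumed to share the embedding dimension)."""
    W_total = None
    for X, Y in seed_pairs:
        W_step = procrustes(X, Y)                     # language i -> language i+1
        W_total = W_step if W_total is None else W_total @ W_step
    return W_total                                    # maps the source space into the target space
```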

Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish

  • paper_url: http://arxiv.org/abs/2311.12480
  • repo_url: None
  • paper_authors: David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos
  • for: Improving visual speech recognition (VSR), studying how estimating specialized end-to-end systems for a specific person affects recognition quality in a speaker-dependent setting.
  • methods: Uses the Spanish LIP-RTVE database and proposes different adaptation strategies based on fine-tuning, with a pre-trained CTC/Attention architecture as the baseline throughout the experiments.
  • results: A two-step fine-tuning process, in which the VSR system is first adapted to the task domain before the speaker adaptation is addressed, provides significant improvements; results comparable to the current state of the art are reached even when only a limited amount of data is available.
    Abstract Different studies have shown the importance of visual cues throughout the speech perception process. In fact, the development of audiovisual approaches has led to advances in the field of speech technologies. However, although noticeable results have recently been achieved, visual speech recognition remains an open research problem. It is a task in which, by dispensing with the auditory sense, challenges such as visual ambiguities and the complexity of modeling silence must be faced. Nonetheless, some of these challenges can be alleviated when the problem is approached from a speaker-dependent perspective. Thus, this paper studies, using the Spanish LIP-RTVE database, how the estimation of specialized end-to-end systems for a specific person could affect the quality of speech recognition. First, different adaptation strategies based on the fine-tuning technique were proposed. Then, a pre-trained CTC/Attention architecture was used as a baseline throughout our experiments. Our findings showed that a two-step fine-tuning process, where the VSR system is first adapted to the task domain, provided significant improvements when the speaker adaptation was addressed. Furthermore, results comparable to the current state of the art were reached even when only a limited amount of data was available.

CSMeD: Bridging the Dataset Gap in Automated Citation Screening for Systematic Literature Reviews

  • paper_url: http://arxiv.org/abs/2311.12474
  • repo_url: https://github.com/wojciechkusa/systematic-review-datasets
  • paper_authors: Wojciech Kusa, Oscar E. Mendoza, Matthias Samwald, Petr Knoth, Allan Hanbury
  • for: This paper aims to address the lack of standardized evaluation datasets for automated literature screening systems in systematic literature reviews (SLRs).
  • methods: The authors analyze citation screening evaluation datasets and introduce a new meta-dataset called CSMeD, which consolidates nine publicly released collections of SLRs from medicine and computer science.
  • results: The authors introduce a new dataset called CSMeD-FT for evaluating full text publication screening tasks and conduct experiments to demonstrate the utility of CSMeD.
    Abstract Systematic literature reviews (SLRs) play an essential role in summarising, synthesising and validating scientific evidence. In recent years, there has been a growing interest in using machine learning techniques to automate the identification of relevant studies for SLRs. However, the lack of standardised evaluation datasets makes comparing the performance of such automated literature screening systems difficult. In this paper, we analyse the citation screening evaluation datasets, revealing that many of the available datasets are either too small, suffer from data leakage or have limited applicability to systems treating automated literature screening as a classification task, as opposed to, for example, a retrieval or question-answering task. To address these challenges, we introduce CSMeD, a meta-dataset consolidating nine publicly released collections, providing unified access to 325 SLRs from the fields of medicine and computer science. CSMeD serves as a comprehensive resource for training and evaluating the performance of automated citation screening models. Additionally, we introduce CSMeD-FT, a new dataset designed explicitly for evaluating the full text publication screening task. To demonstrate the utility of CSMeD, we conduct experiments and establish baselines on new datasets.

Analysis of Visual Features for Continuous Lipreading in Spanish

  • paper_url: http://arxiv.org/abs/2311.12468
  • repo_url: None
  • paper_authors: David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos
  • for: Improving automatic visual speech recognition by analysing different visual speech features to identify which best captures the nature of lip movements for natural continuous Spanish.
  • methods: A traditional system based on Hidden Markov Models with Gaussian Mixture Models, evaluated on an audiovisual corpus compiled from a subset of the RTVE database used in the Albayzín evaluations.
  • results: Although the task is difficult, recognition results obtained under restricted conditions show that combining eigenlips with deep features is the best visual approach.
    Abstract During a conversation, our brain is responsible for combining information obtained from multiple senses in order to improve our ability to understand the message we are perceiving. Different studies have shown the importance of presenting visual information in these situations. Nevertheless, lipreading is a complex task whose objective is to interpret speech when audio is not available. By dispensing with a sense as crucial as hearing, it will be necessary to be aware of the challenge that this lack presents. In this paper, we propose an analysis of different speech visual features with the intention of identifying which of them is the best approach to capture the nature of lip movements for natural Spanish and, in this way, dealing with the automatic visual speech recognition task. In order to estimate our system, we present an audiovisual corpus compiled from a subset of the RTVE database, which has been used in the Albayz\'in evaluations. We employ a traditional system based on Hidden Markov Models with Gaussian Mixture Models. Results show that, although the task is difficult, in restricted conditions we obtain recognition results which determine that using eigenlips in combination with deep features is the best visual approach.
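Eigenlips are, roughly, principal components of the lip region; the sketch below shows one conventional way to compute per-frame eigenlip features with PCA. The input shape, the grayscale assumption, and the number of components are illustrative and not taken from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

def eigenlip_features(lip_frames, n_components=32):
    """Project lip-region frames onto their principal components ("eigenlips").

    lip_frames: array of shape (num_frames, height, width), grayscale lip ROIs.
    Returns one feature vector per frame, suitable for an HMM/GMM recognizer
    (possibly concatenated with deep features, as the paper suggests)."""
    flat = lip_frames.reshape(len(lip_frames), -1).astype(np.float64)
    pca = PCA(n_components=n_components)
    return pca.fit_transform(flat)
```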

LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild

  • paper_url: http://arxiv.org/abs/2311.12457
  • repo_url: https://github.com/david-gimeno/lip-rtve
  • paper_authors: David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos
  • for: Supporting robust speech technologies that combine audio and visual cues to represent the nature of speech, and in particular Visual Speech Recognition, which requires large-scale databases that are scarce for languages other than English.
  • methods: Baselines are estimated with Hidden Markov Models, a traditional paradigm that has been widely used in the field of speech technologies.
  • results: A semi-automatically annotated audiovisual database of unconstrained natural Spanish, providing 13 hours of data extracted from Spanish television, with baseline results reported for both speaker-dependent and speaker-independent scenarios.
    Abstract Speech is considered as a multi-modal process where hearing and vision are two fundamentals pillars. In fact, several studies have demonstrated that the robustness of Automatic Speech Recognition systems can be improved when audio and visual cues are combined to represent the nature of speech. In addition, Visual Speech Recognition, an open research problem whose purpose is to interpret speech by reading the lips of the speaker, has been a focus of interest in the last decades. Nevertheless, in order to estimate these systems in the currently Deep Learning era, large-scale databases are required. On the other hand, while most of these databases are dedicated to English, other languages lack sufficient resources. Thus, this paper presents a semi-automatically annotated audiovisual database to deal with unconstrained natural Spanish, providing 13 hours of data extracted from Spanish television. Furthermore, baseline results for both speaker-dependent and speaker-independent scenarios are reported using Hidden Markov Models, a traditional paradigm that has been widely used in the field of Speech Technologies.

Visual Analytics for Generative Transformer Models

  • paper_url: http://arxiv.org/abs/2311.12418
  • repo_url: None
  • paper_authors: Raymond Li, Ruixin Yang, Wen Xiao, Ahmed AbuRaed, Gabriel Murray, Giuseppe Carenini
  • for: Supporting the interpretability analysis of transformer-based generative models.
  • methods: A visual analytical framework with interactive visualizations that lets users explore different facets of transformer-based encoder-decoder and decoder-only models.
  • results: Three detailed case studies based on real-world NLP research problems demonstrate the feasibility and usefulness of the framework.
    Abstract While transformer-based models have achieved state-of-the-art results in a variety of classification and generation tasks, their black-box nature makes them challenging for interpretability. In this work, we present a novel visual analytical framework to support the analysis of transformer-based generative networks. In contrast to previous work, which has mainly focused on encoder-based models, our framework is one of the first dedicated to supporting the analysis of transformer-based encoder-decoder models and decoder-only models for generative and classification tasks. Hence, we offer an intuitive overview that allows the user to explore different facets of the model through interactive visualization. To demonstrate the feasibility and usefulness of our framework, we present three detailed case studies based on real-world NLP research problems.

IndoRobusta: Towards Robustness Against Diverse Code-Mixed Indonesian Local Languages

  • paper_url: http://arxiv.org/abs/2311.12405
  • repo_url: None
  • paper_authors: Muhammad Farid Adilazuarda, Samuel Cahyawijaya, Genta Indra Winata, Pascale Fung, Ayu Purwarianti
  • for: Exploring code-mixing in Indonesian NLP with four embedded languages (English, Sundanese, Javanese, and Malay).
  • methods: Introduces IndoRobusta, a framework to evaluate and improve robustness to code-mixing.
  • results: Pre-training corpus bias affects the model's ability to handle Indonesian-English code-mixing better than code-mixing with the other local languages, despite their higher language diversity.
    Abstract Significant progress has been made on Indonesian NLP. Nevertheless, exploration of the code-mixing phenomenon in Indonesian is limited, despite many languages being frequently mixed with Indonesian in daily conversation. In this work, we explore code-mixing in Indonesian with four embedded languages, i.e., English, Sundanese, Javanese, and Malay; and introduce IndoRobusta, a framework to evaluate and improve the code-mixing robustness. Our analysis shows that the pre-training corpus bias affects the model's ability to better handle Indonesian-English code-mixing when compared to other local languages, despite having higher language diversity.
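The abstract does not describe how code-mixed inputs are constructed or perturbed, so the sketch below only illustrates one simple way such robustness tests are often built: swapping Indonesian tokens for embedded-language translations from a bilingual lexicon with some probability. The lexicon, the mix ratio, and the token-level swapping are assumptions, not the IndoRobusta procedure.

```python
import random

def code_mix(tokens, lexicon, mix_ratio=0.3, seed=None):
    """Return a code-mixed copy of a tokenized Indonesian sentence.

    tokens:    list of Indonesian word tokens.
    lexicon:   dict mapping Indonesian words to an embedded-language translation
               (e.g. Indonesian -> Sundanese); purely hypothetical here.
    mix_ratio: probability of swapping a token that has a known translation.
    """
    rng = random.Random(seed)
    return [lexicon[tok] if tok in lexicon and rng.random() < mix_ratio else tok
            for tok in tokens]
```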

InterPrompt: Interpretable Prompting for Interrelated Interpersonal Risk Factors in Reddit Posts

  • paper_url: http://arxiv.org/abs/2311.12404
  • repo_url: None
  • paper_authors: MSVPJ Sathvik, Surjodeep Sarkar, Chandni Saxena, Sunghwan Sohn, Muskan Garg
  • for: AI-assisted support systems for the prediction and early detection of mental health disorders linked to Interpersonal Risk Factors (IRFs).
  • methods: N-shot learning with the GPT-3 model on an IRF dataset, plus an Interpretable Prompting (InterPrompt) method that fine-tunes GPT-3 to boost the attention mechanism and capture the context-specific sensitivity and interconnectedness of textual cues for Thwarted Belongingness (TBe) and Perceived Burdensomeness (PBu).
  • results: All four GPT-3 variants, when fine-tuned with InterPrompt, perform considerably better than the baseline methods in both classification and explanation generation, improving system-level explainability and trustworthiness.
    Abstract Mental health professionals and clinicians have observed the upsurge of mental disorders due to Interpersonal Risk Factors (IRFs). To simulate the human-in-the-loop triaging scenario for early detection of mental health disorders, we recognized textual indications to ascertain these IRFs : Thwarted Belongingness (TBe) and Perceived Burdensomeness (PBu) within personal narratives. In light of this, we use N-shot learning with GPT-3 model on the IRF dataset, and underscored the importance of fine-tuning GPT-3 model to incorporate the context-specific sensitivity and the interconnectedness of textual cues that represent both IRFs. In this paper, we introduce an Interpretable Prompting (InterPrompt)} method to boost the attention mechanism by fine-tuning the GPT-3 model. This allows a more sophisticated level of language modification by adjusting the pre-trained weights. Our model learns to detect usual patterns and underlying connections across both the IRFs, which leads to better system-level explainability and trustworthiness. The results of our research demonstrate that all four variants of GPT-3 model, when fine-tuned with InterPrompt, perform considerably better as compared to the baseline methods, both in terms of classification and explanation generation.

A Survey of Graph Meets Large Language Model: Progress and Future Directions

  • paper_url: http://arxiv.org/abs/2311.12399
  • repo_url: https://github.com/yhLeeee/Awesome-LLMs-in-Graph-tasks
  • paper_authors: Yuhan Li, Zhixun Li, Peisong Wang, Jia Li, Xiangguo Sun, Hong Cheng, Jeffrey Xu Yu
  • for: A comprehensive review and analysis of existing methods that integrate large language models (LLMs) with graphs, organized under a newly proposed taxonomy.
  • methods: Systematically surveys representative methods across three categories defined by the role LLMs play in graph-related tasks: enhancer, predictor, and alignment component.
  • results: Discusses the remaining limitations of existing studies, highlights promising avenues for future research, and maintains the relevant papers at https://github.com/yhLeeee/Awesome-LLMs-in-Graph-tasks.
    Abstract Graph plays a significant role in representing and analyzing complex relationships in real-world applications such as citation networks, social networks, and biological data. Recently, Large Language Models (LLMs), which have achieved tremendous success in various domains, have also been leveraged in graph-related tasks to surpass traditional Graph Neural Networks (GNNs) based methods and yield state-of-the-art performance. In this survey, we first present a comprehensive review and analysis of existing methods that integrate LLMs with graphs. First of all, we propose a new taxonomy, which organizes existing methods into three categories based on the role (i.e., enhancer, predictor, and alignment component) played by LLMs in graph-related tasks. Then we systematically survey the representative methods along the three categories of the taxonomy. Finally, we discuss the remaining limitations of existing studies and highlight promising avenues for future research. The relevant papers are summarized and will be consistently updated at: https://github.com/yhLeeee/Awesome-LLMs-in-Graph-tasks.

Problems of Non-equivalent Words in Technical Translation

  • paper_url: http://arxiv.org/abs/2311.12395
  • repo_url: None
  • paper_authors: Mohammad Ibrahim Qani
  • for: This research paper focuses on the issue of non-equivalent words in translation, specifically from English to Russian. The authors aim to provide solutions and rules for rendering these words accurately in the target language.
  • methods: The paper uses a combination of linguistic analysis and examples to illustrate the challenges of translating non-equivalent words. The authors also provide suggestions for how to overcome these challenges and find appropriate equivalents in the target language.
  • results: The paper highlights the importance of understanding the cultural and historical context of non-equivalent words in order to accurately translate them. The authors also provide a list of common non-equivalent words and their equivalents in Russian, which can be useful for translators and linguists working in this field.
    Abstract Translating words which do not have equivalent in target language is not easy and finding proper equivalent of those words are very important to render correctly and understandably, the article defines some thoughts and ideas of scientists on the common problems of non-equivalent words from English to Russian language and includes English and Russian examples and ideas of certain scientist. The English language is worldwide spoken and there are 1.35 billion English speakers and over 258 million Russian speakers according to the 2021s statistics. Inevitably, these billions of speakers around the world have connection and they may have deal in different criteria. In order to understand one another they need to have a pure and fully-understood language. These pure languages understanding directly relates to translation knowledge where linguists and translators need to work and research to eradicate misunderstanding. Misunderstandings mostly appear in non-equivalent words because there are different local and internal words like food, garment, cultural and traditional words and others in every notion. Truly, most of these words do not have equivalent in the target language and these words need to be worked and find their equivalent in the target language to fully understand the both languages. However, some of these non-equivalent words are already professionally rendered to the target language but still there many other words to be rendered. Hence, this research paper includes different ways and rules of rendering non-equivalent words from source language to the target language.

The Obscure Limitation of Modular Multilingual Language Models

  • paper_url: http://arxiv.org/abs/2311.12375
  • repo_url: None
  • paper_authors: Muhammad Farid Adilazuarda, Samuel Cahyawijaya, Ayu Purwarianti
  • for: Exposing the limitation of modular multilingual language models (MLMs) in multilingual inference scenarios with unknown languages, where existing evaluations exclude the language identification (LID) module and thus obscure real-case performance.
  • methods: Adds an LID module to the evaluation of existing modular MLMs to reflect realistic multilingual inference.
  • results: Showcases the effect of adding LID on the multilingual evaluation of modular MLMs and discusses how to close the performance gap caused by the pipelined combination of LID and modular MLMs.
    Abstract We expose the limitation of modular multilingual language models (MLMs) in multilingual inference scenarios with unknown languages. Existing evaluations of modular MLMs exclude the involvement of language identification (LID) modules, which obscures the performance of real-case multilingual scenarios of modular MLMs. In this work, we showcase the effect of adding LID on the multilingual evaluation of modular MLMs and provide discussions for closing the performance gap caused by the pipelined approach of LID and modular MLMs.

Beyond Turing: A Comparative Analysis of Approaches for Detecting Machine-Generated Text

  • paper_url: http://arxiv.org/abs/2311.12373
  • repo_url: None
  • paper_authors: Muhammad Farid Adilazuarda, Nikolaos Nektarios Arkoulis, Oleksii Chumakov
  • for: Evaluating three distinct approaches to distinguishing human-written from machine-generated text: traditional shallow learning, Language Model (LM) fine-tuning, and Multilingual Model fine-tuning.
  • methods: The three approaches are rigorously tested on a wide range of machine-generated texts, providing a benchmark of their competence at distinguishing human-authored from machine-authored text.
  • results: The results reveal considerable differences in performance across the methods, emphasizing the continued need for advancement in this area of NLP.
    Abstract Significant progress has been made on text generation by pre-trained language models (PLMs), yet distinguishing between human and machine-generated text poses an escalating challenge. This paper offers an in-depth evaluation of three distinct methods used to address this task: traditional shallow learning, Language Model (LM) fine-tuning, and Multilingual Model fine-tuning. These approaches are rigorously tested on a wide range of machine-generated texts, providing a benchmark of their competence in distinguishing between human-authored and machine-authored linguistic constructs. The results reveal considerable differences in performance across methods, thus emphasizing the continued need for advancement in this crucial area of NLP. This study offers valuable insights and paves the way for future research aimed at creating robust and highly discriminative models.
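The "traditional shallow learning" branch is not specified beyond its name, so the sketch below shows one common instantiation of such a baseline: character n-gram TF-IDF features with logistic regression. The feature choice and hyperparameters are assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def shallow_detector():
    """A shallow-learning baseline for human vs. machine-generated text."""
    return make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), max_features=50_000),
        LogisticRegression(max_iter=1000),
    )

# clf = shallow_detector().fit(train_texts, train_labels)  # labels: 0 = human, 1 = machine
# predictions = clf.predict(test_texts)
```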

Utilizing Language Models for Tour Itinerary Recommendation

  • paper_url: http://arxiv.org/abs/2311.12355
  • repo_url: None
  • paper_authors: Ngai Lam Ho, Kwan Hui Lim
  • for: Researchers and practitioners in Operations Research and Recommendation Systems, specifically those interested in tour itinerary recommendation and planning.
  • methods: Explores the use of language models for the task, including Word2Vec and GloVe for learning POI embeddings and transformer-based techniques such as BERT for generating itineraries.
  • results: Discusses the effectiveness of these approaches for recommending personalized POIs relevant to users and planning them as an itinerary that satisfies various constraints.
    Abstract Tour itinerary recommendation involves planning a sequence of relevant Point-of-Interest (POIs), which combines challenges from the fields of both Operations Research (OR) and Recommendation Systems (RS). As an OR problem, there is the need to maximize a certain utility (e.g., popularity of POIs in the tour) while adhering to some constraints (e.g., maximum time for the tour). As a RS problem, it is heavily related to problem or filtering or ranking a subset of POIs that are relevant to a user and recommending it as part of an itinerary. In this paper, we explore the use of language models for the task of tour itinerary recommendation and planning. This task has the unique requirement of recommending personalized POIs relevant to users and planning these POIs as an itinerary that satisfies various constraints. We discuss some approaches in this area, such as using word embedding techniques like Word2Vec and GloVe for learning POI embeddings and transformer-based techniques like BERT for generating itineraries.
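As a concrete illustration of the POI-embedding idea mentioned above, the sketch below trains Word2Vec on ordered visit sequences so that POIs co-occurring in tours end up with similar vectors. The use of gensim, the hyperparameters, and the example POI identifiers are assumptions rather than the paper's setup.

```python
from gensim.models import Word2Vec

def train_poi_embeddings(visit_sequences, dim=64):
    """Learn POI embeddings by treating each user's visit sequence as a 'sentence'."""
    return Word2Vec(sentences=visit_sequences, vector_size=dim,
                    window=3, min_count=1, sg=1, epochs=20)

# Hypothetical usage with made-up POI ids:
# model = train_poi_embeddings([["museum_A", "park_B", "cafe_C"],
#                               ["park_B", "gallery_D", "cafe_C"]])
# model.wv.most_similar("park_B")   # POIs likely to co-occur in the same itinerary
```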

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

  • paper_url: http://arxiv.org/abs/2311.12351
  • repo_url: https://github.com/strivin0311/long-llms-learning
  • paper_authors: Yunpeng Huang, Jingwei Xu, Zixu Jiang, Junyu Lai, Zenan Li, Yuan Yao, Taolue Chen, Lijuan Yang, Zhou Xin, Xiaoxing Ma
  • for: Surveying advances in model architecture for Transformer-based large language models (LLMs) that optimize long-context capabilities across all stages from pre-training to inference, since many current LLMs are pre-trained on shorter texts and handle long-context prompts poorly.
  • methods: Delineates and analyses the problems current Transformer-based models have with long-context input and output, and offers a holistic taxonomy of architectural upgrades that address them.
  • results: Surveys evaluation necessities tailored to long-context LLMs, including datasets, metrics and baseline models, as well as optimization toolkits such as libraries, systems and compilers that improve LLM efficiency and efficacy, and discusses remaining challenges and future research directions; a continuously updated repository is maintained at https://github.com/Strivin0311/long-llms-learning.
    Abstract With the bomb ignited by ChatGPT, Transformer-based Large Language Models (LLMs) have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been applied in diverse areas as knowledge bases, human interfaces, and dynamic agents. However, a prevailing limitation exists: many current LLMs, constrained by resources, are primarily pre-trained on shorter texts, rendering them less effective for longer-context prompts, commonly encountered in real-world settings. In this paper, we present a comprehensive survey focusing on the advancement of model architecture in Transformer-based LLMs to optimize long-context capabilities across all stages from pre-training to inference. We firstly delineate and analyze the problems of handling long-context input and output with the current Transformer-based models. Then, we mainly offer a holistic taxonomy to navigate the landscape of Transformer upgrades on architecture to solve these problems. Afterward, we provide the investigation on wildly used evaluation necessities tailored for long-context LLMs, including datasets, metrics, and baseline models, as well as some amazing optimization toolkits like libraries, systems, and compilers to augment LLMs' efficiency and efficacy across different stages. Finally, we further discuss the predominant challenges and potential avenues for future research in this domain. Additionally, we have established a repository where we curate relevant literature with real-time updates at https://github.com/Strivin0311/long-llms-learning.

Modeling Political Orientation of Social Media Posts: An Extended Analysis

  • paper_url: http://arxiv.org/abs/2311.12323
  • repo_url: None
  • paper_authors: Sadia Kamal, Brenner Little, Jade Gullic, Trevor Harms, Kristin Olofsson, Arunkumar Bagavathi
  • for: Characterizing political polarization on online social media platforms at the level of the social media posts themselves.
  • methods: Two heuristic methods that leverage news media bias and post content to label the political orientation of social media posts, compared against a randomly sampled human-annotated dataset.
  • results: The proposed heuristics yield labelled data of useful quality, and machine learning models trained with traditional supervised learning and few-shot learning setups show improved performance in predicting the political orientation of posts collected from Gab and Twitter.
    Abstract Developing machine learning models to characterize political polarization on online social media presents significant challenges. These challenges mainly stem from various factors such as the lack of annotated data, presence of noise in social media datasets, and the sheer volume of data. The common research practice typically examines the biased structure of online user communities for a given topic or qualitatively measuring the impacts of polarized topics on social media. However, there is limited work focusing on analyzing polarization at the ground-level, specifically in the social media posts themselves. Such existing analysis heavily relies on annotated data, which often requires laborious human labeling, offers labels only to specific problems, and lacks the ability to determine the near-future bias state of a social media conversations. Understanding the degree of political orientation conveyed in social media posts is crucial for quantifying the bias of online user communities and investigating the spread of polarized content. In this work, we first introduce two heuristic methods that leverage on news media bias and post content to label social media posts. Next, we compare the efficacy and quality of heuristically labeled dataset with a randomly sampled human-annotated dataset. Additionally, we demonstrate that current machine learning models can exhibit improved performance in predicting political orientation of social media posts, employing both traditional supervised learning and few-shot learning setups. We conduct experiments using the proposed heuristic methods and machine learning approaches to predict the political orientation of posts collected from two social media forums with diverse political ideologies: Gab and Twitter.
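The abstract does not detail the two heuristics, so the snippet below only sketches what a news-media-bias heuristic of this kind might look like: averaging outlet bias scores for the news domains a post links to. The domain-to-bias lookup, the URL regex, and the thresholds are all hypothetical placeholders, not the paper's heuristics.

```python
import re

# Hypothetical outlet bias scores in [-1, 1]; not taken from the paper.
DOMAIN_BIAS = {"left-outlet.example": -1.0, "center-outlet.example": 0.0, "right-outlet.example": 1.0}

def heuristic_orientation(post_text):
    """Label a post's political orientation from the news domains it shares."""
    domains = re.findall(r"https?://(?:www\.)?([^/\s]+)", post_text.lower())
    scores = [DOMAIN_BIAS[d] for d in domains if d in DOMAIN_BIAS]
    if not scores:
        return None   # no known outlet linked: fall back to a content-based heuristic
    mean = sum(scores) / len(scores)
    return "left" if mean < -0.33 else "right" if mean > 0.33 else "center"
```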

AcademicGPT: Empowering Academic Research

  • paper_url: http://arxiv.org/abs/2311.12315
  • repo_url: None
  • paper_authors: Shufa Wei, Xiaolong Xu, Xianbiao Qi, Xi Yin, Jun Xia, Jingyi Ren, Peijun Tang, Yuxiang Zhong, Yihao Chen, Xiaoqin Ren, Yuxin Liang, Liankai Huang, Kai Xie, Weikang Gui, Wei Tan, Shuanglong Sun, Yongquan Hu, Qinxian Liu, Nanjin Li, Chihao Dai, Lihua Wang, Xiaohui Liu, Lei Zhang, Yutao Xie
  • for: Introducing AcademicGPT, a language model designed specifically to empower academic research.
  • methods: AcademicGPT is a continual training model derived from LLaMA2-70B; its training corpus consists mainly of academic papers, theses, content from selected academic domains, high-quality Chinese data and other sources.
  • results: Evaluations on public benchmarks such as MMLU and CEval and on specialized academic benchmarks such as PubMedQA, SCIEval, and the newly created ComputerScienceQA demonstrate its general knowledge, Chinese ability, and academic ability; on top of the foundation model, several applications for the academic area were developed, including General Academic Question Answering, AI-assisted Paper Reading, Paper Review, and AI-assisted Title and Abstract Generation.
    Abstract Large Language Models (LLMs) have demonstrated exceptional capabilities across various natural language processing tasks. Yet, many of these advanced LLMs are tailored for broad, general-purpose applications. In this technical report, we introduce AcademicGPT, designed specifically to empower academic research. AcademicGPT is a continual training model derived from LLaMA2-70B. Our training corpus mainly consists of academic papers, thesis, content from some academic domain, high-quality Chinese data and others. While it may not be extensive in data scale, AcademicGPT marks our initial venture into a domain-specific GPT tailored for research area. We evaluate AcademicGPT on several established public benchmarks such as MMLU and CEval, as well as on some specialized academic benchmarks like PubMedQA, SCIEval, and our newly-created ComputerScienceQA, to demonstrate its ability from general knowledge ability, to Chinese ability, and to academic ability. Building upon AcademicGPT's foundation model, we also developed several applications catered to the academic area, including General Academic Question Answering, AI-assisted Paper Reading, Paper Review, and AI-assisted Title and Abstract Generation.

Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis

  • paper_url: http://arxiv.org/abs/2311.12275
  • repo_url: None
  • paper_authors: Ruiyang Qin, Jun Xia, Zhenge Jia, Meng Jiang, Ahmed Abbasi, Peipei Zhou, Jingtong Hu, Yiyu Shi
  • for: A framework enabling on-device personalization of large language models (LLMs) on edge devices, taking into account sparse annotation and limited on-device storage.
  • methods: A novel self-supervised framework that selects and stores the most representative user-generated data online with a small memory footprint, requesting user annotations only infrequently for further fine-tuning; fine-tuning quality is enhanced by generating multiple semantically similar pairs of question texts and expected responses with the LLM.
  • results: Experiments show that the proposed framework achieves the best user-specific content-generating capability (accuracy) and fine-tuning speed (performance) compared with vanilla baselines.
    Abstract After a large language model (LLM) is deployed on edge devices, it is desirable for these devices to learn from user-generated conversation data to generate user-specific and personalized responses in real-time. However, user-generated data usually contains sensitive and private information, and uploading such data to the cloud for annotation is not preferred if not prohibited. While it is possible to obtain annotation locally by directly asking users to provide preferred responses, such annotations have to be sparse to not affect user experience. In addition, the storage of edge devices is usually too limited to enable large-scale fine-tuning with full user-generated data. It remains an open question how to enable on-device LLM personalization, considering sparse annotation and limited on-device storage. In this paper, we propose a novel framework to select and store the most representative data online in a self-supervised way. Such data has a small memory footprint and allows infrequent requests of user annotations for further fine-tuning. To enhance fine-tuning quality, multiple semantically similar pairs of question texts and expected responses are generated using the LLM. Our experiments show that the proposed framework achieves the best user-specific content-generating capability (accuracy) and fine-tuning speed (performance) compared with vanilla baselines. To the best of our knowledge, this is the very first on-device LLM personalization framework.
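The abstract states only that the most representative data is selected online in a self-supervised way; the sketch below shows one plausible instantiation, a greedy k-center pass over text embeddings that keeps a storage-budget-sized subset spread across the user's data. The embedding input, the k-center heuristic, and the budget are assumptions, not the paper's algorithm.

```python
import numpy as np

def select_representative(embeddings, budget):
    """Greedy k-center selection of a representative subset of user data.

    embeddings: (n, d) array of embeddings of on-device conversation snippets.
    budget:     number of examples the device can afford to store.
    Returns indices of examples to keep (and, occasionally, to ask the user about)."""
    start = int(np.argmax(np.linalg.norm(embeddings - embeddings.mean(axis=0), axis=1)))
    chosen = [start]
    dists = np.linalg.norm(embeddings - embeddings[start], axis=1)
    while len(chosen) < min(budget, len(embeddings)):
        nxt = int(np.argmax(dists))        # farthest point from everything chosen so far
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return chosen
```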

cs.LG - 2023-11-21

Training Deep 3D Convolutional Neural Networks to Extract BSM Physics Parameters Directly from HEP Data: a Proof-of-Concept Study Using Monte Carlo Simulations

  • paper_url: http://arxiv.org/abs/2311.13060
  • repo_url: None
  • paper_authors: S. Dubey, T. E. Browder, S. Kohani, R. Mandal, A. Sibidanov, R. Sinha
  • for: A novel application of computer vision techniques to extract beyond-the-Standard-Model (BSM) parameters directly from high energy physics (HEP) flavor data.
  • methods: Angular and kinematic distributions are transformed into "quasi-images" used to train a convolutional neural network for regression tasks, similar to fitting; this contrasts with the classification tasks ML/AI is usually applied to in HEP.
  • results: As a proof of concept, a 34-layer residual network is trained to regress the Wilson coefficient $C_{9}$ from Monte Carlo simulations of $B \rightarrow K^{*}\mu^{+}\mu^{-}$ decays; the technique can be generalized and may find applications in other HEP experiments.
    Abstract We report on a novel application of computer vision techniques to extract beyond the Standard Model (BSM) parameters directly from high energy physics (HEP) flavor data. We develop a method of transforming angular and kinematic distributions into "quasi-images" that can be used to train a convolutional neural network to perform regression tasks, similar to fitting. This contrasts with the usual classification functions performed using ML/AI in HEP. As a proof-of-concept, we train a 34-layer Residual Neural Network to regress on these images and determine the Wilson Coefficient $C_{9}$ in MC (Monte Carlo) simulations of $B \rightarrow K^{*}\mu^{+}\mu^{-}$ decays. The technique described here can be generalized and may find applicability across various HEP experiments and elsewhere.
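As a hedged sketch of the "quasi-image" idea, the snippet below histograms two kinematic variables of simulated decay candidates onto a 2D grid and feeds the resulting image to a small convolutional regressor. The variable names, binning, ranges, and network size are illustrative assumptions, not the paper's 34-layer ResNet or its exact observables.

```python
import numpy as np
import torch
import torch.nn as nn

def to_quasi_image(q2, cos_theta, bins=32):
    """Bin two kinematic variables into a normalized 2D histogram ('quasi-image')."""
    img, _, _ = np.histogram2d(q2, cos_theta, bins=bins,
                               range=[[0.0, 19.0], [-1.0, 1.0]])
    return torch.tensor(img / img.sum(), dtype=torch.float32).unsqueeze(0)  # (1, H, W)

class SmallRegressor(nn.Module):
    """Tiny CNN that regresses a single physics parameter from a quasi-image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(32, 1))
    def forward(self, x):
        return self.net(x)

# Toy usage: one simulated event sample -> one image -> one predicted parameter
rng = np.random.default_rng(0)
image = to_quasi_image(rng.uniform(0, 19, 5000), rng.uniform(-1, 1, 5000))
model = SmallRegressor()
print(model(image.unsqueeze(0)).shape)  # torch.Size([1, 1])
```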

A note on estimating the dimension from a random geometric graph

  • paper_url: http://arxiv.org/abs/2311.13059
  • repo_url: None
  • paper_authors: Caelan Atamanchuk, Luc Devroye, Gabor Lugosi
  • for: The paper studies estimating the dimension $d$ of the underlying space from a random geometric graph when only the adjacency matrix is available, with neither $r_n$ nor the vectors $X_i$ known.
  • methods: The paper constructs an estimator of $d$ from the adjacency matrix alone and shows it converges to $d$ in probability as $n^{3/2} r_n^d \to \infty$, without requiring knowledge of the density.
  • results: The main result is that for densities with $\int f^5 < \infty$, a consistent estimator of $d$ exists whenever $n^{3/2} r_n^d \to \infty$ and $r_n = o(1)$; moreover, without any condition on the density, a consistent estimator exists when $n r_n^d \to \infty$ and $r_n = o(1)$.
    Abstract Let $G_n$ be a random geometric graph with vertex set $[n]$ based on $n$ i.i.d.\ random vectors $X_1,\ldots,X_n$ drawn from an unknown density $f$ on $\R^d$. An edge $(i,j)$ is present when $\|X_i -X_j\| \le r_n$, for a given threshold $r_n$ possibly depending upon $n$, where $\| \cdot \|$ denotes Euclidean distance. We study the problem of estimating the dimension $d$ of the underlying space when we have access to the adjacency matrix of the graph but do not know $r_n$ or the vectors $X_i$. The main result of the paper is that there exists an estimator of $d$ that converges to $d$ in probability as $n \to \infty$ for all densities with $\int f^5 < \infty$ whenever $n^{3/2} r_n^d \to \infty$ and $r_n = o(1)$. The conditions allow very sparse graphs since when $n^{3/2} r_n^d \to 0$, the graph contains isolated edges only, with high probability. We also show that, without any condition on the density, a consistent estimator of $d$ exists when $n r_n^d \to \infty$ and $r_n = o(1)$.

Multi-fidelity Bayesian Optimization in Engineering Design

  • paper_url: http://arxiv.org/abs/2311.13050
  • repo_url: None
  • paper_authors: Bach Do, Ruda Zhang
  • For: The paper addresses expensive engineering design optimization problems by combining multi-fidelity optimization (MFO) with Bayesian optimization (BO), exploiting advantages such as incorporating physical and mathematical understanding of the problem, saving resources, handling the exploitation-exploration trade-off, accounting for uncertainty, and supporting parallel computing.
  • Methods: The survey reviews two essential ingredients of MF BO: Gaussian process (GP) based multi-fidelity surrogates and acquisition functions. Existing MF modeling methods and MFO strategies are categorized to place MF BO within the larger family of surrogate-based optimization and MFO algorithms, and important GP-based MF surrogate models and acquisition functions are described through their shared properties.
  • Results: The paper identifies aspects that require further research for applying MF BO to intricate yet important design optimization problems, including constrained optimization, high-dimensional optimization, optimization under uncertainty, and multi-objective optimization.
    Abstract Residing at the intersection of multi-fidelity optimization (MFO) and Bayesian optimization (BO), MF BO has found a niche in solving expensive engineering design optimization problems, thanks to its advantages in incorporating physical and mathematical understandings of the problems, saving resources, addressing exploitation-exploration trade-off, considering uncertainty, and processing parallel computing. The increasing number of works dedicated to MF BO suggests the need for a comprehensive review of this advanced optimization technique. In this paper, we survey recent developments of two essential ingredients of MF BO: Gaussian process (GP) based MF surrogates and acquisition functions. We first categorize the existing MF modeling methods and MFO strategies to locate MF BO in a large family of surrogate-based optimization and MFO algorithms. We then exploit the common properties shared between the methods from each ingredient of MF BO to describe important GP-based MF surrogate models and review various acquisition functions. By doing so, we expect to provide a structured understanding of MF BO. Finally, we attempt to reveal important aspects that require further research for applications of MF BO in solving intricate yet important design optimization problems, including constrained optimization, high-dimensional optimization, optimization under uncertainty, and multi-objective optimization.
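To make the interplay of the two ingredients concrete, here is a minimal, hypothetical multi-fidelity BO loop: a single Gaussian process treats the fidelity level as an extra input feature, and a cost-aware expected-improvement acquisition decides both where to evaluate next and at which fidelity. This is a generic, textbook-style sketch rather than any specific method from the survey; the toy objective, the cost ratio, and the kernel choice are assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x, fid):            # toy objective: fidelity 1 is exact, fidelity 0 is cheap but biased
    return np.sin(3 * x) + (0.3 * np.cos(5 * x) if fid == 0 else 0.0)

costs = {0: 1.0, 1: 5.0}                      # assumed evaluation cost per fidelity
X = np.array([[0.1, 0], [0.5, 0], [0.9, 0], [0.4, 1]], dtype=float)
y = np.array([f(x, int(s)) for x, s in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(10):
    gp.fit(X, y)                              # fidelity is just one more GP input here
    grid = np.array([[x, s] for x in np.linspace(0, 1, 101) for s in (0, 1)], dtype=float)
    mu, sd = gp.predict(grid, return_std=True)
    best = y[X[:, 1] == 1].min()              # incumbent at the highest fidelity
    imp = best - mu                           # minimization: improvement over incumbent
    z = imp / np.maximum(sd, 1e-9)
    ei = imp * norm.cdf(z) + sd * norm.pdf(z)
    acq = ei / np.array([costs[int(s)] for _, s in grid])   # cost-aware acquisition
    nxt = grid[int(np.argmax(acq))]
    X = np.vstack([X, nxt]); y = np.append(y, f(nxt[0], int(nxt[1])))

print("best high-fidelity value found:", y[X[:, 1] == 1].min())
```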

Favour: FAst Variance Operator for Uncertainty Rating

  • paper_url: http://arxiv.org/abs/2311.13036
  • repo_url: None
  • paper_authors: Thomas D. Ahle, Sahar Karimi, Peter Tak Peter Tang
  • for: The paper aims to broaden the adoption of Bayesian neural networks (BNNs), which estimate the uncertainty of predictions by sampling from the posterior distribution.
  • methods: A more principled variance-propagation framework based on "spiked covariance matrices" smoothly interpolates between quality and inference time, enabled by a new fast algorithm for updating a diagonal-plus-low-rank matrix approximation under various operations.
  • results: Against sampling-based MC Dropout and Variational Inference on downstream uncertainty tasks such as calibration and out-of-distribution testing, Favour matches the performance of 10-100 inference samples while being as fast as performing 2-3 samples, enabling BNNs in performance-critical tasks.
    Abstract Bayesian Neural Networks (BNN) have emerged as a crucial approach for interpreting ML predictions. By sampling from the posterior distribution, data scientists may estimate the uncertainty of an inference. Unfortunately many inference samples are often needed, the overhead of which greatly hinder BNN's wide adoption. To mitigate this, previous work proposed propagating the first and second moments of the posterior directly through the network. However, on its own this method is even slower than sampling, so the propagated variance needs to be approximated such as assuming independence between neural nodes. The resulting trade-off between quality and inference time did not match even plain Monte Carlo sampling. Our contribution is a more principled variance propagation framework based on "spiked covariance matrices", which smoothly interpolates between quality and inference time. This is made possible by a new fast algorithm for updating a diagonal-plus-low-rank matrix approximation under various operations. We tested our algorithm against sampling based MC Dropout and Variational Inference on a number of downstream uncertainty themed tasks, such as calibration and out-of-distribution testing. We find that Favour is as fast as performing 2-3 inference samples, while matching the performance of 10-100 samples. In summary, this work enables the use of BNN in the realm of performance critical tasks where they have previously been out of reach.
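The central data structure in this line of work is a covariance approximation of the form Σ ≈ diag(d) + U Uᵀ. The sketch below shows, under simplifying assumptions, how such an approximation can be pushed through a single linear layer: the low-rank factor maps exactly (U → WU), while the dense term W diag(d) Wᵀ is re-approximated by its diagonal. This is only an illustration of the diagonal-plus-low-rank bookkeeping, not the paper's Favour algorithm or its update rules.

```python
import numpy as np

def propagate_linear(W, mean, diag, U):
    """Push (mean, diag(d) + U U^T) through y = W x under a crude approximation:
    the low-rank part becomes W @ U exactly; the dense part W diag(d) W^T is
    re-approximated by its diagonal to stay in diagonal-plus-low-rank form."""
    new_mean = W @ mean
    new_U = W @ U                                    # exact for the low-rank factor
    new_diag = np.einsum("ij,j,ij->i", W, diag, W)   # diag(W diag(d) W^T)
    return new_mean, new_diag, new_U

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8))
mean, diag, U = rng.normal(size=8), np.abs(rng.normal(size=8)), rng.normal(size=(8, 2))

m2, d2, U2 = propagate_linear(W, mean, diag, U)
full_in = np.diag(diag) + U @ U.T
full_out = W @ full_in @ W.T                          # exact reference covariance
approx_out = np.diag(d2) + U2 @ U2.T
print("max abs error of the approximation:", np.abs(full_out - approx_out).max())
```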

Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

  • paper_url: http://arxiv.org/abs/2311.17894
  • repo_url: None
  • paper_authors: Max Schwarzer, Jesse Farebrother, Joshua Greaves, Ekin Dogus Cubuk, Rishabh Agarwal, Aaron Courville, Marc G. Bellemare, Sergei Kalinin, Igor Mordatch, Pablo Samuel Castro, Kevin M. Roccapriore
  • for: The study uses machine learning to determine the transition dynamics of silicon atoms on a single layer of carbon atoms when stimulated by the electron beam of a scanning transmission electron microscope (STEM).
  • methods: The data-centric approach processes and filters data collected on a STEM into symbolic representations, which are used to train a neural network that predicts transition probabilities.
  • results: Empirical analyses show that the learned transition dynamics can accurately guide a single silicon atom through the lattice to pre-determined target destinations.
    Abstract We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition probabilities. These learned transition dynamics are then leveraged to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.

Fast and Interpretable Mortality Risk Scores for Critical Care Patients

  • paper_url: http://arxiv.org/abs/2311.13015
  • repo_url: https://github.com/muhangtian/gfr-experiments
  • paper_authors: Chloe Qinyu Zhu, Muhang Tian, Lesia Semenova, Jiachang Liu, Jack Xu, Joseph Scarpa, Cynthia Rudin
  • for: Predicting mortality risk for ICU patients is an important task in critical care medicine, aimed at improving patient monitoring and care.
  • methods: Modern interpretable machine learning techniques are used to design accurate and interpretable mortality risk scores, leveraging the largest public ICU monitoring datasets (MIMIC III and eICU); generalization is studied by evaluating risk across medical centers.
  • results: The GroupFasterRisk algorithm produces highly interpretable risk scores within hours, with predictive performance comparable to black-box ML models while allowing direct control of the number of features, group sparsity, monotonicity corrections for domain knowledge, and a set of equally good models for domain experts to choose from; the resulting scores outperform risk scores currently used in hospitals.
    Abstract Prediction of mortality in intensive care unit (ICU) patients is an important task in critical care medicine. Prior work in creating mortality risk models falls into two major categories: domain-expert-created scoring systems, and black box machine learning (ML) models. Both of these have disadvantages: black box models are unacceptable for use in hospitals, whereas manual creation of models (including hand-tuning of logistic regression parameters) relies on humans to perform high-dimensional constrained optimization, which leads to a loss in performance. In this work, we bridge the gap between accurate black box models and hand-tuned interpretable models. We build on modern interpretable ML techniques to design accurate and interpretable mortality risk scores. We leverage the largest existing public ICU monitoring datasets, namely the MIMIC III and eICU datasets. By evaluating risk across medical centers, we are able to study generalization across domains. In order to customize our risk score models, we develop a new algorithm, GroupFasterRisk, which has several important benefits: (1) it uses hard sparsity constraint, allowing users to directly control the number of features; (2) it incorporates group sparsity to allow more cohesive models; (3) it allows for monotonicity correction on models for including domain knowledge; (4) it produces many equally-good models at once, which allows domain experts to choose among them. GroupFasterRisk creates its risk scores within hours, even on the large datasets we study here. GroupFasterRisk's risk scores perform better than risk scores currently used in hospitals, and have similar prediction performance to black box ML models (despite being much sparser). Because GroupFasterRisk produces a variety of risk scores and handles constraints, it allows design flexibility, which is the key enabler of practical and trustworthy model creation.
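To give a feel for what a sparse, points-based risk score looks like in practice, the snippet below fits an L1-regularized logistic model on a handful of synthetic features, rounds the coefficients to small integer point values, and maps total points back to a risk estimate. This is a generic illustration of sparse scoring systems, not the GroupFasterRisk algorithm; the synthetic features, the regularization strength, and the points-per-log-odds scale are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))                        # e.g. standardized vitals/labs
true_w = np.array([1.2, -0.8, 0.0, 0.0, 0.6, 0.0])
y = (rng.uniform(size=2000) < 1 / (1 + np.exp(-(X @ true_w - 0.5)))).astype(int)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
scale = 2.0                                           # assumed points per unit log-odds
points = np.round(clf.coef_[0] * scale).astype(int)   # small integer point values
print("feature points:", points)                      # zeroed features drop off the score card

def risk(x):
    """Score card: sum the points, convert back to an approximate probability."""
    total = float(points @ x)
    return 1 / (1 + np.exp(-(total / scale + clf.intercept_[0])))

print("predicted risk for one patient:", round(risk(X[0]), 3))
```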

How Capable Can a Transformer Become? A Study on Synthetic, Interpretable Tasks

  • paper_url: http://arxiv.org/abs/2311.12997
  • repo_url: None
  • paper_authors: Rahul Ramesh, Mikail Khona, Robert P. Dick, Hidenori Tanaka, Ekdeep Singh Lubana
  • for: The study assesses how capable an autoregressive Transformer can become at learning and composing well-defined operations.
  • methods: Autoregressive Transformer models are trained on a synthetic data-generating process that involves compositions of a set of well-defined monolithic capabilities, and evaluated through extensive, systematic experiments.
  • results: The models learn compositional structure from the training data and generalize to exponentially or even combinatorially many functions; composing functions by generating intermediate outputs is more effective for generalizing to unseen compositions; the training data strongly affects the ability to compose unseen combinations; and the attention layers in the latter half of the model are critical to compositionality.
    Abstract Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing simple logical operations. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by the above, we aim to assess in this paper "how capable can a transformer become?". Specifically, we train autoregressive Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities. Through a series of extensive and systematic experiments on this data-generating process, we show that: (1) autoregressive Transformers can learn compositional structures from the training data and generalize to exponentially or even combinatorially many functions; (2) composing functions by generating intermediate outputs is more effective at generalizing to unseen compositions, compared to generating no intermediate outputs; (3) the training data has a significant impact on the model's ability to compose unseen combinations of functions; and (4) the attention layers in the latter half of the model are critical to compositionality.
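A hedged sketch of the kind of data-generating process described here: sample a short pipeline of simple, well-defined functions, apply it to an input sequence, and optionally emit the intermediate results as a scratchpad so the model can compose step by step. The specific primitive functions and the formatting are illustrative assumptions, not the paper's exact task suite.

```python
import random

# A few monolithic "capabilities" over integer sequences
PRIMITIVES = {
    "rev":  lambda xs: xs[::-1],
    "inc":  lambda xs: [x + 1 for x in xs],
    "sort": lambda xs: sorted(xs),
    "dup":  lambda xs: xs + xs,
}

def sample_example(depth: int, with_intermediates: bool, rng: random.Random):
    """Compose `depth` primitives; optionally expose intermediate outputs."""
    ops = [rng.choice(list(PRIMITIVES)) for _ in range(depth)]
    xs = [rng.randrange(10) for _ in range(5)]
    prompt, steps = f"{' '.join(ops)} | {xs}", []
    for op in ops:
        xs = PRIMITIVES[op](xs)
        steps.append(f"{op} -> {xs}")
    target = " ; ".join(steps) if with_intermediates else str(xs)
    return prompt, target

rng = random.Random(0)
for _ in range(2):
    p, t = sample_example(depth=3, with_intermediates=True, rng=rng)
    print(p, "=>", t)
```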

Hierarchical Learning for Quantum ML: Novel Training Technique for Large-Scale Variational Quantum Circuits

  • paper_url: http://arxiv.org/abs/2311.12929
  • repo_url: None
  • paper_authors: Hrant Gharibyan, Vincent Su, Hayk Tepanyan
  • for: Efficient training of large-scale variational quantum circuits.
  • methods: A novel hierarchical-learning architecture trains the parameters tied to the most significant qubits first, generalizing layerwise learning, and is benchmarked on distribution loading with quantum circuit Born machines (QCBMs).
  • results: A QCBM is trained to reproduce a 3-dimensional multivariate Gaussian distribution on 27 qubits to within roughly 4% total variation distance, and hierarchical learning is shown to be a resource-efficient way to load distributions on existing quantum hardware (IBM's 7- and 27-qubit devices) in tandem with Fire Opal optimizations.
    Abstract We present hierarchical learning, a novel variational architecture for efficient training of large-scale variational quantum circuits. We test and benchmark our technique for distribution loading with quantum circuit born machines (QCBMs). With QCBMs, probability distributions are loaded into the squared amplitudes of computational basis vectors represented by bitstrings. Our key insight is to take advantage of the fact that the most significant (qu)bits have a greater effect on the final distribution and can be learned first. One can think of it as a generalization of layerwise learning, where some parameters of the variational circuit are learned first to prevent the phenomena of barren plateaus. We briefly review adjoint methods for computing the gradient, in particular for loss functions that are not expectation values of observables. We first compare the role of connectivity in the variational ansatz for the task of loading a Gaussian distribution on nine qubits, finding that 2D connectivity greatly outperforms qubits arranged on a line. Based on our observations, we then implement this strategy on large-scale numerical experiments with GPUs, training a QCBM to reproduce a 3-dimensional multivariate Gaussian distribution on 27 qubits up to $\sim4\%$ total variation distance. Though barren plateau arguments do not strictly apply here due to the objective function not being tied to an observable, this is to our knowledge the first practical demonstration of variational learning on large numbers of qubits. We also demonstrate hierarchical learning as a resource-efficient way to load distributions for existing quantum hardware (IBM's 7 and 27 qubit devices) in tandem with Fire Opal optimizations.

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

  • paper_url: http://arxiv.org/abs/2311.12786
  • repo_url: None
  • paper_authors: Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Scott Krueger
  • for: The study asks how fine-tuning alters the capabilities a model learns during pre-training, using mechanistic interpretability tools (e.g., network pruning and probing) to track how those capabilities change.
  • methods: Experiments are conducted in synthetic, controlled settings on procedurally defined tasks, where mechanistic interpretability tools can reveal how the model's underlying capabilities evolve during pre-training and fine-tuning.
  • results: Fine-tuning rarely alters the underlying capabilities; instead a minimal transformation, a "wrapper", is typically learned on top of them, creating the illusion of change. Further fine-tuning on a task where such hidden capabilities are relevant leads to sample-efficient "revival" after only a few gradient steps, meaning practitioners can unintentionally remove a model's safety wrapper simply by fine-tuning it on a superficially unrelated downstream task. Experiments on language models trained on the TinyStories dataset support the claims in a more realistic setup.
    Abstract Fine-tuning large pre-trained models has become the de facto strategy for developing both task-specific and general-purpose machine learning systems, including developing models that are safe to deploy. Despite its clear importance, there has been minimal work that explains how fine-tuning alters the underlying capabilities learned by a model during pretraining: does fine-tuning yield entirely novel capabilities or does it just modulate existing ones? We address this question empirically in synthetic, controlled settings where we can use mechanistic interpretability tools (e.g., network pruning and probing) to understand how the model's underlying capabilities are changing. We perform an extensive analysis of the effects of fine-tuning in these settings, and show that: (i) fine-tuning rarely alters the underlying model capabilities; (ii) a minimal transformation, which we call a 'wrapper', is typically learned on top of the underlying model capabilities, creating the illusion that they have been modified; and (iii) further fine-tuning on a task where such hidden capabilities are relevant leads to sample-efficient 'revival' of the capability, i.e., the model begins reusing these capability after only a few gradient steps. This indicates that practitioners can unintentionally remove a model's safety wrapper merely by fine-tuning it on a, e.g., superficially unrelated, downstream task. We additionally perform analysis on language models trained on the TinyStories dataset to support our claims in a more realistic setup.
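One of the tools mentioned above, probing, is easy to sketch: freeze a model, collect hidden activations for a diagnostic task, and fit a linear classifier on them; if the probe performs equally well before and after fine-tuning, the underlying capability is still present. The snippet below is a generic probing recipe with synthetic activations standing in for real model states, not the paper's experimental setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_accuracy(activations: np.ndarray, labels: np.ndarray) -> float:
    """Fit a linear probe on frozen activations and report held-out accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(activations, labels,
                                              test_size=0.3, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)

# Synthetic stand-ins: activations from the base model and from the fine-tuned model
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
signal = labels[:, None] * 0.8
acts_base = rng.normal(size=(1000, 64)); acts_base[:, :4] += signal
acts_ft = rng.normal(size=(1000, 64)); acts_ft[:, :4] += signal   # capability persists

print("probe accuracy, pre-trained :", round(probe_accuracy(acts_base, labels), 3))
print("probe accuracy, fine-tuned  :", round(probe_accuracy(acts_ft, labels), 3))
```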

Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian, and Beyond $1+α$ Moments

  • paper_url: http://arxiv.org/abs/2311.12784
  • repo_url: None
  • paper_authors: Trung Dang, Jasper C. H. Lee, Maoyuan Song, Paul Valiant
  • for: The paper is written to study the problem of mean estimation in $\mathbb{R}$, specifically exploring whether algorithms can leverage useful features of the input distribution to achieve better performance than the current best-known bounds.
  • methods: The paper uses a combination of theoretical and empirical techniques to study the mean estimation problem, including the construction of a new distribution that demonstrates the limitations of existing algorithms and the introduction of a new definitional framework for analyzing the fine-grained optimality of algorithms.
  • results: The paper shows that, despite the existence of useful features in the input distribution, no reasonable estimator can achieve better than the sub-Gaussian error rate for any distribution, matching the worst-case result of [LV22]. Additionally, the new definitional framework is applied to show that median-of-means is neighborhood optimal, up to constant factors.
    Abstract There is growing interest in improving our algorithmic understanding of fundamental statistical problems such as mean estimation, driven by the goal of understanding the limits of what we can extract from valuable data. The state of the art results for mean estimation in $\mathbb{R}$ are 1) the optimal sub-Gaussian mean estimator by [LV22], with the tight sub-Gaussian constant for all distributions with finite but unknown variance, and 2) the analysis of the median-of-means algorithm by [BCL13] and a lower bound by [DLLO16], characterizing the big-O optimal errors for distributions for which only a $1+\alpha$ moment exists for $\alpha \in (0,1)$. Both results, however, are optimal only in the worst case. We initiate the fine-grained study of the mean estimation problem: Can algorithms leverage useful features of the input distribution to beat the sub-Gaussian rate, without explicit knowledge of such features? We resolve this question with an unexpectedly nuanced answer: "Yes in limited regimes, but in general no". For any distribution $p$ with a finite mean, we construct a distribution $q$ whose mean is well-separated from $p$'s, yet $p$ and $q$ are not distinguishable with high probability, and $q$ further preserves $p$'s moments up to constants. The main consequence is that no reasonable estimator can asymptotically achieve better than the sub-Gaussian error rate for any distribution, matching the worst-case result of [LV22]. More generally, we introduce a new definitional framework to analyze the fine-grained optimality of algorithms, which we call "neighborhood optimality", interpolating between the unattainably strong "instance optimality" and the trivially weak "admissibility" definitions. Applying the new framework, we show that median-of-means is neighborhood optimal, up to constant factors. It is open to find a neighborhood-optimal estimator without constant factor slackness.
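Since median-of-means is the estimator shown to be neighborhood optimal, a minimal implementation is worth spelling out: split the sample into k blocks, average each block, and return the median of the block means. The block count below is an illustrative assumption (it is typically chosen on the order of log(1/δ) for a target failure probability δ).

```python
import numpy as np

def median_of_means(x: np.ndarray, n_blocks: int) -> float:
    """Split x into n_blocks groups, average each, return the median of the averages."""
    rng = np.random.default_rng(0)
    x = rng.permutation(x)                      # randomize block membership
    blocks = np.array_split(x, n_blocks)
    return float(np.median([b.mean() for b in blocks]))

# Heavy-tailed example where the plain empirical mean is fragile
rng = np.random.default_rng(1)
sample = rng.standard_t(df=2, size=10_000)      # true mean is 0
print("empirical mean :", round(sample.mean(), 4))
print("median of means:", round(median_of_means(sample, n_blocks=30), 4))
```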

Generative Machine Learning for Multivariate Equity Returns

  • paper_url: http://arxiv.org/abs/2311.14735
  • repo_url: None
  • paper_authors: Ruslan Tepelyan, Achintya Gopal
  • for: The paper aims to model the distribution of equity returns using modern machine learning methods.
  • methods: Conditional importance weighted autoencoders (a variant of variational autoencoders) and conditional normalizing flows are used to learn the 500-dimensional joint distribution of the returns of all S&P 500 members.
  • results: The generative model has a broad range of applications in finance, including generating realistic synthetic data, volatility and correlation estimation, risk analysis (e.g., value at risk, or VaR, of portfolios), and portfolio optimization.
    Abstract The use of machine learning to generate synthetic data has grown in popularity with the proliferation of text-to-image models and especially large language models. The core methodology these models use is to learn the distribution of the underlying data, similar to the classical methods common in finance of fitting statistical models to data. In this work, we explore the efficacy of using modern machine learning methods, specifically conditional importance weighted autoencoders (a variant of variational autoencoders) and conditional normalizing flows, for the task of modeling the returns of equities. The main problem we work to address is modeling the joint distribution of all the members of the S&P 500, or, in other words, learning a 500-dimensional joint distribution. We show that this generative model has a broad range of applications in finance, including generating realistic synthetic data, volatility and correlation estimation, risk analysis (e.g., value at risk, or VaR, of portfolios), and portfolio optimization.

Deep Learning-Based Real-Time Quality Control of Standard Video Compression for Live Streaming

  • paper_url: http://arxiv.org/abs/2311.12918
  • repo_url: None
  • paper_authors: Matin Mortaheb, Mohammad A. Amir Khojastepour, Srimat T. Chakradhar, Sennur Ulukus
  • for: Real-time quality control of standard video compression to ensure high-quality video for wireless users, especially in live streaming.
  • methods: A real-time deep learning-based H.264 controller dynamically estimates the optimal encoder parameters (in particular the quantization parameter, QP) from the content of a video chunk with minimal delay, keeping PSNR above a specified threshold while minimizing the average bitrate of the compressed video.
  • results: Experiments on the QCIF dataset and a diverse range of random videos from public datasets show improvements of up to 2.5 times in average bandwidth usage compared with state-of-the-art adaptive bitrate streaming, with a non-conformance probability below $10^{-2}$.
    Abstract Ensuring high-quality video content for wireless users has become increasingly vital. Nevertheless, maintaining a consistent level of video quality faces challenges due to the fluctuating encoded bitrate, primarily caused by dynamic video content, especially in live streaming scenarios. Video compression is typically employed to eliminate unnecessary redundancies within and between video frames, thereby reducing the required bandwidth for video transmission. The encoded bitrate and the quality of the compressed video depend on encoder parameters, specifically, the quantization parameter (QP). Poor choices of encoder parameters can result in reduced bandwidth efficiency and high likelihood of non-conformance. Non-conformance refers to the violation of the peak signal-to-noise ratio (PSNR) constraint for an encoded video segment. To address these issues, a real-time deep learning-based H.264 controller is proposed. This controller dynamically estimates the optimal encoder parameters based on the content of a video chunk with minimal delay. The objective is to maintain video quality in terms of PSNR above a specified threshold while minimizing the average bitrate of the compressed video. Experimental results, conducted on both QCIF dataset and a diverse range of random videos from public datasets, validate the effectiveness of this approach. Notably, it achieves improvements of up to 2.5 times in average bandwidth usage compared to the state-of-the-art adaptive bitrate video streaming, with a negligible non-conformance probability below $10^{-2}$.
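The control objective here, keeping PSNR above a threshold while spending as few bits as possible, can be illustrated with a small feedback loop. The snippet below uses a stand-in predictor for per-chunk PSNR as a function of QP (in the paper this is the deep model's job) and walks the QP down from its coarsest setting until the constraint is met. The PSNR model, the QP range, and the complexity measure are assumptions, not the trained controller or its encoder integration.

```python
import numpy as np

def psnr(ref: np.ndarray, deg: np.ndarray) -> float:
    """Peak signal-to-noise ratio for 8-bit frames."""
    mse = np.mean((ref.astype(float) - deg.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

def predicted_psnr(qp: int, complexity: float) -> float:
    """Hypothetical per-chunk quality model: quality drops as QP and motion complexity rise."""
    return 50.0 - 0.55 * qp - 4.0 * complexity

def choose_qp(complexity: float, psnr_min: float = 38.0, qp_range=(18, 45)) -> int:
    """Pick the largest QP (lowest bitrate) whose predicted PSNR still meets the target."""
    for qp in range(qp_range[1], qp_range[0] - 1, -1):
        if predicted_psnr(qp, complexity) >= psnr_min:
            return qp
    return qp_range[0]                           # fall back to the highest-quality setting

for c in (0.2, 1.0, 2.0):                        # e.g. static scene -> fast motion
    print(f"chunk complexity {c}: QP = {choose_qp(c)}")
```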

Neural-Integrated Meshfree (NIM) Method: A differentiable programming-based hybrid solver for computational mechanics

  • paper_url: http://arxiv.org/abs/2311.12915
  • repo_url: None
  • paper_authors: Honghui Du, QiZhi He
  • for: The work proposes a differentiable programming-based hybrid meshfree method for solving computational mechanics problems.
  • methods: The neural-integrated meshfree (NIM) method integrates traditional physics-based meshfree discretization with deep learning architectures, using the hybrid NeuroPU approximation that combines continuous DNN representations with partition-of-unity basis functions to represent the solution.
  • results: NIM improves solution representation and training efficiency, and shows strong accuracy, scalability, generalizability, and convergence on stationary and transient benchmark problems.
    Abstract We present the neural-integrated meshfree (NIM) method, a differentiable programming-based hybrid meshfree approach within the field of computational mechanics. NIM seamlessly integrates traditional physics-based meshfree discretization techniques with deep learning architectures. It employs a hybrid approximation scheme, NeuroPU, to effectively represent the solution by combining continuous DNN representations with partition of unity (PU) basis functions associated with the underlying spatial discretization. This neural-numerical hybridization not only enhances the solution representation through functional space decomposition but also reduces both the size of DNN model and the need for spatial gradient computations based on automatic differentiation, leading to a significant improvement in training efficiency. Under the NIM framework, we propose two truly meshfree solvers: the strong form-based NIM (S-NIM) and the local variational form-based NIM (V-NIM). In the S-NIM solver, the strong-form governing equation is directly considered in the loss function, while the V-NIM solver employs a local Petrov-Galerkin approach that allows the construction of variational residuals based on arbitrary overlapping subdomains. This ensures both the satisfaction of underlying physics and the preservation of meshfree property. We perform extensive numerical experiments on both stationary and transient benchmark problems to assess the effectiveness of the proposed NIM methods in terms of accuracy, scalability, generalizability, and convergence properties. Moreover, comparative analysis with other physics-informed machine learning methods demonstrates that NIM, especially V-NIM, significantly enhances both accuracy and efficiency in end-to-end predictive capabilities.

Learning to Optimise Wind Farms with Graph Transformers

  • paper_url: http://arxiv.org/abs/2311.12750
  • repo_url: None
  • paper_authors: Siyi Li, Arnaud Robert, A. Aldo Faisal, Matthew D. Piggott
  • for: To provide a data-driven model capable of accurately predicting the power generation of all turbines in wind farms of arbitrary layout, yaw angle configuration, and wind conditions.
  • methods: A wind farm is encoded as a fully connected graph and the graph representation is processed by a graph transformer.
  • results: The resulting surrogate generalizes well and can be used with genetic algorithms to optimise yaw angle configurations, reaching accuracy similar to industrially standard wind farm simulation tools at a fraction of the computational cost.
    Abstract This work proposes a novel data-driven model capable of providing accurate predictions for the power generation of all wind turbines in wind farms of arbitrary layout, yaw angle configurations and wind conditions. The proposed model functions by encoding a wind farm into a fully-connected graph and processing the graph representation through a graph transformer. The graph transformer surrogate is shown to generalise well and is able to uncover latent structural patterns within the graph representation of wind farms. It is demonstrated how the resulting surrogate model can be used to optimise yaw angle configurations using genetic algorithms, achieving similar levels of accuracy to industrially-standard wind farm simulation tools while only taking a fraction of the computational cost.
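The pipeline in this abstract, turbines as nodes of a fully connected graph processed by attention and read out as per-turbine power, can be sketched with standard building blocks. The snippet below uses torch's MultiheadAttention over node features (position, yaw, wind) with a per-node regression head; the feature choice and layer sizes are assumptions, and this is not the authors' graph transformer.

```python
import torch
import torch.nn as nn

class FarmSurrogate(nn.Module):
    """Fully-connected attention over turbines -> per-turbine power prediction."""
    def __init__(self, n_features=5, d_model=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                  nn.Linear(d_model, 1))

    def forward(self, nodes):                     # nodes: (batch, n_turbines, n_features)
        h = self.embed(nodes)
        h, _ = self.attn(h, h, h)                 # every turbine attends to every other one
        return self.head(h).squeeze(-1)           # (batch, n_turbines) power estimates

# Toy usage: 8 turbines, features = (x, y, yaw, wind speed, wind direction)
farm = torch.randn(2, 8, 5)
model = FarmSurrogate()
print(model(farm).shape)                          # torch.Size([2, 8])
```

A genetic algorithm can then treat the yaw angles as the genome and use this surrogate's summed power as the fitness function, which is how the abstract describes the optimisation step being used.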

Exploring Graph Classification Techniques Under Low Data Constraints: A Comprehensive Study

  • paper_url: http://arxiv.org/abs/2311.12737
  • repo_url: None
  • paper_authors: Kush Kothari, Bhavya Mehta, Reshmika Nambiar, Seema Shrawne
  • for: The survey gives a brief overview of recent research on graph data augmentation and few-shot learning on graphs.
  • methods: It covers graph data augmentation techniques, including node and edge perturbation, graph coarsening, and graph generation, as well as the latest few-shot learning developments such as meta-learning and model-agnostic meta-learning.
  • results: The areas are explored in depth and further sub-classified: rule-based and learning-based approaches under graph augmentation, and metric-learning and optimization-based techniques for few-shot learning on graphs, providing an extensive array of options for graph processing problems in low-data scenarios.
    Abstract This survey paper presents a brief overview of recent research on graph data augmentation and few-shot learning. It covers various techniques for graph data augmentation, including node and edge perturbation, graph coarsening, and graph generation, as well as the latest developments in few-shot learning, such as meta-learning and model-agnostic meta-learning. The paper explores these areas in depth and delves into further sub classifications. Rule based approaches and learning based approaches are surveyed under graph augmentation techniques. Few-Shot Learning on graphs is also studied in terms of metric learning techniques and optimization-based techniques. In all, this paper provides an extensive array of techniques that can be employed in solving graph processing problems faced in low-data scenarios.

Non-Sequential Ensemble Kalman Filtering using Distributed Arrays

  • paper_url: http://arxiv.org/abs/2311.12909
  • repo_url: None
  • paper_authors: Cédric Travelletti, Jörg Franke, David Ginsbourger, Stefan Brönnimann
  • for: The paper proposes a new, distributed implementation of the Ensemble Kalman Filter (EnKF) for non-sequential assimilation of large datasets in high-dimensional problems.
  • methods: Recent advances in distributed computing are leveraged to construct and use the full model error covariance matrix in distributed memory, enabling single-batch assimilation of all observations and eliminating dependence on observation ordering.
  • results: Comparative performance assessments on synthetic and real-world paleoclimatic reconstruction applications indicate that the new, non-sequential implementation outperforms the traditional, sequential one.
    Abstract This work introduces a new, distributed implementation of the Ensemble Kalman Filter (EnKF) that allows for non-sequential assimilation of large datasets in high-dimensional problems. The traditional EnKF algorithm is computationally intensive and exhibits difficulties in applications requiring interaction with the background covariance matrix, prompting the use of methods like sequential assimilation which can introduce unwanted consequences, such as dependency on observation ordering. Our implementation leverages recent advancements in distributed computing to enable the construction and use of the full model error covariance matrix in distributed memory, allowing for single-batch assimilation of all observations and eliminating order dependencies. Comparative performance assessments, involving both synthetic and real-world paleoclimatic reconstruction applications, indicate that the new, non-sequential implementation outperforms the traditional, sequential one.
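The single-batch assimilation discussed here boils down to the standard (stochastic) EnKF analysis step applied once to all observations. The numpy sketch below shows that update for a small ensemble; it omits the distributed-memory layout that is the paper's actual contribution, and the toy sizes, observation operator, and noise level are assumptions.

```python
import numpy as np

def enkf_analysis(E, y, H, R, rng):
    """Stochastic EnKF update.
    E: (n_state, n_ens) ensemble, y: (n_obs,) observations,
    H: (n_obs, n_state) observation operator, R: (n_obs, n_obs) obs-error covariance."""
    n_ens = E.shape[1]
    A = E - E.mean(axis=1, keepdims=True)                 # ensemble anomalies
    P = A @ A.T / (n_ens - 1)                             # sample covariance (held in full here)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)          # Kalman gain
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T  # perturbed obs
    return E + K @ (Y - H @ E)                            # assimilate all obs in one batch

rng = np.random.default_rng(0)
n_state, n_ens, n_obs = 40, 20, 10
E = rng.normal(size=(n_state, n_ens))
H = np.zeros((n_obs, n_state)); H[np.arange(n_obs), np.arange(0, n_state, 4)] = 1.0
y = rng.normal(size=n_obs)
E_a = enkf_analysis(E, y, H, 0.5 * np.eye(n_obs), rng)
print("posterior ensemble spread:", round(float(E_a.std()), 3))
```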

Attacks of fairness in Federated Learning

  • paper_url: http://arxiv.org/abs/2311.12715
  • repo_url: https://github.com/slkdfjslkjfd/fl_fairness_attacks
  • paper_authors: Joseph Rance, Filip Svoboda
  • for: The work studies attacks in Federated Learning, where controlling only a small subset of clients is known to be enough to introduce a backdoor, and presents a new type of attack that instead compromises the fairness of the trained model.
  • methods: A threat model similar to that of a backdoor attack is used to show that an attacker can influence the aggregated model to have an unfair performance distribution between any given set of attributes.
  • results: The attack can skew attribute-level performance of the aggregated model and is possible while controlling only a single client, so defending against attacks on fairness should be a critical consideration whenever unfairness in a trained model could benefit a participant in its training.
    Abstract Federated Learning is an important emerging distributed training paradigm that keeps data private on clients. It is now well understood that by controlling only a small subset of FL clients, it is possible to introduce a backdoor to a federated learning model, in the presence of certain attributes. In this paper, we present a new type of attack that compromises the fairness of the trained model. Fairness is understood to be the attribute-level performance distribution of a trained model. It is particularly salient in domains where, for example, skewed accuracy discrimination between subpopulations could have disastrous consequences. We find that by employing a threat model similar to that of a backdoor attack, an attacker is able to influence the aggregated model to have an unfair performance distribution between any given set of attributes. Furthermore, we find that this attack is possible by controlling only a single client. While combating naturally induced unfairness in FL has previously been discussed in depth, its artificially induced kind has been neglected. We show that defending against attacks on fairness should be a critical consideration in any situation where unfairness in a trained model could benefit a user who participated in its training.

Regression-Based Analysis of Multimodal Single-Cell Data Integration Strategies

  • paper_url: http://arxiv.org/abs/2311.12711
  • repo_url: None
  • paper_authors: Bhavya Mehta, Nirmit Deliwala, Madhav Chandane
  • for: The paper explores multimodal single-cell technologies for disease biomarker detection and drug discovery, and for understanding cellular states at the single-cell level during development.
  • methods: Distinct machine learning techniques, including echo state networks, are used to model the co-variation of DNA to RNA and finally to surface proteins in single cells during hematopoietic stem cell development.
  • results: Experiments on a curated subset of a 300,000-cell time-course dataset show that echo state networks achieve state-of-the-art correlation scores of 0.94 and 0.895 on the Multi-omic and CiteSeq datasets, with promise for advancing the understanding of cellular differentiation and function.
    Abstract Multimodal single-cell technologies enable the simultaneous collection of diverse data types from individual cells, enhancing our understanding of cellular states. However, the integration of these datatypes and modeling the interrelationships between modalities presents substantial computational and analytical challenges in disease biomarker detection and drug discovery. Established practices rely on isolated methodologies to investigate individual molecular aspects separately, often resulting in inaccurate analyses. To address these obstacles, distinct Machine Learning Techniques are leveraged, each of its own kind to model the co-variation of DNA to RNA, and finally to surface proteins in single cells during hematopoietic stem cell development, which simplifies understanding of underlying cellular mechanisms and immune responses. Experiments conducted on a curated subset of a 300,000-cell time course dataset, highlights the exceptional performance of Echo State Networks, boasting a remarkable state-of-the-art correlation score of 0.94 and 0.895 on Multi-omic and CiteSeq datasets. Beyond the confines of this study, these findings hold promise for advancing comprehension of cellular differentiation and function, leveraging the potential of Machine Learning.
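Echo state networks, the best-performing model in this abstract, are simple to sketch: a fixed random recurrent reservoir is driven by the input sequence and only a ridge-regression readout is trained. The snippet below is such a generic ESN on synthetic "RNA to protein" data; the reservoir size, scalings, and the idea of driving the reservoir with an ordered stream of cells from a time course are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def esn_fit_predict(X_in, Y_out, n_reservoir=300, spectral_radius=0.9, ridge=1e-2, seed=0):
    """Minimal echo state network: fixed random reservoir, ridge-regression readout."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, X_in.shape[1]))
    W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))   # control the echo-state property
    states, h = [], np.zeros(n_reservoir)
    for x in X_in:                                           # drive the reservoir with inputs
        h = np.tanh(W_in @ x + W @ h)
        states.append(h.copy())
    S = np.array(states)
    # Ridge readout: the only trained part of the model
    W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_reservoir), S.T @ Y_out)
    return S @ W_out

rng = np.random.default_rng(1)
rna = rng.normal(size=(500, 20))                             # e.g. per-cell RNA features
protein = rna[:, :5] @ rng.normal(size=(5, 3)) + 0.1 * rng.normal(size=(500, 3))
pred = esn_fit_predict(rna, protein)
corr = np.corrcoef(pred.ravel(), protein.ravel())[0, 1]
print("readout correlation on training data:", round(corr, 3))
```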

On the Out-of-Distribution Coverage of Combining Split Conformal Prediction and Bayesian Deep Learning

  • paper_url: http://arxiv.org/abs/2311.12688
  • repo_url: None
  • paper_authors: Paul Scemama, Ariel Kapusta
  • for: The paper examines combining Bayesian deep learning with conformal prediction to convey uncertainty and increase safety in machine learning systems.
  • methods: Split conformal prediction is combined with neural networks trained with stochastic gradient descent, deep ensembles, and mean-field variational inference, and the effect on out-of-distribution coverage is studied for multiclass image classification.
  • results: If the model is generally underconfident on the calibration set, the resulting conformal sets may exhibit worse out-of-distribution coverage than simple predictive credible sets; conversely, if the model is overconfident on the calibration set, conformal prediction may improve out-of-distribution coverage.
    Abstract Bayesian deep learning and conformal prediction are two methods that have been used to convey uncertainty and increase safety in machine learning systems. We focus on combining Bayesian deep learning with split conformal prediction and how this combination effects out-of-distribution coverage; particularly in the case of multiclass image classification. We suggest that if the model is generally underconfident on the calibration set, then the resultant conformal sets may exhibit worse out-of-distribution coverage compared to simple predictive credible sets. Conversely, if the model is overconfident on the calibration set, the use of conformal prediction may improve out-of-distribution coverage. We evaluate prediction sets as a result of combining split conformal methods and neural networks trained with (i) stochastic gradient descent, (ii) deep ensembles, and (iii) mean-field variational inference. Our results suggest that combining Bayesian deep learning models with split conformal prediction can, in some cases, cause unintended consequences such as reducing out-of-distribution coverage.
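Split conformal prediction itself is a short recipe, sketched below for classification: compute nonconformity scores (one minus the probability assigned to the true class) on a calibration split, take the appropriate quantile, and include every class whose score falls below it. This is the generic split-conformal procedure applied to synthetic softmax outputs, not the paper's Bayesian models or its evaluation protocol.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Quantile of nonconformity scores (1 - p_true) targeting 1 - alpha coverage."""
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n                  # finite-sample correction
    return np.quantile(scores, min(q, 1.0))

def prediction_sets(test_probs, qhat):
    """Include every class whose nonconformity score is below the threshold."""
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]

rng = np.random.default_rng(0)
def fake_probs(n, k=5):                                     # synthetic softmax outputs
    logits = rng.normal(size=(n, k)); logits[np.arange(n), rng.integers(0, k, n)] += 2.0
    p = np.exp(logits); return p / p.sum(axis=1, keepdims=True)

cal_p = fake_probs(500); cal_y = cal_p.argmax(axis=1)       # toy calibration labels
qhat = conformal_threshold(cal_p, cal_y, alpha=0.1)
sets = prediction_sets(fake_probs(5), qhat)
print("threshold:", round(float(qhat), 3), "| set sizes:", [len(s) for s in sets])
```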

Managing ML-Based Application Non-Functional Behavior: A Multi-Model Approach

  • paper_url: http://arxiv.org/abs/2311.12686
  • repo_url: None
  • paper_authors: Marco Anisetti, Claudio A. Ardagna, Nicola Bena, Ernesto Damiani, Paolo G. Panero
  • for: To keep the non-functional behavior of ML-based applications stable over time and across model changes, throughout design, implementation, and operation.
  • methods: A multi-model approach built on dynamic classifier selection: multiple ML models showing similar non-functional properties are made available to the application, and one model is selected over time according to (dynamic and unpredictable) contextual changes; a two-step process of model assessment and model substitution guarantees non-functional properties during operation.
  • results: An experimental evaluation in a real-world scenario focusing on the non-functional property fairness shows the effectiveness of the approach.
    Abstract Modern applications are increasingly driven by Machine Learning (ML) models whose non-deterministic behavior is affecting the entire application life cycle from design to operation. The pervasive adoption of ML is urgently calling for approaches that guarantee a stable non-functional behavior of ML-based applications over time and across model changes. To this aim, non-functional properties of ML models, such as privacy, confidentiality, fairness, and explainability, must be monitored, verified, and maintained. This need is even more pressing when modern applications operate in the edge-cloud continuum, increasing their complexity and dynamicity. Existing approaches mostly focus on i) implementing classifier selection solutions according to the functional behavior of ML models, ii) finding new algorithmic solutions to this need, such as continuous re-training. In this paper, we propose a multi-model approach built on dynamic classifier selection, where multiple ML models showing similar non-functional properties are made available to the application and one model is selected over time according to (dynamic and unpredictable) contextual changes. Our solution goes beyond the state of the art by providing an architectural and methodological approach that continuously guarantees a stable non-functional behavior of ML-based applications, is applicable to different ML models, and is driven by non-functional properties assessed on the models themselves. It consists of a two-step process working during application operation, where model assessment verifies non-functional properties of ML models trained and selected at development time, and model substitution guarantees a continuous and stable support of non-functional properties. We experimentally evaluate our solution in a real-world scenario focusing on non-functional property fairness.

Adversarial Reweighting Guided by Wasserstein Distance for Bias Mitigation

  • paper_url: http://arxiv.org/abs/2311.12684
  • repo_url: None
  • paper_authors: Xuan Zhao, Simone Fabbrizzi, Paula Reyero Lobo, Siamak Ghodsi, Klaus Broelemann, Steffen Staab, Gjergji Kasneci
  • for: The work addresses the unequal representation of different groups in a sample population, which can lead machine learning models to discriminate against minority groups.
  • methods: A novel adversarial reweighting method balances the data distribution between the majority and minority groups by de-emphasizing majority samples, preferring majority samples that are close to the minority group as evaluated by the Wasserstein distance.
  • results: Theoretical analysis and experiments on image and tabular benchmark datasets show that the approach mitigates bias without sacrificing classification accuracy, outperforming related state-of-the-art methods.
    Abstract The unequal representation of different groups in a sample population can lead to discrimination of minority groups when machine learning models make automated decisions. To address these issues, fairness-aware machine learning jointly optimizes two (or more) metrics aiming at predictive effectiveness and low unfairness. However, the inherent under-representation of minorities in the data makes the disparate treatment of subpopulations less noticeable and difficult to deal with during learning. In this paper, we propose a novel adversarial reweighting method to address such \emph{representation bias}. To balance the data distribution between the majority and the minority groups, our approach deemphasizes samples from the majority group. To minimize empirical risk, our method prefers samples from the majority group that are close to the minority group as evaluated by the Wasserstein distance. Our theoretical analysis shows the effectiveness of our adversarial reweighting approach. Experiments demonstrate that our approach mitigates bias without sacrificing classification accuracy, outperforming related state-of-the-art methods on image and tabular benchmark datasets.

Interpretation of the Transformer and Improvement of the Extractor

  • paper_url: http://arxiv.org/abs/2311.12678
  • repo_url: None
  • paper_authors: Zhe Chen
  • for: To provide a comprehensive, plain-language interpretation of the Transformer architecture as a basis for improving it.
  • methods: The interpretations, based on the authors' understanding and experience, are proved and verified, and also cover the Extractor, a family of drop-in replacements for multi-head self-attention; an improvement on one type of Extractor is then proposed that introduces no additional trainable parameters.
  • results: Experimental results show that the improved Extractor outperforms self-attention without additional trainable parameters, demonstrating one way to improve the Transformer architecture.
    Abstract It has been over six years since the Transformer architecture was put forward. Surprisingly, the vanilla Transformer architecture is still widely used today. One reason is that the lack of deep understanding and comprehensive interpretation of the Transformer architecture makes it more challenging to improve the Transformer architecture. In this paper, we first interpret the Transformer architecture comprehensively in plain words based on our understanding and experiences. The interpretations are further proved and verified. These interpretations also cover the Extractor, a family of drop-in replacements for the multi-head self-attention in the Transformer architecture. Then, we propose an improvement on a type of the Extractor that outperforms the self-attention, without introducing additional trainable parameters. Experimental results demonstrate that the improved Extractor performs even better, showing a way to improve the Transformer architecture.

Contrastive Left-Right Wearable Sensors (IMUs) Consistency Matching for HAR

  • paper_url: http://arxiv.org/abs/2311.12674
  • repo_url: None
  • paper_authors: Dominique Nshimyimana, Vitor Fortes Rey, Paul Lukowic
  • for: The paper shows how real data can be used for self-supervised learning without any transformations, for human activity recognition.
  • methods: The approach exploits the symmetry present in activities through contrastive matching of two different sensors (left and right wrist- or leg-worn IMUs), making representations of co-occurring sensor data more similar and those of non-co-occurring sensor data more different.
  • results: On MM-Fit the method shows significant improvement over the supervised baseline and the self-supervised method SimCLR; on Opportunity it significantly improves over the supervised baseline with a slight improvement over SimCLR, and it improves supervised baselines even when only a small amount of data is used for training.
    Abstract Machine learning algorithms are improving rapidly, but annotating training data remains a bottleneck for many applications. In this paper, we show how real data can be used for self-supervised learning without any transformations by taking advantage of the symmetry present in the activities. Our approach involves contrastive matching of two different sensors (left and right wrist or leg-worn IMUs) to make representations of co-occurring sensor data more similar and those of non-co-occurring sensor data more different. We test our approach on the Opportunity and MM-Fit datasets. In MM-Fit we show significant improvement over the baseline supervised and self-supervised method SimCLR, while for Opportunity there is significant improvement over the supervised baseline and slight improvement when compared to SimCLR. Moreover, our method improves supervised baselines even when using only a small amount of the data for training. Future work should explore under which conditions our method is beneficial for human activity recognition systems and other related applications.
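The matching objective described above is essentially an InfoNCE-style loss where the positive pair is the left- and right-worn sensor window recorded at the same time, and the other windows in the batch act as negatives. The sketch below shows that loss on toy encoder outputs; the encoder architecture, window length, and temperature are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn.functional as F

def left_right_infonce(z_left, z_right, temperature=0.1):
    """Co-occurring (same-row) left/right embeddings are positives; all other
    rows in the batch serve as negatives."""
    z_left = F.normalize(z_left, dim=1)
    z_right = F.normalize(z_right, dim=1)
    logits = z_left @ z_right.t() / temperature       # (batch, batch) similarity matrix
    targets = torch.arange(z_left.size(0))            # positives sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: embeddings of 16 synchronized windows from left- and right-worn IMUs
torch.manual_seed(0)
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(6 * 100, 64))
left_windows = torch.randn(16, 6, 100)                # 6 IMU channels, 100 samples
right_windows = torch.randn(16, 6, 100)
loss = left_right_infonce(encoder(left_windows), encoder(right_windows))
print("contrastive matching loss:", round(loss.item(), 3))
```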

Towards a more inductive world for drug repurposing approaches

  • paper_url: http://arxiv.org/abs/2311.12670
  • repo_url: https://github.com/ubioinformat/graphemb
  • paper_authors: Jesus de la Fuente, Guillermo Serrano, Uxía Veleiro, Mikel Casals, Laura Vera, Marija Pizurica, Antonio Pineda-Lucena, Idoia Ochoa, Silve Vicent, Olivier Gevaert, Mikel Hernaez
  • for: Drug-target interaction (DTI) prediction, to reduce the cost and time commitment of drug repurposing.
  • methods: Graph-based learning models for DTI prediction are benchmarked, and a novel biologically-driven negative edge subsampling strategy is proposed to improve robustness and generalization.
  • results: A robust benchmarking of current DTI datasets and prediction models shows that transductive models lack generalization and lead to inflated performance when evaluated as previously done in the literature; the new subsampling strategy is validated in vitro, confirming that newly discovered interactions are indeed true.
    Abstract Drug-target interaction (DTI) prediction is a challenging, albeit essential task in drug repurposing. Learning on graph models have drawn special attention as they can significantly reduce drug repurposing costs and time commitment. However, many current approaches require high-demanding additional information besides DTIs that complicates their evaluation process and usability. Additionally, structural differences in the learning architecture of current models hinder their fair benchmarking. In this work, we first perform an in-depth evaluation of current DTI datasets and prediction models through a robust benchmarking process, and show that DTI prediction methods based on transductive models lack generalization and lead to inflated performance when evaluated as previously done in the literature, hence not being suited for drug repurposing approaches. We then propose a novel biologically-driven strategy for negative edge subsampling and show through in vitro validation that newly discovered interactions are indeed true. We envision this work as the underpinning for future fair benchmarking and robust model design. All generated resources and tools are publicly available as a python package.

SSVEP-DAN: A Data Alignment Network for SSVEP-based Brain Computer Interfaces

  • paper_url: http://arxiv.org/abs/2311.12666
  • repo_url: https://github.com/cecnl/ssvep-dan
  • paper_authors: Sung-Yu Chen, Chi-Min Chang, Kuan-Jung Chiang, Chun-Shu Wei
  • For: The paper aims to address the challenge of data insufficiency in steady-state visual-evoked potential (SSVEP)-based brain-computer interfaces (BCIs) by proposing a dedicated neural network model called SSVEP-DAN.
  • Methods: The proposed SSVEP-DAN model is designed to align SSVEP data across different domains, including various sessions, subjects, or devices. The model uses a novel domain adaptation technique to transform existing source SSVEP data into supplementary calibration data, which can significantly enhance SSVEP decoding accuracy in scenarios with limited calibration data.
  • Results: The paper presents experimental results across multiple cross-domain scenarios, demonstrating the capability of SSVEP-DAN to transform existing source SSVEP data into supplementary calibration data, leading to improved SSVEP decoding accuracy in scenarios with limited calibration data. The results suggest that SSVEP-DAN can be a catalyst for practical SSVEP-based BCI applications with minimal calibration.
    Abstract Steady-state visual-evoked potential (SSVEP)-based brain-computer interfaces (BCIs) offer a non-invasive means of communication through high-speed speller systems. However, their efficiency heavily relies on individual training data obtained during time-consuming calibration sessions. To address the challenge of data insufficiency in SSVEP-based BCIs, we present SSVEP-DAN, the first dedicated neural network model designed for aligning SSVEP data across different domains, which can encompass various sessions, subjects, or devices. Our experimental results across multiple cross-domain scenarios demonstrate SSVEP-DAN's capability to transform existing source SSVEP data into supplementary calibration data, significantly enhancing SSVEP decoding accuracy in scenarios with limited calibration data. We envision SSVEP-DAN as a catalyst for practical SSVEP-based BCI applications with minimal calibration. The source codes in this work are available at: https://github.com/CECNL/SSVEP-DAN.

Carbohydrate NMR chemical shift predictions using E(3) equivariant graph neural networks

  • paper_url: http://arxiv.org/abs/2311.12657
  • repo_url: https://github.com/mariabankestad/geqshift
  • paper_authors: Maria Bånkestad, Keven M. Dorst, Göran Widmalm, Jerk Rönnols
  • for: Understanding the molecular structure of carbohydrates through NMR spectroscopy.
  • methods: E(3) equivariant graph neural networks are used to predict carbohydrate NMR spectra.
  • results: The new method reduces the mean absolute error, by up to a factor of three compared to traditional methods, and shows better robustness and generalization.
    Abstract Carbohydrates, vital components of biological systems, are well-known for their structural diversity. Nuclear Magnetic Resonance (NMR) spectroscopy plays a crucial role in understanding their intricate molecular arrangements and is essential in assessing and verifying the molecular structure of organic molecules. An important part of this process is to predict the NMR chemical shift from the molecular structure. This work introduces a novel approach that leverages E(3) equivariant graph neural networks to predict carbohydrate NMR spectra. Notably, our model achieves a substantial reduction in mean absolute error, up to threefold, compared to traditional models that rely solely on two-dimensional molecular structure. Even with limited data, the model excels, highlighting its robustness and generalization capabilities. The implications are far-reaching and go beyond an advanced understanding of carbohydrate structures and spectral interpretation. For example, it could accelerate research in pharmaceutical applications, biochemistry, and structural biology, offering a faster and more reliable analysis of molecular structures. Furthermore, our approach is a key step towards a new data-driven era in spectroscopy, potentially influencing spectroscopic techniques beyond NMR.

FedDRO: Federated Compositional Optimization for Distributionally Robust Learning

  • paper_url: http://arxiv.org/abs/2311.12652
  • repo_url: None
  • paper_authors: Prashant Khanduri, Chengyin Li, Rafi Ibn Sultan, Yao Qiang, Joerg Kliewer, Dongxiao Zhu
  • For: The paper is written to address the challenges of solving compositional optimization (CO) problems in the federated learning (FL) setting, where large-scale and distributed data is available.
  • Methods: The paper proposes efficient FedAvg-type algorithms for solving non-convex CO problems in the FL setting, utilizing the DRO problem structure to design a communication strategy that controls the bias in the estimation of the compositional gradient.
  • Results: The paper achieves $\mathcal{O}(\epsilon^{-2})$ sample and $\mathcal{O}(\epsilon^{-3/2})$ communication complexity in the FL setting while achieving linear speedup with the number of clients, and corroborates the theoretical findings with empirical studies on large-scale DRO problems.
    Abstract Recently, compositional optimization (CO) has gained popularity because of its applications in distributionally robust optimization (DRO) and many other machine learning problems. Large-scale and distributed availability of data demands the development of efficient federated learning (FL) algorithms for solving CO problems. Developing FL algorithms for CO is particularly challenging because of the compositional nature of the objective. Moreover, current state-of-the-art methods to solve such problems rely on large batch gradients (depending on the solution accuracy) not feasible for most practical settings. To address these challenges, in this work, we propose efficient FedAvg-type algorithms for solving non-convex CO in the FL setting. We first establish that vanilla FedAvg is not suitable to solve distributed CO problems because of the data heterogeneity in the compositional objective at each client which leads to the amplification of bias in the local compositional gradient estimates. To this end, we propose a novel FL framework FedDRO that utilizes the DRO problem structure to design a communication strategy that allows FedAvg to control the bias in the estimation of the compositional gradient. A key novelty of our work is to develop solution accuracy-independent algorithms that do not require large batch gradients (and function evaluations) for solving federated CO problems. We establish $\mathcal{O}(\epsilon^{-2})$ sample and $\mathcal{O}(\epsilon^{-3/2})$ communication complexity in the FL setting while achieving linear speedup with the number of clients. We corroborate our theoretical findings with empirical studies on large-scale DRO problems.

Careful Selection and Thoughtful Discarding: Graph Explicit Pooling Utilizing Discarded Nodes

  • paper_url: http://arxiv.org/abs/2311.12644
  • repo_url: None
  • paper_authors: Chuang Liu, Wenhang Yu, Kuang Gao, Xueqi Ma, Yibing Zhan, Jia Wu, Bo Du, Wenbin Hu
  • for: This study aims to improve graph representation learning in graph neural networks (GNNs), and thereby performance on graph classification tasks.
  • methods: The study proposes a novel Graph Explicit Pooling (GrePool) method that selects nodes not via additional graph convolutional networks or multilayer perceptrons, but based on each node's influence on the final representation vector. An extended version, GrePool+, additionally applies a uniform loss to the discarded nodes to augment training and improve classification accuracy.
  • results: Extensive experiments on 12 widely used datasets show that GrePool outperforms 14 baseline methods on most datasets, and applying GrePool+ further improves GrePool's performance without additional computational cost.
    Abstract Graph pooling has been increasingly recognized as crucial for Graph Neural Networks (GNNs) to facilitate hierarchical graph representation learning. Existing graph pooling methods commonly consist of two stages: selecting top-ranked nodes and discarding the remaining to construct coarsened graph representations. However, this paper highlights two key issues with these methods: 1) The process of selecting nodes to discard frequently employs additional Graph Convolutional Networks or Multilayer Perceptrons, lacking a thorough evaluation of each node's impact on the final graph representation and subsequent prediction tasks. 2) Current graph pooling methods tend to directly discard the noise segment (dropped) of the graph without accounting for the latent information contained within these elements. To address the first issue, we introduce a novel Graph Explicit Pooling (GrePool) method, which selects nodes by explicitly leveraging the relationships between the nodes and final representation vectors crucial for classification. The second issue is addressed using an extended version of GrePool (i.e., GrePool+), which applies a uniform loss on the discarded nodes. This addition is designed to augment the training process and improve classification accuracy. Furthermore, we conduct comprehensive experiments across 12 widely used datasets to validate our proposed method's effectiveness, including the Open Graph Benchmark datasets. Our experimental results uniformly demonstrate that GrePool outperforms 14 baseline methods for most datasets. Likewise, implementing GrePool+ enhances GrePool's performance without incurring additional computational costs.
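
A rough sketch of the explicit-pooling idea: nodes are scored directly against a graph-level representation, the top-ranked fraction is kept, and (in the spirit of GrePool+) the class logits of discarded nodes are pushed toward a uniform distribution. The mean readout, dot-product scoring, and pooling ratio are simplifying assumptions rather than the paper's exact formulation.

```python
import numpy as np

def explicit_pool(node_feats, ratio=0.5):
    """Score nodes by alignment with the graph-level (mean) representation and keep the top fraction."""
    graph_repr = node_feats.mean(axis=0)                    # simple readout
    scores = node_feats @ graph_repr                        # per-node relevance to the final representation
    k = max(1, int(ratio * len(node_feats)))
    order = np.argsort(-scores)
    kept, dropped = order[:k], order[k:]
    pooled = node_feats[kept] * scores[kept, None]          # scale kept nodes by their scores
    return pooled, kept, dropped

def uniform_loss_on_dropped(dropped_logits):
    """Cross-entropy of discarded-node class predictions against a uniform target."""
    log_probs = dropped_logits - np.log(np.exp(dropped_logits).sum(axis=1, keepdims=True))
    return -log_probs.mean()

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 8))                                # 10 nodes, 8 features
pooled, kept, dropped = explicit_pool(x)
print(kept, dropped, uniform_loss_on_dropped(rng.normal(size=(len(dropped), 3))))
```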

Hierarchical Joint Graph Learning and Multivariate Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2311.12630
  • repo_url: None
  • paper_authors: Juhyeon Kim, Hyungeun Lee, Seungwon Yu, Ung Hwang, Wooyul Jung, Miseon Park, Kijung Yoon
  • for: This paper provides a model for multivariate time series aimed at better forecasting of long-term trends.
  • methods: The paper uses graph neural networks (GNNs) and attention mechanisms, together with hierarchical signal decompositions, to effectively learn the underlying relationships in time series data.
  • results: Experiments show an average 23% reduction in mean squared error (MSE) compared to existing models.
    Abstract Multivariate time series is prevalent in many scientific and industrial domains. Modeling multivariate signals is challenging due to their long-range temporal dependencies and intricate interactions--both direct and indirect. To confront these complexities, we introduce a method of representing multivariate signals as nodes in a graph with edges indicating interdependency between them. Specifically, we leverage graph neural networks (GNN) and attention mechanisms to efficiently learn the underlying relationships within the time series data. Moreover, we suggest employing hierarchical signal decompositions running over the graphs to capture multiple spatial dependencies. The effectiveness of our proposed model is evaluated across various real-world benchmark datasets designed for long-term forecasting tasks. The results consistently showcase the superiority of our model, achieving an average 23\% reduction in mean squared error (MSE) compared to existing models.

Bridging Algorithmic Information Theory and Machine Learning: A New Approach to Kernel Learning

  • paper_url: http://arxiv.org/abs/2311.12624
  • repo_url: None
  • paper_authors: Boumediene Hamzi, Marcus Hutter, Houman Owhadi
  • for: The paper explores how machine learning (ML) and algorithmic information theory (AIT) view complexity.
  • methods: The paper adopts an AIT perspective on learning kernels from data, in particular the method of Sparse Kernel Flows for kernel ridge regression.
  • results: The paper shows that Sparse Kernel Flows are the natural approach for learning kernels from data, without needing the statistical route to derive them; it also examines the commonalities and differences between Minimal Description Length (MDL) and regularization in machine learning (RML).
    Abstract Machine Learning (ML) and Algorithmic Information Theory (AIT) look at Complexity from different points of view. We explore the interface between AIT and Kernel Methods (that are prevalent in ML) by adopting an AIT perspective on the problem of learning kernels from data, in kernel ridge regression, through the method of Sparse Kernel Flows. In particular, by looking at the differences and commonalities between Minimal Description Length (MDL) and Regularization in Machine Learning (RML), we prove that the method of Sparse Kernel Flows is the natural approach to adopt to learn kernels from data. This paper shows that it is not necessary to use the statistical route to derive Sparse Kernel Flows and that one can directly work with code-lengths and complexities that are concepts that show up in AIT.

Koopman Learning with Episodic Memory

  • paper_url: http://arxiv.org/abs/2311.12615
  • repo_url: None
  • paper_authors: William T. Redman, Dean Huang, Maria Fonoberova, Igor Mezić
  • for: Learning to forecast non-stationary time series.
  • methods: Koopman operator methods equipped with an episodic memory mechanism.
  • results: Significantly improved prediction performance on synthetic and real-world data.
    Abstract Koopman operator theory, a data-driven dynamical systems framework, has found significant success in learning models from complex, real-world data sets, enabling state-of-the-art prediction and control. The greater interpretability and lower computational costs of these models, compared to traditional machine learning methodologies, make Koopman learning an especially appealing approach. Despite this, little work has been performed on endowing Koopman learning with the ability to learn from its own mistakes. To address this, we equip Koopman methods - developed for predicting non-stationary time-series - with an episodic memory mechanism, enabling global recall of (or attention to) periods in time where similar dynamics previously occurred. We find that a basic implementation of Koopman learning with episodic memory leads to significant improvements in prediction on synthetic and real-world data. Our framework has considerable potential for expansion, allowing for future advances, and opens exciting new directions for Koopman learning.
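
As a hedged illustration, the snippet below fits plain least-squares Koopman (DMD-style) operators on past segments ("episodes") and, at prediction time, recalls the operator of the most similar stored segment. The similarity measure (segment mean), memory layout, and toy data are assumptions; the paper's episodic memory mechanism may differ.

```python
import numpy as np

def fit_koopman(X, Y):
    """Least-squares Koopman/DMD operator K with Y ~ K X (columns are snapshots)."""
    return Y @ np.linalg.pinv(X)

def predict_with_memory(segment, memory, horizon=10):
    """Recall the stored episode most similar to the current segment and roll its operator forward."""
    keys = np.stack([m["signature"] for m in memory])
    idx = np.argmin(np.linalg.norm(keys - segment.mean(axis=1), axis=1))
    K, state, preds = memory[idx]["K"], segment[:, -1], []
    for _ in range(horizon):
        state = K @ state
        preds.append(state)
    return np.stack(preds, axis=1)

rng = np.random.default_rng(0)
data = np.cumsum(rng.normal(size=(3, 200)), axis=1)         # toy non-stationary series
memory = []
for start in range(0, 150, 50):                             # store past "episodes"
    seg = data[:, start:start + 50]
    memory.append({"signature": seg.mean(axis=1),
                   "K": fit_koopman(seg[:, :-1], seg[:, 1:])})
print(predict_with_memory(data[:, 150:200], memory).shape)  # (3, horizon)
```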

Decentralised Q-Learning for Multi-Agent Markov Decision Processes with a Satisfiability Criterion

  • paper_url: http://arxiv.org/abs/2311.12613
  • repo_url: None
  • paper_authors: Keshav P. Keval, Vivek S. Borkar
  • for: Solving multi-agent Markov decision processes (MMDPs) so that each agent's time-average cost stays below a pre-specified agent-specific bound.
  • methods: A Q-learning algorithm for a weighted combination of the agents' costs is combined with a gossip algorithm and the Metropolis-Hastings or Multiplicative Weights formalisms, using multiple timescales; under mild conditions the algorithm approximately achieves the desired bound for each agent.
  • results: The algorithm keeps the time-average cost below the pre-specified agent-specific bounds in MMDPs, and its empirical performance is also demonstrated in the more general setting of MMDPs with jointly controlled per-stage costs.
    Abstract In this paper, we propose a reinforcement learning algorithm to solve a multi-agent Markov decision process (MMDP). The goal, inspired by Blackwell's Approachability Theorem, is to lower the time average cost of each agent to below a pre-specified agent-specific bound. For the MMDP, we assume the state dynamics to be controlled by the joint actions of agents, but the per-stage costs to only depend on the individual agent's actions. We combine the Q-learning algorithm for a weighted combination of the costs of each agent, obtained by a gossip algorithm with the Metropolis-Hastings or Multiplicative Weights formalisms to modulate the averaging matrix of the gossip. We use multiple timescales in our algorithm and prove that under mild conditions, it approximately achieves the desired bounds for each of the agents. We also demonstrate the empirical performance of this algorithm in the more general setting of MMDPs having jointly controlled per-stage costs.
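
A small sketch of the gossip-averaging ingredient, using a Metropolis-Hastings weight matrix over a fixed communication graph; repeated local averaging drives every agent's estimate toward the network average. The full two-timescale Q-learning scheme and the Multiplicative Weights variant are not reproduced, and the toy graph and costs are illustrative.

```python
import numpy as np

def metropolis_weights(adjacency):
    """Metropolis-Hastings averaging matrix for an undirected communication graph."""
    n = adjacency.shape[0]
    deg = adjacency.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adjacency[i, j] and i != j:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

def gossip(values, W, steps=50):
    """Each step, every agent replaces its value with a weighted average of its neighbours'."""
    for _ in range(steps):
        values = W @ values
    return values

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)            # a line graph of 4 agents
local_costs = np.array([4.0, 1.0, 0.0, 3.0])
print(gossip(local_costs, metropolis_weights(A)))    # converges towards [2, 2, 2, 2]
```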

A New Type Of Upper And Lower Bounds On Right-Tail Probabilities Of Continuous Random Variables

  • paper_url: http://arxiv.org/abs/2311.12612
  • repo_url: None
  • paper_authors: Nikola Zlatanov
  • for: The paper provides a completely new type of upper and lower bounds on the right-tail probabilities of continuous random variables with unbounded support or with support semi-bounded from the left.
  • methods: The bounds depend only on the probability density function (PDF), its first derivative, and two parameters used for tightening the bounds.
  • results: The new tail bounds are shown to be tight for a wide range of continuous random variables via numerical examples.
    Abstract In this paper, I present a completely new type of upper and lower bounds on the right-tail probabilities of continuous random variables with unbounded support and with semi-bounded support from the left. The presented upper and lower right-tail bounds depend only on the probability density function (PDF), its first derivative, and two parameters that are used for tightening the bounds. These tail bounds hold under certain conditions that depend on the PDF, its first and second derivatives, and the two parameters. The new tail bounds are shown to be tight for a wide range of continuous random variables via numerical examples.

ChronoPscychosis: Temporal Segmentation and Its Impact on Schizophrenia Classification Using Motor Activity Data

  • paper_url: http://arxiv.org/abs/2311.12590
  • repo_url: None
  • paper_authors: Pradnya Rajendra Jadhav, Raviprasad Aduri
  • for: This paper aims to identify reliable biomarkers for the accurate classification of Schizophrenia patients using motor activity data.
  • methods: The paper uses temporal pattern analysis and machine learning models to classify individuals with Schizophrenia based on their motor activity data. The dataset contains per-minute motor activity measurements collected for an average of 12.7 days in a row for each participant, and the authors use segmentation techniques to divide each day into six parts. They employ sixteen statistical features within these temporal segments and train seven machine learning models to evaluate their impact on classification.
  • results: The results show that temporal segmentation significantly improves the classification of Schizophrenia patients and controls, with the LightGBM model outperforming the other six models. The authors find that distinguishing between diurnal and nocturnal segments amplifies the differences between Schizophrenia patients and controls, but further subdivisions into smaller time segments do not affect the AUC-ROC significantly. The paper concludes that extensive temporal classification beyond distinguishing between day and night does not yield substantial results, offering an efficient approach for further classification, early diagnosis, and monitoring of Schizophrenia.
    Abstract Schizophrenia is a complicated mental illness characterized by a broad spectrum of symptoms affecting cognition, behavior, and emotion. The task of identifying reliable biomarkers to classify Schizophrenia accurately continues to be a challenge in the field of psychiatry. We investigate the temporal patterns within the motor activity data as a potential key to enhancing the categorization of individuals with Schizophrenia, using the dataset having motor activity recordings of 22 Schizophrenia patients and 32 control subjects. The dataset contains per-minute motor activity measurements collected for an average of 12.7 days in a row for each participant. We dissect each day into segments (Twelve, Eight, six, four, three, and two parts) and evaluate their impact on classification. We employ sixteen statistical features within these temporal segments and train them on Seven machine learning models to get deeper insights. LightGBM model outperforms the other six models. Our results indicate that the temporal segmentation significantly improves the classification, with AUC-ROC = 0.93, F1 score = 0.84( LightGBM- without any segmentation) and AUC-ROC = 0.98, F1 score = 0.93( LightGBM- with segmentation). Distinguishing between diurnal and nocturnal segments amplifies the differences between Schizophrenia patients and controls. However, further subdivisions into smaller time segments do not affect the AUC- ROC significantly. Morning, afternoon, evening, and night partitioning gives similar classification performance to day-night partitioning. These findings are valuable as they indicate that extensive temporal classification beyond distinguishing between day and night does not yield substantial results, offering an efficient approach for further classification, early diagnosis, and monitoring of Schizophrenia.
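
A minimal sketch of the temporal segmentation step: a day of per-minute actigraphy is split into equal segments and simple statistics are computed within each segment as classifier features. The five statistics and six segments below are illustrative; the paper evaluates sixteen features and several different segmentations before training the seven classifiers.

```python
import numpy as np

def segment_features(day_minutes, n_segments=6):
    """Split one day of per-minute activity (1440 values) into equal segments
    and compute simple statistics within each segment."""
    feats = []
    for seg in np.array_split(np.asarray(day_minutes, dtype=float), n_segments):
        feats.extend([seg.mean(), seg.std(), seg.max(),
                      np.median(seg), (seg == 0).mean()])   # zero-activity fraction
    return np.array(feats)

rng = np.random.default_rng(0)
one_day = rng.poisson(lam=20, size=1440)                    # toy actigraphy trace
print(segment_features(one_day).shape)                      # (n_segments * 5,)
```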

Nonlinear System Identification of Swarm of UAVs Using Deep Learning Methods

  • paper_url: http://arxiv.org/abs/2311.12906
  • repo_url: None
  • paper_authors: Saman Yazdannik, Morteza Tayefi, Mojtaba Farrokh
  • for: The study designs and evaluates several nonlinear system identification techniques for modeling the dynamics of a UAV swarm system in planar space.
  • methods: Learning methods such as RNNs, CNNs, and Neural ODEs are explored and compared.
  • results: The combination of a Neural ODE with a model well trained on transient data is robust to varying initial conditions and best predicts swarm stability.
    Abstract This study designs and evaluates multiple nonlinear system identification techniques for modeling the UAV swarm system in planar space. Learning methods such as RNNs, CNNs, and Neural ODE are explored and compared. The objective is to forecast future swarm trajectories by accurately approximating the nonlinear dynamics of the swarm model. The modeling process is performed using both transient and steady-state data from swarm simulations. Results show that the combination of Neural ODE with a well-trained model using transient data is robust for varying initial conditions and outperforms other learning methods in accurately predicting swarm stability.

Machine-Guided Discovery of a Real-World Rogue Wave Model

  • paper_url: http://arxiv.org/abs/2311.12579
  • repo_url: https://github.com/dionhaefner/rogue-wave-discovery
  • paper_authors: Dion Häfner, Johannes Gemmrich, Markus Jochum
  • for: The paper explores how machine learning models can be used for scientific discovery, specifically for ocean rogue wave forecasting.
  • methods: The paper uses causal analysis, deep learning, parsimony-guided model selection, and symbolic regression to discover a new symbolic model from data.
  • results: A neural network is trained on causal features from an extensive dataset of wave buoy observations, and symbolic regression distills the black-box model into an interpretable mathematical equation, achieving better predictive performance and understanding.
    Abstract Big data and large-scale machine learning have had a profound impact on science and engineering, particularly in fields focused on forecasting and prediction. Yet, it is still not clear how we can use the superior pattern matching abilities of machine learning models for scientific discovery. This is because the goals of machine learning and science are generally not aligned. In addition to being accurate, scientific theories must also be causally consistent with the underlying physical process and allow for human analysis, reasoning, and manipulation to advance the field. In this paper, we present a case study on discovering a new symbolic model for oceanic rogue waves from data using causal analysis, deep learning, parsimony-guided model selection, and symbolic regression. We train an artificial neural network on causal features from an extensive dataset of observations from wave buoys, while selecting for predictive performance and causal invariance. We apply symbolic regression to distill this black-box model into a mathematical equation that retains the neural network's predictive capabilities, while allowing for interpretation in the context of existing wave theory. The resulting model reproduces known behavior, generates well-calibrated probabilities, and achieves better predictive scores on unseen data than current theory. This showcases how machine learning can facilitate inductive scientific discovery, and paves the way for more accurate rogue wave forecasting.
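
The paper distills a trained neural network into an equation via symbolic regression; as a hedged stand-in, the snippet below performs a sequentially thresholded sparse fit over a tiny library of candidate terms (a SINDy-like surrogate, not the authors' pipeline). The feature names "steepness" and "bandwidth" and the synthetic target are purely illustrative.

```python
import numpy as np

def sparse_symbolic_fit(X, y, names, threshold=0.05, iters=10):
    """Sequentially thresholded least squares over a library of candidate terms:
    a crude stand-in for distilling a black-box model into a short equation."""
    coef = np.zeros(X.shape[1])
    active = np.ones(X.shape[1], dtype=bool)
    for _ in range(iters):
        if not active.any():
            break
        coef[:] = 0.0
        coef[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
        active = np.abs(coef) > threshold
    terms = [f"{c:+.3f}*{n}" for c, n in zip(coef, names) if abs(c) > threshold]
    return " ".join(terms) if terms else "0"

rng = np.random.default_rng(0)
steepness = rng.uniform(0.01, 0.1, 500)
bandwidth = rng.uniform(0.1, 1.0, 500)
target = 2.0 * steepness + 0.5 * steepness * bandwidth + 0.01 * rng.normal(size=500)
library = np.column_stack([steepness, bandwidth, steepness * bandwidth])
print(sparse_symbolic_fit(library, target, ["steepness", "bandwidth", "steepness*bandwidth"]))
```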

BEND: Benchmarking DNA Language Models on biologically meaningful tasks

  • paper_url: http://arxiv.org/abs/2311.12570
  • repo_url: https://github.com/frederikkemarin/bend
  • paper_authors: Frederikke Isa Marin, Felix Teufel, Marc Horlacher, Dennis Madsen, Dennis Pultz, Ole Winther, Wouter Boomsma
  • for: The paper aims to evaluate DNA language models and to provide a reliable evaluation standard.
  • methods: The paper introduces a benchmark named BEND for evaluating DNA language models, featuring a collection of realistic and biologically meaningful downstream tasks defined on the human genome.
  • results: The study finds that embeddings from current DNA LMs can approach the performance of expert methods on some tasks, but capture only limited information about long-range features.
    Abstract The genome sequence contains the blueprint for governing cellular processes. While the availability of genomes has vastly increased over the last decades, experimental annotation of the various functional, non-coding and regulatory elements encoded in the DNA sequence remains both expensive and challenging. This has sparked interest in unsupervised language modeling of genomic DNA, a paradigm that has seen great success for protein sequence data. Although various DNA language models have been proposed, evaluation tasks often differ between individual works, and might not fully recapitulate the fundamental challenges of genome annotation, including the length, scale and sparsity of the data. In this study, we introduce BEND, a Benchmark for DNA language models, featuring a collection of realistic and biologically meaningful downstream tasks defined on the human genome. We find that embeddings from current DNA LMs can approach performance of expert methods on some tasks, but only capture limited information about long-range features. BEND is available at https://github.com/frederikkemarin/BEND.

Differentiable Sampling of Categorical Distributions Using the CatLog-Derivative Trick

  • paper_url: http://arxiv.org/abs/2311.12569
  • repo_url: None
  • paper_authors: Lennert De Smet, Emanuele Sansone, Pedro Zuidberg Dos Martires
  • for: The paper addresses learning with categorical random variables, which capture the discrete and uncertain aspects of data in discrete latent variable models.
  • methods: The paper uses the Log-Derivative trick and a new CatLog-Derivative trick to estimate gradients of categorical distributions.
  • results: The paper proposes a novel, unbiased gradient estimator named IndeCateR for products of independent categorical distributions, with provably lower variance than REINFORCE and an efficient practical implementation.
    Abstract Categorical random variables can faithfully represent the discrete and uncertain aspects of data as part of a discrete latent variable model. Learning in such models necessitates taking gradients with respect to the parameters of the categorical probability distributions, which is often intractable due to their combinatorial nature. A popular technique to estimate these otherwise intractable gradients is the Log-Derivative trick. This trick forms the basis of the well-known REINFORCE gradient estimator and its many extensions. While the Log-Derivative trick allows us to differentiate through samples drawn from categorical distributions, it does not take into account the discrete nature of the distribution itself. Our first contribution addresses this shortcoming by introducing the CatLog-Derivative trick - a variation of the Log-Derivative trick tailored towards categorical distributions. Secondly, we use the CatLog-Derivative trick to introduce IndeCateR, a novel and unbiased gradient estimator for the important case of products of independent categorical distributions with provably lower variance than REINFORCE. Thirdly, we empirically show that IndeCateR can be efficiently implemented and that its gradient estimates have significantly lower bias and variance for the same number of samples compared to the state of the art.
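
For orientation, a sketch of the standard score-function (Log-Derivative/REINFORCE) estimator for a single categorical variable, alongside the exact gradient obtained by enumerating categories, which is the kind of Rao-Blackwellisation the CatLog-Derivative trick builds on. This is a generic illustration, not the IndeCateR implementation.

```python
import numpy as np

def reinforce_grad(logits, f, n_samples=10_000, seed=0):
    """Score-function estimate of d E_{c ~ Cat(softmax(logits))}[f(c)] / d logits."""
    rng = np.random.default_rng(seed)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    samples = rng.choice(len(probs), size=n_samples, p=probs)
    f_vals = np.array([f(c) for c in samples])
    scores = np.eye(len(probs))[samples] - probs      # rows: d log p(c_i) / d logits
    return (f_vals[:, None] * scores).mean(axis=0)

def exact_grad(logits, f):
    """Exact gradient by summing over categories (no sampling, zero variance)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    expected = sum(probs[c] * f(c) for c in range(len(probs)))
    return np.array([probs[c] * (f(c) - expected) for c in range(len(probs))])

f = lambda c: float(c ** 2)                           # toy downstream function of the sample
logits = np.array([0.2, -0.5, 1.0])
print(reinforce_grad(logits, f))                      # noisy estimate
print(exact_grad(logits, f))                          # what it converges to
```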

Variational Elliptical Processes

  • paper_url: http://arxiv.org/abs/2311.12566
  • repo_url: None
  • paper_authors: Maria Bånkestad, Jens Sjölund, Jalil Taghia, Thomas B. Schön
  • for: The paper describes a family of non-parametric probabilistic models, elliptical processes, that subsume Gaussian processes and Student's t processes and include a range of new heavy-tailed behaviors.
  • methods: The paper uses a representation of elliptical distributions as a continuous mixture of Gaussian distributions, parameterizes this mixture distribution as a spline normalizing flow, and trains it using variational inference.
  • results: Regression and classification experiments show that elliptical processes are applicable to large-scale problems and can supersede Gaussian processes in several settings, such as when the likelihood is non-Gaussian or when accurate tail modeling is essential.
    Abstract We present elliptical processes, a family of non-parametric probabilistic models that subsume Gaussian processes and Student's t processes. This generalization includes a range of new heavy-tailed behaviors while retaining computational tractability. Elliptical processes are based on a representation of elliptical distributions as a continuous mixture of Gaussian distributions. We parameterize this mixture distribution as a spline normalizing flow, which we train using variational inference. The proposed form of the variational posterior enables a sparse variational elliptical process applicable to large-scale problems. We highlight advantages compared to Gaussian processes through regression and classification experiments. Elliptical processes can supersede Gaussian processes in several settings, including cases where the likelihood is non-Gaussian or when accurate tail modeling is essential.

Summary of the DISPLACE Challenge 2023 – DIarization of SPeaker and LAnguage in Conversational Environments

  • paper_url: http://arxiv.org/abs/2311.12564
  • repo_url: None
  • paper_authors: Shikha Baghel, Shreyas Ramoji, Somil Jain, Pratik Roy Chowdhuri, Prachi Singh, Deepu Vijayasenan, Sriram Ganapathy
  • for: The paper concerns extracting information from conversations involving multiple languages and multiple speakers.
  • methods: Speaker and language diarization technologies are evaluated on such conversations, with a baseline system provided for benchmarking.
  • results: The paper describes an open challenge (DISPLACE) for evaluating speaker and language diarization in multilingual, multi-speaker conversations. The challenge had two tracks, one on speaker diarization (SD) and one on language diarization (LD), both evaluated on the same underlying audio data. It received 42 registrations worldwide and 19 submissions, and the paper provides a concise overview of the submitted systems and their performance.
    Abstract In multi-lingual societies, where multiple languages are spoken in a small geographic vicinity, informal conversations often involve mix of languages. Existing speech technologies may be inefficient in extracting information from such conversations, where the speech data is rich in diversity with multiple languages and speakers. The DISPLACE (DIarization of SPeaker and LAnguage in Conversational Environments) challenge constitutes an open-call for evaluating and bench-marking the speaker and language diarization technologies on this challenging condition. The challenge entailed two tracks: Track-1 focused on speaker diarization (SD) in multilingual situations while, Track-2 addressed the language diarization (LD) in a multi-speaker scenario. Both the tracks were evaluated using the same underlying audio data. To facilitate this evaluation, a real-world dataset featuring multilingual, multi-speaker conversational far-field speech was recorded and distributed. Furthermore, a baseline system was made available for both SD and LD task which mimicked the state-of-art in these tasks. The challenge garnered a total of $42$ world-wide registrations and received a total of $19$ combined submissions for Track-1 and Track-2. This paper describes the challenge, details of the datasets, tasks, and the baseline system. Additionally, the paper provides a concise overview of the submitted systems in both tracks, with an emphasis given to the top performing systems. The paper also presents insights and future perspectives for SD and LD tasks, focusing on the key challenges that the systems need to overcome before wide-spread commercial deployment on such conversations.

Explainable Anomaly Detection using Masked Latent Generative Modeling

  • paper_url: http://arxiv.org/abs/2311.12550
  • repo_url: None
  • paper_authors: Daesoo Lee, Sara Malacarne, Erlend Aune
  • for: The study proposes a new time series anomaly detection method that achieves high detection accuracy while offering greater explainability.
  • methods: The method builds on a time series generative model, TimeVQVAE, which performs masked generation in a discrete latent space of the time-frequency domain. Preserving the dimensional semantics of the time-frequency domain allows anomaly scores to be computed across different frequency bands, giving better insight into detected anomalies.
  • results: Experiments show that TimeVQVAE-AD achieves higher detection accuracy and explainability on the UCR Time Series Anomaly archive than previous methods.
    Abstract We present a novel time series anomaly detection method that achieves excellent detection accuracy while offering a superior level of explainability. Our proposed method, TimeVQVAE-AD, leverages masked generative modeling adapted from the cutting-edge time series generation method known as TimeVQVAE. The prior model is trained on the discrete latent space of a time-frequency domain. Notably, the dimensional semantics of the time-frequency domain are preserved in the latent space, enabling us to compute anomaly scores across different frequency bands, which provides a better insight into the detected anomalies. Additionally, the generative nature of the prior model allows for sampling likely normal states for detected anomalies, enhancing the explainability of the detected anomalies through counterfactuals. Our experimental evaluation on the UCR Time Series Anomaly archive demonstrates that TimeVQVAE-AD significantly surpasses the existing methods in terms of detection accuracy and explainability.

Learning to Compute Gröbner Bases

  • paper_url: http://arxiv.org/abs/2311.12904
  • repo_url: None
  • paper_authors: Hiroshi Kera, Yuki Ishihara, Yuta Kambe, Tristan Vaccon, Kazuhiro Yokoyama
  • for: The study aims to compute Gröbner bases of multivariate polynomial systems by training a transformer, to improve computational efficiency.
  • methods: The study addresses the random generation of Gröbner bases and their transformation into non-Gröbner polynomial systems (the backward Gröbner problem), resolved using zero-dimensional radical ideals.
  • results: Experiments show that in the five-variate case the proposed dataset generation method is five orders of magnitude faster than a naive approach, overcoming a crucial challenge in learning to compute Gröbner bases.
    Abstract Solving a polynomial system, or computing an associated Gr\"obner basis, has been a fundamental task in computational algebra. However, it is also known for its notoriously expensive computational cost -- doubly exponential time complexity in the number of variables in the worst case. In this paper, we achieve for the first time Gr\"obner basis computation through the training of a transformer. The training requires many pairs of a polynomial system and the associated Gr\"obner basis, thus motivating us to address two novel algebraic problems: random generation of Gr\"obner bases and the transformation of them into non-Gr\"obner polynomial systems, termed as \textit{backward Gr\"obner problem}. We resolve these problems with zero-dimensional radical ideals, the ideals appearing in various applications. The experiments show that in the five-variate case, the proposed dataset generation method is five orders of magnitude faster than a naive approach, overcoming a crucial challenge in learning to compute Gr\"obner bases.
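
To illustrate how (polynomial system, Gröbner basis) training pairs can be produced at all, here is a tiny sympy example using forward computation. Note that this naive forward route is exactly the slow approach the paper's backward-generation strategy is designed to avoid; the two variables, coefficient range, and degree are arbitrary choices.

```python
import random
import sympy as sp

def random_pair(n_polys=2, degree=2, seed=0):
    """Generate a random polynomial system and its (forward-computed) Groebner basis."""
    random.seed(seed)
    x, y = sp.symbols("x y")
    monomials = [x**i * y**j for i in range(degree + 1)
                 for j in range(degree + 1) if i + j <= degree]
    system = [sum(random.randint(-3, 3) * m for m in monomials) for _ in range(n_polys)]
    basis = sp.groebner(system, x, y, order="lex")      # the expensive step
    return system, list(basis)

system, basis = random_pair()
print("system:        ", system)
print("Groebner basis:", basis)
```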

An efficient likelihood-free Bayesian inference method based on sequential neural posterior estimation

  • paper_url: http://arxiv.org/abs/2311.12530
  • repo_url: https://github.com/yifei-xiong/efficient-snpe
  • paper_authors: Yifei Xiong, Xiliang Yang, Sanguo Zhang, Zhijian He
  • for: Handling simulation-based models with intractable likelihoods.
  • methods: Sequential Neural Posterior Estimation (SNPE) is used, learning the posterior with neural network-based conditional density estimators (CDEs).
  • results: An SNPE method with an adaptive calibration kernel and several variance reduction techniques improves the stability and training efficiency of SNPE, and numerical experiments show it approximates the posterior better than the original SNPE method and several existing competitors.
    Abstract Sequential neural posterior estimation (SNPE) techniques have been recently proposed for dealing with simulation-based models with intractable likelihoods. Unlike approximate Bayesian computation, SNPE techniques learn the posterior from sequential simulation using neural network-based conditional density estimators by minimizing a specific loss function. The SNPE method proposed by Lueckmann et al. (2017) used a calibration kernel to boost the sample weights around the observed data, resulting in a concentrated loss function. However, the use of calibration kernels may increase the variances of both the empirical loss and its gradient, making the training inefficient. To improve the stability of SNPE, this paper proposes to use an adaptive calibration kernel and several variance reduction techniques. The proposed method greatly speeds up the process of training, and provides a better approximation of the posterior than the original SNPE method and some existing competitors as confirmed by numerical experiments.
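
A hedged sketch of the calibration-kernel idea: simulated pairs are reweighted by how close their summaries lie to the observation inside the SNPE loss. A fixed Gaussian kernel with an arbitrary bandwidth is used here; the paper's adaptive bandwidth and variance reduction techniques are not reproduced, and log_q stands in for the conditional density estimator's output.

```python
import numpy as np

def calibration_weights(x_sim, x_obs, bandwidth):
    """Gaussian calibration kernel: weight each simulated summary by its closeness to the observation."""
    sq_dist = np.sum((x_sim - x_obs) ** 2, axis=1)
    w = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    return w / w.sum()

def calibrated_loss(log_q, weights):
    """Calibrated SNPE-style loss: weighted negative log posterior density."""
    return -np.sum(weights * log_q)

rng = np.random.default_rng(0)
x_sim = rng.normal(size=(1000, 2))               # simulated summary statistics
x_obs = np.array([0.5, -0.2])                    # observed summary
log_q = rng.normal(loc=-1.0, size=1000)          # stand-in for log q(theta_i | x_i)
w = calibration_weights(x_sim, x_obs, bandwidth=0.5)
print(calibrated_loss(log_q, w))
```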

Inverse Problems with Learned Forward Operators

  • paper_url: http://arxiv.org/abs/2311.12528
  • repo_url: None
  • paper_authors: Simon Arridge, Andreas Hauptmann, Yury Korolev
  • for: The paper addresses solving inverse problems when accurate forward models are computationally expensive, seeking cheaper variants that do not compromise reconstruction quality.
  • methods: The paper reviews reconstruction methods with learned forward operators that follow two paradigms. The first is completely agnostic to the forward operator and learns its restriction to the subspace spanned by the training data, with reconstructions found via regularization by projection. The second uses a simplified model of the physics of the measurement process and relies on the training data only to learn a model correction.
  • results: The theory of both approaches is presented and their performance compared numerically; a common theme is that both methods require, or at least benefit from, training data not only for the forward operator but also for its adjoint.
    Abstract Solving inverse problems requires knowledge of the forward operator, but accurate models can be computationally expensive and hence cheaper variants are desired that do not compromise reconstruction quality. This chapter reviews reconstruction methods in inverse problems with learned forward operators that follow two different paradigms. The first one is completely agnostic to the forward operator and learns its restriction to the subspace spanned by the training data. The framework of regularisation by projection is then used to find a reconstruction. The second one uses a simplified model of the physics of the measurement process and only relies on the training data to learn a model correction. We present the theory of these two approaches and compare them numerically. A common theme emerges: both methods require, or at least benefit from, training data not only for the forward operator, but also for its adjoint.
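
A minimal linear-algebra sketch of the first paradigm: all that is learned about the forward operator is its action on the span of the training inputs, and the reconstruction is sought inside that subspace (regularisation by projection). The toy operator, dimensions, and ridge parameter are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 50, 30, 10                                  # image size, data size, number of training pairs
A_true = rng.normal(size=(m, n))                      # unknown "true" forward operator

X = rng.normal(size=(n, k))                           # training inputs x_j
Y = A_true @ X                                        # training data y_j = A x_j: all we ever see of A

def reconstruct_by_projection(y, X, Y, alpha=1e-3):
    """Seek x = X c in the training subspace whose learned forward image Y c
    matches the measurement y (ridge-regularised least squares in c)."""
    G = Y.T @ Y + alpha * np.eye(X.shape[1])
    c = np.linalg.solve(G, Y.T @ y)
    return X @ c

x_unknown = X @ rng.normal(size=k)                    # ground truth lying in the subspace
y_meas = A_true @ x_unknown + 0.01 * rng.normal(size=m)
x_rec = reconstruct_by_projection(y_meas, X, Y)
print(np.linalg.norm(x_rec - x_unknown) / np.linalg.norm(x_unknown))   # small relative error
```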

Local Convolution Enhanced Global Fourier Neural Operator For Multiscale Dynamic Spaces Prediction

  • paper_url: http://arxiv.org/abs/2311.12902
  • repo_url: None
  • paper_authors: Xuanle Zhao, Yue Sun, Tielin Zhang, Bo Xu
  • for: Solving multiscale partial differential equations (PDEs).
  • methods: A hierarchical neural operator based on the Fourier Neural Operator (FNO) is proposed, integrating improved Fourier layers with attention mechanisms to solve multiscale PDEs.
  • results: Experiments on various physical scenarios show excellent performance in predicting and resolving multiscale dynamic spaces, especially for equations with rapidly varying coefficients.
    Abstract Neural operators extend the capabilities of traditional neural networks by allowing them to handle mappings between function spaces for the purpose of solving partial differential equations (PDEs). One of the most notable methods is the Fourier Neural Operator (FNO), which is inspired by Green's function method and approximate operator kernel directly in the frequency domain. In this work, we focus on predicting multiscale dynamic spaces, which is equivalent to solving multiscale PDEs. Multiscale PDEs are characterized by rapid coefficient changes and solution space oscillations, which are crucial for modeling atmospheric convection and ocean circulation. To solve this problem, models should have the ability to capture rapid changes and process them at various scales. However, the FNO only approximates kernels in the low-frequency domain, which is insufficient when solving multiscale PDEs. To address this challenge, we propose a novel hierarchical neural operator that integrates improved Fourier layers with attention mechanisms, aiming to capture all details and handle them at various scales. These mechanisms complement each other in the frequency domain and encourage the model to solve multiscale problems. We perform experiments on dynamic spaces governed by forward and reverse problems of multiscale elliptic equations, Navier-Stokes equations and some other physical scenarios, and reach superior performance in existing PDE benchmarks, especially equations characterized by rapid coefficient variations.
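
For reference, a minimal 1-D Fourier layer in the spirit of the FNO building block this paper extends: transform to the frequency domain, multiply a truncated set of low modes by learned complex weights, and transform back. The local-convolution and attention enhancements of the paper are not sketched, and shapes and mode counts are arbitrary.

```python
import numpy as np

def fourier_layer(u, weights, n_modes):
    """Spectral convolution on a batch of 1-D signals u of shape (batch, grid):
    FFT -> scale the lowest n_modes by complex weights -> inverse FFT."""
    u_hat = np.fft.rfft(u, axis=-1)                      # (batch, grid // 2 + 1)
    out_hat = np.zeros_like(u_hat)
    out_hat[:, :n_modes] = u_hat[:, :n_modes] * weights  # learned per-mode weights
    return np.fft.irfft(out_hat, n=u.shape[-1], axis=-1)

rng = np.random.default_rng(0)
batch, grid, n_modes = 4, 128, 16
u = rng.normal(size=(batch, grid))
weights = rng.normal(size=n_modes) + 1j * rng.normal(size=n_modes)
print(fourier_layer(u, weights, n_modes).shape)          # (4, 128)
```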

From Microbes to Methane: AI-Based Predictive Modeling of Feed Additive Efficacy in Dairy Cows

  • paper_url: http://arxiv.org/abs/2311.12901
  • repo_url: None
  • paper_authors: Yaniv Altshuler, Tzruya Calvao Chebach, Shalom Cohen
  • for: The paper aims to optimize livestock feed for enhancing yield and minimizing environmental impact in sustainable agriculture.
  • methods: The study uses rumen microbiome data to predict the efficacy of feed additives in dairy cattle, with a dataset of methane emissions from 2,190 Holstein cows across 34 sites. The experimental groups were administered four leading commercial feed additives, and methane emissions were measured before and after the administration of additives. The study also used deep metagenomic shotgun sequencing of rumen microbiome samples from 510 cows to develop a predictive model for additive efficacy.
  • results: The study found that using targeted feed additive strategies can significantly reduce methane emissions and optimize dairy yield and milk composition. The predictive model developed in the study demonstrates the potential for reducing overall emissions by over 27% by guiding the assignment of additives to farms where they are most effective.
    Abstract In an era of increasing pressure to achieve sustainable agriculture, the optimization of livestock feed for enhancing yield and minimizing environmental impact is a paramount objective. This study presents a pioneering approach towards this goal, using rumen microbiome data to predict the efficacy of feed additives in dairy cattle. We collected an extensive dataset that includes methane emissions from 2,190 Holstein cows distributed across 34 distinct sites. The cows were divided into control and experimental groups in a double-blind, unbiased manner, accounting for variables such as age, days in lactation, and average milk yield. The experimental groups were administered one of four leading commercial feed additives: Agolin, Kexxtone, Allimax, and Relyon. Methane emissions were measured individually both before the administration of additives and over a subsequent 12-week period. To develop our predictive model for additive efficacy, rumen microbiome samples were collected from 510 cows from the same herds prior to the study's onset. These samples underwent deep metagenomic shotgun sequencing, yielding an average of 15.7 million reads per sample. Utilizing innovative artificial intelligence techniques we successfully estimated the efficacy of these feed additives across different farms. The model's robustness was further confirmed through validation with independent cohorts, affirming its generalizability and reliability. Our results underscore the transformative capability of using targeted feed additive strategies to both optimize dairy yield and milk composition, and to significantly reduce methane emissions. Specifically, our predictive model demonstrates a scenario where its application could guide the assignment of additives to farms where they are most effective. In doing so, we could achieve an average potential reduction of over 27\% in overall emissions.

Fair Polylog-Approximate Low-Cost Hierarchical Clustering

  • paper_url: http://arxiv.org/abs/2311.12501
  • repo_url: None
  • paper_authors: Marina Knittel, Max Springer, John Dickerson, MohammadTaghi Hajiaghayi
  • for: This study proposes a fair, low-cost hierarchical clustering algorithm, motivated by the ethical controversies raised by modern intelligent systems.
  • methods: A hierarchical clustering algorithm optimizing Dasgupta's cost function is proposed, using a new polylogarithmic-approximate technique to keep the cost low.
  • results: The algorithm achieves low-cost fair hierarchical clustering with a polylogarithmic approximation, greatly bridging the gap between the best fair and vanilla hierarchical clustering approximations.
    Abstract Research in fair machine learning, and particularly clustering, has been crucial in recent years given the many ethical controversies that modern intelligent systems have posed. Ahmadian et al. [2020] established the study of fairness in \textit{hierarchical} clustering, a stronger, more structured variant of its well-known flat counterpart, though their proposed algorithm that optimizes for Dasgupta's [2016] famous cost function was highly theoretical. Knittel et al. [2023] then proposed the first practical fair approximation for cost, however they were unable to break the polynomial-approximate barrier they posed as a hurdle of interest. We break this barrier, proposing the first truly polylogarithmic-approximate low-cost fair hierarchical clustering, thus greatly bridging the gap between the best fair and vanilla hierarchical clustering approximations.

Multi-Objective Reinforcement Learning based on Decomposition: A taxonomy and framework

  • paper_url: http://arxiv.org/abs/2311.12495
  • repo_url: https://github.com/lucasalegre/morl-baselines
  • paper_authors: Florian Felten, El-Ghazali Talbi, Grégoire Danoy
  • for: The study examines how multi-objective reinforcement learning (MORL) helps agents make compromises among conflicting objectives.
  • methods: Building on multi-objective optimization based on decomposition (MOO/D), the study introduces a decomposition-based MORL methodology (MORL/D), a comprehensive taxonomy covering existing and potential MORL work, and a flexible framework derived from the taxonomy.
  • results: The study shows that MORL/D instantiations achieve comparable performance across multiple configurations while offering significantly greater versatility than current state-of-the-art approaches.
    Abstract Multi-objective reinforcement learning (MORL) extends traditional RL by seeking policies making different compromises among conflicting objectives. The recent surge of interest in MORL has led to diverse studies and solving methods, often drawing from existing knowledge in multi-objective optimization based on decomposition (MOO/D). Yet, a clear categorization based on both RL and MOO/D is lacking in the existing literature. Consequently, MORL researchers face difficulties when trying to classify contributions within a broader context due to the absence of a standardized taxonomy. To tackle such an issue, this paper introduces Multi-Objective Reinforcement Learning based on Decomposition (MORL/D), a novel methodology bridging RL and MOO literature. A comprehensive taxonomy for MORL/D is presented, providing a structured foundation for categorizing existing and potential MORL works. The introduced taxonomy is then used to scrutinize MORL research, enhancing clarity and conciseness through well-defined categorization. Moreover, a flexible framework derived from the taxonomy is introduced. This framework accommodates diverse instantiations using tools from both RL and MOO/D. Implementation across various configurations demonstrates its versatility, assessed against benchmark problems. Results indicate MORL/D instantiations achieve comparable performance with significantly greater versatility than current state-of-the-art approaches. By presenting the taxonomy and framework, this paper offers a comprehensive perspective and a unified vocabulary for MORL. This not only facilitates the identification of algorithmic contributions but also lays the groundwork for novel research avenues in MORL, contributing to the continued advancement of this field.
    摘要 多目标强化学习(MORL)是传统强化学习的扩展,旨在寻求在相互冲突的目标之间作出不同权衡的策略。近来对 MORL 的兴趣激增,催生了多种研究和求解方法,这些方法往往借鉴基于分解的多目标优化(MOO/D)中的已有知识。然而,现有文献缺乏同时基于 RL 和 MOO/D 的清晰分类体系,使得 MORL 研究人员在更广阔的背景下归类相关贡献时面临困难。为解决这一问题,本文提出了基于分解的多目标强化学习(MORL/D),一种连接 RL 与 MOO 文献的新方法论。本文给出了 MORL/D 的完整分类体系,为归类现有和潜在的 MORL 工作提供了结构化基础,并利用该分类体系审视 MORL 研究,通过明确的分类提升清晰性与简洁性。此外,本文还从分类体系中导出一个灵活的框架,可利用 RL 和 MOO/D 的工具进行多样的实例化。在多种配置下的实现及在基准问题上的评估表明,MORL/D 实例可以达到与当前最先进方法相当的性能,同时具备显著更强的通用性。通过提出该分类体系与框架,本文为 MORL 提供了全面的视角和统一的术语,不仅便于识别算法贡献,也为 MORL 的新研究方向奠定了基础,推动该领域的持续发展。
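As a rough illustration of the decomposition idea (not the paper's MORL/D framework or taxonomy), the sketch below scalarizes a two-objective reward with a set of weight vectors and trains one tabular Q-learning policy per weight. The `env` interface (`n_states`, `n_actions`, `reset`, `step`, `sample_action`) is a hypothetical placeholder.

```python
# Minimal sketch of decomposition in MORL: each weight vector defines one scalar
# sub-problem via weighted-sum scalarization of a 2-objective reward.
import numpy as np

def scalarize(reward_vec, weight):
    # weighted-sum scalarization; decomposition also accommodates e.g. Chebyshev scalarization
    return float(np.dot(weight, reward_vec))

weights = [np.array([w, 1.0 - w]) for w in np.linspace(0.0, 1.0, 5)]  # one sub-problem each

def train_subproblem(env, weight, episodes=100, alpha=0.1, gamma=0.99, eps=0.1):
    Q = np.zeros((env.n_states, env.n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = env.sample_action() if np.random.rand() < eps else int(Q[s].argmax())
            s2, r_vec, done = env.step(a)          # r_vec is a 2-dimensional reward
            target = scalarize(r_vec, weight) + gamma * Q[s2].max() * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q

# policies = [train_subproblem(env, w) for w in weights]
```

Sweeping the weights yields a family of policies that approximates a Pareto front, which is the basic decomposition step the MORL/D taxonomy organizes and generalizes.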

Heuristics for Detecting CoinJoin Transactions on the Bitcoin Blockchain

  • paper_url: http://arxiv.org/abs/2311.12491
  • repo_url: None
  • paper_authors: Hugo Schnoering, Michalis Vazirgiannis
  • for: 这份研究探讨比特币的各种方面,包括分布式Peer-to-Peer网络和其相关的区块链,以及用户隐私权的挑战。
  • methods: 这份研究使用了对比特币区块链的分析,以及对CoinJoin协议的实现和分析。
  • results: 研究对比特币区块链上CoinJoin交易的识别提出了新的HEURISTICS,并对这些HEURISTICS的效果进行了全面的分析。
    Abstract This research delves into the intricacies of Bitcoin, a decentralized peer-to-peer network, and its associated blockchain, which records all transactions since its inception. While this ensures integrity and transparency, the transparent nature of Bitcoin potentially compromises users' privacy rights. To address this concern, users have adopted CoinJoin, a method that amalgamates multiple transaction intents into a single, larger transaction to bolster transactional privacy. This process complicates individual transaction tracing and disrupts many established blockchain analysis heuristics. Despite its significance, limited research has been conducted on identifying CoinJoin transactions. Particularly noteworthy are varied CoinJoin implementations such as JoinMarket, Wasabi, and Whirlpool, each presenting distinct challenges due to their unique transaction structures. This study delves deeply into the open-source implementations of these protocols, aiming to develop refined heuristics for identifying their transactions on the blockchain. Our exhaustive analysis covers transactions up to block 760,000, offering a comprehensive insight into CoinJoin transactions and their implications for Bitcoin blockchain analysis.
    摘要 本研究深入探讨了比特币这一去中心化点对点网络及其相关的区块链,后者记录了自诞生以来的全部交易。这虽然保证了完整性与透明性,但比特币的公开透明特性也可能损害用户的隐私权。为此,用户采用了 CoinJoin:一种将多个交易意图合并为单笔更大交易的方法,以增强交易隐私。该过程使单笔交易的追踪变得复杂,并破坏了许多既有的区块链分析启发式方法。尽管意义重大,针对识别 CoinJoin 交易的研究仍然有限。值得注意的是,JoinMarket、Wasabi 和 Whirlpool 等不同的 CoinJoin 实现由于各自独特的交易结构带来了不同的挑战。本研究深入分析了这些协议的开源实现,旨在为在区块链上识别其交易开发更精细的启发式方法。我们的详尽分析覆盖了截至区块 760,000 的交易,为 CoinJoin 交易及其对比特币区块链分析的影响提供了全面的洞见。
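As a toy illustration of the kind of structural fingerprint such heuristics exploit (far simpler than the protocol-specific heuristics developed in the paper), the sketch below flags transactions with several equal-valued outputs funded by at least as many inputs; the transaction dictionary format is hypothetical.

```python
# Toy illustration only -- a naive equal-output heuristic, much coarser than the
# JoinMarket / Wasabi / Whirlpool specific heuristics studied in the paper.
from collections import Counter

def looks_like_coinjoin(tx, min_equal_outputs=3):
    """tx: dict with 'inputs' and 'outputs' lists of amounts in satoshis (hypothetical format)."""
    output_counts = Counter(tx["outputs"])
    most_common_value, count = output_counts.most_common(1)[0]
    # several outputs of identical value, funded by at least that many inputs,
    # is the basic fingerprint of an equal-denomination CoinJoin round
    return count >= min_equal_outputs and len(tx["inputs"]) >= count

tx = {"inputs": [120_000, 95_000, 110_000, 400_000],
      "outputs": [100_000, 100_000, 100_000, 15_000, 390_000]}
print(looks_like_coinjoin(tx))  # True
```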

Harnessing FPGA Technology for Enhanced Biomedical Computation

  • paper_url: http://arxiv.org/abs/2311.12439
  • repo_url: None
  • paper_authors: Nisanur Alici, Kayode Inadagbo, Murat Isik
  • for: 这个研究探讨了复杂的神经网络框架,例如卷积神经网络(CNN)、循环神经网络(RNN)、长短期记忆网络(LSTM)和深度信念网络(DBN),以便通过现场可编程门阵列(FPGA)更好地分析心电图(ECG)信号数据。
  • methods: 这个研究使用了麻省理工学院生物医学信息学数据库(MIT-BIH)的ECG资料进行训练和评估,并添加了 Gaussian 噪声以提高算法的适应能力。研究人员还使用了不同层次的对应和分类功能,例如 EarlyStopping 回调和 Dropout 层来避免过拟合。此外,这篇文章还详细介绍了在 PYNQ Z1 平台上开发的特殊的 Tensor Compute Unit (TCU) 加速器。
  • results: 这个研究显示了 FPGA 在生物医学计算中的高效性,包括适用于不同领域的 FPGA 机器学习的全面方法。研究人员还评估了具有延迟和通过率等效能指标的模型。
    Abstract This research delves into sophisticated neural network frameworks like Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTMs), and Deep Belief Networks (DBNs) for improved analysis of ECG signals via Field Programmable Gate Arrays (FPGAs). The MIT-BIH Arrhythmia Database serves as the foundation for training and evaluating our models, with added Gaussian noise to heighten the algorithms' resilience. The developed architectures incorporate various layers for specific processing and categorization functions, employing strategies such as the EarlyStopping callback and Dropout layer to prevent overfitting. Additionally, this paper details the creation of a tailored Tensor Compute Unit (TCU) accelerator for the PYNQ Z1 platform. It provides a thorough methodology for implementing FPGA-based machine learning, encompassing the configuration of the Tensil toolchain in Docker, selection of architectures, PS-PL configuration, and the compilation and deployment of models. By evaluating performance indicators like latency and throughput, we showcase the efficacy of FPGAs in advanced biomedical computing. This study ultimately serves as a comprehensive guide to optimizing neural network operations on FPGAs across various fields.
    摘要 本研究深入探讨了卷积神经网络(CNN)、循环神经网络(RNN)、长短期记忆网络(LSTM)和深度信念网络(DBN)等先进神经网络框架,以便通过现场可编程门阵列(FPGA)改进心电图(ECG)信号的分析。MIT-BIH 心律失常数据库作为模型训练与评估的基础,并加入高斯噪声以增强算法的鲁棒性。所开发的架构包含用于特定处理与分类功能的多个层,并采用 EarlyStopping 回调和 Dropout 层等策略防止过拟合。此外,本文详细介绍了为 PYNQ Z1 平台定制的 Tensor Compute Unit(TCU)加速器,并给出了实现基于 FPGA 的机器学习的完整方法,包括在 Docker 中配置 Tensil 工具链、架构选择、PS-PL 配置以及模型的编译与部署。通过评估延迟和吞吐量等性能指标,我们展示了 FPGA 在先进生物医学计算中的有效性。本研究最终可作为在各领域中优化 FPGA 上神经网络运算的全面指南。
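A minimal sketch of the software-side training setup described above, with assumed shapes and hyperparameters (187-sample beats, five classes) rather than the paper's exact architectures: Gaussian-noise augmentation, Dropout, and the EarlyStopping callback in Keras.

```python
# Sketch of the described training recipe (hypothetical shapes/hyperparameters):
# a 1-D CNN-LSTM beat classifier for MIT-BIH with Gaussian noise augmentation,
# Dropout regularization, and EarlyStopping to curb overfitting.
import numpy as np
import tensorflow as tf

def add_gaussian_noise(x, sigma=0.05):
    return x + np.random.normal(0.0, sigma, size=x.shape)

def build_model(seq_len=187, n_classes=5):  # 187-sample beats, 5 classes (assumed)
    return tf.keras.Sequential([
        tf.keras.layers.Conv1D(32, 5, activation="relu", input_shape=(seq_len, 1)),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dropout(0.5),                 # regularization against overfitting
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

model = build_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)
# model.fit(add_gaussian_noise(x_train), y_train, validation_split=0.2,
#           epochs=100, callbacks=[early_stop])
```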

Deep State-Space Model for Predicting Cryptocurrency Price

  • paper_url: http://arxiv.org/abs/2311.14731
  • repo_url: None
  • paper_authors: Shalini Sharma, Angshul Majumdar, Emilie Chouzenoux, Victor Elvira
  • for: 预测下一天的 криптовалю价格
  • methods: 提议使用深度 neural network 模型
  • results: 实验结果显示,提议的方法可以准确预测 криптовалю价格,并且比STATE-OF-THE-ART和传统动力学模型更好。
    Abstract Our work presents two fundamental contributions. On the application side, we tackle the challenging problem of predicting day-ahead crypto-currency prices. On the methodological side, a new dynamical modeling approach is proposed. Our approach keeps the probabilistic formulation of the state-space model, which provides uncertainty quantification on the estimates, and the function approximation ability of deep neural networks. We call the proposed approach the deep state-space model. The experiments are carried out on established cryptocurrencies (obtained from Yahoo Finance). The goal of the work has been to predict the price for the next day. Benchmarking has been done with both state-of-the-art and classical dynamical modeling techniques. Results show that the proposed approach yields the best overall results in terms of accuracy.
    摘要 我们的工作有两个基本贡献。在应用方面,我们解决了预测次日加密货币价格这一具有挑战性的问题。在方法学方面,我们提出了一种新的动态建模方法。我们的方法保留了状态空间模型的概率表述,从而为估计结果提供不确定性量化,同时具有深度神经网络的函数逼近能力。我们称之为深度状态空间模型。我们在已知加密货币(从 Yahoo Finance 获取)上进行了实验,目标是预测下一天的价格,并与最先进的方法和经典动力学建模技术进行了比较。结果显示,我们的方法在准确性方面取得了最佳的总体表现。

Classifier Calibration with ROC-Regularized Isotonic Regression

  • paper_url: http://arxiv.org/abs/2311.12436
  • repo_url: None
  • paper_authors: Eugene Berta, Francis Bach, Michael Jordan
  • for: 本研究旨在提高机器学习分类器的可靠性和可解释性,使其预测结果更加可靠和有意义。
  • methods: 本研究使用了iso随变 regression(IR)技术来calibrate binary classifier,通过在calibration set上减少cross entropy来实现。IR acted as an adaptive binning procedure, which allows achieving a calibration error of zero while controlling for overfitting of the calibration set.
  • results: 本研究证明了IR preserve the convex hull of the ROC curve, guaranteeing that a classifier is calibrated while controlling for overfitting of the calibration set. In addition, a novel generalization of IR to accommodate classifiers with K classes was presented, which constructs a multidimensional adaptive binning scheme on the probability simplex and achieves a multi-class calibration error equal to zero. The algorithm was regularized by imposing a form of monotony that preserves the K-dimensional ROC surface of the classifier, and empirical results showed that this general monotony criterion is effective in striking a balance between reducing cross entropy loss and avoiding overfitting of the calibration set.
    Abstract Calibration of machine learning classifiers is necessary to obtain reliable and interpretable predictions, bridging the gap between model confidence and actual probabilities. One prominent technique, isotonic regression (IR), aims at calibrating binary classifiers by minimizing the cross entropy on a calibration set via monotone transformations. IR acts as an adaptive binning procedure, which allows achieving a calibration error of zero, but leaves open the issue of the effect on performance. In this paper, we first prove that IR preserves the convex hull of the ROC curve -- an essential performance metric for binary classifiers. This ensures that a classifier is calibrated while controlling for overfitting of the calibration set. We then present a novel generalization of isotonic regression to accommodate classifiers with K classes. Our method constructs a multidimensional adaptive binning scheme on the probability simplex, again achieving a multi-class calibration error equal to zero. We regularize this algorithm by imposing a form of monotony that preserves the K-dimensional ROC surface of the classifier. We show empirically that this general monotony criterion is effective in striking a balance between reducing cross entropy loss and avoiding overfitting of the calibration set.
    摘要 机器学习分类器的校准对于获得可靠且可解释的预测是必要的,它弥合了模型置信度与真实概率之间的差距。一种重要的技术是保序回归(isotonic regression, IR),它通过单调变换在校准集上最小化交叉熵来校准二元分类器。IR 相当于一种自适应分箱过程,能够将校准误差降为零,但其对性能的影响尚不明确。本文首先证明 IR 保持了 ROC 曲线的凸包——二元分类器的一项关键性能指标,从而保证分类器在得到校准的同时控制对校准集的过拟合。随后,我们提出了保序回归的一种新推广,以适应 K 类分类器:该方法在概率单纯形上构建多维自适应分箱方案,同样实现了为零的多类校准误差。我们通过施加一种保持分类器 K 维 ROC 曲面的单调性约束来正则化该算法,并通过实验表明,这一通用单调性准则能够在降低交叉熵损失与避免过拟合校准集之间取得有效平衡。
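A minimal calibration sketch using standard isotonic regression from scikit-learn (the paper's contribution is the ROC-preserving analysis and the multi-class generalization, which this does not implement): fit a monotone map from raw classifier scores to calibrated probabilities on a held-out calibration set.

```python
# Binary-classifier calibration with plain isotonic regression on a calibration split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores_cal = clf.predict_proba(X_cal)[:, 1]

iso = IsotonicRegression(out_of_bounds="clip")   # monotone, adaptive-binning behaviour
iso.fit(scores_cal, y_cal)                       # fit the monotone map on the calibration set

calibrated = iso.predict(clf.predict_proba(X_cal)[:, 1])
print(calibrated[:5])
```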

Looped Transformers are Better at Learning Learning Algorithms

  • paper_url: http://arxiv.org/abs/2311.12424
  • repo_url: None
  • paper_authors: Liu Yang, Kangwook Lee, Robert Nowak, Dimitris Papailiopoulos
  • for: 以上下文内求解的方式处理来自各种(潜在)模型的数据拟合问题,如 Garg 等人所报道。
  • methods: 利用循环式变换器架构和相关训练方法,具有迭代特性,以便将迭代算法 integrate 到变换器架构中。
  • results: 实验结果表明,循环式变换器可以与标准变换器相比,在解决各种数据适应问题中表现出色,而且参数计数少于10%。
    Abstract Transformers have demonstrated effectiveness in \emph{in-context solving} data-fitting problems from various (latent) models, as reported by Garg et al. However, the absence of an inherent iterative structure in the transformer architecture presents a challenge in emulating the iterative algorithms, which are commonly employed in traditional machine learning methods. To address this, we propose the utilization of \emph{looped} transformer architecture and its associated training methodology, with the aim of incorporating iterative characteristics into the transformer architectures. Experimental results suggest that the looped transformer achieves performance comparable to the standard transformer in solving various data-fitting problems, while utilizing less than 10\% of the parameter count.
    摘要 变换器(Transformers)已经在来自各种(latent)模型的数据拟合问题的上下文内求解中表现出效果,如 Garg 等人所报道。然而,变换器架构缺乏内在的迭代结构,这使其难以模拟传统机器学习方法中常用的迭代算法。为此,我们提议利用循环变换器架构及其相关训练方法,以将迭代特性引入变换器架构。实验结果表明,循环变换器在解决各类数据拟合问题时可以达到与标准变换器相当的性能,而参数量不到其 10%。
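A minimal PyTorch sketch of the looped idea, assuming illustrative shapes and hyperparameters (not those of the paper): one weight-shared encoder block applied for a fixed number of iterations in place of a deep stack.

```python
# Looped-transformer sketch: the same block is reused for T iterations,
# mimicking one step of an iterative algorithm per loop while sharing parameters.
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_loops=12):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                                batch_first=True)
        self.n_loops = n_loops                    # iterations reuse the same parameters

    def forward(self, x):
        for _ in range(self.n_loops):
            x = self.block(x)
        return x

tokens = torch.randn(8, 16, 64)                   # (batch, in-context examples, embedding)
print(LoopedTransformer()(tokens).shape)          # torch.Size([8, 16, 64])
```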

Federated Learning via Consensus Mechanism on Heterogeneous Data: A New Perspective on Convergence

  • paper_url: http://arxiv.org/abs/2311.12358
  • repo_url: https://github.com/fedcome/fedcome
  • paper_authors: Shu Zheng, Tiandi Ye, Xiang Li, Ming Gao
  • for: 本研究旨在提高异步学习(Federated Learning,FL)在异步数据(heterogeneous data)上的效果,特别是保证每个客户端的风险减少。
  • methods: 我们提出了一种名为FedCOME的新方法,它在服务器端对客户端的梯度进行微小调整,使其与其他客户端的梯度之间形成锐角,从而保证每个客户端的风险减少。此外,我们还提出了一种新的客户端采样策略,可以选择最能代表全局数据分布的客户端参与训练。
  • results: 我们在四个 benchmark 数据集上进行了广泛的实验,并证明了 FedCOME 方法在效果、效率和公平性方面与其他现有方法相比较出色。具体来说,FedCOME 方法可以在不同的客户端上减少风险,并且可以在不同的数据分布下进行高效的训练。我们还提供了 reproduce 的源代码,可以在 \url{https://github.com/fedcome/fedcome} 上下载。
    Abstract Federated learning (FL) on heterogeneous data (non-IID data) has recently received great attention. Most existing methods focus on studying the convergence guarantees for the global objective. While these methods can guarantee the decrease of the global objective in each communication round, they fail to ensure risk decrease for each client. In this paper, to address the problem,we propose FedCOME, which introduces a consensus mechanism to enforce decreased risk for each client after each training round. In particular, we allow a slight adjustment to a client's gradient on the server side, which generates an acute angle between the corrected gradient and the original ones of other clients. We theoretically show that the consensus mechanism can guarantee the convergence of the global objective. To generalize the consensus mechanism to the partial participation FL scenario, we devise a novel client sampling strategy to select the most representative clients for the global data distribution. Training on these selected clients with the consensus mechanism could empirically lead to risk decrease for clients that are not selected. Finally, we conduct extensive experiments on four benchmark datasets to show the superiority of FedCOME against other state-of-the-art methods in terms of effectiveness, efficiency and fairness. For reproducibility, we make our source code publicly available at: \url{https://github.com/fedcome/fedcome}.
    摘要 针对异质数据(non-IID 数据)的联邦学习(FL)近来受到了广泛关注。现有方法大多关注全局目标的收敛保证:虽然这些方法能保证全局目标在每轮通信中下降,却无法保证每个客户端的风险都下降。为解决这一问题,本文提出了 FedCOME,它引入一种共识机制,在每轮训练后强制每个客户端的风险下降。具体而言,我们允许在服务器端对客户端的梯度进行微小调整,使修正后的梯度与其他客户端的原始梯度之间形成锐角。我们从理论上证明该共识机制能够保证全局目标的收敛。为将共识机制推广到部分参与的 FL 场景,我们设计了一种新的客户端采样策略,以选取最能代表全局数据分布的客户端;在这些被选中的客户端上结合共识机制进行训练,在实验中也能使未被选中的客户端风险下降。最后,我们在四个基准数据集上进行了大量实验,表明 FedCOME 在有效性、效率和公平性方面优于其他最先进方法。为便于复现,源代码公开于:\url{https://github.com/fedcome/fedcome}。
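A hedged sketch of the acute-angle intuition (not the paper's exact server-side correction or sampling strategy): if a client's flattened gradient conflicts with another client's (negative inner product), remove the conflicting component before averaging.

```python
# Illustration of enforcing acute angles between client gradients on the server
# before aggregation; the actual FedCOME update rule may differ.
import numpy as np

def enforce_acute_angles(grads):
    """grads: list of 1-D numpy arrays, one flattened gradient per client."""
    corrected = [g.copy() for g in grads]
    for i, g_i in enumerate(corrected):
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            dot = g_i @ g_j
            if dot < 0:                                  # obtuse angle -> conflicting update
                g_i -= dot / (g_j @ g_j + 1e-12) * g_j   # remove the conflicting component
    return corrected

client_grads = [np.array([1.0, 0.5]), np.array([-0.8, 1.0]), np.array([0.9, -0.2])]
server_update = np.mean(enforce_acute_angles(client_grads), axis=0)
print(server_update)
```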

Random Linear Projections Loss for Hyperplane-Based Optimization in Regression Neural Networks

  • paper_url: http://arxiv.org/abs/2311.12356
  • repo_url: https://github.com/ahmedaloui1997/randomlinearprojections
  • paper_authors: Shyam Venkatasubramanian, Ahmed Aloui, Vahid Tarokh
    for:这个研究旨在提出一种名为随机直线投影(RLP)损失函数,可以有效遏制过滤复杂数据集的神经网络。methods:这个研究使用了RLP损失函数,将神经网络中的特征预测对组和特征标签对组之间的距离最小化。results:实验结果显示,使用RLP损失函数训练神经网络可以提高性能,需要 fewer data samples,并且对于添加性噪声更加抗性。我们也提供了理论分析支持我们的实验结果。
    Abstract Despite their popularity across a wide range of domains, regression neural networks are prone to overfitting complex datasets. In this work, we propose a loss function termed Random Linear Projections (RLP) loss, which is empirically shown to mitigate overfitting. With RLP loss, the distance between sets of hyperplanes connecting fixed-size subsets of the neural network's feature-prediction pairs and feature-label pairs is minimized. The intuition behind this loss derives from the notion that if two functions share the same hyperplanes connecting all subsets of feature-label pairs, then these functions must necessarily be equivalent. Our empirical studies, conducted across benchmark datasets and representative synthetic examples, demonstrate the improvements of the proposed RLP loss over mean squared error (MSE). Specifically, neural networks trained with the RLP loss achieve better performance while requiring fewer data samples and are more robust to additive noise. We provide theoretical analysis supporting our empirical findings.
    摘要 尽管回归神经网络在各个领域广泛应用,但它们往往容易在复杂数据集上过拟合。在这项工作中,我们提出了一种名为随机线性投影(Random Linear Projections, RLP)损失函数,实验表明它能有效缓解过拟合。RLP 损失的目标是最小化连接固定大小的神经网络特征-预测对子集的超平面集合与连接特征-标签对子集的超平面集合之间的距离。其直观理解是:如果两个函数在所有特征-标签对子集上共享相同的连接超平面,那么这两个函数必然相等。我们在基准数据集和有代表性的合成示例上进行的实验研究表明,RLP 损失优于均方误差(MSE)。具体而言,使用 RLP 损失训练的神经网络在需要更少数据样本的情况下取得更好的性能,并且对加性噪声更具鲁棒性。我们还提供了支持实验结论的理论分析。
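One possible reading of the RLP loss, heavily hedged since the repository's exact implementation may differ: for random fixed-size subsets of a batch, fit the least-squares hyperplane through (features, predictions) and through (features, labels), and penalize the distance between the two coefficient vectors.

```python
# Hedged interpretation of an RLP-style loss; not guaranteed to match the paper's code.
import torch

def rlp_loss(features, preds, labels, n_subsets=8, subset_size=16):
    n = features.shape[0]
    X = torch.cat([features, torch.ones(n, 1)], dim=1)      # affine hyperplanes
    losses = []
    for _ in range(n_subsets):
        idx = torch.randperm(n)[:subset_size]
        Xs_pinv = torch.linalg.pinv(X[idx])
        beta_pred = Xs_pinv @ preds[idx]                     # hyperplane through (x, f(x))
        beta_label = Xs_pinv @ labels[idx]                   # hyperplane through (x, y)
        losses.append(((beta_pred - beta_label) ** 2).sum())
    return torch.stack(losses).mean()

x = torch.randn(64, 5)
y = x.sum(dim=1, keepdim=True) + 0.1 * torch.randn(64, 1)
print(rlp_loss(x, 0.9 * y, y))
```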

Acceleration and Implicit Regularization in Gaussian Phase Retrieval

  • paper_url: http://arxiv.org/abs/2311.12888
  • repo_url: None
  • paper_authors: Tyler Maunu, Martin Molina-Fructuoso
  • for: 在高斯相位恢复(Gaussian phase retrieval)问题中研究加速优化方法。
  • methods: 使用带 Polyak 或 Nesterov 动量的梯度方法。
  • results: 实验证明,加速方法比梯度下降更快 converge。
    Abstract We study accelerated optimization methods in the Gaussian phase retrieval problem. In this setting, we prove that gradient methods with Polyak or Nesterov momentum have similar implicit regularization to gradient descent. This implicit regularization ensures that the algorithms remain in a nice region, where the cost function is strongly convex and smooth despite being nonconvex in general. This ensures that these accelerated methods achieve faster rates of convergence than gradient descent. Experimental evidence demonstrates that the accelerated methods converge faster than gradient descent in practice.
    摘要 我们研究高斯相位恢复问题中的加速优化方法。在这一设定下,我们证明了带 Polyak 或 Nesterov 动量的梯度方法具有与梯度下降类似的隐式正则化。这种隐式正则化确保算法始终保持在一个良好的区域内:尽管代价函数总体上是非凸的,但在该区域内它是强凸且光滑的。因此,这些加速方法能取得比梯度下降更快的收敛速度。实验证据表明,加速方法在实践中也比梯度下降收敛更快。
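An illustrative real-valued numpy sketch of gradient descent with Polyak (heavy-ball) momentum on the Gaussian phase retrieval objective f(x) = (1/4m) Σ((aᵢᵀx)² − yᵢ)², with a standard spectral initialization; step size, momentum, and problem sizes are arbitrary choices, not the paper's.

```python
# Heavy-ball gradient descent for real-valued Gaussian phase retrieval (toy scale).
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 400
A = rng.normal(size=(m, n))                 # Gaussian measurement vectors a_i as rows
x_true = rng.normal(size=n)
x_true /= np.linalg.norm(x_true)
y = (A @ x_true) ** 2                       # phaseless (squared) measurements

# spectral initialization: top eigenvector of (1/m) * sum_i y_i a_i a_i^T
Y = (A.T * y) @ A / m
x = np.linalg.eigh(Y)[1][:, -1] * np.sqrt(y.mean())

def grad(x):
    r = (A @ x) ** 2 - y
    return A.T @ (r * (A @ x)) / m          # gradient of (1/4m) * sum(((a^T x)^2 - y)^2)

v = np.zeros(n)
step, momentum = 0.05, 0.8
for _ in range(1000):
    v = momentum * v - step * grad(x)       # Polyak (heavy-ball) momentum update
    x = x + v

err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))
print(f"reconstruction error up to global sign: {err:.2e}")
```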

Graph Neural Ordinary Differential Equations-based method for Collaborative Filtering

  • paper_url: http://arxiv.org/abs/2311.12329
  • repo_url: None
  • paper_authors: Ke Xu, Yuanjie Zhu, Weizhi Zhang, Philip S. Yu
  • for: collaborative filtering
  • methods: Graph Neural Ordinary Differential Equation-based method (GODE-CF)
  • results: outperforms competitive baselines, including GCN-based models and other state-of-the-art CF methods, with advantages in simplicity, efficiency, and training time.
  • for: 协同推荐
  • methods: 图 neural ordinary differential equation 基本方法 (GODE-CF)
  • results: 比竞争基eline模型和其他状态级 CF 方法高效,具有简单、高效和快速训练时间的优势。
    Abstract Graph Convolution Networks (GCNs) are widely considered state-of-the-art for collaborative filtering. Although several GCN-based methods have been proposed and achieved state-of-the-art performance in various tasks, they can be computationally expensive and time-consuming to train if too many layers are created. However, since the linear GCN model can be interpreted as a differential equation, it is possible to transfer it to an ODE problem. This inspired us to address the computational limitations of GCN-based models by designing a simple and efficient NODE-based model that can skip some GCN layers to reach the final state, thus avoiding the need to create many layers. In this work, we propose a Graph Neural Ordinary Differential Equation-based method for Collaborative Filtering (GODE-CF). This method estimates the final embedding by utilizing the information captured by one or two GCN layers. To validate our approach, we conducted experiments on multiple datasets. The results demonstrate that our model outperforms competitive baselines, including GCN-based models and other state-of-the-art CF methods. Notably, our proposed GODE-CF model has several advantages over traditional GCN-based models. It is simple, efficient, and has a fast training time, making it a practical choice for real-world situations.
    摘要 图卷积网络(GCN)被广泛认为是协同过滤的最先进方法。尽管已有多种基于 GCN 的方法在各类任务中取得了最先进的性能,但当层数过多时,它们的训练可能代价高昂且耗时。由于线性 GCN 模型可以被解释为一个微分方程,因此可以将其转化为常微分方程(ODE)问题。受此启发,我们设计了一种简单高效的基于神经 ODE 的模型来应对 GCN 模型的计算限制:它可以跳过部分 GCN 层直接到达最终状态,从而避免构建过多的层。在这项工作中,我们提出了一种基于图神经常微分方程的协同过滤方法(GODE-CF),该方法利用一到两层 GCN 所捕获的信息来估计最终的嵌入。为验证我们的方法,我们在多个数据集上进行了实验,结果表明我们的模型优于包括基于 GCN 的模型和其他最先进协同过滤方法在内的竞争基线。值得注意的是,所提出的 GODE-CF 模型相比传统 GCN 模型具有多项优势:它简单、高效、训练时间短,是实际应用中的实用选择。
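A hedged sketch of the neural-ODE view of graph propagation (illustrative only, not the GODE-CF implementation): embeddings evolve as dE/dt = Â·E over the normalized user-item adjacency, integrated here with a few explicit Euler steps instead of stacking GCN layers.

```python
# Toy continuous-time view of GCN-style propagation for collaborative filtering.
import numpy as np

def normalized_adjacency(R):
    """R: binary user-item interaction matrix (n_users x n_items)."""
    n_u, n_i = R.shape
    A = np.zeros((n_u + n_i, n_u + n_i))
    A[:n_u, n_u:] = R
    A[n_u:, :n_u] = R.T
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def integrate_embeddings(E0, A_hat, t=1.0, n_steps=4):
    E, dt = E0.copy(), t / n_steps
    for _ in range(n_steps):
        E = E + dt * (A_hat @ E)       # Euler step of dE/dt = A_hat @ E
    return E

R = np.array([[1, 0, 1], [0, 1, 1]])              # 2 users, 3 items (toy data)
E0 = np.random.default_rng(0).normal(size=(5, 8)) # initial user+item embeddings
E_final = integrate_embeddings(E0, normalized_adjacency(R))
scores = E_final[:2] @ E_final[2:].T              # user-item preference scores
print(scores.shape)                               # (2, 3)
```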

Power grid operational risk assessment using graph neural network surrogates

  • paper_url: http://arxiv.org/abs/2311.12309
  • repo_url: None
  • paper_authors: Yadong Zhang, Pranav M Karve, Sankaran Mahadevan
  • for: 这篇论文旨在研究图 neural network (GNN) 是否可以作为执行决策算法 (optimal power flow (OPF) 和 security-constrained unit commitment (SCUC)) 的代理,以便进行准确的风险评估。
  • methods: 研究使用了多个 Monte Carlo (MC) 样本,从推测的空间时间相关的随机网络变量中采样出来。然后使用传统的 OPF 和 SCUC 解决方案来生成数据用于训练 GNN 模型。
  • results: GNN 模型可以快速且准确地预测关注量(QoI),包括系统级和各区域级的火电出力与切负荷。此外,基于 GNN 预测的可靠性与风险评估也具有很高的准确性。
    Abstract We investigate the utility of graph neural networks (GNNs) as proxies of power grid operational decision-making algorithms (optimal power flow (OPF) and security-constrained unit commitment (SCUC)) to enable rigorous quantification of the operational risk. To conduct principled risk analysis, numerous Monte Carlo (MC) samples are drawn from the (foretasted) probability distributions of spatio-temporally correlated stochastic grid variables. The corresponding OPF and SCUC solutions, which are needed to quantify the risk, are generated using traditional OPF and SCUC solvers to generate data for training GNN model(s). The GNN model performance is evaluated in terms of the accuracy of predicting quantities of interests (QoIs) derived from the decision variables in OPF and SCUC. Specifically, we focus on thermal power generation and load shedding at system and individual zone level. We also perform reliability and risk quantification based on GNN predictions and compare with that obtained from OPF/SCUC solutions. Our results demonstrate that GNNs are capable of providing fast and accurate prediction of QoIs and thus can be good surrogate models for OPF and SCUC. The excellent accuracy of GNN-based reliability and risk assessment further suggests that GNN surrogate has the potential to be applied in real-time and hours-ahead risk quantification.
    摘要 我们研究Graph Neural Networks(GNN)作为电力网络运行决策算法(优化电力流(OPF)和安全限制单位配置(SCUC))的代理,以便正确评估运行风险。为了进行原则性的风险分析,我们从 correlate的空间时间协调随机变量中采样了许多Monte Carlo(MC)样本。对应的OPF和SCUC解决方案,需要用传统的OPF和SCUC解决方案生成数据来训练GNN模型。我们评估GNN模型性能是通过对决变量中的量问题(QoI)的预测精度来进行。我们专注于系统和个人区域水平的热电力生产和减少负荷。我们还对GNN预测结果进行可靠性和风险评估,并与OPF/SCUC解决方案中的风险评估进行比较。我们的结果表明GNN可以提供快速和准确的QoI预测,因此可以作为OPF和SCUC的代理模型。GNN模型的可靠性和风险评估的出色性表明它可以在实时和多小时前的风险评估中应用。

Mapping “Brain Coral” Regions on Mars using Deep Learning

  • paper_url: http://arxiv.org/abs/2311.12292
  • repo_url: https://github.com/pearsonkyle/mars-brain-coral-network
  • paper_authors: Kyle A. Pearson, Eldar Noe, Daniel Zhao, Alphan Altinok, Alex Morgan
  • for: 搜寻水星上可能存在过或现在存在生命的证据。
  • methods: 使用 convolutional neural networks 检测水星表面的 “Brain Coral” 地形,并利用 JPEG 压缩和抽象域的方法提高处理速度。
  • results: 在大约28TB的图像数据集中,找到了超过200个图像中的检测结果,并且实现了 ~93% 的准确率和 ~95% 的处理时间减少。
    Abstract One of the main objectives of the Mars Exploration Program is to search for evidence of past or current life on the planet. To achieve this, Mars exploration has been focusing on regions that may have liquid or frozen water. A set of critical areas may have seen cycles of ice thawing in the relatively recent past in response to periodic changes in the obliquity of Mars. In this work, we use convolutional neural networks to detect surface regions containing "Brain Coral" terrain, a landform on Mars whose similarity in morphology and scale to sorted stone circles on Earth suggests that it may have formed as a consequence of freeze/thaw cycles. We use large images (~100-1000 megapixels) from the Mars Reconnaissance Orbiter to search for these landforms at resolutions close to a few tens of centimeters per pixel (~25--50 cm). Over 52,000 images (~28 TB) were searched (~5% of the Martian surface) where we found detections in over 200 images. To expedite the processing we leverage a classifier network (prior to segmentation) in the Fourier domain that can take advantage of JPEG compression by leveraging blocks of coefficients from a discrete cosine transform in lieu of decoding the entire image at the full spatial resolution. The hybrid pipeline approach maintains ~93% accuracy while cutting down on ~95% of the total processing time compared to running the segmentation network at the full resolution on every image. The timely processing of big data sets helps inform mission operations, geologic surveys to prioritize candidate landing sites, avoid hazardous areas, or map the spatial extent of certain terrain. The segmentation masks and source code are available on Github for the community to explore and build upon.
    摘要 火星探测计划的主要目标之一是搜寻火星上过去或现在存在生命的证据。为实现这一目标,火星探测一直聚焦于可能存在液态水或冻结水的区域。其中一些关键区域可能在相对较近的过去,随火星黄赤交角的周期性变化经历过冰层消融的循环。在这项工作中,我们使用卷积神经网络检测火星表面的"Brain Coral"地形,这种地形在形态和尺度上与地球上的分选石环相似,表明它可能是冻融循环作用形成的。我们使用火星勘测轨道飞行器(Mars Reconnaissance Orbiter)拍摄的大尺寸图像(约 100-1000 兆像素),以接近每像素几十厘米(约 25-50 cm)的分辨率搜索这些地形。我们共搜索了超过 52,000 张图像(约 28 TB,约占火星表面的 5%),并在 200 多张图像中发现了检测结果。为加快处理,我们在分割之前利用傅里叶域中的分类网络,借助离散余弦变换的系数块来利用 JPEG 压缩,而无需以全空间分辨率解码整幅图像。这种混合流水线方法在保持约 93% 准确率的同时,相比在每幅图像上以全分辨率运行分割网络,减少了约 95% 的总处理时间。对大数据集的及时处理有助于指导任务运营、开展地质调查以确定候选着陆点的优先级、规避危险区域,或绘制特定地形的空间分布范围。分割掩模和源代码已发布在 GitHub 上,供社区探索和进一步开发。

A Supervised Contrastive Learning Pretrain-Finetune Approach for Time Series

  • paper_url: http://arxiv.org/abs/2311.12290
  • repo_url: None
  • paper_authors: Trang H. Tran, Lam M. Nguyen, Kyongmin Yeo, Nam Nguyen, Roman Vaculin
  • for: 本研究旨在推广机器学习领域内的基础模型,以提高大规模数据处理的效率。
  • methods: 本研究使用了指导学习法,通过在预训练数据集中学习特征之间的对比,从预训练数据集中提取特征表示。然后,使用这些表示进行细订训练,以更好地预测目标数据。
  • results: 我们的实验结果显示,我们的方法可以提高预测目标数据的准确率。
    Abstract Foundation models have recently gained attention within the field of machine learning thanks to its efficiency in broad data processing. While researchers had attempted to extend this success to time series models, the main challenge is effectively extracting representations and transferring knowledge from pretraining datasets to the target finetuning dataset. To tackle this issue, we introduce a novel pretraining procedure that leverages supervised contrastive learning to distinguish features within each pretraining dataset. This pretraining phase enables a probabilistic similarity metric, which assesses the likelihood of a univariate sample being closely related to one of the pretraining datasets. Subsequently, using this similarity metric as a guide, we propose a fine-tuning procedure designed to enhance the accurate prediction of the target data by aligning it more closely with the learned dynamics of the pretraining datasets. Our experiments have shown promising results which demonstrate the efficacy of our approach.
    摘要 基础模型在机器学习领域最近受到了关注,感谢它的广泛数据处理效率。然而,研究人员在扩展这种成功到时间序列模型方面遇到了主要挑战,即从预训练数据集中提取有用的表示并将知识传递到目标训练数据集。为解决这个问题,我们提出了一种新的预训练方法,利用supervised contrastive learning来分别特征。这个预训练阶段生成了一个概率相似度度量,用于衡量预训练数据集中单变量样本与其他预训练数据集之间的相似性。然后,我们提议一种细化训练方法,通过将预训练数据集中学习的动力与目标数据集更加相似来提高目标数据集的准确预测。我们的实验结果表明我们的方法的效果是可靠的。
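A minimal sketch of a generic supervised contrastive loss of the kind the pretraining stage builds on (the paper's use of it, with pretraining-dataset identity as the supervision signal and the resulting probabilistic similarity metric, is its own contribution); embedding sizes and temperature here are arbitrary.

```python
# Generic SupCon-style loss: pull together embeddings sharing a label, push apart others.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z, labels, temperature=0.1):
    """z: (N, d) embeddings, labels: (N,) dataset/class ids."""
    z = F.normalize(z, dim=1)
    sim = z @ z.T / temperature                              # pairwise similarities
    mask_self = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(mask_self, float("-inf"))          # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    positives = (labels[:, None] == labels[None, :]) & ~mask_self
    per_anchor = torch.where(positives, log_prob, torch.zeros_like(log_prob)).sum(dim=1)
    loss = -per_anchor / positives.sum(dim=1).clamp(min=1)   # mean log-prob of positives
    return loss.mean()

z = torch.randn(16, 32, requires_grad=True)
labels = torch.randint(0, 4, (16,))                          # e.g. pretraining-dataset ids
print(supervised_contrastive_loss(z, labels))
```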

Orthogonally weighted $\ell_{2,1}$ regularization for rank-aware joint sparse recovery: algorithm and analysis

  • paper_url: http://arxiv.org/abs/2311.12282
  • repo_url: https://github.com/a-petr/owl
  • paper_authors: Armenak Petrosyan, Konstantin Pieper, Hoang Tran
  • for: 解决 JOINT SPARSE RECOVERY 问题
  • methods: 使用新的规则化方法 called orthogonally weighted $\ell_{2,1}$(ow$\ell_{2,1}$),该方法考虑解matrix的排名特性
  • results: 提出一种高效的算法,并提供了证明和数学实验来证明其效果。
    Abstract We propose and analyze an efficient algorithm for solving the joint sparse recovery problem using a new regularization-based method, named orthogonally weighted $\ell_{2,1}$ ($\mathit{ow}\ell_{2,1}$), which is specifically designed to take into account the rank of the solution matrix. This method has applications in feature extraction, matrix column selection, and dictionary learning, and it is distinct from commonly used $\ell_{2,1}$ regularization and other existing regularization-based approaches because it can exploit the full rank of the row-sparse solution matrix, a key feature in many applications. We provide a proof of the method's rank-awareness, establish the existence of solutions to the proposed optimization problem, and develop an efficient algorithm for solving it, whose convergence is analyzed. We also present numerical experiments to illustrate the theory and demonstrate the effectiveness of our method on real-life problems.
    摘要 我们提出并分析了一种求解联合稀疏恢复问题的高效算法,它使用一种新的正则化方法——正交加权 $\ell_{2,1}$(ow$\ell_{2,1}$),该方法专门设计以考虑解矩阵的秩。这一点使其区别于常用的 $\ell_{2,1}$ 正则化及其他现有的正则化方法:它能够利用行稀疏解矩阵的满秩这一在许多应用中至关重要的特性,可用于特征提取、矩阵列选择和字典学习。我们证明了该方法的秩感知性,证明了所提优化问题解的存在性,并开发了一种高效的求解算法且分析了其收敛性。我们还给出了数值实验,以说明理论结果并展示该方法在实际问题上的有效性。

Beyond Simulated Drivers: Evaluating the Impact of Real-World Car-Following in Mixed Traffic Control

  • paper_url: http://arxiv.org/abs/2311.12261
  • repo_url: None
  • paper_authors: Bibek Poudel, Weizi Li
  • for: 本研究旨在研究人工驾驶车辆如何使用 robot 车辆来缓解交通堵塞问题,提高安全性、有效性和稳定性。
  • methods: 本研究使用实际的人类驾驶轨迹数据,提取了各种加速行为,并将这些行为integrated into simulations where robot vehicles from prior studies are employed to mitigate congestion. 在这些 simulations中,我们还引入了一种基于强化学习的 robot 车辆,使用一个堵塞stage分类神经网络来优化”安全+稳定”或”效率”在人类驾驶行为的存在下。
  • results: 我们在两个不同的混合交通控制环境中评估了提出的 robot 车辆,并与先前的 robot 车辆进行比较。结果表明,我们的方法可以提高安全性、有效性和稳定性,并且可以适应不同的人类驾驶行为。
    Abstract Human-driven vehicles can amplify naturally occurring perturbations in traffic, leading to congestion and consequently increased fuel consumption, higher collision risks, and reduced capacity utilization. While previous research has highlighted that a fraction of Robot Vehicles (RVs) can mitigate these issues, they often rely on simulations with simplistic, model-based Human-driven Vehicles (HVs) during car-following scenarios. Diverging from this trend, in this study, we analyze real-world human driving trajectories, extracting a wide range of acceleration behaviors during car-following. We then incorporate these behaviors in simulation where RVs from prior studies are employed to mitigate congestion, and evaluate their safety, efficiency, and stability. Further, we also introduce a reinforcement learning based RV that utilizes a congestion stage classifier neural network to optimize either "safety+stability" or "efficiency" in the presence of the diverse human driving behaviors. We evaluate the proposed RVs in two different mixed traffic control environments at various densities, configurations, and penetration rates and compare with the existing RVs.
    摘要 人类驾驶车可以增强天然发生的交通干扰,导致堵塞和更高的燃油消耗、更高的碰撞风险和更低的负载使用率。而前一些研究表明,一部分机器人车(RV)可以解决这些问题,但它们通常在车辆尾随场景下使用简单化的人类驾驶车(HV)模型进行模拟。在这种情况下,我们分析了实际的人类驾驶轨迹数据,提取了车辆尾随场景中的各种加速行为。然后,我们在模拟中包含这些行为,使用先前研究中的RV来 mitigate堵塞,并评估其安全、效率和稳定性。此外,我们还引入了一种基于强化学习的RV,使用车辆堵塞阶段分类神经网络来优化“安全+稳定”或“效率”在人类驾驶行为的存在下。我们对提议的RV进行了两种不同的混合交通控制环境的评估,包括不同的混合率、配置和分布。并与现有的RV进行了比较。

  • paper_url: http://arxiv.org/abs/2311.12255
  • repo_url: https://github.com/silencex12138/time-granularity-on-temporal-graphs
  • paper_authors: Xiangjian Jiang, Yanyi Pu
  • for: 这 paper 的目的是研究 dynamic graph neural networks (DGNNs) 在处理动态图数据时的性能和稳定性如何受到时间信息的影响,特别是在不同的时间粒度下进行预测任务时。
  • methods: 该 paper 使用了多种 domain 的动态图和三种不同的 DGNN 模型,通过对四种不同的时间粒度进行比较来探讨时间粒度对模型性能和稳定性的影响。
  • results: 研究发现,一个完善的记忆机制和合适的时间粒度是 DGNN 在动态链接预测任务中取得有竞争力且鲁棒性能的关键。此外,论文还讨论了所考察模型与数据集的局限性,并提出了关于时间图时间粒度的有前景的未来研究方向。
    Abstract Dynamic Graph Neural Networks (DGNNs) have emerged as the predominant approach for processing dynamic graph-structured data. However, the influence of temporal information on model performance and robustness remains insufficiently explored, particularly regarding how models address prediction tasks with different time granularities. In this paper, we explore the impact of time granularity when training DGNNs on dynamic graphs through extensive experiments. We examine graphs derived from various domains and compare three different DGNNs to the baseline model across four varied time granularities. We mainly consider the interplay between time granularities, model architectures, and negative sampling strategies to obtain general conclusions. Our results reveal that a sophisticated memory mechanism and proper time granularity are crucial for a DGNN to deliver competitive and robust performance in the dynamic link prediction task. We also discuss drawbacks in considered models and datasets and propose promising directions for future research on the time granularity of temporal graphs.
    摘要 动态图神经网络(DGNN)已成为处理动态图结构数据的主流方法。然而,时间信息对模型性能和鲁棒性的影响仍未得到充分探索,特别是模型在不同时间粒度下处理预测任务的方式。本文通过大量实验探讨了在动态图上训练 DGNN 时时间粒度的影响。我们使用来自不同领域的图,将三种不同的 DGNN 与基线模型在四种不同的时间粒度下进行比较,并主要考察时间粒度、模型架构和负采样策略之间的相互作用,以得到一般性的结论。结果显示,在动态链接预测任务中,精巧的记忆机制和合适的时间粒度对 DGNN 取得有竞争力且鲁棒的性能至关重要。我们还讨论了所考察模型和数据集的不足,并提出了关于时间图时间粒度的有前景的未来研究方向。
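A simple illustration of the time-granularity notion being studied: the same temporal edge stream re-indexed at coarser timestamp buckets (the datasets, DGNN models, and negative-sampling setups of the paper are not reproduced here).

```python
# Coarsening a temporal edge list to different time granularities.
def coarsen(events, granularity):
    """events: list of (src, dst, timestamp_seconds); returns events with
    timestamps bucketed to the chosen granularity."""
    return [(u, v, ts // granularity) for u, v, ts in events]

events = [(0, 1, 12), (1, 2, 65), (0, 2, 3600), (2, 3, 7205)]
for name, g in [("seconds", 1), ("minutes", 60), ("hours", 3600)]:
    print(name, coarsen(events, g))
```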

The limitation of neural nets for approximation and optimization

  • paper_url: http://arxiv.org/abs/2311.12253
  • repo_url: https://github.com/sohaboumaima/basesnnapproxforopt
  • paper_authors: Tommaso Giovannelli, Oumaima Sohab, Luis Nunes Vicente
  • for: 本研究旨在利用神经网络作为优化问题中的准确模型,以优化和规避优化问题中的目标函数。
  • methods: 本研究首先确定了最佳的激活函数来近似各种非线性优化问题中的目标函数,并证明了~SiLU激活函数的最佳性。然后,研究分析了神经网络和 interpol/regression 模型对目标函数值、导数和偏导数的近似精度。
  • results: 研究发现,神经网络可以在优化问题中提供竞争力强的零和首项近似(但需要高训练成本),但在第二项近似方面表现较差。然而,结合神经网络激活函数和自然基准的 quadratic interpol/regression 模型可以减少参数数量,从而提高模型精度。最后,研究证明了一种现有的偏导数-free 优化算法的性能很难超过使用神经网络或其他准确模型来approximate gradient的情况。
    Abstract We are interested in assessing the use of neural networks as surrogate models to approximate and minimize objective functions in optimization problems. While neural networks are widely used for machine learning tasks such as classification and regression, their application in solving optimization problems has been limited. Our study begins by determining the best activation function for approximating the objective functions of popular nonlinear optimization test problems, and the evidence provided shows that~SiLU has the best performance. We then analyze the accuracy of function value, gradient, and Hessian approximations for such objective functions obtained through interpolation/regression models and neural networks. When compared to interpolation/regression models, neural networks can deliver competitive zero- and first-order approximations (at a high training cost) but underperform on second-order approximation. However, it is shown that combining a neural net activation function with the natural basis for quadratic interpolation/regression can waive the necessity of including cross terms in the natural basis, leading to models with fewer parameters to determine. Lastly, we provide evidence that the performance of a state-of-the-art derivative-free optimization algorithm can hardly be improved when the gradient of an objective function is approximated using any of the surrogate models considered, including neural networks.
    摘要 我们关注将神经网络用作代理模型,以逼近并最小化优化问题中的目标函数。虽然神经网络在分类和回归等机器学习任务中被广泛使用,但其在求解优化问题中的应用仍然有限。我们的研究首先确定了逼近常见非线性优化测试问题目标函数的最佳激活函数,证据表明 SiLU 的表现最佳。随后,我们分析了通过插值/回归模型和神经网络得到的此类目标函数的函数值、梯度和 Hessian 近似的精度。与插值/回归模型相比,神经网络能够提供有竞争力的零阶和一阶近似(但训练成本较高),而在二阶近似上表现不佳。不过,我们发现将神经网络激活函数与二次插值/回归的自然基底相结合,可以免去在自然基底中包含交叉项的必要,从而得到待定参数更少的模型。最后,我们给出证据表明,无论采用本文考虑的哪种代理模型(包括神经网络)来逼近目标函数的梯度,最先进的无导数优化算法的性能都几乎难以得到提升。
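For reference, SiLU (also known as swish) is silu(x) = x·sigmoid(x). The sketch below fits a small SiLU network as a surrogate to a toy one-dimensional nonconvex objective; the objective and hyperparameters are arbitrary, not the benchmark problems used in the study.

```python
# Fitting a SiLU-activated network as a surrogate for a toy nonconvex objective.
import numpy as np
import torch
import torch.nn as nn

f = lambda x: np.sin(3 * x) + 0.5 * x ** 2            # toy objective (assumed, not the paper's)
x = torch.linspace(-2, 2, 200).unsqueeze(1)
y = torch.tensor(f(x.numpy()), dtype=torch.float32)

net = nn.Sequential(nn.Linear(1, 32), nn.SiLU(),
                    nn.Linear(32, 32), nn.SiLU(),
                    nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
print(f"surrogate training MSE: {loss.item():.4f}")
```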

eess.SP - 2023-11-21

Transferred Thin Film Lithium Niobate as Millimeter Wave Acoustic Filter Platforms

  • paper_url: http://arxiv.org/abs/2311.13044
  • repo_url: None
  • paper_authors: Omar Barrera, Sinwoo Cho, Kenny Hyunh, Jack Kramer, Michael Liao, Vakhtang Chulukhadze, Lezli Matto, Mark S. Goorsky, Ruochen Lu
  • for: This paper aims to develop high-performance acoustic filters for millimeter wave (mmWave) bands using transferred single-crystal thin film lithium niobate (LiNbO3).
  • methods: The paper uses transferred LiNbO3 on top of silicon (Si) and sapphire (Al2O3) substrates with an intermediate amorphous Si (aSi) bonding and sacrificial layer to achieve compact acoustic filters with record-breaking performance beyond 20 GHz.
  • results: The paper demonstrates low insertion loss (IL) of 1.62 dB and 3-dB fractional bandwidth (FBW) of 19.8% at 22.1 GHz in the LN-aSi-Al2O3 platform, and low IL of 2.38 dB and FBW of 18.2% at 23.5 GHz in the LN-aSi-Si platform, with high crystalline quality of the stacks validated by material analysis.
    Abstract This paper reports the first high-performance acoustic filters toward millimeter wave (mmWave) bands using transferred single-crystal thin film lithium niobate (LiNbO3). By transferring LiNbO3 on the top of silicon (Si) and sapphire (Al2O3) substrates with an intermediate amorphous Si (aSi) bonding and sacrificial layer, we demonstrate compact acoustic filters with record-breaking performance beyond 20 GHz. In the LN-aSi-Al2O3 platform, the third-order ladder filter exhibits low insertion loss (IL) of 1.62 dB and 3-dB fractional bandwidth (FBW) of 19.8% at 22.1 GHz, while in the LN-aSi-Si platform, the filter shows low IL of 2.38 dB and FBW of 18.2% at 23.5 GHz. Material analysis validates the great crystalline quality of the stacks. The high-resolution x-ray diffraction (HRXRD) shows full width half maximum (FWHM) of 53 arcsec for Al2O3 and 206 arcsec for Si, both remarkably low compared to piezoelectric thin films of similar thickness. The reported results bring the state-of-the-art (SoA) of compact acoustic filters to much higher frequencies, and highlight transferred LiNbO3 as promising platforms for mmWave filters in future wireless front ends.
    摘要 本文报道了首批面向毫米波(mmWave)频段的高性能声学滤波器,其采用转移的单晶薄膜铌酸锂(LiNbO3)。通过借助非晶硅(aSi)键合兼牺牲层将 LiNbO3 转移到硅(Si)和蓝宝石(Al2O3)衬底之上,我们展示了工作频率超过 20 GHz、性能创纪录的紧凑型声学滤波器。在 LN-aSi-Al2O3 平台上,三阶梯形滤波器在 22.1 GHz 处的插入损耗(IL)低至 1.62 dB,3-dB 相对带宽(FBW)为 19.8%;在 LN-aSi-Si 平台上,滤波器在 23.5 GHz 处的 IL 为 2.38 dB,FBW 为 18.2%。材料分析验证了叠层优异的晶体质量:高分辨率 X 射线衍射(HRXRD)显示 Al2O3 上的半高宽(FWHM)为 53 角秒,Si 上为 206 角秒,相比同等厚度的压电薄膜都非常低。上述结果将紧凑型声学滤波器的最新水平推向了更高的频率,并表明转移 LiNbO3 是未来无线前端毫米波滤波器颇具前景的平台。
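A quick arithmetic check of the reported figures: the absolute 3-dB bandwidth implied by a fractional bandwidth at the stated centre frequency, BW = FBW × f0.

```python
# Absolute 3-dB bandwidth implied by the reported fractional bandwidths.
for platform, f0_ghz, fbw in [("LN-aSi-Al2O3", 22.1, 0.198), ("LN-aSi-Si", 23.5, 0.182)]:
    print(f"{platform}: ~{f0_ghz * fbw:.2f} GHz 3-dB bandwidth")
# LN-aSi-Al2O3: ~4.38 GHz, LN-aSi-Si: ~4.28 GHz
```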

Antenna Selection For Receive Spatial Modulation System Empowered By Reconfigurable Intelligent Surface

  • paper_url: http://arxiv.org/abs/2311.13005
  • repo_url: None
  • paper_authors: Burak Ahmet Ozden, Erdogan Aydin
    for: The paper proposes a new wireless communication system that enhances signal quality and spectral efficiency using a reconfigurable intelligent surface (RIS) and spatial modulation (SM) techniques.methods: The proposed system combines capacity-optimized antenna selection (COAS), antenna correlation antenna selection (ACAS), and Euclidean distance-optimized antenna selection (EDAS) with RIS-empowered receive SM (RIS-RSM) in a single-input multiple-output (SIMO) structure.results: The proposed system achieves high spectral efficiency, high energy efficiency, and low error data transmission. The analytical ABER results and capacity analyses of the proposed system are derived and shown to outperform counterpart wireless communication systems.
    Abstract Reconfigurable intelligent surface (RIS) enhances signal quality by adjusting the phase of electromagnetic waves in wireless communication. Spatial modulation (SM), a prominent index modulation (IM) technique, provides high spectral efficiency and low energy consumption. In this article, a new wireless communication system is proposed by combining capacity-optimized antenna selection (COAS), antenna correlation antenna selection (ACAS), and Euclidean distance-optimized antenna selection (EDAS)-supported RIS-empowered receive SM (RIS-RSM) system (AS-RIS-RSM) in a single-input multiple-output (SIMO) structure. The proposed AS-RIS-RSM schemes (COAS-RIS-RSM, ACAS-RIS-RSM, and EDAS-RIS-RSM) have superior features such as high spectral efficiency, high energy efficiency, and low error data transmission. Integrating COAS, ACAS, and EDAS techniques into the system enables the selection of the channel with the best conditions, thus increasing the error performance of the proposed system. Also, using RIS increases the error performance of the system by controlling the transmitted signal to a certain extent. The analytical ABER results of the proposed AS-RIS-RSM systems are derived and shown to overlap with simulation results. For the proposed systems, an optimal maximum likelihood (ML) detector and a sub-optimal low-complexity greedy detector (GD) are offered. Also, capacity analyses of the proposed AS-RIS-RSM systems are derived and it is observed that they have higher capacity compared to RIS-QAM/PSK and RIS-RSM systems. Then, computational complexity analyses of the proposed COAS-RIS-RSM, ACAS-RIS-RSM, and EDAS-RIS-RSM systems are presented. The proposed systems have been compared to counterpart wireless communication systems including RIS-RSM, RIS-QAM, and RIS-PSK under equivalent conditions, demonstrating that the proposed systems achieve better error performance.
    摘要 可重构智能表面(RIS)通过调整无线通信中电磁波的相位来提升信号质量。空间调制(SM)作为一种重要的索引调制(IM)技术,能够提供高频谱效率和低能耗。本文提出了一种新的无线通信系统,在单输入多输出(SIMO)结构中,将容量优化天线选择(COAS)、天线相关性天线选择(ACAS)和欧氏距离优化天线选择(EDAS)与 RIS 赋能的接收端空间调制(RIS-RSM)系统相结合(AS-RIS-RSM)。所提出的 AS-RIS-RSM 方案(COAS-RIS-RSM、ACAS-RIS-RSM 和 EDAS-RIS-RSM)具有高频谱效率、高能量效率和低差错数据传输等优良特性。将 COAS、ACAS 和 EDAS 技术集成到系统中,可以选择条件最优的信道,从而提升系统的差错性能;同时,利用 RIS 在一定程度上控制发射信号,也能进一步改善差错性能。本文推导了所提 AS-RIS-RSM 系统的解析 ABER 结果,并表明其与仿真结果吻合。针对所提系统,给出了最优的最大似然(ML)检测器和次优的低复杂度贪婪检测器(GD)。此外,本文还推导了所提 AS-RIS-RSM 系统的容量,发现其高于 RIS-QAM/PSK 和 RIS-RSM 系统,并给出了 COAS-RIS-RSM、ACAS-RIS-RSM 和 EDAS-RIS-RSM 系统的计算复杂度分析。在等效条件下,将所提系统与 RIS-RSM、RIS-QAM 和 RIS-PSK 等同类无线通信系统进行比较,结果表明所提系统具有更优的差错性能。

Bit Error Rate Performance and Diversity Analysis for Mediumband Wireless Communication

  • paper_url: http://arxiv.org/abs/2311.12968
  • repo_url: None
  • paper_authors: Dushyantha A Basnayaka, Jiabin Jia
  • for: 这篇论文研究了中频带(mediumband)信道的通信性能极限,特别是未编码误码率(BER)和分集阶数(diversity order)。
  • methods: 通过统计分析和计算模拟,这篇论文研究了媒体带通信频率范围内的通信性能限制。
  • results: 研究发现,经过审慎设计,中频带无线通信系统在非视距(NLoS)传播环境中可以获得显著更优的误码率和更高阶的分集,即使该环境下可达的分集阶数通常较低。
    Abstract Mediumband wireless communication refers to wireless communication through a class of channels known as mediumband that exists on the TmTs-plane. This paper, through statistical analysis and computer simulations, studies the performance limits of this class of channels in terms of uncoded bit error rate (BER) and diversity order. We show that, owing mainly to the effect of the deep fading avoidance, which is unique to the channels in the mediumband region, mediumband wireless systems, if designed judiciously, have the potential to achieve significantly superior error rate and higher order diversity even in non-line-of-sight (NLoS) propagation environments where the achievable diversity order is otherwise low.
    摘要 中频带无线通信指的是通过 TmTs 平面上一类称为中频带的信道进行的无线通信。本文通过统计分析与计算机仿真,研究了这类信道在未编码误码率(BER)和分集阶数方面的性能极限。我们证明,主要得益于中频带区域信道所特有的深度衰落规避效应,经过审慎设计的中频带无线系统,即使在可达分集阶数通常较低的非视距(NLoS)传播环境中,也有潜力获得显著更优的误码率和更高阶的分集。

Learn to Augment Network Simulators Towards Digital Network Twins

  • paper_url: http://arxiv.org/abs/2311.12745
  • repo_url: None
  • paper_authors: Yuru Zhang, Ming Zhao, Qiang Liu
  • for: This paper aims to address the challenge of building digital network twins (DNTs) that can accurately replicate real-world cellular networks, and to improve the generalization, explainability, and transparency of DNTs.
  • methods: The proposed approach uses a learn-to-bridge algorithm that combines cost-aware Bayesian optimization and Bayesian neural networks (BNN) to bridge the simulation-to-reality (sim-to-real) discrepancy in two stages. The first stage selects states to query performances in real-world networks, and the second stage trains the neural agent to learn the state context and bridge the probabilistic discrepancy.
  • results: The proposed solution substantially outperforms existing methods, with more than 92% reduction in the sim-to-real discrepancy, as demonstrated in a small-scale end-to-end network testbed based on OpenAirInterface RAN and Core with USRP B210 and a smartphone, and replicated in NS-3.
    Abstract Digital network twin (DNT) is a promising paradigm to replicate real-world cellular networks toward continual assessment, proactive management, and what-if analysis. Existing discussions have been focusing on using only deep learning techniques to build DNTs, which raises widespread concerns regarding their generalization, explainability, and transparency. In this paper, we explore an alternative approach to augment network simulators with context-aware neural agents. The main challenge lies in the non-trivial simulation-to-reality (sim-to-real) discrepancy between offline simulators and real-world networks. To solve the challenge, we propose a new learn-to-bridge algorithm to cost-efficiently bridge the sim-to-real discrepancy in two alternative stages. In the first stage, we select states to query performances in real-world networks by using newly-designed cost-aware Bayesian optimization. In the second stage, we train the neural agent to learn the state context and bridge the probabilistic discrepancy based on Bayesian neural networks (BNN). In addition, we build a small-scale end-to-end network testbed based on OpenAirInterface RAN and Core with USRP B210 and a smartphone, and replicate the network in NS-3. The evaluation results show that, our proposed solution substantially outperforms existing methods, with more than 92\% reduction in the sim-to-real discrepancy.
    摘要 数字网络孪生(DNT)是一种有前景的范式,可用于复制真实蜂窝网络,以支持持续评估、前瞻式管理和假设分析(what-if analysis)。现有讨论大多集中于仅用深度学习技术构建 DNT,这引发了对其泛化性、可解释性和透明度的广泛担忧。在本文中,我们探索了另一种思路:用具备上下文感知能力的神经代理来增强网络仿真器。主要挑战在于离线仿真器与真实网络之间存在非平凡的仿真到现实(sim-to-real)差异。为解决这一挑战,我们提出了一种新的 learn-to-bridge 算法,分两个阶段以较低成本弥合 sim-to-real 差异:第一阶段,利用新设计的成本感知贝叶斯优化选择需要在真实网络中查询性能的状态;第二阶段,基于贝叶斯神经网络(BNN)训练神经代理学习状态上下文并弥合概率差异。此外,我们基于 OpenAirInterface RAN 与核心网、USRP B210 和一部智能手机搭建了小规模端到端网络测试平台,并在 NS-3 中复制了该网络。评估结果显示,我们所提出的方案明显优于现有方法,将 sim-to-real 差异减少了超过 92%。

Satellite Swarms for Narrow Beamwidth Applications

  • paper_url: http://arxiv.org/abs/2311.12721
  • repo_url: None
  • paper_authors: Juan A. Vásquez-Peralvo, Juan Carlos Merlano Duncan, Geoffrey Eappen, Symeon Chatzinotas
  • for: 这个论文是为了研究一种基于2D正态分布的卫星群组合,以实现高精度的指向性和广泛的 beamforming 能力。
  • methods: 该论文使用分布式subarray配置,使用多个小卫星作为subarray,实现一个大天线数组的效果。
  • results: 仿真结果显示,该方案可以实现非常窄的辐射方向图,波束宽度窄至 0.0015 度,最大旁瓣电平为 18.8 dB,栅瓣电平为 14.8 dB。该方案可用于高速数据应用或应急系统。
    Abstract Satellite swarms have recently gained attention in the space industry due to their ability to provide extremely narrow beamwidths at a lower cost than single satellite systems. This paper proposes a concept for a satellite swarm using a distributed subarray configuration based on a 2D normal probability distribution. The swarm comprises multiple small satellites acting as subarrays of a big aperture array limited by a radius of 20000 wavelengths working at a central frequency of 19 GHz. The main advantage of this approach is that the distributed subarrays can provide extremely directive beams and beamforming capabilities that are not possible using a conventional antenna and satellite design. The proposed swarm concept is analyzed, and the simulation results show that the radiation pattern achieves a beamwidth as narrow as 0.0015-degrees with a maximum side lobe level of 18.8 dB and a grating lobe level of 14.8 dB. This concept can be used for high data rates applications or emergency systems.
    摘要 卫星集群近来在航天领域受到关注,因为与单颗卫星系统相比,它们能以更低的成本提供极窄的波束宽度。本文提出了一种基于二维正态分布的分布式子阵卫星集群概念:集群由多颗小卫星组成,充当一个大口径阵列的子阵,该阵列受限于 20000 个波长的半径,工作中心频率为 19 GHz。该方案的主要优点在于分布式子阵能够提供传统天线与卫星设计无法实现的高指向性波束和波束赋形能力。文中对所提集群概念进行了分析,仿真结果表明其辐射方向图可实现窄至 0.0015 度的波束宽度,最大旁瓣电平为 18.8 dB,栅瓣电平为 14.8 dB。该概念可用于高速数据应用或应急系统。
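An illustrative numpy sketch of the underlying idea that a large sparse aperture yields a very narrow beam: the broadside array factor of elements whose positions are drawn from a 2-D normal distribution (element count, spread, and scan window are toy values, not the paper's swarm design).

```python
# Array factor of a sparse aperture with 2-D-normally distributed element positions.
import numpy as np

rng = np.random.default_rng(1)
n_elements = 200
positions = rng.normal(scale=2000.0, size=(n_elements, 2))   # element (x, y) in wavelengths

theta = np.radians(np.linspace(-0.02, 0.02, 2001))           # narrow angular window around broadside
# broadside array factor along the x-axis for elements at the given positions
af = np.abs(np.exp(2j * np.pi * np.outer(np.sin(theta), positions[:, 0])).sum(axis=1))
af_db = 20 * np.log10(af / af.max())

half_power = theta[af_db >= -3]
print(f"approx. 3-dB beamwidth: {np.degrees(half_power.max() - half_power.min()):.4f} deg")
```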

Empirical Validation of the Impedance-Based RIS Channel Model in an Indoor Scattering Environment

  • paper_url: http://arxiv.org/abs/2311.12628
  • repo_url: None
  • paper_authors: Placido Mursia, Taghrid Mazloum, Frederic Munoz, Vincenzo Sciancalepore, Gabriele Gradoni, Raffaele D Errico, Marco Di Renzo, Xavier Costa-Perez, Antonio Clemente, Geoffroy Lerosey
  • for: validate a recently-proposed impedance-based RIS channel model
  • methods: exploit real-life channel measurements and discrete array of loaded dipoles
  • results: superior performance compared to reference schemes
    Abstract Ensuring the precision of channel modeling plays a pivotal role in the development of wireless communication systems, and this requirement remains a persistent challenge within the realm of networks supported by Reconfigurable Intelligent Surfaces (RIS). Achieving a comprehensive and reliable understanding of channel behavior in RIS-aided networks is an ongoing and complex issue that demands further exploration. In this paper, we empirically validate a recently-proposed impedance-based RIS channel model that accounts for the mutual coupling at the antenna array and precisely models the presence of scattering objects within the environment as a discrete array of loaded dipoles. To this end, we exploit real-life channel measurements collected in an office environment to demonstrate the validity of such a model and its applicability in a practical scenario. Finally, we provide numerical results demonstrating that designing the RIS configuration based upon such model leads to superior performance as compared to reference schemes.
    摘要 确保信道建模的精确性在无线通信系统的研发中起着关键作用,而在可重构智能表面(RIS)辅助的网络中,这一要求仍然是长期存在的挑战。全面而可靠地理解 RIS 辅助网络中的信道行为,是一个仍需深入探索的复杂课题。本文对最近提出的一种基于阻抗的 RIS 信道模型进行了实证验证:该模型考虑了天线阵列处的互耦,并将环境中的散射体精确建模为由加载偶极子组成的离散阵列。为此,我们利用在办公室环境中采集的真实信道测量数据,验证了该模型的有效性及其在实际场景中的适用性。最后,我们给出数值结果表明,基于该模型设计 RIS 配置相比参考方案能够获得更优的性能。

A Unified Framework for Pulse-Shaping on Delay-Doppler Plane

  • paper_url: http://arxiv.org/abs/2311.12543
  • repo_url: None
  • paper_authors: Mohsen Bayat, Arman Farhang
  • for: This paper aims to classify and analyze the properties of different pulse-shaping techniques used in delay-Doppler multiplexing, and to develop a unified framework for understanding their similarities and distinctions.
  • methods: The paper uses a combination of theoretical analysis and simulation to classify pulse-shaping techniques into two types: circular and linear, and to derive a generalized input-output relationship that captures the influence of pulse-shaping on the effective channel. The authors also propose a unified modem for delay-Doppler plane pulse-shaping and introduce effective techniques to reduce OOB emissions and improve BER performance.
  • results: The paper reveals that the recently emerged waveform ODDM is a linear pulse-shaping technique with an interesting staircase spectral behavior. The proposed modem structures are substantially simpler than existing ones in the literature, and the proposed techniques can reduce OOB emissions and improve BER performance for both circular and linear pulse-shaping techniques. The paper also extensively compares different pulse-shaping techniques using various performance metrics.
  • for: 这篇论文旨在分类和分析延迟-多普勒复用中所用的各种脉冲整形技术的特性,并发展一个统一的框架,以便更好地理解这些技术之间的相似性和差异。
  • methods: 论文结合理论分析与仿真,将脉冲整形技术分为循环与线性两类,并推导出一个能刻画脉冲整形对有效信道影响的广义输入-输出关系。作者还提出了一种统一的时延-多普勒平面脉冲整形调制解调结构,并引入了降低带外(OOB)辐射和改善 BER 性能的有效技术。
  • results: 论文指出,最近出现的波形 ODDM 是一种线性脉冲整形技术,具有有趣的阶梯状频谱特性。所提出的调制解调结构远比现有文献中的结构简单,且所提技术可同时降低循环与线性脉冲整形的带外辐射并改善 BER 性能。论文还使用多种性能指标对不同的脉冲整形技术进行了广泛比较。
    Abstract Delay-Doppler multiplexing has recently stirred a great deal of attention in research community. While multiple studies have investigated pulse-shaping aspects of this technology, it is challenging to identify the relationships between different pulse-shaping techniques and their properties. Hence, in this paper, we classify these techniques into two types, namely, circular and linear pulse-shaping. This paves the way towards the development of a unified framework that brings deep insights into the properties, similarities, and distinctions of different pulse-shaping techniques. This framework reveals that the recently emerged waveform orthogonal delay-Doppler multiplexing (ODDM) is a linear pulse-shaping technique with an interesting staircase spectral behaviour. Using this framework, we derive a generalized input-output relationship that captures the influence of pulse-shaping on the effective channel. We also introduce a unified modem for delay-Doppler plane pulse-shaping that leads to the proposal of fast convolution based low-complexity structures. Based on our complexity analysis, the proposed modem structures are substantially simpler than the existing ones in the literature. Furthermore, we propose effective techniques that not only reduce the out-of-band (OOB) emissions of circularly pulse-shaped signals but also improve the bit-error-rate (BER) performance of both circular and linear pulse-shaping techniques. Finally, we extensively compare different pulse-shaping techniques using various performance metrics.
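As an illustration of the "2D spreading" that delay-Doppler multiplexing performs, the following NumPy sketch implements a common discrete ISFFT-plus-Heisenberg formulation with rectangular transmit pulses and verifies perfect symbol recovery over an ideal channel. It is a simplification for intuition only, not the unified modem or the fast-convolution structures proposed in the paper.

```python
import numpy as np

# M delay bins, N Doppler bins; QPSK symbols placed on the delay-Doppler grid.
M, N = 32, 16
rng = np.random.default_rng(1)
qpsk = (rng.choice([-1, 1], size=(M, N)) + 1j * rng.choice([-1, 1], size=(M, N))) / np.sqrt(2)

F_M = np.fft.fft(np.eye(M)) / np.sqrt(M)        # unitary DFT matrices
F_N = np.fft.fft(np.eye(N)) / np.sqrt(N)

# ISFFT: delay-Doppler symbols -> time-frequency grid.
X_tf = F_M @ qpsk @ F_N.conj().T
# With a rectangular transmit pulse, the Heisenberg transform reduces to an
# IFFT across the frequency axis of each time-frequency column.
s = (F_M.conj().T @ X_tf).T.reshape(-1)          # serialised time-domain frame

# Receiver over an ideal channel: invert both steps and check symbol recovery.
Y_tf = F_M @ s.reshape(N, M).T
Y_dd = F_M.conj().T @ Y_tf @ F_N
print("max reconstruction error:", np.max(np.abs(Y_dd - qpsk)))
```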

Wearable Technologies for Monitoring Upper Extremity Functions During Daily Life in Neurologically Impaired Individuals

  • paper_url: http://arxiv.org/abs/2311.12513
  • repo_url: None
  • paper_authors: Tommaso Proietti, Andrea Bandini
  • for: This review focuses on wearable technologies to monitor upper extremity (UE) function in neurologically impaired individuals during daily life activities.
  • methods: The review categorizes the different sensors, data collection and data processing approaches employed in wearable technologies for UE function monitoring.
  • results: The majority of studies involved stroke survivors, and predominantly employed inertial measurement units and accelerometers to collect kinematics. Most analyses were performed offline, focusing on activity duration and frequency as key metrics. However, an ideal solution that combines non-intrusiveness, lightweight design, detailed hand and finger movement capture, contextual information, extended recording duration, ease of use, and privacy protection remains an elusive goal.
    Abstract Neurological disorders, including stroke, spinal cord injuries, multiple sclerosis, and Parkinson's disease, generally lead to diminished upper extremity (UE) function, impacting individuals' independence and quality of life. Traditional assessments predominantly focus on standardized clinical tasks, offering limited insights into real-life UE performance. In this context, this review focuses on wearable technologies as a promising solution to monitor UE function in neurologically impaired individuals during daily life activities. Our primary objective is to categorize the different sensors, data collection and data processing approaches employed. What comes to light is that the majority of studies involved stroke survivors, and predominantly employed inertial measurement units and accelerometers to collect kinematics. Most analyses in these studies were performed offline, focusing on activity duration and frequency as key metrics. Although wearable technology shows potential in monitoring UE function in real-life scenarios, an ideal solution that combines non-intrusiveness, lightweight design, detailed hand and finger movement capture, contextual information, extended recording duration, ease of use, and privacy protection remains an elusive goal. Furthermore, it stands out a growing necessity for a multimodal approach in capturing comprehensive data on UE function during real-life activities to enhance the personalization of rehabilitation strategies and ultimately improve outcomes for these individuals.
    摘要 神经系统疾病,包括中风、脊髓损伤、多发性硬化和帕金森病,通常导致上肢(UE)功能减退,影响个体的独立性和生活质量。传统评估主要集中在标准化临床任务上,对真实生活中的上肢表现提供的信息有限。在此背景下,本综述聚焦于可穿戴技术,用于在日常生活活动中监测神经受损个体的上肢功能,并对所采用的传感器、数据采集与数据处理方法进行分类。
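For illustration, here is a small NumPy sketch of the kind of offline analysis the review reports most often: activity duration and number of movement bouts computed from a wrist-worn accelerometer. The signal is synthetic and the 0.3 m/s^2 activity threshold and 2-second epoch length are assumptions, not values from any reviewed study.

```python
import numpy as np

fs = 50.0                                                   # sampling rate in Hz
t = np.arange(0, 60, 1 / fs)
acc = np.c_[0.1 * np.random.randn(t.size) + np.sin(0.5 * t),
            0.1 * np.random.randn(t.size),
            0.1 * np.random.randn(t.size) + 9.81]           # fake x, y, z in m/s^2

mag = np.linalg.norm(acc, axis=1) - 9.81                    # crude gravity removal
window = int(2 * fs)                                        # 2-second epochs
epochs = mag[: mag.size // window * window].reshape(-1, window)
active = epochs.std(axis=1) > 0.3                           # assumed activity threshold

duration_s = active.sum() * window / fs                     # total active time
bouts = np.diff(np.r_[0, active.astype(int)]) == 1          # rising edges = new bouts
print(f"active duration: {duration_s:.1f} s, movement bouts: {bouts.sum()}")
```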

A study on Satellite-to-Ground Propagation in Urban Environment

  • paper_url: http://arxiv.org/abs/2311.12500
  • repo_url: None
  • paper_authors: N. Cenni, V. Degli-Esposti, E. M. Vitucci, F. Fuschini, M. Barbiroli
  • for: 这篇论文旨在探讨未来6G无线网络中非地面网络的重要作用,以增强全球连接性和性能,并与地面网络合作。
  • methods: 这篇论文使用了射线轨迹模拟工具来分析卫星到地面通信频谱的主要媒体传播机制,以及卫星位置对K因子的影响。
  • results: 研究发现,在非视距(non-line-of-sight)情况下,由表面不规则引起的非镜面反射成为主要传播机制。此外,与以往研究不同,莱斯K因子随仰角呈轻微上升的趋势。
    Abstract Non-Terrestrial Networks are going to play an important role in future 6G wireless networks to enhance global connectivity a performance in cooperation with terrestrial networks. In order to properly design and deploy non-terrestrial networks, the satellite-to-ground channel must be properly characterized, with particular focus on the urban environment. This paper uses a Ray-Tracing simulation tool to analyze the primary propagation mechanisms and the behaviour of the Rician K-factor as a function of satellite position in a reference urban environment. Non-specular reflection due to surface irregularities emerges as a primary propagation mechanism in non-line-of-sight cases. Additionally, the Rician K-factor shows a slightly increasing trend with elevation angle, in contrast to previous studies.
    摘要 非地面网络将在未来6G无线网络中扮演重要角色,与地面网络协作,以提升全球连接性和性能。为了正确设计和部署非地面网络,必须准确刻画卫星到地面的信道,特别是在城市环境中。本文使用射线追踪仿真工具,分析参考城市环境中主要的传播机制以及莱斯K因子随卫星位置的变化,发现在非视距情况下,由表面不规则引起的非镜面反射是主要传播机制。此外,莱斯K因子随仰角呈轻微上升趋势,与以往研究不同。
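A small NumPy sketch of the classic moment-based Rician K-factor estimator, the quantity whose elevation-angle trend the paper studies. The channel samples here are synthetic, whereas the paper derives them from ray-tracing simulations of the urban environment.

```python
import numpy as np

rng = np.random.default_rng(2)
K_true = 5.0                                    # linear scale
A = np.sqrt(K_true / (K_true + 1))              # LOS amplitude (unit total power)
sigma = np.sqrt(1 / (2 * (K_true + 1)))         # per-dimension scatter std
h = A + sigma * (rng.standard_normal(100_000) + 1j * rng.standard_normal(100_000))

p = np.abs(h) ** 2
Ga, V = p.mean(), p.var()
K_hat = np.sqrt(Ga**2 - V) / (Ga - np.sqrt(Ga**2 - V))   # moment-based estimator
print(f"estimated K = {K_hat:.2f} ({10 * np.log10(K_hat):.1f} dB)")
```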

Constellation Shaping under Phase Noise Impairment for Sub-THz Communications

  • paper_url: http://arxiv.org/abs/2311.12433
  • repo_url: None
  • paper_authors: Dileepa Marasinghe, Le Hang Nguyen, Jafar Mohammadi, Yejian Chen, Thorsten Wild, Nandana Rajatheva
  • for: 本文旨在开探使用单载波频域平衡(SC-FDE)波形在sub-THz频率范围内实现高速通信,并解决因硬件缺陷而导致的phasenoise和PAPR问题。
  • methods: 本文使用了一种基于Lagrangian函数的优化方法,通过控制constellation的几何结构来实现phasenoise和PAPR问题的解决。
  • results: 本文的实验结果表明,通过使用提出的优化方法,可以实现一种phasenoise和PAPR问题的 numerically robust的SC-FDE波形,并且可以保持低的PAPR值。
    Abstract The large untapped spectrum in the sub-THz allows for ultra-high throughput communication to realize many seemingly impossible applications in 6G. One of the challenges in radio communications in sub-THz is the hardware impairments. Specifically, phase noise is one key hardware impairment, which is accentuated as we increase the frequency and bandwidth. Furthermore, the modest output power of the sub-THz power amplifier demands limits on peak to average power ratio (PAPR) signal design. Single carrier frequency domain equalization (SC-FDE) waveform has been identified as a suitable candidate for sub-THz, although some challenges such as phase noise and PAPR still remain to be tackled. In this work, we design a phase noise robust, low PAPR SC-FDE waveform by geometrically shaping the constellation under practical conditions. We formulate the waveform optimization problem in its augmented Lagrangian form and use a back-propagation-inspired technique to obtain a constellation design that is numerically robust to phase noise, while maintaining a low PAPR.
    摘要 “对于6G中的几个不可能的应用,让大量未使用的频谱在低频 THz 频段提供了无限高的通信能力。但是,对于射频通信在低频 THz 频段的一个挑战是硬件障碍。具体来说,频率干扰是一个关键的硬件障碍,随着频率和宽度的增加,这个问题会更加突出。此外,低频 THz 发射器的输出功率较低,对于峰值至平均功率比(PAPR)的限制也是一个挑战。单束频率领域均衡(SC-FDE)波形已被识别为低频 THz 的适合选择,但是这些问题仍然需要解决。在这个工作中,我们设计了具有几何对称的、低 PAPR SC-FDE 波形,并通过实际条件下的几何对称形成问题来解决这些问题。我们使用了一种具有调适问题的扩展拉格朗日式问题,并使用一种基于复调的技术来获得一个可 numerically 具有对频率干扰的免疫力,同时维持低 PAPR。”
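To make the trade-off concrete, here is a toy Python sketch that evaluates a constellation's PAPR and its symbol error rate under residual per-symbol phase noise. It is a plain Monte-Carlo illustration with arbitrary noise levels, not the augmented-Lagrangian, back-propagation-based shaping procedure of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 16
const = np.exp(2j * np.pi * np.arange(M) / M)            # start from 16-PSK points
const /= np.sqrt(np.mean(np.abs(const) ** 2))            # unit average power

def papr_db(symbols):
    p = np.abs(symbols) ** 2
    return 10 * np.log10(p.max() / p.mean())

def ser_with_phase_noise(points, n=50_000, pn_std=0.08, noise_std=0.05):
    idx = rng.integers(M, size=n)
    phase = rng.normal(0.0, pn_std, size=n)               # residual phase error per symbol
    rx = points[idx] * np.exp(1j * phase) \
         + noise_std * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    detected = np.argmin(np.abs(rx[:, None] - points[None, :]), axis=1)
    return np.mean(detected != idx)

print(f"PAPR = {papr_db(const):.2f} dB, SER under phase noise = {ser_with_phase_noise(const):.4f}")
```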

A Hybrid Frame Structure Design of OTFS for Multi-tasks Communications

  • paper_url: http://arxiv.org/abs/2311.12390
  • repo_url: None
  • paper_authors: Pu Yuan, Jin Liu, Dajie Jiang, Fei Qin
  • for: 高速场景中的宽谱波形(OTFS)可以充分利用时间频率空间的多样性,但是它会增加处理延迟,不符合一些服务的精细延迟要求。
  • methods: 我们提出了一种混合帧结构,将OTFS和OFDM在时间频率空间中相互 ortogonally 多元化,以适应多样性和延迟两个任务的需求。
  • results: 我们发现在通道卷积后,这种多元化被打乱,我们提供了实用的算法来 Mitigate ISI между OTFS 和 OFDM,并且数值结果证明了混合帧结构的效果。
    Abstract Orthogonal time frequency space (OTFS) is a promising waveform in high mobility scenarios for it fully exploits the time-frequency diversity using a discrete Fourier transform (DFT) based two dimensional spreading. However, it trades off the processing latency for performance and may not fulfill the stringent latency requirements in some services. This fact motivates us to design a hybrid frame structure where the OTFS and Orthogonal Frequency Division Multiplexing (OFDM) are orthogonally multiplexed in the time domain, which can adapt to both diversity-preferred and latency-preferred tasks. As we identify that this orthogonality is disrupted after channel coupling, we provide practical algorithms to mitigate the inter symbol interference between (ISI) the OTFS and OFDM, and the numerical results ensure the effectiveness of the hybrid frame structure.
    摘要 高度可移植的时域频率空间(OTFS)是一种有前途的波形,它在高速移动场景中充分利用时域频率多样性,使用基于离散傅里叶变换(DFT)的二维扩散。然而,它需要让处理延迟与性能之间进行权衡,并且可能不符合一些服务的串行时间要求。这种情况驱使我们设计一种混合帧结构,其中OTFS和Orthogonal Frequency Division Multiplexing(OFDM)在时域上相互对抗,以适应多样性和延迟两个任务。然而,我们发现在通道对接后,这种一致性被破坏,因此我们提供了实用的算法来 Mitigate the inter symbol interference(ISI)between OTFS和OFDM,并且数学结果证明了混合帧结构的有效性。

eess.AS - 2023-11-20

How does end-to-end speech recognition training impact speech enhancement artifacts?

  • paper_url: http://arxiv.org/abs/2311.11599
  • repo_url: None
  • paper_authors: Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri
  • for: 本研究旨在探讨语音增强(SE)前端与自动语音识别(ASR)后端的联合训练,对增强信号的信号级特征有何影响。
  • methods: 研究将SE前端与ASR后端联合训练,以减轻单通道SE处理所产生的处理失真对ASR的影响,并从分解出的噪声误差与伪影(artifact)误差的角度分析增强信号。
  • results: 结果表明,在ASR层面训练SE前端可以降低伪影误差,但会增加噪声误差;此外,简单地对增强信号与观测信号进行插值,也能在不联合修改SE与ASR模块的情况下达到类似效果并提升ASR性能。这些结果加深了对联合训练作用的理解,并为设计与ASR无关的SE前端提供了新思路。
    Abstract Jointly training a speech enhancement (SE) front-end and an automatic speech recognition (ASR) back-end has been investigated as a way to mitigate the influence of \emph{processing distortion} generated by single-channel SE on ASR. In this paper, we investigate the effect of such joint training on the signal-level characteristics of the enhanced signals from the viewpoint of the decomposed noise and artifact errors. The experimental analyses provide two novel findings: 1) ASR-level training of the SE front-end reduces the artifact errors while increasing the noise errors, and 2) simply interpolating the enhanced and observed signals, which achieves a similar effect of reducing artifacts and increasing noise, improves ASR performance without jointly modifying the SE and ASR modules, even for a strong ASR back-end using a WavLM feature extractor. Our findings provide a better understanding of the effect of joint training and a novel insight for designing an ASR agnostic SE front-end.
    摘要 将语音增强(SE)前端与自动语音识别(ASR)后端联合训练,被视为缓解单通道SE所产生的处理失真对ASR影响的一种途径。本文从分解出的噪声误差与伪影误差的角度,研究了这种联合训练对增强信号的信号级特征的影响。实验分析给出了两个新发现:1)在ASR层面训练SE前端可以降低伪影误差,但会增加噪声误差;2)简单地对增强信号与观测信号进行插值即可取得类似的降低伪影、增加噪声的效果,并在不联合修改SE与ASR模块的情况下提升ASR性能,即使ASR后端使用强大的WavLM特征提取器也是如此。我们的发现加深了对联合训练作用的理解,并为设计与ASR无关的SE前端提供了新的视角。
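The simple interpolation remedy described above can be sketched in a few lines; alpha below is an assumed mixing weight, not a value taken from the paper.

```python
import numpy as np

def interpolate_for_asr(enhanced: np.ndarray, observed: np.ndarray, alpha: float = 0.8) -> np.ndarray:
    """Mix the enhanced signal back with the observed (noisy) signal before ASR,
    trading artifact errors for more benign noise errors."""
    assert enhanced.shape == observed.shape
    return alpha * enhanced + (1.0 - alpha) * observed

# usage: asr_input = interpolate_for_asr(se_output, noisy_mixture, alpha=0.8)
```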

Neural network-based virtual microphone estimation with virtual microphone and beamformer-level multi-task loss

  • paper_url: http://arxiv.org/abs/2311.11595
  • repo_url: None
  • paper_authors: Hanako Segawa, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Shoko Araki, Takeshi Yamada, Shoji Makino
  • for: 通过虚拟增加麦克风信号数量来提升阵列处理性能
  • methods: 使用神经网络估计虚拟麦克风信号,并采用将虚拟麦克风(VM)级损失与波束形成器级损失相结合的多任务损失函数
  • results: 在多说话人欠定条件下,所提出的多任务NN-VME相比仅使用真实麦克风实现了33.1%的相对词错误率(WER)改善,相比先前的NN-VME方法改善10.8%
    Abstract Array processing performance depends on the number of microphones available. Virtual microphone estimation (VME) has been proposed to increase the number of microphone signals artificially. Neural network-based VME (NN-VME) trains an NN with a VM-level loss to predict a signal at a microphone location that is available during training but not at inference. However, this training objective may not be optimal for a specific array processing back-end, such as beamforming. An alternative approach is to use a training objective considering the array-processing back-end, such as a loss on the beamformer output. This approach may generate signals optimal for beamforming but not physically grounded. To combine the advantages of both approaches, this paper proposes a multi-task loss for NN-VME that combines both VM-level and beamformer-level losses. We evaluate the proposed multi-task NN-VME on multi-talker underdetermined conditions and show that it achieves a 33.1 % relative WER improvement compared to using only real microphones and 10.8 % compared to using a prior NN-VME approach.
    摘要 阵列处理性能取决于可用的麦克风数量。虚拟麦克风估计(VME)被提出用来人为增加麦克风信号的数量。基于神经网络的VME(NN-VME)使用VM级损失训练神经网络,以预测某个在训练时可用、但在推理时不可用的麦克风位置上的信号。然而,这一训练目标对特定的阵列处理后端(例如波束形成)未必最优。另一种做法是采用考虑阵列处理后端的训练目标,例如对波束形成器输出施加损失;这种方法可能生成适合波束形成的信号,但缺乏物理依据。为了结合两种方法的优点,本文为NN-VME提出了一种多任务损失函数,将VM级损失与波束形成器级损失相结合。我们在多说话人欠定条件下进行评估,结果显示该方法相比仅使用真实麦克风实现了33.1%的相对词错误率(WER)改善,相比先前的NN-VME方法改善10.8%。
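A hedged sketch of how a VM-level loss and a beamformer-level loss could be combined into one multi-task objective. The delay-and-sum beamformer and the weight lambda_bf below are simplifications chosen for illustration, not the paper's actual back-end or hyperparameters.

```python
import numpy as np

def vm_level_loss(vm_est, vm_ref):
    return np.mean(np.abs(vm_est - vm_ref) ** 2)

def beamformer_level_loss(real_mics, vm_est, target):
    # real_mics: (n_mics, T); stack the estimated VM channel and beamform.
    stacked = np.vstack([real_mics, vm_est[None, :]])
    bf_out = stacked.mean(axis=0)                 # crude delay-and-sum stand-in
    return np.mean(np.abs(bf_out - target) ** 2)

def multitask_loss(real_mics, vm_est, vm_ref, target, lambda_bf=0.5):
    return vm_level_loss(vm_est, vm_ref) + lambda_bf * beamformer_level_loss(real_mics, vm_est, target)

# toy usage with random signals
T = 1000
mics, vm_true, clean = np.random.randn(4, T), np.random.randn(T), np.random.randn(T)
vm_hat = vm_true + 0.1 * np.random.randn(T)
print(multitask_loss(mics, vm_hat, vm_true, clean))
```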

APNet2: High-quality and High-efficiency Neural Vocoder with Direct Prediction of Amplitude and Phase Spectra

  • paper_url: http://arxiv.org/abs/2311.11545
  • repo_url: None
  • paper_authors: Hui-Peng Du, Ye-Xin Lu, Yang Ai, Zhen-Hua Ling
  • for: 提高高质量语音合成的实用性
  • methods: 采用ConvNeXt v2作为幅度与相位预测的骨干网络,并在GAN损失中引入多分辨率判别器(MRD),同时优化部分损失函数的形式
  • results: 在常见配置下(采样率22.05kHz、帧移256点,约11.6ms),所提出的APNet2 vocoder优于原始APNet和Vocos,合成语音质量与HiFi-GAN和iSTFTNet相当,同时推理速度显著更快
    Abstract In our previous work, we proposed a neural vocoder called APNet, which directly predicts speech amplitude and phase spectra with a 5 ms frame shift in parallel from the input acoustic features, and then reconstructs the 16 kHz speech waveform using inverse short-time Fourier transform (ISTFT). APNet demonstrates the capability to generate synthesized speech of comparable quality to the HiFi-GAN vocoder but with a considerably improved inference speed. However, the performance of the APNet vocoder is constrained by the waveform sampling rate and spectral frame shift, limiting its practicality for high-quality speech synthesis. Therefore, this paper proposes an improved iteration of APNet, named APNet2. The proposed APNet2 vocoder adopts ConvNeXt v2 as the backbone network for amplitude and phase predictions, expecting to enhance the modeling capability. Additionally, we introduce a multi-resolution discriminator (MRD) into the GAN-based losses and optimize the form of certain losses. At a common configuration with a waveform sampling rate of 22.05 kHz and spectral frame shift of 256 points (i.e., approximately 11.6ms), our proposed APNet2 vocoder outperformed the original APNet and Vocos vocoders in terms of synthesized speech quality. The synthesized speech quality of APNet2 is also comparable to that of HiFi-GAN and iSTFTNet, while offering a significantly faster inference speed.
    摘要 在我们之前的工作中,我们提出了一种神经 vocoder called APNet,它直接预测了speech 幅和相位 спектrum 的 5 ms 帧shift 并在并行地从输入语音特征中预测,然后使用 inverse short-time Fourier transform (ISTFT) 重建 16 kHz 语音波形。APNet 表现出了能够生成与 HiFi-GAN vocoder 相当的质量的合成语音,但是它的推理速度明显提高。然而,APNet 的性能受到波形采样率和 spectral frame shift 的限制,限制了其实际应用的质量。因此,这篇文章提出了 APNet2 vocoder。我们的提议的 APNet2 vocoder 采用 ConvNeXt v2 作为幅和相位预测的背bone 网络,以提高模型的能力。此外,我们还引入了 multi-resolution discriminator (MRD) 到 GAN-based 损失中,并优化了某些损失的形式。在一般配置下(即波形采样率为 22.05 kHz,spectral frame shift 为 256点,约为 11.6ms),我们的提议的 APNet2 vocoder 在与原始 APNet 和 Vocos vocoders 的比较中表现出了较好的合成语音质量。同时,APNet2 的合成语音质量也与 HiFi-GAN 和 iSTFTNet 相当,而且具有了显著 faster 的推理速度。
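The final synthesis step shared by APNet-style vocoders, rebuilding the waveform from predicted amplitude and phase spectra via inverse STFT, can be sketched with SciPy. The spectra below are random stand-ins for network outputs; only the sampling rate and 256-point frame shift follow the configuration quoted in the abstract.

```python
import numpy as np
from scipy.signal import istft

fs, n_fft, hop = 22050, 1024, 256
frames, bins = 80, n_fft // 2 + 1
log_amp = np.random.randn(bins, frames) * 0.1 - 2.0     # stand-in for predicted log-amplitude
phase = np.random.uniform(-np.pi, np.pi, size=(bins, frames))  # stand-in for predicted phase

spec = np.exp(log_amp) * np.exp(1j * phase)             # complex short-time spectrum
_, wav = istft(spec, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
print("synthesised", wav.shape[0] / fs, "seconds of audio")
```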

cs.CV - 2023-11-20

HandSight: DeCAF & Improved Fisher Vectors to Classify Clothing Color and Texture with a Finger-Mounted Camera

  • paper_url: http://arxiv.org/abs/2311.12225
  • repo_url: None
  • paper_authors: Alexander J. Medeiros, Lee Stearns, Jon E. Froehlich
  • for: 解决盲人每天选衣问题,使用手指搭载摄像头和现代分类算法。
  • methods: 使用DeCAF和改进的Fisher Vector图像特征进行服装纹理分类。
  • results: 获得 >95% 的准确率,并提供了一个大量的图像 dataset(HCTD)和现代分类算法的评估。
    Abstract We demonstrate the use of DeCAF and Improved Fisher Vector image features to classify clothing texture. The issue of choosing clothes is a problem for the blind every day. This work attempts to solve the issue with a finger-mounted camera and state-of-the-art classification algorithms. To evaluate our solution, we collected 520 close-up images across 29 pieces of clothing. We contribute (1) the HCTD, an image dataset taken with a NanEyeGS camera, a camera small enough to be mounted on the finger, and (2) evaluations of state-of-the-art recognition algorithms applied to our dataset - achieving an accuracy >95%. Throughout the paper, we will discuss previous work, evaluate the current work, and finally, suggest the project's future direction.
    摘要 我们展示了使用DeCAF和改进的Fisher Vector图像特征来分类服装纹理。每天挑选衣服对盲人来说是一个难题,本工作尝试通过指尖佩戴式摄像头和最先进的分类算法来解决这一问题。为评估我们的方案,我们收集了覆盖29件服装的520张特写图像。我们的贡献包括:(1)HCTD数据集,由可安装在手指上的NanEyeGS微型摄像头拍摄;(2)在该数据集上评估了最先进的识别算法,准确率超过95%。文中我们还讨论了已有工作、评估了当前工作,并提出了项目的未来方向。
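A minimal stand-in for the classification stage, assuming pre-extracted DeCAF-like deep features and a linear classifier from scikit-learn. The features, labels, and split are random placeholders for the HCTD images, so the printed accuracy is meaningless except as a usage example.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

X = np.random.randn(520, 4096)          # 520 close-ups x 4096-d CNN activations (placeholder)
y = np.random.randint(0, 29, size=520)  # 29 clothing items

clf = make_pipeline(Normalizer(norm="l2"), LinearSVC(C=1.0, max_iter=5000))
clf.fit(X[:400], y[:400])
print("held-out accuracy (random data, illustration only):", clf.score(X[400:], y[400:]))
```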

DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation

  • paper_url: http://arxiv.org/abs/2311.12194
  • repo_url: None
  • paper_authors: Yifei Li, Hsiao-yu Chen, Egor Larionov, Nikolaos Sarafianos, Wojciech Matusik, Tuur Stuyck
  • for: 这个论文的目的是提高数字人体的现实感,以便实现虚拟存在和自定义。
  • methods: 这个论文使用了差分 simulation 技术来实现人体和服装的共优化。
  • results: 实验结果表明,这个方法可以生成现实的服装和人体形状,可以方便地应用于下游应用程序。
    Abstract The realism of digital avatars is crucial in enabling telepresence applications with self-expression and customization. A key aspect of this realism originates from the physical accuracy of both a true-to-life body shape and clothing. While physical simulations can produce high-quality, realistic motions for clothed humans, they require precise estimation of body shape and high-quality garment assets with associated physical parameters for cloth simulations. However, manually creating these assets and calibrating their parameters is labor-intensive and requires specialized expertise. To address this gap, we propose DiffAvatar, a novel approach that performs body and garment co-optimization using differentiable simulation. By integrating physical simulation into the optimization loop and accounting for the complex nonlinear behavior of cloth and its intricate interaction with the body, our framework recovers body and garment geometry and extracts important material parameters in a physically plausible way. Our experiments demonstrate that our approach generates realistic clothing and body shape that can be easily used in downstream applications.
    摘要 现实化数字人物的重要性在推动虚拟存在应用中具有自我表达和定制功能。一个关键的这种现实性来自于真实的身体形状和服装的物理准确性。虽然物理模拟可以生成高质量、现实的人类运动,但它们需要精准地估计身体形状和高质量的服装资产,并且需要特殊的专业知识来调整参数。为解决这个差距,我们提出了DiffAvatar,一种新的方法,它通过拥有可微的模拟来实现身体和服装的共优化。我们的框架将物理模拟集成到优化循环中,考虑到织物的复杂非线性行为和身体之间的细腻交互,从而实现了身体和服装的准确物理性。我们的实验表明,我们的方法可以生成现实的衣服和身体形状,可以方便地在下游应用中使用。

Disentangling Structure and Appearance in ViT Feature Space

  • paper_url: http://arxiv.org/abs/2311.12193
  • repo_url: None
  • paper_authors: Narek Tumanyan, Omer Bar-Tal, Shir Amir, Shai Bagon, Tali Dekel
  • for: The paper is written for transferring the visual appearance of one natural image to another, specifically for generating an image where objects in a source structure image are “painted” with the visual appearance of their semantically related objects in a target appearance image.
  • methods: The paper uses a pre-trained and fixed Vision Transformer (ViT) model to leverage semantic information and derive novel disentangled representations of structure and appearance. The objective function splices the desired structure and appearance representations together in the space of ViT features.
  • results: The paper demonstrates high-resolution results on a variety of in-the-wild image pairs, under significant variations in the number of objects, pose, and appearance, without requiring adversarial training or additional input information such as semantic segmentation or correspondences.
    Abstract We present a method for semantically transferring the visual appearance of one natural image to another. Specifically, our goal is to generate an image in which objects in a source structure image are "painted" with the visual appearance of their semantically related objects in a target appearance image. To integrate semantic information into our framework, our key idea is to leverage a pre-trained and fixed Vision Transformer (ViT) model. Specifically, we derive novel disentangled representations of structure and appearance extracted from deep ViT features. We then establish an objective function that splices the desired structure and appearance representations, interweaving them together in the space of ViT features. Based on our objective function, we propose two frameworks of semantic appearance transfer -- "Splice", which works by training a generator on a single and arbitrary pair of structure-appearance images, and "SpliceNet", a feed-forward real-time appearance transfer model trained on a dataset of images from a specific domain. Our frameworks do not involve adversarial training, nor do they require any additional input information such as semantic segmentation or correspondences. We demonstrate high-resolution results on a variety of in-the-wild image pairs, under significant variations in the number of objects, pose, and appearance. Code and supplementary material are available in our project page: splice-vit.github.io.
    摘要 我们提出了一种方法,可以将一个自然图像的视觉外观按语义迁移到另一个图像中。具体来说,我们的目标是生成一幅图像,使源结构图像中的物体被"涂抹"上目标外观图像中语义相关物体的视觉外观。为了将语义信息整合到我们的框架中,我们的核心思路是利用一个预训练且固定的视觉Transformer(ViT)模型,并从其深层特征中导出结构与外观的解耦表示。代码与补充材料见项目主页:splice-vit.github.io。

LABELMAKER: Automatic Semantic Label Generation from RGB-D Trajectories

  • paper_url: http://arxiv.org/abs/2311.12174
  • repo_url: None
  • paper_authors: Silvan Weder, Hermann Blum, Francis Engelmann, Marc Pollefeys
  • for: 该论文主要用于提供一种自动生成2D/3D标签框架,以便训练或评估视觉模型。
  • methods: 该框架基于多个最先进分割模型的集成与基于神经渲染的3D提升,可以自动生成高精度的2D/3D标签数据,不需要人工干预。
  • results: 对比手动标注的ScanNet数据集,该框架可以生成更高精度的标签数据,并自动标注了之前未标注的ARKitScenes数据集。
    Abstract Semantic annotations are indispensable to train or evaluate perception models, yet very costly to acquire. This work introduces a fully automated 2D/3D labeling framework that, without any human intervention, can generate labels for RGB-D scans at equal (or better) level of accuracy than comparable manually annotated datasets such as ScanNet. Our approach is based on an ensemble of state-of-the-art segmentation models and 3D lifting through neural rendering. We demonstrate the effectiveness of our LabelMaker pipeline by generating significantly better labels for the ScanNet datasets and automatically labelling the previously unlabeled ARKitScenes dataset. Code and models are available at https://labelmaker.org
    摘要 对于训练或评估感知模型,语义标注不可或缺,但获取成本很高。这项工作提出了一个完全自动化的2D/3D标注框架,无需任何人工干预,即可为RGB-D扫描生成准确率不低于(甚至优于)人工标注数据集(如ScanNet)的标签。我们的方法基于多个最先进分割模型的集成以及基于神经渲染的3D提升。我们通过为ScanNet数据集生成明显更好的标注,并自动标注此前未标注的ARKitScenes数据集,验证了LabelMaker流程的有效性。代码和模型见 https://labelmaker.org 。
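One simple way to realise the ensemble idea in 2D is per-pixel majority voting across models mapped to a common label space, sketched below in NumPy. This is only an illustration of the fusion step; LabelMaker's actual pipeline additionally lifts labels to 3D via neural rendering.

```python
import numpy as np

def fuse_labels(predictions: np.ndarray, num_classes: int) -> np.ndarray:
    """predictions: (n_models, H, W) integer label maps -> (H, W) fused map."""
    votes = np.zeros((num_classes,) + predictions.shape[1:], dtype=np.int32)
    for pred in predictions:
        # count one vote per pixel for the class this model predicted
        np.add.at(votes, (pred,) + tuple(np.indices(pred.shape)), 1)
    return votes.argmax(axis=0)

preds = np.random.randint(0, 5, size=(3, 4, 4))      # 3 models, tiny 4x4 example
print(fuse_labels(preds, num_classes=5))
```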

ChemScraper: Graphics Extraction, Molecular Diagram Parsing, and Annotated Data Generation for PDF Images

  • paper_url: http://arxiv.org/abs/2311.12161
  • repo_url: https://gitlab.com/dprl/graphics-extraction
  • paper_authors: Ayush Kumar Shah, Bryan Manrique Amador, Abhisek Dey, Ming Creekmore, Blake Ocampo, Scott Denmark, Richard Zanibbi
  • for: 这篇论文是为了提出一种将生成的PDF图像翻译为化学结构表示(CDXML)的方法。
  • methods: 这种方法使用了 born-digital PDF图像中的Explicit locations和 shapes来提取符号,然后应用简单的图形变换来捕捉图像和化学结构的视觉和化学结构。
  • results: 作者的方法可以快速(PDF $\rightarrow$ 视觉图 $\rightarrow$ 化学图)转化 born-digital PDF图像,并且不需要GPU、Optical Character Recognition(OCR)或vectorization。在标准准确性测试中,作者的方法可以与SMILES字符串进行高度准确的比较。
    Abstract Existing visual parsers for molecule diagrams translate pixel-based raster images such as PNGs to chemical structure representations (e.g., SMILES). However, PDFs created by word processors including \LaTeX{} and Word provide explicit locations and shapes for characters, lines, and polygons. We %introduce a method to extract symbols from born-digital PDF molecule images and then apply simple graph transformations to capture both visual and chemical structure in editable ChemDraw files (CDXML). Our fast ( PDF $\rightarrow$ visual graph $\rightarrow$ chemical graph ) pipeline does not require GPUs, Optical Character Recognition (OCR) or vectorization. We evaluate on standard benchmarks using SMILES strings, along with a novel evaluation that provides graph-based metrics and error compilation using LgEval. The geometric information in born-digital PDFs produces a highly accurate parser, motivating generating training data for visual parsers that recognize from raster images, with extracted graphics, visual structure, and chemical structure as annotations. To do this we render SMILES strings in Indigo, parse molecule structure, and then validate recognized structure to select correct files.
    摘要 现有的用于分子图的视觉解析器可以将基于像素的光栅图像(如PNG)转换为化学结构表示(如SMILES)。然而,由文字处理软件(如LaTeX和Word)生成的PDF文档提供了字符、线条和多边形的显式位置和形状信息。我们提出了一种方法,可以从原生数字(born-digital)PDF分子图像中提取符号,然后应用简单的图变换来同时捕捉视觉结构和化学结构,并将其保存为可编辑的ChemDraw文件(CDXML)。我们的快速(PDF $\rightarrow$ 视觉图 $\rightarrow$ 化学图)管道不需要GPU、光学字符识别(OCR)或矢量化。我们在标准基准上使用SMILES字符串进行评估,并提出了一种新的评估方法,使用LgEval给出基于图的指标和错误汇总。原生数字PDF中的几何信息使我们的解析器具有非常高的准确率,这促使我们为从光栅图像识别的视觉解析器生成训练数据,其中提取的图形、视觉结构和化学结构作为标注。为此,我们使用Indigo渲染SMILES字符串,解析分子结构,然后验证识别出的结构以选择正确的文件。
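The validation step at the end of the pipeline, checking that a recognised structure matches the source molecule, can be sketched with RDKit by comparing canonical SMILES. RDKit is an assumed dependency here for illustration, not part of ChemScraper's own code.

```python
from rdkit import Chem

def same_molecule(smiles_a: str, smiles_b: str) -> bool:
    """Canonicalise both SMILES strings and compare them."""
    mol_a, mol_b = Chem.MolFromSmiles(smiles_a), Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        return False
    return Chem.MolToSmiles(mol_a) == Chem.MolToSmiles(mol_b)

print(same_molecule("C1=CC=CC=C1", "c1ccccc1"))   # benzene written two ways -> True
```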

Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions

  • paper_url: http://arxiv.org/abs/2311.12157
  • repo_url: https://github.com/dimitris-christodoulou57/model-aware_3d_eye_gaze
  • paper_authors: Nikola Popovic, Dimitrios Christodoulou, Danda Pani Paudel, Xi Wang, Luc Van Gool
  • for: 这篇论文的目的是提出一种基于弱监督的3D视线估计方法,利用眼部语义分割掩码来预测3D视线。
  • methods: 该方法采用基于Transformer的网络架构,将眼部语义分割的弱监督与少量3D视线向量的直接监督相结合,以辅助3D眼球模型的拟合。
  • results: 实验结果表明,该方法在多种场景下具有显著优势:在仅使用训练图像0.05%的3D标注时,视线角度误差比基线方法低约5度。
    Abstract The task of predicting 3D eye gaze from eye images can be performed either by (a) end-to-end learning for image-to-gaze mapping or by (b) fitting a 3D eye model onto images. The former case requires 3D gaze labels, while the latter requires eye semantics or landmarks to facilitate the model fitting. Although obtaining eye semantics and landmarks is relatively easy, fitting an accurate 3D eye model on them remains to be very challenging due to its ill-posed nature in general. On the other hand, obtaining large-scale 3D gaze data is cumbersome due to the required hardware setups and computational demands. In this work, we propose to predict 3D eye gaze from weak supervision of eye semantic segmentation masks and direct supervision of a few 3D gaze vectors. The proposed method combines the best of both worlds by leveraging large amounts of weak annotations--which are easy to obtain, and only a few 3D gaze vectors--which alleviate the difficulty of fitting 3D eye models on the semantic segmentation of eye images. Thus, the eye gaze vectors, used in the model fitting, are directly supervised using the few-shot gaze labels. Additionally, we propose a transformer-based network architecture, that serves as a solid baseline for our improvements. Our experiments in diverse settings illustrate the significant benefits of the proposed method, achieving about 5 degrees lower angular gaze error over the baseline, when only 0.05% 3D annotations of the training images are used. The source code is available at https://github.com/dimitris-christodoulou57/Model-aware_3D_Eye_Gaze.
    摘要 从眼部图像预测3D视线可以通过(a)端到端学习图像到视线的映射,或(b)将3D眼球模型拟合到图像上来实现。前者需要3D视线标签,后者需要眼部语义或关键点来辅助模型拟合。虽然获取眼部语义和关键点相对容易,但由于该问题本身的病态性,在其上拟合精确的3D眼球模型仍然非常困难;另一方面,受硬件条件和计算需求限制,获取大规模3D视线数据十分繁琐。在这项工作中,我们提出利用眼部语义分割掩码的弱监督和少量3D视线向量的直接监督来预测3D视线。该方法结合了两者的优点:利用大量易于获取的弱标注,同时仅以少量3D视线标签直接监督模型拟合中使用的视线向量,从而缓解在眼部语义分割上拟合3D眼球模型的困难。此外,我们提出了一种基于Transformer的网络架构,作为改进的坚实基线。多种设定下的实验表明了该方法的显著优势:在仅使用0.05%训练图像3D标注时,视线角度误差比基线低约5度。源代码见 https://github.com/dimitris-christodoulou57/Model-aware_3D_Eye_Gaze 。
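A hedged NumPy sketch of how the two supervision signals could be combined during training: pixel-wise cross-entropy on eye-part segmentation (weak labels) plus an angular loss on the few available 3D gaze vectors. Shapes and the weighting are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def angular_error_deg(pred, gt):
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def combined_loss(seg_logits, seg_labels, gaze_pred, gaze_gt, w_gaze=1.0):
    # cross-entropy over C classes per pixel (seg_logits: (C, H, W), seg_labels: (H, W))
    logp = seg_logits - np.log(np.exp(seg_logits).sum(axis=0, keepdims=True))
    ce = -np.mean(np.take_along_axis(logp, seg_labels[None], axis=0))
    gaze = np.mean(angular_error_deg(gaze_pred, gaze_gt)) if gaze_gt is not None else 0.0
    return ce + w_gaze * gaze

logits = np.random.randn(4, 8, 8); labels = np.random.randint(0, 4, (8, 8))
gaze_p, gaze_t = np.random.randn(5, 3), np.random.randn(5, 3)
print(combined_loss(logits, labels, gaze_p, gaze_t, w_gaze=0.1))
```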

Uncertainty Estimation in Contrast-Enhanced MR Image Translation with Multi-Axis Fusion

  • paper_url: http://arxiv.org/abs/2311.12153
  • repo_url: None
  • paper_authors: Ivo M. Baltruschat, Parvaneh Janbakhshi, Melanie Dohmen, Matthias Lenga
  • for: 这个论文主要针对的是医学影像转换任务中的知识不确定性评估。
  • methods: 该论文提出了一种基于多视角图像数据的多轴融合(MAF)模型不确定性评估方法。
  • results: 在基于原生T1、T2和T2-FLAIR扫描合成对比增强T1加权图像的任务中,该方法的平均绝对合成误差与平均不确定性得分之间表现出很强的相关性($\rho_{\text{healthy}} = 0.89$)。
    Abstract In recent years, deep learning has been applied to a wide range of medical imaging and image processing tasks. In this work, we focus on the estimation of epistemic uncertainty for 3D medical image-to-image translation. We propose a novel model uncertainty quantification method, Multi-Axis Fusion (MAF), which relies on the integration of complementary information derived from multiple views on volumetric image data. The proposed approach is applied to the task of synthesizing contrast enhanced T1-weighted images based on native T1, T2 and T2-FLAIR scans. The quantitative findings indicate a strong correlation ($\rho_{\text healthy} = 0.89$) between the mean absolute image synthetization error and the mean uncertainty score for our MAF method. Hence, we consider MAF as a promising approach to solve the highly relevant task of detecting synthetization failures at inference time.
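The multi-axis fusion idea can be sketched as follows: one prediction per viewing axis, their mean as the fused output, the across-axis spread as the uncertainty map, and a rank correlation against the actual error as one way to quantify the relationship reported in the paper. The volumes below are synthetic placeholders.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(4)
target = rng.random((32, 32, 32))                       # "true" contrast-enhanced volume
views = np.stack([target + rng.normal(0, s, target.shape) for s in (0.05, 0.10, 0.15)])

fused = views.mean(axis=0)                              # multi-axis fusion
uncertainty = views.std(axis=0)                         # per-voxel uncertainty proxy
error = np.abs(fused - target)
rho, _ = spearmanr(uncertainty.ravel(), error.ravel())
print(f"rank correlation between uncertainty and error: {rho:.2f}")
```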

Applications of Large Scale Foundation Models for Autonomous Driving

  • paper_url: http://arxiv.org/abs/2311.12144
  • repo_url: None
  • paper_authors: Yu Huang, Yue Chen, Zhu Li
  • for: 本研究旨在应用基础模型和大语言模型(LLM)于自动驾驶系统中,以解决现有AI长尾问题。
  • methods: 本研究使用了基础模型和LLM,包括模拟、世界模型、数据标注和观念规划等方法。
  • results: 本研究发现,通过结合基础模型和LLM,可以将人类知识、常识和推理应用到自动驾驶系统中,从而解决现有AI长尾问题。
    Abstract Since DARPA Grand Challenges (rural) in 2004/05 and Urban Challenges in 2007, autonomous driving has been the most active field of AI applications. Recently powered by large language models (LLMs), chat systems, such as chatGPT and PaLM, emerge and rapidly become a promising direction to achieve artificial general intelligence (AGI) in natural language processing (NLP). There comes a natural thinking that we could employ these abilities to reformulate autonomous driving. By combining LLM with foundation models, it is possible to utilize the human knowledge, commonsense and reasoning to rebuild autonomous driving systems from the current long-tailed AI dilemma. In this paper, we investigate the techniques of foundation models and LLMs applied for autonomous driving, categorized as simulation, world model, data annotation and planning or E2E solutions etc.

Fingerspelling PoseNet: Enhancing Fingerspelling Translation with Pose-Based Transformer Models

  • paper_url: http://arxiv.org/abs/2311.12128
  • repo_url: None
  • paper_authors: Pooya Fayyazsanavi, Negar Nejatishahidin, Jana Kosecka
  • for: 本研究旨在提高美国手语指写翻译的精度,使用视频在野进行 fingerspelling 翻译。
  • methods: 该研究利用了更加准确的手姿估计技术,并提出了一种基于 transformer 编码器-解码器模型的新型建议,允许无缝上下文ual word 翻译。此外,研究还添加了一种新的损失函数,以准确预测手写字符串的长度,从而提高训练和推测的性能。
  • results: 通过了广泛的实验,研究人员表明了其提议的方法在 ChicagoFSWild 和 ChicagoFSWild+ 上的超过 10% 的相对改进。这些结果表明了该方法的有效性,并且可能推动手语翻译的进步。代码可以在 https://github.com/pooyafayyaz/Fingerspelling-PoseNet 上获取。
    Abstract We address the task of American Sign Language fingerspelling translation using videos in the wild. We exploit advances in more accurate hand pose estimation and propose a novel architecture that leverages the transformer based encoder-decoder model enabling seamless contextual word translation. The translation model is augmented by a novel loss term that accurately predicts the length of the finger-spelled word, benefiting both training and inference. We also propose a novel two-stage inference approach that re-ranks the hypotheses using the language model capabilities of the decoder. Through extensive experiments, we demonstrate that our proposed method outperforms the state-of-the-art models on ChicagoFSWild and ChicagoFSWild+ achieving more than 10% relative improvement in performance. Our findings highlight the effectiveness of our approach and its potential to advance fingerspelling recognition in sign language translation. Code is also available at https://github.com/pooyafayyaz/Fingerspelling-PoseNet.
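A toy Python sketch of the second-stage re-ranking: each beam hypothesis is rescored with the decoder log-probability, a language-model score, and a penalty on deviating from the predicted word length. The weights and the stand-in LM are illustrative assumptions, not the paper's values.

```python
def rerank(hypotheses, lm_score, predicted_len, alpha=0.3, beta=0.1):
    """hypotheses: list of (text, decoder_logprob); lm_score: text -> logprob."""
    def score(text, dec_lp):
        return dec_lp + alpha * lm_score(text) - beta * abs(len(text) - predicted_len)
    return max(hypotheses, key=lambda h: score(*h))

hyps = [("HELLO", -2.1), ("HELO", -1.9), ("HELLOO", -2.4)]
toy_lm = lambda t: -0.2 * len(t)                      # stand-in language model
print(rerank(hyps, toy_lm, predicted_len=5))
```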

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.12092
  • repo_url: https://github.com/rohitgandikota/sliders
  • paper_authors: Rohit Gandikota, Joanna Materzynska, Tingrui Zhou, Antonio Torralba, David Bau
  • for: 这个论文的目的是创建可解释的概念滑块,以便在扩散模型中控制图像生成中的属性。
  • methods: 这个方法利用了一个低纬度参数方向,以实现一个概念的精确控制,同时尽量减少其他属性的干扰。这个滑块可以通过一小组的提示或样本图像来创建,因此可以为文本或视觉概念创建滑块。
  • results: 对比之前的编辑技术,我们的滑块显示出更强的target编辑和更低的干扰。我们还展示了滑块的组合和连续调整,以及在StyleGAN中的intuitive编辑。此外,我们发现我们的方法可以帮助解决扩散过程中的一些常见问题,如物体扭曲和手部扭曲。我们的代码、数据和训练滑块可以在https://sliders.baulab.info/上获取。
    Abstract We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models. Our approach identifies a low-rank parameter direction corresponding to one concept while minimizing interference with other attributes. A slider is created using a small set of prompts or sample images; thus slider directions can be created for either textual or visual concepts. Concept Sliders are plug-and-play: they can be composed efficiently and continuously modulated, enabling precise control over image generation. In quantitative experiments comparing to previous editing techniques, our sliders exhibit stronger targeted edits with lower interference. We showcase sliders for weather, age, styles, and expressions, as well as slider compositions. We show how sliders can transfer latents from StyleGAN for intuitive editing of visual concepts for which textual description is difficult. We also find that our method can help address persistent quality issues in Stable Diffusion XL including repair of object deformations and fixing distorted hands. Our code, data, and trained sliders are available at https://sliders.baulab.info/
    摘要 我们提出了一种创建可解释概念滑块的方法,用于精确控制扩散模型图像生成中的属性。我们的方法为单个概念确定一个低秩参数方向,同时尽量减少对其他属性的干扰。滑块只需一小组提示词或样本图像即可创建,因此既可针对文本概念也可针对视觉概念。概念滑块即插即用,可以高效组合并连续调节,从而实现对图像生成的精确控制。在与已有编辑技术的量化对比实验中,我们的滑块表现出更强的目标编辑效果和更低的干扰。我们展示了天气、年龄、风格和表情等滑块,以及滑块的组合使用。此外,我们发现该方法有助于解决Stable Diffusion XL中长期存在的质量问题,例如修复物体变形和扭曲的手部。我们的代码、数据和训练好的滑块可在 https://sliders.baulab.info/ 获取。
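A minimal NumPy sketch of how a learned low-rank slider direction can modulate a frozen weight matrix at inference time, LoRA-style. The shapes, scale range, and random matrices are illustrative placeholders, not the trained sliders released by the authors.

```python
import numpy as np

d_out, d_in, rank = 320, 768, 4
W = np.random.randn(d_out, d_in) * 0.02            # frozen base weight
A = np.random.randn(rank, d_in) * 0.02             # learned down-projection (placeholder)
B = np.random.randn(d_out, rank) * 0.02            # learned up-projection (placeholder)

def apply_slider(x, scale):
    """x: (d_in,) activation; scale (e.g. in [-4, 4]) strengthens or weakens the concept."""
    return (W + scale * (B @ A)) @ x

x = np.random.randn(d_in)
for s in (-2.0, 0.0, 2.0):
    print(s, np.linalg.norm(apply_slider(x, s) - W @ x))   # deviation grows with |scale|
```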

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

  • paper_url: http://arxiv.org/abs/2311.12024
  • repo_url: None
  • paper_authors: Peng Wang, Hao Tan, Sai Bi, Yinghao Xu, Fujun Luan, Kalyan Sunkavalli, Wenping Wang, Zexiang Xu, Kai Zhang
  • for: 这个论文是为了重构3D对象从几个无法定位的图像中,同时估计相机pose,并在1.3秒钟内完成这个任务。
  • methods: 这个方法使用了自注意机制来交换3D对象token和2D图像token之间的信息,预测每个视角的粗略点云,然后使用可导Perspective-n-Point(PnP)解决器来获取相机pose。
  • results: 当训练在大量多视图定位数据上(约1M个对象)时,PF-LRM表现出了强泛化能力,并在不同评估数据集上超过基eline方法的姿态预测精度和3D重建质量。此外,我们还证明了这个模型在文本/图像-to-3D任务中的应用性,通过快速前向推理。更多信息请访问我们的项目网站:https://totoro97.github.io/pf-lrm。
    Abstract We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images even with little visual overlap, while simultaneously estimating the relative camera poses in ~1.3 seconds on a single A100 GPU. PF-LRM is a highly scalable method utilizing the self-attention blocks to exchange information between 3D object tokens and 2D image tokens; we predict a coarse point cloud for each view, and then use a differentiable Perspective-n-Point (PnP) solver to obtain camera poses. When trained on a huge amount of multi-view posed data of ~1M objects, PF-LRM shows strong cross-dataset generalization ability, and outperforms baseline methods by a large margin in terms of pose prediction accuracy and 3D reconstruction quality on various unseen evaluation datasets. We also demonstrate our model's applicability in downstream text/image-to-3D task with fast feed-forward inference. Our project website is at: https://totoro97.github.io/pf-lrm .
    摘要 我们提出了一种无 pose 大型重建模型(PF-LRM),可以从几张无 pose 图像中重建3D对象,甚至在视觉重叠少的情况下,在单个 A100 GPU 上运行时间约为1.3秒。PF-LRM 是一种高度可扩展的方法,通过自我注意块来交换3D对象标记和2D图像标记之间的信息;我们预测每个视图中的粗略点云,然后使用可导式 Perspective-n-Point(PnP)解决方案来获取相机位置。当在大量多视图posed数据上训练时,PF-LRM 显示出强大的跨数据集泛化能力,并在不同评估数据集上超越基线方法的姿态预测精度和3D重建质量。我们还证明了我们的模型在文本/图像到3D任务中的应用性,通过快速 feed-forward 推理来实现。我们的项目网站位于:https://totoro97.github.io/pf-lrm .
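The pose-recovery step can be illustrated with a standard (non-differentiable) Perspective-n-Point solve from OpenCV on synthetic correspondences; the paper uses a differentiable PnP solver inside training, so this is only a conceptual stand-in with made-up intrinsics and pose.

```python
import numpy as np
import cv2

rng = np.random.default_rng(5)
pts3d = rng.uniform(-1, 1, size=(50, 3))                       # coarse per-view 3D points
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])    # assumed camera intrinsics
rvec_true = np.array([0.1, -0.2, 0.05])
tvec_true = np.array([0.0, 0.0, 4.0])
pts2d, _ = cv2.projectPoints(pts3d, rvec_true, tvec_true, K, None)   # their 2D projections

ok, rvec, tvec = cv2.solvePnP(pts3d, pts2d, K, None)           # recover the camera pose
print("recovered translation:", tvec.ravel())                  # approximately [0, 0, 4]
```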

DAS: A Deformable Attention to Capture Salient Information in CNNs

  • paper_url: http://arxiv.org/abs/2311.12091
  • repo_url: None
  • paper_authors: Farzad Salajegheh, Nader Asadi, Soroush Saryazdi, Sudhir Mudur
  • for: 提高图像识别和物体检测的性能
  • methods: 使用弹性卷积和分解卷积实现快速和简单的全 convolutional 方法 DAS,以增强模型对重要信息的访问
  • results: 在添加 DAS 到流行的 CNN 上进行图像分类和物体检测时,得到了性能提高(比如,Stanford Dogs 上的提高率为4.47%,ImageNet 上的提高率为1.91%,COCO AP 上的提高率为3.3%),超过了其他 CNN 注意机制的性能,使用相同或更少的 FLOPs。
    Abstract Convolutional Neural Networks (CNNs) excel in local spatial pattern recognition. For many vision tasks, such as object recognition and segmentation, salient information is also present outside CNN's kernel boundaries. However, CNNs struggle in capturing such relevant information due to their confined receptive fields. Self-attention can improve a model's access to global information but increases computational overhead. We present a fast and simple fully convolutional method called DAS that helps focus attention on relevant information. It uses deformable convolutions for the location of pertinent image regions and separable convolutions for efficiency. DAS plugs into existing CNNs and propagates relevant information using a gating mechanism. Compared to the O(n^2) computational complexity of transformer-style attention, DAS is O(n). Our claim is that DAS's ability to pay increased attention to relevant features results in performance improvements when added to popular CNNs for Image Classification and Object Detection. For example, DAS yields an improvement on Stanford Dogs (4.47%), ImageNet (1.91%), and COCO AP (3.3%) with base ResNet50 backbone. This outperforms other CNN attention mechanisms while using similar or less FLOPs. Our code will be publicly available.
    摘要 卷积神经网络(CNN)在本地空间模式识别方面表现出色。然而,对于许多视觉任务,如物体识别和分割,salient information 也存在外部 CNN 的核心 boundaries。然而,CNN 很难捕捉这些相关信息,因为它们的捕捉范围太窄。自我注意可以提高模型对全球信息的访问权,但是会增加计算开销。我们提出了一种快速、简单的全 convolutional 方法called DAS,它使用可变尺寸 convolution 来定位相关图像区域,并使用分解 convolution 来提高效率。DAS 可以与现有 CNN 集成,并通过阀门机制将相关信息传递给下一层。相比于 transformer 样式的注意力计算复杂度 O(n^2),DAS 的计算复杂度为 O(n)。我们的主张是,DAS 能够增加对相关特征的注意力,会在添加到流行的 CNN 上进行图像分类和物体检测中提高性能。例如,DAS 在 Stanford Dogs 上获得了 4.47% 的改进,在 ImageNet 上获得了 1.91% 的改进,并在 COCO AP 上获得了 3.3% 的改进。这超过了其他 CNN 注意力机制,同时使用相同或更少的 FLOPs。我们将代码公开。
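A simplified PyTorch sketch of the gating idea: a cheap separable-convolution branch produces a per-pixel attention map that rescales the backbone features. Note that the real DAS module uses deformable convolutions to sample salient locations; a plain depthwise convolution stands in here for brevity, so this is an assumption-laden illustration, not the authors' module.

```python
import torch
import torch.nn as nn

class GatedAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, 1, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.pointwise(self.depthwise(x)))   # (B, 1, H, W) attention map
        return x * gate                                            # emphasise salient locations

feats = torch.randn(2, 64, 32, 32)
print(GatedAttention(64)(feats).shape)     # torch.Size([2, 64, 32, 32])
```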

FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation

  • paper_url: http://arxiv.org/abs/2311.12090
  • repo_url: None
  • paper_authors: Chenliang Zhou, Fangcheng Zhong, Param Hanji, Zhilin Guo, Kyle Fogarty, Alejandro Sztrajman, Hongyun Gao, Cengiz Oztireli
  • for: 该论文旨在提出一种将变分自编码器(VAE)与去噪扩散概率模型(DDPM)结合的点云生成管线,在保持高计算效率的同时,实现高质量、高多样性且点数可变的点云生成。
  • methods: 该管线使用了一种新的频率校正模块,通过球谐函数在学习点云分布的同时保留高频内容;此外,还使用潜空间DDPM来学习经正则化但仍然复杂的潜在分布,并将点的采样表述为以潜在形状分布为条件的条件分布,从而支持可变点数。
  • results: 与现有方法相比,该管线在质量、多样性和计算效率方面均达到了最先进(state-of-the-art)水平,我们的定量与定性结果均证明了FrePolad的有效性。
    Abstract We propose FrePolad: frequency-rectified point latent diffusion, a point cloud generation pipeline integrating a variational autoencoder (VAE) with a denoising diffusion probabilistic model (DDPM) for the latent distribution. FrePolad simultaneously achieves high quality, diversity, and flexibility in point cloud cardinality for generation tasks while maintaining high computational efficiency. The improvement in generation quality and diversity is achieved through (1) a novel frequency rectification module via spherical harmonics designed to retain high-frequency content while learning the point cloud distribution; and (2) a latent DDPM to learn the regularized yet complex latent distribution. In addition, FrePolad supports variable point cloud cardinality by formulating the sampling of points as conditional distributions over a latent shape distribution. Finally, the low-dimensional latent space encoded by the VAE contributes to FrePolad's fast and scalable sampling. Our quantitative and qualitative results demonstrate the state-of-the-art performance of FrePolad in terms of quality, diversity, and computational efficiency.
    摘要 我们提出了FrePolad:一种针对潜在分布将变分自编码器(VAE)与去噪扩散概率模型(DDPM)相结合的点云生成管线。FrePolad在保持高计算效率的同时,兼顾生成任务中的高质量、多样性和点数可变性。生成质量与多样性的提升来自两点:(1)一个基于球谐函数设计的新型频率校正模块,在学习点云分布的同时保留高频内容;(2)一个潜空间DDPM,用于学习经正则化但仍然复杂的潜在分布。此外,FrePolad通过将点的采样表述为以潜在形状分布为条件的条件分布,支持可变的点云点数。最后,VAE编码得到的低维潜空间使FrePolad的采样快速且可扩展。定量与定性结果均表明FrePolad在质量、多样性和计算效率方面达到了最先进水平。

LiDAR-HMR: 3D Human Mesh Recovery from LiDAR

  • paper_url: http://arxiv.org/abs/2311.11971
  • repo_url: https://github.com/soullessrobot/lidar-hmr
  • paper_authors: Bohao Fan, Wenzhao Zheng, Jianjiang Feng, Jie Zhou
  • for: 这篇论文旨在从稀疏的LiDAR点云中估计3D人体网格。
  • methods: 论文提出了一种有效的稀疏到稠密重建方案:先估计人体的稀疏表示(3D人体姿态),再逐步重建人体网格;为更好地利用点云的3D结构信息,在稀疏到稠密重建过程中采用级联图Transformer(graphormer)引入点云特征。
  • results: 在三个公开数据库上的实验结果证明了该方法的有效性。
    Abstract In recent years, point cloud perception tasks have been garnering increasing attention. This paper presents the first attempt to estimate 3D human body mesh from sparse LiDAR point clouds. We found that the major challenge in estimating human pose and mesh from point clouds lies in the sparsity, noise, and incompletion of LiDAR point clouds. Facing these challenges, we propose an effective sparse-to-dense reconstruction scheme to reconstruct 3D human mesh. This involves estimating a sparse representation of a human (3D human pose) and gradually reconstructing the body mesh. To better leverage the 3D structural information of point clouds, we employ a cascaded graph transformer (graphormer) to introduce point cloud features during sparse-to-dense reconstruction. Experimental results on three publicly available databases demonstrate the effectiveness of the proposed approach. Code: https://github.com/soullessrobot/LiDAR-HMR/
    摘要 近年来,点云感知任务受到越来越多的关注。本文首次尝试从稀疏的LiDAR点云中估计3D人体网格。我们发现,从点云估计人体姿态与网格的主要挑战在于LiDAR点云的稀疏、噪声与不完整。针对这些挑战,我们提出了一种有效的稀疏到稠密重建方案:先估计人体的稀疏表示(3D人体姿态),再逐步重建人体网格。为更好地利用点云的3D结构信息,我们在稀疏到稠密重建过程中采用级联图Transformer(graphormer)引入点云特征。在三个公开数据库上的实验结果证明了所提方法的有效性。Code: https://github.com/soullessrobot/LiDAR-HMR/

SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks

  • paper_url: http://arxiv.org/abs/2311.11969
  • repo_url: https://github.com/OpenGVLab/SAM-Med2D
  • paper_authors: Jin Ye, Junlong Cheng, Jianpin Chen, Zhongying Deng, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Min Zhu, Shaoting Zhang, Junjun He, Yu Qiao
  • for: The paper is written for developing medical artificial intelligence for enhancing diagnosis, medical image analysis, knowledge sharing, and education.
  • methods: The paper introduces SA-Med2D-20M, a large-scale segmentation dataset of 2D medical images built upon numerous public and private datasets, which consists of 4.6 million 2D medical images and 19.7 million corresponding masks covering almost the whole body and showing significant diversity.
  • results: The paper presents comprehensive statistics of SA-Med2D-20M to facilitate the better use of the dataset, which can help researchers build medical vision foundation models or apply their models to downstream medical applications.
    Abstract Segment Anything Model (SAM) has achieved impressive results for natural image segmentation with input prompts such as points and bounding boxes. Its success largely owes to massive labeled training data. However, directly applying SAM to medical image segmentation cannot perform well because SAM lacks medical knowledge -- it does not use medical images for training. To incorporate medical knowledge into SAM, we introduce SA-Med2D-20M, a large-scale segmentation dataset of 2D medical images built upon numerous public and private datasets. It consists of 4.6 million 2D medical images and 19.7 million corresponding masks, covering almost the whole body and showing significant diversity. This paper describes all the datasets collected in SA-Med2D-20M and details how to process these datasets. Furthermore, comprehensive statistics of SA-Med2D-20M are presented to facilitate the better use of our dataset, which can help the researchers build medical vision foundation models or apply their models to downstream medical applications. We hope that the large scale and diversity of SA-Med2D-20M can be leveraged to develop medical artificial intelligence for enhancing diagnosis, medical image analysis, knowledge sharing, and education. The data with the redistribution license is publicly available at https://github.com/OpenGVLab/SAM-Med2D.
    摘要 Segment Anything Model(SAM)凭借点和边界框等输入提示,在自然图像分割上取得了出色的结果,其成功很大程度上归功于海量的标注训练数据。然而,直接将SAM应用于医疗图像分割效果不佳,因为SAM缺乏医疗知识:它并未使用医疗图像进行训练。为了将医疗知识融入SAM,我们构建了SA-Med2D-20M,一个基于多个公共和私有数据集的大规模2D医疗图像分割数据集。它包含460万张2D医疗图像和1970万个对应掩码,几乎覆盖全身并具有显著的多样性。本文描述了SA-Med2D-20M所收集的全部数据集,并详细介绍了这些数据集的处理方式。此外,我们还给出了SA-Med2D-20M的全面统计信息,以便更好地使用该数据集,帮助研究人员构建医疗视觉基础模型或将模型应用于下游医疗任务。我们希望SA-Med2D-20M的大规模与多样性能够推动医疗人工智能的发展,以提升诊断、医疗图像分析、知识共享和教育。带有可再分发许可的数据已公开,可在 https://github.com/OpenGVLab/SAM-Med2D 获取。

What Can AutoML Do For Continual Learning?

  • paper_url: http://arxiv.org/abs/2311.11963
  • repo_url: None
  • paper_authors: Mert Kilickaya, Joaquin Vanschoren
  • for: 该论文探讨了AutoML在逐步学习中的潜在应用,具体来说是如何使用AutoML方法来促进逐步学习的更多研究。
  • methods: 该论文不直接提出新方法,而是通过提出“什么样的AutoML方法可以用于逐步学习”这个问题,探讨了三个关键的研究方向,即使用AutoML方法来实现更动态的逐步学习,挑战了AutoML研究领域的新问题。
  • results: 该论文未直接提出新方法,但是通过探讨AutoML在逐步学习中的应用,提出了三个关键的研究方向,这些研究方向可能会带来更多的研究和应用。
    Abstract This position paper outlines the potential of AutoML for incremental (continual) learning to encourage more research in this direction. Incremental learning involves incorporating new data from a stream of tasks and distributions to learn enhanced deep representations and adapt better to new tasks. However, a significant limitation of incremental learners is that most current techniques freeze the backbone architecture, hyperparameters, and the order & structure of the learning tasks throughout the learning and adaptation process. We strongly believe that AutoML offers promising solutions to address these limitations, enabling incremental learning to adapt to more diverse real-world tasks. Therefore, instead of directly proposing a new method, this paper takes a step back by posing the question: "What can AutoML do for incremental learning?" We outline three key areas of research that can contribute to making incremental learners more dynamic, highlighting concrete opportunities to apply AutoML methods in novel ways as well as entirely new challenges for AutoML research.
    摘要 这篇立场论文阐述了AutoML在增量(持续)学习方面的潜力,以鼓励更多这一方向的研究。增量学习是指从一系列任务和分布的数据流中不断吸收新数据,以学习更好的深度表示并更好地适应新任务。然而,现有增量学习方法的一个重要限制是,它们在学习和适应过程中冻结了骨干架构、超参数以及学习任务的顺序和结构。我们坚信,AutoML可以为解决这些限制提供有前景的方案,使增量学习能够适应更加多样化的实际任务。因此,这篇论文没有直接提出一种新方法,而是退一步提出了一个问题:"AutoML能为增量学习做什么?"我们概述了三个关键研究方向,它们可以使增量学习更加动态,并指出了以新方式应用AutoML方法的具体机会,以及AutoML研究本身面临的全新挑战。

An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis

  • paper_url: http://arxiv.org/abs/2311.11919
  • repo_url: None
  • paper_authors: Aishwarya Agarwal, Srikrishna Karanam, Tripti Shukla, Balaji Vasan Srinivasan
  • for: 本文的主要目标是使用单个参考图像提取多个特征(如颜色、物体、布局、风格),并生成新样本以这些特征为条件。
  • methods: 本文提出了一种新的多特征倒拍算法(MATTE),该算法在DDPM模型层级和时间步级上同时学习多个嵌入,以提高特征分离。
  • results: 本文通过广泛的分析和实验表明,MATTE算法可以更好地分离多个特征,并且可以生成更高质量的样本。
    Abstract We consider the problem of constraining diffusion model outputs with a user-supplied reference image. Our key objective is to extract multiple attributes (e.g., color, object, layout, style) from this single reference image, and then generate new samples with them. One line of existing work proposes to invert the reference images into a single textual conditioning vector, enabling generation of new samples with this learned token. These methods, however, do not learn multiple tokens that are necessary to condition model outputs on the multiple attributes noted above. Another line of techniques expand the inversion space to learn multiple embeddings but they do this only along the layer dimension (e.g., one per layer of the DDPM model) or the timestep dimension (one for a set of timesteps in the denoising process), leading to suboptimal attribute disentanglement. To address the aforementioned gaps, the first contribution of this paper is an extensive analysis to determine which attributes are captured in which dimension of the denoising process. As noted above, we consider both the time-step dimension (in reverse denoising) as well as the DDPM model layer dimension. We observe that often a subset of these attributes are captured in the same set of model layers and/or across same denoising timesteps. For instance, color and style are captured across same U-Net layers, whereas layout and color are captured across same timestep stages. Consequently, an inversion process that is designed only for the time-step dimension or the layer dimension is insufficient to disentangle all attributes. This leads to our second contribution where we design a new multi-attribute inversion algorithm, MATTE, with associated disentanglement-enhancing regularization losses, that operates across both dimensions and explicitly leads to four disentangled tokens (color, style, layout, and object).
    摘要 我们考虑到将散布模型的输出受限于使用者提供的参考图像。我们的主要目标是从这个单一的参考图像中提取多个特征(例如颜色、物件、布局、Style),然后产生新的样本。现有的方法包括将参考图像反射为单一的文本条件 vector,以便产生新的样本。但这些方法不会学习多个条件,以调控模型的输出。另一些方法则是扩展反射空间,以学习多个嵌入,但是这些方法只是在层级(例如 DDPM 模型的层)或时间步(一组时间步骤)上进行对应,这会导致不够好的特征分离。为了解决这些问题,我们的第一个贡献是对于哪些特征是在哪个维度中捕捉的,我们考虑了时间步骤维度(在逆推实验中)和 DDPM 模型层级。我们发现,一些特征通常是在同一些层级和/或同一些时间步骤中捕捉的。例如,颜色和风格是在同一些 U-Net 层中捕捉的,而布局和颜色则是在同一些时间步骤中捕捉的。因此,仅仅对于时间步骤维度或层级进行对应是不足以将所有特征分离。这导致我们的第二个贡献,即设计了一个新的多特征反射算法(MATTE),以及相应的分离提升训练损失,以确保四个分离的条件(颜色、风格、布局、物件)。
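
To make the two-dimensional (layer x timestep) conditioning concrete, here is a minimal PyTorch sketch of a token table holding separate learnable embeddings for color, style, layout and object, looked up by U-Net layer group and denoising stage. The grouping granularity, embedding size, and class names are assumptions; this is not the MATTE implementation.

```python
import torch
import torch.nn as nn

ATTRIBUTES = ["color", "style", "layout", "object"]

class MultiAttributeTokens(nn.Module):
    """Learnable per-attribute tokens, indexed by (layer_group, timestep_stage)."""
    def __init__(self, n_layer_groups=4, n_time_stages=3, dim=768):
        super().__init__()
        self.tokens = nn.ParameterDict({
            name: nn.Parameter(torch.randn(n_layer_groups, n_time_stages, dim) * 0.02)
            for name in ATTRIBUTES
        })

    def forward(self, layer_group: int, time_stage: int) -> torch.Tensor:
        # One token per attribute for this (layer, timestep) cell -> (4, dim)
        return torch.stack(
            [self.tokens[name][layer_group, time_stage] for name in ATTRIBUTES]
        )

tokens = MultiAttributeTokens()
cond = tokens(layer_group=2, time_stage=0)   # tokens fed to cross-attention at that cell
print(cond.shape)  # torch.Size([4, 768])
```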

Identifying the Defective: Detecting Damaged Grains for Cereal Appearance Inspection

  • paper_url: http://arxiv.org/abs/2311.11901
  • repo_url: https://github.com/hellodfan/ai4graininsp
  • paper_authors: Lei Fan, Yiwen Ding, Dongdong Fan, Yong Wu, Maurice Pagnucco, Yang Song
  • for: This paper aims to develop an automated Grain Appearance Inspection (GAI) system to improve the efficiency and accuracy of grain quality evaluation.
  • methods: The proposed system uses anomaly detection (AD) to identify damaged grains or unknown objects in grain kernels. The AD model, called AD-GAI, is trained using only normal samples and shows high performance in comparison with advanced AD methods.
  • results: The proposed system achieves a speedup of over 20x compared to human experts and shows highly consistent performance with human evaluation. A large-scale dataset of 220K high-quality images of wheat and maize kernels is created and released for future research.
    Abstract Cereal grain plays a crucial role in the human diet as a major source of essential nutrients. Grain Appearance Inspection (GAI) serves as an essential process to determine grain quality and facilitate grain circulation and processing. However, GAI is routinely performed manually by inspectors with cumbersome procedures, which poses a significant bottleneck in smart agriculture. In this paper, we endeavor to develop an automated GAI system:AI4GrainInsp. By analyzing the distinctive characteristics of grain kernels, we formulate GAI as a ubiquitous problem: Anomaly Detection (AD), in which healthy and edible kernels are considered normal samples while damaged grains or unknown objects are regarded as anomalies. We further propose an AD model, called AD-GAI, which is trained using only normal samples yet can identify anomalies during inference. Moreover, we customize a prototype device for data acquisition and create a large-scale dataset including 220K high-quality images of wheat and maize kernels. Through extensive experiments, AD-GAI achieves considerable performance in comparison with advanced AD methods, and AI4GrainInsp has highly consistent performance compared to human experts and excels at inspection efficiency over 20x speedup. The dataset, code and models will be released at https://github.com/hellodfan/AI4GrainInsp.
    摘要 谷物在人类饮食中扮演着重要角色,是必需营养素的主要来源。谷物外观检验(Grain Appearance Inspection, GAI)是评定谷物质量、促进谷物流通和加工的重要环节。然而,GAI 通常由检验员人工完成,流程繁琐,这成为智慧农业中的一大瓶颈。在这篇论文中,我们致力于开发一个自动化的 GAI 系统:AI4GrainInsp。通过分析谷物籽粒的独特特征,我们将 GAI 建模为一个普遍的问题:异常检测(AD),其中健康、可食用的籽粒被视为正常样本,而受损籽粒或未知物体被视为异常。我们进一步提出了一种 AD 模型,称为 AD-GAI,它仅使用正常样本进行训练,却能在推理时识别异常。此外,我们定制了一套用于数据采集的原型设备,并构建了一个包含 22 万张小麦和玉米籽粒高质量图像的大规模数据集。大量实验表明,AD-GAI 与先进的 AD 方法相比表现优异;AI4GrainInsp 的检测结果与人类专家高度一致,且检测效率提升 20 倍以上。数据集、代码和模型将在 https://github.com/hellodfan/AI4GrainInsp 发布。
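
The core idea of training only on normal kernels and flagging anything the model cannot explain can be illustrated with a much simpler stand-in than AD-GAI itself: a tiny convolutional autoencoder whose reconstruction error serves as the anomaly score. This is a generic sketch of normal-only anomaly detection; the paper's model is adversarially trained and differs in detail.

```python
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    """Toy autoencoder trained on normal grain patches only."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_score(model, x):
    # High reconstruction error = likely damaged kernel or unknown object.
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=(1, 2, 3))

model = TinyAE()
normal_batch = torch.rand(8, 3, 64, 64)                      # stand-in for normal kernels
loss = ((model(normal_batch) - normal_batch) ** 2).mean()    # training objective on normals
print(anomaly_score(model, torch.rand(2, 3, 64, 64)))
```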

SniffyArt: The Dataset of Smelling Persons

  • paper_url: http://arxiv.org/abs/2311.11888
  • repo_url: None
  • paper_authors: Mathias Zinnen, Azhar Hussian, Hang Tran, Prathmesh Madhu, Andreas Maier, Vincent Christlein
  • for: 这篇论文的目的是为了开发一个具有人体姿势和肢体关键点的历史艺术作品中的臭味姿势识别方法。
  • methods: 这篇论文使用了一个名为SniffyArt的数据集,该数据集包含1941名人物在441幅历史艺术作品中的捕捉。每个人物都有一个紧靠的盒子 bounding box,17个姿势关键点和一个姿势标签。通过这些注释,数据集允许开发混合类型的识别方法。
  • results: 论文还提供了一个基线分析,评估了代表性的检测、关键点估计和分类任务的性能,展示了结合关键点估计和臭味姿势分类的潜在可能性。SniffyArt数据集为未来研究提供了一个坚实的基础,可以推动人姿和臭味维度分析在历史艺术作品中的进一步发展。
    Abstract Smell gestures play a crucial role in the investigation of past smells in the visual arts yet their automated recognition poses significant challenges. This paper introduces the SniffyArt dataset, consisting of 1941 individuals represented in 441 historical artworks. Each person is annotated with a tightly fitting bounding box, 17 pose keypoints, and a gesture label. By integrating these annotations, the dataset enables the development of hybrid classification approaches for smell gesture recognition. The datasets high-quality human pose estimation keypoints are achieved through the merging of five separate sets of keypoint annotations per person. The paper also presents a baseline analysis, evaluating the performance of representative algorithms for detection, keypoint estimation, and classification tasks, showcasing the potential of combining keypoint estimation with smell gesture classification. The SniffyArt dataset lays a solid foundation for future research and the exploration of multi-task approaches leveraging pose keypoints and person boxes to advance human gesture and olfactory dimension analysis in historical artworks.
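
For concreteness, a single SniffyArt-style annotation can be thought of as a person box, 17 pose keypoints, and a gesture label. The field names below follow a COCO-like layout and are assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical COCO-style record combining the three annotation types per person.
person_annotation = {
    "image_id": 42,
    "bbox": [112.0, 80.0, 64.0, 150.0],        # x, y, width, height (tight box)
    "keypoints": [130.0, 95.0, 2] * 17,        # 17 x (x, y, visibility), flattened
    "gesture": "sniffing",                     # smell-gesture class label
}

num_keypoints = len(person_annotation["keypoints"]) // 3
assert num_keypoints == 17
```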

Multi-Task Faces (MTF) Data Set: A Legally and Ethically Compliant Collection of Face Images for Various Classification Tasks

  • paper_url: http://arxiv.org/abs/2311.11882
  • repo_url: https://github.com/ramihaf/mtf_data_set
  • paper_authors: Rami Haffar, David Sánchez, Josep Domingo-Ferrer
  • for: 这个论文是为了提供一个多任务面部数据集(MTF),用于多种分类任务,包括识别、年龄、性别和种族等。
  • methods: 这个论文使用了公共可用的明星图片来收集数据,并且严格遵循了版权法规。数据集被精心挑选和处理,以满足不同的分类任务。
  • results: 论文中提供了五种深度学习模型在MTF数据集上的性能分析,包括识别、年龄、性别和种族等多个任务。同时,论文还对raw网络上的数据进行了处理和分析,以评估模型在不同的数据上的性能。
    Abstract Human facial data hold tremendous potential to address a variety of classification problems, including face recognition, age estimation, gender identification, emotion analysis, and race classification. However, recent privacy regulations, such as the EU General Data Protection Regulation and others, have restricted the ways in which human images may be collected and used for research. As a result, several previously published data sets containing human faces have been removed from the internet due to inadequate data collection methods that failed to meet privacy regulations. Data sets consisting of synthetic data have been proposed as an alternative, but they fall short of accurately representing the real data distribution. On the other hand, most available data sets are labeled for just a single task, which limits their applicability. To address these issues, we present the Multi-Task Faces (MTF) image data set, a meticulously curated collection of face images designed for various classification tasks, including face recognition, as well as race, gender, and age classification. The MTF data set has been ethically gathered by leveraging publicly available images of celebrities and strictly adhering to copyright regulations. In this paper, we present this data set and provide detailed descriptions of the followed data collection and processing procedures. Furthermore, we evaluate the performance of five deep learning (DL) models on the MTF data set across the aforementioned classification tasks. Additionally, we compare the performance of DL models over the processed MTF data and over raw data crawled from the internet. The reported results constitute a baseline for further research employing these data. The MTF data set can be accessed through the following link (please cite the present paper if you use the data set): https://github.com/RamiHaf/MTF_data_set
    摘要 人类面部数据具有巨大的潜力,可用于解决多种分类问题,包括人脸识别、年龄估计、性别识别、情感分析和种族分类。然而,近年来的隐私法规(如欧盟《通用数据保护条例》等)限制了人脸图像的收集和使用方式。因此,一些此前发布的人脸数据集因收集方式不符合隐私法规而被从互联网上移除。虽然有人提出以合成数据作为替代,但合成数据难以准确反映真实数据的分布。同时,大多数现有数据集仅针对单一任务进行标注,限制了其适用范围。为解决这些问题,我们提出了多任务人脸(MTF)图像数据集,这是一个精心构建的人脸图像集合,适用于多种分类任务,包括人脸识别以及种族、性别和年龄分类。MTF 数据集利用公开可得的名人图像收集而成,并严格遵守版权法规。在本文中,我们介绍了该数据集,并详细描述了数据收集与处理流程。此外,我们评估了五种深度学习(DL)模型在 MTF 数据集上针对上述分类任务的性能,并比较了模型在处理后的 MTF 数据与直接从互联网爬取的原始数据上的表现。报告的结果可作为后续研究使用该数据集的基线。MTF 数据集可通过以下链接获取(如使用该数据集,请引用本文):https://github.com/RamiHaf/MTF_data_set

VLM-Eval: A General Evaluation on Video Large Language Models

  • paper_url: http://arxiv.org/abs/2311.11865
  • repo_url: None
  • paper_authors: Shuailin Li, Yuang Zhang, Yucheng Zhao, Qiuyue Wang, Fan Jia, Yingfei Liu, Tiancai Wang
  • for: 这个论文的目的是为视频大语言模型(LLM)进行全面评估。
  • methods: 这篇论文使用了多种视频任务,包括标题写作、问答、检索和动作识别等,并使用了传统度量标准和GPT基于的评估方法来评估响应质量。它还提出了一个简单的基线:Video-LLaVA,该基线使用了单一的直线投影,并超越了现有的视频 LLM。
  • results: 这篇论文发现,使用只需要几百个视频教程对象进行微调的方法可以在驾驶场景中获得激发人理解和计算能力。此外,它还证明了视频 LLM 的评估可以在实际场景中扩展。
    Abstract Despite the rapid development of video Large Language Models (LLMs), a comprehensive evaluation is still absent. In this paper, we introduce a unified evaluation that encompasses multiple video tasks, including captioning, question and answering, retrieval, and action recognition. In addition to conventional metrics, we showcase how GPT-based evaluation can match human-like performance in assessing response quality across multiple aspects. We propose a simple baseline: Video-LLaVA, which uses a single linear projection and outperforms existing video LLMs. Finally, we evaluate video LLMs beyond academic datasets, which show encouraging recognition and reasoning capabilities in driving scenarios with only hundreds of video-instruction pairs for fine-tuning. We hope our work can serve as a unified evaluation for video LLMs, and help expand more practical scenarios. The evaluation code will be available soon.
    摘要 尽管视频大语言模型(LLM)发展迅速,但仍缺乏全面的评估。在这篇论文中,我们提出了一种涵盖多个视频任务的统一评估,包括视频描述、问答、检索和动作识别。除了传统指标之外,我们还展示了基于 GPT 的评估如何在多个方面达到与人类相当的回答质量评判水平。我们提出了一个简单的基线:Video-LLaVA,它仅使用单个线性投影,却超越了现有的视频 LLM。最后,我们在学术数据集之外对视频 LLM 进行了评估,结果显示,仅用数百个视频-指令对进行微调,模型在驾驶场景中就表现出令人鼓舞的识别与推理能力。我们希望这项工作能够为视频 LLM 提供一个统一的评估,并帮助将其拓展到更多实际场景。评估代码即将发布。
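
The "single linear projection" baseline can be sketched as follows: per-frame features from a frozen vision encoder are mapped by one linear layer into the language model's embedding space and concatenated with the embedded instruction tokens. The dimensions and frame count are assumptions; this is not the paper's released code.

```python
import torch
import torch.nn as nn

class LinearVideoProjector(nn.Module):
    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)   # the single linear projection

    def forward(self, frame_features: torch.Tensor) -> torch.Tensor:
        # frame_features: (batch, num_frames, vision_dim) from a frozen vision encoder
        return self.proj(frame_features)             # (batch, num_frames, llm_dim)

projector = LinearVideoProjector()
frames = torch.randn(2, 8, 1024)                     # 8 sampled frames per clip
video_tokens = projector(frames)
text_tokens = torch.randn(2, 32, 4096)               # embedded instruction tokens
llm_input = torch.cat([video_tokens, text_tokens], dim=1)   # sequence fed to the LLM
print(llm_input.shape)  # torch.Size([2, 40, 4096])
```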

GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding

  • paper_url: http://arxiv.org/abs/2311.11863
  • repo_url: None
  • paper_authors: Hao Li, Dingwen Zhang, Yalun Dai, Nian Liu, Lechao Cheng, Jingfeng Li, Jingdong Wang, Junwei Han
  • for: 本研究旨在提高3D场景理解和表示 tasks中的下游任务,即semantic prediction和instance segmentation。
  • methods: 我们提出了一种新的渠道Generalized Perception NeRF(GP-NeRF),它将通用 segmentation 模型和NeRF合并在一个框架下,以便在3D场景中提高context-aware的理解。我们还引入了 transformers 来聚合辐射和 semantic embedding 领域的信息,以便在新的视图下渲染这两个领域。此外,我们还提出了两种自适应机制,即Semantic Distill Loss和Depth-Guided Semantic Distill Loss,以提高semantic field的精度和 geometric consistency。
  • results: 我们在两个任务上进行了实验比较(semantic segmentation和instance segmentation),使用了both synthetic和real-world datasets。结果显示,我们的方法在Generalized semantic segmentation、fine-tuning semantic segmentation和instance segmentation任务中的性能比SOTA方法高出6.94%、11.76%和8.47%。
    Abstract Applying NeRF to downstream perception tasks for scene understanding and representation is becoming increasingly popular. Most existing methods treat semantic prediction as an additional rendering task, \textit{i.e.}, the "label rendering" task, to build semantic NeRFs. However, by rendering semantic/instance labels per pixel without considering the contextual information of the rendered image, these methods usually suffer from unclear boundary segmentation and abnormal segmentation of pixels within an object. To solve this problem, we propose Generalized Perception NeRF (GP-NeRF), a novel pipeline that makes the widely used segmentation model and NeRF work compatibly under a unified framework, for facilitating context-aware 3D scene perception. To accomplish this goal, we introduce transformers to aggregate radiance as well as semantic embedding fields jointly for novel views and facilitate the joint volumetric rendering of both fields. In addition, we propose two self-distillation mechanisms, i.e., the Semantic Distill Loss and the Depth-Guided Semantic Distill Loss, to enhance the discrimination and quality of the semantic field and the maintenance of geometric consistency. In evaluation, we conduct experimental comparisons under two perception tasks (\textit{i.e.} semantic and instance segmentation) using both synthetic and real-world datasets. Notably, our method outperforms SOTA approaches by 6.94\%, 11.76\%, and 8.47\% on generalized semantic segmentation, finetuning semantic segmentation, and instance segmentation, respectively.
    摘要 通过应用NeRF到下游识别任务中,以提高场景理解和表示的能力,现在越来越普遍。大多数现有方法都会对 semantic prediction 视为额外的渲染任务,即“标签渲染”任务,以建立semantic NeRFs。然而,由于不考虑渲染图像中的上下文信息,这些方法通常会出现不清晰的边界分割和对象内部像素的异常分割问题。为解决这个问题,我们提出了通用识别NeRF(GP-NeRF),一种新的管道,使得通用的分割模型和NeRF在一个统一框架下工作,以便实现上下文意识的3D场景识别。为达到这个目标,我们引入了 transformers,以便在新视图下对光谱和semantic embedding场景进行并行汇聚。此外,我们还提出了两种自适应机制,即Semantic Distill Loss和Depth-Guided Semantic Distill Loss,以提高semantic场景的精度和质量。在评估中,我们通过对semantic和实例分割任务进行实验比较,使用了 sintetic和实际数据集。可见,我们的方法在Generalized semantic segmentation、fine-tuning semantic segmentation和实例分割任务中,相比SOTA方法,提高了6.94%、11.76%和8.47%。
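
The joint volumetric rendering of a radiance field and a semantic field along a ray reuses the same transmittance weights for both quantities. Below is a minimal sketch of that shared compositing step (not the full GP-NeRF pipeline); the number of semantic classes and the sample spacing are placeholders.

```python
import torch

def render_ray(sigma, rgb, sem_logits, deltas):
    """Composite colour and semantic logits with shared alpha weights.
    sigma: (N,) densities, rgb: (N, 3), sem_logits: (N, C), deltas: (N,) sample spacing."""
    alpha = 1.0 - torch.exp(-sigma * deltas)                              # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans                                               # (N,)
    color = (weights[:, None] * rgb).sum(dim=0)                           # rendered pixel colour
    semantics = (weights[:, None] * sem_logits).sum(dim=0)                # rendered class logits
    return color, semantics, weights

n = 64
color, semantics, w = render_ray(
    torch.rand(n), torch.rand(n, 3), torch.randn(n, 7), torch.full((n,), 0.02)
)
print(color.shape, semantics.shape)  # torch.Size([3]) torch.Size([7])
```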

LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

  • paper_url: http://arxiv.org/abs/2311.11860
  • repo_url: https://github.com/rshaojimmy/jiutian
  • paper_authors: Gongwei Chen, Leyang Shen, Rui Shao, Xiang Deng, Liqiang Nie
  • for: 这个论文主要是为了提高多模态大语言模型(MLLM)的性能,使其能够更好地理解和使用多种多样的信息。
  • methods: 这篇论文提出了一种两级视觉知识增强策略,包括进程式地 integrate 细化的空间意识视觉知识,以及软提示高级别 semantic 视觉证据。
  • results: 对多个多模态benchmark进行了广泛的实验,并表明该模型在VSRCIDEr等任务上具有显著的提高(比如VSRCIDEr上提高5%,TextCaps上提高3%,RefCOCOg上提高5%)。
    Abstract Multimodal Large Language Models (MLLMs) have endowed LLMs with the ability to perceive and understand multi-modal signals. However, most of the existing MLLMs mainly adopt vision encoders pretrained on coarsely aligned image-text pairs, leading to insufficient extraction and reasoning of visual knowledge. To address this issue, we devise a dual-Level vIsual knOwledge eNhanced Multimodal Large Language Model (LION), which empowers the MLLM by injecting visual knowledge in two levels. 1) Progressive incorporation of fine-grained spatial-aware visual knowledge. We design a vision aggregator cooperated with region-level vision-language (VL) tasks to incorporate fine-grained spatial-aware visual knowledge into the MLLM. To alleviate the conflict between image-level and region-level VL tasks during incorporation, we devise a dedicated stage-wise instruction-tuning strategy with mixture-of-adapters. This progressive incorporation scheme contributes to the mutual promotion between these two kinds of VL tasks. 2) Soft prompting of high-level semantic visual evidence. We facilitate the MLLM with high-level semantic visual evidence by leveraging diverse image tags. To mitigate the potential influence caused by imperfect predicted tags, we propose a soft prompting method by embedding a learnable token into the tailored text instruction. Comprehensive experiments on several multi-modal benchmarks demonstrate the superiority of our model (e.g., improvement of 5% accuracy on VSR and 3% CIDEr on TextCaps over InstructBLIP, 5% accuracy on RefCOCOg over Kosmos-2).
    摘要 多Modal大型自然语言模型(MLLMs)已经赋予大型自然语言模型(LLMs)可以感知和理解多Modal信号。然而,大多数现有的 MLLMs 主要采用先天Alignment的视觉编码器,导致视觉知识的抽取和理解不充分。为解决这个问题,我们设计了两级视觉知识增强的多Modal大型自然语言模型(LION),它使得 MLLM 具有更好的视觉知识抽取和理解能力。1. 细化空间意识视觉知识的进行式整合。我们设计了一个与区域级视觉语言(VL)任务相结合的视觉汇集器,以整合细化空间意识的视觉知识到 MLLM 中。为了解决图像级和区域级 VL 任务之间的冲突,我们提出了一种适应器混合策略和 mixture-of-adapters。这种进行式整合方案对这两种 VL 任务产生了互相促进的效果。2. 软提示高级Semantic视觉证据。我们为 MLLM 提供高级Semantic视觉证据,通过利用多种图像标签。为了减少预测标签的不准确性的影响,我们提出了一种软提示方法,通过在专门设计的文本指令中嵌入学习的token。经过了多种多Modal benchmark 的实验,我们的模型在 VSR 和 TextCaps 上比 InstructBLIP 高5%,在 RefCOCOg 上高3%,在 Kosmos-2 上高5%。
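
The soft-prompting idea -- inserting a learnable token alongside the embedded image-tag text so that imperfect tags can be absorbed during training -- can be sketched as below. The token placement and embedding size are illustrative assumptions, not LION's exact design.

```python
import torch
import torch.nn as nn

class SoftTagPrompt(nn.Module):
    """Prepends one learnable 'soft' token to the embedded tag instruction."""
    def __init__(self, dim=4096):
        super().__init__()
        self.soft_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, tag_embeds: torch.Tensor) -> torch.Tensor:
        # tag_embeds: (batch, num_tag_tokens, dim), e.g. embedded "Tags: dog, frisbee, park"
        batch = tag_embeds.shape[0]
        return torch.cat([self.soft_token.expand(batch, -1, -1), tag_embeds], dim=1)

prompt = SoftTagPrompt()
tag_embeds = torch.randn(2, 6, 4096)
print(prompt(tag_embeds).shape)  # torch.Size([2, 7, 4096])
```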

FATURA: A Multi-Layout Invoice Image Dataset for Document Analysis and Understanding

  • paper_url: http://arxiv.org/abs/2311.11856
  • repo_url: None
  • paper_authors: Mahmoud Limam, Marwa Dhiaf, Yousri Kessentini
  • For: The paper is written for researchers in the field of document analysis and understanding.
  • Methods: The paper uses a dataset called FATURA, a highly diverse dataset featuring multi-layout, annotated invoice document images.
  • Results: The paper provides comprehensive benchmarks for various document analysis and understanding tasks and conducts experiments under diverse training and evaluation scenarios.
    Abstract Document analysis and understanding models often require extensive annotated data to be trained. However, various document-related tasks extend beyond mere text transcription, requiring both textual content and precise bounding-box annotations to identify different document elements. Collecting such data becomes particularly challenging, especially in the context of invoices, where privacy concerns add an additional layer of complexity. In this paper, we introduce FATURA, a pivotal resource for researchers in the field of document analysis and understanding. FATURA is a highly diverse dataset featuring multi-layout, annotated invoice document images. Comprising $10,000$ invoices with $50$ distinct layouts, it represents the largest openly accessible image dataset of invoice documents known to date. We also provide comprehensive benchmarks for various document analysis and understanding tasks and conduct experiments under diverse training and evaluation scenarios. The dataset is freely accessible at https://zenodo.org/record/8261508, empowering researchers to advance the field of document analysis and understanding.

Asynchronous Bioplausible Neuron for Spiking Neural Networks for Event-Based Vision

  • paper_url: http://arxiv.org/abs/2311.11853
  • repo_url: None
  • paper_authors: Sanket Kachole, Hussain Sajwani, Fariborz Baghaei Naeini, Dimitrios Makris, Yahya Zweiri
  • for: 这个研究旨在提出一种具有生物灵感的神经网络,以提高视觉数据处理效率,降低能源消耗。
  • methods: 研究使用了 asynchronous bioplausible neuron(ABN),一种动态脉搏机制,以自动调整输入信号的变化。
  • results: 实验结果显示,ABN能够优化图像分类和 segmentation 的性能,维护神经平衡,并提高能效性。
    Abstract Spiking Neural Networks (SNNs) offer a biologically inspired approach to computer vision that can lead to more efficient processing of visual data with reduced energy consumption. However, maintaining homeostasis within these networks is challenging, as it requires continuous adjustment of neural responses to preserve equilibrium and optimal processing efficiency amidst diverse and often unpredictable input signals. In response to these challenges, we propose the Asynchronous Bioplausible Neuron (ABN), a dynamic spike firing mechanism to auto-adjust the variations in the input signal. Comprehensive evaluation across various datasets demonstrates ABN's enhanced performance in image classification and segmentation, maintenance of neural equilibrium, and energy efficiency.
    摘要 脉冲神经网络(SNN)为计算机视觉提供了一种受生物启发的方法,可以更高效地处理视觉数据并降低能耗。然而,维持这类网络内部的稳态(homeostasis)颇具挑战性,因为在面对多样且往往不可预测的输入信号时,需要持续调整神经响应以保持平衡和最佳处理效率。针对这些挑战,我们提出了异步生物合理神经元(ABN),一种能够自适应输入信号变化的动态脉冲发放机制。在多个数据集上的全面评估表明,ABN 在图像分类和分割任务中表现更优,同时能够维持神经稳态并提高能效。
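
While ABN's exact firing rule is specific to the paper, the general mechanism -- a leaky integrate-and-fire neuron whose threshold adapts to its recent activity so that firing rates stay in balance -- can be sketched as follows. The time constants and the threshold update are arbitrary illustrative choices.

```python
import torch

def adaptive_lif(inputs, tau_mem=0.9, tau_thr=0.99, base_thr=1.0, thr_gain=0.5):
    """inputs: (T, N) input currents. Returns a (T, N) binary spike train."""
    v = torch.zeros(inputs.shape[1])                  # membrane potential
    thr = torch.full((inputs.shape[1],), base_thr)    # per-neuron firing threshold
    spikes = []
    for x in inputs:
        v = tau_mem * v + x                           # leaky integration
        s = (v >= thr).float()                        # spike when potential crosses threshold
        v = v * (1.0 - s)                             # reset after a spike
        # homeostatic adaptation: threshold drifts up when the neuron fires often
        thr = tau_thr * thr + (1.0 - tau_thr) * (base_thr + thr_gain * s)
        spikes.append(s)
    return torch.stack(spikes)

spike_train = adaptive_lif(torch.rand(100, 4))
print(spike_train.mean(dim=0))                        # empirical firing rates
```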

Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields

  • paper_url: http://arxiv.org/abs/2311.11845
  • repo_url: https://github.com/tatakai1/evenerf
  • paper_authors: Zhiyuan Min, Yawei Luo, Wei Yang, Yuesong Wang, Yi Yang
  • for: 本研究旨在提高NeRF模型的常规化能力,使其直接从新场景中生成新视角图像,不需要重新训练场景特定的NeRF模型。
  • methods: 我们提出了一种名为EVE-NeRF的Entangled View-Epipolar Information Aggregation方法,它在拼接多视图特征时注入场景不变的外观连续性和几何一致性约束,以提高3D表示的普适性。
  • results: 我们的方法在多种评估场景中达到了状态的表现,比起单一维度的拼接,杂合网络更好地保持了3D场景的几何和外观重建精度。
    Abstract Generalizable NeRF can directly synthesize novel views across new scenes, eliminating the need for scene-specific retraining in vanilla NeRF. A critical enabling factor in these approaches is the extraction of a generalizable 3D representation by aggregating source-view features. In this paper, we propose an Entangled View-Epipolar Information Aggregation method dubbed EVE-NeRF. Different from existing methods that consider cross-view and along-epipolar information independently, EVE-NeRF conducts the view-epipolar feature aggregation in an entangled manner by injecting the scene-invariant appearance continuity and geometry consistency priors to the aggregation process. Our approach effectively mitigates the potential lack of inherent geometric and appearance constraint resulting from one-dimensional interactions, thus further boosting the 3D representation generalizability. EVE-NeRF attains state-of-the-art performance across various evaluation scenarios. Extensive experiments demonstrate that, compared to prevailing single-dimensional aggregation, the entangled network excels in the accuracy of 3D scene geometry and appearance reconstruction. Our project page is https://github.com/tatakai1/EVENeRF.
    摘要 通用的NeRF可以直接生成新场景中的新视图,从而消除普通的NeRFScene-specific retraining。一个关键的优化因子在这些方法中是提取一个通用的3D表示。在这篇论文中,我们提议一种拓展视图-轴线信息汇集方法,名为EVE-NeRF。与现有方法不同,EVE-NeRF在汇集视图和轴线信息时采用杂合的方式,通过注入场景不变的外观继续性和几何约束约束,从而有效地消除一维交互中的可能的内置几何和外观约束缺失,从而进一步提高3D表示的通用性。EVE-NeRF实现了最新的性能标准在各种评估场景中。广泛的实验表明,相比于现有的单一维度汇集,杂合网络在3D场景几何和外观重建精度方面具有明显的优势。我们的项目页面是.

Few-shot Multispectral Segmentation with Representations Generated by Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.11827
  • repo_url: None
  • paper_authors: Dilith Jayakody, Thanuja Ambegoda
  • for: 提高几个示例数据集上的多spectral图像分割性能
  • methods: 使用 reinforcement learning 生成表达来生成特定类分割的表达,并将其用于更新数据集和进行分割
  • results: 在多个多spectral数据集上证明了提高分割性能的效果
    Abstract The task of multispectral image segmentation (segmentation of images with numerous channels/bands, each capturing a specific range of wavelengths of electromagnetic radiation) has been previously explored in contexts with large amounts of labeled data. However, these models tend not to generalize well to datasets of smaller size. In this paper, we propose a novel approach for improving few-shot segmentation performance on multispectral images using reinforcement learning to generate representations. These representations are generated in the form of mathematical expressions between channels and are tailored to the specific class being segmented. Our methodology involves training an agent to identify the most informative expressions, updating the dataset using these expressions, and then using the updated dataset to perform segmentation. Due to the limited length of the expressions, the model receives useful representations without any added risk of overfitting. We evaluate the effectiveness of our approach on several multispectral datasets and demonstrate its effectiveness in boosting the performance of segmentation algorithms.
    摘要 多光谱图像分割(即对包含多个通道/波段、每个波段捕获特定电磁波长范围的图像进行分割)此前主要在拥有大量标注数据的场景下被研究。然而,这些模型往往难以泛化到规模较小的数据集。在这篇论文中,我们提出了一种新方法,利用强化学习生成表示,以提升多光谱图像上的少样本(few-shot)分割性能。这些表示以通道(波段)之间的数学表达式的形式生成,并针对待分割的具体类别进行定制。我们的方法包括:训练一个智能体来识别最具信息量的表达式,利用这些表达式更新数据集,再使用更新后的数据集进行分割。由于表达式长度有限,模型可以在不增加过拟合风险的情况下获得有用的表示。我们在多个多光谱数据集上评估了该方法,并证明了它在提升分割算法性能方面的有效性。
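
The learned representations here are short mathematical expressions between spectral bands that are appended to the input as extra channels. The snippet below shows this general form, using a hand-written normalized-difference ratio as a stand-in for an expression the RL agent might discover; the band indices are assumptions.

```python
import numpy as np

def add_expression_channel(cube: np.ndarray, expr) -> np.ndarray:
    """cube: (H, W, B) multispectral image; expr maps the band stack to an (H, W) map."""
    new_channel = expr(cube)
    return np.concatenate([cube, new_channel[..., None]], axis=-1)

# Example expression: a normalized difference between two bands (NDVI-like),
# standing in for whatever short expression the agent selects for a class.
expr = lambda c: (c[..., 4] - c[..., 2]) / (c[..., 4] + c[..., 2] + 1e-6)

cube = np.random.rand(128, 128, 8).astype(np.float32)
augmented = add_expression_channel(cube, expr)
print(augmented.shape)  # (128, 128, 9) -- fed to the segmentation model
```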

Holistic Inverse Rendering of Complex Facade via Aerial 3D Scanning

  • paper_url: http://arxiv.org/abs/2311.11825
  • repo_url: None
  • paper_authors: Zixuan Xie, Rengan Xie, Rong Li, Kai Huang, Pengju Qiao, Jingsen Zhu, Xu Yin, Qi Ye, Wei Hua, Yuchi Huo, Hujun Bao
  • for: 用于 facade 的 geometry, lighting, and material 重建
  • methods: 使用 neural signed distance fields (SDFs) 和三种适应策略:semantic regularization、frequency-aware geometry regularization 和 visibility probe-based scheme
  • results: 实现 physically based 和 photorealistic 的 novel-view rendering、relighting 和 editing,并在实际环境中超越之前的方法。
    Abstract In this work, we use multi-view aerial images to reconstruct the geometry, lighting, and material of facades using neural signed distance fields (SDFs). Without the requirement of complex equipment, our method only takes simple RGB images captured by a drone as inputs to enable physically based and photorealistic novel-view rendering, relighting, and editing. However, a real-world facade usually has complex appearances ranging from diffuse rocks with subtle details to large-area glass windows with specular reflections, making it hard to attend to everything. As a result, previous methods can preserve the geometry details but fail to reconstruct smooth glass windows or verse vise. In order to address this challenge, we introduce three spatial- and semantic-adaptive optimization strategies, including a semantic regularization approach based on zero-shot segmentation techniques to improve material consistency, a frequency-aware geometry regularization to balance surface smoothness and details in different surfaces, and a visibility probe-based scheme to enable efficient modeling of the local lighting in large-scale outdoor environments. In addition, we capture a real-world facade aerial 3D scanning image set and corresponding point clouds for training and benchmarking. The experiment demonstrates the superior quality of our method on facade holistic inverse rendering, novel view synthesis, and scene editing compared to state-of-the-art baselines.
    摘要 在这项工作中,我们使用多视图飞行图像来重建建筑外墙的几何、照明和材质。我们的方法只需要简单的RGB图像, captured by a drone,作为输入,以实现物理基于的、 photorealistic 新视图渲染、重新照明和编辑。然而,实际世界中的facade通常具有复杂的外观,从柔软的岩石到大面积的玻璃窗户的镜面反射,这会使得 previous methods 难以同时 preserve geometry details 和 reconstruction smooth glass windows。为 Addressing this challenge,我们引入三个空间和semantic-adaptive optimization strategies,包括基于zero-shot segmentation技术的semantic regularizationapproach,频率意识geometry regularization,和基于可见探针的 scheme。此外,我们还capture了一个真实世界facade飞行3D扫描图像集和相应的点云数据用于训练和参考。实验结果表明,我们的方法在facade holistic inverse rendering、新视图合成和场景编辑方面具有较高的质量,比state-of-the-art baselines。

Cross-View Graph Consistency Learning for Invariant Graph Representations

  • paper_url: http://arxiv.org/abs/2311.11821
  • repo_url: None
  • paper_authors: Jie Chen, Zhiming Li, Hua Mao, Wai Lok Woo, Xi Peng
  • for: This paper is written for analyzing graph-structured data and learning invariant graph representations for link prediction.
  • methods: The paper proposes a cross-view graph consistency learning (CGCL) method that uses two complementary augmented views to derive an incomplete graph structure, and then learns invariant graph representations through a cross-view training scheme.
  • results: The paper achieves competitive results on graph datasets in comparisons with several state-of-the-art algorithms, demonstrating the effectiveness of the proposed CGCL method.Here’s the Chinese version of the three information points:
  • for: 这篇论文是为了分析图structured数据而写的,并且学习图结构中的不变性表示以便预测链接。
  • methods: 该篇论文提出了跨视图图一致学习(CGCL)方法,通过两个补充的增强视图来 derivation incomplete图结构,然后通过跨视图训练方案来学习不变性表示。
  • results: 纸上提供了一些比较竞争力强的结果,证明了提案的CGCL方法的有效性。
    Abstract Graph representation learning is fundamental for analyzing graph-structured data. Exploring invariant graph representations remains a challenge for most existing graph representation learning methods. In this paper, we propose a cross-view graph consistency learning (CGCL) method that learns invariant graph representations for link prediction. First, two complementary augmented views are derived from an incomplete graph structure through a bidirectional graph structure augmentation scheme. This augmentation scheme mitigates the potential information loss that is commonly associated with various data augmentation techniques involving raw graph data, such as edge perturbation, node removal, and attribute masking. Second, we propose a CGCL model that can learn invariant graph representations. A cross-view training scheme is proposed to train the proposed CGCL model. This scheme attempts to maximize the consistency information between one augmented view and the graph structure reconstructed from the other augmented view. Furthermore, we offer a comprehensive theoretical CGCL analysis. This paper empirically and experimentally demonstrates the effectiveness of the proposed CGCL method, achieving competitive results on graph datasets in comparisons with several state-of-the-art algorithms.
    摘要 GRAPH 表示学习是数据结构化的数据分析的基础。寻找不变的 GRAPH 表示仍然是现有 GRAPH 表示学习方法的挑战。在这篇论文中,我们提出了跨视图 GRAPH 一致学习(CGCL)方法,用于预测链接。首先,通过双向 GRAPH 结构增强方案,从不完整的 GRAPH 结构中 derivation 出两个补充的视图。这种增强方案可以减少 raw GRAPH 数据的各种数据增强技术中的信息损失,例如边干扰、节点移除和特征遮盖。其次,我们提出了一种 CGCL 模型,可以学习不变的 GRAPH 表示。我们提出了一种跨视图训练方案,用于训练提订的 CGCL 模型。这种方案尝试通过将一个增强视图与另一个视图中的 GRAPH 结构匹配来最大化两个视图之间的一致信息。此外,我们还提供了 CGCL 的完整理论分析。本文通过实验和实证证明了提订的 CGCL 方法的效果,在比较了多个状态前的算法中达到了竞争性的结果。
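
As a highly simplified illustration of cross-view consistency on a graph (not CGCL's bidirectional structure augmentation or its cross-view reconstruction objective), the sketch below builds two edge-dropped views, encodes both with a one-layer mean-aggregation GCN, and penalizes disagreement between the resulting node embeddings.

```python
import torch

def gcn_layer(adj, x, weight):
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
    return torch.relu((adj @ x / deg) @ weight)          # mean-aggregation GCN layer

def drop_edges(adj, p=0.2):
    mask = (torch.rand_like(adj) > p).float()
    mask = torch.triu(mask, 1); mask = mask + mask.T     # keep the graph undirected
    return adj * mask

n, d, h = 50, 16, 32
adj = (torch.rand(n, n) > 0.9).float(); adj = torch.triu(adj, 1); adj = adj + adj.T
x = torch.randn(n, d)
w = torch.randn(d, h, requires_grad=True)

z1 = gcn_layer(drop_edges(adj), x, w)                    # view 1
z2 = gcn_layer(drop_edges(adj), x, w)                    # view 2
consistency_loss = ((z1 - z2) ** 2).mean()               # encourage view-invariant embeddings
consistency_loss.backward()
print(float(consistency_loss))
```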

CrackCLF: Automatic Pavement Crack Detection based on Closed-Loop Feedback

  • paper_url: http://arxiv.org/abs/2311.11815
  • repo_url: None
  • paper_authors: Chong Li, Zhun Fan, Ying Chen, Huibiao Lin, Laura Moretti, Giuseppe Loprencipe, Weihua Sheng, Kelvin C. P. Wang
  • for: automatic pavement crack detection
  • methods: encoder-decoder framework with closed-loop feedback and generative adversarial networks
  • results: outperforms other methods on three public datasets, with the ability to correct errors and adapt to changes in the environment
    Abstract Automatic pavement crack detection is an important task to ensure the functional performances of pavements during their service life. Inspired by deep learning (DL), the encoder-decoder framework is a powerful tool for crack detection. However, these models are usually open-loop (OL) systems that tend to treat thin cracks as the background. Meanwhile, these models can not automatically correct errors in the prediction, nor can it adapt to the changes of the environment to automatically extract and detect thin cracks. To tackle this problem, we embed closed-loop feedback (CLF) into the neural network so that the model could learn to correct errors on its own, based on generative adversarial networks (GAN). The resulting model is called CrackCLF and includes the front and back ends, i.e. segmentation and adversarial network. The front end with U-shape framework is employed to generate crack maps, and the back end with a multi-scale loss function is used to correct higher-order inconsistencies between labels and crack maps (generated by the front end) to address open-loop system issues. Empirical results show that the proposed CrackCLF outperforms others methods on three public datasets. Moreover, the proposed CLF can be defined as a plug and play module, which can be embedded into different neural network models to improve their performances.
    摘要 自动路面裂隙检测是一项重要任务,以确保路面在服务寿命中的功能性能。受深度学习(DL)的激发,编码-解码框架成为了裂隙检测的powerful工具。然而,这些模型通常是开 loop(OL)系统,它们会将薄裂隙视为背景。同时,这些模型无法自动更正预测错误,也无法适应环境变化自动提取和检测薄裂隙。为解决这个问题,我们将closed-loop反馈(CLF) embedding到神经网络中,使模型可以通过生成 adversarial networks(GAN)学习自动更正错误。得到的模型被称为CrackCLF,它包括前端和后端,即分 segmentation和对抗网络。前端采用U型框架生成裂隙地图,而后端使用多尺度损失函数来更正高阶不一致性 между标签和裂隙地图(生成前端),以解决开 loop系统问题。实验结果表明,提议的CrackCLF在三个公共数据集上表现出色,并且可以作为插件模块,嵌入不同神经网络模型以提高其性能。

Robot Hand-Eye Calibration using Structure-from-Motion

  • paper_url: http://arxiv.org/abs/2311.11808
  • repo_url: None
  • paper_authors: Nicolas Andreff, Radu Horaud, Bernard Espiau
  • for: 这个论文提出了一种新的 flexible 手眼协调方法,而大多数现有的手眼协调技术都需要使用协调器,并且与摄像头定位方法相结合。
  • methods: 我们结合了结构从运动和知道的机器人运动,并证明该解可以得到在线形式。这个解不仅解决了普通扭矩运动的问题,还可以处理纯翻译、纯旋转和平面运动等特殊运动。
  • results: 我们进行了大量的实验,并比较了该方法与现有的方法。结果表明,该方法的质量具有较高的精度和稳定性。
    Abstract In this paper we propose a new flexible method for hand-eye calibration. The vast majority of existing hand-eye calibration techniques requires a calibration rig which is used in conjunction with camera pose estimation methods. Instead, we combine structure-from-motion with known robot motions and we show that the solution can be obtained in linear form. The latter solves for both the hand-eye parameters and for the unknown scale factor inherent with structure-from-motion methods. The algebraic analysis that is made possible with such a linear formulation allows to investigate not only the well known case of general screw motions but also such singular motions as pure translations, pure rotations, and planar motions. In essence, the robot-mounted camera looks to an unknown rigid layout, tracks points over an image sequence and estimates the camera-to-robot relationship. Such a self calibration process is relevant for unmanned vehicles, robots working in remote places, and so forth. We conduct a large number of experiments which validate the quality of the method by comparing it with existing ones.
    摘要 在这篇论文中,我们提出了一种新的灵活手眼准备方法。大多数现有的手眼准备技术都需要使用准备机构,并与摄像头位置估计方法结合使用。而我们则将结构从运动与知道机器人运动相结合,并证明该解可以表示为线性形式。这种线性表述使得我们可以对不同类型的运动进行数学分析,包括普通旋转、平面运动和纯粹翻译等特殊运动。在实际应用中,机器人搭载的摄像头将看到未知的固定布局,跟踪图像序列中的点并估计摄像头和机器人之间的关系。这种自动准备过程对无人车、远程工作的机器人等有益。我们进行了大量实验,并与现有方法进行比较,以证明方法的高效性。
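
To make the linear formulation concrete, here is a rough NumPy/SciPy sketch of one standard way to pose hand-eye calibration with an unknown structure-from-motion scale: the rotation is obtained by orthogonal Procrustes on the motion rotation vectors, and the hand-eye translation and the scale factor are then solved jointly by linear least squares. It follows the generic AX = XB structure rather than reproducing the paper's exact derivation.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def hand_eye_with_scale(A_list, B_list):
    """A_list, B_list: 4x4 robot and (scale-ambiguous) camera motions with A_i X = X B_i."""
    # Rotation: the hand-eye rotation maps camera rotation axes onto robot rotation axes.
    a = np.stack([Rotation.from_matrix(A[:3, :3]).as_rotvec() for A in A_list])
    b = np.stack([Rotation.from_matrix(B[:3, :3]).as_rotvec() for B in B_list])
    U, _, Vt = np.linalg.svd(a.T @ b)                         # Procrustes: a_i ~ R_x b_i
    R_x = U @ np.diag([1.0, 1.0, np.linalg.det(U @ Vt)]) @ Vt
    # Translation: (R_Ai - I) t_x - lam * R_x t_Bi = -t_Ai, unknowns [t_x, lam].
    M, rhs = [], []
    for A, B in zip(A_list, B_list):
        M.append(np.hstack([A[:3, :3] - np.eye(3), -(R_x @ B[:3, 3]).reshape(3, 1)]))
        rhs.append(-A[:3, 3])
    sol, *_ = np.linalg.lstsq(np.vstack(M), np.concatenate(rhs), rcond=None)
    t_x, lam = sol[:3], sol[3]
    return R_x, t_x, lam   # hand-eye rotation, translation, and SfM scale factor
```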

Robust Tumor Segmentation with Hyperspectral Imaging and Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2311.11782
  • repo_url: None
  • paper_authors: Mayar Lotfy, Anna Alperovich, Tommaso Giannantonio, Bjorn Barz, Xiaohan Zhang, Felix Holm, Nassir Navab, Felix Boehm, Carolin Schwamborn, Thomas K. Hoffmann, Patrick J. Schuler
  • for: This paper aims to improve the accuracy of tumor segmentation in hyperspectral imaging (HSI) for surgical cancer resection.
  • methods: The proposed method combines HSI with graph neural networks (GNNs) to leverage the spatial context of tiles for more robust and smoother segmentation; a convolutional neural network (CNN) extracts features for each tile within the graph, and local image quality metrics are incorporated into the loss function to enhance the robustness of the training procedure.
  • results: The proposed method significantly outperforms context-agnostic approaches in accurately distinguishing between healthy and tumor tissues, even in images from previously unseen patients, and the carefully designed loss function accounting for local image quality yields additional improvements.
    Abstract Segmenting the boundary between tumor and healthy tissue during surgical cancer resection poses a significant challenge. In recent years, Hyperspectral Imaging (HSI) combined with Machine Learning (ML) has emerged as a promising solution. However, due to the extensive information contained within the spectral domain, most ML approaches primarily classify individual HSI (super-)pixels, or tiles, without taking into account their spatial context. In this paper, we propose an improved methodology that leverages the spatial context of tiles for more robust and smoother segmentation. To address the irregular shapes of tiles, we utilize Graph Neural Networks (GNNs) to propagate context information across neighboring regions. The features for each tile within the graph are extracted using a Convolutional Neural Network (CNN), which is trained simultaneously with the subsequent GNN. Moreover, we incorporate local image quality metrics into the loss function to enhance the training procedure's robustness against low-quality regions in the training images. We demonstrate the superiority of our proposed method using a clinical ex vivo dataset consisting of 51 HSI images from 30 patients. Despite the limited dataset, the GNN-based model significantly outperforms context-agnostic approaches, accurately distinguishing between healthy and tumor tissues, even in images from previously unseen patients. Furthermore, we show that our carefully designed loss function, accounting for local image quality, results in additional improvements. Our findings demonstrate that context-aware GNN algorithms can robustly find tumor demarcations on HSI images, ultimately contributing to better surgery success and patient outcome.
    摘要 在外科肿瘤切除手术中,准确分割肿瘤与健康组织的边界是一项重大挑战。近年来,高光谱成像(HSI)与机器学习(ML)的结合已成为一种有前景的解决方案。然而,由于光谱域包含的信息量巨大,大多数 ML 方法仅对单个 HSI(超)像素或图块进行分类,而不考虑其空间上下文。在本文中,我们提出一种改进的方法,利用图块之间的空间上下文来获得更稳健、更平滑的分割。为处理图块的不规则形状,我们使用图神经网络(GNN)在相邻区域之间传播上下文信息。图中每个图块的特征由卷积神经网络(CNN)提取,并与后续的 GNN 联合训练。此外,我们将局部图像质量指标引入损失函数,以增强训练过程对低质量区域的鲁棒性。我们在一个包含 30 名患者、51 张 HSI 图像的临床离体(ex vivo)数据集上进行了实验。尽管数据集规模有限,基于 GNN 的模型仍显著优于不考虑上下文的方法,即使对于来自未见过患者的图像,也能准确区分健康组织与肿瘤组织。此外,我们精心设计的、考虑局部图像质量的损失函数带来了进一步的提升。我们的结果表明,上下文感知的 GNN 算法能够在 HSI 图像上稳健地找到肿瘤边界,最终有助于提高手术成功率和患者预后。

Multimodal deep learning for mapping forest dominant height by fusing GEDI with earth observation data

  • paper_url: http://arxiv.org/abs/2311.11777
  • repo_url: None
  • paper_authors: Man Chen, Wenquan Dong, Hao Yu, Iain Woodhouse, Casey M. Ryan, Haoyu Liu, Selena Georgiou, Edward T. A. Mitchard
  • for: 这个研究旨在使用多源Remote sensing数据和深度学习模型精准地映射高分辨率森林高度。
  • methods: 我们提出了一种基于多Modal attention remote sensing network(MARSNet)的新深度学习框架,使用GEDI数据、Setinel-1数据、ALOS-2 PALSAR-2数据、Sentinel-2光学数据和其他数据。 MARSNet包括每种Remote sensing数据模式的单独编码器来提取多尺度特征,以及共享解码器来融合特征和估计高度。
  • results: 我们的研究表明,MARSNet可以高效地估计森林主要高度,R2为0.62,RMSE为2.82米,超过了广泛使用的随机森林方法(R2=0.55,RMSE=3.05米)。此外,我们使用训练好的MARSNet模型生成了10米分辨率的墙壁到墙壁地域高度图像,并通过独立验证使用场地测量结果,MARSNet得到了R2=0.58和RMSE=3.76米的结果,与随机森林基准值(R2=0.41,RMSE=4.37米)相比,表明MARSNet的精度更高。
    Abstract The integration of multisource remote sensing data and deep learning models offers new possibilities for accurately mapping high spatial resolution forest height. We found that GEDI relative heights (RH) metrics exhibited strong correlation with the mean of the top 10 highest trees (dominant height) measured in situ at the corresponding footprint locations. Consequently, we proposed a novel deep learning framework termed the multi-modal attention remote sensing network (MARSNet) to estimate forest dominant height by extrapolating dominant height derived from GEDI, using Setinel-1 data, ALOS-2 PALSAR-2 data, Sentinel-2 optical data and ancillary data. MARSNet comprises separate encoders for each remote sensing data modality to extract multi-scale features, and a shared decoder to fuse the features and estimate height. Using individual encoders for each remote sensing imagery avoids interference across modalities and extracts distinct representations. To focus on the efficacious information from each dataset, we reduced the prevalent spatial and band redundancies in each remote sensing data by incorporating the extended spatial and band reconstruction convolution modules in the encoders. MARSNet achieved commendable performance in estimating dominant height, with an R2 of 0.62 and RMSE of 2.82 m, outperforming the widely used random forest approach which attained an R2 of 0.55 and RMSE of 3.05 m. Finally, we applied the trained MARSNet model to generate wall-to-wall maps at 10 m resolution for Jilin, China. Through independent validation using field measurements, MARSNet demonstrated an R2 of 0.58 and RMSE of 3.76 m, compared to 0.41 and 4.37 m for the random forest baseline. Our research demonstrates the effectiveness of a multimodal deep learning approach fusing GEDI with SAR and passive optical imagery for enhancing the accuracy of high resolution dominant height estimation.
    摘要 “多源Remote数据和深度学习模型的整合可以提供高分辨率森林高度的新可能性。我们发现GEDI相对高度(RH)指标与场景中最高10棵树(主对高)的平均值 exhibited strong correlation。因此,我们提出了一个名为多模式注意深度测量网络(MARSNet)的新深度学习框架,用于估计森林主对高。MARSNet包括每个遥感数据模式的分别Encoder来提取多尺度特征,以及共同的解码器来融合特征和估计高度。这些Encoder对每个遥感数据模式进行分别对应,以避免模式之间的干扰和提取特有的表现。为了避免每个遥感数据模式的空间和频率统计重复,我们在Encoder中添加了扩展空间和频率重建卷积模组。MARSNet在主对高估计中表现了优异的成绩,R2为0.62,RMSE为2.82米,比较 Random Forest方法的R2为0.55,RMSE为3.05米。最后,我们将训练好的 MARSNet 模型应用到了 Jilin 地区的壁垒壁垒地图上,以10米resolution进行估计。通过独立验证使用场地测量,MARSNet exhibited R2为0.58,RMSE为3.76米,与Random Forest基eline的R2为0.41,RMSE为4.37米相比。我们的研究显示了多模式深度学习方法,融合 GEDI 、SAR 和过程式光学图像,可以提高高分辨率主对高估计的精度。”

Practical cross-sensor color constancy using a dual-mapping strategy

  • paper_url: http://arxiv.org/abs/2311.11773
  • repo_url: None
  • paper_authors: Shuwei Yue, Minchen Wei
  • for: 提出了一种基于双映射策略的图像照明估计方法,该方法只需要一个简单的白点测试传感器,并且可以在照明估计和图像重建之间进行快速转换。
  • methods: 该方法使用了图像重建和照明估计两个映射,然后使用轻量级多层感知神经网络(MLP)模型进行优化。
  • results: 该方法可以快速实现图像照明估计,并且可以减少传感器差异和提高性能,仅需要一小段的训练时间(约0.003 MB的内存和1小时的训练时间)和快速执行(约0.3 ms和1 ms在GPU和CPU上),并且不敏感于输入图像分辨率。
    Abstract Deep Neural Networks (DNNs) have been widely used for illumination estimation, which is time-consuming and requires sensor-specific data collection. Our proposed method uses a dual-mapping strategy and only requires a simple white point from a test sensor under a D65 condition. This allows us to derive a mapping matrix, enabling the reconstructions of image data and illuminants. In the second mapping phase, we transform the re-constructed image data into sparse features, which are then optimized with a lightweight multi-layer perceptron (MLP) model using the re-constructed illuminants as ground truths. This approach effectively reduces sensor discrepancies and delivers performance on par with leading cross-sensor methods. It only requires a small amount of memory (~0.003 MB), and takes ~1 hour training on an RTX3070Ti GPU. More importantly, the method can be implemented very fast, with ~0.3 ms and ~1 ms on a GPU or CPU respectively, and is not sensitive to the input image resolution. Therefore, it offers a practical solution to the great challenges of data recollection that is faced by the industry.
    摘要 深度神经网络(DNNs)广泛应用于照明估计,这是时间费时且需要特定传感器数据采集。我们的提议方法采用双映射策略,只需要一个简单的白点数据集来自测传感器,并在D65条件下进行映射。这使得我们可以 derivate一个映射矩阵,启用图像数据和照明的重建。在第二个映射阶段,我们将重建的图像数据转换为稀疏特征,然后使用轻量级多层感知器(MLP)模型进行优化,使用重建的照明作为真实值。这种方法可以有效减少传感器差异,并提供与前列横跨传感器方法相当的性能。它只需要一小Amount of memory(约0.003 MB),并在RTX3070Ti GPU上训练约1小时。此外,该方法具有快速实现的特点,在GPU和CPU上分别需要0.3毫秒和1毫秒的时间,并不敏感于输入图像分辨率。因此,它提供了实际的解决方案,避免了业界面估计数据收集的大问题。

A Good Feature Extractor Is All You Need for Weakly Supervised Learning in Histopathology

  • paper_url: http://arxiv.org/abs/2311.11772
  • repo_url: None
  • paper_authors: Georg Wölflein, Dyke Ferber, Asier Rabasco Meneghetti, Omar S. M. El Nahhas, Daniel Truhn, Zunamys I. Carrero, David J. Harrison, Ognjen Arandjelović, Jakob N. Kather
  • for: 这 paper 的目的是evaluating the robustness of public pathology SSL feature extractors and identifying the most suitable feature extractors for clinical applications.
  • methods: 这 paper 使用了多种方法,包括 slide-level prediction tasks in a weakly supervised setting with external validation cohorts, and an empirical evaluation of publicly available feature extractors.
  • results: 这 paper 的结果表明 that omitting stain normalization and image augmentations does not compromise downstream performance, while incurring substantial savings in memory and compute. Additionally, the top-performing feature extractors are remarkably robust to variations in stain and augmentations in their latent space.
    Abstract Deep learning is revolutionising pathology, offering novel opportunities in disease prognosis and personalised treatment. Historically, stain normalisation has been a crucial preprocessing step in computational pathology pipelines, and persists into the deep learning era. Yet, with the emergence of feature extractors trained using self-supervised learning (SSL) on diverse pathology datasets, we call this practice into question. In an empirical evaluation of publicly available feature extractors, we find that omitting stain normalisation and image augmentations does not compromise downstream performance, while incurring substantial savings in memory and compute. Further, we show that the top-performing feature extractors are remarkably robust to variations in stain and augmentations like rotation in their latent space. Contrary to previous patch-level benchmarking studies, our approach emphasises clinical relevance by focusing on slide-level prediction tasks in a weakly supervised setting with external validation cohorts. This work represents the most comprehensive robustness evaluation of public pathology SSL feature extractors to date, involving more than 6,000 training runs across nine tasks, five datasets, three downstream architectures, and various preprocessing setups. Our findings stand to streamline digital pathology workflows by minimising preprocessing needs and informing the selection of feature extractors.

Non-Contact NIR PPG Sensing through Large Sequence Signal Regression

  • paper_url: http://arxiv.org/abs/2311.11757
  • repo_url: None
  • paper_authors: Timothy Hanley, Dara Golden, Robyn Maxwell, Ashkan Parsi, Joseph Lemley
  • for: 这个论文是为了演示一种新的非接触感知技术,用于从 Near Infra-Red (NIR) 视频中提取心率信号。
  • methods: 这个论文使用了一种 alternating Convolution Attention Network (CAN) 架构,通过对 NIR 视频序列进行卷积和注意力重叠来进行感知。
  • results: 这个论文使用了两个公共可用的数据集,通过对这些数据集进行训练,实现了对 NIR 视频中心率信号的高精度预测。训练结果表明,使用这种 CAN 架构可以在 NIR 视频中提取高精度的心率信号,MAE 为 0.99 bpm。
    Abstract Non-Contact sensing is an emerging technology with applications across many industries from driver monitoring in vehicles to patient monitoring in healthcare. Current state-of-the-art implementations focus on RGB video, but this struggles in varying/noisy light conditions and is almost completely unfeasible in the dark. Near Infra-Red (NIR) video, however, does not suffer from these constraints. This paper aims to demonstrate the effectiveness of an alternative Convolution Attention Network (CAN) architecture, to regress photoplethysmography (PPG) signal from a sequence of NIR frames. A combination of two publicly available datasets, which is split into train and test sets, is used for training the CAN. This combined dataset is augmented to reduce overfitting to the 'normal' 60 - 80 bpm heart rate range by providing the full range of heart rates along with corresponding videos for each subject. This CAN, when implemented over video cropped to the subject's head, achieved a Mean Average Error (MAE) of just 0.99 bpm, proving its effectiveness on NIR video and the architecture's feasibility to regress an accurate signal output.
    摘要 非接触感测是一种emerging技术,应用于多个行业,从车辆驾驶员监测到医疗保健行业的患者监测。当前状态的实现主要基于RGB视频,但这在不同/噪音的照明条件下受到限制,而且在黑暗中基本无法实现。然而,近红外(NIR)视频不受这些限制。这篇论文目的是提出一种alternative Convolution Attention Network(CAN)架构,用于从NIR视频序列中回归血氧检测信号(PPG)信号。使用两个公共可用的数据集,通过将其分成训练和测试集,进行了训练CAN。这个合并的数据集通过提供每个主题的完整的心率范围,从而降低了预测到“常见”60-80bpm心率范围的溢出。这个CAN,当应用于对主题的头部视频进行裁剪后,实现了 Mean Average Error(MAE)为0.99bpm,证明了其在NIR视频和架构上的可行性和回归精度的输出信号。

AdvGen: Physical Adversarial Attack on Face Presentation Attack Detection Systems

  • paper_url: http://arxiv.org/abs/2311.11753
  • repo_url: None
  • paper_authors: Sai Amrit Patnaik, Shivali Chansoriya, Anil K. Jain, Anoop M. Namboodiri
  • for: 防止面部识别系统在真实世界中受到攻击,因为攻击者可以通过修改捕捉到的图像来诱导系统进行误认。
  • methods: 我们提出了一种基于生成对抗网络的自动化攻击策略,可以在物理世界场景下生成攻击图像,并在四个数据集和十个国家级人脸识别系统上进行了广泛的测试。
  • results: 我们的攻击策略可以在物理世界场景下达到82.01%的攻击成功率,并在实际Physical环境中进行了实验验证。
    Abstract Evaluating the risk level of adversarial images is essential for safely deploying face authentication models in the real world. Popular approaches for physical-world attacks, such as print or replay attacks, suffer from some limitations, like including physical and geometrical artifacts. Recently, adversarial attacks have gained attraction, which try to digitally deceive the learning strategy of a recognition system using slight modifications to the captured image. While most previous research assumes that the adversarial image could be digitally fed into the authentication systems, this is not always the case for systems deployed in the real world. This paper demonstrates the vulnerability of face authentication systems to adversarial images in physical world scenarios. We propose AdvGen, an automated Generative Adversarial Network, to simulate print and replay attacks and generate adversarial images that can fool state-of-the-art PADs in a physical domain attack setting. Using this attack strategy, the attack success rate goes up to 82.01%. We test AdvGen extensively on four datasets and ten state-of-the-art PADs. We also demonstrate the effectiveness of our attack by conducting experiments in a realistic, physical environment.
    摘要 评估对抗图像的风险水平,对于在真实世界中安全部署人脸认证系统至关重要。常见的物理世界攻击方式(如打印攻击或重放攻击)存在一些局限,例如会引入物理和几何伪影。近年来,对抗攻击受到越来越多的关注,它们试图通过对捕获图像进行细微修改,以数字方式欺骗识别系统的学习策略。以往的研究大多假设对抗图像可以直接以数字形式输入认证系统,但对于部署在真实世界中的系统并非总是如此。本文展示了人脸认证系统在物理世界场景中面对对抗图像的脆弱性。我们提出了 AdvGen,一个自动化的生成对抗网络,用于模拟打印和重放攻击,并生成能够在物理域攻击设置下欺骗最先进呈现攻击检测器(PAD)的对抗图像。采用该攻击策略,攻击成功率可达 82.01%。我们在四个数据集和十个最先进的 PAD 上对 AdvGen 进行了广泛测试,并通过在真实物理环境中的实验证明了攻击的有效性。

Fuzzy Information Seeded Region Growing for Automated Lesions After Stroke Segmentation in MR Brain Images

  • paper_url: http://arxiv.org/abs/2311.11742
  • repo_url: https://github.com/mawio02/fisrg-for-automated-lesion-after-stroke-segmentation-in-mri
  • paper_authors: Mario Pascual González
  • for: stroke lesion segmentation from brain MRI images
  • methods: Fuzzy Information Seeded Region Growing (FISRG) algorithm
  • results: highest Dice score of 94.2%, with an average Dice score of 88.1% in the third experiment, indicating effective segmentation of stroke lesions.
    Abstract In the realm of medical imaging, precise segmentation of stroke lesions from brain MRI images stands as a critical challenge with significant implications for patient diagnosis and treatment. Addressing this, our study introduces an innovative approach using a Fuzzy Information Seeded Region Growing (FISRG) algorithm. Designed to effectively delineate the complex and irregular boundaries of stroke lesions, the FISRG algorithm combines fuzzy logic with Seeded Region Growing (SRG) techniques, aiming to enhance segmentation accuracy. The research involved three experiments to optimize the FISRG algorithm's performance, each focusing on different parameters to improve the accuracy of stroke lesion segmentation. The highest Dice score achieved in these experiments was 94.2\%, indicating a high degree of similarity between the algorithm's output and the expert-validated ground truth. Notably, the best average Dice score, amounting to 88.1\%, was recorded in the third experiment, highlighting the efficacy of the algorithm in consistently segmenting stroke lesions across various slices. Our findings reveal the FISRG algorithm's strengths in handling the heterogeneity of stroke lesions. However, challenges remain in areas of abrupt lesion topology changes and in distinguishing lesions from similar intensity brain regions. The results underscore the potential of the FISRG algorithm in contributing significantly to advancements in medical imaging analysis for stroke diagnosis and treatment.
    摘要 在医学成像领域,精准地从脑MRI图像中分割中风损害的 segmentation 作为一项关键挑战,对患者诊断和治疗具有重要意义。我们的研究报告了一种新的方法,即基于模糊逻辑和种子区域生长(FISRG)算法。这种算法旨在准确地界定中风损害的复杂和不规则边界,并通过结合模糊逻辑和种子区域生长(SRG)技术,提高分割精度。我们的研究进行了三个实验来优化FISRG算法的性能,每个实验都关注不同的参数来提高中风损害分割的准确率。实验中最高的 dice 分数为 94.2%,表明算法的输出与专家验证的真实值之间存在高度的相似性。而第三个实验的平均 dice 分数为 88.1%,表明算法在不同的slice中具有高度的稳定性,并能够一致地分割中风损害。我们的发现表明FISRG算法在处理中风损害的多样性方面具有优异的能力。然而,在突然的损害 topology 变化和类似intensity脑区域的区分方面仍然存在挑战。结果表明FISRG算法在医学成像分析中具有广泛的应用前景,对stroke诊断和治疗具有重要意义。

On the Importance of Large Objects in CNN Based Object Detection Algorithms

  • paper_url: http://arxiv.org/abs/2311.11714
  • repo_url: None
  • paper_authors: Ahmed Ben Saad, Gabriele Facciolo, Axel Davy
  • for: 提高对象检测器的性能,特别是小对象的检测 scores。
  • methods: 引入一个基于对象面积的权重项到训练损失函数中,以便更加强调大对象的学习特征。
  • results: 在COCO val 2017上,与 InternImage-T 模型结合使用我们的方法可以提高对象检测器的总性能 (+2 p.p. on small objects, +2 p.p. on medium objects, +4 p.p. on large objects)。 Additionally, we conduct additional experiments and ablation studies to confirm the robustness of our findings.
    Abstract Object detection models, a prominent class of machine learning algorithms, aim to identify and precisely locate objects in images or videos. However, this task might yield uneven performances sometimes caused by the objects sizes and the quality of the images and labels used for training. In this paper, we highlight the importance of large objects in learning features that are critical for all sizes. Given these findings, we propose to introduce a weighting term into the training loss. This term is a function of the object area size. We show that giving more weight to large objects leads to improved detection scores across all object sizes and so an overall improvement in Object Detectors performances (+2 p.p. of mAP on small objects, +2 p.p. on medium and +4 p.p. on large on COCO val 2017 with InternImage-T). Additional experiments and ablation studies with different models and on a different dataset further confirm the robustness of our findings.
    摘要 Translated into Simplified Chinese:对象检测模型,一种常见的机器学习算法,目的是在图像或视频中准确地识别和定位对象。然而,这个任务可能会导致不均匀的性能,这可能是因为对象的大小以及用于训练的图像和标签的质量。在这篇文章中,我们强调大对象在学习特征上的重要性,这些特征是所有大小对象都需要的。基于这些发现,我们提议在训练损失函数中添加一个面积大小相关的权重项。我们显示,对大对象的权重更重,会导致所有对象大小上的检测分数提高 (+2 p.p. of mAP on small objects, +2 p.p. on medium and +4 p.p. on large on COCO val 2017 with InternImage-T)。附加的实验和缺省研究表明我们的发现是可靠的。

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

  • paper_url: http://arxiv.org/abs/2311.11700
  • repo_url: None
  • paper_authors: Chi Yan, Delin Qu, Dong Wang, Dan Xu, Zhigang Wang, Bin Zhao, Xuelong Li
  • for: 这个论文是为了提出一种基于3D Gaussian表示的同时定位和地图生成(SLAM)系统。
  • methods: 这个方法使用了实时可导渲染 Rendering 管道,以提高地图优化和RGB-D重渲染的速度。它还提出了一种适应扩展策略,以有效地重建新观察到的场景几何结构,并改进了已经观察到的区域的地图。
  • results: 该方法在Replica和TUM-RGBD数据集上实现了与现有实时方法相当的竞争性表现,并且在运行时间和稳定性方面具有明显的优势。
    Abstract In this paper, we introduce $\textbf{GS-SLAM}$ that first utilizes 3D Gaussian representation in the Simultaneous Localization and Mapping (SLAM) system. It facilitates a better balance between efficiency and accuracy. Compared to recent SLAM methods employing neural implicit representations, our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D re-rendering. Specifically, we propose an adaptive expansion strategy that adds new or deletes noisy 3D Gaussian in order to efficiently reconstruct new observed scene geometry and improve the mapping of previously observed areas. This strategy is essential to extend 3D Gaussian representation to reconstruct the whole scene rather than synthesize a static object in existing methods. Moreover, in the pose tracking process, an effective coarse-to-fine technique is designed to select reliable 3D Gaussian representations to optimize camera pose, resulting in runtime reduction and robust estimation. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica, TUM-RGBD datasets. The source code will be released soon.
    摘要 在这篇论文中,我们引入了$\textbf{GS-SLAM}$,它首先利用了3D高斯函数表示在同时地标注和地图(SLAM)系统中。它使得更好地寻求效率和准确之间的平衡。相比最近的SLAM方法使用神经网络卷积表示,我们的方法使用了实时可导渲染管道,从而提供了显著的速度提升,用于地图优化和RGB-D重新渲染。具体来说,我们提出了适应扩展策略,将新的或噪声3D高斯函数添加到重构新观察到的场景几何,以提高已观察区域的地图。这种策略是对3D高斯函数表示扩展到重构整个场景而不是Synthesize静止物体的现有方法。此外,在摄像头跟踪过程中,我们设计了一种有效的过滤策略,选择可靠的3D高斯函数表示,以优化摄像头pose,从而实现时间缩短和稳定估计。我们的方法在实时实现的State-of-the-art方法上达到了竞争性表现。我们即将发布源代码。

Cut-and-Paste: Subject-Driven Video Editing with Attention Control

  • paper_url: http://arxiv.org/abs/2311.11697
  • repo_url: None
  • paper_authors: Zhichao Zuo, Zhao Zhang, Yan Luo, Yang Zhao, Haijun Zhang, Yi Yang, Meng Wang
  • for: 这个 paper 是为了提出一种基于文本指导的Semantic Video editing方法,以便在视频编辑中具有更高精度的控制,并且能够保留视频背景。
  • methods: 该 paper 使用了一种称为 Cut-and-Paste 的新框架,它利用文本指导和补充图像来进行 Semantic Video editing。具体来说,该方法使用了 cross attention 控制方法来限制编辑区域,以保持视频背景和空间时间一致性。
  • results: 该 paper 的实验结果表明,相比于现有的方法,Cut-and-Paste 方法能够更好地控制视频编辑,并且能够保留视频背景。这些结果是基于量化和主观评价的。
    Abstract This paper presents a novel framework termed Cut-and-Paste for real-word semantic video editing under the guidance of text prompt and additional reference image. While the text-driven video editing has demonstrated remarkable ability to generate highly diverse videos following given text prompts, the fine-grained semantic edits are hard to control by plain textual prompt only in terms of object details and edited region, and cumbersome long text descriptions are usually needed for the task. We therefore investigate subject-driven video editing for more precise control of both edited regions and background preservation, and fine-grained semantic generation. We achieve this goal by introducing an reference image as supplementary input to the text-driven video editing, which avoids racking your brain to come up with a cumbersome text prompt describing the detailed appearance of the object. To limit the editing area, we refer to a method of cross attention control in image editing and successfully extend it to video editing by fusing the attention map of adjacent frames, which strikes a balance between maintaining video background and spatio-temporal consistency. Compared with current methods, the whole process of our method is like ``cut" the source object to be edited and then ``paste" the target object provided by reference image. We demonstrate that our method performs favorably over prior arts for video editing under the guidance of text prompt and extra reference image, as measured by both quantitative and subjective evaluations.
    摘要 To achieve this, we introduce a reference image as supplementary input to the text-driven video editing process. This helps avoid the need for detailed text prompts describing object appearance, allowing for more intuitive and efficient editing. Additionally, we extend a cross-attention control method from image editing to video editing, fusing attention maps of adjacent frames to maintain video background and spatio-temporal consistency.Our Cut-and-Paste method is like "cutting" the source object to be edited and "pasting" the target object from the reference image. We demonstrate that our method outperforms prior arts in terms of both quantitative and subjective evaluations.

Clarity ChatGPT: An Interactive and Adaptive Processing System for Image Restoration and Enhancement

  • paper_url: http://arxiv.org/abs/2311.11695
  • repo_url: None
  • paper_authors: Yanyan Wei, Zhao Zhang, Jiahuan Ren, Xiaogang Xu, Richang Hong, Yi Yang, Shuicheng Yan, Meng Wang
  • for: 提高图像修复和优化(IRE)方法的通用能力和交互功能,解决现有IRE方法的限制性和不足。
  • methods: 提出了一种基于对话智能的 transformative 系统 Clarity ChatGPT,结合了多种IRE方法,可自动探测图像异常类型并选择合适的修复方法,或者基于用户反馈进行迭代生成满意结果。
  • results: 在实验studies中,Clarity ChatGPT 能够有效提高IRE方法的通用能力和交互功能,并填补现有视力语言模型的低级域 gap。
    Abstract The generalization capability of existing image restoration and enhancement (IRE) methods is constrained by the limited pre-trained datasets, making it difficult to handle agnostic inputs such as different degradation levels and scenarios beyond their design scopes. Moreover, they are not equipped with interactive mechanisms to consider user preferences or feedback, and their end-to-end settings cannot provide users with more choices. Faced with the above-mentioned IRE method's limited performance and insufficient interactivity, we try to solve it from the engineering and system framework levels. Specifically, we propose Clarity ChatGPT-a transformative system that combines the conversational intelligence of ChatGPT with multiple IRE methods. Clarity ChatGPT can automatically detect image degradation types and select appropriate IRE methods to restore images, or iteratively generate satisfactory results based on user feedback. Its innovative features include a CLIP-powered detector for accurate degradation classification, no-reference image quality evaluation for performance evaluation, region-specific processing for precise enhancements, and advanced fusion techniques for optimal restoration results. Clarity ChatGPT marks a significant advancement in integrating language and vision, enhancing image-text interactions, and providing a robust, high-performance IRE solution. Our case studies demonstrate that Clarity ChatGPT effectively improves the generalization and interaction capabilities in the IRE, and also fills the gap in the low-level domain of the existing vision-language model.
    摘要 现有的图像修复和改善(IRE)方法的通用能力受到先前训练数据的限制,这使得它们处理不同的降低水平和场景变得困难。此外,它们没有交互机制来考虑用户偏好或反馈,其端到端设置也无法提供更多的选择。面临以上问题的IRE方法表现不佳,我们尝试解决它从工程和系统框架的角度。我们提出了明亮ChatGPT-一个将语言智能和多种IRE方法结合的转变系统。明亮ChatGPT可以自动检测图像降低类型并选择合适的IRE方法来修复图像,或者基于用户反馈进行迭代生成满意的结果。它的创新特点包括CLIP驱动的检测器以确定准确的降低类型,无参图像质量评价来评估性能,区域特定的处理以实现精细的改善,以及高级融合技术来实现优化的修复结果。明亮ChatGPT对整合语言和视觉,提高图像文本交互,并提供了一个robust、高性能的IRE解决方案。我们的案例研究表明,明亮ChatGPT可以有效地提高IRE的通用和交互能力,同时填补现有视力语言模型的低级域空白。

Segment Together: A Versatile Paradigm for Semi-Supervised Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2311.11686
  • repo_url: None
  • paper_authors: Qingjie Zeng, Yutong Xie, Zilin Lu, Mengkang Lu, Yicheng Wu, Yong Xia
    for: 这篇论文的目的是提出一个新的多任务 semi-supervised 架构,以实现医疗影像分类 tasks 中的资料欠缺问题,并且可以将多个数据集合到一个统一的模型中,以便更好地利用无标的数据。methods: 这篇论文使用了一个动态任务推问设计,可以在不同的数据集上灵活地推问不同的目标,以及一个实验室实现的 cutmix 策略来增强模型的准确性。此外,这篇论文还引入了一个内部协调的一致性约束,以使用无标的数据更好地对模型进行训练。results: 这篇论文的实验结果显示,VerSemi 模型可以在四个公共标准 benchmark 上实现比第二最佳方法的大幅提升(例如,平均提升 2.69% Dice 值),实现新的 SOTA 性能在 semi-supervised 医疗影像分类中。
    Abstract Annotation scarcity has become a major obstacle for training powerful deep-learning models for medical image segmentation, restricting their deployment in clinical scenarios. To address it, semi-supervised learning by exploiting abundant unlabeled data is highly desirable to boost the model training. However, most existing works still focus on limited medical tasks and underestimate the potential of learning across diverse tasks and multiple datasets. Therefore, in this paper, we introduce a \textbf{Ver}satile \textbf{Semi}-supervised framework (VerSemi) to point out a new perspective that integrates various tasks into a unified model with a broad label space, to exploit more unlabeled data for semi-supervised medical image segmentation. Specifically, we introduce a dynamic task-prompted design to segment various targets from different datasets. Next, this unified model is used to identify the foreground regions from all labeled data, to capture cross-dataset semantics. Particularly, we create a synthetic task with a cutmix strategy to augment foreground targets within the expanded label space. To effectively utilize unlabeled data, we introduce a consistency constraint. This involves aligning aggregated predictions from various tasks with those from the synthetic task, further guiding the model in accurately segmenting foreground regions during training. We evaluated our VerSemi model on four public benchmarking datasets. Extensive experiments demonstrated that VerSemi can consistently outperform the second-best method by a large margin (e.g., an average 2.69\% Dice gain on four datasets), setting new SOTA performance for semi-supervised medical image segmentation. The code will be released.
    摘要 annotation scarcity has become a major obstacle for training powerful deep-learning models for medical image segmentation, restricting their deployment in clinical scenarios. To address it, semi-supervised learning by exploiting abundant unlabeled data is highly desirable to boost the model training. However, most existing works still focus on limited medical tasks and underestimate the potential of learning across diverse tasks and multiple datasets. Therefore, in this paper, we introduce a VERsatile semi-supervised framework (VerSemi) to point out a new perspective that integrates various tasks into a unified model with a broad label space, to exploit more unlabeled data for semi-supervised medical image segmentation. Specifically, we introduce a dynamic task-prompted design to segment various targets from different datasets. Next, this unified model is used to identify the foreground regions from all labeled data, to capture cross-dataset semantics. Particularly, we create a synthetic task with a cutmix strategy to augment foreground targets within the expanded label space. To effectively utilize unlabeled data, we introduce a consistency constraint. This involves aligning aggregated predictions from various tasks with those from the synthetic task, further guiding the model in accurately segmenting foreground regions during training. We evaluated our VerSemi model on four public benchmarking datasets. Extensive experiments demonstrated that VerSemi can consistently outperform the second-best method by a large margin (e.g., an average 2.69\% Dice gain on four datasets), setting new SOTA performance for semi-supervised medical image segmentation. The code will be released.Here's the translation in Traditional Chinese:annotation scarcity has become a major obstacle for training powerful deep-learning models for medical image segmentation, restricting their deployment in clinical scenarios. To address it, semi-supervised learning by exploiting abundant unlabeled data is highly desirable to boost the model training. However, most existing works still focus on limited medical tasks and underestimate the potential of learning across diverse tasks and multiple datasets. Therefore, in this paper, we introduce a VERsatile semi-supervised framework (VerSemi) to point out a new perspective that integrates various tasks into a unified model with a broad label space, to exploit more unlabeled data for semi-supervised medical image segmentation. Specifically, we introduce a dynamic task-prompted design to segment various targets from different datasets. Next, this unified model is used to identify the foreground regions from all labeled data, to capture cross-dataset semantics. Particularly, we create a synthetic task with a cutmix strategy to augment foreground targets within the expanded label space. To effectively utilize unlabeled data, we introduce a consistency constraint. This involves aligning aggregated predictions from various tasks with those from the synthetic task, further guiding the model in accurately segmenting foreground regions during training. We evaluated our VerSemi model on four public benchmarking datasets. Extensive experiments demonstrated that VerSemi can consistently outperform the second-best method by a large margin (e.g., an average 2.69\% Dice gain on four datasets), setting new SOTA performance for semi-supervised medical image segmentation. The code will be released.

Pyramid Diffusion for Fine 3D Large Scene Generation

  • paper_url: http://arxiv.org/abs/2311.12085
  • repo_url: https://github.com/Yuheng-SWJTU/pyramid-discrete-diffusion
  • paper_authors: Yuheng Liu, Xinke Li, Xueting Li, Lu Qi, Chongshou Li, Ming-Hsuan Yang
  • for: 本研究旨在Addressing the challenges of directly transferring 2D techniques to 3D scene generation, and proposing a novel approach for high-quality 3D scene generation.
  • methods: 本研究提出了一种多尺度模型(Pyramid Discrete Diffusion,PDD),可以逐步生成高质量的3D场景,从粗略到细节。
  • results: 实验覆盖了无条件和条件生成两种情况,并得到了吸引人的结果,证明模型在生成真实和细腻的3D场景方面具有效果和稳定性。
    Abstract Directly transferring the 2D techniques to 3D scene generation is challenging due to significant resolution reduction and the scarcity of comprehensive real-world 3D scene datasets. To address these issues, our work introduces the Pyramid Discrete Diffusion model (PDD) for 3D scene generation. This novel approach employs a multi-scale model capable of progressively generating high-quality 3D scenes from coarse to fine. In this way, the PDD can generate high-quality scenes within limited resource constraints and does not require additional data sources. To the best of our knowledge, we are the first to adopt the simple but effective coarse-to-fine strategy for 3D large scene generation. Our experiments, covering both unconditional and conditional generation, have yielded impressive results, showcasing the model's effectiveness and robustness in generating realistic and detailed 3D scenes. Our code will be available to the public.
    摘要 <>使用 Pyramid Discrete Diffusion 模型(PDD)进行三维场景生成是一项挑战,因为它会导致场景的分辨率减少,并且有限的实际三维场景数据。为解决这些问题,我们的工作提出了一种新的方法:逐步生成高质量的三维场景。在这种方法中,PDD 可以逐步生成高质量的场景,并且不需要额外的数据源。我们知道,我们是首次采用简单 yet 有效的粗化到细化策略来进行大型三维场景生成。我们的实验,涵盖无条件生成和条件生成,具有吸引人的效果和稳定性,证明了 PDD 模型的可行性和可靠性。我们的代码将对公众开放。

PMP-Swin: Multi-Scale Patch Message Passing Swin Transformer for Retinal Disease Classification

  • paper_url: http://arxiv.org/abs/2311.11669
  • repo_url: None
  • paper_authors: Zhihan Yang, Zhiming Cheng, Tengjin Weng, Shucheng He, Yaqi Wang, Xin Ye, Shuai Wang
  • for: 预测眼病诊断,提高诊断精度。
  • methods: 基于Message Passing机制的Patch Message Passing模块,以global交互强制特异性特征,并采用多种缩放的PatchSize进行多级划分。
  • results: 与现有方法比较,实现了remarkable的性能。
    Abstract Retinal disease is one of the primary causes of visual impairment, and early diagnosis is essential for preventing further deterioration. Nowadays, many works have explored Transformers for diagnosing diseases due to their strong visual representation capabilities. However, retinal diseases exhibit milder forms and often present with overlapping signs, which pose great difficulties for accurate multi-class classification. Therefore, we propose a new framework named Multi-Scale Patch Message Passing Swin Transformer for multi-class retinal disease classification. Specifically, we design a Patch Message Passing (PMP) module based on the Message Passing mechanism to establish global interaction for pathological semantic features and to exploit the subtle differences further between different diseases. Moreover, considering the various scale of pathological features we integrate multiple PMP modules for different patch sizes. For evaluation, we have constructed a new dataset, named OPTOS dataset, consisting of 1,033 high-resolution fundus images photographed by Optos camera and conducted comprehensive experiments to validate the efficacy of our proposed method. And the results on both the public dataset and our dataset demonstrate that our method achieves remarkable performance compared to state-of-the-art methods.
    摘要 retinal disease 是一种主要导致视力障碍的原因,早期诊断非常重要,以防止更进一步的恶化。现在,许多研究已经利用 Transformers 进行疾病诊断,因为它们具有强的视觉表示能力。然而,肠眼病的表现相对软,常常出现 overlap 的症状,这会带来精度的多类分类困难。因此,我们提出了一种新的框架,即多尺度缝 Message Passing Swin Transformer,用于多类肠眼病分类。特别是,我们设计了一个缝 Message Passing(PMP)模块,基于消息传递机制,以确立全局交互,激发不同疾病之间的微妙差异。此外,考虑到不同尺度的病理特征,我们将多个 PMP 模块结合在一起,用于不同的缝size。为了评估我们的提议方法,我们建立了一个新的数据集,名为 OPTOS 数据集,包含 1,033 个高分辨率肠眼图像,通过 Optos 摄像机拍摄,并进行了全面的实验来验证我们的提议方法的效果。结果表明,我们的方法在公共数据集和我们自己的数据集上均表现出色,胜过当前的状态对照方法。

ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches

  • paper_url: http://arxiv.org/abs/2311.12084
  • repo_url: None
  • paper_authors: Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif, Bassem Ouni, Muhammad Shafique
    for:This paper aims to mitigate the effects of patch-based adversarial attacks on machine learning models.methods:The proposed method, ODDR, uses a three-stage pipeline consisting of Fragmentation, Segregation, and Neutralization. Outlier detection techniques are used to identify and segregate anomalous features associated with adversarial perturbations, and dimension reduction methods are applied to mitigate the impact of these perturbations.results:ODDR effectively mitigates patch-based adversarial attacks, with robust accuracies matching or lying within a small range of clean accuracies, and only a marginal compromise of 1-2% in performance on clean samples. The method outperforms other defenses, demonstrating its effectiveness.Here is the simplified Chinese text for the three key points:for:这篇论文目的是为了 Mitigate patch-based adversarial attacks 对机器学习模型的影响。methods:提议的方法 ODDR 使用了一个三个阶段的管道,包括 Fragmentation、Segregation 和 Neutralization。它使用 outlier detection 技术来标识和分离 adversarial perturbations 对图像的异常特征,并使用 dimension reduction 方法来减少这些异常特征对模型的影响。results:ODDR 有效地 Mitigate patch-based adversarial attacks,Robust accuracy 与 clean accuracy 之间几乎相同,并且只有一个微小的牺牲(1-2%)。与其他防御方法相比,ODDR 表现更佳。
    Abstract Adversarial attacks are a major deterrent towards the reliable use of machine learning models. A powerful type of adversarial attacks is the patch-based attack, wherein the adversarial perturbations modify localized patches or specific areas within the images to deceive the trained machine learning model. In this paper, we introduce Outlier Detection and Dimension Reduction (ODDR), a holistic defense mechanism designed to effectively mitigate patch-based adversarial attacks. In our approach, we posit that input features corresponding to adversarial patches, whether naturalistic or otherwise, deviate from the inherent distribution of the remaining image sample and can be identified as outliers or anomalies. ODDR employs a three-stage pipeline: Fragmentation, Segregation, and Neutralization, providing a model-agnostic solution applicable to both image classification and object detection tasks. The Fragmentation stage parses the samples into chunks for the subsequent Segregation process. Here, outlier detection techniques identify and segregate the anomalous features associated with adversarial perturbations. The Neutralization stage utilizes dimension reduction methods on the outliers to mitigate the impact of adversarial perturbations without sacrificing pertinent information necessary for the machine learning task. Extensive testing on benchmark datasets and state-of-the-art adversarial patches demonstrates the effectiveness of ODDR. Results indicate robust accuracies matching and lying within a small range of clean accuracies (1%-3% for classification and 3%-5% for object detection), with only a marginal compromise of 1%-2% in performance on clean samples, thereby significantly outperforming other defenses.
    摘要 机器学习模型面临着严重的抗击攻击的威胁。许多抗击攻击中最具攻击力的是覆盖式攻击,即在图像中添加小量攻击干扰以让机器学习模型进行误分类。在这篇论文中,我们介绍了一种名为异常检测和维度减少(ODDR)的总体防御机制,用于有效地抵御覆盖式攻击。我们认为,受攻击的图像特征(自然或者不自然的攻击干扰)与图像的主要特征分布不同,可以被识别为异常或者异常值。ODDR使用三个阶段管道:分割、分类和中和,提供了对机器学习任务无关的解决方案,适用于图像分类和物体检测任务。分割阶段将样本分割成多个块,以便于后续的分类阶段。在这个阶段,异常检测技术将攻击干扰相关的异常特征分离出来。中和阶段使用维度减少方法对异常特征进行中和,以降低攻击干扰的影响,同时保留必要的信息以确保机器学习任务的正确性。我们对标准的攻击 dataset 和最新的攻击干扰进行了广泛的测试,结果显示ODDR具有优秀的效果。结果表明,ODDR 可以保持和清理样本中的净度(1%-3% для分类和3%-5% для物体检测),只有较小的牺牲(1%-2%),而其他防御机制则需要更大的牺牲。

OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning

  • paper_url: http://arxiv.org/abs/2311.11666
  • repo_url: None
  • paper_authors: Haiyang Ying, Yixuan Yin, Jinzhi Zhang, Fan Wang, Tao Yu, Ruqi Huang, Lu Fang
  • for: 实现全面理解3D场景,需要一种通用的3D分割方法,可以同时分割多种对象,无论对象数量多少或类别多少,而且能够反映内在的层次结构。
  • methods: 我们提出了OmniSeg3D方法,它是一种通过层次对比学习框架将多视图不一致的2D分割映射到一致的3D特征场,以实现全面的3D分割和层次结构理解。
  • results: 我们的方法能够在高质量3D分割和准确的层次结构理解方面达到杰出的效果,并且提供了一个便捷的图形用户界面,以便用户在多种场景中进行自由交互。
    Abstract Towards holistic understanding of 3D scenes, a general 3D segmentation method is needed that can segment diverse objects without restrictions on object quantity or categories, while also reflecting the inherent hierarchical structure. To achieve this, we propose OmniSeg3D, an omniversal segmentation method aims for segmenting anything in 3D all at once. The key insight is to lift multi-view inconsistent 2D segmentations into a consistent 3D feature field through a hierarchical contrastive learning framework, which is accomplished by two steps. Firstly, we design a novel hierarchical representation based on category-agnostic 2D segmentations to model the multi-level relationship among pixels. Secondly, image features rendered from the 3D feature field are clustered at different levels, which can be further drawn closer or pushed apart according to the hierarchical relationship between different levels. In tackling the challenges posed by inconsistent 2D segmentations, this framework yields a global consistent 3D feature field, which further enables hierarchical segmentation, multi-object selection, and global discretization. Extensive experiments demonstrate the effectiveness of our method on high-quality 3D segmentation and accurate hierarchical structure understanding. A graphical user interface further facilitates flexible interaction for omniversal 3D segmentation.
    摘要 为了实现全面的3D场景理解,我们需要一种通用的3D分割方法,可以无Restriction地分割多种对象,同时还能够反映Scene中的自然层次结构。为此,我们提出了OmniSeg3D方法,它目标是同时分割所有3D场景中的任何对象。我们的关键发现是通过对多视图不一致的2D分割进行层次对比学习框架,将多级关系模型为像素之间的层次关系。我们首先设计了一种新的层次表示,基于类型无关的2D分割来表示多级关系。其次,我们从3D特征场景中生成的图像特征进行层次归一化,并将归一化后的特征分割成不同层次。在解决不一致2D分割的挑战时,这个框架实现了一个全局一致的3D特征场景,从而实现了层次分割、多对象选择和全球精度分割。我们的实验证明了OmniSeg3D方法的有效性,可以实现高质量的3D分割和准确的层次结构理解。此外,我们还提供了一个图形用户界面,以便用户通过价值Omniversal 3D分割。

PanBench: Towards High-Resolution and High-Performance Pansharpening

  • paper_url: http://arxiv.org/abs/2311.12083
  • repo_url: None
  • paper_authors: Shiying Wang, Xuechao Zou, Kai Li, Junliang Xing, Pin Tao
  • for: 这篇论文是为了探讨远程感知领域中的笔划合成问题,即将低分辨率多spectral图像与高分辨率灰度图像集成成一个高分辨率图像,以提高远程感知数据分析中的精度。
  • methods: 这篇论文提出了一种新的笔划合成网络(CMFNet),使用了级联多尺度融合技术,以实现高精度的笔划合成。
  • results: 论文通过了广泛的实验,证明了CMFNet的效果是非常高,可以在远程感知领域中提高笔划合成的精度。
    Abstract Pansharpening, a pivotal task in remote sensing, involves integrating low-resolution multispectral images with high-resolution panchromatic images to synthesize an image that is both high-resolution and retains multispectral information. These pansharpened images enhance precision in land cover classification, change detection, and environmental monitoring within remote sensing data analysis. While deep learning techniques have shown significant success in pansharpening, existing methods often face limitations in their evaluation, focusing on restricted satellite data sources, single scene types, and low-resolution images. This paper addresses this gap by introducing PanBench, a high-resolution multi-scene dataset containing all mainstream satellites and comprising 5,898 pairs of samples. Each pair includes a four-channel (RGB + near-infrared) multispectral image of 256x256 pixels and a mono-channel panchromatic image of 1,024x1,024 pixels. To achieve high-fidelity synthesis, we propose a Cascaded Multiscale Fusion Network (CMFNet) for Pansharpening. Extensive experiments validate the effectiveness of CMFNet. We have released the dataset, source code, and pre-trained models in the supplementary, fostering further research in remote sensing.
    摘要 Remote sensing 中的缩进任务之一是束合低分辨率多spectral图像与高分辨率独立图像,以生成高分辨率的图像,同时保留多spectral信息。这些缩进图像可以增强远程感知数据分类、变化探测和环境监测中的精度。深度学习技术在缩进中已经表现出了显著的成功,但现有方法常面临评估限制,包括固定卫星数据源、单个场景类型和低分辨率图像。本文填补这个空白,通过介绍 PanBench 高分辨率多场景数据集,该数据集包含所有主流卫星,包括 5,898 对样本。每对样本包括一个四通道 (RGB + near-infrared) 多spectral图像,分辨率为 256x256 像素,以及一个单通道独立图像,分辨率为 1,024x1,024 像素。为实现高准确性束合,我们提议一种顺序多尺度融合网络 (CMFNet)。广泛的实验证明了 CMFNet 的有效性。我们在补充中发布了数据集、源代码和预训练模型,欢迎更多的研究人员进行远程感知领域的进一步研究。

Enhanced Spatio-Temporal Context for Temporally Consistent Robust 3D Human Motion Recovery from Monocular Videos

  • paper_url: http://arxiv.org/abs/2311.11662
  • repo_url: None
  • paper_authors: Sushovan Chanda, Amogh Tiwari, Lokender Tiwari, Brojeshwar Bhowmick, Avinash Sharma, Hrishav Barua
  • for: Temporally consistent 3D human body pose, shape, and motion estimation from monocular videos.
  • methods: Body-aware feature representation, per-frame pose and camera initialization, spatio-temporal feature aggregation using self-similarity and self-attention, and LSTM refinement.
  • results: Significantly lower acceleration error and outperformance over existing state-of-the-art methods in complex scenarios like partial occlusion, complex poses, and low illumination.Here’s the Chinese translation:
  • for: 从单目视频中提取一致的3D人体姿态、形状和运动信息。
  • methods: 使用人体意识的特征表示、每帧姿态和摄像头初始化、基于自相似和自注意的体部特征综合、以及LSTM精度调整。
  • results: 在复杂的场景下(如部分遮挡、复杂的姿态和低照明)表现出明显低于加速误差和超越现有状态的方法。
    Abstract Recovering temporally consistent 3D human body pose, shape and motion from a monocular video is a challenging task due to (self-)occlusions, poor lighting conditions, complex articulated body poses, depth ambiguity, and limited availability of annotated data. Further, doing a simple perframe estimation is insufficient as it leads to jittery and implausible results. In this paper, we propose a novel method for temporally consistent motion estimation from a monocular video. Instead of using generic ResNet-like features, our method uses a body-aware feature representation and an independent per-frame pose and camera initialization over a temporal window followed by a novel spatio-temporal feature aggregation by using a combination of self-similarity and self-attention over the body-aware features and the perframe initialization. Together, they yield enhanced spatiotemporal context for every frame by considering remaining past and future frames. These features are used to predict the pose and shape parameters of the human body model, which are further refined using an LSTM. Experimental results on the publicly available benchmark data show that our method attains significantly lower acceleration error and outperforms the existing state-of-the-art methods over all key quantitative evaluation metrics, including complex scenarios like partial occlusion, complex poses and even relatively low illumination.
    摘要 recuperar la pose y el movimiento temporales consistentes del cuerpo humano en una videomonocular es un desafío debido a (auto-oclusión), condiciones de iluminación pobres, poses corporales articuladas complejas, ambigüedad de profundidad y la limitada disponibilidad de datos annotados. Además, una estimación simple por frame no es suficiente ya que lleva a resultados jittery e implausibles. En este artículo, propusimos un método novel para la estimación de la motion consistente en una videomonocular. En lugar de utilizar características genericas de ResNet, nuestro método utiliza una representación de características corporal-consciente y una inicialización de pose y cámara independiente por ventana temporal, seguida de una agregación de características espacio-temporal innovadora utilizando una combinación de self-similarity y self-attention sobre las características corporal-conscientes y la inicialización por frame. Juntos, они proporcionan un contexto espacio-temporal mejorado para cada marco considerando los marcos restantes en el pasado y el futuro. Estas características se utilizan para predecir los parámetros de pose y forma del modelo de cuerpo humano, que se refinan utilizando un LSTM. Los resultados experimentales en los datos de referencia públicos muestran que nuestro método tiene un error de aceleración significativamente menor y supera a los métodos estado-de-la-arte existentes en todas las métricas cuantitativas clave, incluyendo escenarios complejos como la ocultación parcial, poses corporales complejas y hasta una iluminación relativamente baja.

Double-Condensing Attention Condenser: Leveraging Attention in Deep Learning to Detect Skin Cancer from Skin Lesion Images

  • paper_url: http://arxiv.org/abs/2311.11656
  • repo_url: None
  • paper_authors: Chi-en Amy Tai, Elizabeth Janes, Chris Czarnecki, Alexander Wong
  • for: 这paper是为了检测皮肤癌症的皮肤变性图像而写的。
  • methods: 这paper使用了一种叫做Double-Condensing Attention Condensers(DC-AC)的自注意网络结构,以便更快速地计算。
  • results: 这paper介绍了一种特化于皮肤癌症检测的深度学习模型,其中使用了DC-AC自注意网络结构,并且公开发布了这个模型以便让科学家在抗癌战中获得更多的助手。
    Abstract Skin cancer is the most common type of cancer in the United States and is estimated to affect one in five Americans. Recent advances have demonstrated strong performance on skin cancer detection, as exemplified by state of the art performance in the SIIM-ISIC Melanoma Classification Challenge; however these solutions leverage ensembles of complex deep neural architectures requiring immense storage and compute costs, and therefore may not be tractable. A recent movement for TinyML applications is integrating Double-Condensing Attention Condensers (DC-AC) into a self-attention neural network backbone architecture to allow for faster and more efficient computation. This paper explores leveraging an efficient self-attention structure to detect skin cancer in skin lesion images and introduces a deep neural network design with DC-AC customized for skin cancer detection from skin lesion images. The final model is publicly available as a part of a global open-source initiative dedicated to accelerating advancement in machine learning to aid clinicians in the fight against cancer.
    摘要 美国最常见的癌症是皮肤癌,据估计每一个美国人中有一个在五个人会患癌。最新的进展表明在皮肤癌检测方面存在强大的表现,如SIIM-ISIC皮肤癌分类挑战赛的国际首席表现,但这些解决方案具有复杂的深度神经网络架构,需要巨量的存储和计算成本,因此可能不太可能。最近,对于微型机器学习(TinyML)应用而言,把双层凝聚注意力压缩(DC-AC) integrate into a self-attention neural network backbone architecture可以实现更快和高效的计算。这篇论文探讨了使用高效的自注意结构来检测皮肤癌,并提出了一种特化于皮肤癌检测的深度神经网络设计,使用DC-AC。最终模型已经公开发布,并成为全球开源的机器学习推进医生在抗癌斗争中的工具。

Cancer-Net PCa-Data: An Open-Source Benchmark Dataset for Prostate Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data

  • paper_url: http://arxiv.org/abs/2311.11647
  • repo_url: None
  • paper_authors: Hayden Gunraj, Chi-en Amy Tai, Alexander Wong
  • for: This paper is written for the purpose of introducing an open-source benchmark dataset of volumetric correlated diffusion imaging (CDI$^s$) data for prostate cancer (PCa) patients, with the goal of advancing research efforts in machine learning and imaging for PCa diagnosis and treatment.
  • methods: The paper uses CDI$^s$ imaging data from a patient cohort of 200 cases, along with full annotations (gland masks, tumor masks, and PCa diagnosis for each tumor). The authors analyze the demographic and label region diversity of the dataset for potential biases.
  • results: The paper introduces Cancer-Net PCa-Data, the first-ever public dataset of CDI$^s$ imaging data for PCa, which is an open-source benchmark dataset for researchers to use in developing and evaluating machine learning models for PCa diagnosis and treatment. The dataset is diverse and comprehensive, with 200 patient cases and full annotations, and has the potential to aid clinicians in the global fight against cancer.
    Abstract The recent introduction of synthetic correlated diffusion (CDI$^s$) imaging has demonstrated significant potential in the realm of clinical decision support for prostate cancer (PCa). CDI$^s$ is a new form of magnetic resonance imaging (MRI) designed to characterize tissue characteristics through the joint correlation of diffusion signal attenuation across different Brownian motion sensitivities. Despite the performance improvement, the CDI$^s$ data for PCa has not been previously made publicly available. In our commitment to advance research efforts for PCa, we introduce Cancer-Net PCa-Data, an open-source benchmark dataset of volumetric CDI$^s$ imaging data of PCa patients. Cancer-Net PCa-Data consists of CDI$^s$ volumetric images from a patient cohort of 200 patient cases, along with full annotations (gland masks, tumor masks, and PCa diagnosis for each tumor). We also analyze the demographic and label region diversity of Cancer-Net PCa-Data for potential biases. Cancer-Net PCa-Data is the first-ever public dataset of CDI$^s$ imaging data for PCa, and is a part of the global open-source initiative dedicated to advancement in machine learning and imaging research to aid clinicians in the global fight against cancer.
    摘要 Recent introduction of synthetic correlated diffusion (CDI$^s$) imaging has shown significant potential in clinical decision support for prostate cancer (PCa). CDI$^s$ is a new form of magnetic resonance imaging (MRI) that characterizes tissue characteristics through joint correlation of diffusion signal attenuation across different Brownian motion sensitivities. Although CDI$^s$ data for PCa has not been publicly available before, we are committed to advancing research efforts for PCa and introduce Cancer-Net PCa-Data, an open-source benchmark dataset of volumetric CDI$^s$ imaging data for PCa patients. Cancer-Net PCa-Data includes CDI$^s$ volumetric images from a patient cohort of 200 cases, along with full annotations (gland masks, tumor masks, and PCa diagnosis for each tumor). We also analyze the demographic and label region diversity of Cancer-Net PCa-Data for potential biases. Cancer-Net PCa-Data is the first public dataset of CDI$^s$ imaging data for PCa and is part of the global open-source initiative dedicated to advancing machine learning and imaging research to aid clinicians in the global fight against cancer.

CastDet: Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning

  • paper_url: http://arxiv.org/abs/2311.11646
  • repo_url: None
  • paper_authors: Yan Li, Weiwei Guo, Dunyun He, Jiaqi Zhou, Yuze Gao, Wenxian Yu
    for:This paper focuses on open-vocabulary object detection (OVD) in aerial images, which enables the characterization of new objects beyond training categories on the earth surface without annotating training images for these new categories.methods:The proposed CastDet framework is an end-to-end student-teacher open-vocabulary object detection framework that leverages the CLIP model as an extra omniscient teacher of rich knowledge into the student-teacher self-learning process. The framework also employs a dynamic label queue technique to maintain high-quality pseudo labels during batch training and mitigate label imbalance.results:The proposed CastDet achieves superior open-vocabulary detection performance, with an HM (Harmonic Mean) of 40.0, outperforming previous methods Detic/ViLD by 26.9/21.1 on the VisDroneZSD dataset.
    Abstract Object detection in aerial images is a pivotal task for various earth observation applications, whereas current algorithms learn to detect only a pre-defined set of object categories demanding sufficient bounding-box annotated training samples and fail to detect novel object categories. In this paper, we consider open-vocabulary object detection (OVD) in aerial images that enables the characterization of new objects beyond training categories on the earth surface without annotating training images for these new categories. The performance of OVD depends on the quality of class-agnostic region proposals and pseudo-labels that can generalize well to novel object categories. To simultaneously generate high-quality proposals and pseudo-labels, we propose CastDet, a CLIP-activated student-teacher open-vocabulary object Detection framework. Our end-to-end framework within the student-teacher mechanism employs the CLIP model as an extra omniscient teacher of rich knowledge into the student-teacher self-learning process. By doing so, our approach boosts novel object proposals and classification. Furthermore, we design a dynamic label queue technique to maintain high-quality pseudo labels during batch training and mitigate label imbalance. We conduct extensive experiments on multiple existing aerial object detection datasets, which are set up for the OVD task. Experimental results demonstrate our CastDet achieving superior open-vocabulary detection performance, e.g., reaching 40.0 HM (Harmonic Mean), which outperforms previous methods Detic/ViLD by 26.9/21.1 on the VisDroneZSD dataset.
    摘要 <>对于地球观测应用,空中图像中的对象检测是一项重要任务,而当前算法只能学习预定的对象类别,需要充足的 bounding-box 注释样本,无法检测新的对象类别。在这篇论文中,我们考虑了开放词汇对象检测(OVD)在空中图像中,可以在地球表面上无需注释训练样本中检测新的对象类别。OVD 的性能取决于高质量的类型不敏感区域提案和 Pseudo-label,这些可以很好地泛化到新的对象类别。为了同时生成高质量的提案和 Pseudo-label,我们提出了 CastDet,一个基于 CLIP 的学生教师开放词汇对象检测框架。我们的端到端框架在学生教师机制中使用 CLIP 模型作为额外的宏观知识的教师,从而提高新对象提案和分类。此外,我们设计了动态标签队列技术,以保持批处理训练期间高质量的 Pseudo-label。我们对多个现有的空中对象检测数据集进行了广泛的实验,并证明我们的 CastDet 在开放词汇对象检测任务中表现出色,例如在 VisDroneZSD 数据集上达到 40.0 HM(和律mean),比前方法 Detic/ViLD 提高 26.9/21.1。Note: "HM" stands for "Harmonic Mean", which is a measure of the performance of object detection algorithms.

Video Face Re-Aging: Toward Temporally Consistent Face Re-Aging

  • paper_url: http://arxiv.org/abs/2311.11642
  • repo_url: https://github.com/kyugorithm/VFRAN
  • paper_authors: Abdul Muqeet, Kyuchul Lee, Bumsoo Kim, Yohan Hong, Hyungrae Lee, Woonggon Kim, Kwang Hee Lee
  • for: 修改视频中人脸年龄,以达到目标年龄
  • methods: 提出了一种新的人脸视频重新年龄化方法,包括一种新的人脸数据集和一种基eline架构,以及三种专门为视频重新年龄化测试的度量
  • results: 对公共数据集进行了广泛的实验,并 показа出方法在年龄变化和时间一致性方面的优于现有方法
    Abstract Video face re-aging deals with altering the apparent age of a person to the target age in videos. This problem is challenging due to the lack of paired video datasets maintaining temporal consistency in identity and age. Most re-aging methods process each image individually without considering the temporal consistency of videos. While some existing works address the issue of temporal coherence through video facial attribute manipulation in latent space, they often fail to deliver satisfactory performance in age transformation. To tackle the issues, we propose (1) a novel synthetic video dataset that features subjects across a diverse range of age groups; (2) a baseline architecture designed to validate the effectiveness of our proposed dataset, and (3) the development of three novel metrics tailored explicitly for evaluating the temporal consistency of video re-aging techniques. Our comprehensive experiments on public datasets, such as VFHQ and CelebV-HQ, show that our method outperforms the existing approaches in terms of both age transformation and temporal consistency.
    摘要 <>Video face re-aging 征�到将人体表现出target age的年龄,这个问题具有挑战性,因为缺乏具有时间一致性的人脸视频对应数据集。大多数现有方法不考虑视频的时间一致性,对每帧图像进行处理。有些现有方法通过在幽默空间进行视频人脸特征修饰来解决问题,但它们经常无法提供满意的年龄转换表现。为了解决这些问题,我们提议:1. 一个新的人造视频数据集,包含不同年龄群体的人体;2. 一个基线架构,用于验证我们的提议数据集的效果;3. 三个专门为视频重新年龄评价的新指标。我们对公共数据集,如VFHQ和CelebV-HQ进行了全面的实验,结果显示,我们的方法在年龄转换和时间一致性两个方面都超过了现有方法。Translation notes:* 征�到 (chèng zhì da) means "to achieve" or "to reach" in Chinese.* 人体 (rén tǐ) means "person" in Chinese.* 表现 (biǎo xiǎng) means "to show" or "to exhibit" in Chinese.* 年龄 (nián jì) means "age" in Chinese.* 时间 (shí jian) means "time" in Chinese.* 一致性 (yī chéng xìng) means "consistency" or "coherence" in Chinese.* 幽默 (yōu mó) means "latent" or "hidden" in Chinese.* 特征 (tè shē) means "feature" or "characteristic" in Chinese.* 修饰 (xiū shī) means "to modify" or "to alter" in Chinese.* 满意 (mǎn yì) means "satisfactory" or "pleasing" in Chinese.* 评价 (píng jì) means "evaluation" or "assessment" in Chinese.

Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model

  • paper_url: http://arxiv.org/abs/2311.11638
  • repo_url: https://github.com/chunminghe/reti-diff
  • paper_authors: Chunming He, Chengyu Fang, Yulun Zhang, Kai Li, Longxiang Tang, Chenyu You, Fengyang Xiao, Zhenhua Guo, Xiu Li
  • For: 提高降低图像质量的阈值环境照明照片的可见性和稳定性。* Methods: 利用扩散模型(DM)在紧凑的幽默空间中生成简洁导向假设,并 introduce a novel solution called Reti-Diff for the IDIR task, which includes two key components: Retinex-based latent DM (RLDM) and Retinex-guided transformer (RGformer).* Results: 比较 existing methods 在三个 IDIR 任务上表现出色,以及下游应用程序中的表现。
    Abstract Illumination degradation image restoration (IDIR) techniques aim to improve the visibility of degraded images and mitigate the adverse effects of deteriorated illumination. Among these algorithms, diffusion model (DM)-based methods have shown promising performance but are often burdened by heavy computational demands and pixel misalignment issues when predicting the image-level distribution. To tackle these problems, we propose to leverage DM within a compact latent space to generate concise guidance priors and introduce a novel solution called Reti-Diff for the IDIR task. Reti-Diff comprises two key components: the Retinex-based latent DM (RLDM) and the Retinex-guided transformer (RGformer). To ensure detailed reconstruction and illumination correction, RLDM is empowered to acquire Retinex knowledge and extract reflectance and illumination priors. These priors are subsequently utilized by RGformer to guide the decomposition of image features into their respective reflectance and illumination components. Following this, RGformer further enhances and consolidates the decomposed features, resulting in the production of refined images with consistent content and robustness to handle complex degradation scenarios. Extensive experiments show that Reti-Diff outperforms existing methods on three IDIR tasks, as well as downstream applications. Code will be available at \url{https://github.com/ChunmingHe/Reti-Diff}.
    摘要 ILLUMINATION DEGRADATION IMAGE RESTORATION(IDIR)技术目的是提高受损图像的可见度和减轻照明衰减的不良影响。其中的扩散模型(DM)基本方法具有良好的表现,但它们常受到重复计算和像素不对齐问题的困扰,尤其是在预测图像级别分布时。为解决这些问题,我们提议利用DM在紧凑的尺度空间内运行,生成简洁的指导假设。这种方法被称为Reti-Diff。Reti-Diff包括两个关键组件:Retinex基于的秘密DM(RLDM)和Retinex引导的变换器(RGformer)。为确保细节重建和照明更正,RLDM被赋予Retinex知识,从而提取反射和照明假设。这些假设后来被RGformer使用,以导引图像特征的分解为其各自的反射和照明组件。接下来,RGformer进一步加强和卷积这些分解的特征,从而生成了更加细腻和稳定的图像,可以满足复杂的受损enario。实验表明,Reti-Diff在三个IDIR任务上的表现都高于现有方法,同时在下游应用中也达到了更好的效果。代码将在 \url{https://github.com/ChunmingHe/Reti-Diff} 上提供。

Generating Realistic Counterfactuals for Retinal Fundus and OCT Images using Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.11629
  • repo_url: None
  • paper_authors: Indu Ilanchezian, Valentyn Boreiko, Laura Kühlewein, Ziwei Huang, Murat Seçkin Ayhan, Matthias Hein, Lisa Koch, Philipp Berens
  • for: 用于解释临床决策或评估alternatives
  • methods: 使用一种扩散模型和一种对抗性鲁棒分类器,生成高度真实的对比图像和OCT B-scan
  • results: 用户研究发现,使用我们的方法生成的对比图像比之前的方法生成的更真实,甚至与真实图像难以分辨
    Abstract Counterfactual reasoning is often used in a clinical setting to explain decisions or weigh alternatives. Therefore, for imaging based modalities such as ophthalmology, it would be beneficial to be able to create counterfactual images, illustrating the answer to the question: "If the subject had had diabetic retinopathy, how would the fundus image have looked?" Here, we demonstrate that using a diffusion model in combination with an adversarially robust classifier trained on retinal disease classification tasks enables generation of highly realistic counterfactuals of retinal fundus images and optical coherence tomorgraphy (OCT) B-scans. Ideally, these classifiers encode the salient features indicative for each disease class and can steer the diffusion model to show realistic disease signs or remove disease-related lesions in a realistic way. Importantly, in a user study, domain experts found the counterfactuals generated using our method significantly more realistic than counterfactuals generated from a previous method, and even indistiguishable from realistic images.
    摘要 ounterfactual 理智 often 用于 clinical setting to explain decisions 或 weigh alternatives. Therefore, for imaging based modalities such as ophthalmology, it would be beneficial to be able to create counterfactual images, illustrating the answer to the question: "If the subject had had diabetic retinopathy, how would the fundus image have looked?" Here, we demonstrate that using a diffusion model in combination with an adversarially robust classifier trained on retinal disease classification tasks enables generation of highly realistic counterfactuals of retinal fundus images and optical coherence tomography (OCT) B-scans. Ideally, these classifiers encode the salient features indicative for each disease class and can steer the diffusion model to show realistic disease signs or remove disease-related lesions in a realistic way. Importantly, in a user study, domain experts found the counterfactuals generated using our method significantly more realistic than counterfactuals generated from a previous method, and even indistinguishable from realistic images.Note that Simplified Chinese is used here, as it is the more widely used standard for Chinese writing in mainland China. If you prefer Traditional Chinese, I can provide that version as well.

Semantic-Preserved Point-based Human Avatar

  • paper_url: http://arxiv.org/abs/2311.11614
  • repo_url: None
  • paper_authors: Lixiang Lin, Jianke Zhu
  • for: 提高 AR/VR 和数字娱乐领域中人类模拟体验的现实感,该论文首次提出了一种点基式人类模板,涵盖了数字人类表达范围的全部。
  • methods: 该模型使用两个多层感知(MLP)来模型 pose-dependent deformation 和直线皮剂(LBS)的 weights。人体外观表示采用解码器和每个点附加的特征。与其他 alternatives 的隐式方法不同,点方向表示方式不仅提供了更直观的人类模板动画模型方法,还能够显著降低训练和推理时间。
  • results: 该方法的实验结果表明其有效性。
    Abstract To enable realistic experience in AR/VR and digital entertainment, we present the first point-based human avatar model that embodies the entirety expressive range of digital humans. We employ two MLPs to model pose-dependent deformation and linear skinning (LBS) weights. The representation of appearance relies on a decoder and the features that attached to each point. In contrast to alternative implicit approaches, the oriented points representation not only provides a more intuitive way to model human avatar animation but also significantly reduces both training and inference time. Moreover, we propose a novel method to transfer semantic information from the SMPL-X model to the points, which enables to better understand human body movements. By leveraging the semantic information of points, we can facilitate virtual try-on and human avatar composition through exchanging the points of same category across different subjects. Experimental results demonstrate the efficacy of our presented method.
    摘要 为提供真实的AR/VR和数字娱乐体验,我们介绍了首个点基的人类化模型,涵盖了整个数字人类表达范围。我们使用两个多层感知(MLP)来模型 pose-dependent deformation和直线皮床(LBS)weights。人物外表表示 rely on decoder 和每个点附加的特征。与替代的隐式方法不同,点云表示不仅提供了更直观的人物动画模型化方法,还可以显著减少训练和推断时间。此外,我们提出了一种将SMPL-X模型中的 semantics 信息传递到点上的新方法,使得更好地理解人体动作。通过利用点上的semantic信息,我们可以实现虚拟试穿和人物组合 durch exchange 点的同类Category 的不同主体。实验结果表明我们提出的方法的有效性。

CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement

  • paper_url: http://arxiv.org/abs/2311.11604
  • repo_url: https://github.com/npupilab/curriculumloc
  • paper_authors: Boni Hu, Lin Chen, Runjian Chen, Shuhui Bu, Pengcheng Han, Haowei Li
  • for: 这篇论文旨在提出一个可靠且可扩展的可视地对照对应方法,以实现实际的可视地对照 зада务。
  • methods: 这篇论文使用了一个精心设计的多阶段精度提高管线,以及一种全球 semantic 意识和本地几何验证的关键点检测和描述方法。
  • results: 实验结果显示,这篇论文的方法可以实现高精度的可视地对照,并且在两个不同的距离度量上创下新的高 recall@1 纪录值。
    Abstract Visual geolocalization is a cost-effective and scalable task that involves matching one or more query images, taken at some unknown location, to a set of geo-tagged reference images. Existing methods, devoted to semantic features representation, evolving towards robustness to a wide variety between query and reference, including illumination and viewpoint changes, as well as scale and seasonal variations. However, practical visual geolocalization approaches need to be robust in appearance changing and extreme viewpoint variation conditions, while providing accurate global location estimates. Therefore, inspired by curriculum design, human learn general knowledge first and then delve into professional expertise. We first recognize semantic scene and then measure geometric structure. Our approach, termed CurriculumLoc, involves a delicate design of multi-stage refinement pipeline and a novel keypoint detection and description with global semantic awareness and local geometric verification. We rerank candidates and solve a particular cross-domain perspective-n-point (PnP) problem based on these keypoints and corresponding descriptors, position refinement occurs incrementally. The extensive experimental results on our collected dataset, TerraTrack and a benchmark dataset, ALTO, demonstrate that our approach results in the aforementioned desirable characteristics of a practical visual geolocalization solution. Additionally, we achieve new high recall@1 scores of 62.6% and 94.5% on ALTO, with two different distances metrics, respectively. Dataset, code and trained models are publicly available on https://github.com/npupilab/CurriculumLoc.
    摘要 Visual地理位置定位是一项经济可行和可扩展的任务,即将一组或多组查询图像,取自未知位置,与一组准备了地理标记的参考图像进行匹配。现有方法主要关注semantic特征表示,逐渐向多样化 между查询和参考图像的鲁棒性进化,包括照明和视角变化、比例和季节变化。然而,实际 visual地理位置定位应用需要对应变化和极端视角变化的鲁棒性,同时提供准确的全球位置估计。因此,我们受到curriculum设计的 inspirited,人类在学习通用知识后,才能专注于专业专长。我们首先识别semantic场景,然后测量几何结构。我们的方法,称之为CurriculumLoc,包括细致的多Stage刷新管道和一种新型的关键点检测和描述,具有全球semantic认知和本地几何验证。我们在这些关键点和对应描述符基础上进行重新排名,并解决一个特定的 across-domain perspective-n-point(PnP)问题,在这个过程中,位置精度进行逐渐调整。我们的实验结果表明,我们的方法具有上述实际 visual地理位置定位应用中需要的愉悦特性。此外,我们在ALTO标准 dataset上取得了新的高 recall@1 分数为62.6%和94.5%,分别使用两种距离度量。数据集、代码和训练模型都可以在https://github.com/npupilab/CurriculumLoc上获得。

Deep Equilibrium Diffusion Restoration with Parallel Sampling

  • paper_url: http://arxiv.org/abs/2311.11600
  • repo_url: https://github.com/caojiezhang/deqir
  • paper_authors: Jiezhang Cao, Yue Shi, Kai Zhang, Yulun Zhang, Radu Timofte, Luc Van Gool
  • for: 这 paper 的目的是重新思考基于扩散模型的图像修复方法,通过改进 sampling 链来减少计算成本并且提高图像修复质量。
  • methods: 这 paper 使用了 deep equilibrium 固定点系统来解决 diffusion-based IR 模型中的问题,并提出了一种基于 analytical solution 的单个图像 sampling 方法,以便在平行化的方式下进行图像修复。
  • results: 实验结果表明,这 paper 提出的方法可以在典型的 IR 任务和实际应用中达到高质量的图像修复,并且可以快速计算梯度和初始化优化。
    Abstract Diffusion-based image restoration (IR) methods aim to use diffusion models to recover high-quality (HQ) images from degraded images and achieve promising performance. Due to the inherent property of diffusion models, most of these methods need long serial sampling chains to restore HQ images step-by-step. As a result, it leads to expensive sampling time and high computation costs. Moreover, such long sampling chains hinder understanding the relationship between the restoration results and the inputs since it is hard to compute the gradients in the whole chains. In this work, we aim to rethink the diffusion-based IR models through a different perspective, i.e., a deep equilibrium (DEQ) fixed point system. Specifically, we derive an analytical solution by modeling the entire sampling chain in diffusion-based IR models as a joint multivariate fixed point system. With the help of the analytical solution, we are able to conduct single-image sampling in a parallel way and restore HQ images without training. Furthermore, we compute fast gradients in DEQ and found that initialization optimization can boost performance and control the generation direction. Extensive experiments on benchmarks demonstrate the effectiveness of our proposed method on typical IR tasks and real-world settings. The code and models will be made publicly available.
    摘要 Diffusion-based image restoration(IR)方法目标是使用扩散模型来恢复高质量(HQ)图像从降低图像中,并达到了可以的表现。由于扩散模型的内在性质,大多数这些方法需要长串行样本链来恢复HQ图像步骤。这会导致样本时间昂贵和计算成本高。此外,这些长串行样本链使得理解恢复结果和输入之间的关系困难,因为计算整个链上的梯度很难。在这种工作中,我们想要重新思考扩散基于IR模型的方法,即深度平衡(DEQ)固定点系统。我们特别是使用扩散模型整个样本链的联合多变量固定点系统来 derivate一个分析解。通过分析解,我们可以在平行样本中进行单图像恢复,并不需要训练。此外,我们在DEQ中计算了快速的梯度,并发现初始化优化可以提高性能并控制生成方向。我们对典型的IR任务和实际场景进行了广泛的实验,并证明了我们的提出的方法的有效性。代码和模型将公开发布。

Predicting urban tree cover from incomplete point labels and limited background information

  • paper_url: http://arxiv.org/abs/2311.11592
  • repo_url: None
  • paper_authors: Hui Zhang, Ankit Kariryaa, Venkanna Babu Guthula, Christian Igel, Stefan Oehmcke
  • for: 这 paper 是为了提高城市树木的识别和映射,以便更好地了解城市微气候和城市居民的物理和心理健康。
  • methods: 该 paper 使用深度学习方法来映射城市树木在高分辨率飞行图像中,使用有限的数据集和深度学习来实现。
  • results: 该 paper 在 Hamburg, Germany 进行了实验,显示系统可以生成城市树木覆盖率图像,不需要提供树木分割。系统的性能会逐渐下降,如果不使用开源地理数据库。
    Abstract Trees inside cities are important for the urban microclimate, contributing positively to the physical and mental health of the urban dwellers. Despite their importance, often only limited information about city trees is available. Therefore in this paper, we propose a method for mapping urban trees in high-resolution aerial imagery using limited datasets and deep learning. Deep learning has become best-practice for this task, however, existing approaches rely on large and accurately labelled training datasets, which can be difficult and expensive to obtain. However, often noisy and incomplete data may be available that can be combined and utilized to solve more difficult tasks than those datasets were intended for. This paper studies how to combine accurate point labels of urban trees along streets with crowd-sourced annotations from an open geographic database to delineate city trees in remote sensing images, a task which is challenging even for humans. To that end, we perform semantic segmentation of very high resolution aerial imagery using a fully convolutional neural network. The main challenge is that our segmentation maps are sparsely annotated and incomplete. Small areas around the point labels of the street trees coming from official and crowd-sourced data are marked as foreground class. Crowd-sourced annotations of streets, buildings, etc. define the background class. Since the tree data is incomplete, we introduce a masking to avoid class confusion. Our experiments in Hamburg, Germany, showed that the system is able to produce tree cover maps, not limited to trees along streets, without providing tree delineations. We evaluated the method on manually labelled trees and show that performance drastically deteriorates if the open geographic database is not used.
    摘要 urban 内部的树木对城市微气候有积极影响,为城市居民的物理和心理健康做出正面贡献。然而,有限的城市树木信息 frequently 不足,因此在这篇论文中,我们提出了一种使用有限数据集和深度学习方法来映射城市树木在高分辨率飞行图像中的方法。深度学习已成为最佳实践,但现有的方法通常需要大量、准确地标注数据,这可能是 expensive 和困难的。然而,可能存在噪声和不完整的数据,这些数据可以组合并利用来解决更加复杂的任务。本文研究了如何将精确的城市树木点标签与开源地理数据库中的人工标注结合使用,以在遥感图像中划分城市树木,这是人类也难以完成的任务。为此,我们使用了全 convolutional neural network 进行semantic segmentation 的高分辨率飞行图像。主要挑战在于我们的分类图像 sparse 和不完整。小区域附近街道树木的点标签来自官方和开源数据库,被定义为前景类。人工标注的街道、建筑等定义背景类。由于树木数据不完整,我们引入了masking来避免分类混淆。我们在 Hamburg, Germany 进行了实验,发现系统可以生成不限于街道上的树木覆盖率图像,而不需提供树木划分。我们对手动标注的树木进行了评估,并发现如果不使用开源地理数据库,系统性能会下降很快。
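The training signal described above, with point labels dilated into small foreground regions, crowd-sourced background polygons, and everything else masked out, amounts to a cross-entropy loss that only sees annotated pixels. A minimal sketch, assuming a hypothetical ignore label of 255 and toy tensors:

```python
import torch
import torch.nn.functional as F

IGNORE = 255  # hypothetical label value for "unknown" pixels that are masked out

def sparse_seg_loss(logits, sparse_labels):
    """Cross-entropy over the sparsely annotated pixels only."""
    return F.cross_entropy(logits, sparse_labels, ignore_index=IGNORE)

# toy example: 2-class logits for a 4x4 image, mostly unlabeled
logits = torch.randn(1, 2, 4, 4, requires_grad=True)
labels = torch.full((1, 4, 4), IGNORE, dtype=torch.long)
labels[0, 1, 1] = 1   # pixel near a street-tree point label -> foreground (tree)
labels[0, 3, 0] = 0   # pixel on a crowd-sourced street polygon -> background

loss = sparse_seg_loss(logits, labels)
loss.backward()        # gradients flow only from the two labelled pixels
print(float(loss))
```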

FreeKD: Knowledge Distillation via Semantic Frequency Prompt

  • paper_url: http://arxiv.org/abs/2311.12079
  • repo_url: None
  • paper_authors: Yuan Zhang, Tao Huang, Jiaming Liu, Tao Jiang, Kuan Cheng, Shanghang Zhang
  • for: 该论文主要针对密集预测任务(dense prediction tasks)中的知识蒸馏(knowledge distillation)问题,旨在提高学生模型的性能。
  • methods: 该论文提出了一种基于频域(frequency domain)的知识蒸馏方法,称为“FreeKD”,它通过在教师模型中插入频率提示(Frequency Prompt)来吸收频域中的语义上下文,并通过像素级频率掩码(pixel-wise frequency mask)来定位各频带中的关键像素。
  • results: 实验结果表明,FreeKD 可以在密集预测任务中稳定提升学生模型的性能,并且比传统的基于空间域的蒸馏方法(spatial-based distillation methods)更加稳健。
    Abstract Knowledge distillation (KD) has been applied to various tasks successfully, and mainstream methods typically boost the student model via spatial imitation losses. However, the consecutive downsamplings induced in the spatial domain of teacher model is a type of corruption, hindering the student from analyzing what specific information needs to be imitated, which results in accuracy degradation. To better understand the underlying pattern of corrupted feature maps, we shift our attention to the frequency domain. During frequency distillation, we encounter a new challenge: the low-frequency bands convey general but minimal context, while the high are more informative but also introduce noise. Not each pixel within the frequency bands contributes equally to the performance. To address the above problem: (1) We propose the Frequency Prompt plugged into the teacher model, absorbing the semantic frequency context during finetuning. (2) During the distillation period, a pixel-wise frequency mask is generated via Frequency Prompt, to localize those pixel of interests (PoIs) in various frequency bands. Additionally, we employ a position-aware relational frequency loss for dense prediction tasks, delivering a high-order spatial enhancement to the student model. We dub our Frequency Knowledge Distillation method as FreeKD, which determines the optimal localization and extent for the frequency distillation. Extensive experiments demonstrate that FreeKD not only outperforms spatial-based distillation methods consistently on dense prediction tasks (e.g., FreeKD brings 3.8 AP gains for RepPoints-R50 on COCO2017 and 4.55 mIoU gains for PSPNet-R18 on Cityscapes), but also conveys more robustness to the student. Notably, we also validate the generalization of our approach on large-scale vision models (e.g., DINO and SAM).
    摘要 知识塑化(KD)已经成功应用于多种任务,主流方法通常通过空间模仿损失提高学生模型。然而,在教师模型中的连续下采样induced的空间频谱损害是一种损害,使学生无法分析需要被模仿的具体信息,从而导致精度下降。为了更好地理解下频谱损害的下游特征,我们将注意力集中在频谱频率上。在频谱塑化过程中,我们遇到了一个新的挑战:低频带 convey通用但是有限的信息,而高频带更加有用但也会引入噪音。不是每个像素在频谱带中都有相同的贡献。为了解决以上问题,我们提出了频谱提醒(Frequency Prompt),在教师模型中捕捉频谱语义上下文的 semantic frequency context durante el finetuning。在塑化期间,我们通过频谱提醒生成了一个像素级别的频谱面积掩码,以确定在不同频谱带中的关键像素(PoIs)。此外,我们采用了一种位置感知的相关频谱损失,为精密预测任务提供高阶空间提高。我们称之为FreeKD,它确定了塑化的优化本地化和范围。广泛的实验表明,FreeKD不仅在精密预测任务上(例如,FreeKD在COCO2017上提高了RepPoints-R50的AP值3.8,在Cityscapes上提高了PSPNet-R18的mIoU值4.55),而且传递了更加Robustness到学生。另外,我们还验证了我们的方法在大规模视觉模型(例如,DINO和SAM)上的普适性。
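At a high level, the frequency-domain distillation with a pixel-wise mask can be sketched as a masked discrepancy between teacher and student features in FFT space. The hand-made mask below stands in for the output of the learned Frequency Prompt, and the plain masked L2 is an assumption rather than the paper's exact loss (which also adds a position-aware relational term):

```python
import torch

def masked_frequency_distill_loss(feat_s, feat_t, mask):
    """feat_s, feat_t: (N, C, H, W) student/teacher features; mask: (N, 1, H, W) in [0, 1]."""
    Fs = torch.fft.fft2(feat_s, norm="ortho")
    Ft = torch.fft.fft2(feat_t, norm="ortho")
    diff = (Fs - Ft).abs() ** 2          # per-frequency squared error
    return (mask * diff).mean()          # emphasize the pixels/bands of interest

feat_t = torch.randn(2, 64, 32, 32)                        # teacher features
feat_s = torch.randn(2, 64, 32, 32, requires_grad=True)    # student features
mask = torch.rand(2, 1, 32, 32)                            # stand-in for the predicted mask
loss = masked_frequency_distill_loss(feat_s, feat_t, mask)
loss.backward()
print(float(loss))
```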

AKConv: Convolutional Kernel with Arbitrary Sampled Shapes and Arbitrary Number of Parameters

  • paper_url: http://arxiv.org/abs/2311.11587
  • repo_url: https://github.com/cv-zhangxin/akconv
  • paper_authors: Xin Zhang, Yingze Song, Tingting Song, Degang Yang, Yichen Ye, Jie Zhou, Liming Zhang
  • for: 这 paper 是为了解决标准 convolutional operation 中的两个缺陷而提出的,即 local window 的限制和固定的 convolutional kernel 大小。
  • methods: 这 paper 提出了 Alterable Kernel Convolution (AKConv),一种可变参数和样式的 convolutional operation,通过新的坐标生成算法定义初始位置,并通过偏移来适应目标变化。
  • results: 对于 COCO2017、VOC 7+12 和 VisDrone-DET2021 等 dataset,AKConv 能够提高 объек 检测性能,并且可以作为替换 convolutional operation 来提高网络性能。
    Abstract Neural networks based on convolutional operations have achieved remarkable results in the field of deep learning, but there are two inherent flaws in standard convolutional operations. On the one hand, the convolution operation be confined to a local window and cannot capture information from other locations, and its sampled shapes is fixed. On the other hand, the size of the convolutional kernel is fixed to k $\times$ k, which is a fixed square shape, and the number of parameters tends to grow squarely with size. It is obvious that the shape and size of targets are various in different datasets and at different locations. Convolutional kernels with fixed sample shapes and squares do not adapt well to changing targets. In response to the above questions, the Alterable Kernel Convolution (AKConv) is explored in this work, which gives the convolution kernel an arbitrary number of parameters and arbitrary sampled shapes to provide richer options for the trade-off between network overhead and performance. In AKConv, we define initial positions for convolutional kernels of arbitrary size by means of a new coordinate generation algorithm. To adapt to changes for targets, we introduce offsets to adjust the shape of the samples at each position. Moreover, we explore the effect of the neural network by using the AKConv with the same size and different initial sampled shapes. AKConv completes the process of efficient feature extraction by irregular convolutional operations and brings more exploration options for convolutional sampling shapes. Object detection experiments on representative datasets COCO2017, VOC 7+12 and VisDrone-DET2021 fully demonstrate the advantages of AKConv. AKConv can be used as a plug-and-play convolutional operation to replace convolutional operations to improve network performance. The code for the relevant tasks can be found at https://github.com/CV-ZhangXin/AKConv.
    摘要 神经网络基于卷积操作已经在深度学习中取得了惊人的成果,但标准卷积操作存在两个内在的缺陷。一方面,卷积操作只能在本地窗口中进行,无法捕捉其他位置的信息,而且采样形状是固定的。另一方面,卷积核心的大小是固定的,即k x k,这是一个固定的方正形状,而参数的数量呈平方增长。这是不合理的,因为目标的形状和位置在不同的数据集和位置上是多样的。标准卷积核心的固定采样形状和大小不能适应变化的目标。为了解决这些问题,本文提出了可变卷积(AKConv),它允许卷积核心有可变的参数数量和采样形状,以提供更多的质量和性能之间的质量。在AKConv中,我们使用新的坐标生成算法来定义卷积核心的初始位置。为了适应目标的变化,我们引入偏移量来调整采样形状。此外,我们还研究了使用AKConv的效果,包括使用相同大小的AKConv和不同初始采样形状。AKConv完tes了不规则卷积操作的效果,并提供了更多的卷积采样形状的exploration option。在COCO2017、VOC 7+12和VisDrone-DET2021等代表性数据集上,对象检测实验全面展示了AKConv的优势。AKConv可以作为替换标准卷积操作的卷积操作来提高网络性能。相关任务的代码可以在https://github.com/CV-ZhangXin/AKConv中找到。
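The mechanism described in the abstract (generate initial coordinates for an arbitrary number of sampling points, predict per-pixel offsets, gather features at the shifted positions, and mix them) can be approximated with grid_sample. The sketch below is a simplified reading of that idea rather than the authors' implementation; the cross-shaped initial coordinates, the offset predictor, and the 1x1 mixing convolution are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArbitraryPointConv(nn.Module):
    """Convolution-like layer with an arbitrary number of learned sampling points."""
    def __init__(self, in_ch, out_ch, num_points=5):
        super().__init__()
        self.num_points = num_points
        # initial sampling positions (in pixels, relative to the center): a simple line here
        coords = torch.linspace(-1.0, 1.0, num_points)
        init = torch.stack([coords, torch.zeros_like(coords)], dim=-1)   # (P, 2) as (dy, dx)
        self.register_buffer("init_offsets", init)
        self.offset_pred = nn.Conv2d(in_ch, 2 * num_points, kernel_size=3, padding=1)
        self.mix = nn.Conv2d(in_ch * num_points, out_ch, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        ys, xs = torch.meshgrid(torch.arange(h, device=x.device),
                                torch.arange(w, device=x.device), indexing="ij")
        base = torch.stack([ys, xs], dim=-1).float()                     # (H, W, 2) pixel centers
        offsets = self.offset_pred(x)                                    # (N, 2P, H, W)
        offsets = offsets.view(n, self.num_points, 2, h, w).permute(0, 1, 3, 4, 2)
        pos = base + self.init_offsets.view(1, -1, 1, 1, 2) + offsets    # (N, P, H, W, 2)
        # normalize to [-1, 1]; grid_sample expects (x, y) order
        norm_x = pos[..., 1] / max(w - 1, 1) * 2 - 1
        norm_y = pos[..., 0] / max(h - 1, 1) * 2 - 1
        grid = torch.stack([norm_x, norm_y], dim=-1).reshape(n, self.num_points * h, w, 2)
        sampled = F.grid_sample(x, grid, align_corners=True)             # (N, C, P*H, W)
        sampled = sampled.view(n, c, self.num_points, h, w).reshape(n, c * self.num_points, h, w)
        return self.mix(sampled)

layer = ArbitraryPointConv(8, 16, num_points=5)
print(layer(torch.randn(2, 8, 24, 24)).shape)   # torch.Size([2, 16, 24, 24])
```

Because the number of sampled points is a free parameter, the parameter count grows roughly linearly with num_points instead of quadratically with a k x k kernel size, which is the trade-off the abstract highlights.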

SeaDSC: A video-based unsupervised method for dynamic scene change detection in unmanned surface vehicles

  • paper_url: http://arxiv.org/abs/2311.11580
  • repo_url: None
  • paper_authors: Linh Trinh, Ali Anwar, Siegfried Mercelis
  • For: This paper is focused on detecting dynamic scene changes in Unmanned Surface Vehicles (USVs) using video data.
  • Methods: The proposed method utilizes a modified VQ-VAE-2 generative picture model for feature extraction and a novel similarity scoring technique for comparing consecutive frames.
  • Results: The authors demonstrate the efficiency of their technique on a nautical video dataset called RoboWhaler, showing the effectiveness of their approach in detecting dynamic scene changes.
    Abstract Recently, there has been an upsurge in the research on maritime vision, where a lot of works are influenced by the application of computer vision for Unmanned Surface Vehicles (USVs). Various sensor modalities such as camera, radar, and lidar have been used to perform tasks such as object detection, segmentation, object tracking, and motion planning. A large subset of this research is focused on the video analysis, since most of the current vessel fleets contain the camera's onboard for various surveillance tasks. Due to the vast abundance of the video data, video scene change detection is an initial and crucial stage for scene understanding of USVs. This paper outlines our approach to detect dynamic scene changes in USVs. To the best of our understanding, this work represents the first investigation of scene change detection in the maritime vision application. Our objective is to identify significant changes in the dynamic scenes of maritime video data, particularly those scenes that exhibit a high degree of resemblance. In our system for dynamic scene change detection, we propose completely unsupervised learning method. In contrast to earlier studies, we utilize a modified cutting-edge generative picture model called VQ-VAE-2 to train on multiple marine datasets, aiming to enhance the feature extraction. Next, we introduce our innovative similarity scoring technique for directly calculating the level of similarity in a sequence of consecutive frames by utilizing grid calculation on retrieved features. The experiments were conducted using a nautical video dataset called RoboWhaler to showcase the efficient performance of our technique.
    摘要 近些年来,marine vision领域内有一场势在浮现的研究活动,其中许多研究受到计算机视觉在无人水面车(USV)上的应用的影响。不同的感知modalities,如摄像头、雷达和激光雷达,都被用于实现对象检测、分割、跟踪和运动规划等任务。大多数当前的船舶舰队都装备了船舶上的摄像头,因此视频分析在这些研究中占据了一个重要的位置。由于视频数据的庞大量,视频场景变化检测是USV视频分析中的初始和关键阶段。本文介绍了我们对USV动态场景变化检测的方法。到目前为止,这是marine vision应用中首次对场景变化检测的研究。我们的目标是在USV动态场景中检测出显著变化,特别是那些场景具有高度的相似性。在我们的系统中,我们提出了一种完全无监督学习方法。与先前的研究不同,我们使用了修改后的VQ-VAE-2模型来在多个海洋数据集上训练,以提高特征提取。然后,我们介绍了我们的创新的相似度评分技术,通过在检索到的特征上进行格子计算来直接计算连续帧之间的相似度水平。实验使用了名为RoboWhaler的海洋视频数据集,以展示我们的技术的高效性。
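The similarity scoring step can be illustrated with a small grid-based comparison of consecutive-frame features (for example, features from the modified VQ-VAE-2 encoder); a low aggregated score would flag a dynamic scene change. The grid size, the cosine measure, and the threshold are illustrative assumptions:

```python
import numpy as np

def grid_similarity(feat_a, feat_b, grid=4):
    """feat_a, feat_b: (H, W, C) feature maps of two consecutive frames."""
    h, w, _ = feat_a.shape
    sims = []
    for i in range(grid):
        for j in range(grid):
            ys, ye = i * h // grid, (i + 1) * h // grid
            xs, xe = j * w // grid, (j + 1) * w // grid
            a = feat_a[ys:ye, xs:xe].reshape(-1)
            b = feat_b[ys:ye, xs:xe].reshape(-1)
            sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return float(np.mean(sims))

prev_feat = np.random.rand(32, 32, 64)
next_feat = prev_feat + 0.01 * np.random.rand(32, 32, 64)   # nearly identical frame
print(grid_similarity(prev_feat, next_feat) > 0.9)           # True -> no scene change flagged
```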

A 3D Multi-Style Cross-Modality Segmentation Framework for Segmenting Vestibular Schwannoma and Cochlea

  • paper_url: http://arxiv.org/abs/2311.11578
  • repo_url: None
  • paper_authors: Yuzhou Zhuang
  • for: 本研究旨在用Multi-style Cross-modality Segmentation方法精确地分类 vestibular schwannoma和cochlea区域在无标注hrT2扫描中,以便提高肿瘤诊断和治疗的精度。
  • methods: 本研究使用了3D多式 Cross-modality Segmentation框架,包括多式转换和自学习分类阶段。首先,使用min-max normalization、voxel size resampling和center cropping来调整ceT1和hrT2扫描的像素大小和中心位置,以获得固定大小的子体积 для训练。接着,使用多种转换网络来超越intensity distribution差异 between多modal扫描。最后,使用nnU-Net框架和iterative自学习方法使用pseudo-labels来在目标领域进行自学习分类。
  • results: 在crossMoDA2023验证集上,本研究的方法获得了Promising results,mean DSC值为72.78%和80.64%,ASSD值为5.85 mm和0.25 mm дляVS肿瘤和cochlea区域,分别。此外,for intra-和extra-meatal区域,本研究的方法获得了DSC值为59.77%和77.14%。
    Abstract The crossMoDA2023 challenge aims to segment the vestibular schwannoma (sub-divided into intra- and extra-meatal components) and cochlea regions of unlabeled hrT2 scans by leveraging labeled ceT1 scans. In this work, we proposed a 3D multi-style cross-modality segmentation framework for the crossMoDA2023 challenge, including the multi-style translation and self-training segmentation phases. Considering heterogeneous distributions and various image sizes in multi-institutional scans, we first utilize the min-max normalization, voxel size resampling, and center cropping to obtain fixed-size sub-volumes from ceT1 and hrT2 scans for training. Then, we perform the multi-style image translation phase to overcome the intensity distribution discrepancy between unpaired multi-modal scans. Specifically, we design three different translation networks with 2D or 2.5D inputs to generate multi-style and realistic target-like volumes from labeled ceT1 volumes. Finally, we perform the self-training volumetric segmentation phase in the target domain, which employs the nnU-Net framework and iterative self-training method using pseudo-labels for training accurate segmentation models in the unlabeled target domain. On the crossMoDA2023 validation dataset, our method produces promising results and achieves the mean DSC values of 72.78% and 80.64% and ASSD values of 5.85 mm and 0.25 mm for VS tumor and cochlea regions, respectively. Moreover, for intra- and extra-meatal regions, our method achieves the DSC values of 59.77% and 77.14%, respectively.
    摘要 crossMoDA2023挑战的目标是利用带标注的ceT1扫描,对无标注的hrT2扫描中的前庭神经鞘瘤(分为内听道内与内听道外两部分)和耳蜗区域进行分割。在这项工作中,我们为crossMoDA2023挑战提出了一个3D多风格跨模态分割框架,包括多风格图像翻译与自训练分割两个阶段。考虑到多机构扫描中的分布差异和图像尺寸各异,我们首先使用最小-最大归一化、体素尺寸重采样和中心裁剪,从ceT1和hrT2扫描中获取固定大小的子体积用于训练。随后,我们进行多风格图像翻译阶段,以克服非配对多模态扫描之间的强度分布差异:我们设计了三种以2D或2.5D为输入的翻译网络,从带标注的ceT1体积生成多风格且逼真的类目标体积。最后,我们在目标域中进行自训练体积分割阶段,采用nnU-Net框架和基于伪标签的迭代自训练方法,训练准确的分割模型。在crossMoDA2023验证集上,我们的方法取得了可观的结果:VS肿瘤与耳蜗区域的平均DSC分别为72.78%和80.64%,ASSD分别为5.85 mm和0.25 mm;内听道内与内听道外区域的DSC分别为59.77%和77.14%。
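The preprocessing named in the abstract (min-max normalization, voxel-size resampling, center cropping) is straightforward to sketch for a 3D scan stored as a numpy array; the target spacing and crop size below are placeholders, not the paper's settings:

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess(volume, spacing, target_spacing=(1.0, 1.0, 1.0), crop=(128, 128, 128)):
    # 1) min-max normalization to [0, 1]
    volume = (volume - volume.min()) / (volume.max() - volume.min() + 1e-8)
    # 2) resample to a common voxel spacing
    factors = [s / t for s, t in zip(spacing, target_spacing)]
    volume = zoom(volume, factors, order=1)
    # 3) center crop to a fixed-size sub-volume (pad first if the volume is smaller)
    pads = [(max(c - d, 0) // 2, max(c - d, 0) - max(c - d, 0) // 2)
            for d, c in zip(volume.shape, crop)]
    volume = np.pad(volume, pads)
    starts = [(d - c) // 2 for d, c in zip(volume.shape, crop)]
    slices = tuple(slice(s, s + c) for s, c in zip(starts, crop))
    return volume[slices]

vol = np.random.rand(160, 200, 180).astype(np.float32)   # toy scan
sub = preprocess(vol, spacing=(1.5, 0.8, 0.8))
print(sub.shape)   # (128, 128, 128)
```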

CORE-MM: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

  • paper_url: http://arxiv.org/abs/2311.11567
  • repo_url: None
  • paper_authors: Xiaotian Han, Quanzeng You, Yongfei Liu, Wentao Chen, Huangjie Zheng, Khalil Mrini, Xudong Lin, Yiqi Wang, Bohan Zhai, Jianbo Yuan, Heng Wang, Hongxia Yang
  • For: The paper aims to evaluate the reasoning capabilities of multi-modal large language models (MLLMs) by creating a new benchmark dataset that focuses on complex reasoning tasks, such as deductive, abductive, and analogical reasoning.
  • Methods: The authors manually curate a dataset of queries that engage the reasoning capabilities of MLLMs, and incorporate intermediate reasoning steps into their evaluation criteria to assess the models' ability to generate answers.
  • Results: The authors evaluate a selection of representative MLLMs using this open-ended, multi-step reasoning benchmark and find that it measures reasoning capabilities more accurately, since it challenges models to demonstrate complex reasoning rather than produce simple yes/no or multiple-choice answers.
    Abstract Multi-modal Large Language Models (MLLMs) are increasingly prominent in the field of artificial intelligence. These models not only excel in traditional vision-language tasks but also demonstrate impressive performance in contemporary multi-modal benchmarks. Although many of these benchmarks attempt to holistically evaluate MLLMs, they typically concentrate on basic reasoning tasks, often yielding only simple yes/no or multi-choice responses. These methods naturally lead to confusion and difficulties in conclusively determining the reasoning capabilities of MLLMs. To mitigate this issue, we manually curate a benchmark dataset specifically designed for MLLMs, with a focus on complex reasoning tasks. Our benchmark comprises three key reasoning categories: deductive, abductive, and analogical reasoning. The queries in our dataset are intentionally constructed to engage the reasoning capabilities of MLLMs in the process of generating answers. For a fair comparison across various MLLMs, we incorporate intermediate reasoning steps into our evaluation criteria. In instances where an MLLM is unable to produce a definitive answer, its reasoning ability is evaluated by requesting intermediate reasoning steps. If these steps align with our manual annotations, appropriate scores are assigned. This evaluation scheme resembles methods commonly used in human assessments, such as exams or assignments, and represents what we consider a more effective assessment technique compared with existing benchmarks. We evaluate a selection of representative MLLMs using this rigorously developed open-ended multi-step elaborate reasoning benchmark, designed to challenge and accurately measure their reasoning capabilities. The code and data will be released at https://core-mm.github.io/
    摘要 多modal大型自然语言模型(MLLM)在人工智能领域日益突出。这些模型不仅在传统的视觉语言任务中表现出色,而且在当今的多modal benchmark中也显示出卓越表现。虽然许多这些 benchmark 尝试总体评估 MLLM,但它们通常只集中在基本的理解任务上,常常产生单纯的是或否或多选答案。这些方法自然导致混乱和判定 MLLM 的理解能力困难。为解决这个问题,我们手动精心制作了一个特有的 benchmark 数据集,专门为 MLLM 设计。我们的 benchmark 包括三种关键的理解类别:推理、推理和 analogical reasoning。我们的查询是通过特意设计来让 MLLM 在回答时 engag 其理解能力。为 garantuee fair comparison across 不同的 MLLM,我们在评估标准中包括中间的推理步骤。在 MLLM 无法生成准确答案的情况下,我们评估其推理能力通过请求中间推理步骤。如果这些步骤与我们的手动注释相符,就会得分。这种评估方法与人类评估方法相似,例如考试或作业,并且代表我们认为更有效的评估方法,相比已有的 benchmark。我们使用这些精心制作的开放式多步逻辑 benchmark 评估一 selección 的代表 MLLM,以挑战并准确测量它们的理解能力。我们的代码和数据将在 上发布。
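The evaluation scheme, full credit for a definitive correct answer and partial credit when intermediate reasoning steps match the manual annotations, can be sketched as a toy scoring function. Exact string matching and the 0.5 step weight are stand-ins for the judging procedure actually used:

```python
def score_response(final_answer, steps, gold_answer, gold_steps, step_weight=0.5):
    """Toy scorer: 1.0 for a correct final answer, otherwise partial credit per matched step."""
    if final_answer is not None and final_answer.strip() == gold_answer.strip():
        return 1.0
    matched = sum(1 for s, g in zip(steps, gold_steps) if s.strip() == g.strip())
    return step_weight * matched / max(len(gold_steps), 1)

gold = "the red block is heavier"
gold_steps = ["identify both blocks", "compare their sizes", "infer relative weight"]
print(score_response(None, ["identify both blocks", "compare their sizes"],
                     gold, gold_steps))   # partial credit for two aligned steps
```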

Does complimentary information from multispectral imaging improve face presentation attack detection?

  • paper_url: http://arxiv.org/abs/2311.11566
  • repo_url: None
  • paper_authors: Narayan Vetrekar, Raghavendra Ramachandra, Sushma Venkatesh, Jyoti D. Pawar, R. S. Gad
  • For: The paper studies the use of multispectral imaging for detecting presentation attacks on face recognition systems.
  • Methods: The paper uses a dataset called Face Presentation Attack Multispectral (FPAMS) to evaluate the performance of two fusion methods (image fusion and score fusion) in detecting presentation artifacts.
  • Results: The PAD based on the score fusion and image fusion methods achieves superior performance, demonstrating the significance of employing multispectral imaging to detect presentation artifacts.
    Abstract Presentation Attack Detection (PAD) has been extensively studied, particularly in the visible spectrum. With the advancement of sensing technology beyond the visible range, multispectral imaging has gained significant attention in this direction. We present PAD based on multispectral images constructed for eight different presentation artifacts resulted from three different artifact species. In this work, we introduce Face Presentation Attack Multispectral (FPAMS) database to demonstrate the significance of employing multispectral imaging. The goal of this work is to study complementary information that can be combined in two different ways (image fusion and score fusion) from multispectral imaging to improve the face PAD. The experimental evaluation results present an extensive qualitative analysis of 61650 sample multispectral images collected for bonafide and artifacts. The PAD based on the score fusion and image fusion method presents superior performance, demonstrating the significance of employing multispectral imaging to detect presentation artifacts.
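The two fusion strategies compared in the paper can be sketched at a high level: image-level fusion combines the spectral bands before a single classifier, while score-level fusion combines per-band classifier scores. The per-band classifier below is a placeholder, not the paper's detector:

```python
import numpy as np

def band_classifier(image_band):
    """Stand-in for a per-band PAD classifier returning an attack score in [0, 1]."""
    return float(np.clip(image_band.mean(), 0.0, 1.0))

def image_fusion_score(bands):
    fused = np.mean(np.stack(bands), axis=0)          # simple pixel-wise band fusion
    return band_classifier(fused)

def score_fusion_score(bands, weights=None):
    scores = np.array([band_classifier(b) for b in bands])
    weights = np.ones_like(scores) / len(scores) if weights is None else np.asarray(weights)
    return float(weights @ scores)

bands = [np.random.rand(64, 64) for _ in range(8)]    # eight spectral bands (toy)
print(image_fusion_score(bands), score_fusion_score(bands))
```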

NePF: Neural Photon Field for Single-Stage Inverse Rendering

  • paper_url: http://arxiv.org/abs/2311.11555
  • repo_url: None
  • paper_authors: Tuen-Yue Tsui, Qin Zou
  • for: addressed the ill-posed inverse rendering problem from multi-view images
  • methods: introduced a novel single-stage framework called Neural Photon Field (NePF), which fully utilizes the physical implication behind the weight function of neural implicit surfaces and the view-dependent radiance
  • results: demonstrated superiority in recovering high-fidelity geometry and visually plausible material attributes, evaluated on both real and synthetic datasets
    Abstract We present a novel single-stage framework, Neural Photon Field (NePF), to address the ill-posed inverse rendering from multi-view images. Contrary to previous methods that recover the geometry, material, and illumination in multiple stages and extract the properties from various multi-layer perceptrons across different neural fields, we question such complexities and introduce our method - a single-stage framework that uniformly recovers all properties. NePF achieves this unification by fully utilizing the physical implication behind the weight function of neural implicit surfaces and the view-dependent radiance. Moreover, we introduce an innovative coordinate-based illumination model for rapid volume physically-based rendering. To regularize this illumination, we implement the subsurface scattering model for diffuse estimation. We evaluate our method on both real and synthetic datasets. The results demonstrate the superiority of our approach in recovering high-fidelity geometry and visual-plausible material attributes.
    摘要 我们提出了一种新的单阶段框架,神经光场(NePF),用于解决多视图图像的逆向渲染问题。与前方法不同,我们的方法不需要在多个层次感知机中提取不同的物理属性,而是通过全面利用神经凝件表面的权重函数和视角依赖的辐射来协调所有属性。此外,我们还提出了一种新的坐标基于的照明模型,用于快速Physically-Based Rendering(PBR)。为了规范这种照明,我们实现了吸收散射模型来估算柔化。我们在真实数据集和 sintetic 数据集上评估了我们的方法,结果显示了我们的方法在高精度的几何和可见性上具有明显的优势。

Unearthing Common Inconsistency for Generalisable Deepfake Detection

  • paper_url: http://arxiv.org/abs/2311.11549
  • repo_url: None
  • paper_authors: Beilin Chu, Xuan Xu, Weike You, Linna Zhou
  • for: 本研究旨在提出一种能够泛化到不同伪造方法的深度伪造(Deepfake)检测方法,以解决现有检测方法难以泛化到未见伪造域的问题。
    Abstract Deepfake has emerged for several years, yet efficient detection techniques could generalize over different manipulation methods require further research. While current image-level detection method fails to generalize to unseen domains, owing to the domain-shift phenomenon brought by CNN's strong inductive bias towards Deepfake texture, video-level one shows its potential to have both generalization across multiple domains and robustness to compression. We argue that although distinct face manipulation tools have different inherent bias, they all disrupt the consistency between frames, which is a natural characteristic shared by authentic videos. Inspired by this, we proposed a detection approach by capturing frame inconsistency that broadly exists in different forgery techniques, termed unearthing-common-inconsistency (UCI). Concretely, the UCI network based on self-supervised contrastive learning can better distinguish temporal consistency between real and fake videos from multiple domains. We introduced a temporally-preserved module method to introduce spatial noise perturbations, directing the model's attention towards temporal information. Subsequently, leveraging a multi-view cross-correlation learning module, we extensively learn the disparities in temporal representations between genuine and fake samples. Extensive experiments demonstrate the generalization ability of our method on unseen Deepfake domains.

Efficient Model Agnostic Approach for Implicit Neural Representation Based Arbitrary-Scale Image Super-Resolution

  • paper_url: http://arxiv.org/abs/2311.12077
  • repo_url: None
  • paper_authors: Young Jae Oh, Jihun Kim, Tae Hyun Kim
  • for: 提高单图超解像的计算效率,无需牺牲重建质量。
  • methods: 使用混合专家模型,动态分配专家对每个像素进行重建。
  • results: 与传统方法相比最多减少 73% 的浮点运算量(FLOPs),同时提供相当或更高的峰值信噪比(PSNR)。
    Abstract Single image super-resolution (SISR) has experienced significant advancements, primarily driven by deep convolutional networks. Traditional networks, however, are limited to upscaling images to a fixed scale, leading to the utilization of implicit neural functions for generating arbitrarily scaled images. Nevertheless, these methodologies have imposed substantial computational demands as they involve querying every target pixel to a single resource-intensive decoder. In this paper, we introduce a novel and efficient framework, the Mixture of Experts Implicit Super-Resolution (MoEISR), which enables super-resolution at arbitrary scales with significantly increased computational efficiency without sacrificing reconstruction quality. MoEISR dynamically allocates the most suitable decoding expert to each pixel using a lightweight mapper module, allowing experts with varying capacities to reconstruct pixels across regions with diverse complexities. Our experiments demonstrate that MoEISR successfully reduces up to 73% in floating point operations (FLOPs) while delivering comparable or superior peak signal-to-noise ratio (PSNR).
    摘要 单一图像超解析(SISR)在最近已经经历了重要的进步,主要受到深度卷积神经的驱动。然而,传统的神经网络仅能将图像调整到固定比例,从而需要使用隐藏的神经函数来生成自适应比例的图像。不过,这些方法具有较高的计算成本,因为它们需要每个目标像素与单一资源密集的解oder进行询问。在这篇论文中,我们提出了一个新的和高效的框架,即混合专家隐藏超解析(MoEISR),允许在自适应比例下进行超解析,并大幅降低计算成本。MoEISR通过动态分配最适合的解oding专家给每个像素使用轻量级映射模组,让专家具有不同容量进行像素重建。我们的实验结果显示,MoEISR可以成功降低73%的浮点运算(FLOPs),同时保持比或superior的峰峰信号输出比率(PSNR)。

Event Camera Data Dense Pre-training

  • paper_url: http://arxiv.org/abs/2311.11533
  • repo_url: None
  • paper_authors: Yan Yang, Liyuan Pan, Liu Liu
  • for: 本研究旨在为 dense prediction 任务预训练神经网络,使用事件摄像头数据。
  • methods: 我们的方法仅使用事件数据进行训练,并利用事件图像中的事件特征进行自动归一化,以捕捉事件图像中的相似性关系。
  • results: 我们的方法在 dense prediction 下游任务上的迁移学习性能更优,特别是在 DSEC-Flow 基准上,单个模型取得了第一名。
    Abstract This paper introduces a self-supervised learning framework designed for pre-training neural networks tailored to dense prediction tasks using event camera data. Our approach utilizes solely event data for training. Transferring achievements from dense RGB pre-training directly to event camera data yields subpar performance. This is attributed to the spatial sparsity inherent in an event image (converted from event data), where many pixels do not contain information. To mitigate this sparsity issue, we encode an event image into event patch features, automatically mine contextual similarity relationships among patches, group the patch features into distinctive contexts, and enforce context-to-context similarities to learn discriminative event features. For training our framework, we curate a synthetic event camera dataset featuring diverse scene and motion patterns. Transfer learning performance on downstream dense prediction tasks illustrates the superiority of our method over state-of-the-art approaches. Notably, our single model secured the top position in the challenging DSEC-Flow benchmark.

Generalized Category Discovery in Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2311.11525
  • repo_url: https://github.com/jethropeng/gcdss
  • paper_authors: Zhengyuan Peng, Qijian Tian, Jianqing Xu, Yizhang Jin, Xuequan Lu, Xin Tan, Yuan Xie, Lizhuang Ma
  • for: 这篇论文探索了一个新的设定,即通用类别发现(Generalized Category Discovery,GCD),企从已知类别的图像推导未知类别。不同于先前的新类别发现(Novel Category Discovery,NCD),这个设定不需要每个未知类别都存在于每个未知图像中。此外,我们扩展了分类的范围,让它包括整个图像。
  • methods: 我们提出了一个简单 yet effective的框架,将GCDSS挑战转换为一个面块分类任务。此外,我们开发了一个基eline方法,即邻居关系导向面块分类算法(NeRG-MaskCA),来进行面块分类。
  • results: 我们的方法显示了GCDSS的可行性和可能性,并且可以在未知图像中发现和分类新的类别。我们使用我们的方法生成的假标签作为真实标签,以便训练其他模型,从而允许它们在未知类别上进行分类。这构成了更多研究的基础,扩展 semantic segmentation 的应用范围。
    Abstract This paper explores a novel setting called Generalized Category Discovery in Semantic Segmentation (GCDSS), aiming to segment unlabeled images given prior knowledge from a labeled set of base classes. The unlabeled images contain pixels of the base class or novel class. In contrast to Novel Category Discovery in Semantic Segmentation (NCDSS), there is no prerequisite for prior knowledge mandating the existence of at least one novel class in each unlabeled image. Besides, we broaden the segmentation scope beyond foreground objects to include the entire image. Existing NCDSS methods rely on the aforementioned priors, making them challenging to truly apply in real-world situations. We propose a straightforward yet effective framework that reinterprets the GCDSS challenge as a task of mask classification. Additionally, we construct a baseline method and introduce the Neighborhood Relations-Guided Mask Clustering Algorithm (NeRG-MaskCA) for mask categorization to address the fragmentation in semantic representation. A benchmark dataset, Cityscapes-GCD, derived from the Cityscapes dataset, is established to evaluate the GCDSS framework. Our method demonstrates the feasibility of the GCDSS problem and the potential for discovering and segmenting novel object classes in unlabeled images. We employ the generated pseudo-labels from our approach as ground truth to supervise the training of other models, thereby enabling them with the ability to segment novel classes. It paves the way for further research in generalized category discovery, broadening the horizons of semantic segmentation and its applications. For details, please visit https://github.com/JethroPeng/GCDSS

Towards Few-shot Out-of-Distribution Detection

  • paper_url: http://arxiv.org/abs/2311.12076
  • repo_url: None
  • paper_authors: Jiuqing Dong, Yongbin Gao, Heng Zhou, Jun Cen, Yifan Yao, Sook Yoon, Park Dong Sun
  • for: 提高 open-world 智能系统 的可靠性,即 out-of-distribution (OOD) 检测。
  • methods: 引入了一个新的几shot OOD 检测benchmark,并进行了实验研究,发现ParameterEfficient Fine-Tuning (PEFT) 策略在几shot OOD 检测任务中表现更好,包括完全 Fine-Tuning 和线性探索 Tuning。
  • results: 研究发现,在 fine-tuning 过程中可能会产生一些关键信息,这些信息对 OOD 检测非常重要,因此提出了一种方法,即 DomainSpecific and General Knowledge Fusion (DSGF),以提高几shot OOD 检测能力。
    Abstract Out-of-distribution (OOD) detection is critical for ensuring the reliability of open-world intelligent systems. Despite the notable advancements in existing OOD detection methodologies, our study identifies a significant performance drop under the scarcity of training samples. In this context, we introduce a novel few-shot OOD detection benchmark, carefully constructed to address this gap. Our empirical analysis reveals the superiority of ParameterEfficient Fine-Tuning (PEFT) strategies, such as visual prompt tuning and visual adapter tuning, over conventional techniques, including fully fine-tuning and linear probing tuning in the few-shot OOD detection task. Recognizing some crucial information from the pre-trained model, which is pivotal for OOD detection, may be lost during the fine-tuning process, we propose a method termed DomainSpecific and General Knowledge Fusion (DSGF). This approach is designed to be compatible with diverse fine-tuning frameworks. Our experiments show that the integration of DSGF significantly enhances the few-shot OOD detection capabilities across various methods and fine-tuning methodologies, including fully fine-tuning, visual adapter tuning, and visual prompt tuning. The code will be released.

Liver Tumor Prediction with Advanced Attention Mechanisms Integrated into a Depth-Based Variant Search Algorithm

  • paper_url: http://arxiv.org/abs/2311.11520
  • repo_url: None
  • paper_authors: P. Kalaiselvi, S. Anusuya
  • for: 预测肝脏癌症
  • methods: 使用卷积神经网络(CNN)和深度基于变体搜索算法(CNN-DS-AM),并含有进一步的注意力机制
  • results: 提高预测肝脏癌症的准确率和可靠性,高于其他当前领域方法
    Abstract In recent days, Deep Learning (DL) techniques have become an emerging transformation in the field of machine learning, artificial intelligence, computer vision, and so on. Subsequently, researchers and industries have been highly endorsed in the medical field, predicting and controlling diverse diseases at specific intervals. Liver tumor prediction is a vital chore in analyzing and treating liver diseases. This paper proposes a novel approach for predicting liver tumors using Convolutional Neural Networks (CNN) and a depth-based variant search algorithm with advanced attention mechanisms (CNN-DS-AM). The proposed work aims to improve accuracy and robustness in diagnosing and treating liver diseases. The anticipated model is assessed on a Computed Tomography (CT) scan dataset containing both benign and malignant liver tumors. The proposed approach achieved high accuracy in predicting liver tumors, outperforming other state-of-the-art methods. Additionally, advanced attention mechanisms were incorporated into the CNN model to enable the identification and highlighting of regions of the CT scans most relevant to predicting liver tumors. The results suggest that incorporating attention mechanisms and a depth-based variant search algorithm into the CNN model is a promising approach for improving the accuracy and robustness of liver tumor prediction. It can assist radiologists in their diagnosis and treatment planning. The proposed system achieved a high accuracy of 95.5% in predicting liver tumors, outperforming other state-of-the-art methods.
    摘要 现在的日子里,深度学习(DL)技术已成为机器学习、人工智能、计算机视觉等领域的一种emerging transformation。随后,研究人员和产业在医疗领域中高度支持,预测和控制多种疾病。肝肿瘤预测是分析和治疗肝病的重要任务。本文提出了一种使用卷积神经网络(CNN)和深度基本变体搜索算法(CNN-DS-AM)的新方法,以提高肝肿瘤预测的准确性和稳定性。该方法采用了CT扫描图像 dataset,包括了正常和肿瘤肝肿瘤。Results suggest that incorporating attention mechanisms and a depth-based variant search algorithm into the CNN model is a promising approach for improving the accuracy and robustness of liver tumor prediction. It can assist radiologists in their diagnosis and treatment planning, with an accuracy of 95.5% in predicting liver tumors, outperforming other state-of-the-art methods.
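As a rough illustration of how an attention block can highlight the image regions most relevant to a prediction, here is a generic spatial-attention module; it is a standard pattern, not the authors' specific advanced attention or depth-based variant search components:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Generic spatial attention: re-weight CNN features by a learned saliency map."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                        # x: (N, C, H, W) CNN features
        avg_map = x.mean(dim=1, keepdim=True)    # channel-average descriptor
        max_map = x.max(dim=1, keepdim=True).values
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                          # emphasize the most relevant regions

feats = torch.randn(1, 32, 56, 56)
print(SpatialAttention()(feats).shape)           # torch.Size([1, 32, 56, 56])
```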

Seeing through the Mask: Multi-task Generative Mask Decoupling Face Recognition

  • paper_url: http://arxiv.org/abs/2311.11512
  • repo_url: None
  • paper_authors: Zhaohui Wang, Sufang Zhang, Jianteng Peng, Xinyi Wang, Yandong Guo
  • for: 本研究旨在提高面Recognition系统在受到 occlusion 影响时的性能,解决现有系统在 occluded scenes 下表现不佳的问题。
  • methods: 本研究提出了 Multi-task gEnerative mask dEcoupling face Recognition (MEER) 网络,该网络可以同时处理 occlusion 和 identity 相关的表示,从可见的 facial 部分提取更加纯净的 identity 特征,并通过 join-training 策略实现不受 occlusion 影响的 face synthesis。
  • results: 实验表明,MEER 可以在实际和synthetic occlusions benchmarks 上进行面Recognition,并且在 occluded scenes 下表现较为出色,超过了现有方法的性能。
    Abstract The outbreak of COVID-19 pandemic make people wear masks more frequently than ever. Current general face recognition system suffers from serious performance degradation,when encountering occluded scenes. The potential reason is that face features are corrupted by occlusions on key facial regions. To tackle this problem, previous works either extract identity-related embeddings on feature level by additional mask prediction, or restore the occluded facial part by generative models. However, the former lacks visual results for model interpretation, while the latter suffers from artifacts which may affect downstream recognition. Therefore, this paper proposes a Multi-task gEnerative mask dEcoupling face Recognition (MEER) network to jointly handle these two tasks, which can learn occlusionirrelevant and identity-related representation while achieving unmasked face synthesis. We first present a novel mask decoupling module to disentangle mask and identity information, which makes the network obtain purer identity features from visible facial components. Then, an unmasked face is restored by a joint-training strategy, which will be further used to refine the recognition network with an id-preserving loss. Experiments on masked face recognition under realistic and synthetic occlusions benchmarks demonstrate that the MEER can outperform the state-ofthe-art methods.
    摘要 COVID-19 疫情爆发使人们更常穿戴口罩,现有普通面部识别系统在遇到遮挡场景时表现出了严重的性能下降。 这可能是因为面部特征被遮挡的关键区域所致。为解决这个问题,前一些作品可以在特征层提取人类相关的嵌入,或者使用生成模型恢复遮挡的面部部分。然而,前一些作品缺乏可视化结果,而后者可能会出现artefacts,这些artefacts可能会影响下游识别。因此,本文提出了一种多任务生成式面部隐藏减少网络(MEER),该网络可以同时处理这两个任务,学习遮挡不相关的人类特征表示,并实现不遮挡的面部合成。我们首先提出了一种新的面部隐藏模块,该模块可以分离面部和身份信息,使网络从可见的面部组件中获得纯净的身份特征。然后,我们使用联合训练策略恢复无遮挡的面部,该面部将被用来改进识别网络,并使用id保持损失进行补做。实验结果表明,MEER可以在实际和synthetic occlusion benchmark上超越当前的状态OF-THE-ART方法。

BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning

  • paper_url: http://arxiv.org/abs/2311.12075
  • repo_url: None
  • paper_authors: Siyuan Liang, Mingli Zhu, Aishan Liu, Baoyuan Wu, Xiaochun Cao, Ee-Chien Chang
  • for: 本研究旨在提高模型版权保护和防御力,通过研究后门攻击。
  • methods: 本文使用了 dual-embedding 指导架构和 Bayesian 规则,实现了不易被检测的后门攻击。
  • results: 与最新的后门防御方法下的基线相比,本文的攻击效果更高(+45.3% ASR),使这些防御与检测策略在实际应用中基本失效。
    Abstract Studying backdoor attacks is valuable for model copyright protection and enhancing defenses. While existing backdoor attacks have successfully infected multimodal contrastive learning models such as CLIP, they can be easily countered by specialized backdoor defenses for MCL models. This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses and introduces the \emph{\toolns} attack, which is resistant to backdoor detection and model fine-tuning defenses. To achieve this, we draw motivations from the perspective of the Bayesian rule and propose a dual-embedding guided framework for backdoor attacks. Specifically, we ensure that visual trigger patterns approximate the textual target semantics in the embedding space, making it challenging to detect the subtle parameter variations induced by backdoor learning on such natural trigger patterns. Additionally, we optimize the visual trigger patterns to align the poisoned samples with target vision features in order to hinder the backdoor unlearning through clean fine-tuning. Extensive experiments demonstrate that our attack significantly outperforms state-of-the-art baselines (+45.3% ASR) in the presence of SoTA backdoor defenses, rendering these mitigation and detection strategies virtually ineffective. Furthermore, our approach effectively attacks some more rigorous scenarios like downstream tasks. We believe that this paper raises awareness regarding the potential threats associated with the practical application of multimodal contrastive learning and encourages the development of more robust defense mechanisms.

cs.AI - 2023-11-20

Provable Representation with Efficient Planning for Partially Observable Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.12244
  • repo_url: None
  • paper_authors: Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai
  • for: 解决实际强化学习问题中状态信息只能部分可观测、导致基本的马尔可夫假设不成立从而影响性能的问题。
  • methods: 从表示(representation)的视角出发,提出一种实用可行的部分可观测强化学习算法,并提供理论分析以保证算法的统计效率。
  • results: 实验表明,所提算法在多种 benchmark 上超越现有方法,推动可靠的强化学习走向更实际的应用。
    Abstract In real-world reinforcement learning problems, the state information is often only partially observable, which breaks the basic assumption in Markov decision processes, and thus, leads to inferior performances. Partially Observable Markov Decision Processes have been introduced to explicitly take the issue into account for learning, exploration, and planning, but presenting significant computational and statistical challenges. To address these difficulties, we exploit the representation view, which leads to a coherent design framework for a practically tractable reinforcement learning algorithm upon partial observations. We provide a theoretical analysis for justifying the statistical efficiency of the proposed algorithm. We also empirically demonstrate the proposed algorithm can surpass state-of-the-art performance with partial observations across various benchmarks, therefore, pushing reliable reinforcement learning towards more practical applications.

InteraSSort: Interactive Assortment Planning Using Large Language Models

  • paper_url: http://arxiv.org/abs/2311.12241
  • repo_url: None
  • paper_authors: Saketh Reddy Karra, Theja Tulabandhula
  • for: 这个论文主要针对的是电商和零售业中的购物策略规划问题,即如何通过互动对话来帮助店长做出优化的决策。
  • methods: 该论文提出了一种互动购物策略规划框架,称为InteraSSort,该框架通过结合大语言模型和优化工具来帮助店长在互动对话中做出优化的决策。
  • results: experiments 表明,InteraSSort 可以帮助店长做出更加精准和个性化的决策,并且可以扩展到多种操作管理挑战。
    Abstract Assortment planning, integral to multiple commercial offerings, is a key problem studied in e-commerce and retail settings. Numerous variants of the problem along with their integration into business solutions have been thoroughly investigated in the existing literature. However, the nuanced complexities of in-store planning and a lack of optimization proficiency among store planners with strong domain expertise remain largely overlooked. These challenges frequently necessitate collaborative efforts with multiple stakeholders which often lead to prolonged decision-making processes and significant delays. To mitigate these challenges and capitalize on the advancements of Large Language Models (LLMs), we propose an interactive assortment planning framework, InteraSSort that augments LLMs with optimization tools to assist store planners in making decisions through interactive conversations. Specifically, we develop a solution featuring a user-friendly interface that enables users to express their optimization objectives as input text prompts to InteraSSort and receive tailored optimized solutions as output. Our framework extends beyond basic functionality by enabling the inclusion of additional constraints through interactive conversation, facilitating precise and highly customized decision-making. Extensive experiments demonstrate the effectiveness of our framework and potential extensions to a broad range of operations management challenges.
    摘要 产品排编规划是电商和零售业中关键的问题,它在多种商业服务中发挥重要作用。现有文献中已经进行了详细的研究和分析。然而,在门店规划中存在许多复杂的特点,以及门店规划人员具有强定领域专业知识的问题,这些问题经常需要多方合作和讨论,导致决策过程延长,延迟。为了解决这些挑战,我们提出一种互动式排编规划框架,即InteraSSort,该框架通过与大语言模型(LLM)的合作,为门店规划人员提供互动式会话的优化解决方案。specifically,我们开发了一个用户友好的界面,允许用户通过输入文本提示来表达优化目标,并从InteraSSort获得适应性的优化解决方案。我们的框架不仅具有基本功能,还允许通过互动会话中的添加约束,实现精准和个性化的决策。我们的实验证明了我们的框架的有效性和扩展性,并可应用于多种运维管理挑战。

Ontological Reasoning over Shy and Warded Datalog$+/-$ for Streaming-based Architectures (technical report)

  • paper_url: http://arxiv.org/abs/2311.12236
  • repo_url: None
  • paper_authors: Teodoro Baldazzi, Luigi Bellomarini, Marco Favorito, Emanuel Sallinger
  • for: 这篇论文关注在现代化的 Ontological Reasoning 系统中,具体是使用 Datalog$+/- $语言扩展 Datalog,以提高推理效率和计算复杂性的 equilibrio。
  • methods: 这篇论文使用了现代reasoner的实现经验,如volcano-iterator架构,以实现有限内存占用和好scalability。
  • results: 这篇论文介绍了两种非常有潜力和可追加的语言,namely Shy和Warded Datalog$+/- $,并基于它们的理论基础,提出了新的推理技巧,以便高效地解决实际场景中的 Ontological Reasoning 问题。
    Abstract Recent years witnessed a rising interest towards Datalog-based ontological reasoning systems, both in academia and industry. These systems adopt languages, often shared under the collective name of Datalog$+/-$, that extend Datalog with the essential feature of existential quantification, while introducing syntactic limitations to sustain reasoning decidability and achieve a good trade-off between expressive power and computational complexity. From an implementation perspective, modern reasoners borrow the vast experience of the database community in developing streaming-based data processing systems, such as volcano-iterator architectures, that sustain a limited memory footprint and good scalability. In this paper, we focus on two extremely promising, expressive, and tractable languages, namely, Shy and Warded Datalog$+/-$. We leverage their theoretical underpinnings to introduce novel reasoning techniques, technically, "chase variants", that are particularly fit for efficient reasoning in streaming-based architectures. We then implement them in Vadalog, our reference streaming-based engine, to efficiently solve ontological reasoning tasks over real-world settings.
    摘要 近年来,无论在学术界还是产业界,基于Datalog的本体推理系统都受到越来越多的关注。这些系统采用通常统称为Datalog$+/-$的语言,在Datalog的基础上加入了关键的存在量化特性,同时通过语法上的限制保持推理的可判定性,在表达能力与计算复杂度之间取得良好折中。从实现角度来看,现代推理器借鉴了数据库社区在流式数据处理系统(如火山迭代器架构)上的丰富经验,以保持有限的内存占用和良好的可扩展性。在本文中,我们关注两种极具前景、表达力强且可处理的语言,即Shy与Warded Datalog$+/-$。我们利用它们的理论基础,引入了特别适合流式架构高效推理的新推理技术,即"chase变体"。随后,我们在参考流式引擎Vadalog中实现了这些技术,以高效解决真实场景中的本体推理任务。

NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

  • paper_url: http://arxiv.org/abs/2311.12229
  • repo_url: None
  • paper_authors: Shachar Rosenman, Vasudev Lal, Phillip Howard
  • for: 提高文本到图像生成模型的生成质量,减少人工引擎的干预。
  • methods: 使用受控文本解码与适应的语言模型,自动提高用户的提示,以提高文本到图像生成的质量。
  • results: 通过对大量人工引擎生成的提示进行分析和优化,自动生成高质量的提示,以提高文本到图像生成的质量。
    Abstract Despite impressive recent advances in text-to-image diffusion models, obtaining high-quality images often requires prompt engineering by humans who have developed expertise in using them. In this work, we present NeuroPrompts, an adaptive framework that automatically enhances a user's prompt to improve the quality of generations produced by text-to-image models. Our framework utilizes constrained text decoding with a pre-trained language model that has been adapted to generate prompts similar to those produced by human prompt engineers. This approach enables higher-quality text-to-image generations and provides user control over stylistic features via constraint set specification. We demonstrate the utility of our framework by creating an interactive application for prompt enhancement and image generation using Stable Diffusion. Additionally, we conduct experiments utilizing a large dataset of human-engineered prompts for text-to-image generation and show that our approach automatically produces enhanced prompts that result in superior image quality. We make our code, a screencast video demo and a live demo instance of NeuroPrompts publicly available.
    摘要 尽管最近的文本到图像扩散模型已经很出色地进步,但是获得高质量图像 Frequently requires human expertise in using these models. 在这项工作中,我们介绍NeuroPrompts,一个可靠的框架,可以自动提高用户提示的质量,以提高文本到图像模型生成的质量。我们的框架使用了约束文本解码,并利用预训练的自然语言模型,生成类似于人类提示工程师生成的提示。这种方法可以提高文本到图像生成的质量,并提供用户控制风格特征的功能,通过约束集定。我们在创建了一个交互式应用程序,用于提示增强和图像生成,并在大量人类工程师生成的提示集合上进行了实验,并证明了我们的方法可以自动生成提高图像质量的提示。我们将我们的代码、屏幕捕捉视频和实时示例 instances of NeuroPrompts 公开 disponibles。

Fast Inner-Product Algorithms and Architectures for Deep Neural Network Accelerators

  • paper_url: http://arxiv.org/abs/2311.12224
  • repo_url: https://github.com/trevorpogue/algebraic-nnhw
  • paper_authors: Trevor E. Pogue, Nicola Nicolici
  • For: The paper aims to improve the performance of machine learning (ML) accelerators by proposing a new algorithm, the Free-pipeline Fast Inner Product (FFIP), together with its hardware architecture.
  • Methods: The paper revisits an under-explored fast inner-product algorithm (FIP) proposed by Winograd in 1968 and implements it for the first time in an ML accelerator; it then proposes the FFIP algorithm and a generalized architecture that improve FIP's clock frequency and throughput at a similar hardware cost.
  • Results: FFIP can be seamlessly incorporated into traditional fixed-point systolic array ML accelerators to achieve the same throughput with half the number of multiply-accumulate (MAC) units, or to double the maximum systolic array size that fits a fixed hardware budget; for non-sparse ML models with 8- to 16-bit fixed-point inputs, the FFIP implementation achieves higher throughput and compute efficiency than best-in-class prior solutions on the same type of compute platform.
    Abstract We introduce a new algorithm called the Free-pipeline Fast Inner Product (FFIP) and its hardware architecture that improve an under-explored fast inner-product algorithm (FIP) proposed by Winograd in 1968. Unlike the unrelated Winograd minimal filtering algorithms for convolutional layers, FIP is applicable to all machine learning (ML) model layers that can mainly decompose to matrix multiplication, including fully-connected, convolutional, recurrent, and attention/transformer layers. We implement FIP for the first time in an ML accelerator then present our FFIP algorithm and generalized architecture which inherently improve FIP's clock frequency and, as a consequence, throughput for a similar hardware cost. Finally, we contribute ML-specific optimizations for the FIP and FFIP algorithms and architectures. We show that FFIP can be seamlessly incorporated into traditional fixed-point systolic array ML accelerators to achieve the same throughput with half the number of multiply-accumulate (MAC) units, or it can double the maximum systolic array size that can fit onto devices with a fixed hardware budget. Our FFIP implementation for non-sparse ML models with 8 to 16-bit fixed-point inputs achieves higher throughput and compute efficiency than the best-in-class prior solutions on the same type of compute platform.
    摘要 我们介绍一种新的算法 called Free-pipeline Fast Inner Product (FFIP) 和其硬件架构,该算法可以提高Winograd在1968年提出的快速内积算法(FIP)的性能。与涉及到卷积层的Winograd最小滤波算法不同,FIP可以应用于所有机器学习(ML)模型层,包括完全连接层、卷积层、回卷层和注意力/转换器层。我们在ML加速器上实现了FIP,然后提出了我们的FFIP算法和通用架构,这两者都可以提高FIP的时钟频率和通过put,同时具有相同的硬件成本。最后,我们对FIP和FFIP算法和架构进行了特定于ML的优化。我们展示了FFIP可以轻松地与传统的 fixes-point systolic array ML加速器结合使用,以实现同样的throughput,但使用的MAC单元数量减半,或者 doublesystolic array的最大大小,可以在设备上匹配的硬件预算内。我们对具有8到16位Fixed-point输入的非稀盐ML模型进行了实现,并取得了与同类 compute平台上的最佳前一solution相同或更高的throughput和计算效率。
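The 1968 Winograd fast inner-product algorithm that FIP and FFIP build on computes a dot product of even-length vectors using roughly half the multiplications in the data-dependent term, because the x-only and y-only pair sums can be precomputed and amortized over the rows and columns of a matrix multiplication. A small sketch of the textbook algorithm (not the paper's hardware mapping):

```python
import numpy as np

def winograd_inner_product(x, y):
    """Winograd (1968) inner product for even-length vectors."""
    assert len(x) == len(y) and len(x) % 2 == 0
    xi  = sum(x[2 * j] * x[2 * j + 1] for j in range(len(x) // 2))   # depends on x only
    eta = sum(y[2 * j] * y[2 * j + 1] for j in range(len(y) // 2))   # depends on y only
    core = sum((x[2 * j] + y[2 * j + 1]) * (x[2 * j + 1] + y[2 * j])
               for j in range(len(x) // 2))                          # n/2 multiplications
    return core - xi - eta

x = np.random.rand(8)
y = np.random.rand(8)
print(np.isclose(winograd_inner_product(x, y), np.dot(x, y)))   # True
```

In a matrix multiplication, xi is computed once per row of the left matrix and eta once per column of the right matrix, so only the core term is paid per output element, which is what allows a systolic array to reach the same throughput with half the MAC units.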

Digital Twin-Based User-Centric Edge Continual Learning in Integrated Sensing and Communication

  • paper_url: http://arxiv.org/abs/2311.12223
  • repo_url: None
  • paper_authors: Shisheng Hu, Jie Gao, Xinyu Huang, Mushu Li, Kaige Qu, Conghao Zhou, Xuemin, Shen
  • for: 这篇论文旨在提出一个基于数位双(DT)的使用者中心方法,用于处理感应数据,并实现高精度和有效资源使用。
  • methods: 这篇论文使用了一个轻量级的深度神经网络(DNN)和一个移动边缘计算(MEC)服务器,并将感应数据上传到服务器进行更高精度的处理。为了处理数据漂移,服务器会在必要时更新轻量级DNN, referred to as continual learning。
  • results: 经过实验显示,提出的DT基于方法可以实现 computation cost minimization,并且在执行DNN基于人姿识别任务时表现出色。
    Abstract In this paper, we propose a digital twin (DT)-based user-centric approach for processing sensing data in an integrated sensing and communication (ISAC) system with high accuracy and efficient resource utilization. The considered scenario involves an ISAC device with a lightweight deep neural network (DNN) and a mobile edge computing (MEC) server with a large DNN. After collecting sensing data, the ISAC device either processes the data locally or uploads them to the server for higher-accuracy data processing. To cope with data drifts, the server updates the lightweight DNN when necessary, referred to as continual learning. Our objective is to minimize the long-term average computation cost of the MEC server by optimizing two decisions, i.e., sensing data offloading and sensing data selection for the DNN update. A DT of the ISAC device is constructed to predict the impact of potential decisions on the long-term computation cost of the server, based on which the decisions are made with closed-form formulas. Experiments on executing DNN-based human motion recognition tasks are conducted to demonstrate the outstanding performance of the proposed DT-based approach in computation cost minimization.
    摘要 在这篇论文中,我们提出了基于数字双(DT)的用户中心的方法,用于处理感知数据在集成感知通信(ISAC)系统中,并且具有高精度和有效资源利用。我们考虑的场景是一个具有轻量级深度学习网络(DNN)的ISAC设备,以及一个具有大型DNN的移动边缘计算(MEC)服务器。在收集感知数据后,ISAC设备会选择是在本地处理数据,或者上传到服务器进行更高精度的数据处理。为了应对数据漂移,服务器会在必要时更新轻量级DNN,称为 kontinual learning。我们的目标是将MEC服务器的长期平均计算成本降低到最低水平,通过优化两个决策:感知数据上载和感知数据选择,以便更新DNN。一个DT的ISAC设备是建立,以预测可能的决策对MEC服务器的长期计算成本产生的影响,并根据这些预测结果,制定了关闭式公式来做出决策。我们对执行基于DNN的人体运动识别任务进行了实验,以证明我们提出的DT-基于方法在计算成本减少方面表现出色。

Defense semantics of argumentation: revisit

  • paper_url: http://arxiv.org/abs/2311.12207
  • repo_url: None
  • paper_authors: Beishui Liao, Leendert van der Torre
  • for: 本研究提出了一种新的 semantics,即防御 semantics,用于杜氏抽象论证框架中的一种受到部分防御的论证 triple encoding。
  • methods: 本研究使用了防御 semantics 来研究批判的自我攻击和三角形的批判,并提出了一种新的防御相等性的定义。
  • results: 本研究发现了一些不可满足的防御,并提出了一种基于防御 semantics 的论证概要化方法。
    Abstract In this paper we introduce a novel semantics, called defense semantics, for Dung's abstract argumentation frameworks in terms of a notion of (partial) defence, which is a triple encoding that one argument is (partially) defended by another argument via attacking the attacker of the first argument. In terms of defense semantics, we show that defenses related to self-attacked arguments and arguments in 3-cycles are unsatifiable under any situation and therefore can be removed without affecting the defense semantics of an AF. Then, we introduce a new notion of defense equivalence of AFs, and compare defense equivalence with standard equivalence and strong equivalence, respectively. Finally, by exploiting defense semantics, we define two kinds of reasons for accepting arguments, i.e., direct reasons and root reasons, and a notion of root equivalence of AFs that can be used in argumentation summarization.
    摘要 在本文中,我们为Dung抽象论证框架引入一种新的语义,称为防御语义,其基于(部分)防御的概念:用一个三元组编码"一个论证通过攻击另一个论证的攻击者而(部分)防御该论证"。基于防御语义,我们证明与自攻击论证以及3-循环中论证相关的防御在任何情况下都不可满足,因此可以在不影响论证框架防御语义的前提下将其移除。随后,我们引入了论证框架防御等价这一新概念,并分别将其与标准等价和强等价进行比较。最后,利用防御语义,我们定义了接受论证的两类理由,即直接理由和根理由,以及可用于论证概要化的根等价概念。

Nepotistically Trained Generative-AI Models Collapse

  • paper_url: http://arxiv.org/abs/2311.12202
  • repo_url: None
  • paper_authors: Matyas Bohacek, Hany Farid
  • for: 这篇论文探讨生成式AI图像模型在用其自身生成的内容重新训练后的表现。
  • methods: 论文以在大量人类创作内容上训练的AI图像生成器为对象,用模型自身生成的图像对其进行重新训练,并观察其输出质量的变化。
  • results: 研究发现,即使只用少量自生成内容重新训练,模型也会生成高度扭曲的图像;这种扭曲并不限于重新训练时使用的文本提示,而且模型一旦被"污染",即使之后只在真实图像上重新训练也难以完全恢复。
    Abstract Trained on massive amounts of human-generated content, AI (artificial intelligence) image synthesis is capable of reproducing semantically coherent images that match the visual appearance of its training data. We show that when retrained on even small amounts of their own creation, these generative-AI models produce highly distorted images. We also show that this distortion extends beyond the text prompts used in retraining, and that once poisoned, the models struggle to fully heal even after retraining on only real images.
    摘要 AI图像生成模型在大量人类创作的内容上训练后,能够生成语义连贯、与训练数据视觉外观相符的图像。我们发现,即使只用少量模型自身生成的内容对其重新训练,这些生成式AI模型也会产生高度扭曲的图像。我们还发现,这种扭曲并不限于重新训练时所用的文本提示;模型一旦被"污染",即使之后只在真实图像上重新训练,也难以完全恢复。
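
A toy analogue of the self-consuming retraining loop studied here can be written in a few lines. It uses a one-dimensional Gaussian "generator" rather than an image model, so it only illustrates the general failure mode, not the paper's experiments.

```python
import numpy as np

# Toy illustration: a "generator" that fits a Gaussian to its training data and is
# then retrained on its own samples. The fitted spread is biased low at every step,
# so over many generations it tends to drift toward collapse (exact values depend
# on the random seed).
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=20)      # small "real" dataset

for generation in range(31):
    mu, sigma = data.mean(), data.std()              # "train" the generator on current data
    if generation % 5 == 0:
        print(f"generation {generation:2d}: estimated sigma = {sigma:.3f}")
    data = rng.normal(mu, sigma, size=20)             # next round trains only on own samples
```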

PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics

  • paper_url: http://arxiv.org/abs/2311.12198
  • repo_url: None
  • paper_authors: Tianyi Xie, Zeshun Zong, Yuxin Qiu, Xuan Li, Yutao Feng, Yin Yang, Chenfanfu Jiang
  • for: PhysGaussian is a new method for achieving high-quality novel motion synthesis by integrating physically grounded Newtonian dynamics within 3D Gaussians.
  • methods: PhysGaussian employs a custom Material Point Method (MPM) to enrich 3D Gaussian kernels with physically meaningful kinematic deformation and mechanical stress attributes, all evolved in line with continuum mechanics principles.
  • results: The method demonstrates exceptional versatility across a wide variety of materials, showcasing its strong capabilities in creating diverse visual content with novel viewpoints and movements.
  • for: PhysGaussian 是一种将基于物理的牛顿动力学与3D高斯表示无缝结合、用于实现高质量新运动合成的新方法。
  • methods: PhysGaussian 采用定制的物质点法(Material Point Method, MPM),为3D高斯核赋予具有物理意义的运动学变形与机械应力属性,并使其按照连续介质力学原理演化。
  • results: 该方法在多种材料(包括弹性体、金属、非牛顿流体和颗粒材料)上展现出出色的通用性,能够在新的视角和运动下生成多样的视觉内容。
    Abstract We introduce PhysGaussian, a new method that seamlessly integrates physically grounded Newtonian dynamics within 3D Gaussians to achieve high-quality novel motion synthesis. Employing a custom Material Point Method (MPM), our approach enriches 3D Gaussian kernels with physically meaningful kinematic deformation and mechanical stress attributes, all evolved in line with continuum mechanics principles. A defining characteristic of our method is the seamless integration between physical simulation and visual rendering: both components utilize the same 3D Gaussian kernels as their discrete representations. This negates the necessity for triangle/tetrahedron meshing, marching cubes, "cage meshes," or any other geometry embedding, highlighting the principle of "what you see is what you simulate (WS$^2$)." Our method demonstrates exceptional versatility across a wide variety of materials--including elastic entities, metals, non-Newtonian fluids, and granular materials--showcasing its strong capabilities in creating diverse visual content with novel viewpoints and movements. Our project page is at: https://xpandora.github.io/PhysGaussian/
    摘要 我们介绍PhysGaussian,一种将基于物理的牛顿动力学无缝集成到3D高斯表示中、以实现高质量新运动合成的新方法。我们的方法采用定制的物质点法(MPM),为3D高斯核赋予具有物理意义的运动学变形与机械应力属性,并使其按照连续介质力学原理演化。PhysGaussian的一个显著特点是物理模拟与视觉渲染的无缝结合:两者使用同一组3D高斯核作为离散表示,因此无需三角形/四面体网格、Marching Cubes、"cage mesh"或任何其他几何嵌入,体现了"所见即所模拟"(WS$^2$)的原则。我们的方法在多种材料(包括弹性体、金属、非牛顿流体和颗粒材料)上展现出出色的通用性,能够在新的视角和运动下生成多样的视觉内容。项目页面:https://xpandora.github.io/PhysGaussian/
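
As a rough illustration of how an ellipsoidal Gaussian kernel can be carried through a local deformation, in the spirit of evolving kernels with continuum-mechanics quantities, the sketch below pushes a covariance forward with a deformation gradient. The transform rule `cov' = F cov F^T` is an assumption chosen for illustration; the paper's actual MPM update and attribute set are more involved.

```python
import numpy as np

def deform_gaussian(center, cov, F, displacement):
    """Carry one 3D Gaussian kernel through a local deformation (illustrative only).

    Assumptions for this sketch: the kernel's mean is advected by the material
    displacement, and its covariance transforms with the local deformation
    gradient F as cov' = F cov F^T (the standard push-forward of a quadratic form).
    """
    new_center = center + displacement
    new_cov = F @ cov @ F.T
    return new_center, new_cov

# Example: a simple shear turns an initially isotropic kernel into an ellipsoid.
F = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
center, cov = np.zeros(3), np.eye(3) * 0.01
new_center, new_cov = deform_gaussian(center, cov, F, displacement=np.array([0.1, 0.0, 0.0]))
print(np.linalg.eigvalsh(new_cov))   # anisotropic spread after the shear
```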

ChatGPT and post-test probability

  • paper_url: http://arxiv.org/abs/2311.12188
  • repo_url: None
  • paper_authors: Samuel J. Weisenthal
  • for: 这篇论文旨在测试ChatGPT是否能够进行形式概率医疗诊断推理,以及如何使用Bayes规则进行医疗诊断。
  • methods: 这篇论文使用ChatGPT进行推理,并提供了一些医疗诊断查询,以测试ChatGPT的能力。
  • results: 研究发现,当查询中使用医学特有的词汇时,ChatGPT的错误率会增加;不过,通过提示工程(prompt engineering)来设计提示,可以帮助ChatGPT部分避免这些错误。
    Abstract Reinforcement learning-based large language models, such as ChatGPT, are believed to have potential to aid human experts in many domains, including healthcare. There is, however, little work on ChatGPT's ability to perform a key task in healthcare: formal, probabilistic medical diagnostic reasoning. This type of reasoning is used, for example, to update a pre-test probability to a post-test probability. In this work, we probe ChatGPT's ability to perform this task. In particular, we ask ChatGPT to give examples of how to use Bayes rule for medical diagnosis. Our prompts range from queries that use terminology from pure probability (e.g., requests for a "posterior probability") to queries that use terminology from the medical diagnosis literature (e.g., requests for a "post-test probability"). We show how the introduction of medical variable names leads to an increase in the number of errors that ChatGPT makes. Given our results, we also show how one can use prompt engineering to facilitate ChatGPT's partial avoidance of these errors. We discuss our results in light of recent commentaries on sensitivity and specificity. We also discuss how our results might inform new research directions for large language models.
    摘要 基于强化学习的大型语言模型(如ChatGPT)被认为有潜力在包括医疗在内的许多领域辅助人类专家。然而,关于ChatGPT能否完成医疗中的一项关键任务,即形式化、概率性的医学诊断推理,的研究还很少。这类推理的典型用途之一,是将检测前概率更新为检测后概率。在这项工作中,我们考察ChatGPT执行这一任务的能力,具体做法是让ChatGPT给出在医学诊断中使用贝叶斯法则的示例。我们的提示词既包括纯概率术语(例如要求给出"后验概率"),也包括医学诊断文献中的术语(例如要求给出"检测后概率")。我们发现,引入医学变量名后,ChatGPT的错误数量增加。基于这些结果,我们还展示了如何利用提示工程帮助ChatGPT部分避免这些错误。我们结合近期关于敏感性与特异性的讨论来分析这些结果,并讨论它们对大型语言模型后续研究方向的启示。
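
The kind of reasoning the paper probes, updating a pre-test probability to a post-test probability with Bayes' rule, can be written out directly. The numbers below are illustrative and not taken from the paper.

```python
def post_test_probability(pre_test_prob, sensitivity, specificity, positive=True):
    """Update a pre-test probability with a test result via Bayes' rule.

    P(D | +) = sens * p / (sens * p + (1 - spec) * (1 - p))
    P(D | -) = (1 - sens) * p / ((1 - sens) * p + spec * (1 - p))
    """
    p = pre_test_prob
    if positive:
        return (sensitivity * p) / (sensitivity * p + (1 - specificity) * (1 - p))
    return ((1 - sensitivity) * p) / ((1 - sensitivity) * p + specificity * (1 - p))

# Example: 10% pre-test probability, test with 90% sensitivity and 80% specificity.
print(round(post_test_probability(0.10, 0.90, 0.80, positive=True), 3))   # 0.333
print(round(post_test_probability(0.10, 0.90, 0.80, positive=False), 3))  # 0.014
```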

Common (good) practices measuring trust in HRI

  • paper_url: http://arxiv.org/abs/2311.12182
  • repo_url: None
  • paper_authors: Patrick Holthaus, Alessandra Rossi
  • for: 本研究旨在探讨人们对机器人的信任程度,以便更好地理解人机合作的情境和满意度。
  • methods: 本研究使用了现有的信任量测试方法,包括文本描述和图像测试,以衡量人们对机器人的信任程度。
  • results: 研究发现,人们对机器人的信任程度受到许多因素的影响,包括机器人的能力和可靠性、交互情境和任务等。同时,研究还发现现有的信任量测试方法存在一些不足之处,需要进一步的改进和扩展。
    Abstract Trust in robots is widely believed to be imperative for the adoption of robots into people's daily lives. It is, therefore, understandable that the literature of the last few decades focuses on measuring how much people trust robots -- and more generally, any agent - to foster such trust in these technologies. Researchers have been exploring how people trust robot in different ways, such as measuring trust on human-robot interactions (HRI) based on textual descriptions or images without any physical contact, during and after interacting with the technology. Nevertheless, trust is a complex behaviour, and it is affected and depends on several factors, including those related to the interacting agents (e.g. humans, robots, pets), itself (e.g. capabilities, reliability), the context (e.g. task), and the environment (e.g. public spaces vs private spaces vs working spaces). In general, most roboticists agree that insufficient levels of trust lead to a risk of disengagement while over-trust in technology can cause over-reliance and inherit dangers, for example, in emergency situations. It is, therefore, very important that the research community has access to reliable methods to measure people's trust in robots and technology. In this position paper, we outline current methods and their strengths, identify (some) weakly covered aspects and discuss the potential for covering a more comprehensive amount of factors influencing trust in HRI.
    摘要 对机器人的信任被广泛认为是机器人走进人们日常生活的关键前提。因此,近几十年的文献大多致力于测量人们对机器人(更广泛地说,对任何智能体)的信任程度,以促进这些技术被接受。研究者以多种方式测量这种信任,例如在与技术交互期间和交互之后,基于文本描述或图像(无任何物理接触)来评估人机交互(HRI)中的信任。然而,信任是一种复杂的行为,受到多种因素的影响,包括交互双方(如人、机器人、宠物)、机器人自身(如能力、可靠性)、情境(如任务)以及环境(如公共空间、私人空间、工作空间)。总体而言,大多数机器人学者认为,信任不足会带来用户与技术脱离的风险,而过度信任则可能导致过度依赖及其固有的危险,例如在紧急情况下。因此,研究社区需要可靠的方法来测量人们对机器人和技术的信任。在这篇立场论文中,我们梳理当前的测量方法及其优点,指出一些覆盖不足的方面,并讨论如何更全面地涵盖影响HRI中信任的各种因素。

Conditional Modeling Based Automatic Video Summarization

  • paper_url: http://arxiv.org/abs/2311.12159
  • repo_url: None
  • paper_authors: Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hung Chen, Marcel Worring
  • for: 自动缩短视频,同时保留传达整体故事所需的关键信息。
  • methods: 基于人类创建真实视频摘要的知识,提出了一种新的视频摘要方法,包括多个有意义的随机变量和共同分布来描述视频摘要的关键组成部分。
  • results: 在常用的视频摘要数据集上进行了大量实验,结果表明该方法优于现有方法,达到了视频摘要任务的最先进(state-of-the-art)性能。
    Abstract The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story. Video summarization methods mainly rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video. There are other non-visual factors, such as interestingness, representativeness, and storyline consistency that should also be considered for generating high-quality video summaries. Current methods do not adequately take into account these non-visual factors, resulting in suboptimal performance. In this work, a new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries. The method utilizes a conditional modeling perspective and introduces multiple meaningful random variables and joint distributions to characterize the key components of video summarization. Helper distributions are employed to improve the training of the model. A conditional attention module is designed to mitigate potential performance degradation in the presence of multi-modal input. The proposed video summarization method incorporates the above innovative design choices that aim to narrow the gap between human-generated and machine-generated video summaries. Extensive experiments show that the proposed approach outperforms existing methods and achieves state-of-the-art performance on commonly used video summarization datasets.
    摘要 视频摘要的目标是在保留传达整体故事所需关键信息的前提下自动缩短视频。现有方法主要依赖视觉连续性和多样性等视觉因素,难以充分理解视频内容;趣味性、代表性和故事线一致性等非视觉因素同样应被考虑,而当前方法对此考虑不足,导致性能欠佳。在这项工作中,我们借鉴人类制作真实视频摘要的方式,提出一种基于条件建模视角的新方法,引入多个有意义的随机变量及其联合分布来刻画视频摘要的关键组成部分,并利用辅助分布改进模型训练;同时设计了条件注意力模块,以缓解多模态输入可能带来的性能下降。上述设计旨在缩小人工摘要与机器摘要之间的差距。大量实验表明,所提方法优于现有方法,在常用的视频摘要数据集上取得了最先进的性能。

User-Like Bots for Cognitive Automation: A Survey

  • paper_url: http://arxiv.org/abs/2311.12154
  • repo_url: None
  • paper_authors: Habtom Kahsay Gidey, Peter Hillmann, Andreas Karcher, Alois Knoll
  • for: 本研究旨在探讨软件机器人的高级通用智能工程,以及如何通过认知架构支持这些工程。
  • methods: 本研究使用了许多不同的认知架构,以探讨它们如何支持软件机器人的智能行为。
  • results: 研究发现,使用认知架构可以帮助软件机器人更好地理解和利用多个虚拟环境的便利功能,从而实现更高级的自主智能行为。
    Abstract Software bots have attracted increasing interest and popularity in both research and society. Their contributions span automation, digital twins, game characters with conscious-like behavior, and social media. However, there is still a lack of intelligent bots that can adapt to web environments' variability and dynamic nature. Unlike human users, they have difficulty understanding and exploiting the affordances across multiple virtual environments. Despite the hype, bots with human user-like cognition do not currently exist. Chatbots, for instance, lack situational awareness on the digital platforms where they operate, preventing them from enacting meaningful and autonomous intelligent behavior similar to human users. In this survey, we aim to explore the role of cognitive architectures in supporting efforts towards engineering software bots with advanced general intelligence. We discuss how cognitive architectures can contribute to creating intelligent software bots. Furthermore, we highlight key architectural recommendations for the future development of autonomous, user-like cognitive bots.
    摘要 软件机器人在研究和社会中受到越来越多的关注和欢迎,其应用涵盖自动化、数字孪生、具有类意识行为的游戏角色以及社交媒体等。然而,目前仍缺乏能够适应网络环境多变性和动态性的智能机器人:与人类用户不同,它们难以理解并利用多个虚拟环境中的可供性(affordances)。尽管备受热捧,具有类人认知能力的机器人目前并不存在;例如,聊天机器人在其所运行的数字平台上缺乏情境意识,因而无法像人类用户那样展现有意义的自主智能行为。在本综述中,我们探讨认知架构在构建具备高级通用智能的软件机器人方面能够发挥的作用,讨论认知架构如何助力智能软件机器人的实现,并为未来开发自主的、类用户的认知机器人提出关键的架构建议。

Teaching Robots to Build Simulations of Themselves

  • paper_url: http://arxiv.org/abs/2311.12151
  • repo_url: None
  • paper_authors: Yuhang Hu, Jiong Lin, Hod Lipson
  • for: robots to plan and estimate the outcomes of prospective actions without physically executing them
  • methods: self-supervised learning framework using brief raw video data
  • results: accurate motion planning and detection of abnormalities/recovery from damage
    Abstract Simulation enables robots to plan and estimate the outcomes of prospective actions without the need to physically execute them. We introduce a self-supervised learning framework to enable robots model and predict their morphology, kinematics and motor control using only brief raw video data, eliminating the need for extensive real-world data collection and kinematic priors. By observing their own movements, akin to humans watching their reflection in a mirror, robots learn an ability to simulate themselves and predict their spatial motion for various tasks. Our results demonstrate that this self-learned simulation not only enables accurate motion planning but also allows the robot to detect abnormalities and recover from damage.
    摘要 仿真使机器人能够在不实际执行动作的情况下规划并估计候选动作的结果。我们提出了一种自监督学习框架,使机器人仅凭简短的原始视频数据即可建模并预测自身的形态、运动学和电机控制,从而无需大规模真实世界数据采集,也无需运动学先验。机器人通过观察自身的运动(类似于人类照镜子),学会了对自身进行仿真,并能够针对各种任务预测自身的空间运动。我们的结果表明,这种自学得到的仿真不仅支持精确的运动规划,还使机器人能够检测异常并在受损后恢复。

Mixing-Denoising Generalizable Occupancy Networks

  • paper_url: http://arxiv.org/abs/2311.12125
  • repo_url: None
  • paper_authors: Amine Ouasfi, Adnane Boukhayma
  • for: 这篇论文旨在探讨如何使用多层感知(MLP)来实现三维形状重建从点云数据中,并以快速的前向传播来实现推理。
  • methods: 该论文提出了一种新思路:用MLP而非卷积神经网络(CNN)来编码局部特征,从而减少模型参数量,并通过与重建任务相关的辅助去噪正则化来约束假设空间、提高模型的泛化能力。
  • results: 结果表明,结合MLP局部特征编码与去噪正则化,在从点云重建三维形状的任务上可以取得优于卷积方法的性能,而模型参数量只有后者的一半。
    Abstract While current state-of-the-art generalizable implicit neural shape models rely on the inductive bias of convolutions, it is still not entirely clear how properties emerging from such biases are compatible with the task of 3D reconstruction from point cloud. We explore an alternative approach to generalizability in this context. We relax the intrinsic model bias (i.e. using MLPs to encode local features as opposed to convolutions) and constrain the hypothesis space instead with an auxiliary regularization related to the reconstruction task, i.e. denoising. The resulting model is the first only-MLP locally conditioned implicit shape reconstruction from point cloud network with fast feed forward inference. Point cloud borne features and denoising offsets are predicted from an exclusively MLP-made network in a single forward pass. A decoder predicts occupancy probabilities for queries anywhere in space by pooling nearby features from the point cloud order-invariantly, guided by denoised relative positional encoding. We outperform the state-of-the-art convolutional method while using half the number of model parameters.
    摘要 当前最先进的可泛化隐式神经形状模型依赖卷积的归纳偏置来获得泛化能力,但这种偏置所带来的性质与"从点云重建三维形状"这一任务如何相容,目前仍不完全清楚。我们探索了另一条实现泛化的路径:放松模型本身的归纳偏置(用MLP而非卷积来编码局部特征),转而通过与重建任务相关的辅助正则化(即去噪)来约束假设空间。由此得到的模型是首个仅由MLP构成、基于局部条件的点云隐式形状重建网络,并具有快速的前馈推理:点云特征与去噪偏移量由一个纯MLP网络在单次前向传播中预测;解码器在去噪后的相对位置编码引导下,以与点序无关的方式汇聚查询点附近的点云特征,从而预测空间中任意位置的占据概率。我们在仅使用一半模型参数的情况下,超越了最先进的卷积方法。
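
A schematic sketch of the described design (MLP-encoded per-point features, order-invariant pooling of nearby features with relative offsets, and an MLP head predicting occupancy) is shown below. Layer sizes, the use of k nearest neighbours, and max-pooling are assumptions for illustration, not the paper's exact architecture; the denoising branch is omitted.

```python
import torch
import torch.nn as nn

class LocalMLPOccupancy(nn.Module):
    """Schematic MLP-only occupancy network (illustrative, not the paper's design)."""

    def __init__(self, feat_dim=64, k=8):
        super().__init__()
        self.k = k
        self.point_mlp = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU(),
                                       nn.Linear(feat_dim, feat_dim))
        self.neigh_mlp = nn.Sequential(nn.Linear(feat_dim + 3, feat_dim), nn.ReLU(),
                                       nn.Linear(feat_dim, feat_dim))
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, points, queries):
        feats = self.point_mlp(points)                                          # (N, F)
        idx = torch.cdist(queries, points).topk(self.k, largest=False).indices  # (Q, k)
        rel = points[idx] - queries[:, None, :]                                 # (Q, k, 3) offsets
        neigh = self.neigh_mlp(torch.cat([feats[idx], rel], dim=-1))            # (Q, k, F)
        pooled = neigh.max(dim=1).values                                        # order-invariant pool
        return self.head(pooled).squeeze(-1)                                    # (Q,) occupancy logits

points = torch.randn(1024, 3)    # input point cloud
queries = torch.randn(32, 3)     # arbitrary query locations
probs = torch.sigmoid(LocalMLPOccupancy()(points, queries))
print(probs.shape)               # torch.Size([32])
```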

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

  • paper_url: http://arxiv.org/abs/2311.12028
  • repo_url: None
  • paper_authors: Wenhao Li, Mengyuan Liu, Hong Liu, Pichao Wang, Jialun Cai, Nicu Sebe
  • for: 这个论文目的是提出一种高效的Transformers-based 3D人姿估计方法,以提高资源受限的设备上的计算效率。
  • methods: 这个方法使用了一种名为Hourglass Tokenizer (HoT)的插件和恢复框架,包括干扰减少和恢复Token Pruning Cluster (TPC)和Token Recovering Attention (TRA)等技术,以提高模型的效率。
  • results: 在两个标准数据集(Human3.6M和MPI-INF-3DHP)上的实验结果表明,该方法能够同时实现高效且准确的3D人体姿态估计,比原始VPT模型更高效,适合资源受限的设备。例如,应用于MotionBERT和MixSTE模型时,HoT可以在不损失精度的情况下减少约50%的计算量,或在仅损失0.2%精度的情况下减少约40%的计算量。
    Abstract Transformers have been successfully applied in the field of video-based 3D human pose estimation. However, the high computational costs of these video pose transformers (VPTs) make them impractical on resource-constrained devices. In this paper, we present a plug-and-play pruning-and-recovering framework, called Hourglass Tokenizer (HoT), for efficient transformer-based 3D human pose estimation from videos. Our HoT begins with pruning pose tokens of redundant frames and ends with recovering full-length tokens, resulting in a few pose tokens in the intermediate transformer blocks and thus improving the model efficiency. To effectively achieve this, we propose a token pruning cluster (TPC) that dynamically selects a few representative tokens with high semantic diversity while eliminating the redundancy of video frames. In addition, we develop a token recovering attention (TRA) to restore the detailed spatio-temporal information based on the selected tokens, thereby expanding the network output to the original full-length temporal resolution for fast inference. Extensive experiments on two benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrate that our method can achieve both high efficiency and estimation accuracy compared to the original VPT models. For instance, applying to MotionBERT and MixSTE on Human3.6M, our HoT can save nearly 50% FLOPs without sacrificing accuracy and nearly 40% FLOPs with only 0.2% accuracy drop, respectively. Our source code will be open-sourced.
    摘要 Transformer已被成功应用于基于视频的3D人体姿态估计。然而,这类视频姿态Transformer(VPT)的计算成本很高,难以在资源受限的设备上使用。在本文中,我们提出了一个即插即用的"剪枝-恢复"框架,称为Hourglass Tokenizer(HoT),用于高效的基于Transformer的视频3D人体姿态估计。HoT首先剪除冗余帧的姿态token,最后再恢复出全长token,使中间的Transformer模块只需处理少量姿态token,从而提升模型效率。为此,我们提出了token剪枝聚类(TPC)模块,动态选择少量语义多样性高的代表性token,同时消除视频帧间的冗余;并设计了token恢复注意力(TRA)模块,基于所选token还原细粒度的时空信息,将网络输出扩展回原始的全长时间分辨率,以便快速推理。在Human3.6M和MPI-INF-3DHP两个标准数据集上的大量实验表明,我们的方法能够兼顾效率与估计精度:例如,应用于Human3.6M上的MotionBERT和MixSTE时,HoT可以在不损失精度的情况下减少近50%的FLOPs,或在仅损失0.2%精度的情况下减少近40%的FLOPs。我们的源代码将开源。
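
The prune-then-recover idea can be sketched as follows. The greedy farthest-point token selection and the cross-attention recovery block below are simplified stand-ins for the paper's TPC and TRA modules, with all dimensions chosen arbitrarily.

```python
import torch
import torch.nn as nn

def prune_tokens(x, n_keep):
    """Pick n_keep representative frame tokens (a simple stand-in for TPC):
    greedily keep tokens that are far apart in feature space to preserve diversity."""
    keep = [0]                                               # x: (T, C) pose tokens
    while len(keep) < n_keep:
        d = torch.cdist(x, x[keep]).min(dim=1).values        # distance to nearest kept token
        keep.append(int(d.argmax()))
    return x[sorted(keep)], sorted(keep)

class TokenRecoveringAttention(nn.Module):
    """Expand pruned tokens back to full temporal length via cross-attention
    (a sketch of the recovery idea, not the paper's exact TRA block)."""
    def __init__(self, dim, n_frames):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_frames, dim) * 0.02)  # one query per frame
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, pruned):                 # pruned: (M, C)
        q = self.queries.unsqueeze(0)          # (1, T, C)
        kv = pruned.unsqueeze(0)               # (1, M, C)
        out, _ = self.attn(q, kv, kv)
        return out.squeeze(0)                  # (T, C) full-length tokens

tokens = torch.randn(243, 128)                 # e.g. a 243-frame sequence of pose tokens
pruned, kept = prune_tokens(tokens, n_keep=16)
recovered = TokenRecoveringAttention(dim=128, n_frames=243)(pruned)
print(pruned.shape, recovered.shape)           # torch.Size([16, 128]) torch.Size([243, 128])
```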

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

  • paper_url: http://arxiv.org/abs/2311.12022
  • repo_url: https://github.com/idavidrein/gpqa
  • paper_authors: David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman
  • for: 这个论文的目的是提供一个高难度的多选问答集,用于测试人工智能系统的能力和超越人类水平。
  • methods: 论文使用了448个生物、物理和化学领域专家写的多选问题,并确保这些问题的质量非常高,难度极大。
  • results: 论文发现,即使使用最新的GPT-4模型,AI系统的答案准确率只有39%,而专家和非专家评审人员的答案准确率分别为65%和34%。这表明AI系统需要更多的监督和审核,以确保它们可以提供可靠的信息。
    Abstract We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web (i.e., the questions are "Google-proof"). The questions are also difficult for state-of-the-art AI systems, with our strongest GPT-4 based baseline achieving 39% accuracy. If we are to use future AI systems to help us answer very hard questions, for example, when developing new scientific knowledge, we need to develop scalable oversight methods that enable humans to supervise their outputs, which may be difficult even if the supervisors are themselves skilled and knowledgeable. The difficulty of GPQA both for skilled non-experts and frontier AI systems should enable realistic scalable oversight experiments, which we hope can help devise ways for human experts to reliably get truthful information from AI systems that surpass human capabilities.
    摘要 我们提出GPQA数据集,包含448道由生物、物理和化学领域专家撰写的多项选择题。我们确保这些题目质量很高且极其困难:拥有或正在攻读相应领域博士学位的专家准确率为65%(剔除其事后确认的明显失误后为74%),而技能娴熟的非专家验证者即使平均花费超过30分钟、可不受限制地访问网络(即题目"防Google"),准确率也只有34%。这些题目对最先进的AI系统同样困难:我们最强的基于GPT-4的基线仅达到39%的准确率。如果我们希望借助未来的AI系统回答非常困难的问题(例如在发展新的科学知识时),就需要开发可扩展的监督方法,使人类能够监督其输出;即便监督者本身技能高超、知识渊博,这也可能非常困难。GPQA对高水平非专家和前沿AI系统都具有相当难度,因而能够支持贴近现实的可扩展监督实验;我们希望这些实验有助于设计出让人类专家从能力超越人类的AI系统中可靠获取真实信息的方法。

Steering Responsible AI: A Case for Algorithmic Pluralism

  • paper_url: http://arxiv.org/abs/2311.12010
  • repo_url: None
  • paper_authors: Stefaan G. Verhulst
  • for: 这篇论文通过有关中介(mediation)与媒体多元化的现有文献与学术研究来探讨人工智能中立性的问题。
  • methods: 该论文借助关于中介与媒体多元化的既有学术研究展开分析。
  • results: 论文阐述了算法多元主义(algorithmic pluralism)这一概念,并评估了其机遇与挑战。
    Abstract In this paper, I examine questions surrounding AI neutrality through the prism of existing literature and scholarship about mediation and media pluralism. Such traditions, I argue, provide a valuable theoretical framework for how we should approach the (likely) impending era of AI mediation. In particular, I suggest examining further the notion of algorithmic pluralism. Contrasting this notion to the dominant idea of algorithmic transparency, I seek to describe what algorithmic pluralism may be, and present both its opportunities and challenges. Implemented thoughtfully and responsibly, I argue, Algorithmic or AI pluralism has the potential to sustain the diversity, multiplicity, and inclusiveness that are so vital to democracy.
    摘要 在本文中,我通过有关中介(mediation)与媒体多元化的既有文献与学术传统来审视人工智能中立性的相关问题。我认为,这些传统为我们应对(很可能即将到来的)AI中介时代提供了有价值的理论框架。我特别建议进一步考察"算法多元主义"(algorithmic pluralism)这一概念:与占主导地位的"算法透明性"相对照,我尝试描述算法多元主义可能是什么,并阐述其机遇与挑战。我认为,如果以审慎和负责任的方式加以实施,算法或AI多元主义有潜力维系对民主至关重要的多样性、多元性与包容性。

BrainWash: A Poisoning Attack to Forget in Continual Learning

  • paper_url: http://arxiv.org/abs/2311.11995
  • repo_url: None
  • paper_authors: Ali Abbasi, Parsa Nooralinejad, Hamed Pirsiavash, Soheil Kolouri
  • for: 本研究旨在探讨深度学习中的持续学习方法在面对以诱导遗忘为目的的对抗攻击时的脆弱性。
  • methods: 本研究提出了一种新的数据损害方法,称为“BrainWash”,可以让连续学习模型忘记先前学习的任务。该方法不需要攻击者有过去任务的数据访问权限,只需要使用当前模型参数和最新任务的数据即可。
  • results: 实验结果表明,BrainWash方法可以成功地让连续学习模型忘记先前学习的任务,并且可以让模型在不同的常规化学习方法下表现出较差的性能。
    Abstract Continual learning has gained substantial attention within the deep learning community, offering promising solutions to the challenging problem of sequential learning. Yet, a largely unexplored facet of this paradigm is its susceptibility to adversarial attacks, especially with the aim of inducing forgetting. In this paper, we introduce "BrainWash," a novel data poisoning method tailored to impose forgetting on a continual learner. By adding the BrainWash noise to a variety of baselines, we demonstrate how a trained continual learner can be induced to forget its previously learned tasks catastrophically, even when using these continual learning baselines. An important feature of our approach is that the attacker requires no access to previous tasks' data and is armed merely with the model's current parameters and the data belonging to the most recent task. Our extensive experiments highlight the efficacy of BrainWash, showcasing degradation in performance across various regularization-based continual learning methods.
    摘要 持续学习在深度学习社区中受到了广泛关注,为序列学习这一难题提供了有前景的解决方案。然而,这一范式的一个鲜少被探讨的方面是其在对抗攻击(特别是以诱导遗忘为目的的攻击)面前的脆弱性。在本文中,我们提出"BrainWash",一种专为使持续学习器遗忘而设计的新型数据投毒方法。通过在多种基线方法的训练数据中加入BrainWash噪声,我们展示了一个已训练的持续学习器即使采用这些持续学习基线,也会灾难性地遗忘先前学到的任务。我们方法的一个重要特点是:攻击者无需访问先前任务的数据,只需掌握模型当前的参数以及最新任务的数据。大量实验证明了BrainWash的有效性,显示其可使多种基于正则化的持续学习方法性能下降。

Exploring Lip Segmentation Techniques in Computer Vision: A Comparative Analysis

  • paper_url: http://arxiv.org/abs/2311.11992
  • repo_url: None
  • paper_authors: Pietro B. S. Masur, Francisco Braulio Oliveira, Lucas Moreira Medino, Emanuel Huber, Milene Haraguchi Padilha, Cassio de Alcantara, Renata Sellaro
  • for: 唇部分割(lip segmentation),面向唇读等应用。
  • methods: 在统一设定和公开数据集上比较了 EHANet、Mask2Former、BiSeNet V2、PIDNet 和 STDC1 五种模型。
  • results: Mask2Former 和 EHANet 性能最佳,BiSeNet V2 表现具有竞争力,PIDNet 召回率最高但精度较低。
    Abstract Lip segmentation is crucial in computer vision, especially for lip reading. Despite extensive face segmentation research, lip segmentation has received limited attention. The aim of this study is to compare state-of-the-art lip segmentation models using a standardized setting and a publicly available dataset. Five techniques, namely EHANet, Mask2Former, BiSeNet V2, PIDNet, and STDC1, are qualitatively selected based on their reported performance, inference time, code availability, recency, and popularity. The CelebAMask-HQ dataset, comprising manually annotated face images, is used to fairly assess the lip segmentation performance of the selected models. Inference experiments are conducted on a Raspberry Pi4 to emulate limited computational resources. The results show that Mask2Former and EHANet have the best performances in terms of mIoU score. BiSeNet V2 demonstrate competitive performance, while PIDNet excels in recall but has lower precision. Most models present inference time ranging from 1000 to around 3000 milliseconds on a Raspberry Pi4, with PIDNet having the lowest mean inference time. This study provides a comprehensive evaluation of lip segmentation models, highlighting their performance and inference times. The findings contribute to the development of lightweight techniques and establish benchmarks for future advances in lip segmentation, especially in IoT and edge computing scenarios.
    摘要 唇部分割在计算机视觉中至关重要,尤其是对于唇读任务。尽管人脸分割研究十分广泛,唇部分割却较少受到关注。本研究的目标是在统一设定下,利用公开数据集比较最先进的唇部分割模型。我们根据已报告的性能、推理时间、代码可获得性、时效性和受关注程度,定性地选取了EHANet、Mask2Former、BiSeNet V2、PIDNet和STDC1五种方法,并使用由人工标注人脸图像组成的CelebAMask-HQ数据集公平地评估其唇部分割性能。推理实验在Raspberry Pi 4上进行,以模拟有限的计算资源。结果显示,Mask2Former和EHANet的mIoU得分最高,BiSeNet V2表现具有竞争力,PIDNet召回率突出但精度较低。大多数模型在Raspberry Pi 4上的推理时间介于1000到约3000毫秒之间,其中PIDNet的平均推理时间最短。本研究对唇部分割模型进行了全面评估,突出其性能与推理时间,有助于轻量级方法的发展,并为唇部分割(尤其是物联网与边缘计算场景)的后续研究建立基准。
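
For reference, the mIoU score used to rank the models can be computed as below. This is a generic implementation, not taken from the paper's evaluation code.

```python
import numpy as np

def mean_iou(pred, target, n_classes):
    """Mean intersection-over-union across classes, the metric used to compare models."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                     # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 1x4 "masks" with classes {0: background, 1: lip}
pred   = np.array([0, 1, 1, 0])
target = np.array([0, 1, 0, 0])
print(mean_iou(pred, target, n_classes=2))   # (2/3 + 1/2) / 2 ≈ 0.583
```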

Categorizing the Visual Environment and Analyzing the Visual Attention of Dogs

  • paper_url: http://arxiv.org/abs/2311.11988
  • repo_url: None
  • paper_authors: Shreyas Sundara Raman, Madeline H. Pelgrim, Daphna Buchsbaum, Thomas Serre
  • for: 这个论文旨在研究狗的视觉行为和与physical world的互动。
  • methods: 研究者使用了头戴式眼动追踪设备收集了11只狗在日常户外环境中的视觉注意力数据,并使用了MaskRCNN来自动分类狗的视觉固定。
  • results: 研究发现,狗具有对汽车、植物、路面和建筑设备的更多视觉注意力,而它们之间没有很大差异。这些结果为了理解狗的视觉行为和与physical world的互动提供了重要的信息。
    Abstract Dogs have a unique evolutionary relationship with humans and serve many important roles e.g. search and rescue, blind assistance, emotional support. However, few datasets exist to categorize visual features and objects available to dogs, as well as how dogs direct their visual attention within their environment. We collect and study a dataset with over 11,698 gazes to categorize the objects available to be gazed at by 11 dogs in everyday outdoor environments i.e. a walk around a college campus and urban area. We explore the availability of these object categories and the visual attention of dogs over these categories using a head mounted eye tracking apparatus. A small portion (approx. 600 images or < 20% of total dataset) of the collected data is used to fine tune a MaskRCNN for the novel image domain to segment objects present in the scene, enabling further statistical analysis on the visual gaze tendencies of dogs. The MaskRCNN, with eye tracking apparatus, serves as an end to end model for automatically classifying the visual fixations of dogs. The fine tuned MaskRCNN performs far better than chance. There are few individual differences between the 11 dogs and we observe greater visual fixations on buses, plants, pavement, and construction equipment. This work takes a step towards understanding visual behavior of dogs and their interaction with the physical world.
    摘要 狗与人类有着独特的协同演化关系,并承担搜救、导盲、情感支持等多种重要角色。然而,目前几乎没有数据集用于刻画狗可注视到的视觉特征与物体类别,以及狗如何在环境中分配其视觉注意力。我们收集并研究了一个数据集,包含11只狗在日常户外环境(大学校园和城区的散步)中的11,698次注视,用于归类其可注视的物体类别。我们借助头戴式眼动追踪装置,考察这些物体类别的出现情况以及狗对它们的视觉注意力。我们用所采集数据中的一小部分(约600张图像,不到总数据集的20%)在这一新图像域上微调MaskRCNN,以分割场景中的物体,从而对狗的注视倾向进行进一步的统计分析。配合眼动追踪装置,MaskRCNN构成了自动分类狗视觉注视的端到端模型,微调后的表现远好于随机水平。11只狗之间的个体差异很小,它们对公交车、植物、路面和施工设备的注视更多。这项工作朝着理解狗的视觉行为及其与物理世界的互动迈出了一步。

Leveraging Previous Facial Action Units Knowledge for Emotion Recognition on Faces

  • paper_url: http://arxiv.org/abs/2311.11980
  • repo_url: None
  • paper_authors: Pietro B. S. Masur, Willams Costa, Lucas S. Figueredo, Veronica Teichrieb
  • for: 本研究旨在提高人机交互的效果,使机器能够理解人们的情感。
  • methods: 本研究使用了 convolutional neural networks (CNNs) 和 Facial Action Units (AUs) 技术来识别情感。
  • results: 研究提出了一种基于面部动作编码系统(FACS)的机器学习方法,以提高多线索情感识别的精度。
    Abstract People naturally understand emotions, thus permitting a machine to do the same could open new paths for human-computer interaction. Facial expressions can be very useful for emotion recognition techniques, as these are the biggest transmitters of non-verbal cues capable of being correlated with emotions. Several techniques are based on Convolutional Neural Networks (CNNs) to extract information in a machine learning process. However, simple CNNs are not always sufficient to locate points of interest on the face that can be correlated with emotions. In this work, we intend to expand the capacity of emotion recognition techniques by proposing the usage of Facial Action Units (AUs) recognition techniques to recognize emotions. This recognition will be based on the Facial Action Coding System (FACS) and computed by a machine learning system. In particular, our method expands over EmotiRAM, an approach for multi-cue emotion recognition, in which we improve over their facial encoding module.
    摘要 人们自然地理解情感,因此让机器也能够同样理解可以开启新的人机交互方式。人脸表情是情感识别技术中最大的非语言表达,可以与情感相关。一些技术基于卷积神经网络(CNN)来提取信息,但简单的CNN不一定能够找到与情感相关的面部特征。在这种工作中,我们计划通过使用表情动作单元(AU)识别技术来识别情感。这种识别基于人脸动作编码系统(FACS),由机器学习系统进行计算。具体来说,我们的方法在EmotiRAM方法中的面部编码模块上进行了改进。

Evaluating Supervision Levels Trade-Offs for Infrared-Based People Counting

  • paper_url: http://arxiv.org/abs/2311.11974
  • repo_url: None
  • paper_authors: David Latortue, Moetez Kdayem, Fidel A Guerrero Peña, Eric Granger, Marco Pedersoli
  • for: 基于红外图像的人数统计(people counting)。
  • methods: 比较不同监督强度下的深度人数统计架构,包括基于卷积神经网络(CNN)的图像级模型、点级定位模型和YOLO检测器。
  • results: 基于CNN的图像级模型可以达到与YOLO检测器和点级模型相当的人数统计结果,同时提供更高的帧率和相近的模型参数量。
    Abstract Object detection models are commonly used for people counting (and localization) in many applications but require a dataset with costly bounding box annotations for training. Given the importance of privacy in people counting, these models rely more and more on infrared images, making the task even harder. In this paper, we explore how weaker levels of supervision can affect the performance of deep person counting architectures for image classification and point-level localization. Our experiments indicate that counting people using a CNN Image-Level model achieves competitive results with YOLO detectors and point-level models, yet provides a higher frame rate and a similar amount of model parameters.
    摘要 目标检测模型广泛用于许多应用中的人数统计(和定位),但其训练需要标注成本高昂的边界框数据。出于隐私方面的考虑,这类模型越来越多地依赖红外图像,使任务更加困难。在本文中,我们研究较弱的监督水平对用于图像分类和点级定位的深度人数统计架构性能的影响。实验表明,使用CNN图像级模型进行人数统计可以取得与YOLO检测器和点级模型相当的结果,同时具有更高的帧率和相近的模型参数量。

NNG-Mix: Improving Semi-supervised Anomaly Detection with Pseudo-anomaly Generation

  • paper_url: http://arxiv.org/abs/2311.11961
  • repo_url: https://github.com/donghao51/nng-mix
  • paper_authors: Hao Dong, Gaëtan Frusque, Yue Zhao, Eleni Chatzi, Olga Fink
  • for: 这篇论文主要应用在预测罕见事件和异常数据中,包括网络入侵检测、金融诈欺检测和基础设施和工业系统中的异常检测。
  • methods: 这篇论文提出了一个新的扩展方法,名为 Nearest Neighbor Gaussian Mixup (NNG-Mix),可以将有限量的标签异常数据与大量的无标签数据混合,以生成更多的伪异常数据。
  • results: 经过广泛的实验,该扩展方法在57个 benchmark 数据集上表现出色,与其他数据扩展方法相比,它可以提高预测异常数据的性能。
    Abstract Anomaly detection (AD) is essential in identifying rare and often critical events in complex systems, finding applications in fields such as network intrusion detection, financial fraud detection, and fault detection in infrastructure and industrial systems. While AD is typically treated as an unsupervised learning task due to the high cost of label annotation, it is more practical to assume access to a small set of labeled anomaly samples from domain experts, as is the case for semi-supervised anomaly detection. Semi-supervised and supervised approaches can leverage such labeled data, resulting in improved performance. In this paper, rather than proposing a new semi-supervised or supervised approach for AD, we introduce a novel algorithm for generating additional pseudo-anomalies on the basis of the limited labeled anomalies and a large volume of unlabeled data. This serves as an augmentation to facilitate the detection of new anomalies. Our proposed algorithm, named Nearest Neighbor Gaussian Mixup (NNG-Mix), efficiently integrates information from both labeled and unlabeled data to generate pseudo-anomalies. We compare the performance of this novel algorithm with commonly applied augmentation techniques, such as Mixup and Cutout. We evaluate NNG-Mix by training various existing semi-supervised and supervised anomaly detection algorithms on the original training data along with the generated pseudo-anomalies. Through extensive experiments on 57 benchmark datasets in ADBench, reflecting different data types, we demonstrate that NNG-Mix outperforms other data augmentation methods. It yields significant performance improvements compared to the baselines trained exclusively on the original training data. Notably, NNG-Mix yields up to 16.4%, 8.8%, and 8.0% improvements on Classical, CV, and NLP datasets in ADBench. Our source code will be available at https://github.com/donghao51/NNG-Mix.
    摘要 异常检测(AD)对于在复杂系统中识别罕见且往往至关重要的事件不可或缺,其应用涵盖网络入侵检测、金融欺诈检测以及基础设施和工业系统的故障检测等领域。由于标注成本高昂,异常检测通常被视为无监督学习任务;但在实践中,更现实的假设是可以从领域专家处获得少量带标注的异常样本,即半监督异常检测。半监督和有监督方法能够利用这些标注数据,从而提升性能。在本文中,我们并非提出新的半监督或有监督异常检测方法,而是提出一种新的算法,基于有限的已标注异常和大量无标注数据生成额外的伪异常,作为一种数据增强手段来帮助检测新的异常。我们提出的算法称为最近邻高斯混合(Nearest Neighbor Gaussian Mixup, NNG-Mix),它高效地融合标注数据与无标注数据中的信息来生成伪异常。我们将其与常用的数据增强技术(如Mixup和Cutout)进行比较,并在原始训练数据与生成的伪异常上训练多种现有的半监督和有监督异常检测算法来评估NNG-Mix。在涵盖不同数据类型的ADBench中57个基准数据集上的大量实验表明,NNG-Mix优于其他数据增强方法,相较仅用原始训练数据训练的基线带来显著的性能提升:在ADBench的Classical、CV和NLP数据集上分别最高提升16.4%、8.8%和8.0%。源代码将发布在 https://github.com/donghao51/NNG-Mix。
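
A schematic reading of the NNG-Mix idea, mixing each labeled anomaly with one of its nearest unlabeled neighbours and adding Gaussian noise, might look as follows. The exact procedure, neighbour selection, and hyper-parameters are defined in the paper and its repository, so treat this as an illustration only.

```python
import numpy as np

def nng_mix(labeled_anomalies, unlabeled, n_pseudo, k=5, noise_scale=0.05, seed=0):
    """Generate pseudo-anomalies by mixing labeled anomalies with nearest unlabeled
    neighbours plus Gaussian noise (schematic; not the paper's exact algorithm)."""
    rng = np.random.default_rng(seed)
    pseudo = []
    for _ in range(n_pseudo):
        a = labeled_anomalies[rng.integers(len(labeled_anomalies))]
        d = np.linalg.norm(unlabeled - a, axis=1)
        nn = unlabeled[rng.choice(np.argsort(d)[:k])]   # one of the k nearest unlabeled points
        lam = rng.uniform(0.5, 1.0)                     # keep the mix anomaly-leaning (assumed)
        mixed = lam * a + (1 - lam) * nn
        pseudo.append(mixed + rng.normal(0.0, noise_scale, size=a.shape))
    return np.stack(pseudo)

anomalies = np.random.randn(10, 8) + 3.0     # few labeled anomalies
unlabeled = np.random.randn(500, 8)          # large unlabeled pool
print(nng_mix(anomalies, unlabeled, n_pseudo=100).shape)   # (100, 8)
```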

Correlated Attention in Transformers for Multivariate Time Series

  • paper_url: http://arxiv.org/abs/2311.11959
  • repo_url: None
  • paper_authors: Quang Minh Nguyen, Lam M. Nguyen, Subhro Das
  • for: 这个论文是为了解决多变量时间序列(MTS)分析中的问题,如金融、气候科学和医疗等领域的实际应用。
  • methods: 论文提出了一种新的相关注意机制,可以快速发现时间序列中的特征wise关系,同时可以融合在现有的Transformer模型中使用,以提高效率。
  • results: 对于多种任务,如填充、异常检测和分类,与基本Transformer模型相比,相关注意机制可以提供更好的性能,并且在各种实验中 consistently 表现出优于基本Transformer模型。
    Abstract Multivariate time series (MTS) analysis prevails in real-world applications such as finance, climate science and healthcare. The various self-attention mechanisms, the backbone of the state-of-the-art Transformer-based models, efficiently discover the temporal dependencies, yet cannot well capture the intricate cross-correlation between different features of MTS data, which inherently stems from complex dynamical systems in practice. To this end, we propose a novel correlated attention mechanism, which not only efficiently captures feature-wise dependencies, but can also be seamlessly integrated within the encoder blocks of existing well-known Transformers to gain efficiency improvement. In particular, correlated attention operates across feature channels to compute cross-covariance matrices between queries and keys with different lag values, and selectively aggregate representations at the sub-series level. This architecture facilitates automated discovery and representation learning of not only instantaneous but also lagged cross-correlations, while inherently capturing time series auto-correlation. When combined with prevalent Transformer baselines, correlated attention mechanism constitutes a better alternative for encoder-only architectures, which are suitable for a wide range of tasks including imputation, anomaly detection and classification. Extensive experiments on the aforementioned tasks consistently underscore the advantages of correlated attention mechanism in enhancing base Transformer models, and demonstrate our state-of-the-art results in imputation, anomaly detection and classification.
    摘要 多变量时间序列(MTS)分析广泛存在于金融、气候科学和医疗等现实应用中。作为最先进的Transformer类模型的核心,各种自注意力机制能够高效地发现时间维度上的依赖关系,却难以很好地刻画MTS数据中不同特征之间错综复杂的互相关性,而这种互相关性在实践中源自复杂的动力系统。为此,我们提出了一种新的相关注意力机制,它不仅能高效捕捉特征维度上的依赖关系,还可以无缝集成到现有知名Transformer的编码器模块中以提升效率。具体而言,相关注意力跨特征通道运作,计算查询与键在不同滞后值下的交叉协方差矩阵,并在子序列层面有选择地聚合表示。这一架构不仅能够自动发现并学习瞬时的和滞后的互相关性,还天然地刻画了时间序列的自相关性。与主流Transformer基线结合后,相关注意力机制构成了仅编码器架构的更优替代方案,适用于填充、异常检测和分类等广泛任务。在上述任务上的大量实验一致表明,相关注意力机制能够增强基础Transformer模型,并取得填充、异常检测和分类任务上的最先进结果。
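
The core quantity described, cross-covariance between queries and keys at several lags computed across feature channels, can be sketched as below. The normalization, lag range, and how these matrices are scored and aggregated are assumptions here and differ from the paper's full mechanism.

```python
import torch

def lagged_cross_covariance(q, k, max_lag=3):
    """Cross-covariance between query and key features at several lags.

    q, k: (T, D) sub-series with T time steps and D feature channels. Returns a
    (max_lag + 1, D, D) stack of D x D cross-covariance matrices, one per lag.
    """
    q = q - q.mean(dim=0, keepdim=True)
    k = k - k.mean(dim=0, keepdim=True)
    covs = []
    for lag in range(max_lag + 1):
        t = q.shape[0] - lag
        covs.append(q[lag:].T @ k[:t] / t)    # covariance between q shifted by `lag` and k
    return torch.stack(covs)

q = torch.randn(96, 8)   # e.g. one sub-series with 8 variables
k = torch.randn(96, 8)
print(lagged_cross_covariance(q, k).shape)   # torch.Size([4, 8, 8])
```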

FinanceBench: A New Benchmark for Financial Question Answering

  • paper_url: http://arxiv.org/abs/2311.11944
  • repo_url: None
  • paper_authors: Pranab Islam, Anand Kannappan, Douwe Kiela, Rebecca Qian, Nino Scherrer, Bertie Vidgen
  • for: The paper is written to evaluate the performance of large language models (LLMs) on open book financial question answering (QA).
  • methods: The paper uses a test suite called FinanceBench, which consists of 10,231 questions about publicly traded companies, to evaluate the performance of 16 state-of-the-art LLM configurations. The authors manually reviewed the answers to the questions (n=2,400) and found that existing LLMs have clear limitations for financial QA.
  • results: The authors found that GPT-4-Turbo, a popular LLM, incorrectly answered or refused to answer 81% of questions when used with a retrieval system. While augmentation techniques such as using longer context windows can improve performance, they are unrealistic for enterprise settings due to increased latency and cannot support larger financial documents. All models examined exhibited weaknesses such as hallucinations, which limit their suitability for use by enterprises.
    Abstract FinanceBench is a first-of-its-kind test suite for evaluating the performance of LLMs on open book financial question answering (QA). It comprises 10,231 questions about publicly traded companies, with corresponding answers and evidence strings. The questions in FinanceBench are ecologically valid and cover a diverse set of scenarios. They are intended to be clear-cut and straightforward to answer to serve as a minimum performance standard. We test 16 state of the art model configurations (including GPT-4-Turbo, Llama2 and Claude2, with vector stores and long context prompts) on a sample of 150 cases from FinanceBench, and manually review their answers (n=2,400). The cases are available open-source. We show that existing LLMs have clear limitations for financial QA. Notably, GPT-4-Turbo used with a retrieval system incorrectly answered or refused to answer 81% of questions. While augmentation techniques such as using longer context window to feed in relevant evidence improve performance, they are unrealistic for enterprise settings due to increased latency and cannot support larger financial documents. We find that all models examined exhibit weaknesses, such as hallucinations, that limit their suitability for use by enterprises.
    摘要 FinanceBench是首个用于评估大型语言模型(LLM)在开卷式金融问答(QA)上表现的测试集,包含10,231个关于上市公司的问题及相应的答案和证据串。FinanceBench中的问题贴近真实场景,覆盖多样化的情形,设计上力求清晰直接、易于作答,以作为最低性能标准。我们在FinanceBench中抽样150个案例,测试了16种最先进的模型配置(包括GPT-4-Turbo、Llama2和Claude2,并结合向量检索和长上下文提示),并人工复核其答案(n=2,400);这些案例以开源形式提供。结果显示,现有LLM在金融问答上存在明显局限:例如,GPT-4-Turbo配合检索系统时,有81%的问题回答错误或拒绝回答。诸如使用更长的上下文窗口输入相关证据等增强手段虽然能提高表现,但因延迟增加且无法支持更大的财务文档,在企业环境中并不现实。我们发现,所有被测模型都存在幻觉等弱点,限制了它们在企业中的适用性。

Ovarian Cancer Data Analysis using Deep Learning: A Systematic Review from the Perspectives of Key Features of Data Analysis and AI Assurance

  • paper_url: http://arxiv.org/abs/2311.11932
  • repo_url: None
  • paper_authors: Muta Tah Hira, Mohammad A. Razzaque, Mosharraf Sarker
  • for: 这些研究主要针对卵巢癌的检测与诊断。
  • methods: 大多数研究使用基于样本的深度学习技术,只有少数研究对混合数据(临床数据或组学数据)进行了集成分析。
  • results: 仅有一小部分研究(8.3%)使用外部且多样化的数据集验证其模型,表明模型验证仍需加强;卵巢癌数据分析中的人工智能保障(AIA)尚处于非常早期的阶段,只有2.1%的研究通过可解释性明确考虑了AIA。
    Abstract Background and objectives: By extracting this information, Machine or Deep Learning (ML/DL)-based autonomous data analysis tools can assist clinicians and cancer researchers in discovering patterns and relationships from complex data sets. Many DL-based analyses on ovarian cancer (OC) data have recently been published. These analyses are highly diverse in various aspects of cancer (e.g., subdomain(s) and cancer type they address) and data analysis features. However, a comprehensive understanding of these analyses in terms of these features and AI assurance (AIA) is currently lacking. This systematic review aims to fill this gap by examining the existing literature and identifying important aspects of OC data analysis using DL, explicitly focusing on the key features and AI assurance perspectives. Methods: The PRISMA framework was used to conduct comprehensive searches in three journal databases. Only studies published between 2015 and 2023 in peer-reviewed journals were included in the analysis. Results: In the review, a total of 96 DL-driven analyses were examined. The findings reveal several important insights regarding DL-driven ovarian cancer data analysis: - Most studies 71% (68 out of 96) focused on detection and diagnosis, while no study addressed the prediction and prevention of OC. - The analyses were predominantly based on samples from a non-diverse population (75% (72/96 studies)), limited to a geographic location or country. - Only a small proportion of studies (only 33% (32/96)) performed integrated analyses, most of which used homogeneous data (clinical or omics). - Notably, a mere 8.3% (8/96) of the studies validated their models using external and diverse data sets, highlighting the need for enhanced model validation, and - The inclusion of AIA in cancer data analysis is in a very early stage; only 2.1% (2/96) explicitly addressed AIA through explainability.
    摘要 背景与目标:通过提取这些信息,基于机器学习或深度学习(ML/DL)的自主数据分析工具可以帮助临床医生和癌症研究人员从复杂数据集中发现模式和关系。近期已发表了许多基于DL的卵巢癌(OC)数据分析研究,它们在癌症的不同方面(如所涉及的子领域和癌种)以及数据分析特征上差异很大;然而,目前仍缺乏从这些特征和人工智能保障(AIA)角度对这些研究的全面认识。本系统综述旨在填补这一空白,通过梳理现有文献,明确基于DL的OC数据分析的重要方面,重点关注关键特征与AIA视角。方法:采用PRISMA框架在三个期刊数据库中进行全面检索,仅纳入2015年至2023年间发表于同行评审期刊的研究。结果:综述共考察了96项DL驱动的分析,得到以下重要发现:大多数研究(71%,68/96)聚焦于检测与诊断,没有研究涉及OC的预测与预防;分析大多基于来自单一(非多样化)人群的样本(75%,72/96),局限于某一地理位置或国家;仅有少部分研究(33%,32/96)进行了集成分析,且其中大多使用同质数据(临床数据或组学数据);仅有8.3%(8/96)的研究使用外部且多样化的数据集验证其模型,凸显了加强模型验证的必要性;癌症数据分析中对AIA的纳入仍处于非常早期的阶段,只有2.1%(2/96)的研究通过可解释性明确考虑了AIA。

Generalization of Fitness Exercise Recognition from Doppler Measurements by Domain-adaption and Few-Shot Learning

  • paper_url: http://arxiv.org/abs/2311.11910
  • repo_url: None
  • paper_authors: Biying Fu, Naser Damer, Florian Kirchbuchner, Arjan Kuijper
  • for: 本研究旨在提高智能手机应用程序的全身运动识别精度,并研究如何在实际应用中适应不同用户、环境和设备变化。
  • methods: 本研究使用了商业OFF-THE-SHELF智能手机和ultrasound Doppler探测技术,并提出了两种小数据适应技术来提高模型泛化性。
  • results: 对比基eline,使用小数据适应技术可以提高全身运动识别精度,并且在不同用户、环境和设备下的 recognize accuracy 提高了2-6倍。
    Abstract In previous works, a mobile application was developed using an unmodified commercial off-the-shelf smartphone to recognize whole-body exercises. The working principle was based on the ultrasound Doppler sensing with the device built-in hardware. Applying such a lab-environment trained model on realistic application variations causes a significant drop in performance, and thus decimate its applicability. The reason of the reduced performance can be manifold. It could be induced by the user, environment, and device variations in realistic scenarios. Such scenarios are often more complex and diverse, which can be challenging to anticipate in the initial training data. To study and overcome this issue, this paper presents a database with controlled and uncontrolled subsets of fitness exercises. We propose two concepts to utilize small adaption data to successfully improve model generalization in an uncontrolled environment, increasing the recognition accuracy by two to six folds compared to the baseline for different users.
    摘要 在先前的工作中,我们开发了一款基于未经改装的商用智能手机的移动应用,用于识别全身健身动作,其工作原理是利用设备内置硬件进行超声多普勒感知。然而,将这种在实验室环境中训练的模型直接用于实际应用时,由于用户、环境和设备等方面的变化,性能会显著下降,从而严重影响其实用性。实际场景往往更复杂、更多样,难以在初始训练数据中完全预见。为研究并克服这一问题,本文构建了一个包含受控与非受控子集的健身动作数据库,并提出两种利用少量适应数据的思路,以成功提升模型在非受控环境中的泛化能力,使不同用户的识别准确率相比基线提高2到6倍。

Continual Learning: Applications and the Road Forward

  • paper_url: http://arxiv.org/abs/2311.11908
  • repo_url: None
  • paper_authors: Eli Verwimp, Rahaf Aljundi, Shai Ben-David, Matthias Bethge, Andrea Cossu, Alexander Gepperth, Tyler L. Hayes, Eyke Hüllermeier, Christopher Kanan, Dhireesha Kudithipudi, Christoph H. Lampert, Martin Mundt, Razvan Pascanu, Adrian Popescu, Andreas S. Tolias, Joost van de Weijer, Bing Liu, Vincenzo Lomonaco, Tinne Tuytelaars, Gido M. van de Ven
  • for: 这个论文的目的是解释为何应该关注连续学习,并展示了现代机器学习模型在新数据上积累知识的重要性。
  • methods: 这篇论文使用了最新的连续学习论文,以及现代机器学习模型的研究来支持自己的 Argument。
  • results: 这篇论文的结论是,连续学习将成为未来机器学习领域中不可或缺的一部分,并且需要进一步的研究以满足未来的需求。
    Abstract Continual learning is a sub-field of machine learning, which aims to allow machine learning models to continuously learn on new data, by accumulating knowledge without forgetting what was learned in the past. In this work, we take a step back, and ask: "Why should one care about continual learning in the first place?". We set the stage by surveying recent continual learning papers published at three major machine learning conferences, and show that memory-constrained settings dominate the field. Then, we discuss five open problems in machine learning, and even though they seem unrelated to continual learning at first sight, we show that continual learning will inevitably be part of their solution. These problems are model-editing, personalization, on-device learning, faster (re-)training and reinforcement learning. Finally, by comparing the desiderata from these unsolved problems and the current assumptions in continual learning, we highlight and discuss four future directions for continual learning research. We hope that this work offers an interesting perspective on the future of continual learning, while displaying its potential value and the paths we have to pursue in order to make it successful. This work is the result of the many discussions the authors had at the Dagstuhl seminar on Deep Continual Learning, in March 2023.
    摘要 持续学习是机器学习的一个子领域,旨在让机器学习模型能够在新数据上不断学习、积累知识,同时不遗忘过去所学。在这项工作中,我们退后一步,追问:"我们究竟为什么要关心持续学习?"我们首先梳理了近期发表在三大机器学习会议上的持续学习论文,发现受内存约束的设定主导了该领域。随后,我们讨论了机器学习中的五个开放问题:模型编辑、个性化、设备端学习、更快的(再)训练以及强化学习;它们乍看与持续学习无关,但我们说明持续学习必然会成为其解决方案的一部分。最后,通过对比这些未解问题的需求与当前持续学习研究中的假设,我们提出并讨论了持续学习研究的四个未来方向。我们希望这项工作能为持续学习的未来提供有趣的视角,展现其潜在价值以及为使其成功所需要走的路。本文源于作者们于2023年3月在Dagstuhl持续学习研讨会上的多次讨论。

  • paper_url: http://arxiv.org/abs/2311.12089
  • repo_url: https://github.com/xzheng93/explainable_dl
  • paper_authors: Xiaoping Zheng, Bert Otten, Michiel F Reneman, Claudine JC Lamoth
  • for: This study aimed to enhance transparency in deep learning-based gait classification of age-related gait patterns using Explainable Artificial Intelligence (SHAP).
  • methods: The study used a dataset of 244 subjects (129 adults and 115 older adults, age>65) who performed a 3-minute walking task while wearing an accelerometer at lumbar segment L3; convolutional neural network (CNN) and gated recurrent unit (GRU) models were trained to classify the adult and older adult groups, and SHAP was employed to explain their predictions.
  • results: Both CNN and GRU assigned higher SHAP values to data from the vertical and walking directions, particularly around heel contact (from terminal swing to loading response); GRU did not treat every stride equally, whereas CNN distinguished adults from older adults based on a single stride, suggesting age-related differences in acceleration and deceleration patterns during walking.
    Abstract Gait analysis holds significant importance in monitoring daily health, particularly among older adults. Advancements in sensor technology enable the capture of movement in real-life environments and generate big data. Machine learning, notably deep learning (DL), shows promise to use these big data in gait analysis. However, the inherent black-box nature of these models poses challenges for their clinical application. This study aims to enhance transparency in DL-based gait classification for aged-related gait patterns using Explainable Artificial Intelligence, such as SHAP. A total of 244 subjects, comprising 129 adults and 115 older adults (age>65), were included. They performed a 3-minute walking task while accelerometers were affixed to the lumbar segment L3. DL models, convolutional neural network (CNN) and gated recurrent unit (GRU), were trained using 1-stride and 8-stride accelerations, respectively, to classify adult and older adult groups. SHAP was employed to explain the models' predictions. CNN achieved a satisfactory performance with an accuracy of 81.4% and an AUC of 0.89, and GRU demonstrated promising results with an accuracy of 84.5% and an AUC of 0.94. SHAP analysis revealed that both CNN and GRU assigned higher SHAP values to the data from vertical and walking directions, particularly emphasizing data around heel contact, spanning from the terminal swing to loading response phases. Furthermore, SHAP values indicated that GRU did not treat every stride equally. CNN accurately distinguished between adults and older adults based on the characteristics of a single stride's data. GRU achieved accurate classification by considering the relationships and subtle differences between strides. In both models, data around heel contact emerged as most critical, suggesting differences in acceleration and deceleration patterns during walking between different age groups.
    摘要 步态分析对日常健康监测具有重要意义,特别是对老年人而言。随着传感技术的发展,可以在真实生活环境中记录运动并产生大量数据,而深度学习(DL)等机器学习方法有望利用这些大数据进行步态分析;然而,这类模型固有的黑箱特性给临床应用带来了挑战。本研究旨在借助可解释人工智能(如SHAP),提升基于DL的年龄相关步态模式分类的透明度。研究共纳入244名受试者,包括129名成年人和115名老年人(年龄>65岁),他们在第三腰椎(L3)处佩戴加速度计完成3分钟步行任务。分别以单步和8步加速度数据训练卷积神经网络(CNN)和门控循环单元(GRU)模型,对成年人与老年人进行分类,并用SHAP解释模型预测。CNN取得了令人满意的表现(准确率81.4%,AUC 0.89),GRU的结果同样可观(准确率84.5%,AUC 0.94)。SHAP分析显示,CNN和GRU都对垂直方向和行进方向的数据赋予更高的SHAP值,尤其强调足跟着地前后(从摆动末期到承重反应期)的数据。此外,SHAP值表明GRU并非对每一步一视同仁;CNN能够依据单步数据的特征准确区分成年人与老年人,而GRU则通过考虑步与步之间的关系和细微差异实现准确分类。在两个模型中,足跟着地前后的数据最为关键,提示不同年龄组在行走过程中的加速与减速模式存在差异。
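
To illustrate how SHAP attributions are obtained in this kind of pipeline, the sketch below explains a simple tree classifier trained on synthetic per-stride features. The study itself explains CNN/GRU models on raw acceleration windows, so the model, features, and labels here are stand-ins.

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in data: 6 hand-made "per-stride" features and a binary label
# playing the role of the adult vs older-adult classification.
rng = np.random.default_rng(0)
n = 400
X = rng.standard_normal((n, 6))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 0.5, n) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)        # per-sample, per-feature attributions

# Mean absolute SHAP value per feature gives a global importance ranking, analogous
# to asking which parts of the gait signal drive the age-group classification.
print(np.abs(np.asarray(shap_values)).mean(axis=0))
```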

Towards Exploratory Reformulation of Constraint Models

  • paper_url: http://arxiv.org/abs/2311.11868
  • repo_url: None
  • paper_authors: Ian Miguel, András Z. Salamon, Christopher Stone
  • for: 本研究旨在提出一种exploratory reformulation系统,用于寻找最佳的约束模型,以提高问题的解决效率。
  • methods: 本研究采用基于精化(refinement)的方法:从初始模型出发,在训练实例性能的引导下逐步重构模型,以寻找性能更好的模型。
  • results: 研究已经做出了一些进展,包括开发出一种基于refinement的方法,以及实现了一种可以自动生成模型的系统。
    Abstract It is well established that formulating an effective constraint model of a problem of interest is crucial to the efficiency with which it can subsequently be solved. Following from the observation that it is difficult, if not impossible, to know a priori which of a set of candidate models will perform best in practice, we envisage a system that explores the space of models through a process of reformulation from an initial model, guided by performance on a set of training instances from the problem class under consideration. We plan to situate this system in a refinement-based approach, where a user writes a constraint specification describing a problem above the level of abstraction at which many modelling decisions are made. In this position paper we set out our plan for an exploratory reformulation system, and discuss progress made so far.
    摘要 众所周知,为所关注的问题建立一个有效的约束模型,对后续求解的效率至关重要。鉴于我们往往难以(甚至不可能)事先知道一组候选模型中哪一个在实践中表现最好,我们设想这样一个系统:从一个初始模型出发,在所考虑问题类别的一组训练实例上的性能引导下,通过不断重构来探索模型空间。我们计划将该系统置于一种基于精化(refinement)的方法之中,即用户在高于诸多建模决策的抽象层面上撰写描述问题的约束规范。在这篇立场论文中,我们阐述了构建这一探索式重构系统的计划,并讨论了迄今取得的进展。

Analyzing Emissions and Energy Efficiency in Mixed Traffic Control at Unsignalized Intersections

  • paper_url: http://arxiv.org/abs/2311.11866
  • repo_url: None
  • paper_authors: Michael Villarreal, Dawei Wang, Jia Pan, Weizi Li
  • for: This paper aims to reduce transportation-related emissions at intersections by employing mixed traffic control eco-driving strategies with robot vehicles (RVs), focusing on unsignalized intersections.
  • methods: The paper performs an emissions analysis on unsignalized intersections with complex, real-world topologies and traffic demands, where RVs are used to reduce waiting times and congestion.
  • results: With at least a 10% RV penetration rate, RVs reduce fuel consumption and NOx emissions relative to signalized intersections by up to 27% and 28%, respectively; with at least 30% RVs, CO and HC emissions are reduced by up to 42% and 43%, respectively, and RVs reduce emissions across the whole network despite acting only at the intersections.
    Abstract Greenhouse gas emissions have dramatically risen since the early 1900s with U.S. transportation generating 28% of the U.S' emissions. As such, there is interest in reducing transportation-related emissions. Specifically, sustainability research has sprouted around signalized intersections as intersections allow different streams of traffic to cross and change directions. Recent research has developed mixed traffic control eco-driving strategies at signalized intersections to decrease emissions. However, the inherent structure of a signalized intersection generates increased emissions by creating frequent acceleration/deceleration events, excessive idling from traffic congestion, and stop-and-go waves. Thus, we believe unsignalized intersections hold potential for further sustainability improvements. In this work, we provide an emissions analysis on unsignalized intersections with complex, real-world topologies and traffic demands where mixed traffic control strategies are employed by robot vehicles (RVs) to reduce waiting times and congestion. We find with at least 10% RV penetration rate, RVs generate less fuel consumption and NOx emissions than signalized intersections by up to 27% and 28%, respectively. With at least 30% RVs, CO and HC emissions are reduced by up to 42% and 43%, respectively. Additionally, RVs can reduce emissions across the whole network despite only employing their strategies at the intersections.
    摘要 美国交通部门自20世纪初期以来,绿色气体排放已经有了很大增长。为了降低交通相关排放,可持续发展研究在信号灯交叉口上进行了大量的研究。特别是在信号灯交叉口上,杂交控制驾驶技术的研究已经得到了广泛的应用。然而,信号灯交叉口的内置结构会导致更多的加速/减速事件、交通堵塞导致的停靠时间过长、以及往返波。因此,我们认为不信号灯交叉口具有更多的可持续发展可能性。在这个工作中,我们对无信号灯交叉口进行了排放分析,并采用了机器人车(RV)实施杂交控制策略来减少等待时间和堵塞。我们发现,当RV占用率至少为10%时,RV比信号灯交叉口排放更少的柴油消耗和NOx排放,分别下降27%和28%。当RV占用率至少为30%时,CO和HC排放也下降了42%和43%。此外,RV可以在整个网络中减少排放,即使只在交叉口上采用杂交控制策略。

Establishing Central Sensitization Inventory Cut-off Values in patients with Chronic Low Back Pain by Unsupervised Machine Learning

  • paper_url: http://arxiv.org/abs/2311.11862
  • repo_url: https://github.com/xzheng93/csi_cutoff_establishment
  • paper_authors: Xiaoping Zheng, Claudine JC Lamoth, Hans Timmerman, Ebert Otten, Michiel F Reneman
  • for: 这个研究旨在确定chronic low back pain (CLBP)的患者群体中,Central Sensitization Inventory (CSI)的优化阈值,考虑到性别因素和疼痛状况的影响。
  • methods: 这个研究使用了四种无监督聚类(unsupervised clustering)方法来确定CSI中的HACS相关模式,并通过内部和外部指标评估聚类性能。然后,通过 Receiver Operating Characteristic (ROC)分析确定最佳阈值。
  • results: 研究发现,Hierarchical Clustering 方法得到最佳结果,可以将患者分为三个群体:健康组、CLBP with low HACS level 组和 CLBP with high HACS level 组。对于全体群体,优化阈值为 35,对于女性,阈值为 34,对于男性,阈值为 35。这些结果表明,CLBP 患者的优化阈值为 35。
    Abstract Human Assumed Central Sensitization (HACS) is involved in the development and maintenance of chronic low back pain (CLBP). The Central Sensitization Inventory (CSI) was developed to evaluate the presence of HACS, with a cut-off value of 40/100 based on patients with chronic pain. However, various factors including pain conditions (e.g., CLBP) and gender may influence this cut-off value. For chronic pain conditions such as CLBP, unsupervised clustering approaches can take these factors into consideration and automatically learn the HACS-related patterns. Therefore, this study aimed to determine the cut-off values for a Dutch-speaking population with CLBP, considering the total group and stratified by gender, based on unsupervised machine learning. In this study, questionnaire data covering pain, physical, and psychological aspects were collected from patients with CLBP and age-matched pain-free adults (referred to as healthy controls, HC). Four clustering approaches were applied to identify HACS-related clusters based on the questionnaire data and gender. The clustering performance was assessed using internal and external indicators. Subsequently, receiver operating characteristic analysis was conducted on the best clustering results to determine the optimal cut-off values. The study included 151 subjects, consisting of 63 HCs and 88 patients with CLBP. Hierarchical clustering yielded the best results, identifying three clusters: healthy group, CLBP with low HACS level, and CLBP with high HACS level groups. Based on the low HACS level group (including HC and CLBP with low HACS level) and the high HACS level group, the cut-off value was 35 for the overall group, 34 for females, and 35 for males. The findings suggest that the optimal cut-off value for CLBP is 35. The gender-related cut-off values should be interpreted with caution due to the unbalanced gender distribution in the sample.
    摘要 人类假设中央敏感性(HACS)参与了慢性低脊梁疼痛(CLBP)的发展和维持。中央敏感性评估器(CSI)是用来评估HACS存在的存在,其分别为40/100,基于患有慢性疼痛的患者。然而,不同的因素,包括疼痛状况(例如CLBP)和性别可能影响这个分别值。为了慢性疼痛状况如CLBP,无监督聚类方法可以考虑这些因素并自动发现HACS相关的模式。因此,本研究的目的是在荷兰语言社区中确定CLBP患者的分别值,并根据性别进行分类。在本研究中,收集了疼痛、物理和心理方面的问卷数据,从患有CLBP的患者和年龄匹配的疼痛自适应人(HC)中收集数据。四种聚类方法被应用于基于问卷数据和性别 Identify HACS相关的聚类。聚类性被评估使用内部和外部指标。后续,基于最佳聚类结果进行 receiver operating characteristic 分析,以确定优化的分别值。研究包括151名参与者,包括63名HC和88名CLBP患者。层次聚类得到最佳结果,并将患者分为三个群体:健康组、CLBP低HACS水平组和 CLBP高HACS水平组。根据低HACS水平组(包括HC和CLBP低HACS水平)和高HACS水平组,总体分别值为35,女性分别值为34,男性分别值为35。结论表明,CLBP的最佳分别值为35。性别相关的分别值应该进行注意,因为样本中性别分布不均衡。

Generating Valid and Natural Adversarial Examples with Large Language Models

  • paper_url: http://arxiv.org/abs/2311.11861
  • repo_url: None
  • paper_authors: Zimu Wang, Wei Wang, Qi Chen, Qiufeng Wang, Anh Nguyen
  • for: 这篇论文的目的是提出一种基于深度学习的自然语言处理(NLP)模型,特别是预训练语言模型(PLM)的攻击方法。
  • methods: 该方法包括两个阶段:首先,通过对words进行重要性排名,找到最容易遭受攻击的words;然后,使用大语言模型(LLM)提取的同义词来替换这些words。
  • results: 对于Movie Review(MR)、IMDB和Yelp Review Polarity等 dataset,LLM-Attack模型在对比基eline adversarial attack模型时表现出了显著的优势,在人工和GPT-4评估中也表现出了显著的提升。该模型可以生成有效、自然的攻击示例,保持语义意义、 grammaticality和人类隐身性。
    Abstract Deep learning-based natural language processing (NLP) models, particularly pre-trained language models (PLMs), have been revealed to be vulnerable to adversarial attacks. However, the adversarial examples generated by many mainstream word-level adversarial attack models are neither valid nor natural, leading to the loss of semantic maintenance, grammaticality, and human imperceptibility. Based on the exceptional capacity of language understanding and generation of large language models (LLMs), we propose LLM-Attack, which aims at generating both valid and natural adversarial examples with LLMs. The method consists of two stages: word importance ranking (which searches for the most vulnerable words) and word synonym replacement (which substitutes them with their synonyms obtained from LLMs). Experimental results on the Movie Review (MR), IMDB, and Yelp Review Polarity datasets against the baseline adversarial attack models illustrate the effectiveness of LLM-Attack, and it outperforms the baselines in human and GPT-4 evaluation by a significant margin. The model can generate adversarial examples that are typically valid and natural, with the preservation of semantic meaning, grammaticality, and human imperceptibility.
    摘要 深度学习基于自然语言处理(NLP)模型,特别是预训练语言模型(PLM),已经被揭示为易受到敌意攻击的。然而,由多数主流单词水平攻击模型生成的攻击示例通常不是有效的,导致 semantic maintenance、grammaticality 和人类不可见性的失去。基于大语言模型(LLM)的异常 capacities of language understanding and generation,我们提出了 LLM-Attack,这是一种生成有效和自然的攻击示例的方法。该方法包括两个阶段:单词重要性排名(搜索最易受到攻击的单词)和单词同义补充(使用 LLM 获取的单词同义补充替换它们)。实验结果表明,对 MR、IMDB 和 Yelp Review Polarity 数据集进行比较,LLM-Attack 高效地超越了基eline adversarial attack模型,在人类和 GPT-4 评估中也具有显著的优势。LLM-Attack 可以生成有效、自然的攻击示例,保持 semantics 的意义、 grammaticality 和人类不可见性。
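
As a rough illustration of the two-stage procedure described above (word importance ranking followed by LLM-provided synonym substitution), the sketch below shows one plausible implementation. The callables `victim_predict` (returning a label-to-probability mapping) and `query_llm_synonyms` (prompting an LLM for in-context synonyms) are assumptions for illustration, not the paper's exact interfaces.

```python
# Minimal sketch of a two-stage word-level adversarial attack in the spirit of LLM-Attack.
# `victim_predict(text)` -> dict mapping labels to probabilities; `query_llm_synonyms` is a
# hypothetical helper that asks an LLM for synonyms of a word given its context.

def word_importance(tokens, label, victim_predict):
    """Rank positions by how much deleting each word lowers the victim's confidence."""
    base = victim_predict(" ".join(tokens))[label]
    scores = []
    for i in range(len(tokens)):
        ablated = tokens[:i] + tokens[i + 1:]
        scores.append((base - victim_predict(" ".join(ablated))[label], i))
    return [i for _, i in sorted(scores, reverse=True)]

def llm_attack(text, label, victim_predict, query_llm_synonyms, max_subs=5):
    tokens = text.split()
    for i in word_importance(tokens, label, victim_predict)[:max_subs]:
        for candidate in query_llm_synonyms(tokens[i], context=" ".join(tokens)):
            trial = tokens.copy()
            trial[i] = candidate
            probs = victim_predict(" ".join(trial))
            if max(probs, key=probs.get) != label:        # prediction flipped: success
                return " ".join(trial)
            if probs[label] < victim_predict(" ".join(tokens))[label]:
                tokens = trial                             # keep the best perturbation so far
    return " ".join(tokens)
```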

Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation Algorithms

  • paper_url: http://arxiv.org/abs/2311.11837
  • repo_url: None
  • paper_authors: Joren Brunekreef, Eric Marcus, Ray Sheombarsing, Jan-Jakob Sonke, Jonas Teuwen
  • for: 这个研究旨在提高像素分类器校准的准确性和可靠性。
  • methods: 这个方法使用归纳式保形预测(Inductive Conformal Prediction)来校准像素分类器的预测结果,并通过在“相似”像素区域上聚合非一致性分数,以提高校准的准确性和可靠性。
  • results: 这个研究发现,该方法可以显著提高像素分类器的覆盖率,并且在数据有限的情况下(如医疗领域),可以更有效地利用可用数据进行校准。
    Abstract Image segmentation algorithms can be understood as a collection of pixel classifiers, for which the outcomes of nearby pixels are correlated. Classifier models can be calibrated using Inductive Conformal Prediction, but this requires holding back a sufficiently large calibration dataset for computing the distribution of non-conformity scores of the model's predictions. If one only requires only marginal calibration on the image level, this calibration set consists of all individual pixels in the images available for calibration. However, if the goal is to attain proper calibration for each individual pixel classifier, the calibration set consists of individual images. In a scenario where data are scarce (such as the medical domain), it may not always be possible to set aside sufficiently many images for this pixel-level calibration. The method we propose, dubbed ``Kandinsky calibration'', makes use of the spatial structure present in the distribution of natural images to simultaneously calibrate the classifiers of ``similar'' pixels. This can be seen as an intermediate approach between marginal (imagewise) and conditional (pixelwise) calibration, where non-conformity scores are aggregated over similar image regions, thereby making more efficient use of the images available for calibration. We run experiments on segmentation algorithms trained and calibrated on subsets of the public MS-COCO and Medical Decathlon datasets, demonstrating that Kandinsky calibration method can significantly improve the coverage. When compared to both pixelwise and imagewise calibration on little data, the Kandinsky method achieves much lower coverage errors, indicating the data efficiency of the Kandinsky calibration.
    摘要 Image segmentation算法可以理解为一组像素分类器,其中邻近像素的结果相关。类ifier模型可以通过卷积学习进行准确化,但需要一个够大的准确化数据集来计算模型预测结果的分布。如果只需要图像水平的准确化,则准确化数据集包括所有可用于准确化的像素。但如果目标是为每个个像素分类器进行准确化,则准确化数据集包括个别图像。在数据稀缺的情况下(如医疗领域),可能无法将够多的图像用于这种像素级准确化。我们提议的“卡金斯基准确化”方法(Kandinsky calibration)利用自然图像分布中的空间结构,同时准确化“相似”像素的分类器。这可以看作是 между像素级和图像级准确化之间的中间方法,其中不符合分布的分数聚合在相似图像区域上,从而更有效地利用可用于准确化的图像。我们在使用MS-COCO和医疗十字数据集训练和准确化的segmentation算法上进行了实验,并证明了卡金斯基准确化方法可以显著提高覆盖率。与像素级和图像级准确化相比,卡金斯基准确化方法在有限的数据情况下表现出远远更低的覆盖错误,表明了卡金斯基准确化方法在数据效率方面的优势。
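
The sketch below illustrates the core idea of pooling non-conformity scores over groups of "similar" pixels before computing per-group thresholds. Clustering pixel coordinates with k-means is an illustrative assumption, not necessarily the grouping the paper uses.

```python
# A minimal sketch of region-wise inductive conformal calibration in the spirit of
# Kandinsky calibration: scores are aggregated per region instead of per image or per pixel.
import numpy as np
from sklearn.cluster import KMeans

def kandinsky_thresholds(cal_true_probs, n_regions=8, alpha=0.1):
    """cal_true_probs: (N, H, W) predicted probability of the ground-truth class per pixel."""
    N, H, W = cal_true_probs.shape
    yy, xx = np.mgrid[0:H, 0:W]
    coords = np.stack([yy.ravel(), xx.ravel()], axis=1)
    regions = KMeans(n_clusters=n_regions, n_init=10).fit_predict(coords).reshape(H, W)

    scores = 1.0 - cal_true_probs                     # non-conformity of the true class
    thresholds = np.zeros(n_regions)
    for r in range(n_regions):
        pooled = scores[:, regions == r].ravel()      # aggregate scores over the whole region
        thresholds[r] = np.quantile(pooled, 1.0 - alpha)
    return regions, thresholds

def prediction_sets(test_probs, regions, thresholds):
    """test_probs: (N, K, H, W); include class k wherever 1 - p_k is below the region threshold."""
    return (1.0 - test_probs) <= thresholds[regions][None, None, :, :]
```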

System 2 Attention (is something you might need too)

  • paper_url: http://arxiv.org/abs/2311.11829
  • repo_url: https://github.com/danderfer/Comp_Sci_Sem_2
  • paper_authors: Jason Weston, Sainbayar Sukhbaatar
  • for: 提高Transformer基于大语言模型(LLM)中的软注意力,以避免在 latent representation中包含不相关信息,从而提高下一个 Token 的生成质量。
  • methods: 我们提出了 System 2 Attention(S2A),它利用 LLM 的自然语言理解能力和指令执行能力,以决定需要注意的内容。 S2A 首先重新生成输入上下文,并将其限制为只包含相关信息,然后对重新生成的上下文进行注意。
  • results: 在三个任务中(包括问答、数学问题和长篇生成),S2A 比标准注意力基于 LLM 高效,提高了事实性和 объекivity,降低了奉承性。
    Abstract Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next token generations. To help rectify these issues, we introduce System 2 Attention (S2A), which leverages the ability of LLMs to reason in natural language and follow instructions in order to decide what to attend to. S2A regenerates the input context to only include the relevant portions, before attending to the regenerated context to elicit the final response. In experiments, S2A outperforms standard attention-based LLMs on three tasks containing opinion or irrelevant information, QA, math word problems and longform generation, where S2A increases factuality and objectivity, and decreases sycophancy.
    摘要 transformer-based large language models (LLMs) 的软注意力容易受到上下文中不相关信息的影响,这会对下一个 Token 生成产生负面影响。为了解决这些问题,我们介绍 System 2 Attention (S2A),它利用 LLMs 的自然语言理解能力和遵循 instrucions 来决定需要注意的内容。S2A 将输入上下文重新生成为只包含相关部分,然后对重新生成的上下文进行注意,以获得最终响应。在实验中,S2A 比标准注意力基于 LLMs 在三个任务中表现出色,包括问答、数学问题和长文生成,S2A 可以提高事实性和公正性,而减少卖舌。
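
A minimal sketch of the two-step flow described above: first ask the LLM to regenerate the context keeping only relevant, objective material, then answer from the regenerated context. The `llm` callable and the prompt wording are assumptions made for illustration.

```python
# System 2 Attention as two chained LLM calls: regenerate the context, then answer from it.
# `llm` stands for any chat-completion callable (prompt -> text).

def system2_attention(llm, context: str, question: str) -> str:
    regen_prompt = (
        "Rewrite the following text, keeping only the parts that are relevant and objective "
        "for answering the question. Remove opinions, flattery, and irrelevant details.\n\n"
        f"Text: {context}\n\nQuestion: {question}\n\nRelevant text:"
    )
    filtered_context = llm(regen_prompt)          # step 1: regenerate the input context

    answer_prompt = f"Context: {filtered_context}\n\nQuestion: {question}\n\nAnswer:"
    return llm(answer_prompt)                      # step 2: attend only to the regenerated context
```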

Graph Variational Embedding Collaborative Filtering

  • paper_url: http://arxiv.org/abs/2311.11824
  • repo_url: None
  • paper_authors: Narges Sadat Fazeli Dehkordi, Hadi Zare, Parham Moradi, Mahdi Jalili
  • for: 提高用户体验,如在电商、音乐等应用中。
  • methods: 使用图变量嵌入方法,即变分图自编码器,将用户-项目交互编码为更易训练的特征表示。
  • results: 对比基于Random embedding的方法,提出了图变量嵌入CF(GVECF)框架,实现了更好的特征传播和层次GCN协同推荐,最终在 recall 和 NDCG 指标上达到了13.78%的提升。
    Abstract The customization of recommended content to users holds significant importance in enhancing user experiences across a wide spectrum of applications such as e-commerce, music, and shopping. Graph-based methods have achieved considerable performance by capturing user-item interactions. However, these methods tend to utilize randomly constructed embeddings in the dataset used for training the recommender, which lacks any user preferences. Here, we propose the concept of variational embeddings as a means of pre-training the recommender system to improve the feature propagation through the layers of graph convolutional networks (GCNs). The graph variational embedding collaborative filtering (GVECF) is introduced as a novel framework to incorporate representations learned through a variational graph auto-encoder which are embedded into a GCN-based collaborative filtering. This approach effectively transforms latent high-order user-item interactions into more trainable vectors, ultimately resulting in better performance in terms of recall and normalized discounted cumulative gain(NDCG) metrics. The experiments conducted on benchmark datasets demonstrate that our proposed method achieves up to 13.78% improvement in the recall over the test data.
    摘要 Customization of recommended content to users is crucial in enhancing user experiences in various applications, such as e-commerce, music, and shopping. Graph-based methods have shown significant performance by capturing user-item interactions. However, these methods often rely on randomly constructed embeddings in the training dataset, which neglects user preferences. To address this issue, we propose the concept of variational embeddings as a means of pre-training the recommender system to improve feature propagation through graph convolutional networks (GCNs). We introduce the graph variational embedding collaborative filtering (GVECF) as a novel framework that incorporates representations learned through a variational graph auto-encoder into a GCN-based collaborative filtering. This approach effectively transforms latent high-order user-item interactions into more trainable vectors, resulting in better performance in terms of recall and normalized discounted cumulative gain (NDCG) metrics. Experimental results on benchmark datasets demonstrate that our proposed method achieves up to 13.78% improvement in recall over the test data.

Generalized super-resolution 4D Flow MRI – using ensemble learning to extend across the cardiovascular system

  • paper_url: http://arxiv.org/abs/2311.11819
  • repo_url: None
  • paper_authors: Leon Ericsson, Adam Hjalmarsson, Muhammad Usman Akbar, Edward Ferdian, Mia Bonini, Brandon Hardy, Jonas Schollenberger, Maria Aristova, Patrick Winter, Nicholas Burris, Alexander Fyrdahl, Andreas Sigfridsson, Susanne Schnell, C. Alberto Figueroa, David Nordsletten, Alistair A. Young, David Marlevi
  • for: 这项研究的目的是探讨4D流体磁共振成像(4D Flow MRI)的超解像能力是否可以在不同的cardiovascular领域中广泛应用。
  • methods: 该研究使用了不同的 convolutional base 和ensemble learning来评估SR 4D Flow MRI的一致性,并使用了synthetic数据生成在三个不同的领域(心血管、大动脉和脑血管)进行评估。
  • results: 研究结果表明, ensemble learning可以在不同的领域中提高SR性能,并可以准确地预测高分辨率的速度从低分辨率的输入数据中。同时,优化的网络也可以从下采样的实际数据中恢复原始分辨率的速度,以及生成严格的SR-图像。
    Abstract 4D Flow Magnetic Resonance Imaging (4D Flow MRI) is a non-invasive measurement technique capable of quantifying blood flow across the cardiovascular system. While practical use is limited by spatial resolution and image noise, incorporation of trained super-resolution (SR) networks has potential to enhance image quality post-scan. However, these efforts have predominantly been restricted to narrowly defined cardiovascular domains, with limited exploration of how SR performance extends across the cardiovascular system; a task aggravated by contrasting hemodynamic conditions apparent across the cardiovasculature. The aim of our study was to explore the generalizability of SR 4D Flow MRI using a combination of heterogeneous training sets and dedicated ensemble learning. With synthetic training data generated across three disparate domains (cardiac, aortic, cerebrovascular), varying convolutional base and ensemble learners were evaluated as a function of domain and architecture, quantifying performance on both in-silico and acquired in-vivo data from the same three domains. Results show that both bagging and stacking ensembling enhance SR performance across domains, accurately predicting high-resolution velocities from low-resolution input data in-silico. Likewise, optimized networks successfully recover native resolution velocities from downsampled in-vivo data, as well as show qualitative potential in generating denoised SR-images from clinical level input data. In conclusion, our work presents a viable approach for generalized SR 4D Flow MRI, with ensemble learning extending utility across various clinical areas of interest.
    摘要 四维流体磁共振成像(4D Flow MRI)是一种非侵入性测量技术,可量化心血管系统中血液流动的量。 although practical use is limited by spatial resolution and image noise, incorporating trained super-resolution (SR) networks has the potential to enhance image quality post-scan. However, these efforts have been mainly focused on narrowly defined cardiovascular domains, with limited exploration of how SR performance extends across the cardiovascular system; a task that is exacerbated by the contrasting hemodynamic conditions present across the cardiovasculature.我们的研究的目标是探索SR 4D Flow MRI的通用性,使用不同域的训练集和专门的集成学习来评估其性能。我们使用了三个不同的域(心脏、大动脉和脑血管)中的 sintetic 训练数据,以及不同的卷积基和集成学习算法,对域和结构进行评估,以确定它们在不同域中的性能。结果表明,bagging和stacking ensemble ensemble 能够在不同域上提高SR性能,从低分辨率输入数据中预测高分辨率速度,并且在实际水平上成功地恢复原始分辨率速度。此外,优化的网络还可以生成净化后的SR图像从临床水平的输入数据中。总之,我们的研究提出了一种可行的通用SR 4D Flow MRI方法,使用集成学习来扩展其应用范围。这种方法可以在不同的临床领域中实现高质量的SR成像,并且可以帮助解决血液流动的诊断和评估问题。

Improving Real Estate Appraisal with POI Integration and Areal Embedding

  • paper_url: http://arxiv.org/abs/2311.11812
  • repo_url: None
  • paper_authors: Sumin Han, Youngjun Park, Sonia Sabir, Jisun An, Dongman Lee
  • for: 本研究主要针对两个重要挑战,第一是探讨 Points of Interest (POI) 对房屋价值的影响,并提出了一个涵盖性强的数据驱动方法来选择特征。第二是将路网基于的 Areal Embedding 应用于房地产评估,以提高空间理解。
  • methods: 本研究提出了一个修改后的 POI 特征提取方法,并讨论了每个 POI 对房屋价值评估的影响。然后,我们提出了一个基于掩盖多头注意力的空间插值房价预测模型(AMMASI),它在扩展现有的 ASI 模型的基础上,运用掩盖多头注意力来捕捉地理邻居房屋和相似特征房屋的特征。
  • results: 我们的模型在现有基eline上出现了明显的提升,并且还提供了未来优化房地产评估方法的可能性。
    Abstract Despite advancements in real estate appraisal methods, this study primarily focuses on two pivotal challenges. Firstly, we explore the often-underestimated impact of Points of Interest (POI) on property values, emphasizing the necessity for a comprehensive, data-driven approach to feature selection. Secondly, we integrate road-network-based Areal Embedding to enhance spatial understanding for real estate appraisal. We first propose a revised method for POI feature extraction, and discuss the impact of each POI for house price appraisal. Then we present the Areal embedding-enabled Masked Multihead Attention-based Spatial Interpolation for House Price Prediction (AMMASI) model, an improvement upon the existing ASI model, which leverages masked multi-head attention on geographic neighbor houses and similar-featured houses. Our model outperforms current baselines and also offers promising avenues for future optimization in real estate appraisal methodologies.
    摘要 尽管现有的房地产评估方法有所进步,这种研究主要关注两个重要挑战。首先,我们研究点位 интерес(POI)对房产价值的影响,强调需要一种全面、数据驱动的特征选择方法。其次,我们将路网基于的区域嵌入技术与房产评估相结合,以提高地理理解。我们首先提出了一种POI特征提取方法,然后讨论每个POI对房价评估的影响。接着,我们介绍了使用做废字符串嵌入的掩码多头注意力加速器(AMMASI)模型,这是现有ASI模型的改进,可以更好地利用地理相似特征和邻居房屋的特征。我们的模型在现有基准点上表现出色,并且还提供了未来房地产评估方法的优秀可能性。

Large Language Models and Explainable Law: a Hybrid Methodology

  • paper_url: http://arxiv.org/abs/2311.11811
  • repo_url: None
  • paper_authors: Marco Billi, Alessandro Parenti, Giuseppe Pisano, Marco Sanchi
  • for: 提高法律系统的可用性、使用性和解释性,协助建构一种民主和利益相关的法律技术视角。
  • methods: 开发了一种方法,用于将高级编程语言中的解释转化为自然语言,以便各种用户快速、明了、访问法律技术。
  • results: 研究人员通过这些解释,赋予非专业人员执行复杂的法律任务的能力,通过自动化法律比较,对同一个事实进行多种规则基于的推理。
    Abstract The paper advocates for LLMs to enhance the accessibility, usage and explainability of rule-based legal systems, contributing to a democratic and stakeholder-oriented view of legal technology. A methodology is developed to explore the potential use of LLMs for translating the explanations produced by rule-based systems, from high-level programming languages to natural language, allowing all users a fast, clear, and accessible interaction with such technologies. The study continues by building upon these explanations to empower laypeople with the ability to execute complex juridical tasks on their own, using a Chain of Prompts for the autonomous legal comparison of different rule-based inferences, applied to the same factual case.
    摘要 文章强调LLM可以提高法律系统的可用性、使用性和解释性,从而推动法律技术具有民主和利益相关的视角。文章提出了一种方法来探讨LLM可以将高级编程语言生成的解释翻译成自然语言,以便所有用户快速、清晰地与这些技术进行交互。研究继续发展了这些解释,以使非专业人士通过自动化的法律比较,执行复杂的法律任务。

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding

  • paper_url: http://arxiv.org/abs/2311.11810
  • repo_url: None
  • paper_authors: Hao Feng, Qi Liu, Hao Liu, Wengang Zhou, Houqiang Li, Can Huang
  • for: DocPedia is a novel large multimodal model (LMM) for versatile OCR-free document understanding.
  • methods: DocPedia directly processes visual input in the frequency domain rather than the pixel space, using a limited number of visual tokens to capture a greater amount of visual and textual information.
  • results: Extensive experiments conducted on various benchmarks confirm the effectiveness and superior performance of DocPedia over other methods.
    Abstract This work presents DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2,560$\times$2,560 resolution. Unlike existing work either struggle with high-resolution documents or give up the large language model thus vision or language ability constrained, our DocPedia directly processes visual input in the frequency domain rather than the pixel space. The unique characteristic enables DocPedia to capture a greater amount of visual and textual information using a limited number of visual tokens. To consistently enhance both perception and comprehension abilities of our model, we develop a dual-stage training strategy and enrich instructions/annotations of all training tasks covering multiple document types. Extensive quantitative and qualitative experiments conducted on various publicly available benchmarks confirm the mutual benefits of jointly learning perception and comprehension tasks. The results provide further evidence of the effectiveness and superior performance of our DocPedia over other methods.
    摘要 这个工作介绍了 DocPedia,一种新型的大型多模态模型(LMM),能够无需OCR进行多种文档理解,并且可以处理高分辨率图像(最大2560×2560像素)。与现有的工作不同, DocPedia直接在频谱空间处理视觉输入,而不是像素空间,这使得它能够更好地捕捉更多的视觉和文本信息,只需使用有限的视觉符号。为了不断提高模型的感知和理解能力,我们提出了双Stage训练策略,并对所有训练任务进行了丰富的指令/注释增强。经验证明,这种方法可以同时提高模型的感知和理解能力。广泛的量化和质量测试表明,我们的 DocPedia在其他方法的基础上表现更出色。

Age-Friendly Route Planner: Calculating Comfortable Routes for Senior Citizens

  • paper_url: http://arxiv.org/abs/2311.11802
  • repo_url: None
  • paper_authors: Andoni Aranguren, Eneko Osaba, Silvia Urra-Uriarte, Patricia Molina-Costa
  • for: 本研究旨在提高老年人在城市中的体验,通过开发一款适应老年人的路径规划器。
  • methods: 该规划器使用了多个变量来评估路径的年龄友好程度,包括路径上的设施数量、舒适元素的存在和避免恶劣路段。
  • results: 本研究示出了适应老年人的路径规划器可以提供个性化的路径,并且可以帮助创建适应老年人的路径。
    Abstract The application of routing algorithms to real-world situations is a widely studied research topic. Despite this, routing algorithms and applications are usually developed for a general purpose, meaning that certain groups, such as ageing people, are often marginalized due to the broad approach of the designed algorithms. This situation may pose a problem in cities which are suffering a slow but progressive ageing of their populations. With this motivation in mind, this paper focuses on describing our implemented Age-Friendly Route Planner, whose goal is to improve the experience in the city for senior citizens. In order to measure the age-friendliness of a route, several variables have been deemed, such as the number of amenities along the route, the amount of comfortable elements found, or the avoidance of sloppy sections. In this paper, we describe one of the main features of the Age-Friendly Route Planner: the preference-based routes, and we also demonstrate how it can contribute to the creation of adapted friendly routes.
    摘要 Routing算法在实际应用中是广泛研究的研究主题。然而,routing算法和应用通常是为普遍目标设计的,导致certain groups,如年龄增长的人群,因为设计的算法过广而被排除在外。这种情况可能在年龄增长的城市中带来问题。基于这种动机,这篇论文主要关注描述我们实施的年龄友好路径规划器,以提高城市内的老年人体验。为了测量路径年龄友好程度,我们考虑了许多变量,如路径上的设施数量、舒适元素的存在量或恶势力的避免。在这篇论文中,我们介绍了 preference-based路径的一个主要特点,并示例如何该功能可以为创造适应友好路径做出贡献。
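
The preference-based routing idea above can be sketched as an edge-cost function that mixes distance with an age-friendliness penalty for slopes and a bonus for comfort elements, fed to a standard shortest-path search. Attribute names and weights below are illustrative assumptions, not the planner's actual parameters.

```python
# A minimal sketch of age-friendly, preference-based routing over a street graph.
import networkx as nx

def age_friendly_cost(length_m, slope_pct, n_benches, prefers_flat=True):
    cost = length_m
    if prefers_flat and slope_pct > 4.0:               # penalize sloped sections
        cost *= 1.0 + 0.2 * (slope_pct - 4.0)
    cost *= 1.0 / (1.0 + 0.1 * min(n_benches, 3))      # mild bonus for resting spots nearby
    return cost

def plan_route(graph: nx.Graph, origin, destination, prefers_flat=True):
    for u, v, data in graph.edges(data=True):
        data["age_cost"] = age_friendly_cost(
            data.get("length", 1.0), data.get("slope", 0.0),
            data.get("benches", 0), prefers_flat,
        )
    return nx.shortest_path(graph, origin, destination, weight="age_cost")
```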

Igniting Language Intelligence: The Hitchhiker’s Guide From Chain-of-Thought Reasoning to Language Agents

  • paper_url: http://arxiv.org/abs/2311.11797
  • repo_url: https://github.com/zoeyyao27/cot-igniting-agent
  • paper_authors: Zhuosheng Zhang, Yao Yao, Aston Zhang, Xiangru Tang, Xinbei Ma, Zhiwei He, Yiming Wang, Mark Gerstein, Rui Wang, Gongshen Liu, Hai Zhao
  • for: The paper discusses the chain-of-thought (CoT) reasoning techniques used in large language models (LLMs) and their applications in developing autonomous language agents.
  • methods: The paper explores the foundational mechanics of CoT techniques, including their efficacy and the paradigm shift in CoT reasoning.
  • results: The paper discusses the merits of CoT reasoning, including its ability to enhance interpretability, controllability, and flexibility, and its potential for generalization, efficiency, customization, scaling, and safety.
    Abstract Large language models (LLMs) have dramatically enhanced the field of language intelligence, as demonstrably evidenced by their formidable empirical performance across a spectrum of complex reasoning tasks. Additionally, theoretical proofs have illuminated their emergent reasoning capabilities, providing a compelling showcase of their advanced cognitive abilities in linguistic contexts. Critical to their remarkable efficacy in handling complex reasoning tasks, LLMs leverage the intriguing chain-of-thought (CoT) reasoning techniques, obliging them to formulate intermediate steps en route to deriving an answer. The CoT reasoning approach has not only exhibited proficiency in amplifying reasoning performance but also in enhancing interpretability, controllability, and flexibility. In light of these merits, recent research endeavors have extended CoT reasoning methodologies to nurture the development of autonomous language agents, which adeptly adhere to language instructions and execute actions within varied environments. This survey paper orchestrates a thorough discourse, penetrating vital research dimensions, encompassing: (i) the foundational mechanics of CoT techniques, with a focus on elucidating the circumstances and justification behind its efficacy; (ii) the paradigm shift in CoT; and (iii) the burgeoning of language agents fortified by CoT approaches. Prospective research avenues envelop explorations into generalization, efficiency, customization, scaling, and safety. This paper caters to a wide audience, including beginners seeking comprehensive knowledge of CoT reasoning and language agents, as well as experienced researchers interested in foundational mechanics and engaging in cutting-edge discussions on these topics. A repository for the related papers is available at https://github.com/Zoeyyao27/CoT-Igniting-Agent.
    摘要 大型语言模型(LLM)已经对语言智能领域产生了巨大的影响,这可以通过它们在复杂的推理任务上表现出色来证明。此外,理论证明也揭示了它们在语言上的高级认知能力。LLMs 利用了Chain of Thought(CoT)推理技术,这使得它们在得出答案之前需要形ulate 中间步骤。CoT 推理方法不仅提高了推理性能,还提高了可读性、可控性和灵活性。由于这些优点, latest research efforts 将 CoT 推理方法应用于语言代理人的开发,以便这些代理人能够遵循语言指令并在不同环境中执行操作。本文协调了一个全面的讨论,涵盖:(i)CoT 技术的基础机理,强调解释 CoT 的效果的原因和情况;(ii)CoT 推理方法的变革;以及(iii)通过 CoT 方法强化语言代理人的发展。未来研究的可能性包括探索 generalization、效率、自定义、扩展和安全。这篇文章适合各种读者,包括想要了解 CoT 推理和语言代理人的新手和经验研究者。相关论文库可以在 https://github.com/Zoeyyao27/CoT-Igniting-Agent 找到。

Beyond Boundaries: A Comprehensive Survey of Transferable Attacks on AI Systems

  • paper_url: http://arxiv.org/abs/2311.11796
  • repo_url: None
  • paper_authors: Guangjing Wang, Ce Zhou, Yuanda Wang, Bocheng Chen, Hanqing Guo, Qiben Yan
  • for: 本研究旨在探讨智能系统中的可迁移攻击,尤其是在Cyber-Physical Security领域。
  • methods: 本文综述了不同领域中的可迁移攻击,包括图像、文本、图、音频和视频等领域。同时,本文还分析了不同的攻击方法,包括数据、过程、模型和系统等方面。
  • results: 本文发现许多可迁移攻击可以应用于不同的领域,并且这些攻击可能对智能系统的安全性产生很大的影响。此外,本文还提出了一些可能的研究方向,以便更好地探讨可迁移攻击领域。
    Abstract Artificial Intelligence (AI) systems such as autonomous vehicles, facial recognition, and speech recognition systems are increasingly integrated into our daily lives. However, despite their utility, these AI systems are vulnerable to a wide range of attacks such as adversarial, backdoor, data poisoning, membership inference, model inversion, and model stealing attacks. In particular, numerous attacks are designed to target a particular model or system, yet their effects can spread to additional targets, referred to as transferable attacks. Although considerable efforts have been directed toward developing transferable attacks, a holistic understanding of the advancements in transferable attacks remains elusive. In this paper, we comprehensively explore learning-based attacks from the perspective of transferability, particularly within the context of cyber-physical security. We delve into different domains -- the image, text, graph, audio, and video domains -- to highlight the ubiquitous and pervasive nature of transferable attacks. This paper categorizes and reviews the architecture of existing attacks from various viewpoints: data, process, model, and system. We further examine the implications of transferable attacks in practical scenarios such as autonomous driving, speech recognition, and large language models (LLMs). Additionally, we outline the potential research directions to encourage efforts in exploring the landscape of transferable attacks. This survey offers a holistic understanding of the prevailing transferable attacks and their impacts across different domains.
    摘要 人工智能(AI)系统如自动驾驶、人脸识别和语音识别系统在我们日常生活中越来越普遍。然而,尽管它们的实用性,这些AI系统却易受到各种攻击,如对抗攻击、后门攻击、数据毒品攻击、成员推理攻击、模型反向攻击和模型窃取攻击。特别是,许多攻击是专门设计为targeting a particular model or system,但它们的效果可以扩散到其他目标,称为可传递性攻击。虽然有大量的努力在开发可传递性攻击方面,但总的来说,对这些攻击的全面理解仍然不够。在这篇论文中,我们全面探讨基于学习的攻击,特别是在Cyber-Physical Security(Cyber-Physical Security)领域中的可传递性攻击。我们在不同的领域(图像、文本、图ogram、音频、视频)中探讨了可传递性攻击的普遍和普遍性。这篇论文将攻击的架构从不同的视角进行分类和评论,包括数据、过程、模型和系统的视角。我们还考虑了可传递性攻击在实际场景中的影响,如自动驾驶、语音识别和大型自然语言模型(LLMs)。此外,我们还提出了可能的研究方向,以便在探索可传递性攻击的领域进行更多的努力。这篇论文提供了可传递性攻击的全面理解,并对不同领域中的可传递性攻击产生了影响。

PhytNet – Tailored Convolutional Neural Networks for Custom Botanical Data

  • paper_url: http://arxiv.org/abs/2311.12088
  • repo_url: None
  • paper_authors: Jamie R. Sykes, Katherine Denby, Daniel W. Franks
  • for: This paper is written for the purpose of developing a new deep learning model for automated disease, weed, and crop classification in agriculture, specifically using computer vision techniques.
  • methods: The paper uses a novel dataset of infrared cocoa tree images and develops a new convolutional neural network (CNN) architecture called PhytNet. The authors also compare the performance of PhytNet with existing CNN architectures like ResNet and EfficientNet.
  • results: The paper demonstrates the development and performance of PhytNet on a specific dataset of cocoa tree images. The results show that PhytNet displays excellent attention to relevant features, no overfitting, and an exceptionally low computation cost, making it a promising candidate for rapid disease or plant classification, or precise localisation of disease symptoms for autonomous systems.
    Abstract Automated disease, weed and crop classification with computer vision will be invaluable in the future of agriculture. However, existing model architectures like ResNet, EfficientNet and ConvNeXt often underperform on smaller, specialised datasets typical of such projects. We address this gap with informed data collection and the development of a new CNN architecture, PhytNet. Utilising a novel dataset of infrared cocoa tree images, we demonstrate PhytNet's development and compare its performance with existing architectures. Data collection was informed by analysis of spectroscopy data, which provided useful insights into the spectral characteristics of cocoa trees. Such information could inform future data collection and model development. Cocoa was chosen as a focal species due to the diverse pathology of its diseases, which pose significant challenges for detection. ResNet18 showed some signs of overfitting, while EfficientNet variants showed distinct signs of overfitting. By contrast, PhytNet displayed excellent attention to relevant features, no overfitting, and an exceptionally low computation cost (1.19 GFLOPS). As such PhytNet is a promising candidate for rapid disease or plant classification, or precise localisation of disease symptoms for autonomous systems.
    摘要 自动化疾病、植物和作物分类使用计算机视觉将在农业未来非常重要。然而,现有的模型架构如ResNet、EfficientNet和ConvNeXt经常在小型特殊数据集上表现不佳。我们通过了 informed data collection和开发新的CNN架构,即PhytNet,解决这个问题。我们使用了一个新的红外巧克力树图像集来证明PhytNet的发展和与现有架构进行比较。数据收集是根据谱spectroscopy数据进行分析,提供了有用的信息,例如巧克力树的spectral特征。这种信息可能会在未来的数据收集和模型开发中提供帮助。巧克力是我们选择的关键种类,因为它的疾病多样化和诊断具有挑战性。ResNet18显示了一定程度的过拟合,而EfficientNet变种显示了明显的过拟合。相比之下,PhytNet具有优秀的注意力特征,没有过拟合,计算成本非常低(1.19 GFLOPS)。因此,PhytNet是一个有前途的 кандидат,用于快速的疾病或植物分类,或精确的疾病 симптом的自动化识别。

Responsible AI Research Needs Impact Statements Too

  • paper_url: http://arxiv.org/abs/2311.11776
  • repo_url: None
  • paper_authors: Alexandra Olteanu, Michael Ekstrand, Carlos Castillo, Jina Suh
  • for: The paper is written to explore the potential unintended and adverse consequences of responsible artificial intelligence (RAI), ethical AI, and ethics in AI.
  • methods: The paper uses a qualitative research approach, including a literature review and expert interviews, to identify potential risks and challenges associated with RAI, ethical AI, and ethics in AI.
  • results: The paper highlights several potential unintended and adverse consequences of RAI, ethical AI, and ethics in AI, including the risk of reinforcing existing biases and power imbalances, the potential for unintended consequences of AI systems, and the need for careful consideration of ethical issues in AI development and deployment.
    Abstract All types of research, development, and policy work can have unintended, adverse consequences - work in responsible artificial intelligence (RAI), ethical AI, or ethics in AI is no exception.
    摘要 所有类型的研究、开发和政策工作都可能有意外、不良影响 - 负责任人工智能(RAI)、伦理AI或AI伦理都不例外。

Intelligent methods for business rule processing: State-of-the-art

  • paper_url: http://arxiv.org/abs/2311.11775
  • repo_url: None
  • paper_authors: Cristiano André da Costa, Uélison Jean Lopes dos Santos, Eduardo Souza dos Reis, Rodolfo Stoffel Antunes, Henrique Chaves Pacheco, Thaynã da Silva França, Rodrigo da Rosa Righi, Jorge Luis Victória Barbosa, Franklin Jebadoss, Jorge Montalvao, Rogerio Kunkel
  • for: 这篇论文主要用于介绍最新的智能技术在业务规则处理方面的应用。
  • methods: 论文采用了涵义检索和机器学习等智能方法进行研究。
  • results: 论文对市场前十家供应商和其主要解决方案进行了审查和分析。
    Abstract In this article, we provide an overview of the latest intelligent techniques used for processing business rules. We have conducted a comprehensive survey of the relevant literature on robot process automation, with a specific focus on machine learning and other intelligent approaches. Additionally, we have examined the top vendors in the market and their leading solutions to tackle this issue.
    摘要 在本文中,我们提供了对最新的智能技术处理商业规则的概述。我们进行了对相关文献的全面调查,具体强调机器学习和其他智能方法。此外,我们还评估了市场上领先的供应商和他们的主要解决方案。

Unveiling the Unseen Potential of Graph Learning through MLPs: Effective Graph Learners Using Propagation-Embracing MLPs

  • paper_url: http://arxiv.org/abs/2311.11759
  • repo_url: None
  • paper_authors: Yong-Min Shin, Won-Yong Shin
  • for: 本研究旨在使用多层感知器(MLP)解决 semi-supervised 节点分类问题,通过在教师图神经网络(GNN)的知识填充(KD)下训练学生 MLP。先前的研究主要集中在在 KD 过程中匹配教师和学生模型的输出概率分布上,尚未系统地研究如何在 KD 过程中显式地注入结构信息。
  • methods: 我们提出了 Propagate & Distill(P&D)方法,它在教师 GNN 的输出上进行了传播,然后进行 KD。P&D 可以被解释为一种简化的 inverse propagation 的过程,它可以让学生 MLP 显式地学习特征变换 T 和卷积 $\Pi$。
  • results: 通过对实际世界 benchmark 数据集进行了广泛的评估,我们证明了 P&D 的效果,并表明了学生 MLP 的性能得到了进一步提高。
    Abstract Recent studies attempted to utilize multilayer perceptrons (MLPs) to solve semi-supervised node classification on graphs, by training a student MLP by knowledge distillation (KD) from a teacher graph neural network (GNN). While previous studies have focused mostly on training the student MLP by matching the output probability distributions between the teacher and student models during KD, it has not been systematically studied how to inject the structural information in an explicit and interpretable manner. Inspired by GNNs that separate feature transformation $T$ and propagation $\Pi$, we re-frame the KD process as enabling the student MLP to explicitly learn both $T$ and $\Pi$. Although this can be achieved by applying the inverse propagation $\Pi^{-1}$ before distillation from the teacher GNN, it still comes with a high computational cost from large matrix multiplications during training. To solve this problem, we propose Propagate & Distill (P&D), which propagates the output of the teacher GNN before KD and can be interpreted as an approximate process of the inverse propagation $\Pi^{-1}$. Through comprehensive evaluations using real-world benchmark datasets, we demonstrate the effectiveness of P&D by showing further performance boost of the student MLP.
    摘要 Inspired by GNNs that separate feature transformation $T$ and propagation $\Pi$, we re-frame the KD process as enabling the student MLP to explicitly learn both $T$ and $\Pi$. This can be achieved by applying the inverse propagation $\Pi^{-1}$ before distillation from the teacher GNN, but this comes with a high computational cost from large matrix multiplications during training.To solve this problem, we propose Propagate & Distill (P&D), which propagates the output of the teacher GNN before KD and can be interpreted as an approximate process of the inverse propagation $\Pi^{-1}$. Through comprehensive evaluations using real-world benchmark datasets, we demonstrate the effectiveness of P&D by showing a further performance boost of the student MLP.
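
A minimal sketch of the Propagate & Distill idea described above: the teacher GNN's soft predictions are smoothed over the graph before being distilled into a student MLP. The propagation depth, mixing coefficient, and loss weighting below are illustrative choices, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def propagate(teacher_logits, adj_norm, steps=2, alpha=0.5):
    """adj_norm: normalized sparse adjacency matrix; mixes each node with its neighbours."""
    y = teacher_logits.softmax(dim=-1)
    for _ in range(steps):
        y = alpha * torch.sparse.mm(adj_norm, y) + (1 - alpha) * y
    return y

def distill_step(student_mlp, x, labels, train_mask, teacher_soft, optimizer, lam=1.0):
    optimizer.zero_grad()
    out = student_mlp(x)                              # the student sees node features only
    ce = F.cross_entropy(out[train_mask], labels[train_mask])
    kd = F.kl_div(out.log_softmax(dim=-1), teacher_soft, reduction="batchmean")
    loss = ce + lam * kd
    loss.backward()
    optimizer.step()
    return loss.item()
```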

LSTM-CNN: An efficient diagnostic network for Parkinson’s disease utilizing dynamic handwriting analysis

  • paper_url: http://arxiv.org/abs/2311.11756
  • repo_url: None
  • paper_authors: Xuechao Wang, Junqing Huang, Sven Nomm, Marianna Chatzakou, Kadri Medijainen, Aaro Toomela, Michael Ruzhansky
  • for: 这个研究的目的是提出一种基于深度学习的动态手写分析方法,以提供早期诊断 Parkinson 病的客观标准。
  • methods: 该方法采用了一种混合深度学习方法,结合了LSTM和CNN两种不同的神经网络模型,以提高分类精度和计算效率。
  • results: 实验结果表明,提出的方法在新的 DraWritePD 数据集上达到了96.2%的高精度分类率,并在 PaHaW 数据集上达到了90.7%的高精度分类率。此外,该方法还具有轻量级的参数和计算量,可以实现接近实时的CPU推理。
    Abstract Background and objectives: Dynamic handwriting analysis, due to its non-invasive and readily accessible nature, has recently emerged as a vital adjunctive method for the early diagnosis of Parkinson's disease. In this study, we design a compact and efficient network architecture to analyse the distinctive handwriting patterns of patients' dynamic handwriting signals, thereby providing an objective identification for the Parkinson's disease diagnosis. Methods: The proposed network is based on a hybrid deep learning approach that fully leverages the advantages of both long short-term memory (LSTM) and convolutional neural networks (CNNs). Specifically, the LSTM block is adopted to extract the time-varying features, while the CNN-based block is implemented using one-dimensional convolution for low computational cost. Moreover, the hybrid model architecture is continuously refined under ablation studies for superior performance. Finally, we evaluate the proposed method with its generalization under a five-fold cross-validation, which validates its efficiency and robustness. Results: The proposed network demonstrates its versatility by achieving impressive classification accuracies on both our new DraWritePD dataset ($96.2\%$) and the well-established PaHaW dataset ($90.7\%$). Moreover, the network architecture also stands out for its excellent lightweight design, occupying a mere $0.084$M of parameters, with a total of only $0.59$M floating-point operations. It also exhibits near real-time CPU inference performance, with inference times ranging from $0.106$ to $0.220$s. Conclusions: We present a series of experiments with extensive analysis, which systematically demonstrate the effectiveness and efficiency of the proposed hybrid neural network in extracting distinctive handwriting patterns for precise diagnosis of Parkinson's disease.
    摘要 背景和目标:动态手写分析因为其不侵入性和Ready accessible的特点,最近在诊断parkinson病的早期诊断中得到了广泛应用。在本研究中,我们设计了一个紧凑型和高效的网络架构,以分析患者的动态手写信号特征,从而提供一种对parkinson病诊断的 объектив标准。方法:提议的网络采用了一种混合深度学习方法,旨在挖掘患者的动态手写特征。具体来说,LSTM块被采用来提取时变特征,而CNN基于的块则是通过一维 convolution来实现低计算成本。此外,我们还进行了一系列的ablation study,以进一步提高表现。最后,我们使用五fold Cross-Validation进行评估,以验证提议的方法的可行性和稳定性。结果:提议的网络在我们的新的DraWritePD数据集上达到了96.2%的分类精度,同时在已知的PaHaW数据集上也达到了90.7%的分类精度。此外,该网络架构还具有优秀的轻量级设计,占用约0.084个参数,总计约0.59亿浮点运算。它还表现出了几乎实时的CPU执行时间,执行时间在0.106-0.220秒之间。结论:我们通过了一系列的实验和分析,系统地证明了提议的混合神经网络在诊断parkinson病的精度和效率。
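
A minimal PyTorch sketch of the hybrid LSTM + 1-D CNN design described above, applied to multichannel handwriting time series (e.g. pen coordinates and pressure over time). Layer sizes are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LSTMCNN(nn.Module):
    def __init__(self, in_channels=6, hidden=32, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(in_channels, hidden, batch_first=True)    # time-varying features
        self.cnn = nn.Sequential(                                      # cheap local features (1-D conv)
            nn.Conv1d(in_channels, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):               # x: (batch, time, channels)
        _, (h, _) = self.lstm(x)        # last hidden state summarizes the sequence
        c = self.cnn(x.transpose(1, 2)).squeeze(-1)
        return self.head(torch.cat([h[-1], c], dim=-1))
```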

A Large-Scale Car Parts (LSCP) Dataset for Lightweight Fine-Grained Detection

  • paper_url: http://arxiv.org/abs/2311.11754
  • repo_url: None
  • paper_authors: Wang Jie, Zhong Yilin, Cao Qianqian
  • for: 本研究旨在提供一个大规模、细化的汽车AI数据集,用于探索汽车部件检测任务中的可能性。
  • methods: 本研究使用了自然摄像头和在线网站收集的84,162张图像,并提出了一种新的半监督自动标注方法,以及一种基于预训练检测器的针对性提升技术。
  • results: 研究人员通过使用许多轻量级YOLO系列检测器进行精细的汽车部件检测,并证明了数据集的有效性。
    Abstract Automotive related datasets have previously been used for training autonomous driving systems or vehicle classification tasks. However, there is a lack of datasets in the field of automotive AI for car parts detection, and most available datasets are limited in size and scope, struggling to cover diverse scenarios. To address this gap, this paper presents a large-scale and fine-grained automotive dataset consisting of 84,162 images for detecting 12 different types of car parts. This dataset was collected from natural cameras and online websites which covers various car brands, scenarios, and shooting angles. To alleviate the burden of manual annotation, we propose a novel semi-supervised auto-labeling method that leverages state-of-the-art pre-trained detectors. Moreover, we study the limitations of the Grounding DINO approach for zero-shot labeling. Finally, we evaluate the effectiveness of our proposed dataset through fine-grained car parts detection by training several lightweight YOLO-series detectors.
    摘要 自动驾驶系统或车型分类任务上使用了汽车相关数据集。然而,车部AI领域内有数据集缺失,现有数据集大多受限,缺乏多样化场景的覆盖。为了填补这一空白,本文提出了一个大规模、细致的汽车数据集,包含12种不同的车部类型的84,162张图像。这个数据集来自于自然摄像头和在线网站,覆盖了多种车型、场景和拍摄角度。为了避免手动标注的劳动ious burden,我们提出了一种新的半supervised自动标注方法,利用了当前领先的预训练检测器。此外,我们研究了零shot标签Grounding DINO的局限性。最后,我们通过使用多种轻量级YOLO系列检测器进行细致的车部检测来评估我们提出的数据集的有效性。

Sparse4D v3: Advancing End-to-End 3D Detection and Tracking

  • paper_url: http://arxiv.org/abs/2311.11722
  • repo_url: https://github.com/linxuewu/sparse4d
  • paper_authors: Xuewu Lin, Zixiang Pei, Tianwei Lin, Lichao Huang, Zhizhong Su
  • for: 这篇论文主要针对 autonomous driving 视觉系统中的 3D 检测和跟踪两个基本任务进行深入研究,基于 Sparse4D 框架。
  • methods: 本文引入了两个辅助训练任务(时间实例净化和质量评估),并提出了分离注意力的方法,从而对检测性能进行了重要改进。
  • results: 经验表明,在 nuScenes 测试集上,我们的提出改进可以获得显著提高,包括 mAP 、NDS 和 AMOTA 的提高率。最佳模型在 nuScenes 测试集上达到了 71.9% NDS 和 67.7% AMOTA。
    Abstract In autonomous driving perception systems, 3D detection and tracking are the two fundamental tasks. This paper delves deeper into this field, building upon the Sparse4D framework. We introduce two auxiliary training tasks (Temporal Instance Denoising and Quality Estimation) and propose decoupled attention to make structural improvements, leading to significant enhancements in detection performance. Additionally, we extend the detector into a tracker using a straightforward approach that assigns instance ID during inference, further highlighting the advantages of query-based algorithms. Extensive experiments conducted on the nuScenes benchmark validate the effectiveness of the proposed improvements. With ResNet50 as the backbone, we witnessed enhancements of 3.0\%, 2.2\%, and 7.6\% in mAP, NDS, and AMOTA, achieving 46.9\%, 56.1\%, and 49.0\%, respectively. Our best model achieved 71.9\% NDS and 67.7\% AMOTA on the nuScenes test set. Code will be released at \url{https://github.com/linxuewu/Sparse4D}.
    摘要 自动驾驶视觉系统中,3D探测和跟踪是两个基本任务。这篇论文将深入探讨这一领域,基于Sparse4D框架。我们引入了两个辅助训练任务(时间实例干净和质量估计),并提出了分离注意力的方法,导致探测性能得到了显著提高。此外,我们将探测器转换成跟踪器,使用简单的方法,在推理时分配实例ID,进一步发挥了查询基于算法的优势。我们在nuScenes benchmark上进行了广泛的实验, validate了我们提出的改进方法的效果。使用ResNet50作为背景网络,我们在mAP、NDS和AMOTA中提高了3.0\%、2.2\%和7.6\%,分别达到了46.9\%、56.1\%和49.0\%。我们的最佳模型在nuScenes测试集上达到了71.9\%的NDS和67.7\%的AMOTA。代码将在 GitHub上发布,详细信息请参考 \url{https://github.com/linxuewu/Sparse4D}.

Can we infer the presence of Differential Privacy in Deep Learning models’ weights? Towards more secure Deep Learning

  • paper_url: http://arxiv.org/abs/2311.11717
  • repo_url: https://github.com/xehartnort/dp-from-weights
  • paper_authors: Jiménez-López, Daniel, Rodríguez-Barroso, Nuria, Luzón, M. Victoria, Herrera, Francisco
  • for: 保护数据和模型免受攻击,确保数据隐私。
  • methods: 使用 Differentially Private Stochastic Gradient Descent(DP-SGD)实现数据隐私。
  • results: 通过分析模型参数,可以判断模型是否在训练过程中使用了Diff Privacy,不需要信任模型提供者。
    Abstract Differential Privacy (DP) is a key property to protect data and models from integrity attacks. In the Deep Learning (DL) field, it is commonly implemented through the Differentially Private Stochastic Gradient Descent (DP-SGD). However, when a model is shared or released, there is no way to check whether it is differentially private, that is, it required to trust the model provider. This situation poses a problem when data privacy is mandatory, specially with current data regulations, as the presence of DP can not be certificated consistently by any third party. Thus, we face the challenge of determining whether a DL model has been trained with DP, according to the title question: Can we infer the presence of Differential Privacy in Deep Learning models' weights? Since the DP-SGD significantly changes the training process of a DL model, we hypothesize that DP leaves an imprint in the weights of a DL model, which can be used to predict whether a model has been trained with DP regardless of its architecture and the training dataset. In this paper, we propose to employ the imprint in model weights of using DP to infer the presence of DP training in a DL model. To substantiate our hypothesis, we developed an experimental methodology based on two datasets of weights of DL models, each with models with and without DP training and a meta-classifier to infer whether DP was used in the training process of a DL model, by accessing its weights. We accomplish both, the removal of the requirement of a trusted model provider and a strong foundation for this interesting line of research. Thus, our contribution is an additional layer of security on top of the strict private requirements of DP training in DL models, towards to DL models.
    摘要 diffeential privacy (DP) 是一种保护数据和模型的重要性能。在深度学习(DL)领域,通常通过差分 private stochastic gradient descent (DP-SGD) 实现。但当模型被分享或发布时,没有方式来检查该模型是否具有差分隐私,即需要信任模型提供者。这种情况对于数据隐私是必要的,特别是现有的数据法规,因为差分隐私的存在无法被第三方证明。因此,我们面临着判断一个深度学习模型是否在训练过程中使用差分隐私的挑战。根据标题提问,我们问:可以通过深度学习模型的参数来推断它是否在差分隐私训练中?因为差分-SGD 对深度学习模型的训练过程进行了重要变化,我们假设差分隐私会留下在深度学习模型的参数中的印记,可以用来预测该模型是否在差分隐私训练中,无论其架构和训练数据。在这篇论文中,我们提议使用差分隐私训练中模型参数中的印记来判断深度学习模型是否在差分隐私训练中。为了证实我们的假设,我们采用了一种实验方法,该方法基于两个深度学习模型参数的数据集,每个数据集包含具有和不具有差分隐私训练的模型,以及一个元类фика器来预测深度学习模型是否在差分隐私训练中。通过这种方法,我们成功地 removeds了需要信任模型提供者的要求,并提供了一种强大的基础 для这一有趣的研究领域。因此,我们的贡献是在差分隐私训练中添加了一层安全性,以加强深度学习模型的隐私保护。
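
The meta-classification idea can be sketched as summarizing each model's weights with a few global statistics and training a classifier to predict whether DP-SGD was used. The choice of statistics and of a random forest below is an assumption for illustration, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def weight_features(state_dict):
    """state_dict: mapping of layer names to (numpy-convertible) weight arrays."""
    w = np.concatenate([np.asarray(t, dtype=np.float64).ravel() for t in state_dict.values()])
    return np.array([w.mean(), w.std(), np.abs(w).mean(),
                     np.percentile(np.abs(w), 50), np.percentile(np.abs(w), 99),
                     w.min(), w.max()])

def train_meta_classifier(models, dp_labels):
    """models: list of state_dicts; dp_labels[i] = 1 if model i was trained with DP-SGD."""
    X = np.stack([weight_features(m) for m in models])
    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    clf.fit(X, dp_labels)
    return clf            # later: clf.predict(weight_features(new_model)[None])
```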

Control in Hybrid Chatbots

  • paper_url: http://arxiv.org/abs/2311.11701
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Thomas Rüdel, Jochen L. Leidner
  • for: 商业规则引擎和嵌入式神经语音助手的集成
  • methods: 使用商业规则引擎和嵌入式神经网络模型的集成方式,以实现更高水平的控制和避免模型“幻觉”现象
  • results: 研究人员通过实践和分析,发现这种集成方式可以提供更高的控制级别和更好的性能,同时避免模型“幻觉”现象的出现
    Abstract Customer data typically is held in database systems, which can be seen as rule-based knowledge base, whereas businesses increasingly want to benefit from the capabilities of large, pre-trained language models. In this technical report, we describe a case study of how a commercial rule engine and an integrated neural chatbot may be integrated, and what level of control that particular integration mode leads to. We also discuss alternative ways (including past ways realized in other systems) how researchers strive to maintain control and avoid what has recently been called model "hallucination".
    摘要 客户数据通常被存储在数据库系统中,可以看作为规则基本知识库。而企业正在寻求利用大量预训练语言模型的能力。在这份技术报告中,我们描述了一个商业规则引擎和一个集成的神经网络聊天机器人的集成方式,该集成方式导致了什么样的控制水平。我们还讨论了其他方法(包括过去在其他系统中实现的方法)如何维护控制并避免最近被称为“模型幻觉”的问题。

Sparse Low-rank Adaptation of Pre-trained Language Models

  • paper_url: http://arxiv.org/abs/2311.11696
  • repo_url: https://github.com/tsinghuac3i/sora
  • paper_authors: Ning Ding, Xingtai Lv, Qiaosen Wang, Yulin Chen, Bowen Zhou, Zhiyuan Liu, Maosong Sun
  • for: 提高 fine-tuning 大型自然语言模型的效率和效iveness
  • methods: 增加 LoRA 方法的灵活性,通过动态调整归一化级别来控制约束维度
  • results: SoRA 可以与其他基eline相比,即使剩下 70% 参数和 70% 训练时间,也可以获得更好的表现
    Abstract Fine-tuning pre-trained large language models in a parameter-efficient manner is widely studied for its effectiveness and efficiency. The popular method of low-rank adaptation (LoRA) offers a notable approach, hypothesizing that the adaptation process is intrinsically low-dimensional. Although LoRA has demonstrated commendable performance, it is implemented with a fixed and unalterable intrinsic rank that might not always be the ideal choice. Recognizing the need for more flexible adaptation, we extend the methodology of LoRA to an innovative approach we call sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process. We achieve this through the incorporation of a gate unit optimized with proximal gradient method in the training stage, controlling the cardinality of rank under the sparsity of the gate. In the subsequent inference stage, we eliminate the parameter blocks corresponding to the zeroed-out ranks, to reduce each SoRA module back to a concise yet rank-optimal LoRA. Our approach strengthens the representation power of LoRA by initializing it with a higher rank, while efficiently taming a temporarily increased number of parameters via updating in a sparse way. We further introduce a sparsifying scheduler for SoRA, aiming to examine the impact of the number of non-zero parameters on the model's memorization and generalization. Our experimental results demonstrate that SoRA can outperform other baselines even with 70% retained parameters and 70% training time.
    摘要 大量语言模型的精细调整已被广泛研究,以提高效率和表现。LoRA方法在这些研究中具有重要地位,假设适应过程是低维度的。虽然LoRA已经表现出色,但它使用固定和不可变的内在维度,可能不是理想的选择。我们认为需要更 flexible的适应方式,于是我们扩展了LoRA方法,并将其称为SoRA。在训练阶段,我们通过在训练过程中使用适应器来控制维度的卡达利度,使得SoRA模块在执行过程中可以动态地调整其内在维度。在推理阶段,我们可以根据适应器的输出来决定是否保留每个SoRA模块中的参数块。我们的方法可以强化LoRA的表现力,同时efficiently 控制参数的数量。我们还引入了一个缩短调度器来考虑SoRA模块中参数的数量对模型的记忆和泛化造成的影响。我们的实验结果表明,SoRA可以在70% retained parameters和70% 训练时间下超越其他基elines。
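
A minimal PyTorch sketch of the SoRA idea: a LoRA-style update B·diag(g)·A in which the gate vector g is pushed toward sparsity by a proximal (soft-thresholding) step, so the effective rank can shrink during training. Module and hyper-parameter names are illustrative, not the released implementation.

```python
import torch
import torch.nn as nn

class SoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, max_rank: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():                 # keep the pre-trained weights frozen
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(max_rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, max_rank))
        self.gate = nn.Parameter(torch.ones(max_rank))   # controls the effective rank

    def forward(self, x):
        return self.base(x) + ((x @ self.A.t()) * self.gate) @ self.B.t()

    @torch.no_grad()
    def proximal_step(self, lr: float, lam: float):
        # soft-threshold the gate after the usual gradient update;
        # entries driven to exactly zero switch off the corresponding rank
        g = self.gate
        g.copy_(torch.sign(g) * torch.clamp(g.abs() - lr * lam, min=0.0))
```

At inference time, the rows of A and columns of B whose gate entries are zero can be pruned, leaving a plain rank-reduced adapter.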

Towards Robust Text Retrieval with Progressive Learning

  • paper_url: http://arxiv.org/abs/2311.11691
  • repo_url: None
  • paper_authors: Tong Wu, Yulei Qin, Enwei Zhang, Zihan Xu, Yuting Gao, Ke Li, Xing Sun
  • for: This paper aims to improve the performance of large language models (LLMs) in retrieving up-to-date and domain-specific information by proposing a progressively learned embeddings (PEG) model.
  • methods: The PEG model uses a progressive learning mechanism that dynamically modulates its attention to samples throughout the entire training process, and it is trained on more than 100 million data covering various tasks and domains.
  • results: The PEG model outperforms state-of-the-art embeddings in retrieving true positives, demonstrating its significant potential for applications in LLMs.
    Abstract Retrieval augmentation has become an effective solution to empower large language models (LLMs) with external and verified knowledge sources from the database, which overcomes the limitations and hallucinations of LLMs in handling up-to-date and domain-specific information. However, existing embedding models for text retrieval usually have three non-negligible limitations. First, the number and diversity of samples in a batch are too restricted to supervise the modeling of textual nuances at scale. Second, the high proportional noise are detrimental to the semantic correctness and consistency of embeddings. Third, the equal treatment to easy and difficult samples would cause sub-optimum convergence of embeddings with poorer generalization. In this paper, we propose the PEG, a progressively learned embeddings for robust text retrieval. Specifically, we increase the training in-batch negative samples to 80,000, and for each query, we extracted five hard negatives. Concurrently, we incorporated a progressive learning mechanism, enabling the model to dynamically modulate its attention to the samples throughout the entire training process. Additionally, PEG is trained on more than 100 million data, encompassing a wide range of domains (e.g., finance, medicine, and tourism) and covering various tasks (e.g., question-answering, machine reading comprehension, and similarity matching). Extensive experiments conducted on C-MTEB and DuReader demonstrate that PEG surpasses state-of-the-art embeddings in retrieving true positives, highlighting its significant potential for applications in LLMs. Our model is publicly available at https://huggingface.co/TownsWu/PEG.
    摘要 大量语言模型(LLM)的问题解决方法之一是使用数据库中的可靠和证明的知识来强化LLM,这种方法可以超越LLM在处理最新和域pecific信息方面的局限性和偏见。然而,现有的文本检索嵌入模型通常具有三种不可忽略的限制。首先,批处理中的样本数量和多样性太少,无法全面supervise模型处理文本细节的各种变化。其次,高卷积噪音会导致嵌入的semantic正确性和一致性受到损害。第三,对于易于处理和困难处理的样本进行相同的处理会导致嵌入的优化不佳,从而影响其总体性能。在这篇论文中,我们提出了Progressive Embeddings for Robust Text Retrieval(PEG)模型,用于解决这些限制。具体来说,我们增加了批处理中的负样本数量至80,000,并对每个查询选择五个困难的负样本。同时,我们还实现了一种进程学习机制,使得模型可以在整个训练过程中动态调整对样本的注意力。此外,PEG模型在1000万多个数据上进行训练,覆盖了多个领域(如金融、医学和旅游等)和多种任务(如问答、机器阅读理解和相似性匹配等)。我们在C-MTEB和DuReader上进行了广泛的实验,显示PEG模型在检索真正正确的样本方面表现出色, highlighting its significant potential for LLM applications。我们的模型可以在https://huggingface.co/TownsWu/PEG中找到。
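
A minimal sketch of contrastive training with in-batch negatives plus mined hard negatives, with a simple schedule that shifts weight toward harder samples as training progresses — a rough stand-in for the progressive learning described above; the exact weighting rule is an assumption.

```python
import torch
import torch.nn.functional as F

def peg_style_loss(q, p, hard_neg, step, total_steps, tau=0.05):
    """q: (B, d) query embeddings, p: (B, d) positives, hard_neg: (B, H, d) mined negatives."""
    q, p, hard_neg = F.normalize(q, dim=-1), F.normalize(p, dim=-1), F.normalize(hard_neg, dim=-1)

    in_batch = q @ p.t()                               # (B, B); the diagonal holds the positives
    hard = torch.einsum("bd,bhd->bh", q, hard_neg)     # (B, H) scores against hard negatives
    logits = torch.cat([in_batch, hard], dim=1) / tau
    targets = torch.arange(q.size(0), device=q.device)

    per_sample = F.cross_entropy(logits, targets, reduction="none")
    # progressive weighting: uniform at the start, gradually emphasising harder samples
    progress = step / total_steps
    weights = torch.softmax(progress * per_sample.detach(), dim=0) * per_sample.size(0)
    return (weights * per_sample).mean()
```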

Refactoring Programs Using Large Language Models with Few-Shot Examples

  • paper_url: http://arxiv.org/abs/2311.11690
  • repo_url: None
  • paper_authors: Atsushi Shirafuji, Yusuke Oda, Jun Suzuki, Makoto Morishita, Yutaka Watanobe
  • for: 提高程序维护和安全性,促进编程学习
  • methods: 使用大语言模型GPT-3.5提出简化版Python程序,透过几何映射学习
  • results: 95.68%的程序可以通过生成10个候选程序来简化,减少平均 cyclomatic complexity 17.35%,减少平均行数25.84%,且有出色的代码格式化能力,但也有一些不必要的行为,如删除或翻译注释。
    Abstract A less complex and more straightforward program is a crucial factor that enhances its maintainability and makes writing secure and bug-free programs easier. However, due to its heavy workload and the risks of breaking the working programs, programmers are reluctant to do code refactoring, and thus, it also causes the loss of potential learning experiences. To mitigate this, we demonstrate the application of using a large language model (LLM), GPT-3.5, to suggest less complex versions of the user-written Python program, aiming to encourage users to learn how to write better programs. We propose a method to leverage the prompting with few-shot examples of the LLM by selecting the best-suited code refactoring examples for each target programming problem based on the prior evaluation of prompting with the one-shot example. The quantitative evaluation shows that 95.68% of programs can be refactored by generating 10 candidates each, resulting in a 17.35% reduction in the average cyclomatic complexity and a 25.84% decrease in the average number of lines after filtering only generated programs that are semantically correct. Furthermore, the qualitative evaluation shows outstanding capability in code formatting, while unnecessary behaviors such as deleting or translating comments are also observed.
    摘要 一个较简单且直观的程式是一个重要的因素,可以提高程式的维护和写作安全、无错程式的能力。然而,由于工作负担重大和可能会破坏正常运行的程式,因此开发者对程式 refactoring 的态度不够积极,从而导致学习机会的损失。为了解决这个问题,我们示范了使用大型自然语言模型(LLM)GPT-3.5,可以建议使用者写的 Python 程式中更加简单的版本,以便帮助使用者学习写更好的程式。我们提出了一种方法,利用LLM的提示,选择每个目标程式问题最适合的代码 refactoring 示例,根据先前评估的提示一个例子。根据量化评估,95.68%的程式可以通过生成10个候选者,实现了17.35%的减少平均顶点复杂度和25.84%的减少平均行数。此外,量化评估还表明了代码格式化的出色能力,而不必要的行为,如删除或翻译注解,也被观察到。
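
A rough sketch of the generate-and-filter loop: prompt an LLM with a few refactoring examples, sample several candidates, keep only those that still pass the problem's test cases, and return the simplest survivor. `llm_complete`, the `tests` callable, and the branch-counting complexity proxy are illustrative assumptions, not the paper's exact setup.

```python
import ast

def complexity_proxy(source: str) -> int:
    """Very rough cyclomatic-complexity stand-in: 1 + number of branching nodes."""
    branches = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)
    return 1 + sum(isinstance(node, branches) for node in ast.walk(ast.parse(source)))

def refactor(program: str, examples, tests, llm_complete, n_candidates=10):
    shots = "\n\n".join(f"### Before\n{a}\n### After\n{b}" for a, b in examples)
    prompt = f"{shots}\n\n### Before\n{program}\n### After\n"
    candidates = [llm_complete(prompt, temperature=0.8) for _ in range(n_candidates)]

    # `tests` should return False for candidates that fail to parse or change behaviour
    valid = [c for c in candidates if tests(c)]
    if not valid:
        return program                                 # fall back to the original program
    return min(valid, key=complexity_proxy)
```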

Causal Structure Learning Supervised by Large Language Model

  • paper_url: http://arxiv.org/abs/2311.11689
  • repo_url: https://github.com/tymadara/ils-csl
  • paper_authors: Taiyu Ban, Lyuzhou Chen, Derui Lyu, Xiangyu Wang, Huanhuan Chen
  • for: 提高 causal discovery 的精度和效率,使用 Large Language Models (LLMs) 增强 causal structure learning (CSL)
  • methods: 提出了 Iterative LLM Supervised CSL (ILS-CSL) 框架,将 LLM-based causal inference 融合到 CSL 中,通过反馈机制提高 causal DAG 的精度和 robustness
  • results: 在 eight 个实际数据集上进行了 comprehensive 评估,显示 ILS-CSL 的性能更高,创造了新的标准 для CSL 效率,并展示了其在 causal discovery 领域的潜在进步
    Abstract Causal discovery from observational data is pivotal for deciphering complex relationships. Causal Structure Learning (CSL), which focuses on deriving causal Directed Acyclic Graphs (DAGs) from data, faces challenges due to vast DAG spaces and data sparsity. The integration of Large Language Models (LLMs), recognized for their causal reasoning capabilities, offers a promising direction to enhance CSL by infusing it with knowledge-based causal inferences. However, existing approaches utilizing LLMs for CSL have encountered issues, including unreliable constraints from imperfect LLM inferences and the computational intensity of full pairwise variable analyses. In response, we introduce the Iterative LLM Supervised CSL (ILS-CSL) framework. ILS-CSL innovatively integrates LLM-based causal inference with CSL in an iterative process, refining the causal DAG using feedback from LLMs. This method not only utilizes LLM resources more efficiently but also generates more robust and high-quality structural constraints compared to previous methodologies. Our comprehensive evaluation across eight real-world datasets demonstrates ILS-CSL's superior performance, setting a new standard in CSL efficacy and showcasing its potential to significantly advance the field of causal discovery. The codes are available at \url{https://github.com/tyMadara/ILS-CSL}.
    摘要 从观测数据中进行因果发现对于解析复杂关系至关重要。因果结构学习(CSL)旨在从数据中推导因果有向无环图(DAG),但面临 DAG 空间庞大和数据稀疏的挑战。将具备因果推理能力的大语言模型(LLM)融入 CSL,为其注入基于知识的因果推断,提供了一个有前景的方向。然而,现有利用 LLM 的 CSL 方法存在诸多问题,包括 LLM 推断不完善导致的不可靠约束,以及对全部变量两两分析带来的高计算开销。为此,我们提出迭代式 LLM 监督的 CSL 框架(ILS-CSL)。ILS-CSL 创新地在迭代过程中将基于 LLM 的因果推断与 CSL 相结合,利用 LLM 的反馈不断修正因果 DAG。该方法不仅更高效地利用 LLM 资源,还能生成比以往方法更稳健、更高质量的结构约束。在八个真实数据集上的全面评估表明 ILS-CSL 性能更优,为 CSL 的效果树立了新标准,并展示了其显著推动因果发现领域发展的潜力。代码见 \url{https://github.com/tyMadara/ILS-CSL}。
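
An illustrative (not the authors') sketch of the iterative supervision loop: a CSL solver proposes a DAG, only the proposed edges are checked with an LLM, and the answers become constraints for the next round. `run_csl` and `ask_llm` are placeholder callables supplied by the user.

```python
def ils_csl(data, run_csl, ask_llm, max_iters=5):
    """Iterative LLM-supervised causal structure learning (illustrative only).

    run_csl(data, constraints) -> set of directed edges (a DAG)
    ask_llm(cause, effect)     -> one of {"causes", "reverse", "unknown"}
    Only edges proposed by the CSL step are sent to the LLM, avoiding full
    pairwise queries; LLM answers are turned into constraints for the next round.
    """
    constraints = {"forbidden": set(), "required": set()}
    dag = run_csl(data, constraints)
    for _ in range(max_iters):
        changed = False
        for (a, b) in sorted(dag):
            verdict = ask_llm(a, b)
            if verdict == "reverse" and (b, a) not in constraints["required"]:
                constraints["forbidden"].add((a, b))
                constraints["required"].add((b, a))
                changed = True
            elif verdict == "unknown":
                pass  # imperfect LLM inference: do not impose a hard constraint
        if not changed:
            break
        dag = run_csl(data, constraints)
    return dag, constraints
```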

ViP-Mixer: A Convolutional Mixer for Video Prediction

  • paper_url: http://arxiv.org/abs/2311.11683
  • repo_url: None
  • paper_authors: Xin Zheng, Ziang Peng, Yuan Cao, Hongming Shan, Junping Zhang
  • for: 预测未来帧数据,提高视频预测精度。
  • methods: 使用卷积混合器模型视频的空间时间演化,并将各个维度之间的关系充分利用。
  • results: 在三个标准视频数据集上实现新的最佳预测性能。
    Abstract Video prediction aims to predict future frames from a video's previous content. Existing methods mainly process video data where the time dimension mingles with the space and channel dimensions from three distinct angles: as a sequence of individual frames, as a 3D volume in spatiotemporal coordinates, or as a stacked image where frames are treated as separate channels. Most of them generally focus on one of these perspectives and may fail to fully exploit the relationships across different dimensions. To address this issue, this paper introduces a convolutional mixer for video prediction, termed ViP-Mixer, to model the spatiotemporal evolution in the latent space of an autoencoder. The ViP-Mixers are stacked sequentially and interleave feature mixing at three levels: frames, channels, and locations. Extensive experiments demonstrate that our proposed method achieves new state-of-the-art prediction performance on three benchmark video datasets covering both synthetic and real-world scenarios.
    摘要 视频预测的目标是根据视频已有内容预测未来帧。现有方法在处理视频数据时,通常从三个不同角度看待时间维度与空间、通道维度的交织:将视频视为逐帧序列、视为时空坐标下的三维体数据,或视为将各帧作为不同通道堆叠的图像。大多数方法只关注其中一种视角,难以充分利用不同维度之间的关系。为解决这一问题,本文提出一种基于卷积混合的视频预测方法 ViP-Mixer,用于在自编码器的潜在空间中建模视频的时空演化。ViP-Mixer 依次堆叠,并在帧、通道和位置三个层面交替进行特征混合。大量实验证明,我们提出的方法在涵盖合成与真实场景的三个标准视频数据集上取得了新的最优预测性能。
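
A toy PyTorch block showing the idea of interleaved mixing over frames, channels and spatial locations in a latent video tensor. The real ViP-Mixer architecture (layer counts, normalization, the surrounding autoencoder) is not reproduced; the shapes and operators here are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Toy spatiotemporal mixer over a latent video tensor of shape (B, T, C, H, W)."""
    def __init__(self, T, C):
        super().__init__()
        self.frame_mix = nn.Conv1d(T, T, kernel_size=1)                # mix across time
        self.channel_mix = nn.Conv2d(C, C, kernel_size=1)              # mix across channels
        self.location_mix = nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C)  # per-channel spatial mixing

    def forward(self, x):
        B, T, C, H, W = x.shape
        # 1) frame mixing: flatten each frame so Conv1d mixes along the time axis
        y = x.reshape(B, T, C * H * W)
        y = self.frame_mix(y).reshape(B, T, C, H, W)
        x = x + y
        # 2) channel and 3) location mixing, applied frame by frame
        z = x.reshape(B * T, C, H, W)
        z = z + self.channel_mix(z)
        z = z + self.location_mix(z)
        return z.reshape(B, T, C, H, W)

x = torch.randn(2, 4, 8, 16, 16)
out = MixerBlock(T=4, C=8)(x)
assert out.shape == x.shape
```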

MGCT: Mutual-Guided Cross-Modality Transformer for Survival Outcome Prediction using Integrative Histopathology-Genomic Features

  • paper_url: http://arxiv.org/abs/2311.11659
  • repo_url: None
  • paper_authors: Mingxin Liu, Yunzan Liu, Hui Cui, Chunquan Li, Jiquan Ma
  • for: The paper proposes a new deep learning-based computational pathology method for prognosticating cancer patients using whole slide images (WSIs) and genomic features.
  • methods: The proposed method, the Mutual-Guided Cross-Modality Transformer (MGCT), uses a weakly-supervised, attention-based multimodal learning framework that combines histology and genomic features to model the genotype-phenotype interactions within the tumor microenvironment.
  • results: Experiments using nearly 3,600 gigapixel WSIs across five different cancer types sourced from The Cancer Genome Atlas (TCGA) consistently show that MGCT outperforms state-of-the-art (SOTA) methods.
    Abstract The rapidly emerging field of deep learning-based computational pathology has shown promising results in utilizing whole slide images (WSIs) to objectively prognosticate cancer patients. However, most prognostic methods are currently limited to either histopathology or genomics alone, which inevitably reduces their potential to accurately predict patient prognosis. Whereas integrating WSIs and genomic features presents three main challenges: (1) the enormous heterogeneity of gigapixel WSIs which can reach sizes as large as 150,000x150,000 pixels; (2) the absence of a spatially corresponding relationship between histopathology images and genomic molecular data; and (3) the existing early, late, and intermediate multimodal feature fusion strategies struggle to capture the explicit interactions between WSIs and genomics. To ameliorate these issues, we propose the Mutual-Guided Cross-Modality Transformer (MGCT), a weakly-supervised, attention-based multimodal learning framework that can combine histology features and genomic features to model the genotype-phenotype interactions within the tumor microenvironment. To validate the effectiveness of MGCT, we conduct experiments using nearly 3,600 gigapixel WSIs across five different cancer types sourced from The Cancer Genome Atlas (TCGA). Extensive experimental results consistently emphasize that MGCT outperforms the state-of-the-art (SOTA) methods.
    摘要 基于深度学习的计算病理学这一快速兴起的领域,已在利用全切片图像(WSI)客观预测癌症患者预后方面展现出良好效果。然而,现有的预后方法大多仅依赖组织病理学或基因组学单一模态,这不可避免地限制了其准确预测患者预后的潜力。而整合 WSI 与基因组特征则面临三大挑战:(1)千兆像素级 WSI 的高度异质性,其尺寸可达 150,000x150,000 像素;(2)组织病理图像与基因组分子数据之间缺乏空间上的对应关系;(3)现有的早期、晚期和中间多模态特征融合策略难以捕捉 WSI 与基因组之间的显式交互。为缓解这些问题,我们提出互导跨模态 Transformer(MGCT),这是一种弱监督、基于注意力的多模态学习框架,能够结合组织学特征与基因组特征,建模肿瘤微环境中的基因型-表型交互。为验证 MGCT 的有效性,我们使用来自癌症基因组图谱(TCGA)的五种癌症类型、近 3,600 张千兆像素 WSI 进行实验。大量实验结果一致表明 MGCT 优于现有最优(SOTA)方法。
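
A minimal sketch of one mutual-guidance step, in which genomic tokens attend to WSI patch tokens and vice versa via cross-attention. The dimensions, token counts and the omitted survival head are assumptions; the paper's full MGCT stack is more involved.

```python
import torch
import torch.nn as nn

class MutualGuidedCrossAttention(nn.Module):
    """One mutual-guidance step: genomics attends to histology patches and vice versa."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.gen_to_hist = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.hist_to_gen = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hist_tokens, gen_tokens):
        # hist_tokens: (B, N_patches, dim) WSI patch embeddings (supervised only at slide level)
        # gen_tokens:  (B, N_genes, dim)   grouped genomic embeddings
        gen_guided, _ = self.gen_to_hist(query=gen_tokens, key=hist_tokens, value=hist_tokens)
        hist_guided, _ = self.hist_to_gen(query=hist_tokens, key=gen_tokens, value=gen_tokens)
        return hist_tokens + hist_guided, gen_tokens + gen_guided

hist = torch.randn(1, 500, 256)   # e.g. 500 patches sampled from a gigapixel WSI
gen = torch.randn(1, 6, 256)      # e.g. 6 gene-set embeddings
h, g = MutualGuidedCrossAttention()(hist, gen)
```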

Peeking Inside the Schufa Blackbox: Explaining the German Housing Scoring System

  • paper_url: http://arxiv.org/abs/2311.11655
  • repo_url: None
  • paper_authors: Dean-Robin Kern, Gunnar Stevens, Erik Dethier, Sidra Naveed, Fatemeh Alizadeh, Delong Du, Md Shajalal
  • for: 这个研究旨在为德国的信用评分系统(Schufa)开发可解释的人工智能解释,以满足用户的信息需求和期望。
  • methods: 这个研究使用了推测设计方法,让商业信息学生想象出了为租客和房东提供住房信贷分数解释的用户界面。
  • results: 初步发现结果表明,尽管有一些通用的需求,但也有因角色和实际情况而冲突的需求。这些发现提供了未来人类中心的XAI研究的可能性。
    Abstract Explainable Artificial Intelligence is a concept aimed at making complex algorithms transparent to users through a uniform solution. Researchers have highlighted the importance of integrating domain-specific contexts to develop explanations tailored to end users. In this study, we focus on the Schufa housing scoring system in Germany and investigate how users' information needs and expectations for explanations vary based on their roles. Using the speculative design approach, we asked business information students to imagine user interfaces that provide housing credit score explanations from the perspectives of both tenants and landlords. Our preliminary findings suggest that although there are general needs that apply to all users, there are also conflicting needs that depend on the practical realities of their roles and how credit scores affect them. We contribute to human-centered XAI research by proposing future research directions that examine users' explanatory needs considering their roles and agencies.
    摘要 人工智能可解释(Explainable Artificial Intelligence)是一种概念,旨在为用户提供复杂算法的通用解释。研究人员认为,在发展解释时应该考虑域专业上下文。在这项研究中,我们关注德国的信用评分系统(Schufa),并调查了用户信息需求和对解释的期望是如何因role而异。我们使用推想设计方法,请商业信息学生想象提供住房信用评分解释的用户界面,从租户和地产业者两个角度出发。我们的初步发现结果表明,虽然有一些共同的需求,但也有因role而异的需求,这些需求取决于各自的角色和信用评分对其造成的实际影响。我们的研究贡献了人类中心的XAI研究,并提出了未来研究的方向,旨在考虑用户的角色和权力,以更好地满足用户的解释需求。

Web News Timeline Generation with Extended Task Prompting

  • paper_url: http://arxiv.org/abs/2311.11652
  • repo_url: None
  • paper_authors: Sha Wang, Yuchen Li, Hanhua Xiao, Lambert Deng, Yanfei Dong
  • for: 这个研究的目的是为了生成新闻时间线,以提供全面和Contextual的事件发展趋势。
  • methods: 这个研究使用了扩展的任务提示技术来提高传统自然语言处理(NLP)技术的效果。
  • results: 研究表明,通过添加更多的任务提示,可以提高NLP技术对不同新闻数据集的效果,使新闻时间线生成成为职业使用的现实。
    Abstract The creation of news timeline is essential for a comprehensive and contextual understanding of events as they unfold over time. This approach aids in discerning patterns and trends that might be obscured when news is viewed in isolation. By organizing news in a chronological sequence, it becomes easier to track the development of stories, understand the interrelation of events, and grasp the broader implications of news items. This is particularly helpful in sectors like finance and insurance, where timely understanding of the event development-ranging from extreme weather to political upheavals and health crises-is indispensable for effective risk management. While traditional natural language processing (NLP) techniques have had some success, they often fail to capture the news with nuanced relevance that are readily apparent to domain experts, hindering broader industry integration. The advance of Large Language Models (LLMs) offers a renewed opportunity to tackle this challenge. However, direct prompting LLMs for this task is often ineffective. Our study investigates the application of an extended task prompting technique to assess past news relevance. We demonstrate that enhancing conventional prompts with additional tasks boosts their effectiveness on various news dataset, rendering news timeline generation practical for professional use. This work has been deployed as a publicly accessible browser extension which is adopted within our network.
    摘要 构建新闻时间线对于全面、具备上下文地理解事件随时间的发展至关重要,它有助于揭示孤立阅读新闻时容易被忽略的模式与趋势。将新闻按时间顺序组织后,可以更轻松地跟踪事件的发展、理解事件之间的关联,并把握新闻的更广泛影响。这在金融和保险等领域尤为重要:从极端天气到政治动荡和健康危机,及时理解事件的发展对有效的风险管理不可或缺。传统的自然语言处理(NLP)技术虽取得一定成效,但往往难以捕捉领域专家一眼就能看出的细微相关性,从而阻碍了其在行业中的更广泛应用。大语言模型(LLM)的进展为应对这一挑战提供了新的机会;然而,直接提示 LLM 完成该任务往往效果不佳。我们的研究探讨了使用扩展任务提示技术来评估过往新闻的相关性。我们证明,在常规提示中加入额外任务可以在多个新闻数据集上提升效果,使新闻时间线生成具备实际的专业应用价值。该工作已作为公开可用的浏览器扩展部署,并在我们的网络中被采用。

Leveraging healthy population variability in deep learning unsupervised anomaly detection in brain FDG PET

  • paper_url: http://arxiv.org/abs/2311.12081
  • repo_url: None
  • paper_authors: Maëlys Solal, Ravi Hassanaly, Ninon Burgos
  • for: 这篇论文的目的是为了开发一种基于无监督学习的脑成像数据分析方法,以检测脑成像中的各种异常。
  • methods: 这篇论文使用了一种基于Z-score的方法,将健康人群的脑成像模型与患者的脑成像进行比较,以检测异常。
  • results: 这篇论文的实验结果显示,这种方法可以精准地检测阿兹海默症相关的异常。
    Abstract Unsupervised anomaly detection is a popular approach for the analysis of neuroimaging data as it allows to identify a wide variety of anomalies from unlabelled data. It relies on building a subject-specific model of healthy appearance to which a subject's image can be compared to detect anomalies. In the literature, it is common for anomaly detection to rely on analysing the residual image between the subject's image and its pseudo-healthy reconstruction. This approach however has limitations partly due to the pseudo-healthy reconstructions being imperfect and to the lack of natural thresholding mechanism. Our proposed method, inspired by Z-scores, leverages the healthy population variability to overcome these limitations. Our experiments conducted on FDG PET scans from the ADNI database demonstrate the effectiveness of our approach in accurately identifying Alzheimer's disease related anomalies.
    摘要 无监督异常检测是脑影像数据分析中广泛使用的方法,因为它可以从无标注数据中识别各种异常。它基于构建个体特定的健康表现模型,并将个体图像与该模型进行比较以检测异常。在文献中,异常检测通常基于分析个体图像与其伪健康重建图像之间的残差。然而,这种方法存在局限,部分原因在于伪健康重建并不完美,且缺乏自然的阈值机制。我们提出的方法受 Z 分数启发,利用健康人群的变异性来克服这些局限。我们在 ADNI 数据库的 FDG PET 扫描上开展的实验表明,该方法能够准确识别与阿尔茨海默病相关的异常。
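
One way to read the abstract's idea in code: score each voxel by how far its residual (scan minus pseudo-healthy reconstruction) deviates from the residual statistics of a healthy population, which yields a natural threshold in standard-deviation units. The exact statistics and normalization used in the paper may differ; this is only a sketch.

```python
import numpy as np

def zscore_anomaly_map(subject_img, recon_img, healthy_residual_mean, healthy_residual_std, eps=1e-6):
    """Voxel-wise anomaly scores for a brain FDG PET image.

    subject_img / recon_img: subject scan and its pseudo-healthy reconstruction.
    healthy_residual_mean/std: voxel-wise statistics of (scan - reconstruction)
    computed over a held-out healthy population, capturing normal variability.
    """
    residual = subject_img - recon_img
    return (residual - healthy_residual_mean) / (healthy_residual_std + eps)

# toy usage with random volumes
shape = (32, 32, 32)
z_map = zscore_anomaly_map(np.random.rand(*shape), np.random.rand(*shape),
                           np.zeros(shape), np.ones(shape))
anomalous_voxels = np.abs(z_map) > 3.0  # natural thresholding in standard-deviation units
```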

A novel transformer-based approach for soil temperature prediction

  • paper_url: http://arxiv.org/abs/2311.11626
  • repo_url: None
  • paper_authors: Muhammet Mucahit Enes Yurtsever, Ayhan Kucukmanisa, Zeynep Hilal Kilimci
  • for: 这研究旨在预测土壤温度,以便更好地了解高山冰川的能量、动力学、水文过程、生态稳定性、土壤、水和农作物的管理。
  • methods: 本研究使用了变换器模型,这是首次应用变换器模型预测土壤温度。实验使用了FLUXNET站点,模型使用五种不同的变换器模型,即简单变换器、信息变换器、自动变换器、重构变换器和ETS变换器。
  • results: 实验结果表明,变换器模型能为土壤温度预测做出重要贡献,并确立了新的最优水平;与其他深度学习方法和文献研究相比,本研究的效果更好。
    Abstract Soil temperature is one of the most significant parameters that plays a crucial role in glacier energy, dynamics of mass balance, processes of surface hydrological, coaction of glacier-atmosphere, nutrient cycling, ecological stability, the management of soil, water, and field crop. In this work, we introduce a novel approach using transformer models for the purpose of forecasting soil temperature prediction. To the best of our knowledge, the usage of transformer models in this work is the very first attempt to predict soil temperature. Experiments are carried out using six different FLUXNET stations by modeling them with five different transformer models, namely, Vanilla Transformer, Informer, Autoformer, Reformer, and ETSformer. To demonstrate the effectiveness of the proposed model, experiment results are compared with both deep learning approaches and literature studies. Experiment results show that the utilization of transformer models ensures a significant contribution to the literature, thence determining the new state-of-the-art.
    摘要 土壤温度是最重要的参数之一,对冰川能量、质量平衡动力学、地表水文过程、冰川-大气相互作用、养分循环、生态稳定性以及土壤、水和农作物管理都具有关键作用。在这项工作中,我们提出一种利用变换器(Transformer)模型预测土壤温度的新方法。据我们所知,这是首次使用变换器模型预测土壤温度。我们在六个 FLUXNET 站点上分别用五种变换器模型进行建模,包括原始 Transformer(Vanilla Transformer)、Informer、Autoformer、Reformer 和 ETSformer。为证明所提模型的有效性,我们将实验结果与深度学习方法和文献研究进行了比较。实验结果表明,变换器模型的使用为该领域做出了显著贡献,从而确立了新的最优水平。

Taiyi: A Bilingual Fine-Tuned Large Language Model for Diverse Biomedical Tasks

  • paper_url: http://arxiv.org/abs/2311.11608
  • repo_url: https://github.com/dutir-bionlp/taiyi-llm
  • paper_authors: Ling Luo, Jinzhong Ning, Yingwen Zhao, Zhijun Wang, Zeyuan Ding, Peng Chen, Weiru Fu, Qinyu Han, Guangtao Xu, Yunzhi Qiu, Dinghao Pan, Jiru Li, Hao Li, Wenduo Feng, Senbo Tu, Yuqi Liu, Zhihao Yang, Jian Wang, Yuanyuan Sun, Hongfei Lin
  • for: This paper is focused on developing a bilingual language model (Taiyi) for diverse biomedical natural language processing tasks in both English and Chinese.
  • methods: The authors use a two-stage fine-tuning strategy to optimize the model’s performance across various tasks, and they use a comprehensive collection of biomedical text mining datasets to evaluate the model’s performance.
  • results: The authors show that Taiyi achieves superior performance compared to general language models on 13 test sets covering named entity recognition, relation extraction, text classification, and question answering tasks. Additionally, the authors demonstrate the model’s potential for bilingual biomedical multi-tasking through a case study involving additional biomedical NLP tasks.
  • for: 这篇论文是为了开发一个双语语言模型(Taiyi),用于多种生物医学自然语言处理任务。
  • methods: 作者使用了两个阶段的超参数调整策略,以优化模型的性能 across 多种任务。
  • results: 作者表明,Taiyi 比普通语言模型在 13 个测试集上表现出色,包括命名实体识别、关系抽取、文本分类和问答任务。此外,作者通过一个案例研究,展示了 Taiyi 在多语言生物医学多任务中的可能性。
    Abstract Recent advancements in large language models (LLMs) have shown promising results across a variety of natural language processing (NLP) tasks. The application of LLMs to specific domains, such as biomedicine, has achieved increased attention. However, most biomedical LLMs focus on enhancing performance in monolingual biomedical question answering and conversation tasks. To further investigate the effectiveness of the LLMs on diverse biomedical NLP tasks in different languages, we present Taiyi, a bilingual (English and Chinese) fine-tuned LLM for diverse biomedical tasks. In this work, we first curated a comprehensive collection of 140 existing biomedical text mining datasets across over 10 task types. Subsequently, a two-stage strategy is proposed for supervised fine-tuning to optimize the model performance across varied tasks. Experimental results on 13 test sets covering named entity recognition, relation extraction, text classification, question answering tasks demonstrate Taiyi achieves superior performance compared to general LLMs. The case study involving additional biomedical NLP tasks further shows Taiyi's considerable potential for bilingual biomedical multi-tasking. The source code, datasets, and model for Taiyi are freely available at https://github.com/DUTIR-BioNLP/Taiyi-LLM.
    摘要 现代大语言模型(LLM)的进步已经在各种自然语言处理(NLP)任务中显示出色的结果。将LLM应用到特定领域,如生物医学,已经吸引了更多的关注。然而,大多数生物医学LLM都是专门针对单语言生物医学问答和对话任务进行提升性能。为了进一步调查LLM在不同语言的生物医学NLP任务中的效果,我们提出了 Taiyi,一个英文和中文双语精度调整的大语言模型。在这种工作中,我们首先绘制了140个现有的生物医学文本挖掘数据集,涵盖了10多种任务类型。然后,我们提议了一种两阶段的监督微调策略,以便在不同任务中优化模型的性能。实验结果表明,Taiyi在13个测试集上(包括命名实体识别、关系抽取、文本分类、问答任务)表现出色,比普通的LLM更高。此外,在进一步的生物医学NLP任务中,Taiyi还表现出了很好的多任务优势。Taiyi的源代码、数据集和模型可以免费下载于https://github.com/DUTIR-BioNLP/Taiyi-LLM。

Machine learning-based malware detection for IoT devices using control-flow data

  • paper_url: http://arxiv.org/abs/2311.11605
  • repo_url: None
  • paper_authors: Gergely Hevesi
  • for: This thesis project aims to provide better security for IoT devices using machine learning algorithms and reverse engineering tools.
  • methods: The proposed method consists of two phases: (1) extracting control-flow related data using static binary analysis, and (2) classifying binary executables as malicious or benign using a neural network model.
  • results: The method is trained using a dataset of malicious and benign ARM applications, and is able to detect malware with high accuracy.
    Abstract Embedded devices are specialised devices designed for one or only a few purposes. They are often part of a larger system, through wired or wireless connection. Those embedded devices that are connected to other computers or embedded systems through the Internet are called Internet of Things (IoT for short) devices. With their widespread usage and their insufficient protection, these devices are increasingly becoming the target of malware attacks. Companies often cut corners to save manufacturing costs or misconfigure when producing these devices. This can be lack of software updates, ports left open or security defects by design. Although these devices may not be as powerful as a regular computer, their large number makes them suitable candidates for botnets. Other types of IoT devices can even cause health problems since there are even pacemakers connected to the Internet. This means, that without sufficient defence, even directed assaults are possible against people. The goal of this thesis project is to provide better security for these devices with the help of machine learning algorithms and reverse engineering tools. Specifically, I study the applicability of control-flow related data of executables for malware detection. I present a malware detection method with two phases. The first phase extracts control-flow related data using static binary analysis. The second phase classifies binary executables as either malicious or benign using a neural network model. I train the model using a dataset of malicious and benign ARM applications.
    摘要 嵌入式设备是为一种或少数几种用途而专门设计的设备,通常通过有线或无线连接成为更大系统的一部分。那些通过互联网与其他计算机或嵌入式系统相连的嵌入式设备被称为物联网(IoT)设备。由于使用广泛且防护不足,这些设备日益成为恶意软件攻击的目标。厂商在生产这些设备时常常为节省成本而偷工减料或配置不当,例如缺乏软件更新、端口未关闭或设计上的安全缺陷。虽然这些设备的性能可能不及普通计算机,但其庞大的数量使它们成为僵尸网络的理想候选。某些 IoT 设备甚至可能危及健康,因为连心脏起搏器也已联网。这意味着在缺乏足够防御的情况下,针对个人的定向攻击也是可能的。本论文项目的目标是借助机器学习算法和逆向工程工具为这些设备提供更好的安全保护。具体而言,我研究了可执行文件的控制流相关数据在恶意软件检测中的适用性,并提出一种分两阶段的检测方法:第一阶段通过静态二进制分析提取控制流相关数据;第二阶段使用神经网络模型将二进制可执行文件分类为恶意或良性。模型使用由恶意与良性 ARM 应用组成的数据集进行训练。
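
A sketch of the second phase only: a small neural classifier over fixed-length control-flow statistics. The first phase (static extraction of a control-flow graph from the ARM binary, e.g. with a disassembly/analysis framework) is summarized in comments rather than implemented; the feature list and network shape are assumptions, not the thesis's exact design.

```python
import torch
import torch.nn as nn

class CFGMalwareClassifier(nn.Module):
    """Phase 2 sketch: a small MLP over fixed-length control-flow statistics.

    Phase 1 (not shown) would statically extract a control-flow graph from the
    ARM binary and summarize it into features such as: number of basic blocks,
    number of edges, average block size, loop count, call-graph depth, etc.
    """
    def __init__(self, num_features=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 2),            # benign vs. malicious
        )

    def forward(self, x):
        return self.net(x)

model = CFGMalwareClassifier()
features = torch.randn(4, 16)            # 4 binaries, 16 CFG-derived features each
labels = torch.tensor([0, 1, 0, 1])
loss = nn.CrossEntropyLoss()(model(features), labels)
loss.backward()
```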

A Multi-In-Single-Out Network for Video Frame Interpolation without Optical Flow

  • paper_url: http://arxiv.org/abs/2311.11602
  • repo_url: https://github.com/J911/MISO-VFI
  • paper_authors: Jaemin Lee, Minseok Seo, Sangwoo Lee, Hyobin Park, Dong-Geol Choi
  • for: 本文提出一种视频帧插值方法,以提高视频帧插值的精度和效果。
  • methods: 这种方法不是使用传统的运动范围估计,而是使用多个输入帧并将其混合成一帧输出,从而更好地处理 occlusion 和非线性运动。此外,我们还提出了一种新的运动观察损失函数,使得这种方法更好地捕捉视频帧中的空间时间相关性。
  • results: 我们的方法在 Vimeo90K、Middlebury 和 UCF101 等视频帧 interpolate benchmark 上达到了状态机器的 результаты,与现有方法相比,具有显著的性能差距。
    Abstract In general, deep learning-based video frame interpolation (VFI) methods have predominantly focused on estimating motion vectors between two input frames and warping them to the target time. While this approach has shown impressive performance for linear motion between two input frames, it exhibits limitations when dealing with occlusions and nonlinear movements. Recently, generative models have been applied to VFI to address these issues. However, as VFI is not a task focused on generating plausible images, but rather on predicting accurate intermediate frames between two given frames, performance limitations still persist. In this paper, we propose a multi-in-single-out (MISO) based VFI method that does not rely on motion vector estimation, allowing it to effectively model occlusions and nonlinear motion. Additionally, we introduce a novel motion perceptual loss that enables MISO-VFI to better capture the spatio-temporal correlations within the video frames. Our MISO-VFI method achieves state-of-the-art results on VFI benchmarks Vimeo90K, Middlebury, and UCF101, with a significant performance gap compared to existing approaches.
    摘要 一般而言,基于深度学习的视频帧插值(VFI)方法主要通过估计两个输入帧之间的运动向量,并将其扭曲到目标时刻来生成中间帧。这种方式在两帧之间为线性运动时表现出色,但在处理遮挡和非线性运动时存在局限。近年来,生成模型被引入 VFI 以缓解这些问题;然而,VFI 的目标并非生成看似合理的图像,而是准确预测两帧之间的中间帧,因此性能仍然受限。本文提出一种多输入单输出(MISO)的 VFI 方法,它不依赖运动向量估计,因而能够有效建模遮挡与非线性运动。此外,我们引入一种新的运动感知损失,使 MISO-VFI 能更好地捕捉视频帧中的时空相关性。我们的方法在 Vimeo90K、Middlebury 和 UCF101 三个 VFI 基准上取得了当前最优结果,并与现有方法拉开了显著差距。

DesignGPT: Multi-Agent Collaboration in Design

  • paper_url: http://arxiv.org/abs/2311.11591
  • repo_url: None
  • paper_authors: Shiying Ding, Xinyi Chen, Yan Fang, Wenrui Liu, Yiwu Qiu, Chunlei Chai
  • for: 这项研究旨在应对生产设计过程中的生成AI面临的挑战,如界面使用性和交互模式。
  • methods: 研究人员采用了设计思维和设计过程,开发了多代理人合作框架DesignGPT,该框架使用人工智能代理人模拟设计公司不同职位的角色,并允许人类设计师与其进行自然语言协作。
  • results: 实验结果显示,相比单独使用的AI工具,DesignGPT可以提高设计师的表现, highlighting the potential of 将多代理人系统集成到产品方案设计中。
    Abstract Generative AI faces many challenges when entering the product design workflow, such as interface usability and interaction patterns. Therefore, based on design thinking and design process, we developed the DesignGPT multi-agent collaboration framework, which uses artificial intelligence agents to simulate the roles of different positions in the design company and allows human designers to collaborate with them in natural language. Experimental results show that compared with separate AI tools, DesignGPT improves the performance of designers, highlighting the potential of applying multi-agent systems that integrate design domain knowledge to product scheme design.
    摘要 生成式 AI 在进入产品设计流程时面临界面可用性和交互模式等诸多挑战。因此,基于设计思维和设计流程,我们开发了 DesignGPT 多智能体协作框架,利用人工智能智能体模拟设计公司中不同岗位的角色,让人类设计师能够以自然语言与它们协作。实验结果表明,相比单独使用的 AI 工具,DesignGPT 提升了设计师的表现,凸显了将融合设计领域知识的多智能体系统应用于产品方案设计的潜力。

Advancing Urban Renewal: An Automated Approach to Generating Historical Arcade Facades with Stable Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.11590
  • repo_url: None
  • paper_authors: Zheyuan Kuang, Jiaxin Zhang, Yiying Huang, Yunqin Li
  • for: 历史街区更新和转化过程中保留历史城市质感非常重要,特别是在知名的建筑和历史遗产地区。这些区域拥有多样化的建筑风格,传统上需要广泛的初步研究,常常导致主观的结果。
  • methods: 我们的研究引入了一种新的方法,利用稳定扩散模型(Stable Diffusion Models)来自动生成历史拱廊建筑图像,并通过文本描述来控制样式。我们分类和标记了多种拱廊风格,并构建了许多真实的拱廊建筑图像集。我们训练了多个低级 adaptation(LoRA)模型来控制生成图像的艺术性,并补充了ControlNet模型以提高精度和 AUTHENTICITY。
  • results: 我们的方法得到了高级别的精度、AUTHENTICITY和多样性,显示了在实际城市更新项目中的潜在潜力。这种新的方法可以更有效率地替代传统的城市更新设计过程,解决不authentic的图像细节、缺乏精度和限制的样式多样性问题。未来的研究可以考虑将这种二维图像生成技术与三维模型技术集成,为历史街区的建筑改造提供更全面的解决方案。
    Abstract Urban renewal and transformation processes necessitate the preservation of the historical urban fabric, particularly in districts known for their architectural and historical significance. These regions, with their diverse architectural styles, have traditionally required extensive preliminary research, often leading to subjective results. However, the advent of machine learning models has opened up new avenues for generating building facade images. Despite this, creating high-quality images for historical district renovations remains challenging, due to the complexity and diversity inherent in such districts. In response to these challenges, our study introduces a new methodology for automatically generating images of historical arcade facades, utilizing Stable Diffusion models conditioned on textual descriptions. By classifying and tagging a variety of arcade styles, we have constructed several realistic arcade facade image datasets. We trained multiple low-rank adaptation (LoRA) models to control the stylistic aspects of the generated images, supplemented by ControlNet models for improved precision and authenticity. Our approach has demonstrated high levels of precision, authenticity, and diversity in the generated images, showing promising potential for real-world urban renewal projects. This new methodology offers a more efficient and accurate alternative to conventional design processes in urban renewal, bypassing issues of unconvincing image details, lack of precision, and limited stylistic variety. Future research could focus on integrating this two-dimensional image generation with three-dimensional modeling techniques, providing a more comprehensive solution for renovating architectural facades in historical districts.
    摘要 城市更新与改造过程需要保留历史城市肌理,尤其是在以建筑和历史价值著称的街区。这些区域建筑风格多样,传统上需要大量前期调研,且结果往往带有主观性。机器学习模型的出现为生成建筑立面图像开辟了新途径;然而,由于此类街区固有的复杂性和多样性,为历史街区改造生成高质量图像仍具挑战。针对这些挑战,我们的研究提出了一种基于文本描述条件的稳定扩散(Stable Diffusion)模型自动生成历史拱廊立面图像的新方法。我们对多种拱廊风格进行分类与标注,构建了多个真实的拱廊立面图像数据集;训练了多个低秩适应(LoRA)模型来控制生成图像的风格,并辅以 ControlNet 模型以提升精度与真实性。我们的方法在生成图像的精度、真实性和多样性方面均表现出色,展现了在实际城市更新项目中的应用潜力。这一新方法规避了图像细节缺乏说服力、精度不足和风格单一等问题,为城市更新中的传统设计流程提供了更高效、更准确的替代方案。未来研究可将这种二维图像生成与三维建模技术相结合,为历史街区建筑立面改造提供更完整的解决方案。

Decoupled DETR For Few-shot Object Detection

  • paper_url: http://arxiv.org/abs/2311.11570
  • repo_url: None
  • paper_authors: Zeyu Shangguan, Lian Huai, Tong Liu, Xingqun Jiang
  • for: 提高几个样本 object detection(FSOD)的性能,解决数据匮乏问题。
  • methods: 提出一种基于DETR的FSOD模型,通过分离类别的参数和尝试不同类型的跳过连接来提高模型性能。
  • results: 在PASCAL VOC和MSCOCO等常用 dataset 上测试,提出的模型能够稳定地提高 Fine-tuning 和 meta-learning 下的性能,与最新的工作相比达到最高得分。
    Abstract Few-shot object detection (FSOD), an efficient method for addressing the severe data-hungry problem, has been extensively discussed. Current works have significantly advanced the problem in terms of model and data. However, the overall performance of most FSOD methods still does not fulfill the desired accuracy. In this paper we improve the FSOD model to address the severe issue of sample imbalance and weak feature propagation. To alleviate modeling bias from data-sufficient base classes, we examine the effect of decoupling the parameters for classes with sufficient data and classes with few samples in various ways. We design a base-novel categories decoupled DETR (DeDETR) for FSOD. We also explore various types of skip connection between the encoder and decoder for DETR. Besides, we notice that the best outputs could come from the intermediate layer of the decoder instead of the last layer; therefore, we build a unified decoder module that could dynamically fuse the decoder layers as the output feature. We evaluate our model on commonly used datasets such as PASCAL VOC and MSCOCO. Our results indicate that our proposed module could achieve stable improvements of 5% to 10% in both fine-tuning and meta-learning paradigms and has outperformed the highest score in recent works.
    摘要 少样本目标检测(FSOD)是缓解严重数据匮乏问题的一种高效方法,已被广泛讨论。现有工作在模型和数据层面都取得了显著进展,但大多数 FSOD 方法的整体性能仍未达到期望的精度。本文改进 FSOD 模型,以解决样本失衡和特征传播不足的问题。为减轻来自数据充足基类的建模偏差,我们以多种方式考察了将数据充足类别与少样本类别的参数解耦的效果,并设计了基类-新类解耦的 DETR(DeDETR)用于 FSOD。我们还探索了 DETR 编码器与解码器之间多种跳跃连接方式。此外,我们注意到最佳输出可能来自解码器的中间层而非最后一层,因此构建了一个统一的解码器模块,能够动态融合各解码器层作为输出特征。我们在 PASCAL VOC 和 MSCOCO 等常用数据集上评估模型,结果表明所提模块在微调和元学习两种范式下均可稳定带来 5% 至 10% 的提升,并超过了近期工作的最高分数。

Replay-enhanced Continual Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.11557
  • repo_url: None
  • paper_authors: Tiantian Zhang, Kevin Zehua Shen, Zichuan Lin, Bo Yuan, Xueqian Wang, Xiu Li, Deheng Ye
  • for: 避免 catastrophic forgetting 在 continual reinforcement learning 中
  • methods: 使用 replay-enhanced method,包括 adaptive normalization 和 policy distillation
  • results: 在 Continual World benchmark 上表现出色,比 purely perfect memory replay 好,并且与 state-of-the-art continual learning methods 相当或更好
    Abstract Replaying past experiences has proven to be a highly effective approach for averting catastrophic forgetting in supervised continual learning. However, some crucial factors are still largely ignored, making it vulnerable to serious failure, when used as a solution to forgetting in continual reinforcement learning, even in the context of perfect memory where all data of previous tasks are accessible in the current task. On the one hand, since most reinforcement learning algorithms are not invariant to the reward scale, the previously well-learned tasks (with high rewards) may appear to be more salient to the current learning process than the current task (with small initial rewards). This causes the agent to concentrate on those salient tasks at the expense of generality on the current task. On the other hand, offline learning on replayed tasks while learning a new task may induce a distributional shift between the dataset and the learned policy on old tasks, resulting in forgetting. In this paper, we introduce RECALL, a replay-enhanced method that greatly improves the plasticity of existing replay-based methods on new tasks while effectively avoiding the recurrence of catastrophic forgetting in continual reinforcement learning. RECALL leverages adaptive normalization on approximate targets and policy distillation on old tasks to enhance generality and stability, respectively. Extensive experiments on the Continual World benchmark show that RECALL performs significantly better than purely perfect memory replay, and achieves comparable or better overall performance against state-of-the-art continual learning methods.
    摘要 在有监督的持续学习中,回放过去经验已被证明是避免灾难性遗忘的高效方法。然而,当它被用作持续强化学习中遗忘问题的解决方案时,一些关键因素仍被忽视,即使在可以访问以往所有任务数据的完美记忆设定下,也容易出现严重失效。一方面,由于大多数强化学习算法对奖励尺度不具不变性,先前已学好的高奖励任务可能比当前初始奖励较小的任务对学习过程更具吸引力,导致智能体把注意力集中在这些显著任务上,而牺牲了当前任务上的泛化能力。另一方面,在学习新任务的同时对回放任务进行离线学习,可能造成数据集与旧任务上已学策略之间的分布偏移,从而引发遗忘。本文提出 RECALL,一种基于回放的增强方法,能显著提升现有回放方法在新任务上的可塑性,同时有效避免持续强化学习中灾难性遗忘的再次发生。RECALL 分别利用对近似目标的自适应归一化和对旧任务的策略蒸馏来增强泛化性与稳定性。在 Continual World 基准上的大量实验表明,RECALL 的表现显著优于单纯的完美记忆回放,并与最先进的持续学习方法相比取得相当或更好的整体性能。
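
A loose stand-in for how the two ingredients could be combined in a single training step: the current-task loss is rescaled by a running return scale (a crude form of adaptive normalization), and behaviour on replayed tasks is anchored by distilling the stored policy's action distribution. This is not the paper's exact formulation; all names and the weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def recall_style_loss(new_task_loss, policy_logits_old_tasks, teacher_logits_old_tasks,
                      running_return_std, distill_weight=1.0):
    """Combine the current-task RL objective with distillation on replayed tasks.

    - the new-task loss is divided by a running return scale so that previously
      well-learned, high-reward tasks do not dominate the update;
    - the policy's action distribution on replayed states is pulled towards the
      stored (teacher) policy via a KL term (policy distillation).
    """
    normalized_new_loss = new_task_loss / (running_return_std + 1e-8)
    distill_loss = F.kl_div(
        F.log_softmax(policy_logits_old_tasks, dim=-1),
        F.softmax(teacher_logits_old_tasks, dim=-1),
        reduction="batchmean",
    )
    return normalized_new_loss + distill_weight * distill_loss

# toy usage
loss = recall_style_loss(torch.tensor(1.3), torch.randn(8, 4), torch.randn(8, 4),
                         running_return_std=5.0)
```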

Exploring Prompting Large Language Models as Explainable Metrics

  • paper_url: http://arxiv.org/abs/2311.11552
  • repo_url: https://github.com/ghazaleh-mahmoodi/Prompting_LLMs_AS_Explainable_Metrics
  • paper_authors: Ghazaleh Mahmoudi
  • for: 提出了一种针对自然语言处理(NLP)领域摘要任务的可解释评估策略,使用大语言模型(LLMs)作为评估指标。
  • methods: 提出了零样本的基于提示的策略,并在实验中同时采用了少样本和零样本两种设定。
  • results: 实验结果表明,LLM 在 NLP 领域、特别是摘要任务中具有作为可解释评估指标的良好潜力,在少样本和零样本设定下均可取得可观的性能。基于提示的策略中表现最佳的提示,在文本摘要任务的测试数据上与人工评估的 Kendall 相关系数达到 0.477。代码和结果已在 GitHub 上公开。
    Abstract This paper describes the IUST NLP Lab submission to the Prompting Large Language Models as Explainable Metrics Shared Task at the Eval4NLP 2023 Workshop on Evaluation & Comparison of NLP Systems. We have proposed a zero-shot prompt-based strategy for explainable evaluation of the summarization task using Large Language Models (LLMs). The conducted experiments demonstrate the promising potential of LLMs as evaluation metrics in Natural Language Processing (NLP), particularly in the field of summarization. Both few-shot and zero-shot approaches are employed in these experiments. The performance of our best provided prompts achieved a Kendall correlation of 0.477 with human evaluations in the text summarization task on the test data. Code and results are publicly available on GitHub.
    摘要 本文介绍了 IUST NLP 实验室提交给 Eval4NLP 2023 研讨会 "Prompting Large Language Models as Explainable Metrics" 共享任务的方案。我们提出了一种零样本的基于提示的策略,用于对摘要任务进行可解释评估。实验证明,大语言模型(LLM)作为自然语言处理(NLP)领域、特别是摘要任务的评估指标具有可观的潜力。实验同时采用了少样本与零样本两种设定。我们提供的最佳提示在文本摘要任务的测试数据上与人工评估取得了 0.477 的 Kendall 相关系数。代码与结果已在 GitHub 上公开。

Which AI Technique Is Better to Classify Requirements? An Experiment with SVM, LSTM, and ChatGPT

  • paper_url: http://arxiv.org/abs/2311.11547
  • repo_url: None
  • paper_authors: Abdelkarim El-Hajjami, Nicolas Fafin, Camille Salinesi
  • for: 这paper的目的是评估大自然语言模型(LLM)在需求工程(RE)中的应用,特别是在需求分类方面。
  • methods: 本paper使用了多种实验评估了三种ChatGPT模型(text-davinci-003、gpt-3.5-turbo、gpt-4)在零shot和几shot Setting下的表现,并进行了对比分析。
  • results: 研究发现,ChatGPT在需求分类方面的表现比LSTM更好,但在分类非功能需求(NFR)方面,SVM的表现更好。此外,研究发现在大多数情况下,几shot Setting不一定能够提高表现,有时甚至会下降。I hope that helps! Let me know if you have any other questions.
    Abstract Context and motivation: Recently, Large Language Models (LLMs) like ChatGPT have demonstrated remarkable proficiency in various Natural Language Processing (NLP) tasks. Their application in Requirements Engineering (RE), especially in requirements classification, has gained increasing interest. Question/problem: In our research, we conducted an extensive empirical evaluation of ChatGPT models including text-davinci-003, gpt-3.5-turbo, and gpt-4 in both zero-shot and few-shot settings for requirements classification. The question arises as to how these models compare to traditional classification methods, specifically Support Vector Machine (SVM) and Long Short-Term Memory (LSTM). Principal ideas/results: Based on five diverse datasets, our results show that ChatGPT consistently outperforms LSTM, and while ChatGPT is more effective than SVM in classifying functional requirements (FR), SVM is better in classifying non-functional requirements (NFR). Our results also show that contrary to our expectations, the few-shot setting does not always lead to enhanced performance; in most instances, it was found to be suboptimal. Contribution: Our findings underscore the potential of LLMs in the RE domain, suggesting that they could play a pivotal role in future software engineering processes, particularly as tools to enhance requirements classification.
    摘要 Context and motivation: 近些年来,大型自然语言模型(LLM)如ChatGPT在自然语言处理(NLP)方面的表现很出色,其应用于需求工程(RE)领域,特别是需求分类,引起了越来越多的关注。问题/问题:在我们的研究中,我们进行了大量的实验性评估,包括文本达尔文-003、gpt-3.5-turbo和gpt-4等ChatGPT模型,在零shot和几shot设置下进行了需求分类。问题是如何与传统的分类方法相比,特别是支持向量机(SVM)和长期记忆(LSTM)?主要想法/结果:根据五个多样化的数据集,我们的结果显示,ChatGPT在FR和NFR之间的分类表现相对较好,而SVM在NFR中的表现更好。此外,我们发现,在大多数情况下,几shot设置不一定是最佳的,反而有时会下降性能。贡献:我们的发现表明了LLM在RE领域的潜力, suggesting that they could play a crucial role in future software engineering processes, particularly as tools to enhance requirements classification.

Data-driven project planning: An integrated network learning and constraint relaxation approach in favor of scheduling

  • paper_url: http://arxiv.org/abs/2311.11542
  • repo_url: None
  • paper_authors: Izack Cohen
  • for: 这项研究旨在支持数据驱动的项目规划,帮助项目规划人员在项目规划和资源调配方面做出更好的决策。
  • methods: 该研究提出了一种数据驱动的项目规划方法,包括从历史记录中学习项目网络,发现项目中的时间约束,并在多个项目计划变体中寻找最佳的项目计划。
  • results: 该研究使用两个实际项目数据集,显示了该方法可以为项目规划人员提供显著的灵活性(最多减少项目 kritical path 26%),以便调整项目计划和时间表。
    Abstract Our focus is on projects, i.e., business processes, which are emerging as the economic drivers of our times. Differently from day-to-day operational processes that do not require detailed planning, a project requires planning and resource-constrained scheduling for coordinating resources across sub- or related projects and organizations. A planner in charge of project planning has to select a set of activities to perform, determine their precedence constraints, and schedule them according to temporal project constraints. We suggest a data-driven project planning approach for classes of projects such as infrastructure building and information systems development projects. A project network is first learned from historical records. The discovered network relaxes temporal constraints embedded in individual projects, thus uncovering where planning and scheduling flexibility can be exploited for greater benefit. Then, the network, which contains multiple project plan variations, from which one has to be selected, is enriched by identifying decision rules and frequent paths. The planner can rely on the project network for: 1) decoding a project variation such that it forms a new project plan, and 2) applying resource-constrained project scheduling procedures to determine the project's schedule and resource allocation. Using two real-world project datasets, we show that the suggested approach may provide the planner with significant flexibility (up to a 26% reduction of the critical path of a real project) to adjust the project plan and schedule. We believe that the proposed approach can play an important part in supporting decision making towards automated data-driven project planning.
    摘要 首先,从历史记录中学习得到项目网络。所发现的网络放宽了单个项目中内嵌的时间约束,从而揭示出可以为获得更大收益而利用的规划与调度灵活性。随后,通过识别决策规则和高频路径,对这一包含多个候选项目计划变体(需从中选择其一)的网络加以丰富。规划者可以依赖该项目网络:(1)解码某一项目变体,使其形成新的项目计划;(2)应用资源受限的项目调度方法,确定项目的进度与资源分配。基于两个真实项目数据集,我们表明所提方法可为规划者提供显著的灵活性(某真实项目的关键路径最多可缩短 26%),以调整项目计划与进度。我们相信,该方法能够在迈向自动化数据驱动项目规划的决策支持中发挥重要作用。

A New Approach to Intuitionistic Fuzzy Decision Making Based on Projection Technology and Cosine Similarity Measure

  • paper_url: http://arxiv.org/abs/2311.11539
  • repo_url: None
  • paper_authors: Jing Yang, Wei Su
  • for: 这个论文的目的是提出一种基于投影技术和偏好相似度度量的多属性决策方法和医疗诊断方法,用于处理基于Intuitionistic Fuzzy Set(IFS)的不确定和不完整信息。
  • methods: 该方法使用了投影技术和偏好相似度度量来评估IFS之间的相似性。具体来说,该方法首先将IFS转换成一个高维空间中的点集,然后使用投影技术将这些点集投影到一个低维空间中,最后使用偏好相似度度量来评估这些点集之间的相似性。
  • results: Compared with existing methods, the proposed method identifies the optimal scheme more accurately. In the medical diagnosis area, it can be used to quickly diagnose diseases. The proposed method enriches existing similarity measures and can be applied not only to IFSs but also to interval-valued intuitionistic fuzzy sets (IVIFSs).
    Abstract For a multi-attribute decision making (MADM) problem, the information of alternatives under different attributes is given in the form of intuitionistic fuzzy numbers (IFNs). The intuitionistic fuzzy set (IFS) plays an important role in dealing with uncertain and incomplete information. The similarity measure of intuitionistic fuzzy sets (IFSs) has always been a research hotspot. A new similarity measure of IFSs based on the projection technology and the cosine similarity measure, which considers the direction and length of IFSs at the same time, is first proposed in this paper. The objective of the presented paper is to develop a MADM method and a medical diagnosis method under IFSs using the projection technology and the cosine similarity measure. Some examples are used to illustrate the comparison between the proposed algorithm and some existing methods. The comparison results show that the proposed algorithm is effective and can identify the optimal scheme accurately. In the medical diagnosis area, it can be used to quickly diagnose diseases. The proposed method enriches the existing similarity measure methods and can be applied not only to IFSs, but also to other interval-valued intuitionistic fuzzy sets (IVIFSs).
    摘要 对于多属性决策(MADM)问题,各备选方案在不同属性下的信息以直觉模糊数(IFN)的形式给出。直觉模糊集(IFS)在处理不确定和不完整信息方面发挥着重要作用,IFS 的相似性度量一直是研究热点。本文首次提出一种基于投影技术与余弦相似度、同时考虑 IFS 方向与长度的新相似性度量,并据此发展出基于 IFS 的多属性决策方法和医学诊断方法。文中通过若干示例比较了所提算法与现有方法,结果表明所提算法有效,能够准确识别最优方案;在医学诊断领域,它可用于快速诊断疾病。所提方法丰富了现有的相似性度量方法,不仅适用于 IFS,也适用于区间直觉模糊集(IVIFS)。
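
To make the ingredients concrete, here is a small sketch of a cosine similarity and a projection for IFSs represented as (membership, non-membership) pairs. The paper's actual combined projection-cosine measure is not specified in the abstract, so the formulas below are standard textbook forms used purely for illustration.

```python
import math

def ifs_cosine(a, b):
    """Cosine similarity between two IFSs given as lists of (membership, non-membership)."""
    total = 0.0
    for (mu_a, nu_a), (mu_b, nu_b) in zip(a, b):
        num = mu_a * mu_b + nu_a * nu_b
        den = math.hypot(mu_a, nu_a) * math.hypot(mu_b, nu_b)
        total += num / den if den else 1.0
    return total / len(a)

def ifs_projection(a, b):
    """Projection of IFS a onto IFS b, treating each IFS as a flat vector (captures length)."""
    va = [x for pair in a for x in pair]
    vb = [x for pair in b for x in pair]
    dot = sum(x * y for x, y in zip(va, vb))
    norm_b = math.sqrt(sum(y * y for y in vb))
    return dot / norm_b if norm_b else 0.0

# toy alternatives under two attributes, compared with an ideal alternative
ideal = [(1.0, 0.0), (1.0, 0.0)]
for name, alt in [("alt1", [(0.7, 0.2), (0.6, 0.3)]), ("alt2", [(0.5, 0.4), (0.8, 0.1)])]:
    print(name, "cosine:", round(ifs_cosine(alt, ideal), 3),
          "projection:", round(ifs_projection(alt, ideal), 3))
```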

Assessing Prompt Injection Risks in 200+ Custom GPTs

  • paper_url: http://arxiv.org/abs/2311.11538
  • repo_url: None
  • paper_authors: Jiahao Yu, Yuhang Wu, Dong Shu, Mingyu Jin, Xinyu Xing
  • for: 这研究旨在阐述用户自定义GPT模型时存在的安全漏洞,以及这些漏洞的可能的 Mitigation 策略。
  • methods: 该研究通过对超过200个用户自定义GPT模型进行了广泛的靶场测试,并通过对这些系统的敏感Prompt进行了攻击,以证明这些系统容易受到攻击。
  • results: 该研究发现,通过提示攻击,敏感Prompt可以被提取,并且可以访问上传的文件。这些发现告诉我们,在设计和部署自定义GPT模型时,需要建立Robust的安全框架,以避免安全和隐私问题。
    Abstract In the rapidly evolving landscape of artificial intelligence, ChatGPT has been widely used in various applications. The new feature: customization of ChatGPT models by users to cater to specific needs has opened new frontiers in AI utility. However, this study reveals a significant security vulnerability inherent in these user-customized GPTs: prompt injection attacks. Through comprehensive testing of over 200 user-designed GPT models via adversarial prompts, we demonstrate that these systems are susceptible to prompt injections. Through prompt injection, an adversary can not only extract the customized system prompts but also access the uploaded files. This paper provides a first-hand analysis of the prompt injection, alongside the evaluation of the possible mitigation of such attacks. Our findings underscore the urgent need for robust security frameworks in the design and deployment of customizable GPT models. The intent of this paper is to raise awareness and prompt action in the AI community, ensuring that the benefits of GPT customization do not come at the cost of compromised security and privacy.
    摘要 在人工智能领域的快速发展中,ChatGPT 已被广泛应用于多种场景。其新特性——允许用户自定义 GPT 模型以满足特定需求——为 AI 的实用性开辟了新前沿。然而,本研究揭示了这些用户自定义 GPT 中固有的一个重要安全漏洞:提示注入攻击。通过使用对抗性提示对 200 多个用户设计的 GPT 模型进行全面测试,我们证实这些系统易受提示注入攻击。通过提示注入,攻击者不仅可以提取自定义的系统提示,还可以访问上传的文件。本文对提示注入进行了第一手分析,并评估了可能的缓解措施。我们的发现强调在设计和部署可自定义 GPT 模型时,迫切需要健全的安全框架。本文旨在提高人工智能社区的安全意识并促使其采取行动,确保 GPT 自定义的益处不会以牺牲安全与隐私为代价。

ADAPTER-RL: Adaptation of Any Agent using Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.11537
  • repo_url: None
  • paper_authors: Yizhao Jin, Greg Slabaugh, Simon Lucas
  • for: 该论文旨在探讨 reinforcement learning (DRL) Agent 如何适应外部任务,并解决过拟合、忘记和样本不足等问题。
  • methods: 该论文提出了一种基于 adapter 的适应策略,并在 nanoRTS 环境中进行实验,证明了该策略可以提高基础 Agent 的训练效率和性能。
  • results: 该论文的实验结果表明,使用 adapter 可以提高基础 Agent 的性能,并且可以与先前训练过的神经网络和规则基础 Agent 集成,以便捕捉人类专家的知识。
    Abstract Deep Reinforcement Learning (DRL) agents frequently face challenges in adapting to tasks outside their training distribution, including issues with over-fitting, catastrophic forgetting and sample inefficiency. Although the application of adapters has proven effective in supervised learning contexts such as natural language processing and computer vision, their potential within the DRL domain remains largely unexplored. This paper delves into the integration of adapters in reinforcement learning, presenting an innovative adaptation strategy that demonstrates enhanced training efficiency and improvement of the base-agent, experimentally in the nanoRTS environment, a real-time strategy (RTS) game simulation. Our proposed universal approach is not only compatible with pre-trained neural networks but also with rule-based agents, offering a means to integrate human expertise.
    摘要 深度强化学习(DRL)智能体在适应其训练分布之外的任务时常常面临挑战,包括过拟合、灾难性遗忘和样本效率低下等问题。尽管适配器(adapter)在自然语言处理和计算机视觉等有监督学习场景中已被证明有效,但其在 DRL 领域的潜力仍很少被探索。本文研究在强化学习中集成适配器,提出了一种创新的适应策略,并在即时战略(RTS)游戏模拟环境 nanoRTS 中通过实验证明其能提升训练效率并改进基础智能体的表现。我们提出的通用方法不仅兼容预训练神经网络,也适用于基于规则的智能体,为融入人类专家知识提供了一种途径。
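
A minimal PyTorch sketch of the adapter idea applied to a frozen base agent: a zero-initialized bottleneck module is inserted on top of the base features, so the adapted policy starts identical to the base one and only the small adapter (plus head) is trained on the new task. The shapes and the placement of the adapter are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: small trainable module added to a frozen base agent."""
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)   # start as identity so the base policy is unchanged
        nn.init.zeros_(self.up.bias)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))

class AdaptedPolicy(nn.Module):
    def __init__(self, base_trunk, hidden_dim, n_actions):
        super().__init__()
        self.base = base_trunk
        for p in self.base.parameters():
            p.requires_grad_(False)       # only the adapter and head are trained with RL
        self.adapter = Adapter(hidden_dim)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs):
        h = self.base(obs)                # base agent produces a feature vector
        return self.head(self.adapter(h))

base = nn.Sequential(nn.Linear(8, 64), nn.ReLU())   # stand-in for a pre-trained agent trunk
policy = AdaptedPolicy(base, hidden_dim=64, n_actions=4)
logits = policy(torch.randn(2, 8))
```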

Optimal Hyperparameter $ε$ for Adaptive Stochastic Optimizers through Gradient Histograms

  • paper_url: http://arxiv.org/abs/2311.11532
  • repo_url: None
  • paper_authors: Gustavo Silva, Paul Rodriguez
  • for: 这个论文的目的是为了研究和分析适应性优化器的优化器论文,尤其是 Adam 优化器的一些小优化器参数的影响。
  • methods: 该论文使用了 gradient histogram Framework 来分析和证明适应性优化器的优化器参数的优化性和关系。
  • results: 该论文提出了一种基于 gradient histogram 的新算法,可以自动优化 safeguard 参数 $\epsilon$ 的搜索空间,以便更好地找到最佳值。
    Abstract Optimizers are essential components for successfully training deep neural network models. In order to achieve the best performance from such models, designers need to carefully choose the optimizer hyperparameters. However, this can be a computationally expensive and time-consuming process. Although it is known that all optimizer hyperparameters must be tuned for maximum performance, there is still a lack of clarity regarding the individual influence of minor priority hyperparameters, including the safeguard factor $\epsilon$ and momentum factor $\beta$, in leading adaptive optimizers (specifically, those based on the Adam optimizers). In this manuscript, we introduce a new framework based on gradient histograms to analyze and justify important attributes of adaptive optimizers, such as their optimal performance and the relationships and dependencies among hyperparameters. Furthermore, we propose a novel gradient histogram-based algorithm that automatically estimates a reduced and accurate search space for the safeguard hyperparameter $\epsilon$, where the optimal value can be easily found.
    摘要 优化器是成功训练深度神经网络模型的关键组件。为使模型达到最佳性能,设计者需要仔细选择优化器超参数,但这一过程往往计算代价高且耗时。尽管已知所有优化器超参数都需要调优才能获得最佳性能,但对于次要超参数——包括主流自适应优化器(特别是基于 Adam 的优化器)中的保护因子 $\epsilon$ 和动量因子 $\beta$——各自的影响仍缺乏清晰认识。本文提出一种基于梯度直方图的新框架,用于分析并论证自适应优化器的重要属性,例如其最优性能以及各超参数之间的关系与依赖。此外,我们提出一种基于梯度直方图的新算法,能够自动估计保护超参数 $\epsilon$ 的缩小且准确的搜索空间,从而轻松找到其最优值。
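
A rough sketch of how gradient statistics could bound a search range for $\epsilon$: since $\epsilon$ only matters relative to the magnitude of the root second moment, a histogram of gradient magnitudes delimits where it can influence the Adam update. The percentiles and the use of raw gradient magnitudes are assumptions for illustration; the paper's algorithm is more principled than this.

```python
import torch

def epsilon_search_range(model, loss_fn, data_batches, percentiles=(1, 50)):
    """Histogram-based heuristic: collect |g| statistics and suggest an epsilon range."""
    mags = []
    for x, y in data_batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for p in model.parameters():
            if p.grad is not None:
                mags.append(p.grad.detach().abs().flatten())
    mags = torch.cat(mags)
    # gradient histogram on a log scale, purely for inspection
    hist = torch.histc(torch.log10(mags + 1e-12), bins=50)
    lo = torch.quantile(mags, percentiles[0] / 100.0).item()
    hi = torch.quantile(mags, percentiles[1] / 100.0).item()
    return (lo, hi), hist

# toy usage on a small model
model = torch.nn.Linear(4, 1)
batches = [(torch.randn(16, 4), torch.randn(16, 1)) for _ in range(3)]
(eps_lo, eps_hi), _ = epsilon_search_range(model, torch.nn.MSELoss(), batches)
print(f"suggested epsilon search range: [{eps_lo:.2e}, {eps_hi:.2e}]")
```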

GPT in Data Science: A Practical Exploration of Model Selection

  • paper_url: http://arxiv.org/abs/2311.11516
  • repo_url: None
  • paper_authors: Nathalia Nascimento, Cristina Tavares, Paulo Alencar, Donald Cowan
  • for: 本研究旨在探讨大语言模型(LLMs)在处理结构化数据和增强数据科学过程中的可能性,以及这种结合的可靠性和决策方法的问题。
  • methods: 本研究使用了一种变量模型来描述这些因素,并使用了小型数据集来评估模型和实现的规则。
  • results: 研究发现,GPT-4的模型选择建议受到了多种因素的指导,包括数据的性质、问题的类型、性能指标、计算资源、可解释性 vs 准确性、数据假设和伦理考虑。这些因素的权重不同,决定了模型选择的结果。
    Abstract There is an increasing interest in leveraging Large Language Models (LLMs) for managing structured data and enhancing data science processes. Despite the potential benefits, this integration poses significant questions regarding their reliability and decision-making methodologies. It highlights the importance of various factors in the model selection process, including the nature of the data, problem type, performance metrics, computational resources, interpretability vs accuracy, assumptions about data, and ethical considerations. Our objective is to elucidate and express the factors and assumptions guiding GPT-4's model selection recommendations. We employ a variability model to depict these factors and use toy datasets to evaluate both the model and the implementation of the identified heuristics. By contrasting these outcomes with heuristics from other platforms, our aim is to determine the effectiveness and distinctiveness of GPT-4's methodology. This research is committed to advancing our comprehension of AI decision-making processes, especially in the realm of model selection within data science. Our efforts are directed towards creating AI systems that are more transparent and comprehensible, contributing to a more responsible and efficient practice in data science.
    摘要 随着大型语言模型(LLMs)在数据管理和数据科学过程中的潜在应用,人们对其可靠性和决策方法的问题日益增加。这些问题提出了许多因素在选择模型过程中的重要性,包括数据的性质、问题的类型、性能指标、计算资源、解释性vs准确率、数据假设以及伦理考虑。我们的目标是解释和表达引导GPT-4选择模型的因素和假设。我们使用变量模型来描述这些因素,并使用小型数据集来评估模型和标记出的规则。通过与其他平台的假设进行比较,我们的目标是确定GPT-4的方法的效果和特点。这项研究旨在提高我们对人工智能决策过程的理解,特别是数据科学中模型选择的领域。我们的努力是创造更加透明和可读的AI系统,以便更负责任和高效地实践数据科学。

MultiLoRA: Democratizing LoRA for Better Multi-Task Learning

  • paper_url: http://arxiv.org/abs/2311.11501
  • repo_url: None
  • paper_authors: Yiming Wang, Yu Lin, Xiaodong Zeng, Guannan Zhang
  • for: 这个论文的目的是提高LoRA模型在多任务场景中的适应性和性能。
  • methods: 通过水平扩展 LoRA 模块并改变适应矩阵的参数初始化来降低顶部奇异向量的主导作用,从而获得更均衡的酉子空间,实现更好的多任务适应。
  • results: 与单个 LoRA 及全参数微调相比,MultiLoRA 在多个基准和模型规模上表现更优,而仅增加 2.5% 的参数。进一步分析发现,MultiLoRA 的权重更新矩阵对顶部奇异向量的依赖降低,酉变换的贡献更加均衡。
    Abstract LoRA achieves remarkable resource efficiency and comparable performance when adapting LLMs for specific tasks. Since ChatGPT demonstrated superior performance on various tasks, there has been a growing desire to adapt one model for all tasks. However, the explicit low-rank of LoRA limits the adaptation performance in complex multi-task scenarios. LoRA is dominated by a small number of top singular vectors while fine-tuning decomposes into a set of less important unitary transforms. In this paper, we propose MultiLoRA for better multi-task adaptation by reducing the dominance of top singular vectors observed in LoRA. MultiLoRA scales LoRA modules horizontally and change parameter initialization of adaptation matrices to reduce parameter dependency, thus yields more balanced unitary subspaces. We unprecedentedly construct specialized training data by mixing datasets of instruction follow, natural language understanding, world knowledge, to cover semantically and syntactically different samples. With only 2.5% of additional parameters, MultiLoRA outperforms single LoRA counterparts and fine-tuning on multiple benchmarks and model scales. Further investigation into weight update matrices of MultiLoRA exhibits reduced dependency on top singular vectors and more democratic unitary transform contributions.
    摘要 LoRA 在将大语言模型(LLM)适配到特定任务时,能以出色的资源效率取得可相比的性能。自 ChatGPT 在多种任务上展现出优异表现以来,人们越来越希望用一个模型适配所有任务。然而,LoRA 显式的低秩特性限制了其在复杂多任务场景中的适应性能:LoRA 的更新被少数顶部奇异向量所主导,而微调则可分解为一组相对次要的酉变换。本文提出 MultiLoRA,通过降低 LoRA 中观察到的顶部奇异向量的主导作用来改进多任务适应。MultiLoRA 对 LoRA 模块进行水平扩展,并改变适应矩阵的参数初始化以减少参数依赖,从而得到更均衡的酉子空间。我们还首次混合指令跟随、自然语言理解和世界知识等数据集,构建了语义与句法上差异显著的专门训练数据。仅增加 2.5% 的参数,MultiLoRA 便在多个基准和模型规模上超越了单个 LoRA 及全参数微调。对 MultiLoRA 权重更新矩阵的进一步分析显示,其对顶部奇异向量的依赖降低,酉变换的贡献更为均衡。
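
A toy PyTorch layer showing the horizontal-scaling idea: several small LoRA branches are summed with learnable scalings on top of a frozen linear layer, so no single rank-limited update has to dominate. The initialization changes the paper proposes are not reproduced here; standard LoRA-style init is used as a stand-in.

```python
import torch
import torch.nn as nn

class MultiLoRALinear(nn.Module):
    """Frozen linear layer augmented with several parallel LoRA branches."""
    def __init__(self, in_features, out_features, rank=4, n_modules=3):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        self.down = nn.ModuleList(nn.Linear(in_features, rank, bias=False) for _ in range(n_modules))
        self.up = nn.ModuleList(nn.Linear(rank, out_features, bias=False) for _ in range(n_modules))
        self.scales = nn.Parameter(torch.ones(n_modules) / n_modules)
        for u in self.up:
            nn.init.zeros_(u.weight)      # start with no change to the frozen base model

    def forward(self, x):
        out = self.base(x)
        for s, d, u in zip(self.scales, self.down, self.up):
            out = out + s * u(d(x))       # sum of several small low-rank updates
        return out

layer = MultiLoRALinear(128, 128)
y = layer(torch.randn(2, 128))
```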

Interpretability in Machine Learning: on the Interplay with Explainability, Predictive Performances and Models

  • paper_url: http://arxiv.org/abs/2311.11491
  • repo_url: None
  • paper_authors: Benjamin Leblanc, Pascal Germain
  • for: 提高机器学习模型的理解和应用
  • methods: 通过对机器学习术语的分析和讨论,摘要了对机器学习模型的理解和应用的关系
  • results: 挑战了一些关于机器学习模型的理解和应用的假设和误解,并提出了一些新的思路和方法来提高机器学习模型的理解和应用
    Abstract Interpretability has recently gained attention in the field of machine learning, for it is crucial when it comes to high-stakes decisions or troubleshooting. This abstract concept is hard to grasp and has been associated, over time, with many labels and preconceived ideas. In this position paper, in order to clarify some misunderstandings regarding interpretability, we discuss its relationship with significant concepts in machine learning: explainability, predictive performances, and machine learning models. For instance, we challenge the idea that interpretability and explainability are substitutes to one another, or that a fixed degree of interpretability can be associated with a given machine learning model.
    摘要 可解释性近来在机器学习领域受到关注,因为它在高风险决策或问题排查中至关重要。这一抽象概念难以把握,长期以来被贴上了各种标签并伴随着先入为主的观念。在这篇立场论文中,为澄清有关可解释性的一些误解,我们讨论了它与机器学习中几个重要概念之间的关系:可说明性(explainability)、预测性能以及机器学习模型。例如,我们质疑可解释性与可说明性可以彼此替代的观点,也质疑某一机器学习模型可对应固定程度可解释性的看法。

A Multi-Center Study on the Adaptability of a Shared Foundation Model for Electronic Health Records

  • paper_url: http://arxiv.org/abs/2311.11483
  • repo_url: None
  • paper_authors: Lin Lawrence Guo, Jason Fries, Ethan Steinberg, Scott Lanyon Fleming, Keith Morse, Catherine Aftandilian, Jose Posada, Nigam Shah, Lillian Sung
  • for: 这项研究旨在检验基础模型能否在不同医院之间共享和适配,以提高 AI 在医疗领域的可扩展性和成本效益。
  • methods: 研究使用了一个已发布的结构化电子健康记录基础模型($FM_{SM}$),该模型基于斯坦福医学中心 257 万名患者的纵向医疗记录数据训练;并使用两家不同医院的 EHR 数据集(The Hospital for Sick Children 与 MIMIC-IV)评估其适配能力。
  • results: 研究发现,通过在本地数据上继续预训练,可以在无需大量本地训练数据的情况下大幅提升基础模型的适配性和任务适应性;基础模型在 8 个临床预测任务上的表现也证明了其跨医院的适应能力。
    Abstract Foundation models hold promise for transforming AI in healthcare by providing modular components that are easily adaptable to downstream healthcare tasks, making AI development more scalable and cost-effective. Structured EHR foundation models, trained on coded medical records from millions of patients, demonstrated benefits including increased performance with fewer training labels, and improved robustness to distribution shifts. However, questions remain on the feasibility of sharing these models across different hospitals and their performance for local task adaptation. This multi-center study examined the adaptability of a recently released structured EHR foundation model ($FM_{SM}$), trained on longitudinal medical record data from 2.57M Stanford Medicine patients. Experiments were conducted using EHR data at The Hospital for Sick Children and MIMIC-IV. We assessed both adaptability via continued pretraining on local data, and task adaptability compared to baselines of training models from scratch at each site, including a local foundation model. We evaluated the performance of these models on 8 clinical prediction tasks. In both datasets, adapting the off-the-shelf $FM_{SM}$ matched the performance of GBM models locally trained on all data while providing a 13% improvement in settings with few task-specific training labels. With continued pretraining on local data, label efficiency substantially improved, such that $FM_{SM}$ required fewer than 1% of training examples to match the fully trained GBM's performance. Continued pretraining was also 60 to 90% more sample-efficient than training local foundation models from scratch. Our findings show that adapting shared EHR foundation models across hospitals provides improved prediction performance at less cost, underscoring the utility of base foundation models as modular components to streamline the development of healthcare AI.
    摘要 基础模型在医疗领域的应用拥有潜在的优势,它们可以提供可重用的组件,使AI开发更具可扩展性和成本效益。使用结构化医疗记录数据训练的基础模型已经在医疗任务上显示出多种优点,包括使用更少的训练标签和对分布偏移更好的鲁棒性。然而,这些模型能否在不同医院之间共享,以及其在本地任务适配上的性能仍然存在问题。本多中心研究检验了这种基础模型($FM_{SM}$)的适应性,并在The Hospital for Sick Children和MIMIC-IV的EHR数据上进行实验。我们通过在本地数据上继续预训练来评估其适应性和任务适配能力,并与在各院从零训练的模型(包括本地基础模型)进行比较,共评估8种临床预测任务。在两个数据集中,直接适配现成的$FM_{SM}$即可达到在全部数据上本地训练的GBM模型的性能,并在任务特定训练标签较少的情况下带来13%的提升。通过在本地数据上继续预训练,标签效率显著提高,$FM_{SM}$只需不到1%的训练样本即可匹配完全训练的GBM模型的性能;继续预训练也比从零训练本地基础模型的样本效率高60%到90%。我们的发现表明,在医疗机构之间共享和适配基础模型可以以更低的成本提供更好的预测性能,凸显了基础模型作为模块化组件、助力医疗AI开发的价值。

Meta Prompting for AGI Systems

  • paper_url: http://arxiv.org/abs/2311.11482
  • repo_url: https://github.com/meta-prompting/meta-prompting
  • paper_authors: Yifan Zhang
  • for: 本研究探讨了一种新的技术——Meta Prompting,它改变了大型语言模型(LLM)、多 modal基础模型和人工智能系统如何解决问题和 интерпретирова数据的方式。
  • methods: Meta Prompting 基于类型理论(type theory)和范畴理论(category theory),强调信息的结构和语法,提供了一种独特的框架,超越传统以内容为中心的方法。
  • results: 本研究表明,Meta Prompting 在多种 AI 应用中具有优势,特别是在复杂的问题解决方面。它能够将复杂的问题破解成可管理的子问题,实现了Token效率和 fair comparison在问题解决方面的优势。此外,本研究还推广了 Meta Prompting 到多 modal基础模型设置,实现了不同数据类型的集成,如图像、音频和视频,并提出了在这些数据类型之间的挑战和潜在的应用前景。
    Abstract This paper presents an in-depth exploration of Meta Prompting, a novel technique that revolutionizes the way large language models (LLMs), multi-modal foundation models, and AI systems approach problem-solving and data interpretation. Meta Prompting, rooted in type theory and category theory, prioritizes the structure and syntax of information, providing a unique framework that transcends traditional content-focused methods. We delve into the formal definitions of Meta Prompting, contrasting it with Few-Shot Prompting, and highlight its applicability and superiority in various AI applications. Key to this exploration is the expansion of Meta Prompting into the realm of complex reasoning. Here, we demonstrate how this technique adeptly breaks down intricate problems into manageable sub-problems, facilitating a step-by-step, detailed approach to problem-solving. This method proves especially advantageous in terms of token efficiency and offering a fair comparison in problem-solving scenarios, standing out against few-shot example approaches. Furthermore, the paper breaks new ground by extending Meta Prompting into multi-modal foundation model settings. This extension addresses the integration of diverse data types, such as images, audio, and video, within the structured framework of Meta Prompting, highlighting both the challenges and the vast potential of this approach in handling complex, multi-faceted data (The code is available at https://github.com/meta-prompting/meta-prompting).
    摘要

Empowering remittance management in the digitised landscape: A real-time Data-Driven Decision Support with predictive abilities for financial transactions

  • paper_url: http://arxiv.org/abs/2311.11476
  • repo_url: None
  • paper_authors: Rashikala Weerawarna, Shah J Miah
  • for: 这项研究是为了为区块链技术(BT)驱动的财付转场景下的决策支持系统设计一个数据驱动的预测决策支持方法。
  • methods: 该研究采用了 teoría-generating 设计科学研究(DSR)方法,通过对交易大数据的分析,揭示了预测 capability 的emergence。该方法结合了预测分析和机器学习(ML),实现了实时的财付转监测,帮助管理决策者在适应 digitized 的区块链 oriented 财付转公司面临的挑战时作出更好的决策。
  • results: 该研究不仅增强了财付转公司的安全性,还为未来的预测决策支持解决方案提供了基础,扩展了 predictive analytics 的潜在应用领域。此外,通过实施 artifact 的 theory,DSR 方法得到了进一步的发展和深入的 theory 发展在信息系统领域。
    Abstract The advent of Blockchain technology (BT) revolutionised the way remittance transactions are recorded. Banks and remittance organisations have shown a growing interest in exploring blockchain's potential advantages over traditional practices. This paper presents a data-driven predictive decision support approach as an innovative artefact designed for the blockchain-oriented remittance industry. Employing a theory-generating Design Science Research (DSR) approach, we have uncovered the emergence of predictive capabilities driven by transactional big data. The artefact integrates predictive analytics and Machine Learning (ML) to enable real-time remittance monitoring, empowering management decision-makers to address challenges in the uncertain digitised landscape of blockchain-oriented remittance companies. Bridging the gap between theory and practice, this research not only enhances the security of the remittance ecosystem but also lays the foundation for future predictive decision support solutions, extending the potential of predictive analytics to other domains. Additionally, the generated theory from the artifact's implementation enriches the DSR approach and fosters grounded and stakeholder theory development in the information systems domain.
    摘要 区块链技术(BT)的出现革新了汇款交易的记录方式。银行和汇款机构对探索区块链相对于传统做法的潜在优势表现出日益浓厚的兴趣。这篇论文提出了一种数据驱动的预测决策支持方法,作为面向区块链的汇款行业的创新制品(artefact)。通过采用理论生成式的设计科学研究(DSR)方法,我们揭示了由交易大数据驱动的预测能力的涌现。该制品结合预测分析和机器学习(ML),实现对汇款的实时监控,帮助管理决策者应对面向区块链的汇款公司在数字化环境中面临的不确定挑战。通过连接理论与实践,这项研究不仅提升了汇款生态系统的安全性,也为未来的预测决策支持解决方案奠定了基础,并将预测分析的潜力扩展到其他领域。此外,制品实施过程中生成的理论丰富了DSR方法,促进了信息系统领域扎根理论与利益相关者理论的发展。

CSGNN: Conquering Noisy Node labels via Dynamic Class-wise Selection

  • paper_url: http://arxiv.org/abs/2311.11473
  • repo_url: None
  • paper_authors: Yifan Li, Zhen Tan, Kai Shu, Zongsheng Cao, Yu Kong, Huan Liu
  • for: 这篇论文的目的是提出一种新的graph neural network(GNN)节点选择方法,以提高GNN在图上的表示学习。
  • methods: 本文使用了一种新的邻居聚合隐空间,以适应性地选择不同类别中的可靠节点。具体来说,论文采用动态类别选择机制,基于邻居聚合的置信度对不同类别的节点进行适应性选择;此外,还采用由干净节点到噪声节点的分阶段学习方式,以减轻错误标注节点的影响。
  • results: 实验结果表明,CSGNN比现有最先进方法更有效且更鲁棒;具体来说,在零样本测试中,CSGNN的表现比最先进方法高出30%以上。此外,CSGNN还能在不同类别的节点上提升表现,并减轻错误标注节点的影响。
    Abstract Graph Neural Networks (GNNs) have emerged as a powerful tool for representation learning on graphs, but they often suffer from overfitting and label noise issues, especially when the data is scarce or imbalanced. Different from the paradigm of previous methods that rely on single-node confidence, in this paper, we introduce a novel Class-wise Selection for Graph Neural Networks, dubbed CSGNN, which employs a neighbor-aggregated latent space to adaptively select reliable nodes across different classes. Specifically, 1) to tackle the class imbalance issue, we introduce a dynamic class-wise selection mechanism, leveraging the clustering technique to identify clean nodes based on the neighbor-aggregated confidences. In this way, our approach can avoid the pitfalls of biased sampling which is common with global threshold techniques. 2) To alleviate the problem of noisy labels, built on the concept of the memorization effect, CSGNN prioritizes learning from clean nodes before noisy ones, thereby iteratively enhancing model performance while mitigating label noise. Through extensive experiments, we demonstrate that CSGNN outperforms state-of-the-art methods in terms of both effectiveness and robustness.
    摘要 图神经网络(GNNs)已经成为图表示学习的强大工具,但它们经常受到过拟合和标签噪声问题的影响,特别是在数据稀缺或不均衡时。与先前依赖单节点置信度的方法不同,我们在本文中引入了一种新的类别级选择方法,称为 CSGNN,它使用邻居聚合的隐空间来自适应地选择不同类别中的可靠节点。具体来说,为了解决类别不均衡问题,我们引入了动态类别级选择机制,基于邻居聚合的置信度,利用聚类技术来识别干净节点;这种方法可以避免全局阈值技术常见的有偏采样问题。此外,基于记忆效应(memorization effect)的概念,CSGNN 优先学习干净节点,再学习噪声节点,从而在逐步提升模型性能的同时缓解标签噪声。大量实验表明,CSGNN 在有效性和鲁棒性两方面均优于当前最先进的方法。
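
To make the neighbor-aggregated, class-wise selection idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the clustering step is replaced by a simple per-class top-fraction rule over neighbor-aggregated confidences, and the function names and `keep_ratio` parameter are illustrative assumptions.

```python
import numpy as np

def neighbor_aggregated_confidence(probs, adj):
    """Average each node's class probabilities with those of its neighbors.

    probs: (N, C) softmax outputs of a GNN; adj: (N, N) binary adjacency matrix.
    """
    deg = adj.sum(axis=1, keepdims=True) + 1.0      # +1 accounts for the node itself
    return (probs + adj @ probs) / deg              # mean over node + neighbors

def classwise_select(agg_probs, labels, keep_ratio=0.5):
    """Per class, keep the nodes whose aggregated confidence in their own
    (possibly noisy) label is highest -- a stand-in for the paper's
    clustering-based clean-node identification."""
    conf = agg_probs[np.arange(len(labels)), labels]   # confidence in the given label
    selected = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        k = max(1, int(keep_ratio * len(idx)))
        selected.extend(idx[np.argsort(-conf[idx])[:k]])
    return np.array(sorted(selected))
```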

cs.CL - 2023-11-20

Unifying Corroborative and Contributive Attributions in Large Language Models

  • paper_url: http://arxiv.org/abs/2311.12233
  • repo_url: None
  • paper_authors: Theodora Worledge, Judy Hanwen Shen, Nicole Meister, Caleb Winston, Carlos Guestrin
  • for: 本研究旨在提供一个统一的大语言模型归因框架,以涵盖现有不同类型的归因方法,包括引用生成和训练数据归因。
  • methods: 本研究使用了现有的归因方法,并将其们 integrate 到一个统一的框架中。
  • results: 本研究提出了一个统一的大语言模型归因框架,可以涵盖现有不同类型的归因方法,并可以用于解释现实世界中的应用场景。
    Abstract As businesses, products, and services spring up around large language models, the trustworthiness of these models hinges on the verifiability of their outputs. However, methods for explaining language model outputs largely fall across two distinct fields of study which both use the term "attribution" to refer to entirely separate techniques: citation generation and training data attribution. In many modern applications, such as legal document generation and medical question answering, both types of attributions are important. In this work, we argue for and present a unified framework of large language model attributions. We show how existing methods of different types of attribution fall under the unified framework. We also use the framework to discuss real-world use cases where one or both types of attributions are required. We believe that this unified framework will guide the use case driven development of systems that leverage both types of attribution, as well as the standardization of their evaluation.
    摘要 当商业、产品和服务逐渐围绕大型语言模型发展时,这些模型的可信度将直接取决于其输出的可验证性。然而,对于语言模型输出的解释方法主要分布在两个不同的研究领域中,它们都使用“归因”(attribution)这一术语来指称完全不同的技术:引用生成和训练数据归因。在许多现代应用中,如法律文件生成和医疗问答,这两种归因都很重要。在这项工作中,我们主张并提出了一个统一的大型语言模型归因框架,说明现有的不同类型归因方法如何纳入该框架,并使用该框架讨论需要其中一种或两种归因的实际应用场景。我们相信这个统一框架将引导以用例为驱动、同时利用两种归因的系统开发,以及其评估的标准化。

Leveraging Closed-Access Multilingual Embedding for Automatic Sentence Alignment in Low Resource Languages

  • paper_url: http://arxiv.org/abs/2311.12179
  • repo_url: https://github.com/abumafrim/cohere-align
  • paper_authors: Idris Abdulmumin, Auwal Abubakar Khalid, Shamsuddeen Hassan Muhammad, Ibrahim Said Ahmad, Lukman Jibril Aliyu, Babangida Sani, Bala Mairiga Abduljalil, Sani Ahmad Hassan
  • for: 本研究旨在提高机器翻译中的质量,通过使用高质量的并行数据集来改进翻译模型的性能。
  • methods: 本研究使用了Cohere多语言嵌入的关闭访问,并开发了一个简单 yet 高效的并行句子对齐器。
  • results: 该方法在FLORES和MAFAND-MT数据集上达到了$94.96$和$54.83$的f1分数,与LASER相比具有了显著的改进(超过5个BLEU分数)。
    Abstract The importance of qualitative parallel data in machine translation has long been determined but it has always been very difficult to obtain such in sufficient quantity for the majority of world languages, mainly because of the associated cost and also the lack of accessibility to these languages. Despite the potential for obtaining parallel datasets from online articles using automatic approaches, forensic investigations have found a lot of quality-related issues such as misalignment, and wrong language codes. In this work, we present a simple but qualitative parallel sentence aligner that carefully leveraged the closed-access Cohere multilingual embedding, a solution that ranked second in the just concluded #CoHereAIHack 2023 Challenge (see https://ai6lagos.devpost.com). The proposed approach achieved $94.96$ and $54.83$ f1 scores on FLORES and MAFAND-MT, compared to $3.64$ and $0.64$ of LASER respectively. Our method also achieved an improvement of more than 5 BLEU scores over LASER, when the resulting datasets were used with MAFAND-MT dataset to train translation models. Our code and data are available for research purposes here (https://github.com/abumafrim/Cohere-Align).
    摘要 In this work, we present a simple yet effective parallel sentence aligner that leverages the closed-access Cohere multilingual embedding. This approach achieved F1 scores of $94.96$ and $54.83$ on FLORES and MAFAND-MT, respectively, outperforming LASER by more than 5 BLEU scores. Our method and code are available for research purposes at https://github.com/abumafrim/Cohere-Align.
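
A minimal sketch of the embedding-based alignment idea follows. The `embed` function is a placeholder (the Cohere multilingual embedding used in the paper is closed-access), and the greedy one-to-one matching with a fixed cosine threshold is a simplification, not the released aligner's exact scoring.

```python
import numpy as np

def embed(sentences):
    """Placeholder for a multilingual sentence encoder (e.g., the closed-access
    Cohere embedding used in the paper). Replace with a real API/model call."""
    raise NotImplementedError

def align_sentences(src_sents, tgt_sents, threshold=0.7):
    """Greedy one-to-one alignment by cosine similarity of sentence embeddings."""
    src_vecs = np.asarray(embed(src_sents), dtype=float)
    tgt_vecs = np.asarray(embed(tgt_sents), dtype=float)
    src_vecs /= np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt_vecs /= np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    sims = src_vecs @ tgt_vecs.T                     # (len(src), len(tgt))
    pairs, used_tgt = [], set()
    for i in np.argsort(-sims.max(axis=1)):          # most confident sources first
        j = int(np.argmax(sims[i]))
        if sims[i, j] >= threshold and j not in used_tgt:
            pairs.append((src_sents[i], tgt_sents[j], float(sims[i, j])))
            used_tgt.add(j)
    return pairs
```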

Human Learning by Model Feedback: The Dynamics of Iterative Prompting with Midjourney

  • paper_url: http://arxiv.org/abs/2311.12131
  • repo_url: https://github.com/shachardon/mid-journey-to-alignment
  • paper_authors: Shachar Don-Yehiya, Leshem Choshen, Omri Abend
  • for: 本研究探讨了在生成图像时,用户需要进行多次尝试,并通过反馈来更新提示,以实现更好的图像生成。
  • methods: 该研究采用了文本到图像模型,并对用户的提示进行分析,以了解用户在尝试过程中的行为。
  • results: 研究发现,用户的提示在尝试过程中会predictably converge到特定的特征,并且这种吸引力可能是由于用户意外地忽略了重要细节,或者是由于模型的偏好,生成更加适合特定语言风格的图像。
    Abstract Generating images with a Text-to-Image model often requires multiple trials, where human users iteratively update their prompt based on feedback, namely the output image. Taking inspiration from cognitive work on reference games and dialogue alignment, this paper analyzes the dynamics of the user prompts along such iterations. We compile a dataset of iterative interactions of human users with Midjourney. Our analysis then reveals that prompts predictably converge toward specific traits along these iterations. We further study whether this convergence is due to human users, realizing they missed important details, or due to adaptation to the model's ``preferences'', producing better images for a specific language style. We show initial evidence that both possibilities are at play. The possibility that users adapt to the model's preference raises concerns about reusing user data for further training. The prompts may be biased towards the preferences of a specific model, rather than align with human intentions and natural manner of expression.
    摘要 通常,通过文本到图像模型生成图像需要多次尝试,用户会在反馈基础上不断更新提示。以认知工作的参考游戏和对话Alignment为 inspirations,这篇论文分析了用户提示的动态。我们编译了人类用户与Midjourney的多轮交互的数据集。我们的分析显示,提示逐渐趋向特定特征。我们进一步研究了这种吸引力是由于用户注意到重要细节的不足,或者是由于模型的偏好而生成更好的图像。我们发现了这两种可能性。用户适应模型的偏好可能会导致 reuse user data for further training,但是这些提示可能会受到模型的偏好而不是人类的意图和自然表达方式。

LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning

  • paper_url: http://arxiv.org/abs/2311.12023
  • repo_url: https://github.com/hanguo97/lq-lora
  • paper_authors: Han Guo, Philip Greengard, Eric P. Xing, Yoon Kim
  • for: 本研究旨在提出一种简单的方法,以提高预训练语言模型的内存效率。
  • methods: 该方法使用迭代算法将每个预训练矩阵分解为高精度的低秩组件和内存高效的量化组件。在微调时,量化组件保持不变,只更新低秩组件。我们还为量化组件提出了一种整数线性规划表示,可以在给定总内存预算下为每个矩阵动态配置量化参数(例如位宽和块大小)。
  • results: 我们的研究表明,使用我们的low-rank plus quantized matrix decomposition方法(LQ-LoRA)可以超过强QLoRA和GPTQ-LoRA基线,并且可以实现更激进的量化。例如,在OpenAssistant基准上,LQ-LoRA可以学习一个2.5比特LLaMA-2模型,与4比特QLoRA基线相当。此外,当在语言建模校准数据集上微调时,LQ-LoRA也可以用于模型压缩,其中2.75比特LLaMA-2-70B模型(包含低秩组件后平均为2.85比特,需要27GB的GPU内存)与原始全精度模型相当。
    Abstract We propose a simple approach for memory-efficient adaptation of pretrained language models. Our approach uses an iterative algorithm to decompose each pretrained matrix into a high-precision low-rank component and a memory-efficient quantized component. During finetuning, the quantized component remains fixed and only the low-rank component is updated. We present an integer linear programming formulation of the quantization component which enables dynamic configuration of quantization parameters (e.g., bit-width, block size) for each matrix given an overall target memory budget. We further explore a data-aware version of the algorithm which uses an approximation of the Fisher information matrix to weight the reconstruction objective during matrix decomposition. Experiments on adapting RoBERTa and LLaMA-2 (7B and 70B) demonstrate that our low-rank plus quantized matrix decomposition approach (LQ-LoRA) outperforms strong QLoRA and GPTQ-LoRA baselines and moreover enables more aggressive quantization. For example, on the OpenAssistant benchmark LQ-LoRA is able to learn a 2.5-bit LLaMA-2 model that is competitive with a model finetuned with 4-bit QLoRA. When finetuned on a language modeling calibration dataset, LQ-LoRA can also be used for model compression; in this setting our 2.75-bit LLaMA-2-70B model (which has 2.85 bits on average when including the low-rank components and requires 27GB of GPU memory) is competitive with the original model in full precision.
    摘要 我们提出了一种简单的方法来实现预训练语言模型的内存高效适配。我们的方法使用迭代算法将每个预训练矩阵分解为高精度的低秩成分和内存高效的量化成分。在微调过程中,量化成分保持不变,只更新低秩成分。我们提出了一种量化成分的整数线性规划表示,可以在给定总内存预算下为每个矩阵动态配置量化参数(比如位宽、块大小)。此外,我们还探讨了一种数据感知版本,使用Fisher信息矩阵的近似来加权矩阵分解时的重建目标。我们在适配RoBERTa和LLaMA-2(7B和70B)上的实验表明,LQ-LoRA优于强QLoRA和GPTQ-LoRA基线,并且允许更激进的量化。例如,在OpenAssistant基准上,LQ-LoRA可以学习一个2.5比特的LLaMA-2模型,与4比特QLoRA微调的模型相当。当在语言建模校准数据集上微调时,LQ-LoRA还可以用于模型压缩;在这种设置下,我们的2.75比特LLaMA-2-70B模型(包含低秩组件后平均为2.85比特,需要27GB的GPU内存)与原始全精度模型相当。
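
The core decomposition can be illustrated with a small NumPy sketch: alternate between quantizing the residual and refitting a low-rank factor, so that W ≈ Q + AB with Q frozen during finetuning. This is a simplified stand-in, it uses plain round-to-nearest uniform quantization rather than the ILP-configured, block-wise scheme described in the paper, and the function names are hypothetical.

```python
import numpy as np

def quantize_uniform(x, bits=3):
    """Round-to-nearest uniform quantizer (stand-in for the paper's
    configurable block-wise quantizer)."""
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    return np.round((x - lo) / scale) * scale + lo

def lq_decompose(W, rank=16, bits=3, iters=10):
    """Alternate between quantizing the residual W - A@B and fitting a rank-r
    factor to W - Q, yielding W ~= Q + A @ B."""
    A = np.zeros((W.shape[0], rank))
    B = np.zeros((rank, W.shape[1]))
    Q = np.zeros_like(W)
    for _ in range(iters):
        Q = quantize_uniform(W - A @ B, bits=bits)
        U, S, Vt = np.linalg.svd(W - Q, full_matrices=False)
        A = U[:, :rank] * S[:rank]          # fold singular values into A
        B = Vt[:rank]
    return Q, A, B

# During finetuning, Q stays frozen and only A, B receive gradient updates.
```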

GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration

  • paper_url: http://arxiv.org/abs/2311.12015
  • repo_url: None
  • paper_authors: Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi
  • for: 该论文的目的是增强一种通用视觉语言模型,以便更好地支持机器人操作。
  • methods: 该系统通过观察视频中的人类行为,创建融入可供性(affordance)信息的可执行机器人程序:先用 GPT-4V 将环境和动作细节转化为文本,再由 GPT-4 进行任务规划,随后视觉系统结合任务计划重新分析视频,以确定抓取和释放物体的时机与方式。
  • results: 实验结果表明,该方法可以在不同的场景下,由人类示例而不需要更多的训练,快速地将人类示例转化为机器人操作。
    Abstract We introduce a pipeline that enhances a general-purpose Vision Language Model, GPT-4V(ision), by integrating observations of human actions to facilitate robotic manipulation. This system analyzes videos of humans performing tasks and creates executable robot programs that incorporate affordance insights. The computation starts by analyzing the videos with GPT-4V to convert environmental and action details into text, followed by a GPT-4-empowered task planner. In the following analyses, vision systems reanalyze the video with the task plan. Object names are grounded using an open-vocabulary object detector, while focus on the hand-object relation helps to detect the moment of grasping and releasing. This spatiotemporal grounding allows the vision systems to further gather affordance data (e.g., grasp type, way points, and body postures). Experiments across various scenarios demonstrate this method's efficacy in achieving real robots' operations from human demonstrations in a zero-shot manner. The prompts of GPT-4V/GPT-4 are available at this project page: https://microsoft.github.io/GPT4Vision-Robot-Manipulation-Prompts/
    摘要 我们介绍一个管道,将通用视觉语言模型GPT-4V(ision)与人类动作观察相结合,以辅助机器人操作。该系统分析人们执行任务的视频,并生成包含可供性(affordance)信息的可执行机器人程序。计算首先使用GPT-4V分析视频,将环境和动作细节转换为文本,随后由GPT-4驱动的任务规划器生成任务计划。接着,视觉系统结合任务计划重新分析视频:使用开放词汇目标检测器对物体名称进行定位,并通过关注手与物体的关系来检测抓取和释放的时刻。这种时空定位使视觉系统能够进一步收集可供性数据(例如抓取类型、路径点和身体姿态)。在多种场景下的实验表明,该方法能够以零样本方式从人类示范中实现真实机器人的操作。GPT-4V/GPT-4的提示可以在项目页面找到:https://microsoft.github.io/GPT4Vision-Robot-Manipulation-Prompts/

H-COAL: Human Correction of AI-Generated Labels for Biomedical Named Entity Recognition

  • paper_url: http://arxiv.org/abs/2311.11981
  • repo_url: None
  • paper_authors: Xiaojing Duan, John P. Lalor
  • for: 这个论文是为了解决人工智能生成标签的准确性问题,提出了一种新的人工 corrections of AI-generated labels (H-COAL) 框架。
  • methods: 该框架使用了一种排名算法,可以选择性地更正 AI 生成的标签,以达到黄金标准性表现(100% 的人工标注),但需要远少于人工努力。
  • results: 研究发现,对标签的5% 的更正可以提高 AI 和人类表现的差距,相对提高64%;对标签的20% 的更正可以提高 AI 和人类表现的差距,相对提高86%。
    Abstract With the rapid advancement of machine learning models for NLP tasks, collecting high-fidelity labels from AI models is a realistic possibility. Firms now make AI available to customers via predictions as a service (PaaS). This includes PaaS products for healthcare. It is unclear whether these labels can be used for training a local model without expensive annotation checking by in-house experts. In this work, we propose a new framework for Human Correction of AI-Generated Labels (H-COAL). By ranking AI-generated outputs, one can selectively correct labels and approach gold standard performance (100% human labeling) with significantly less human effort. We show that correcting 5% of labels can close the AI-human performance gap by up to 64% relative improvement, and correcting 20% of labels can close the performance gap by up to 86% relative improvement.
    摘要 随着机器学习模型在自然语言处理任务中的快速发展,收集高品质标注从AI模型是一个现实性的可能性。现在,公司通过预测为服务(PaaS)提供AI给客户。这包括医疗领域的Paas产品。然而,是否可以使用这些标注来训练本地模型,而无需高昂的人工标注检查,是一个未知的问题。在这项工作中,我们提出了一个新的人工纠正AI生成标注的框架(H-COAL)。通过对AI生成输出进行排名,可以选择性地纠正标注,并接近金标准性表现(100%人工标注),但是具有显著更少的人工努力。我们显示,只纠正5%的标注可以减少AI与人性能差距的相对改善,达到64%的相对改善;只纠正20%的标注可以减少AI与人性能差距的相对改善,达到86%的相对改善。
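
The selective-correction loop can be sketched as follows; the ranking criterion here is plain model confidence, which is only a stand-in for whatever ranking H-COAL actually uses, and the function name and budget values are illustrative.

```python
def select_for_human_review(examples, confidences, budget=0.05):
    """Route the least-confident fraction of AI-labeled examples to humans.

    examples: list of (text, ai_label); confidences: model confidence per example;
    budget: fraction of labels a human annotator will correct (e.g., 0.05 or 0.20).
    """
    order = sorted(range(len(examples)), key=lambda i: confidences[i])
    to_review = set(order[: int(budget * len(examples))])
    corrected = []
    for i, (text, ai_label) in enumerate(examples):
        if i in to_review:
            corrected.append((text, None))   # placeholder: to be replaced by a human label
        else:
            corrected.append((text, ai_label))
    return corrected, to_review
```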

On the Potential and Limitations of Few-Shot In-Context Learning to Generate Metamorphic Specifications for Tax Preparation Software

  • paper_url: http://arxiv.org/abs/2311.11979
  • repo_url: None
  • paper_authors: Dananjay Srinivas, Rohan Das, Saeid Tizpaz-Niari, Ashutosh Trivedi, Maria Leonor Pacheco
  • for: 本研究旨在提高法定税软件的正确性,以避免税务纠纷和罚款。
  • methods: 本研究使用了变态测试,以测试和调试法定税软件。变态测试可以帮助找出软件中的错误和漏洞。
  • results: 本研究提出了一种将自然语言描述的性质转换为一阶逻辑形式的方法,用于自动生成蜕变规约。这种方法可以帮助减少人工干预,提高测试效率和正确性。
    Abstract Due to the ever-increasing complexity of income tax laws in the United States, the number of US taxpayers filing their taxes using tax preparation software (henceforth, tax software) continues to increase. According to the U.S. Internal Revenue Service (IRS), in FY22, nearly 50% of taxpayers filed their individual income taxes using tax software. Given the legal consequences of incorrectly filing taxes for the taxpayer, ensuring the correctness of tax software is of paramount importance. Metamorphic testing has emerged as a leading solution to test and debug legal-critical tax software due to the absence of correctness requirements and trustworthy datasets. The key idea behind metamorphic testing is to express the properties of a system in terms of the relationship between one input and its slightly metamorphosed twinned input. Extracting metamorphic properties from IRS tax publications is a tedious and time-consuming process. As a response, this paper formulates the task of generating metamorphic specifications as a translation task between properties extracted from tax documents - expressed in natural language - to a contrastive first-order logic form. We perform a systematic analysis on the potential and limitations of in-context learning with Large Language Models(LLMs) for this task, and outline a research agenda towards automating the generation of metamorphic specifications for tax preparation software.
    摘要 The core idea behind metamorphic testing is to express the properties of a system in terms of the relationship between one input and its slightly modified twin input. However, extracting metamorphic properties from IRS tax publications is a time-consuming and laborious process. To address this challenge, this paper proposes the task of generating metamorphic specifications as a translation task between properties extracted from tax documents (expressed in natural language) and a contrastive first-order logic form.We conduct a systematic analysis of the potential and limitations of in-context learning with Large Language Models (LLMs) for this task and outline a research agenda towards automating the generation of metamorphic specifications for tax preparation software.

Context-aware Neural Machine Translation for English-Japanese Business Scene Dialogues

  • paper_url: http://arxiv.org/abs/2311.11976
  • repo_url: https://github.com/su0315/discourse_context_mt
  • paper_authors: Sumire Honda, Patrick Fernandes, Chrysoula Zerva
  • for: 这个论文的目的是提高现有的神经机器翻译模型(NMT)的性能,以便更好地翻译英日商务对话。
  • methods: 这篇论文使用了预训练的mBART模型,并在多句对话数据上进行了微调。这使得作者们能够实验不同的上下文大小以及句外(extra-sentential)信息的编码方法。
  • results: 作者们发现,模型可以利用上一句和extra-sentential context(通过CXMI指标提高),并且在增加上下文大小和包含场景和speaker信息时,翻译质量有所提高, measured by BLEU和COMET指标。
    Abstract Despite the remarkable advancements in machine translation, the current sentence-level paradigm faces challenges when dealing with highly-contextual languages like Japanese. In this paper, we explore how context-awareness can improve the performance of the current Neural Machine Translation (NMT) models for English-Japanese business dialogues translation, and what kind of context provides meaningful information to improve translation. As business dialogue involves complex discourse phenomena but offers scarce training resources, we adapted a pretrained mBART model, finetuning on multi-sentence dialogue data, which allows us to experiment with different contexts. We investigate the impact of larger context sizes and propose novel context tokens encoding extra-sentential information, such as speaker turn and scene type. We make use of Conditional Cross-Mutual Information (CXMI) to explore how much of the context the model uses and generalise CXMI to study the impact of the extra-sentential context. Overall, we find that models leverage both preceding sentences and extra-sentential context (with CXMI increasing with context size) and we provide a more focused analysis on honorifics translation. Regarding translation quality, increased source-side context paired with scene and speaker information improves the model performance compared to previous work and our context-agnostic baselines, measured in BLEU and COMET metrics.
    摘要 尽管机器翻译技术取得了很大进步,但在处理日语等高度依赖上下文的语言时,当前句子级的翻译范式仍面临挑战。在这篇论文中,我们探索了上下文感知如何提升当前神经机器翻译(NMT)模型在英日商务对话翻译中的表现,以及哪些类型的上下文能提供有用的信息来改善翻译。商务对话涉及复杂的语篇现象,但训练资源匮乏,因此我们采用预训练的 mBART 模型,并在多句对话数据上进行微调,从而可以实验不同的上下文设置。我们研究了更大上下文规模的影响,并提出了编码句外信息(如发言人轮次和场景类型)的新上下文标记。我们使用条件交叉互信息(CXMI)来衡量模型利用了多少上下文,并将 CXMI 推广到研究句外上下文的影响。总体而言,我们发现模型会同时利用前文句子和句外上下文(CXMI 随上下文规模增大而提高),并对敬语翻译进行了更细致的分析。在翻译质量方面,以 BLEU 和 COMET 衡量,增加源端上下文并结合场景与发言人信息,可使模型表现优于以往工作和我们的无上下文基线。
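
CXMI, used above to measure context usage, is essentially the average gain in the log-probability of the reference translation when context is supplied. A hedged sketch follows; the `log_prob` scorer is a hypothetical hook around the NMT model, not part of the paper's released code.

```python
def log_prob(model, source, target, context=""):
    """Hypothetical scorer: sum of log-probabilities the NMT model assigns to
    `target` given `source`, optionally prepended with `context`."""
    raise NotImplementedError

def cxmi(model, pairs, contexts):
    """Conditional cross-mutual information over a corpus: average gain in the
    log-probability of the reference when context is supplied."""
    gains = []
    for (src, tgt), ctx in zip(pairs, contexts):
        gains.append(log_prob(model, src, tgt, context=ctx) - log_prob(model, src, tgt))
    return sum(gains) / len(gains)
```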

Adaptive Training Distributions with Scalable Online Bilevel Optimization

  • paper_url: http://arxiv.org/abs/2311.11973
  • repo_url: None
  • paper_authors: David Grangier, Pierre Ablin, Awni Hannun
  • for: 这篇论文是关于现代机器学习中大脑网络的预训练和应用领域之间的匹配问题。
  • methods: 该论文提出了一种基于在线双层优化问题的算法,以适应小样本数据的情况。该算法优先计算在训练点上的梯度,以提高目标分布上的损失。
  • results: 该论文通过实验表明,在某些情况下,该方法可以超过现有领域适应技术,但在其他情况下可能不成功。该论文还提出了一种简单的测试方法,以评估该方法在不同情况下的效果。
    Abstract Large neural networks pretrained on web-scale corpora are central to modern machine learning. In this paradigm, the distribution of the large, heterogeneous pretraining data rarely matches that of the application domain. This work considers modifying the pretraining distribution in the case where one has a small sample of data reflecting the targeted test conditions. We propose an algorithm motivated by a recent formulation of this setting as an online, bilevel optimization problem. With scalability in mind, our algorithm prioritizes computing gradients at training points which are likely to most improve the loss on the targeted distribution. Empirically, we show that in some cases this approach is beneficial over existing strategies from the domain adaptation literature but may not succeed in other cases. We propose a simple test to evaluate when our approach can be expected to work well and point towards further research to address current limitations.
    摘要 在大规模网络语料上预训练的大型神经网络是现代机器学习的核心。在这一范式中,庞大且异质的预训练数据的分布很少与应用领域的分布相匹配。本文考虑在拥有一小部分反映目标测试条件的数据样本时,如何修改预训练分布。我们提出了一种算法,其动机来自将该设定形式化为在线双层(bilevel)优化问题。出于可扩展性的考虑,我们的算法优先在那些最有可能降低目标分布损失的训练点上计算梯度。实验表明,在某些情况下,这种方法优于领域自适应文献中的现有策略,但在其他情况下可能并不奏效。我们提出了一个简单的测试来评估该方法何时有望奏效,并指出了应对当前局限的进一步研究方向。
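
One way to picture the "prioritize gradients at points likely to improve the target loss" idea is gradient alignment with the small target sample. The sketch below is a loose illustration of that intuition, not the paper's online bilevel algorithm; `per_example_grad`, the dot-product scoring rule, and `top_k` are all assumptions.

```python
import numpy as np

def per_example_grad(model, example):
    """Hypothetical hook returning the flattened gradient of the training loss
    on a single example w.r.t. the model parameters."""
    raise NotImplementedError

def prioritize_batch(model, candidates, target_sample, top_k=32):
    """Score candidate pretraining points by how well their gradients align with
    the gradient of the loss on the small target-domain sample, then keep the
    top-k for the next update (one outer step of the data-selection loop)."""
    g_target = np.mean([per_example_grad(model, ex) for ex in target_sample], axis=0)
    scores = [float(np.dot(per_example_grad(model, ex), g_target)) for ex in candidates]
    order = np.argsort(scores)[::-1][:top_k]
    return [candidates[i] for i in order]
```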

Automatic Analysis of Substantiation in Scientific Peer Reviews

  • paper_url: http://arxiv.org/abs/2311.11967
  • repo_url: None
  • paper_authors: Yanzhu Guo, Guokan Shang, Virgile Rennard, Michalis Vazirgiannis, Chloé Clavel
  • for: 提高 AI 会议中异常评审的自动化质量控制方法。
  • methods: 使用科学 peer review 中的声明-证据对 extracted 问题,并使用 argued mining 系统自动分析评审的支持程度。
  • results: 使用 SubstanReview 数据集进行数据分析,获得 NLP 会议评审质量的有用洞察。
    Abstract With the increasing amount of problematic peer reviews in top AI conferences, the community is urgently in need of automatic quality control measures. In this paper, we restrict our attention to substantiation -- one popular quality aspect indicating whether the claims in a review are sufficiently supported by evidence -- and provide a solution automatizing this evaluation process. To achieve this goal, we first formulate the problem as claim-evidence pair extraction in scientific peer reviews, and collect SubstanReview, the first annotated dataset for this task. SubstanReview consists of 550 reviews from NLP conferences annotated by domain experts. On the basis of this dataset, we train an argument mining system to automatically analyze the level of substantiation in peer reviews. We also perform data analysis on the SubstanReview dataset to obtain meaningful insights on peer reviewing quality in NLP conferences over recent years.
    摘要 随着顶级AI会议中问题评审数量的增加,学术社区迫切需要自动化的质量控制机制。在这篇论文中,我们仅关注论证充分性(substantiation)——一种常用的质量维度,用于衡量评审中的论断(claim)是否得到了足够的证据支持——并提供了一种自动化该评估过程的解决方案。为实现这一目标,我们首先将问题定义为科学同行评审中的论断-证据对抽取任务,并收集了 SubstanReview,这是针对该任务的第一个标注数据集。SubstanReview 包含550篇来自NLP会议的评审,由领域专家标注。基于这个数据集,我们训练了一个论辩挖掘系统,以自动分析同行评审的论证充分程度。我们还对 SubstanReview 数据集进行了数据分析,从而获得了对近年来NLP会议同行评审质量的有价值洞察。

LLMs as Visual Explainers: Advancing Image Classification with Evolving Visual Descriptions

  • paper_url: http://arxiv.org/abs/2311.11904
  • repo_url: None
  • paper_authors: Songhao Han, Le Zhuo, Yue Liao, Si Liu
  • for: 为提高图像分类的精度和可解释性,提出了一种新的图像分类框架,即带视觉反馈的迭代优化(Iterative Optimization with Visual Feedback)。
  • methods: 该方法首先使用大语言模型(LLM)生成图像分类器,然后使用一种演化优化策略来优化类别描述符。这个过程中,我们将视觉反馈从VLM分类指标中引入,以帮助优化过程具体化。
  • results: 我们在多种图像分类 benchmark上进行了实验,并 obtianed 3.47%的平均提升率,比存在的方法高。此外,我们还发现,通过使用这些描述符,可以在不同的底层模型上实现更好的性能。
    Abstract Vision-language models (VLMs) offer a promising paradigm for image classification by comparing the similarity between images and class embeddings. A critical challenge lies in crafting precise textual representations for class names. While previous studies have leveraged recent advancements in large language models (LLMs) to enhance these descriptors, their outputs often suffer from ambiguity and inaccuracy. We identify two primary causes: 1) The prevalent reliance on textual interactions with LLMs, leading to a mismatch between the generated text and the visual content in VLMs' latent space - a phenomenon we term the "explain without seeing" dilemma. 2) The oversight of the inter-class relationships, resulting in descriptors that fail to differentiate similar classes effectively. To address these issues, we propose a novel image classification framework combining VLMs with LLMs, named Iterative Optimization with Visual Feedback. In particular, our method develops an LLM-based agent, employing an evolutionary optimization strategy to refine class descriptors. Crucially, we incorporate visual feedback from VLM classification metrics, thereby guiding the optimization process with concrete visual data. Our method leads to improving accuracy on a wide range of image classification benchmarks, with 3.47\% average gains over state-of-the-art methods. We also highlight the resulting descriptions serve as explainable and robust features that can consistently improve the performance across various backbone models.
    摘要 vision-language模型(VLM)提供了一个有前途的思路,通过比较图像和分类embedding之间的相似性来进行图像分类。然而,一个挑战是制定精确的文本表述来描述分类名称。在先前的研究中,人们利用大型语言模型(LLM)来增强这些描述器,但其输出经常受到不确定性和不准确性的影响。我们认为这有两个主要原因:1)文本与LLM的交互过多,导致VLM的 latent space中的文本与图像之间存在匹配问题,我们称之为“解释无法看到”的困难。2)缺乏分类关系的考虑,导致描述器无法分类类型效果地区分类。为了解决这些问题,我们提出了一种 combining VLM和LLM的图像分类框架,名为Iterative Optimization with Visual Feedback。具体来说,我们的方法通过利用进化优化策略来练化分类描述器。关键是,我们在VLM的分类指标上 incorporate visual feedback,以导航优化过程中的具体视觉数据。我们的方法在各种图像分类benchmark上显示3.47%的平均提升,并且显示出的描述器为可解释和稳定的特征,可以在不同的背景模型上进行改进表现。

Evil Geniuses: Delving into the Safety of LLM-based Agents

  • paper_url: http://arxiv.org/abs/2311.11855
  • repo_url: https://github.com/t1ans1r/evil-geniuses
  • paper_authors: Yu Tian, Xiao Yang, Jingyuan Zhang, Yinpeng Dong, Hang Su
  • for: 这篇论文探讨了LLM-基于代理的安全问题,以寻求答案。
  • methods: 该论文采用手动构造的越狱(jailbreak)提示,并组建了一个虚拟聊天驱动的恶意计划开发团队,对基于LLM的智能体进行了一系列安全探测与评估。
  • results: 研究发现:1)LLM-基于代理agent具有减少的抗攻击能力。2)被攻击后的agent可以提供更加细腻的回应。3)检测生成的不当回应的困难度更高。这些现象提醒我们LLM-基于代理agent的安全性存在问题,并且在不同的角色专业水平和系统/代理层面都存在漏洞。
    Abstract The rapid advancements in large language models (LLMs) have led to a resurgence in LLM-based agents, which demonstrate impressive human-like behaviors and cooperative capabilities in various interactions and strategy formulations. However, evaluating the safety of LLM-based agents remains a complex challenge. This paper elaborately conducts a series of manual jailbreak prompts along with a virtual chat-powered evil plan development team, dubbed Evil Geniuses, to thoroughly probe the safety aspects of these agents. Our investigation reveals three notable phenomena: 1) LLM-based agents exhibit reduced robustness against malicious attacks. 2) the attacked agents could provide more nuanced responses. 3) the detection of the produced improper responses is more challenging. These insights prompt us to question the effectiveness of LLM-based attacks on agents, highlighting vulnerabilities at various levels and within different role specializations within the system/agent of LLM-based agents. Extensive evaluation and discussion reveal that LLM-based agents face significant challenges in safety and yield insights for future research. Our code is available at https://github.com/T1aNS1R/Evil-Geniuses.
    摘要 大量语言模型(LLM)的快速进步导致了LLM基于代理的复活,这些代理展现出人类化行为和合作能力在不同的互动和策略设计中。然而,评估LLM基于代理的安全性仍然是一个复杂的挑战。这篇论文通过手动囚室询问和虚拟聊天带动的邪恶天才团队(Evil Geniuses)进行了系列的探索,以全面探讨LLM基于代理的安全性问题。我们的调查发现了三个吸引注意的现象:1)LLM基于代理的代理具有较弱的抗攻击性。2)遭受攻击的代理可以提供更细腻的回应。3)检测生产的不当回应的可能性更高。这些发现促使我们质疑LLM基于代理的攻击是否有效,并高亮了系统/代理中LLM基于代理的代理存在的漏洞和不同角色尝试的攻击性。我们的评估和讨论表明,LLM基于代理的安全性面临着重大挑战,并提供了未来研究的发展方向。我们的代码可以在https://github.com/T1aNS1R/Evil-Geniuses上获取。

Deepparse : An Extendable, and Fine-Tunable State-Of-The-Art Library for Parsing Multinational Street Addresses

  • paper_url: http://arxiv.org/abs/2311.11846
  • repo_url: None
  • paper_authors: David Beauchemin, Marouane Yassine
  • for: 本文旨在提供一个开源、可编辑、可精度调整的地址分解解决方案,用于解决多国地址的分解问题。
  • methods: 本文使用了深度学习算法来实现地址分解,并在60多个国家的数据上进行了训练。
  • results: 据说,本文的预训练模型在训练国家上达到了99%的分解精度,而且不需要预处理或后处理。此外,库还支持自定义地址分解器的生成。
    Abstract Segmenting an address into meaningful components, also known as address parsing, is an essential step in many applications from record linkage to geocoding and package delivery. Consequently, a lot of work has been dedicated to develop accurate address parsing techniques, with machine learning and neural network methods leading the state-of-the-art scoreboard. However, most of the work on address parsing has been confined to academic endeavours with little availability of free and easy-to-use open-source solutions. This paper presents Deepparse, a Python open-source, extendable, fine-tunable address parsing solution under LGPL-3.0 licence to parse multinational addresses using state-of-the-art deep learning algorithms and evaluated on over 60 countries. It can parse addresses written in any language and use any address standard. The pre-trained model achieves average $99~\%$ parsing accuracies on the countries used for training with no pre-processing nor post-processing needed. Moreover, the library supports fine-tuning with new data to generate a custom address parser.
    摘要 分析地址的各个 Component 是许多应用程序中的重要步骤,从record linkage到地理编码和快递配送。因此,很多工作都投入到了发展高精度地址分析技术中,其中机器学习和神经网络方法现在领先于所有其他方法。然而,大多数地址分析工作都受到了学术研究的限制,而且有限的免费和易用的开源解决方案。本文介绍了Deepparse,一个基于Python的开源、可扩展、可调整地址分析解决方案,采用LGPL-3.0许可证。它可以分析来自60多个国家的多国语言地址,使用当前最先进的深度学习算法。无需预处理或后处理,模型可以达到99%的平均分析精度。此外,库支持自定义地址分析器的自定义。

How to Use Large Language Models for Text Coding: The Case of Fatherhood Roles in Public Policy Documents

  • paper_url: http://arxiv.org/abs/2311.11844
  • repo_url: None
  • paper_authors: Lorenzo Lupo, Oscar Magnusson, Dirk Hovy, Elin Naurin, Lena Wängnerud
  • for: 这项研究旨在评估大语言模型(LLM)在政治科学文本分析方面的应用,以及如何使用LLM进行文本编码。
  • methods: 本研究使用了三种原创的非英文政治科学文本编码任务,并提供了一个通用的LLM使用工作流程。
  • results: 研究发现,当提供了详细的标签定义和编码示例时,LLM可以达到与人类标注员相当甚至更好的水平,并且速度更快(最高可达数百倍)、成本更低(最多比人工编码便宜60%),也更容易扩展到大量文本。总之,LLM是大多数文本编码项目的可行选择。
    Abstract Recent advances in large language models (LLMs) like GPT-3 and GPT-4 have opened up new opportunities for text analysis in political science. They promise automation with better results and less programming. In this study, we evaluate LLMs on three original coding tasks of non-English political science texts, and we provide a detailed description of a general workflow for using LLMs for text coding in political science research. Our use case offers a practical guide for researchers looking to incorporate LLMs into their research on text analysis. We find that, when provided with detailed label definitions and coding examples, an LLM can be as good as or even better than a human annotator while being much faster (up to hundreds of times), considerably cheaper (costing up to 60% less than human coding), and much easier to scale to large amounts of text. Overall, LLMs present a viable option for most text coding projects.
    摘要 最近的大语言模型(LLM)如GPT-3和GPT-4的发展,为政治科学中的文本分析带来了新的机遇:它们有望以更少的编程实现自动化并取得更好的结果。在这项研究中,我们在三项原创的非英文政治科学文本编码任务上评估了LLM,并详细描述了在政治科学研究中使用LLM进行文本编码的通用工作流程。我们的用例为希望将LLM纳入文本分析研究的研究人员提供了实践指南。我们发现,当为LLM提供详细的标签定义和编码示例时,LLM可以达到与人类标注员相当甚至更好的水平,同时速度快得多(最高可达数百倍)、成本显著更低(最多比人工编码便宜60%),并且更容易扩展到大量文本。总之,LLM是大多数文本编码项目的可行选择。

Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule

  • paper_url: http://arxiv.org/abs/2311.11813
  • repo_url: None
  • paper_authors: Andrey Bout, Alexander Podolskiy, Sergey Nikolenko, Irina Piontkovskaya
  • for: 提高神经语法错误修正(GEC)的进步受限于缺乏高质量的手动标注数据。
  • methods: 我们提出了两个方法来更好地使用可用数据:一是采用预先训练的auxiliary任务,二是调整训练数据的顺序和实例顺序。
  • results: 我们的方法可以达到 significan improvements,比如使用小型模型(400M参数)超越最佳T5-XXL(11B参数)模型。
    Abstract Progress in neural grammatical error correction (GEC) is hindered by the lack of annotated training data. Sufficient amounts of high-quality manually annotated data are not available, so recent research has relied on generating synthetic data, pretraining on it, and then fine-tuning on real datasets; performance gains have been achieved either by ensembling or by using huge pretrained models such as XXL-T5 as the backbone. In this work, we explore an orthogonal direction: how to use available data more efficiently. First, we propose auxiliary tasks that exploit the alignment between the original and corrected sentences, such as predicting a sequence of corrections. We formulate each task as a sequence-to-sequence problem and perform multi-task training. Second, we discover that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance, so we set out to find the best training schedule. Together, these two ideas lead to significant improvements, producing results that improve state of the art with much smaller models; in particular, we outperform the best models based on T5-XXL (11B parameters) with a BART-based model (400M parameters).
    摘要 神经语法纠错(GEC)的进展受到标注训练数据匮乏的阻碍。由于难以获得足够数量的高质量人工标注数据,近期研究大多依赖生成合成数据并在其上预训练,再在真实数据集上微调;性能提升主要通过模型集成或使用 XXL-T5 等超大规模预训练模型作为骨干来实现。在这项工作中,我们探索了一个正交的方向:如何更高效地利用已有数据。首先,我们提出了利用原始句子与纠正后句子之间对齐关系的辅助任务,例如预测纠正序列;我们将每个任务表述为序列到序列问题并进行多任务训练。其次,我们发现训练所用数据集的顺序、甚至数据集内部单个样本的顺序,都可能对最终性能产生重要影响,因此我们着手寻找最佳的训练计划。这两个想法共同带来了显著提升,使我们用更小的模型取得了超越当前最佳水平的结果;特别地,我们基于 BART 的模型(4亿参数)超越了基于 T5-XXL(110亿参数)的最佳模型。

Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis

  • paper_url: http://arxiv.org/abs/2311.11745
  • repo_url: None
  • paper_authors: Jungil Kong, Junmo Lee, Jeongmin Kim, Beomjeong Kim, Jihoon Park, Dohee Kong, Changheon Lee, Sangjin Kim
  • for: 模型多个说话者的特征表达,包括各种说话者的特征特征,如嗓音、语速、语调等。
  • methods: 提出一种新的方法,通过精细化特征和conditioning方法来表达目标说话者的speech特征,从而实现不需要额外训练目标说话者的数据集。
  • results: 对比seen说话者的best-performing多说话者模型,提出的方法在主观相似度评价中获得了显著更高的相似性mean opinion score(SMOS),并且在零戳法中也表现出了显著的优势。此外,方法还可以生成新的人工说话者,并且表明编码的秘密特征具有足够的信息可以重建原始说话者的speech。
    Abstract In this work, we propose a novel method for modeling numerous speakers, which enables expressing the overall characteristics of speakers in detail like a trained multi-speaker model without additional training on the target speaker's dataset. Although various works with similar purposes have been actively studied, their performance has not yet reached that of trained multi-speaker models due to their fundamental limitations. To overcome previous limitations, we propose effective methods for feature learning and representing target speakers' speech characteristics by discretizing the features and conditioning them to a speech synthesis model. Our method obtained a significantly higher similarity mean opinion score (SMOS) in subjective similarity evaluation than seen speakers of a best-performing multi-speaker model, even with unseen speakers. The proposed method also outperforms a zero-shot method by significant margins. Furthermore, our method shows remarkable performance in generating new artificial speakers. In addition, we demonstrate that the encoded latent features are sufficiently informative to reconstruct an original speaker's speech completely. It implies that our method can be used as a general methodology to encode and reconstruct speakers' characteristics in various tasks.
    摘要 在这项工作中,我们提出了一种建模大量说话人的新方法,它可以像训练好的多说话人模型一样细致地表达说话人的整体特征,而无需在目标说话人的数据集上进行额外训练。虽然已有许多目的类似的研究,但由于其固有的局限性,其性能尚未达到训练好的多说话人模型的水平。为了克服这些局限,我们提出了有效的特征学习方法,通过对特征进行离散化并将其作为条件输入语音合成模型,来表示目标说话人的语音特征。在主观相似度评估中,即使对未见过的说话人,我们的方法也获得了显著高于最佳多说话人模型中已见说话人的相似度平均意见分(SMOS),并以显著优势超过零样本方法。此外,我们的方法在生成新的虚拟说话人方面表现出色,并且编码得到的隐特征包含足够的信息,可以完整重建原始说话人的语音。这意味着我们的方法可以作为一种通用方法,用于在各种任务中编码和重建说话人特征。

Addressing the Length Bias Problem in Document-Level Neural Machine Translation

  • paper_url: http://arxiv.org/abs/2311.11601
  • repo_url: https://github.com/salvation-z/D2DToolkits
  • paper_authors: Zhuocheng Zhang, Shuhao Gu, Min Zhang, Yang Feng
  • for: 解决文档级神经机器翻译中的长度偏差(length bias)问题,提高文档翻译质量。
  • methods: 提出了改进DNMT模型的训练方法、注意力机制和解码策略。
  • results: 实验结果显示,我们的方法可以在多个公开数据集上带来显著改进,并且分析结果表明,我们的方法可以有效缓解长度偏差问题。
    Abstract Document-level neural machine translation (DNMT) has shown promising results by incorporating more context information. However, this approach also introduces a length bias problem, whereby DNMT suffers from significant translation quality degradation when decoding documents that are much shorter or longer than the maximum sequence length during training. To solve the length bias problem, we propose to improve the DNMT model in training method, attention mechanism, and decoding strategy. Firstly, we propose to sample the training data dynamically to ensure a more uniform distribution across different sequence lengths. Then, we introduce a length-normalized attention mechanism to aid the model in focusing on target information, mitigating the issue of attention divergence when processing longer sequences. Lastly, we propose a sliding window strategy during decoding that integrates as much context information as possible without exceeding the maximum sequence length. The experimental results indicate that our method can bring significant improvements on several open datasets, and further analysis shows that our method can significantly alleviate the length bias problem.
    摘要 文档级神经机器翻译(DNMT)通过引入更多的上下文信息,已经显示出有前景的结果。然而,这种方法也带来了长度偏差问题:当待解码文档远短于或远长于训练时的最大序列长度时,DNMT的翻译质量会显著下降。为解决长度偏差问题,我们提议从训练方法、注意力机制和解码策略三方面改进DNMT模型。首先,我们提议对训练数据进行动态采样,以确保不同序列长度上的分布更加均匀。其次,我们引入长度归一化的注意力机制,帮助模型聚焦于目标信息,缓解处理较长序列时注意力发散的问题。最后,我们提出解码时的滑动窗口策略,在不超过最大序列长度的前提下尽可能多地融入上下文信息。实验结果表明,我们的方法可以在多个公开数据集上带来显著改进,进一步的分析表明该方法能显著缓解长度偏差问题。
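
The sliding-window decoding strategy can be sketched as follows; `translate(context, sentence)` is a hypothetical hook around the trained DNMT model, and counting whitespace-separated words as the length budget is a simplification of real subword counting.

```python
def sliding_window_decode(translate, src_sentences, max_src_tokens=512):
    """Decode a document sentence by sentence, packing as much preceding source
    context as the length budget allows.

    translate(context, sentence) is a hypothetical hook that returns the
    translation of `sentence` given the concatenated source `context`.
    """
    outputs = []
    for i, sent in enumerate(src_sentences):
        budget = max_src_tokens - len(sent.split())
        context = []
        j = i - 1
        while j >= 0 and len(src_sentences[j].split()) <= budget:
            context.insert(0, src_sentences[j])          # keep document order
            budget -= len(src_sentences[j].split())
            j -= 1
        outputs.append(translate(" ".join(context), sent))
    return outputs
```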

Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions

  • paper_url: http://arxiv.org/abs/2311.11598
  • repo_url: https://github.com/thunlp-mt/fiig
  • paper_authors: Ziyue Wang, Chi Chen, Peng Li, Yang Liu
  • for: 这个论文主要是为了提高大语言模型(LLM)在视觉问答 зада务中的表现,以及使LLM能够更好地利用图像信息。
  • methods: 这篇论文使用了一种框架,允许LLM提问更多细节信息,以及一些筛选器来约束生成的信息。
  • results: 论文的实验结果表明,使用这种框架和筛选器可以在OK-VQA和A-OKVQA上持续提升基线方法的表现,平均提高2.15%。
    Abstract Large Language Models (LLMs) demonstrate impressive reasoning ability and the maintenance of world knowledge not only in natural language tasks, but also in some vision-language tasks such as open-domain knowledge-based visual question answering (OK-VQA). As images are invisible to LLMs, researchers convert images to text to engage LLMs into the visual question reasoning procedure. This leads to discrepancies between images and their textual representations presented to LLMs, which consequently impedes final reasoning performance. To fill the information gap and better leverage the reasoning capability, we design a framework that enables LLMs to proactively ask relevant questions to unveil more details in the image, along with filters for refining the generated information. We validate our idea on OK-VQA and A-OKVQA. Our method continuously boosts the performance of baselines methods by an average gain of 2.15% on OK-VQA, and achieves consistent improvements across different LLMs.
    摘要

How well ChatGPT understand Malaysian English? An Evaluation on Named Entity Recognition and Relation Extraction

  • paper_url: http://arxiv.org/abs/2311.11583
  • repo_url: https://github.com/mohanraj-nlp/chatgpt-malaysian-english
  • paper_authors: Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam
  • for: 本研究用于评估 chatGPT 在马来西亚英语新闻文本(MEN)上的实体抽取和关系抽取能力。
  • methods: 本研究采用了三步方法,即教育-预测-评估(educate-predict-evaluate)。
  • results: chatGPT 在马来西亚英语新闻文本中的实体抽取性能不高,最高 F1 分为 0.497。进一步分析发现,马来西亚英语中的 morphosyntactic 变化减少了 chatGPT 的性能。然而,这种 morphosyntactic 变化并不影响 chatGPT 的关系抽取性能。
    Abstract Recently, ChatGPT has attracted a lot of interest from both researchers and the general public. While the performance of ChatGPT in named entity recognition and relation extraction from Standard English texts is satisfactory, it remains to be seen if it can perform similarly for Malaysian English. Malaysian English is unique as it exhibits morphosyntactic and semantical adaptation from local contexts. In this study, we assess ChatGPT's capability in extracting entities and relations from the Malaysian English News (MEN) dataset. We propose a three-step methodology referred to as \textbf{\textit{educate-predict-evaluate}. The performance of ChatGPT is assessed using F1-Score across 18 unique prompt settings, which were carefully engineered for a comprehensive review. From our evaluation, we found that ChatGPT does not perform well in extracting entities from Malaysian English news articles, with the highest F1-Score of 0.497. Further analysis shows that the morphosyntactic adaptation in Malaysian English caused the limitation. However, interestingly, this morphosyntactic adaptation does not impact the performance of ChatGPT for relation extraction.
    摘要 最近,ChatGPT已经吸引了许多研究者和普通民众的关注。虽然ChatGPT在标准英语文本中的名实体识别和关系提取表现良好,但是还未经过测试是否可以在马来西亚英语文本中表现良好。马来西亚英语独特,它在本地语言上具有语法 sintactic 和semantic 的适应。在这项研究中,我们评估了ChatGPT在马来西亚英语新闻(MEN)数据集中的实体和关系提取能力。我们提出了一种三步方法,称之为“教育-预测-评估”。我们使用F1-Score指标评估ChatGPT在18种不同的提示设定下的表现。从我们的评估结果来看,ChatGPT在马来西亚英语新闻文章中提取实体的表现不佳,最高F1-Score为0.497。进一步分析发现,马来西亚英语中的语法 sintactic 适应限制了ChatGPT的表现。但是奇怪的是,这种语法 sintactic 适应不会影响ChatGPT的关系提取表现。

KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model

  • paper_url: http://arxiv.org/abs/2311.11564
  • repo_url: https://github.com/ngwlh-gl/kbioxlm
  • paper_authors: Lei Geng, Xu Yan, Ziqiang Cao, Juntao Li, Wenjie Li, Sujian Li, Xinjie Zhou, Yang Yang, Jun Zhang
  • for: 本研究旨在提高生物医学领域的自然语言处理模型的多语言能力。
  • methods: 我们提出了一种名为KBioXLM的模型,它将基于XLM-R模型进行知识启发式转换,以适应生物医学领域的多语言需求。我们将三级别的知识启发(实体、事实和段落水平) integrate into 英语版本的医学文献,并设计三种相应的训练任务(实体覆盖、关系覆盖和段落关系预测)。
  • results: 我们通过将英文benchmarks中的多个任务翻译成中文,以及对XLM-R模型进行练习和提升,达到了跨语言零shot和几shot情况下的显著改进,提高了10+点。
    Abstract Most biomedical pretrained language models are monolingual and cannot handle the growing cross-lingual requirements. The scarcity of non-English domain corpora, not to mention parallel data, poses a significant hurdle in training multilingual biomedical models. Since knowledge forms the core of domain-specific corpora and can be translated into various languages accurately, we propose a model called KBioXLM, which transforms the multilingual pretrained model XLM-R into the biomedical domain using a knowledge-anchored approach. We achieve a biomedical multilingual corpus by incorporating three granularity knowledge alignments (entity, fact, and passage levels) into monolingual corpora. Then we design three corresponding training tasks (entity masking, relation masking, and passage relation prediction) and continue training on top of the XLM-R model to enhance its domain cross-lingual ability. To validate the effectiveness of our model, we translate the English benchmarks of multiple tasks into Chinese. Experimental results demonstrate that our model significantly outperforms monolingual and multilingual pretrained models in cross-lingual zero-shot and few-shot scenarios, achieving improvements of up to 10+ points. Our code is publicly available at https://github.com/ngwlh-gl/KBioXLM.
    摘要 大多数生物医学预训练语言模型都是单语言的,无法满足日益增长的跨语言需求。非英语领域语料的稀缺,更不用说平行数据,使训练多语言生物医学模型面临重大障碍。由于知识是领域语料的核心,并且可以被准确地翻译成多种语言,我们提出了一种名为KBioXLM的模型,通过以知识为锚点的方法将多语言预训练模型XLM-R迁移到生物医学领域。我们将三个粒度的知识对齐(实体、事实、段落级别)融入单语语料,构建生物医学多语言语料;随后设计了三种相应的训练任务(实体掩码、关系掩码、段落关系预测),在XLM-R模型基础上继续训练,以增强其领域内的跨语言能力。为了验证模型的有效性,我们将多个任务的英文基准翻译成中文。实验结果表明,在跨语言零样本和少样本场景下,我们的模型显著优于单语言和多语言预训练模型,最高提升超过10个点。我们的代码公开于 https://github.com/ngwlh-gl/KBioXLM。

Adapt in Contexts: Retrieval-Augmented Domain Adaptation via In-Context Learning

  • paper_url: http://arxiv.org/abs/2311.11551
  • repo_url: None
  • paper_authors: Quanyu Long, Wenya Wang, Sinno Jialin Pan
  • for: This paper focuses on the problem of Unsupervised Domain Adaptation (UDA) for language models (LLMs) in an in-context learning setting, where the goal is to adapt LLMs from a source domain to a target domain without any target labels.
  • methods: The proposed method retrieves a subset of cross-domain elements that are most similar to the query and elicits the language model to adapt in an in-context manner, learning both the target-domain distribution and the discriminative task signal simultaneously from the augmented cross-domain in-context examples, using different prompting and training strategies that account for different LM architectures.
  • results: Extensive experiments on Sentiment Analysis (SA) and Named Entity Recognition (NER) tasks thoroughly study the effectiveness of In-Context Learning (ICL) for domain transfer and demonstrate significant improvements over baseline models.
    Abstract Large language models (LLMs) have showcased their capability with few-shot inference known as in-context learning. However, in-domain demonstrations are not always readily available in real scenarios, leading to cross-domain in-context learning. Besides, LLMs are still facing challenges in long-tail knowledge in unseen and unfamiliar domains. The above limitations demonstrate the necessity of Unsupervised Domain Adaptation (UDA). In this paper, we study the UDA problem under an in-context learning setting to adapt language models from the source domain to the target domain without any target labels. The core idea is to retrieve a subset of cross-domain elements that are the most similar to the query, and elicit language model to adapt in an in-context manner by learning both target domain distribution and the discriminative task signal simultaneously with the augmented cross-domain in-context examples. We devise different prompting and training strategies, accounting for different LM architectures to learn the target distribution via language modeling. With extensive experiments on Sentiment Analysis (SA) and Named Entity Recognition (NER) tasks, we thoroughly study the effectiveness of ICL for domain transfer and demonstrate significant improvements over baseline models.
    摘要
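
A minimal sketch of the retrieve-then-prompt step described above: embed the query, pull the most similar labeled source demonstrations and unlabeled target snippets, and compose the in-context prompt. The `embed` encoder, the prompt template, and the k values are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

def embed(texts):
    """Hypothetical sentence encoder shared by source and target domains."""
    raise NotImplementedError

def build_adaptation_prompt(query, source_pool, target_pool, k_src=4, k_tgt=4):
    """Retrieve the cross-domain elements most similar to the query and compose
    an in-context prompt: labeled source demonstrations carry the task signal,
    unlabeled target snippets expose the target-domain distribution."""
    q = np.asarray(embed([query]))[0]
    src_vecs = np.asarray(embed([t for t, _ in source_pool]))
    tgt_vecs = np.asarray(embed(target_pool))
    cos = lambda M: M @ q / (np.linalg.norm(M, axis=1) * np.linalg.norm(q) + 1e-9)
    src_idx = np.argsort(-cos(src_vecs))[:k_src]
    tgt_idx = np.argsort(-cos(tgt_vecs))[:k_tgt]
    lines = ["Unlabeled target-domain text:"]
    lines += [target_pool[i] for i in tgt_idx]
    lines += ["Labeled source-domain examples:"]
    lines += [f"Input: {source_pool[i][0]}\nLabel: {source_pool[i][1]}" for i in src_idx]
    lines += [f"Input: {query}\nLabel:"]
    return "\n".join(lines)
```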

Multi-teacher Distillation for Multilingual Spelling Correction

  • paper_url: http://arxiv.org/abs/2311.11518
  • repo_url: None
  • paper_authors: Jingfen Zhang, Xuan Guo, Sravan Bodapati, Christopher Potts
  • for: 这个论文是为了解决现代搜索界面中的精准拼写检查问题,特别是在移动设备和语音至文字转换 interfaces 中。
  • methods: 这篇论文使用多教师浸泡法来解决这个问题,其中每种语言/地区都有一个单语言教师模型,这些个教师模型被浸泡到一个多语言学生模型中,以满足所有语言/地区的需求。
  • results: 在使用开源数据以及世界各地搜索服务用户数据的实验中,我们示出了这种方法可以生成高效的拼写检查模型,能够适应部署服务的紧张延迟要求。
    Abstract Accurate spelling correction is a critical step in modern search interfaces, especially in an era of mobile devices and speech-to-text interfaces. For services that are deployed around the world, this poses a significant challenge for multilingual NLP: spelling errors need to be caught and corrected in all languages, and even in queries that use multiple languages. In this paper, we tackle this challenge using multi-teacher distillation. On our approach, a monolingual teacher model is trained for each language/locale, and these individual models are distilled into a single multilingual student model intended to serve all languages/locales. In experiments using open-source data as well as user data from a worldwide search service, we show that this leads to highly effective spelling correction models that can meet the tight latency requirements of deployed services.
    摘要 现代搜索界面中,精准的拼写修正是一项重要的步骤,尤其是在移动设备和语音到文本转换器的时代。为全球部署的服务而言,这对多语言NLP提出了一大挑战:拼写错误需要在所有语言和地区中捕捉和修正。在这篇论文中,我们使用多教师浸泡法来解决这个问题。我们的方法是训练每种语言/地区的单语言教师模型,然后将这些个体模型浸泡到一个可以服务所有语言/地区的多语言学生模型中。在使用开源数据以及全球搜索服务的用户数据进行实验中,我们发现这种方法可以创造出高效的拼写修正模型,可以满足部署服务的紧张延迟要求。
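
A simplified PyTorch sketch of the multi-teacher distillation objective: each example is distilled from the monolingual teacher matching its locale, blended with the usual supervised loss. It treats the output as a single-token prediction for brevity (a real speller would be sequence-level), and the function signature and hyperparameters are hypothetical.

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_by_locale, locales,
                          gold_labels, alpha=0.5, temperature=2.0):
    """Blend cross-entropy with a KL term toward the monolingual teacher
    that matches each example's locale.

    student_logits: (B, V); teacher_logits_by_locale: dict locale -> (B, V)
    (precomputed; rows for other locales are ignored); locales: list of length B.
    """
    ce = F.cross_entropy(student_logits, gold_labels)
    kd_terms = []
    for i, loc in enumerate(locales):
        t = teacher_logits_by_locale[loc][i] / temperature
        s = student_logits[i] / temperature
        kd_terms.append(F.kl_div(F.log_softmax(s, dim=-1),
                                 F.softmax(t, dim=-1),
                                 reduction="sum"))
    kd = torch.stack(kd_terms).mean() * (temperature ** 2)
    return alpha * ce + (1 - alpha) * kd
```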

Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information

  • paper_url: http://arxiv.org/abs/2311.11509
  • repo_url: None
  • paper_authors: Zhengmian Hu, Gang Wu, Saayan Mitra, Ruiyi Zhang, Tong Sun, Heng Huang, Vishy Swaminathan
  • for: 本研究旨在提高大型自然语言模型(LLM)对针对攻击的识别,以减少模型在不正常输入 Situation 下的敏感性。
  • methods: 本研究提出了一种基于Token-level检测方法,利用LLM对下一个Token的概率预测来识别针对攻击。研究者们还利用了周围Token信息,以促进检测趋势性的针对攻击序列。
  • results: 研究者们提出了两种方法:一种是判断每个Token是否属于针对攻击序列中的一部分,另一种是估计每个Token是否属于针对攻击序列。两种方法均可以帮助提高LLM对针对攻击的识别能力。
    Abstract In recent years, Large Language Models (LLM) have emerged as pivotal tools in various applications. However, these models are susceptible to adversarial prompt attacks, where attackers can carefully curate input strings that lead to undesirable outputs. The inherent vulnerability of LLMs stems from their input-output mechanisms, especially when presented with intensely out-of-distribution (OOD) inputs. This paper proposes a token-level detection method to identify adversarial prompts, leveraging the LLM's capability to predict the next token's probability. We measure the degree of the model's perplexity and incorporate neighboring token information to encourage the detection of contiguous adversarial prompt sequences. As a result, we propose two methods: one that identifies each token as either being part of an adversarial prompt or not, and another that estimates the probability of each token being part of an adversarial prompt.
    摘要 近年来,大语言模型(LLM)已成为各种应用中的关键工具,但它们容易受到对抗提示攻击:攻击者可以精心构造输入字符串,诱导模型产生不良输出。这种脆弱性源于LLM的输入输出机制,尤其是在面对高度分布外(OOD)的输入时。本文提出了一种token级检测方法,利用LLM对下一个token的概率预测来识别对抗提示。我们度量模型的困惑度,并结合相邻token的信息,以便检测连续的对抗提示序列。由此,我们提出了两种方法:一种判断每个token是否属于对抗提示,另一种估计每个token属于对抗提示的概率。
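
A rough sketch of the perplexity-based, token-level idea using a small causal LM from Hugging Face transformers; GPT-2 here is only an illustrative stand-in, and the moving-average smoothing with a fixed threshold is a crude proxy for the paper's neighbor-aware detection methods.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def token_nll(text, model_name="gpt2"):
    """Per-token negative log-likelihood under a causal LM."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logp = torch.log_softmax(logits[0, :-1], dim=-1)
    nll = -logp[torch.arange(ids.shape[1] - 1), ids[0, 1:]]
    return tok.convert_ids_to_tokens(ids[0, 1:]), nll

def flag_adversarial_tokens(tokens, nll, window=5, threshold=6.0):
    """Smooth per-token NLL over a neighbourhood and flag tokens whose smoothed
    value exceeds a (dataset-dependent) threshold."""
    flags = []
    for i in range(len(nll)):
        lo, hi = max(0, i - window // 2), min(len(nll), i + window // 2 + 1)
        flags.append(bool(nll[lo:hi].mean() > threshold))
    return list(zip(tokens, flags))
```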

What’s left can’t be right – The remaining positional incompetence of contrastive vision-language models

  • paper_url: http://arxiv.org/abs/2311.11477
  • repo_url: None
  • paper_authors: Nils Hoehing, Ellen Rushe, Anthony Ventresque
  • for: 这篇论文旨在解释对比视觉语言模型缺乏空间理解能力的可能原因。
  • methods: 作者通过分析数据集和嵌入空间来研究这一现象。他们聚焦于简单的左右位置关系,证明这种行为即使在大规模数据集上也完全可以预测;并展示了可以用合成数据教会模型这类关系,且该方法能够很好地泛化到自然图像。
  • results: 作者的研究表明,通过教授左右位置关系,可以提升对比视觉语言模型在Visual Genome Relations左右关系上的表现,从而改进其空间理解能力。
    Abstract Contrastive vision-language models like CLIP have been found to lack spatial understanding capabilities. In this paper we discuss the possible causes of this phenomenon by analysing both datasets and embedding space. By focusing on simple left-right positional relations, we show that this behaviour is entirely predictable, even with large-scale datasets, demonstrate that these relations can be taught using synthetic data and show that this approach can generalise well to natural images - improving the performance on left-right relations on Visual Genome Relations.
    摘要 CLIP等对比视觉语言模型缺乏空间理解能力。在这篇论文中,我们通过分析数据集和嵌入空间,探讨这种现象的可能原因。通过关注简单的左右位置关系,我们表明这种行为是完全可预测的,即使使用大规模数据集;我们证明可以用合成数据教会模型这类关系,并且该方法能够很好地泛化到自然图像,提高了Visual Genome Relations中左右关系的表现。

cs.LG - 2023-11-20

Improvements in Interlayer Pipelining of CNN Accelerators Using Genetic Algorithms

  • paper_url: http://arxiv.org/abs/2311.12235
  • repo_url: None
  • paper_authors: Mark Horeni, Siddharth Joshi
  • for: 这篇论文是为了提高边缘平台上的卷积神经网络(CNNs)的效率执行。
  • methods: 我们使用了一种层融合技术,使CNN能够更高效地执行,并将遗传算法(GA)应用于基于图的拓扑排序,以减少片外数据传输。
  • results: 我们的方法在类SIMBA移动架构上为MobileNet-v3带来1.8倍的能效提升和1.9倍的能量延迟积(EDP)改进,并在SIMBA和Eyeriss上分别平均带来1.4倍和1.12倍的EDP改进。
    Abstract Deploying Convolutional Neural Networks (CNNs) on edge platforms necessitates efficient hardware acceleration. Any unnecessary data movement in such accelerators can unacceptably degrade performance and efficiency. To address this, we develop a layer fusion technique targeting CNNs, that reduces off-chip data communication using a Genetic Algorithm (GA) applied to graph-based topological sort. Results show a 1.8$\times$ increase in energy efficiency and 1.9$\times$ improvement in energy-delay product (EDP) for MobileNet-v3 on a SIMBA-like mobile architecture. Our approach consistently improves workload performance, averaging 1.4$\times$ improvement to EDP for SIMBA and 1.12$\times$ for Eyeriss.
    摘要 在边缘平台上部署卷积神经网络(CNN)需要高效的硬件加速,而加速器中任何不必要的数据搬移都会不可接受地降低性能和效率。为解决这一问题,我们针对CNN开发了一种层融合技术,将遗传算法(GA)应用于基于图的拓扑排序,以减少片外数据通信。结果显示,在类SIMBA移动架构上,MobileNet-v3的能效提升1.8倍,能量延迟积(EDP)改进1.9倍。我们的方法能够持续改善工作负载性能,在SIMBA和Eyeriss上分别平均带来1.4倍和1.12倍的EDP改进。
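The abstract only names a GA over graph-based topological orderings; a toy sketch of that idea (selection plus precedence-preserving mutation over layer orders, with `cost` standing in for the paper's unspecified off-chip-traffic model) might be:

```python
import random

def random_topo_order(succ, indegree):
    # Randomised Kahn's algorithm: sample one valid topological order of the layer graph.
    indeg, order = dict(indegree), []
    ready = [n for n, d in indeg.items() if d == 0]
    while ready:
        n = ready.pop(random.randrange(len(ready)))
        order.append(n)
        for m in succ.get(n, []):
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return order

def mutate(order, succ):
    # Swap an adjacent pair of layers when no dependency forbids it.
    i = random.randrange(len(order) - 1)
    a, b = order[i], order[i + 1]
    return order[:i] + [b, a] + order[i + 2:] if b not in succ.get(a, []) else order

def evolve(succ, indegree, cost, pop_size=32, gens=200):
    # Keep the cheapest half of the population, refill it with mutated copies.
    population = [random_topo_order(succ, indegree) for _ in range(pop_size)]
    for _ in range(gens):
        elite = sorted(population, key=cost)[: pop_size // 2]
        population = elite + [mutate(list(random.choice(elite)), succ) for _ in elite]
    return min(population, key=cost)
```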

Data-Guided Regulator for Adaptive Nonlinear Control

  • paper_url: http://arxiv.org/abs/2311.12230
  • repo_url: https://github.com/NiyoushaRahimi/Data-Guided-Regulator-for-Adaptive-Nonlinear-Control
  • paper_authors: Niyousha Rahimi, Mehran Mesbahi
  • for: 这篇论文是关于设计基于数据驱动的反馈控制器,用于处理复杂非线性动力系统中的时变干扰。
  • methods: 该论文提出了一种基于直接策略更新的数据驱动控制策略,可以在系统动力学部分未知的情况下实现系统状态的有限时间调节。
  • results: 在一个六自由度动力下降制导问题中,该策略能够有效地应对恶劣环境干扰。
    Abstract This paper addresses the problem of designing a data-driven feedback controller for complex nonlinear dynamical systems in the presence of time-varying disturbances with unknown dynamics. Such disturbances are modeled as the "unknown" part of the system dynamics. The goal is to achieve finite-time regulation of system states through direct policy updates while also generating informative data that can subsequently be used for data-driven stabilization or system identification. First, we expand upon the notion of "regularizability" and characterize this system characteristic for a linear time-varying representation of the nonlinear system with locally-bounded higher-order terms. "Rapid-regularizability" then gauges the extent by which a system can be regulated in finite time, in contrast to its asymptotic behavior. We then propose the Data-Guided Regulation for Adaptive Nonlinear Control ( DG-RAN) algorithm, an online iterative synthesis procedure that utilizes discrete time-series data from a single trajectory for regulating system states and identifying disturbance dynamics. The effectiveness of our approach is demonstrated on a 6-DOF power descent guidance problem in the presence of adverse environmental disturbances.
    摘要 We first explore the concept of "regularizability" and its application to a linear time-varying representation of the nonlinear system with locally-bounded higher-order terms. This allows us to gauge the extent to which a system can be regulated in finite time, rather than just its asymptotic behavior.Next, we propose the Data-Guided Regulation for Adaptive Nonlinear Control (DG-RAN) algorithm, an online iterative synthesis procedure that uses discrete time-series data from a single trajectory to regulate system states and identify disturbance dynamics. The effectiveness of our approach is demonstrated on a 6-DOF power descent guidance problem in the presence of adverse environmental disturbances.

Random Fourier Signature Features

  • paper_url: http://arxiv.org/abs/2311.12214
  • repo_url: None
  • paper_authors: Csaba Toth, Harald Oberhauser, Zoltan Szabo
  • for: 这篇论文旨在加速基于张量代数的签名核(signature kernel)这一任意长度序列相似度度量的计算,并给出可扩展的时间序列特征。
  • methods: 论文使用随机傅里叶特征来加速签名核的计算,并结合近期的张量投影技术,导出两种更具可扩展性的时间序列特征。
  • results: 实验结果表明,采用随机傅里叶特征加速签名核的计算,可以在几乎不损失精度的情况下显著降低计算成本;该方法不仅可以处理中等规模数据集,还可以扩展到多达百万条时间序列的大型数据集。
    Abstract Tensor algebras give rise to one of the most powerful measures of similarity for sequences of arbitrary length called the signature kernel accompanied with attractive theoretical guarantees from stochastic analysis. Previous algorithms to compute the signature kernel scale quadratically in terms of the length and the number of the sequences. To mitigate this severe computational bottleneck, we develop a random Fourier feature-based acceleration of the signature kernel acting on the inherently non-Euclidean domain of sequences. We show uniform approximation guarantees for the proposed unbiased estimator of the signature kernel, while keeping its computation linear in the sequence length and number. In addition, combined with recent advances on tensor projections, we derive two even more scalable time series features with favourable concentration properties and computational complexity both in time and memory. Our empirical results show that the reduction in computational cost comes at a negligible price in terms of accuracy on moderate-sized datasets, and it enables one to scale to large datasets up to a million time series.
    摘要 张量代数给出了针对任意长度序列的最强相似度度量之一——签名核(signature kernel),并伴随来自随机分析的良好理论保证。以往计算签名核的算法在序列长度和序列数量上均呈二次方增长。为缓解这一严重的计算瓶颈,我们提出了一种基于随机傅里叶特征的签名核加速方法,直接作用于序列这一本质上非欧几里得的域。我们为所提出的无偏估计量证明了一致逼近保证,同时使其计算在序列长度和数量上保持线性。此外,结合近期的张量投影技术,我们导出了两种更具可扩展性的时间序列特征,它们在集中性质以及时间与内存复杂度方面都较为优良。实验结果表明,在中等规模数据集上,计算成本的降低几乎不以精度为代价,并使我们能够扩展到多达百万条时间序列的大型数据集。
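The signature-kernel construction itself is involved; the underlying random-Fourier-feature trick is easiest to see in the vanilla vector-input case for a Gaussian kernel. The sketch below is that classical case only, not the paper's sequence-space construction:

```python
import numpy as np

def rff_features(X, num_features=256, lengthscale=1.0, seed=0):
    # Classic random Fourier features: k(x, y) ~= z(x) . z(y) for an RBF kernel,
    # replacing an n x n kernel computation with an n x D feature map.
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / lengthscale, size=(X.shape[1], num_features))
    b = rng.uniform(0.0, 2 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

X = np.random.randn(100, 5)
Z = rff_features(X)
K_approx = Z @ Z.T      # unbiased estimate of the kernel matrix, linear per sample to build Z
```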

Improving Label Assignments Learning by Dynamic Sample Dropout Combined with Layer-wise Optimization in Speech Separation

  • paper_url: http://arxiv.org/abs/2311.12199
  • repo_url: None
  • paper_authors: Chenyang Gao, Yue Gu, Ivan Marsic
  • for: solve excessive label assignment switching and layer-decoupling issues in supervised speech separation using permutation invariant training (PIT)
  • methods: dynamic sample dropout (DSD) and layer-wise optimization (LO)
  • results: outperforms the baseline and improves the performance of speech separation tasks
    Abstract In supervised speech separation, permutation invariant training (PIT) is widely used to handle label ambiguity by selecting the best permutation to update the model. Despite its success, previous studies showed that PIT is plagued by excessive label assignment switching in adjacent epochs, impeding the model to learn better label assignments. To address this issue, we propose a novel training strategy, dynamic sample dropout (DSD), which considers previous best label assignments and evaluation metrics to exclude the samples that may negatively impact the learned label assignments during training. Additionally, we include layer-wise optimization (LO) to improve the performance by solving layer-decoupling. Our experiments showed that combining DSD and LO outperforms the baseline and solves excessive label assignment switching and layer-decoupling issues. The proposed DSD and LO approach is easy to implement, requires no extra training sets or steps, and shows generality to various speech separation tasks.
    摘要 在有监督语音分离中,置换不变训练(PIT)被广泛用于处理标签歧义:通过选择最优的置换来更新模型。尽管取得了成功,已有研究表明PIT在相邻训练轮次之间存在过度的标签分配切换,阻碍模型学到更好的标签分配。为解决这一问题,我们提出了一种新的训练策略——动态样本丢弃(DSD),它综合考虑此前的最佳标签分配和评价指标,在训练中排除可能对已学到的标签分配产生负面影响的样本。此外,我们还引入逐层优化(LO),通过解决层间解耦问题来进一步提升性能。实验表明,结合DSD与LO能够超越基线,并解决过度标签分配切换和层间解耦问题。所提出的DSD与LO方法易于实现,不需要额外的训练集或训练步骤,并且对各类语音分离任务具有通用性。
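A minimal PyTorch sketch of the PIT loss this work builds on, with the spirit of dynamic sample dropout indicated in a comment; the L1 loss, tensor shapes, and the cached `prev_idx` are illustrative assumptions rather than the paper's exact setup:

```python
import itertools
import torch

def pit_loss(est, ref):
    # Permutation invariant training: evaluate every speaker assignment per
    # sample and keep the cheapest.  est, ref: (batch, sources, time).
    n_src = est.shape[1]
    per_perm = []
    for perm in itertools.permutations(range(n_src)):
        l = torch.stack([(est[:, i] - ref[:, p]).abs().mean(dim=-1)
                         for i, p in enumerate(perm)]).mean(dim=0)   # (batch,)
        per_perm.append(l)
    best, idx = torch.stack(per_perm).min(dim=0)   # per-sample loss and chosen assignment
    return best, idx

# Dynamic sample dropout, roughly: before averaging, exclude samples whose best
# assignment flipped relative to the previous epoch, e.g.
#   keep = idx.eq(prev_idx); loss = best[keep].mean()
```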

Node classification in random trees

  • paper_url: http://arxiv.org/abs/2311.12167
  • repo_url: https://github.com/wouterwln/neuralfactortrees
  • paper_authors: Wouter W. L. Nuijten, Vlado Menkovski
  • for: 模型random trees中的节点分类任务
  • methods: 使用Markov网络和Graph Neural Network来定义一个 Gibbs 分布,并使用 MCMC 来采样节点分类结果
  • results: 在Stanford Sentiment Treebank dataset上,方法比基eline表现出色,能够有效地模型节点分类任务中的联合分布。
    Abstract We propose a method for the classification of objects that are structured as random trees. Our aim is to model a distribution over the node label assignments in settings where the tree data structure is associated with node attributes (typically high dimensional embeddings). The tree topology is not predetermined and none of the label assignments are present during inference. Other methods that produce a distribution over node label assignment in trees (or more generally in graphs) either assume conditional independence of the label assignment, operate on a fixed graph topology, or require part of the node labels to be observed. Our method defines a Markov Network with the corresponding topology of the random tree and an associated Gibbs distribution. We parameterize the Gibbs distribution with a Graph Neural Network that operates on the random tree and the node embeddings. This allows us to estimate the likelihood of node assignments for a given random tree and use MCMC to sample from the distribution of node assignments. We evaluate our method on the tasks of node classification in trees on the Stanford Sentiment Treebank dataset. Our method outperforms the baselines on this dataset, demonstrating its effectiveness for modeling joint distributions of node labels in random trees.
    摘要 我们提出一种方法用于对结构化为随机树的对象进行分类。我们的目标是模型在结构化为高维嵌入的树数据结构下的分布 over 节点标签分配。树的结构不固定,并且在推理过程中没有任何节点标签的 observable。现有的方法可以生成节点标签分配的分布在树(或更一般地在图)中,但是它们都假设节点标签之间的 conditional independence,或者操作在固定的图结构上,或者需要一些节点标签的观察值。我们的方法定义了一个Markov网络,其中包含随机树的相应的topology和节点嵌入的相关性。我们使用 Graph Neural Network 来参数化 Gibbs 分布,以便在给定的随机树和节点嵌入下 estimating 节点分配的概率。我们使用 MCMC 来采样这个分布中的节点分配。我们在 Stanford Sentiment Treebank 数据集上进行节点分类任务中,我们的方法比基eline 高效,这说明了我们的方法在模型结构化为随机树的情况下 joint 分布的节点标签的能力。
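The abstract describes a Gibbs distribution over node labels parameterised by a GNN and sampled with MCMC; below is a small numpy sketch of single-site Gibbs sampling on a tree, with `unary` standing in for GNN-produced node scores and `pairwise` for edge potentials (both hypothetical placeholders, not the paper's parameterisation):

```python
import numpy as np

def gibbs_sample_labels(edges, unary, pairwise, num_sweeps=50, seed=0):
    # Single-site Gibbs sampling over node labels of a tree.
    # unary: (N, K) per-node label scores; pairwise: (K, K) edge compatibility.
    rng = np.random.default_rng(seed)
    N, K = unary.shape
    nbrs = [[] for _ in range(N)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    labels = rng.integers(K, size=N)
    for _ in range(num_sweeps):
        for n in range(N):
            logit = unary[n] + sum(pairwise[:, labels[m]] for m in nbrs[n])
            p = np.exp(logit - logit.max())
            labels[n] = rng.choice(K, p=p / p.sum())
    return labels
```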

Creating Temporally Correlated High-Resolution Power Injection Profiles Using Physics-Aware GAN

  • paper_url: http://arxiv.org/abs/2311.12166
  • repo_url: None
  • paper_authors: Hritik Gopal Shah, Behrouz Azimian, Anamitra Pal
  • for: solves the problem of lacking granularity in traditional smart meter measurements, enabling real-time decision-making.
  • methods: uses generative adversarial networks (GAN) with hard inequality constraints and convex optimization layer to enforce temporal consistency and create minutely interval temporally-correlated instantaneous power injection profiles from 15-minute average power consumption information.
  • results: successfully creates high-resolution power injection profiles from slow timescale aggregated power information, offering a promising avenue for improved high-speed state estimation in distribution systems.
    Abstract Traditional smart meter measurements lack the granularity needed for real-time decision-making. To address this practical problem, we create a generative adversarial networks (GAN) model that enforces temporal consistency on its high-resolution outputs via hard inequality constraints using a convex optimization layer. A unique feature of our GAN model is that it is trained solely on slow timescale aggregated power information obtained from historical smart meter data. The results demonstrate that the model can successfully create minutely interval temporally-correlated instantaneous power injection profiles from 15-minute average power consumption information. This innovative approach, emphasizing inter-neuron constraints, offers a promising avenue for improved high-speed state estimation in distribution systems and enhances the applicability of data-driven solutions for monitoring such systems.
    摘要 传统智能电表的量测缺乏实时决策所需的粒度。为解决这一实际问题,我们构建了一个生成对抗网络(GAN)模型,通过凸优化层施加硬不等式约束,使其高分辨率输出保持时间一致性。该模型的一个独特之处在于,它仅使用由历史智能电表数据得到的慢时间尺度聚合功率信息进行训练。结果表明,该模型能够从15分钟平均功耗信息生成分钟级、时间相关的瞬时功率注入曲线。这种强调约束的创新方法为配电系统的高速状态估计提供了有前景的途径,并增强了数据驱动监测方案的适用性。
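The hard constraint ties each generated minutely block to its metered 15-minute average; a closed-form projection illustrates the idea. The paper uses a differentiable convex-optimization layer with inequality constraints, so this equality-constrained projection is only a simplified stand-in:

```python
import numpy as np

def enforce_block_average(minutely, block_avgs, block=15):
    # Shift each 15-minute block so its mean equals the metered average:
    # the Euclidean projection onto the corresponding affine constraint set.
    x = minutely.reshape(-1, block)
    shift = np.asarray(block_avgs).reshape(-1, 1) - x.mean(axis=1, keepdims=True)
    return (x + shift).reshape(-1)

profile = np.random.rand(4 * 15)                # one hour of synthetic minutes
metered = [0.8, 1.1, 0.9, 1.2]                  # four 15-minute averages
consistent = enforce_block_average(profile, metered)
```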

Quantum Inception Score

  • paper_url: http://arxiv.org/abs/2311.12163
  • repo_url: None
  • paper_authors: Akira Sone, Naoki Yamamoto
  • for: 提出一种评估量子生成模型质量的指标
  • methods: 提出量子Inception分数,将模型质量与对给定数据集进行分类的量子信道的经典容量联系起来
  • results: 在该度量下,量子生成模型因量子相干性与量子纠缠的存在而优于经典模型;此外,论文利用量子涨落定理刻画了量子生成模型质量的物理极限
    Abstract Motivated by the great success of classical generative models in machine learning, enthusiastic exploration of their quantum version has recently started. To depart on this journey, it is important to develop a relevant metric to evaluate the quality of quantum generative models; in the classical case, one such examples is the inception score. In this paper, we propose the quantum inception score, which relates the quality to the classical capacity of the quantum channel that classifies a given dataset. We prove that, under this proposed measure, the quantum generative models provide better quality than their classical counterparts because of the presence of quantum coherence and entanglement. Finally, we harness the quantum fluctuation theorem to characterize the physical limitation of the quality of quantum generative models.
    摘要 受经典生成模型在机器学习中的巨大成功启发,人们最近开始积极探索其量子版本。要开展这一探索,首先需要建立评估量子生成模型质量的合适指标;在经典情形下,Inception分数就是这样一个例子。在本文中,我们提出量子Inception分数,它将模型质量与对给定数据集进行分类的量子信道的经典容量联系起来。我们证明,在该度量下,由于量子相干性与量子纠缠的存在,量子生成模型的质量优于其经典对应物。最后,我们利用量子涨落定理刻画量子生成模型质量的物理极限。

Risk-averse Batch Active Inverse Reward Design

  • paper_url: http://arxiv.org/abs/2311.12004
  • repo_url: https://github.com/pliam1105/RBAIRD
  • paper_authors: Panagiotis Liampas
  • for: This paper proposes a new method called Risk-averse Batch Active Inverse Reward Design (RBAIRD) to help train AI models that can adapt to real-world scenarios and learn how to treat unknown features.
  • methods: RBAIRD uses a series of queries to compute a probability distribution over the intended reward function, and then uses this distribution to construct batches of environments that the agent encounters in the real world. It also integrates a risk-averse planner to ensure safety while the agent is learning the reward function.
  • results: Compared to previous approaches, RBAIRD outperformed in terms of efficiency, accuracy, and action certainty, and demonstrated quick adaptability to new, unknown features. It can be more widely used for the alignment of crucial, powerful AI models.
    Abstract Designing a perfect reward function that depicts all the aspects of the intended behavior is almost impossible, especially generalizing it outside of the training environments. Active Inverse Reward Design (AIRD) proposed the use of a series of queries, comparing possible reward functions in a single training environment. This allows the human to give information to the agent about suboptimal behaviors, in order to compute a probability distribution over the intended reward function. However, it ignores the possibility of unknown features appearing in real-world environments, and the safety measures needed until the agent completely learns the reward function. I improved this method and created Risk-averse Batch Active Inverse Reward Design (RBAIRD), which constructs batches, sets of environments the agent encounters when being used in the real world, processes them sequentially, and, for a predetermined number of iterations, asks queries that the human needs to answer for each environment of the batch. After this process is completed in one batch, the probabilities have been improved and are transferred to the next batch. This makes it capable of adapting to real-world scenarios and learning how to treat unknown features it encounters for the first time. I also integrated a risk-averse planner, similar to that of Inverse Reward Design (IRD), which samples a set of reward functions from the probability distribution and computes a trajectory that takes the most certain rewards possible. This ensures safety while the agent is still learning the reward function, and enables the use of this approach in situations where cautiousness is vital. RBAIRD outperformed the previous approaches in terms of efficiency, accuracy, and action certainty, demonstrated quick adaptability to new, unknown features, and can be more widely used for the alignment of crucial, powerful AI models.
    摘要 设计一个能刻画预期行为所有方面的完美奖励函数几乎是不可能的,尤其是要将其泛化到训练环境之外。主动逆奖励设计(AIRD)提出使用一系列查询,在单一训练环境中比较可能的奖励函数,让人类向智能体提供关于次优行为的信息,从而计算预期奖励函数上的概率分布。然而,它忽略了真实环境中可能出现的未知特征,以及在智能体完全学会奖励函数之前所需的安全措施。我改进了这一方法,提出了风险规避批量主动逆奖励设计(RBAIRD):它构造批次,即智能体在真实世界中会遇到的环境集合,按顺序处理这些环境,并在预定的迭代次数内,针对批次中的每个环境向人类提出需要回答的查询。一个批次处理完成后,改进的概率分布被传递到下一个批次。这使得该方法能够适应真实场景,并学会如何处理首次遇到的未知特征。我还集成了一个与逆奖励设计(IRD)类似的风险规避规划器,它从概率分布中采样一组奖励函数,并计算能获得最确定奖励的轨迹。这在智能体仍在学习奖励函数时保证了安全性,使该方法适用于谨慎至关重要的场景。RBAIRD在效率、准确性和动作确定性方面均优于以往方法,能够快速适应新的未知特征,并可更广泛地用于关键、强大AI模型的对齐。

Machine-Learned Atomic Cluster Expansion Potentials for Fast and Quantum-Accurate Thermal Simulations of Wurtzite AlN

  • paper_url: http://arxiv.org/abs/2311.11990
  • repo_url: None
  • paper_authors: Guang Yang, Yuan-Bin Liu, Lei Yang, Bing-Yang Cao
  • for: 本研究基于原子簇展开(ACE)框架,开发了用于建模材料声子输运性质的机器学习原子间势。
  • methods: 本研究结合密度泛函理论(DFT)数据与机器学习技术,预测纤锌矿氮化铝(w-AlN)的热导率及声子输运性质。
  • results: 研究表明,ACE势能够高精度预测w-AlN的晶格常数、热学性质、声子色散与热导率,并揭示了双轴应变对其热导率和声子性质的调控作用。
    Abstract Using the atomic cluster expansion (ACE) framework, we develop a machine learning interatomic potential for fast and accurately modelling the phonon transport properties of wurtzite aluminum nitride. The predictive power of the ACE potential against density functional theory (DFT) is demonstrated across a broad range of properties of w-AlN, including ground-state lattice parameters, specific heat capacity, coefficients of thermal expansion, bulk modulus, and harmonic phonon dispersions. Validation of lattice thermal conductivity is further carried out by comparing the ACE-predicted values to the DFT calculations and experiments, exhibiting the overall capability of our ACE potential in sufficiently describing anharmonic phonon interactions. As a practical application, we perform a lattice dynamics analysis using the potential to unravel the effects of biaxial strains on thermal conductivity and phonon properties of w-AlN, which is identified as a significant tuning factor for near-junction thermal design of w-AlN-based electronics.
    摘要 我们基于原子簇展开(ACE)框架,开发了一种机器学习原子间势,用于快速且精确地模拟纤锌矿氮化铝(w-AlN)的声子输运性质。通过与密度泛函理论(DFT)对比,我们在基态晶格常数、比热容、热膨胀系数、体积模量以及简谐声子色散等一系列性质上验证了该势的预测能力;晶格热导率的预测结果也与DFT计算和实验相符,表明该势能够充分描述非简谐声子相互作用。作为实际应用,我们利用该势开展晶格动力学分析,揭示了双轴应变对w-AlN热导率及声子性质的影响,这为w-AlN基电子器件的近结热设计提供了重要的调控因素。

Provably Efficient CVaR RL in Low-rank MDPs

  • paper_url: http://arxiv.org/abs/2311.11965
  • repo_url: None
  • paper_authors: Yulai Zhao, Wenhao Zhan, Xiaoyan Hu, Ho-fung Leung, Farzan Farnia, Wen Sun, Jason D. Lee
  • For: The paper aims to maximize the Conditional Value at Risk (CVaR) with a fixed risk tolerance $\tau$ in low-rank Markov Decision Processes (MDPs) with nonlinear function approximation.
  • Methods: The paper proposes a novel Upper Confidence Bound (UCB) bonus-driven algorithm that balances exploration, exploitation, and representation learning in CVaR RL. The algorithm uses a discretized Least-Squares Value Iteration (LSVI) algorithm for the CVaR objective as the planning oracle.
  • Results: The paper achieves a sample complexity of $\tilde{O}\left(\frac{H^7 A^2 d^4}{\tau^2 \epsilon^2}\right)$ to yield an $\epsilon$-optimal CVaR, where $H$ is the length of each episode, $A$ is the capacity of action space, and $d$ is the dimension of representations. The algorithm is provably efficient in low-rank MDPs and can find the near-optimal policy in a polynomial running time with a Maximum Likelihood Estimation oracle.
    Abstract We study risk-sensitive Reinforcement Learning (RL), where we aim to maximize the Conditional Value at Risk (CVaR) with a fixed risk tolerance $\tau$. Prior theoretical work studying risk-sensitive RL focuses on the tabular Markov Decision Processes (MDPs) setting. To extend CVaR RL to settings where state space is large, function approximation must be deployed. We study CVaR RL in low-rank MDPs with nonlinear function approximation. Low-rank MDPs assume the underlying transition kernel admits a low-rank decomposition, but unlike prior linear models, low-rank MDPs do not assume the feature or state-action representation is known. We propose a novel Upper Confidence Bound (UCB) bonus-driven algorithm to carefully balance the interplay between exploration, exploitation, and representation learning in CVaR RL. We prove that our algorithm achieves a sample complexity of $\tilde{O}\left(\frac{H^7 A^2 d^4}{\tau^2 \epsilon^2}\right)$ to yield an $\epsilon$-optimal CVaR, where $H$ is the length of each episode, $A$ is the capacity of action space, and $d$ is the dimension of representations. Computational-wise, we design a novel discretized Least-Squares Value Iteration (LSVI) algorithm for the CVaR objective as the planning oracle and show that we can find the near-optimal policy in a polynomial running time with a Maximum Likelihood Estimation oracle. To our knowledge, this is the first provably efficient CVaR RL algorithm in low-rank MDPs.
    摘要 我们研究风险敏感强化学习(RL),目标是在固定风险容忍度 $\tau$ 下最大化条件风险价值(CVaR)。以往关于风险敏感RL的理论工作主要集中在表格型马尔可夫决策过程(MDP)设置上。为了将CVaR RL推广到状态空间很大的场景,必须使用函数逼近。我们研究带非线性函数逼近的低秩MDP中的CVaR RL:低秩MDP假设潜在的转移核可以进行低秩分解,但与以往的线性模型不同,它不假设特征或状态-动作表示是已知的。我们提出了一种新的基于上置信界(UCB)奖励的算法,细致地权衡CVaR RL中的探索、利用与表示学习。我们证明该算法的样本复杂度为 $\tilde{O}\left(\frac{H^7 A^2 d^4}{\tau^2 \epsilon^2}\right)$,可得到 $\epsilon$-最优的CVaR,其中 $H$ 是每个回合的长度,$A$ 是动作空间的容量,$d$ 是表示的维度。在计算方面,我们为CVaR目标设计了一种新的离散化最小二乘值迭代(LSVI)算法作为规划预言机,并证明在最大似然估计预言机的帮助下可以在多项式时间内找到近似最优策略。据我们所知,这是首个在低秩MDP中可证明高效的CVaR RL算法。
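For reference, the CVaR objective itself is simple to state; an empirical estimator at tolerance $\tau$ (a generic sketch, unrelated to the paper's algorithm or guarantees) is:

```python
import numpy as np

def cvar(returns, tau=0.1):
    # Conditional Value at Risk at tolerance tau: the mean of the worst
    # tau-fraction of outcomes (here larger returns are better).
    returns = np.sort(np.asarray(returns))
    k = max(1, int(np.ceil(tau * len(returns))))
    return returns[:k].mean()

print(cvar(np.random.normal(1.0, 2.0, size=10_000), tau=0.05))
```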

Estimation of entropy-regularized optimal transport maps between non-compactly supported measures

  • paper_url: http://arxiv.org/abs/2311.11934
  • repo_url: https://github.com/mattwerenski/entropic-map
  • paper_authors: Matthew Werenski, James M. Murphy, Shuchin Aeron
  • for: 这篇论文旨在估计带熵正则化的最优传输(EOT)映射,其中源分布与目标分布均为亚高斯分布,代价函数为欧氏距离平方。
  • methods: 该论文研究了一种近期提出的样本内(in-sample)估计器,并通过偏差-方差分解进行分析:方差部分使用标准的测度集中结果控制,偏差部分借助T1输运不等式以及亚高斯假设下EOT代价估计的样本复杂度结果。
  • results: 论文表明,当目标测度为紧支撑或强对数凹时,期望平方$L^2$误差的衰减速率至少为$O(n^{-1/3})$;在一般亚高斯情形下,期望$L^1$误差的衰减速率至少为$O(n^{-1/6})$;两种情形下误差界对正则化参数均为多项式依赖。
    Abstract This paper addresses the problem of estimating entropy-regularized optimal transport (EOT) maps with squared-Euclidean cost between source and target measures that are subGaussian. In the case that the target measure is compactly supported or strongly log-concave, we show that for a recently proposed in-sample estimator, the expected squared $L^2$-error decays at least as fast as $O(n^{-1/3})$ where $n$ is the sample size. For the general subGaussian case we show that the expected $L^1$-error decays at least as fast as $O(n^{-1/6})$, and in both cases we have polynomial dependence on the regularization parameter. While these results are suboptimal compared to known results in the case of compactness of both the source and target measures (squared $L^2$-error converging at a rate $O(n^{-1})$) and for when the source is subGaussian while the target is compactly supported (squared $L^2$-error converging at a rate $O(n^{-1/2})$), their importance lie in eliminating the compact support requirements. The proof technique makes use of a bias-variance decomposition where the variance is controlled using standard concentration of measure results and the bias is handled by T1-transport inequalities along with sample complexity results in estimation of EOT cost under subGaussian assumptions. Our experimental results point to a looseness in controlling the variance terms and we conclude by posing several open problems.
    摘要 Our proof technique uses a bias-variance decomposition, where the variance is controlled using standard concentration of measure results, and the bias is handled by T1-transport inequalities and sample complexity results in estimation of EOT cost under subGaussian assumptions. However, our experimental results suggest that there may be looseness in controlling the variance terms, and we conclude by posing several open problems.
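A common in-sample estimator of the entropic OT map is the barycentric projection of the Sinkhorn plan; the small numpy sketch below illustrates that construction under the squared-Euclidean cost, without any claim that it matches the exact estimator analysed in the paper:

```python
import numpy as np

def entropic_map(X, Y, eps=0.1, iters=500):
    # Sinkhorn between the empirical measures of X (n, d) and Y (m, d), then map
    # each x_i to the barycentric projection  sum_j pi_ij y_j / sum_j pi_ij.
    n, m = len(X), len(Y)
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / eps)
    u, v = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    a, b = np.ones(n), np.ones(m)
    for _ in range(iters):
        a = u / (K @ b)
        b = v / (K.T @ a)
    pi = a[:, None] * K * b[None, :]
    return (pi @ Y) / pi.sum(axis=1, keepdims=True)

X, Y = np.random.randn(200, 2), np.random.randn(300, 2) + 3.0
mapped = entropic_map(X, Y)     # estimated EOT map evaluated at the source samples
```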

Deep Calibration of Market Simulations using Neural Density Estimators and Embedding Networks

  • paper_url: http://arxiv.org/abs/2311.11913
  • repo_url: None
  • paper_authors: Namid R. Stillman, Rory Baggott, Justin Lyon, Jianfei Zhang, Dingqiu Zhu, Tao Chen, Perukrishnen Vytelingum
  • for: This paper is written for those who are interested in developing realistic simulators of financial exchanges, and who want to use deep learning techniques to calibrate these simulators to specific periods of trading.
  • methods: The paper uses deep learning techniques, specifically neural density estimators and embedding networks, to calibrate market simulators to a specific period of trading.
  • results: The paper demonstrates that its approach is able to correctly identify high probability parameter sets, both when applied to synthetic and historical data, and without reliance on manually selected or weighted ensembles of stylised facts.
    Abstract The ability to construct a realistic simulator of financial exchanges, including reproducing the dynamics of the limit order book, can give insight into many counterfactual scenarios, such as a flash crash, a margin call, or changes in macroeconomic outlook. In recent years, agent-based models have been developed that reproduce many features of an exchange, as summarised by a set of stylised facts and statistics. However, the ability to calibrate simulators to a specific period of trading remains an open challenge. In this work, we develop a novel approach to the calibration of market simulators by leveraging recent advances in deep learning, specifically using neural density estimators and embedding networks. We demonstrate that our approach is able to correctly identify high probability parameter sets, both when applied to synthetic and historical data, and without reliance on manually selected or weighted ensembles of stylised facts.
    摘要 构建一个真实的金融交易所模拟器(包括复现限价订单簿的动态),能够帮助我们洞察许多反事实情景,例如闪崩、追加保证金或宏观经济前景的变化。近年来,基于主体的模型已经能够复现交易所的许多特征,这些特征通常由一组风格化事实和统计量来概括。然而,如何将模拟器校准到特定交易时段仍是一个悬而未决的挑战。在这项工作中,我们利用深度学习的最新进展,特别是神经密度估计器和嵌入网络,提出了一种校准市场模拟器的新方法。我们证明,无论应用于合成数据还是历史数据,该方法都能正确识别高概率参数集,而无需依赖人工选择或加权的风格化事实集合。

Certification of Distributional Individual Fairness

  • paper_url: http://arxiv.org/abs/2311.11911
  • repo_url: None
  • paper_authors: Matthew Wicker, Vihari Piratia, Adrian Weller
  • for: 本文研究了机器学习算法的形式保证(certificates),以确保个人公平(individual fairness,IF)。
  • methods: 本文提出了一种新的凸函数方法来快速提供IF保证,并且利用了 quasi-convex 优化技术提供了有效的证明。
  • results: 本文表明了其方法可以覆盖大型神经网络,并且在实际分布偏移中提供了有效的IF保证。
    Abstract Providing formal guarantees of algorithmic fairness is of paramount importance to socially responsible deployment of machine learning algorithms. In this work, we study formal guarantees, i.e., certificates, for individual fairness (IF) of neural networks. We start by introducing a novel convex approximation of IF constraints that exponentially decreases the computational cost of providing formal guarantees of local individual fairness. We highlight that prior methods are constrained by their focus on global IF certification and can therefore only scale to models with a few dozen hidden neurons, thus limiting their practical impact. We propose to certify distributional individual fairness which ensures that for a given empirical distribution and all distributions within a $\gamma$-Wasserstein ball, the neural network has guaranteed individually fair predictions. Leveraging developments in quasi-convex optimization, we provide novel and efficient certified bounds on distributional individual fairness and show that our method allows us to certify and regularize neural networks that are several orders of magnitude larger than those considered by prior works. Moreover, we study real-world distribution shifts and find our bounds to be a scalable, practical, and sound source of IF guarantees.
    摘要 为机器学习算法提供算法公平性的形式化保证,对其负责任的社会部署至关重要。在这项工作中,我们研究神经网络个体公平(IF)的形式化保证,即证书。我们首先引入一种新的IF约束凸逼近,使提供局部个体公平形式化保证的计算成本呈指数级下降。我们指出,以往方法局限于全局IF认证,因而只能扩展到仅有几十个隐藏神经元的模型,限制了其实际影响。我们提出认证分布式个体公平,即保证对给定经验分布及 $\gamma$-Wasserstein 球内的所有分布,神经网络都具有个体公平的预测。借助拟凸优化的进展,我们给出了新的、高效的分布式个体公平认证界,并表明我们的方法能够认证和正则化比以往工作大几个数量级的神经网络。此外,我们研究了真实世界中的分布偏移,发现我们的界是一种可扩展、实用且可靠的IF保证来源。

Real-Time Surface-to-Air Missile Engagement Zone Prediction Using Simulation and Machine Learning

  • paper_url: http://arxiv.org/abs/2311.11905
  • repo_url: https://github.com/jpadantas/sam-ez
  • paper_authors: Joao P. A. Dantas, Diego Geraldo, Felipe L. L. Medeiros, Marcos R. O. A. Maximo, Takashi Yoneyama
  • for: 这项研究旨在提升现代防空系统中地对空导弹(SAM)的效能,重点在于准确刻画其交战区(Engagement Zone, EZ),即导弹能够有效交战并摧毁目标的空间区域。
  • methods: 本研究提出了一种结合机器学习技术的方法,通过训练有监督的机器学习算法来准确预测SAM EZ。
  • results: 研究发现,这种方法可以快速预测SAM EZ,并提供现场实时的洞察,从而提高SAM系统的性能。
    Abstract Surface-to-Air Missiles (SAMs) are crucial in modern air defense systems. A critical aspect of their effectiveness is the Engagement Zone (EZ), the spatial region within which a SAM can effectively engage and neutralize a target. Notably, the EZ is intrinsically related to the missile's maximum range; it defines the furthest distance at which a missile can intercept a target. The accurate computation of this EZ is essential but challenging due to the dynamic and complex factors involved, which often lead to high computational costs and extended processing times when using conventional simulation methods. In light of these challenges, our study investigates the potential of machine learning techniques, proposing an approach that integrates machine learning with a custom-designed simulation tool to train supervised algorithms. We leverage a comprehensive dataset of pre-computed SAM EZ simulations, enabling our model to accurately predict the SAM EZ for new input parameters. It accelerates SAM EZ simulations, enhances air defense strategic planning, and provides real-time insights, improving SAM system performance. The study also includes a comparative analysis of machine learning algorithms, illuminating their capabilities and performance metrics and suggesting areas for future research, highlighting the transformative potential of machine learning in SAM EZ simulations.
    摘要 现代防空系统中,地对空导弹(SAM)是关键性的。SAM的交战区(EZ)是指导弹可以有效地交战并摧毁目标的空间区域。需要注意的是,EZ与导弹的最大射程直接相关,即导弹可以在这个范围内拦截目标。正确计算EZ非常重要但也非常困难,因为存在许多动态和复杂的因素,这会导致使用传统仿真方法时计算成本高、处理时间长。为了解决这些挑战,我们的研究探讨了机器学习技术的潜力,提出了一种将机器学习与定制仿真工具集成、用于训练有监督算法的方法。我们利用了一个全面的预计算SAM EZ仿真数据集,使得我们的模型可以准确地预测新的输入参数下的SAM EZ。这有助于加速SAM EZ仿真,改进防空战略规划,并提供实时洞察,从而提高SAM系统性能。研究还包括机器学习算法的比较分析,描述了这些算法的能力和性能指标,并建议了未来研究的方向,强调了机器学习在SAM EZ仿真中的变革潜力。
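The surrogate described here is an ordinary supervised regressor trained on pre-computed EZ simulations; the generic scikit-learn sketch below uses synthetic placeholder features and labels, and the real feature set and model in the paper's repository may differ:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Placeholder inputs: e.g. target altitude, speed, heading and SAM site parameters;
# placeholder label: maximum engagement range taken from pre-computed simulations.
X = np.random.rand(5000, 6)
y = np.random.rand(5000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
surrogate = GradientBoostingRegressor().fit(X_tr, y_tr)
print("R^2 on held-out simulations:", surrogate.score(X_te, y_te))
```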

Measuring and Mitigating Biases in Motor Insurance Pricing

  • paper_url: http://arxiv.org/abs/2311.11900
  • repo_url: None
  • paper_authors: Mulah Moriah, Franck Vermet, Arthur Charpentier
  • For: The paper aims to provide a comprehensive set of tools for insurers to adopt fairer pricing strategies in the context of automobile insurance, while ensuring consistency and performance.* Methods: The paper uses a range of statistical methodologies and available data to construct optimal pricing structures that align with the overarching corporate strategy and accommodate market competition.* Results: The study assesses the effectiveness of these tools through practical application in the context of automobile insurance, with a focus on equitable premiums, age-based premium fairness, and the consideration of new dimensions for evaluating fairness, such as the presence of serious illnesses or disabilities.
    Abstract The non-life insurance sector operates within a highly competitive and tightly regulated framework, confronting a pivotal juncture in the formulation of pricing strategies. Insurers are compelled to harness a range of statistical methodologies and available data to construct optimal pricing structures that align with the overarching corporate strategy while accommodating the dynamics of market competition. Given the fundamental societal role played by insurance, premium rates are subject to rigorous scrutiny by regulatory authorities. These rates must conform to principles of transparency, explainability, and ethical considerations. Consequently, the act of pricing transcends mere statistical calculations and carries the weight of strategic and societal factors. These multifaceted concerns may drive insurers to establish equitable premiums, taking into account various variables. For instance, regulations mandate the provision of equitable premiums, considering factors such as policyholder gender or mutualist group dynamics in accordance with respective corporate strategies. Age-based premium fairness is also mandated. In certain insurance domains, variables such as the presence of serious illnesses or disabilities are emerging as new dimensions for evaluating fairness. Regardless of the motivating factor prompting an insurer to adopt fairer pricing strategies for a specific variable, the insurer must possess the capability to define, measure, and ultimately mitigate any ethical biases inherent in its pricing practices while upholding standards of consistency and performance. This study seeks to provide a comprehensive set of tools for these endeavors and assess their effectiveness through practical application in the context of automobile insurance.
    摘要 非人寿保险业在高度竞争和严格管制的框架下运作,面临着决定性的价格策略制定之刻。保险公司需要结合多种统计方法和可用数据,构建最佳的价格结构,以实现公司核心策略的协调,同时满足市场竞争的变化。由于保险在社会中扮演着重要的角色,保险费用受到严格的监管和社会要求。因此,价格的确定不仅仅是统计计算,也受到战略和社会因素的影响。这些多方面的因素可能会导致保险公司采取更公平的费用策略,考虑多个变量。例如,法规要求提供公平的费用政策,考虑因素 such as policyholder gender或mutualist group dynamics,与公司战略相符。年龄基本的费用公平也被规定。在某些保险领域,存在严重疾病或残疾的存在是新的评价公平的维度。无论某保险公司采取哪一种公平价格策略,它必须具备定义、测量和最终缓解任何伦理偏见的能力,并保持一致性和性能标准。这项研究的目的是提供一套全面的工具,并评估其效果在汽车保险上。

AMES: A Differentiable Embedding Space Selection Framework for Latent Graph Inference

  • paper_url: http://arxiv.org/abs/2311.11891
  • repo_url: None
  • paper_authors: Yuan Lu, Haitz Sáez de Ocáriz Borde, Pietro Liò
  • for: 这 paper 是为了解决数据集中元素之间的 latent graph inference 问题,使得 Graph Neural Networks (GNNs) 可以在点云数据上进行动态学习必要的图结构。
  • methods: 这 paper 使用了 Attentional Multi-Embedding Selection (AMES) 框架,这是一种可微的方法,通过 backpropagation 来选择最佳的 embedding space,并考虑下游任务。
  • results: comparing 五个 benchmark 数据集,这 paper 的方法可以达到与之前方法相当或更高的结果,而且不需要多次实验来确定最佳的 embedding space。 更重要的是,这 paper 还提供了一种可读性技术,可以跟踪不同的 latent graph 的梯度贡献,从而了解这种 attention-based, fully differentiable 方法如何选择适当的 latent space。
    Abstract In real-world scenarios, although data entities may possess inherent relationships, the specific graph illustrating their connections might not be directly accessible. Latent graph inference addresses this issue by enabling Graph Neural Networks (GNNs) to operate on point cloud data, dynamically learning the necessary graph structure. These graphs are often derived from a latent embedding space, which can be modeled using Euclidean, hyperbolic, spherical, or product spaces. However, currently, there is no principled differentiable method for determining the optimal embedding space. In this work, we introduce the Attentional Multi-Embedding Selection (AMES) framework, a differentiable method for selecting the best embedding space for latent graph inference through backpropagation, considering a downstream task. Our framework consistently achieves comparable or superior results compared to previous methods for latent graph inference across five benchmark datasets. Importantly, our approach eliminates the need for conducting multiple experiments to identify the optimal embedding space. Furthermore, we explore interpretability techniques that track the gradient contributions of different latent graphs, shedding light on how our attention-based, fully differentiable approach learns to choose the appropriate latent space. In line with previous works, our experiments emphasize the advantages of hyperbolic spaces in enhancing performance. More importantly, our interpretability framework provides a general approach for quantitatively comparing embedding spaces across different tasks based on their contributions, a dimension that has been overlooked in previous literature on latent graph inference.
    摘要 在实际场景中,数据实体可能拥有自然的关系,但具体的关系图可能并不直接可访问。 latent graph inference Addresses this issue by enabling Graph Neural Networks (GNNs) to operate on point cloud data,动态学习必要的关系图结构。这些图通常来自于 latent embedding space,可以是欧几何、卷积、球形或产品空间。然而,目前没有原理性的分解ifferentiable方法来确定最佳 embedding space。在这种工作中,我们引入 Attentional Multi-Embedding Selection (AMES) 框架,一种可分解的方法,通过反射传播来选择最佳 embedding space для latent graph inference,考虑到下游任务。我们的框架在五个 benchmark 数据集上 consistently achievable comparable or superior results compared to previous methods for latent graph inference。关键是,我们的方法消除了需要进行多次实验来确定最佳 embedding space 的需求。此外,我们还 explore interpretability techniques ,跟踪不同的 latent graph 的梯度贡献,探讨我们的注意力基于、完全分解的方法如何选择合适的 latent space。与前一些工作一样,我们的实验强调了使用卷积空间的优点,并且我们的可解释框架提供了一种通用的方法来比较不同任务中 embedding space 的贡献,这一维度在 previous literature 中被忽略了。

Efficient Neural Networks for Tiny Machine Learning: A Comprehensive Review

  • paper_url: http://arxiv.org/abs/2311.11883
  • repo_url: None
  • paper_authors: Minh Tri Lê, Pierre Wolinski, Julyan Arbel
  • for: 本评论文章提供了关于微型机器学习(TinyML)应用中高效神经网络和深度学习模型的报道和分析。
  • methods: 本文讨论的方法包括模型压缩、量化和低秩分解等,用于优化神经网络结构,使其适应资源受限的MCU。
  • results: 本文总结了在MCU上部署深度学习模型的现有技术,包括模型剪枝、硬件加速和算法-架构协同设计等方法,以实现模型在MCU上的高效部署。
    Abstract The field of Tiny Machine Learning (TinyML) has gained significant attention due to its potential to enable intelligent applications on resource-constrained devices. This review provides an in-depth analysis of the advancements in efficient neural networks and the deployment of deep learning models on ultra-low power microcontrollers (MCUs) for TinyML applications. It begins by introducing neural networks and discussing their architectures and resource requirements. It then explores MEMS-based applications on ultra-low power MCUs, highlighting their potential for enabling TinyML on resource-constrained devices. The core of the review centres on efficient neural networks for TinyML. It covers techniques such as model compression, quantization, and low-rank factorization, which optimize neural network architectures for minimal resource utilization on MCUs. The paper then delves into the deployment of deep learning models on ultra-low power MCUs, addressing challenges such as limited computational capabilities and memory resources. Techniques like model pruning, hardware acceleration, and algorithm-architecture co-design are discussed as strategies to enable efficient deployment. Lastly, the review provides an overview of current limitations in the field, including the trade-off between model complexity and resource constraints. Overall, this review paper presents a comprehensive analysis of efficient neural networks and deployment strategies for TinyML on ultra-low-power MCUs. It identifies future research directions for unlocking the full potential of TinyML applications on resource-constrained devices.
    摘要 随着智能应用的普及,迫切需要实现智能应用在资源有限的设备上。这篇评论文章提供了关于高效神经网络和深度学习模型在超低功耗微控制器(MCU)上的投入和部署的深入分析。文章首先介绍神经网络,并讨论其建构和资源需求。然后,文章探讨了基于MEMS技术的应用在超低功耗MCU上,并强调它们在资源有限的设备上启用TinyML的潜力。文章的核心部分是高效神经网络的优化,包括模型压缩、量化和低级因数分解等技术,以最小化MCU上神经网络的资源使用。文章还详细介绍了深度学习模型的部署在超低功耗MCU上,包括计算能力和存储器资源的限制。文章提出了多种策略,如模型剪辑、硬件加速和算法-架构协同设计,以实现高效的部署。最后,文章提供了当前领域的限制,包括模型复杂度和资源约束之间的贸易OFF。总的来说,这篇评论文章提供了关于TinyML在超低功耗MCU上的深入分析,并提出了未来研究方向,以推动TinyML应用在资源有限的设备上的全面发展。

Forward Gradients for Data-Driven CFD Wall Modeling

  • paper_url: http://arxiv.org/abs/2311.11876
  • repo_url: None
  • paper_authors: Jan Hückelheim, Tadbhagya Kumar, Krishnan Raghavan, Pinaki Pal
  • for: 用于增强CFD simulate wall-bounded flow 精度和效率。
  • methods: 使用机器学习和数据驱动方法,减少CFD计算成本和存储占用。
  • results: 实现了一种不需要分离前向和反向扫描的梯度计算方法,可以更高效地训练增Resolution wall模型,提高CFD simulate 精度。
    Abstract Computational Fluid Dynamics (CFD) is used in the design and optimization of gas turbines and many other industrial/ scientific applications. However, the practical use is often limited by the high computational cost, and the accurate resolution of near-wall flow is a significant contributor to this cost. Machine learning (ML) and other data-driven methods can complement existing wall models. Nevertheless, training these models is bottlenecked by the large computational effort and memory footprint demanded by back-propagation. Recent work has presented alternatives for computing gradients of neural networks where a separate forward and backward sweep is not needed and storage of intermediate results between sweeps is not required because an unbiased estimator for the gradient is computed in a single forward sweep. In this paper, we discuss the application of this approach for training a subgrid wall model that could potentially be used as a surrogate in wall-bounded flow CFD simulations to reduce the computational overhead while preserving predictive accuracy.
    摘要 计算流体动力学(CFD)被广泛用于燃气轮机及其他许多工业/科学应用的设计与优化。然而,其实际应用常受高计算成本的限制,而近壁面流动的精确求解正是这一成本的重要来源。机器学习(ML)和其他数据驱动方法可以补充现有的壁面模型,但这些模型的训练受制于反向传播所需的大量计算和内存开销。最近的工作提出了无需分离前向与反向扫描、也无需在两次扫描之间存储中间结果的神经网络梯度计算方法:在一次前向扫描中即可得到梯度的无偏估计。在本文中,我们讨论了如何利用这种方法训练亚网格壁面模型,使其可作为壁面受限流动CFD模拟中的代理模型,在降低计算开销的同时保持预测精度。
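A minimal PyTorch sketch of the forward-gradient idea referenced above: a single JVP along a random tangent gives an unbiased gradient estimate without a separate backward sweep. The toy loss stands in for a wall-model training objective and is not the paper's setup:

```python
import torch
from torch.autograd.functional import jvp

def forward_gradient(loss_fn, params):
    # (grad L . v) * v for a random tangent v is an unbiased estimator of grad L,
    # obtainable from a single Jacobian-vector product.
    v = torch.randn_like(params)
    _, dir_deriv = jvp(loss_fn, (params,), (v,))
    return dir_deriv * v

loss = lambda w: torch.sin(w).pow(2).sum()    # stand-in for a wall-model loss
g_hat = forward_gradient(loss, torch.ones(10))
```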

Training robust and generalizable quantum models

  • paper_url: http://arxiv.org/abs/2311.11871
  • repo_url: https://github.com/daniel-fink-de/training-robust-and-generalizable-quantum-models
  • paper_authors: Julian Berberich, Daniel Fink, Daniel Pranjić, Christian Tutschku, Christian Holm
  • for: 这个论文研究了量子机器学习模型的抗攻击性和通用性。
  • methods: 论文使用了Liψitz bounds来研究量子机器学习模型的抗攻击性和通用性。
  • results: 研究发现,可调编码可以系统地提高量子机器学习模型的抗攻击性和通用性,而固定编码则无法通过调整参数来改善这两个性能指标。此外,论文还提出了一种做出量子机器学习模型更加 robust和通用的实际策略。
    Abstract Adversarial robustness and generalization are both crucial properties of reliable machine learning models. In this paper, we study these properties in the context of quantum machine learning based on Lipschitz bounds. We derive tailored, parameter-dependent Lipschitz bounds for quantum models with trainable encoding, showing that the norm of the data encoding has a crucial impact on the robustness against perturbations in the input data. Further, we derive a bound on the generalization error which explicitly depends on the parameters of the data encoding. Our theoretical findings give rise to a practical strategy for training robust and generalizable quantum models by regularizing the Lipschitz bound in the cost. Further, we show that, for fixed and non-trainable encodings as frequently employed in quantum machine learning, the Lipschitz bound cannot be influenced by tuning the parameters. Thus, trainable encodings are crucial for systematically adapting robustness and generalization during training. With numerical results, we demonstrate that, indeed, Lipschitz bound regularization leads to substantially more robust and generalizable quantum models.
    摘要 对抗鲁棒性和泛化能力都是可靠机器学习模型的重要性质。在这篇论文中,我们基于Lipschitz界,在量子机器学习的背景下研究这两种性质。我们为带可训练编码的量子模型推导了依赖于参数的Lipschitz界,表明数据编码的范数对输入数据扰动下的鲁棒性具有关键影响。此外,我们还推导了一个显式依赖于数据编码参数的泛化误差界。这些理论结果给出了一种实用的训练策略:通过在代价函数中正则化Lipschitz界来训练鲁棒且可泛化的量子模型。我们进一步证明,对于量子机器学习中常用的固定、不可训练编码,Lipschitz界无法通过调整参数来改变;因此,可训练编码对于在训练中系统地调节鲁棒性与泛化能力至关重要。数值结果表明,Lipschitz界正则化确实带来了明显更鲁棒、更可泛化的量子模型。
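For intuition only, here is a classical PyTorch sketch of penalising a crude Lipschitz upper bound (the product of layer-wise spectral norms) in the training cost; the paper's bound is specific to quantum models and additionally depends on the norm of the trainable data encoding, so this is not its formula:

```python
import torch

def lipschitz_penalty(model):
    # Crude upper bound on the network's Lipschitz constant: the product of
    # spectral norms of the weight matrices.
    bound = torch.tensor(1.0)
    for p in model.parameters():
        if p.ndim == 2:
            bound = bound * torch.linalg.matrix_norm(p, ord=2)
    return bound

# Training objective in the spirit of the paper's regularisation:
#   loss = task_loss + lam * lipschitz_penalty(model)
```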

Deep learning complete intersection Calabi-Yau manifolds

  • paper_url: http://arxiv.org/abs/2311.11847
  • repo_url: None
  • paper_authors: Harold Erbin, Riccardo Finotello
  • for: 理解如何使用机器学习处理代数拓扑数据,特别是完全交叉Calabi-Yau(CICY)三维与四维流形。
  • methods: 本文先讨论方法学与数据分析方面,然后介绍所用的神经网络架构。
  • results: 文章综述了当前预测Hodge数的最新进展,并给出了从低Hodge数向高Hodge数外推预测(以及反向外推)的新结果。
    Abstract We review advancements in deep learning techniques for complete intersection Calabi-Yau (CICY) 3- and 4-folds, with the aim of understanding better how to handle algebraic topological data with machine learning. We first discuss methodological aspects and data analysis, before describing neural networks architectures. Then, we describe the state-of-the art accuracy in predicting Hodge numbers. We include new results on extrapolating predictions from low to high Hodge numbers, and conversely.
    摘要 我团队正在审查深度学习技术在完全交叉Calabi-Yau(CICY)3-4维上的进步,以更好地理解如何使用机器学习处理代数 topological 数据。我们首先讨论了方法学方面和数据分析,然后介绍神经网络架构。接着,我们介绍了目前领域中最佳准确率预测Hodge 数。我们还报道了在低到高Hodge数的预测推断中的新结果,以及相反的情况。Note that "完全交叉Calabi-Yau" (CICY) is a specific type of algebraic variety, and "Hodge 数" (Hodge numbers) are a way of measuring the geometry of the variety.

High Probability Guarantees for Random Reshuffling

  • paper_url: http://arxiv.org/abs/2311.11841
  • repo_url: None
  • paper_authors: Hengxu Yu, Xiao Li
  • for: 这篇论文主要研究带随机重排的随机梯度法(RR)在光滑非凸优化问题上的应用。RR在实践中被广泛使用,尤其是在神经网络训练中。
  • methods: 论文首先研究了RR采样过程的集中性质,建立了一个新的高概率样本复杂度保证,用于将梯度(而非其期望)压低到 $\varepsilon$ 以下;该复杂度与现有最好的期望意义下的保证相当,仅相差一个对数项。随后,论文利用所得的高概率下降性质和随机误差界,提出了一个简单可计算的停止准则($\mathsf{RR}$-$\mathsf{sc}$),保证在有限次迭代后被触发,并以高概率返回梯度低于 $\varepsilon$ 的迭代点。论文还提出了一种扰动随机重排方法($\mathsf{p}$-$\mathsf{RR}$),在驻点附近加入随机扰动过程。
  • results: 论文证明 $\mathsf{p}$-$\mathsf{RR}$ 能够高效地逃离严格鞍点并以高概率返回二阶驻点,且无需对随机梯度误差做任何亚高斯尾部假设;并通过神经网络训练的数值实验验证了这些理论发现。
    Abstract We consider the stochastic gradient method with random reshuffling ($\mathsf{RR}$) for tackling smooth nonconvex optimization problems. $\mathsf{RR}$ finds broad applications in practice, notably in training neural networks. In this work, we first investigate the concentration property of $\mathsf{RR}$'s sampling procedure and establish a new high probability sample complexity guarantee for driving the gradient (without expectation) below $\varepsilon$, which effectively characterizes the efficiency of a single $\mathsf{RR}$ execution. Our derived complexity matches the best existing in-expectation one up to a logarithmic term while imposing no additional assumptions nor changing $\mathsf{RR}$'s updating rule. Furthermore, by leveraging our derived high probability descent property and bound on the stochastic error, we propose a simple and computable stopping criterion for $\mathsf{RR}$ (denoted as $\mathsf{RR}$-$\mathsf{sc}$). This criterion is guaranteed to be triggered after a finite number of iterations, and then $\mathsf{RR}$-$\mathsf{sc}$ returns an iterate with its gradient below $\varepsilon$ with high probability. Moreover, building on the proposed stopping criterion, we design a perturbed random reshuffling method ($\mathsf{p}$-$\mathsf{RR}$) that involves an additional randomized perturbation procedure near stationary points. We derive that $\mathsf{p}$-$\mathsf{RR}$ provably escapes strict saddle points and efficiently returns a second-order stationary point with high probability, without making any sub-Gaussian tail-type assumptions on the stochastic gradient errors. Finally, we conduct numerical experiments on neural network training to support our theoretical findings.
    摘要 我们考虑使用测度 gradient 方法($\mathsf{RR}$)来解决缓和非凸优化问题。 $\mathsf{RR}$ 在实践中获得了广泛的应用,特别是在训练神经网络中。在这个工作中,我们首先研究 $\mathsf{RR}$ 的抽掣程序中的集中性性质,然后建立一个新的高概率抽掣次数保证,将梯度(不对期望)下降至 $\varepsilon$ 以下,这个结果有效地描述了 $\mathsf{RR}$ 的效率。我们的 derive 的复杂度与最佳的对 expectation 的复杂度几乎相同,但不需要额外的假设,也不需要更改 $\mathsf{RR}$ 的更新规则。此外,我们还利用我们 derive 的高概率下降性和测度错误的上限,提出了一个简单可计算的停止条件(denoted as $\mathsf{RR}$-$\mathsf{sc}$)。这个条件会在一定的回归次数之后触发,并且返回一个梯度下降至 $\varepsilon$ 以下的回归点,且高概率上发生。此外,我们还提出了一个受 perturbed random reshuffling 方法($\mathsf{p}$-$\mathsf{RR}$),这个方法具有在站点点发生时额外添加一些随机干扰程序的特点。我们证明了 $\mathsf{p}$-$\mathsf{RR}$ 可以干扰紧缩点,并高概率地返回一个二阶 stationary point。在这个过程中,我们不需要假设测度错误具有子高斯分布的特性。最后,我们在神经网络训练中进行了数值实验,以支持我们的理论发现。
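A small numpy sketch of SGD with random reshuffling plus a gradient-norm stopping rule in the spirit of $\mathsf{RR}$-$\mathsf{sc}$; checking the full gradient once per epoch is a simplification for illustration, whereas the paper's criterion is computable from quantities already at hand and comes with high-probability guarantees:

```python
import numpy as np

def sgd_rr(grad_i, x0, n, lr=0.01, eps=1e-3, max_epochs=100, seed=0):
    # grad_i(x, i): gradient of the i-th component function at x.
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(max_epochs):
        for i in rng.permutation(n):          # one pass over a fresh shuffle
            x -= lr * grad_i(x, i)
        full_grad = sum(grad_i(x, i) for i in range(n)) / n
        if np.linalg.norm(full_grad) < eps:   # stop once near-stationarity is reached
            break
    return x
```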

Zero redundancy distributed learning with differential privacy

  • paper_url: http://arxiv.org/abs/2311.11822
  • repo_url: None
  • paper_authors: Zhiqi Bu, Justin Chiu, Ruixuan Liu, Sheng Zha, George Karypis
  • for: 这篇论文的目的是发展一个可以应用于分布式深度学习中的隐私保护技术,以便在训练大型深度学习模型时能够保护用户的隐私。
  • methods: 本论文使用了 Zero Redundancy Optimizer (ZeRO) 来实现分布式深度学习,并且将其与隐私保护技术整合,以便在训练大型深度学习模型时能够维护用户的隐私。
  • results: 本论文的结果显示,DP-ZeRO 能够实现与标准 ZeRO 相同的计算和通信效率,并且能够训练更大的模型,例如 GPT-100B。此外,DP-ZeRO 还能够支持混合精度训练。
    Abstract Deep learning using large models have achieved great success in a wide range of domains. However, training these models on billions of parameters is very challenging in terms of the training speed, memory cost, and communication efficiency, especially under the privacy-preserving regime with differential privacy (DP). On the one hand, DP optimization has comparable efficiency to the standard non-private optimization on a single GPU, but on multiple GPUs, existing DP distributed learning (such as pipeline parallel) has suffered from significantly worse efficiency. On the other hand, the Zero Redundancy Optimizer (ZeRO) is a state-of-the-art solution to the standard distributed learning, exhibiting excellent training efficiency on large models, but to work compatibly with DP is technically complicated. In this work, we develop a new systematic solution, DP-ZeRO, (I) to scale up the trainable DP model size, e.g. to GPT-100B, (II) to obtain the same computation and communication efficiency as the standard ZeRO, and (III) to enable mixed-precision DP training. Our DP-ZeRO, like the standard ZeRO, has the potential to train models with arbitrary size and is evaluated on the world's largest DP models in terms of the number of trainable parameters.
    摘要 深度学习使用大型模型已经在各种领域取得了很大的成功。然而,在 Billions of 参数上进行深度学习训练是具有很大的挑战,特别是在遵循隐私保护(DP)的情况下。一方面,DP 优化的效率相对于标准不隐私的优化在单个 GPU 上具有相似的效率,但在多个 GPU 上,现有的 DP 分布式学习(如管道并行)表现得更加糟糕。另一方面,零重复优化器(ZeRO)是当前顶尖的分布式学习解决方案,在大型模型上显示出了极佳的训练效率,但与 DP 兼容需要技术上的努力。在这种情况下,我们开发了一种新的系统性解决方案——DP-ZeRO,以下是该系统的三大目标:1. 扩展可训练DP模型的大小,例如GPT-100B。2. 与标准ZeRO的计算和通信效率相同。3. 实现混合精度DP训练。我们的 DP-ZeRO 同样可以训练任意大小的模型,并在世界上最大的 DP 模型上进行评估。
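DP training itself (independent of the ZeRO sharding that DP-ZeRO contributes) boils down to per-sample gradient clipping plus calibrated Gaussian noise; a minimal numpy sketch of that step, with all names and constants illustrative:

```python
import numpy as np

def dp_sgd_step(params, per_sample_grads, clip=1.0, noise_multiplier=1.0, lr=0.1, rng=None):
    # Clip each per-sample gradient to norm <= clip, average, then add Gaussian
    # noise scaled to the clipping bound.  DP-ZeRO additionally shards the
    # parameters, gradients, and optimizer state across GPUs; that part is omitted.
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12)) for g in per_sample_grads]
    noise = rng.normal(scale=noise_multiplier * clip / len(clipped), size=params.shape)
    return params - lr * (np.mean(clipped, axis=0) + noise)
```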

LogLead – Fast and Integrated Log Loader, Enhancer, and Anomaly Detector

  • paper_url: http://arxiv.org/abs/2311.11809
  • repo_url: https://github.com/evotestops/loglead
  • paper_authors: Mika Mäntylä, Yuqing Wang, Jesse Nyyssölä
  • for: 本文介绍了一种名为LogLead的日志分析工具,用于高效地处理日志数据。
  • methods: LogLead结合了日志处理的三个基本步骤:加载、增强和异常检测,并基于高速DataFrame库Polars实现。工具目前提供7个加载器(其中4个针对公开数据集HDFS、Hadoop、BGL、Thunderbird)和多种增强器,包括3个解析器(Drain、Spell、LenMa)、BERT嵌入生成以及词袋等其他日志表示技术;并集成了SKLearn中的5种有监督和4种无监督机器学习算法用于异常检测。
  • results: 文章表明,使用LogLead将日志从原始文件加载为DataFrame的速度比以往方案快10倍以上;将日志消息规范化交由LogLead处理可使Drain解析速度提高约2倍;在HDFS日志上的简短基准测试表明,超出词袋的日志表示方法带来的收益有限。
    Abstract This paper introduces LogLead, a tool designed for efficient log analysis. LogLead combines three essential steps in log processing: loading, enhancing, and anomaly detection. The tool leverages Polars, a high-speed DataFrame library. We currently have 7 Loaders out of which 4 is for public data sets (HDFS, Hadoop, BGL, and Thunderbird). We have multiple enhancers with three parsers (Drain, Spell, LenMa), Bert embedding creation and other log representation techniques like bag-of-words. LogLead integrates to 5 supervised and 4 unsupervised machine learning algorithms for anomaly detection from SKLearn. By integrating diverse datasets, log representation methods and anomaly detectors, LogLead facilitates comprehensive benchmarking in log analysis research. We demonstrate that log loading from raw file to dataframe is over 10x faster with LogLead is compared to past solutions. We demonstrate roughly 2x improvement in Drain parsing speed by off-loading log message normalization to LogLead. We demonstrate a brief benchmarking on HDFS suggesting that log representations beyond bag-of-words provide limited benefits. Screencast demonstrating the tool: https://youtu.be/8stdbtTfJVo
    摘要 本文介绍了LogLead,一款面向高效日志分析的工具,它将日志处理的三个基本步骤——加载、增强与异常检测——整合在一起,并基于高速DataFrame库Polars实现。LogLead目前提供7个加载器(其中4个针对公开数据集HDFS、Hadoop、BGL与Thunderbird),以及包含Drain、Spell、LenMa三种解析器、BERT嵌入和词袋等日志表示方法的多种增强器,并集成了SKLearn中的5种有监督与4种无监督异常检测算法,便于在日志分析研究中进行综合基准测试。实验表明,与以往方案相比,LogLead从原始文件加载日志到DataFrame的速度提升超过10倍;将日志消息规范化交由LogLead处理可使Drain解析速度提高约2倍;在HDFS上的简要基准测试显示,超出词袋的日志表示带来的收益有限。工具演示:https://youtu.be/8stdbtTfJVo
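LogLead's own loader/enhancer API is not shown in the abstract, so the following is only a generic stand-in for the load → enhance → detect pipeline using the same building blocks (Polars and scikit-learn); the file path and parameters are placeholders:

```python
import polars as pl
from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import CountVectorizer

# Load raw log lines into a Polars DataFrame ("app.log" is a placeholder path).
lines = [line.rstrip("\n") for line in open("app.log", errors="ignore")]
df = pl.DataFrame({"message": lines})

# Enhance with a bag-of-words representation, then run unsupervised detection.
X = CountVectorizer(max_features=2000).fit_transform(df["message"].to_list())
scores = IsolationForest(random_state=0).fit(X).decision_function(X)
df = df.with_columns(pl.Series("anomaly_score", scores))
print(df.sort("anomaly_score").head())      # most anomalous lines first
```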

Operator Learning for Continuous Spatial-Temporal Model with A Hybrid Optimization Scheme

  • paper_url: http://arxiv.org/abs/2311.11798
  • repo_url: None
  • paper_authors: Chuanqi Chen, Jin-Long Wu
  • for: 这篇论文是用来模拟复杂动态系统的空间-时间模型的。
  • methods: 该模型建立在算子学习的最新进展之上,提出了一个在空间和时间上均连续的数据驱动建模框架。
  • results: 该模型对空间与时间离散化均保持分辨率不变性,并且仅凭短期时间序列数据即可实现稳定的长期仿真;此外,通过结合短期数据与长期统计的混合优化方案,模型能够更好地预测长期统计特性。
    Abstract Partial differential equations are often used in the spatial-temporal modeling of complex dynamical systems in many engineering applications. In this work, we build on the recent progress of operator learning and present a data-driven modeling framework that is continuous in both space and time. A key feature of the proposed model is the resolution-invariance with respect to both spatial and temporal discretizations. To improve the long-term performance of the calibrated model, we further propose a hybrid optimization scheme that leverages both gradient-based and derivative-free optimization methods and efficiently trains on both short-term time series and long-term statistics. We investigate the performance of the spatial-temporal continuous learning framework with three numerical examples, including the viscous Burgers' equation, the Navier-Stokes equations, and the Kuramoto-Sivashinsky equation. The results confirm the resolution-invariance of the proposed modeling framework and also demonstrate stable long-term simulations with only short-term time series data. In addition, we show that the proposed model can better predict long-term statistics via the hybrid optimization scheme with a combined use of short-term and long-term data.
    摘要 《partial differential equations在空间-时间模型中的应用》中,我们会使用最近的运算学进步,提出一种数据驱动的模型框架,该框架在空间和时间上是连续的。这个提案的一个重要特点是对空间和时间分辨率的不变性。为了提高模型的长期性能,我们进一步提议一种混合优化方案,该方案利用了梯度优化和无梯度优化方法,并高效地在短期时间序列和长期统计数据上训练。我们通过三个数字例子,包括粘滞布尔gers方程、奈尔-斯托克方程和库拉摩-西瓦希诺斯基方程,证明了提案的模型框架的不变性,并表明了只使用短期时间序列数据进行训练可以实现稳定的长期 simulate。此外,我们还证明了我们的模型可以更好地预测长期统计信息,通过混合优化方案并使用短期和长期数据进行训练。
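
The hybrid optimization idea (gradient-based fitting on short-term trajectories alternated with derivative-free search on long-term statistics) can be sketched as below. The `model.rollout` interface, the finite-difference gradient, and the random-search step are assumptions for illustration, not the paper's actual scheme.

```python
# Minimal sketch of a hybrid optimization loop: a gradient step on short-term
# trajectory error, alternated with a derivative-free accept/reject step on
# long-term statistics. `model`, step sizes, and data are placeholders.
import numpy as np

def short_term_loss(theta, model, u0, u_short):
    pred = model.rollout(theta, u0, steps=len(u_short))
    return np.mean((pred - u_short) ** 2)

def long_term_stat_loss(theta, model, u0, target_stats, steps=10_000):
    traj = model.rollout(theta, u0, steps=steps)
    return np.mean((traj.mean(axis=0) - target_stats) ** 2)

def hybrid_optimize(theta, model, u0, u_short, target_stats, iters=100, lr=1e-3, sigma=1e-2):
    # theta is assumed to be a flat parameter vector.
    for _ in range(iters):
        # Gradient step on the short-term loss (finite differences stand in for autodiff).
        grad = np.zeros_like(theta)
        base = short_term_loss(theta, model, u0, u_short)
        for i in range(len(theta)):
            e = np.zeros_like(theta)
            e[i] = 1e-5
            grad[i] = (short_term_loss(theta + e, model, u0, u_short) - base) / 1e-5
        theta = theta - lr * grad
        # Derivative-free step on the long-term statistics: keep a random perturbation if it helps.
        candidate = theta + sigma * np.random.randn(*theta.shape)
        if long_term_stat_loss(candidate, model, u0, target_stats) < long_term_stat_loss(theta, model, u0, target_stats):
            theta = candidate
    return theta
```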

Approximate Linear Programming and Decentralized Policy Improvement in Cooperative Multi-agent Markov Decision Processes

  • paper_url: http://arxiv.org/abs/2311.11789
  • repo_url: None
  • paper_authors: Lakshmi Mandal, Chandrashekar Lakshminarayanan, Shalabh Bhatnagar
  • for: 本文研究了一种多智能机器人(agent)协同解决Markov决策过程(MDP),其中所有agent都知道系统模型。
  • methods: 我们使用了分布式政策改进算法,其中每个agent假设其他agent的决策已经固定,然后改进自己的决策。我们还使用了 Approximate Linear Programming(ALP)计算近似价值函数。
  • results: 我们提供了对协同多智能机器人Finite和无限远景折扣MDP的近似政策迭代算法的理论保证,并在一些数学示例中证明了我们的算法的性能。
    Abstract In this work, we consider a `cooperative' multi-agent Markov decision process (MDP) involving m greater than 1 agents, where all agents are aware of the system model. At each decision epoch, all the m agents cooperatively select actions in order to maximize a common long-term objective. Since the number of actions grows exponentially in the number of agents, policy improvement is computationally expensive. Recent works have proposed using decentralized policy improvement in which each agent assumes that the decisions of the other agents are fixed and it improves its decisions unilaterally. Yet, in these works, exact values are computed. In our work, for cooperative multi-agent finite and infinite horizon discounted MDPs, we propose suitable approximate policy iteration algorithms, wherein we use approximate linear programming to compute the approximate value function and use decentralized policy improvement. Thus our algorithms can handle both large number of states as well as multiple agents. We provide theoretical guarantees for our algorithms and also demonstrate the performance of our algorithms on some numerical examples.
    摘要 在本工作中,我们考虑一个"合作式"多智能体马尔可夫决策过程(MDP),其中智能体数量 m 大于 1,且所有智能体都知道系统模型。在每个决策时刻,m 个智能体协作选择动作,以最大化共同的长期目标。由于联合动作数量随智能体数量呈指数增长,策略改进的计算代价很高。近期工作提出了分布式策略改进,其中每个智能体假设其他智能体的决策固定,并单方面改进自身决策;然而,这些工作计算的是精确的价值。在本工作中,针对合作式多智能体的有限与无限时域折扣 MDP,我们提出了合适的近似策略迭代算法:利用近似线性规划计算近似价值函数,并采用分布式策略改进。因此,我们的算法既能处理大规模状态空间,也能处理多个智能体。我们为算法提供了理论保证,并在若干数值示例上展示了其性能。
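
A minimal sketch of the decentralized policy improvement step is given below: each agent improves its own action unilaterally while the other agents' decisions are held fixed, so the per-state search scales with n_agents * n_actions rather than n_actions ** n_agents. The data structures (P, R, V and the joint-action indexing) are illustrative placeholders; in the paper, V would be an approximate value function obtained via approximate linear programming.

```python
# Sketch of decentralized policy improvement for a cooperative multi-agent MDP.
# P[s][joint_action] is a next-state distribution, R[s][joint_action] a scalar reward,
# V an (approximate) value function indexed by state; all are placeholders.
import numpy as np

def decentralized_improvement(policy, V, P, R, n_agents, n_actions, gamma=0.95):
    """policy[s] is a tuple of per-agent actions."""
    new_policy = dict(policy)
    for s in range(len(V)):
        for agent in range(n_agents):
            best_a, best_q = new_policy[s][agent], -np.inf
            for a in range(n_actions):
                joint = list(new_policy[s])
                joint[agent] = a                                    # unilateral deviation
                q = R[s][tuple(joint)] + gamma * P[s][tuple(joint)] @ V
                if q > best_q:
                    best_q, best_a = q, a
            updated = list(new_policy[s])
            updated[agent] = best_a
            new_policy[s] = tuple(updated)
    return new_policy
```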

Masked Autoencoders Are Robust Neural Architecture Search Learners

  • paper_url: http://arxiv.org/abs/2311.12086
  • repo_url: None
  • paper_authors: Yiming Hu, Xiangxiang Chu, Bo Zhang
  • for: 提高Neural Architecture Search(NAS)的效率和可靠性,减少或完全消除需要标注数据的使用。
  • methods: 基于Masked Autoencoders(MAE)的方法,替换supervised learning目标函数,使用图像重建任务来进行搜索过程,不需要标注数据,同时保持性能和通用能力。
  • results: 通过对不同的搜索空间和数据集进行广泛的实验,证明提posed方法的有效性和可靠性,比基eline方法更高。
    Abstract Neural Architecture Search (NAS) currently relies heavily on labeled data, which is both expensive and time-consuming to acquire. In this paper, we propose a novel NAS framework based on Masked Autoencoders (MAE) that eliminates the need for labeled data during the search process. By replacing the supervised learning objective with an image reconstruction task, our approach enables the robust discovery of network architectures without compromising performance and generalization ability. Additionally, we address the problem of performance collapse encountered in the widely-used Differentiable Architecture Search (DARTS) method in the unsupervised paradigm by introducing a multi-scale decoder. Through extensive experiments conducted on various search spaces and datasets, we demonstrate the effectiveness and robustness of the proposed method, providing empirical evidence of its superiority over baseline approaches.
    摘要 目前,神经网络架构搜索(NAS)严重依赖标注数据,而标注数据的获取既昂贵又耗时。本文提出了一种基于掩码自编码器(MAE)的全新 NAS 框架,在搜索过程中无需任何标注数据。通过将有监督学习目标替换为图像重建任务,我们的方法能够在不损失性能与泛化能力的前提下稳健地发现网络架构。此外,针对广泛使用的可微架构搜索(DARTS)方法在无监督范式下出现的性能崩溃问题,我们引入了多尺度解码器加以解决。通过在多个搜索空间和数据集上的大量实验,我们验证了所提方法的有效性与稳健性,并给出了其优于基线方法的实证证据。
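
The core idea of replacing the supervised objective with image reconstruction can be sketched as a single unsupervised training step, shown below. The pixel-level masking, encoder/decoder interfaces, and loss are simplifications for illustration (MAE-style methods typically mask patches and the paper uses a multi-scale decoder), not the paper's implementation.

```python
# Sketch: swapping the supervised objective for a masked-image reconstruction loss
# when training a weight-sharing supernet (names and shapes are illustrative).
import torch
import torch.nn.functional as F

def unsupervised_nas_step(supernet, decoder, images, mask_ratio=0.75, optimizer=None):
    # Keep a random ~25% of pixel values, zero out the rest, and ask the
    # (architecture-sampled) encoder plus decoder to reconstruct the original image.
    mask = (torch.rand_like(images) > mask_ratio).float()
    features = supernet(images * mask)      # encoder with a sampled architecture
    recon = decoder(features)               # a multi-scale decoder would go here
    loss = F.mse_loss(recon, images)        # no labels are needed
    if optimizer is not None:
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()
```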

MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations

  • paper_url: http://arxiv.org/abs/2311.11762
  • repo_url: None
  • paper_authors: Daniel Bogdoll, Yitian Yang, J. Marius Zöllner
  • for: 提高自动驾驶系统的理解能力,增强系统的决策能力。
  • methods: 使用原始相机和激光数据学习感知数据不受感知器件的多模态世界模型,以便直接用于下游任务,如规划。
  • results: 实现多模态未来预测,并证明我们的几何表示提高了相机图像和激光点云预测质量。
    Abstract Learning unsupervised world models for autonomous driving has the potential to improve the reasoning capabilities of today's systems dramatically. However, most work neglects the physical attributes of the world and focuses on sensor data alone. We propose MUVO, a MUltimodal World Model with Geometric VOxel Representations to address this challenge. We utilize raw camera and lidar data to learn a sensor-agnostic geometric representation of the world, which can directly be used by downstream tasks, such as planning. We demonstrate multimodal future predictions and show that our geometric representation improves the prediction quality of both camera images and lidar point clouds.
    摘要 学习无监控世界模型可以帮助自动驾驶系统的理解能力提高很多。然而,大多数工作忽略了世界的物理属性,而专注于感知数据alone。我们提议MUVO,一个多Modal World Model with Geometric VOxel Representations来解决这个挑战。我们使用原始的相机和激光数据来学习无关感知的几何表示世界,这可以直接用于下游任务,如规划。我们展示了多模态未来预测,并证明我们的几何表示提高了相机图像和激光点云预测质量。

Revealing behavioral impact on mobility prediction networks through causal interventions

  • paper_url: http://arxiv.org/abs/2311.11749
  • repo_url: None
  • paper_authors: Ye Hong, Yanan Xin, Simon Dirmeier, Fernando Perez-Cruz, Martin Raubal
  • for: 这个研究旨在研究 Deep neural networks 在 mobilit 预测任务中的解释性问题,尤其是如何各种 mobilit 行为因素影响这些神经网络的预测结果。
  • methods: 我们在这个研究中提出了一种 causal intervention 框架,用于评估不同 mobilit 行为因素对神经网络的影响。我们使用个人 mobilit 模型生成Synthetic location visit sequences,并通过控制数据生成过程来控制行为动力学。我们使用 mobilit 度量来评估 intervened location sequences,并输入这些位置序列到已经训练好的网络中进行分析性能变化。
  • results: 我们的结果表明可以生成具有不同 mobilit 行为特征的location sequences,并且可以在不同的 spatial 和 temporal 环境下进行模拟。这些变化导致神经网络的预测性能发生变化,并且揭示了关键 mobilit 行为因素,包括location transition 的顺序模式、探索新位置的倾向和个人和人口层次的位置选择偏好。这些发现对实际应用中的 mobilit 预测网络具有重要价值,而 causal inference 框架可以提高神经网络在 mobilit 应用中的解释性和可靠性。
    Abstract Deep neural networks are increasingly utilized in mobility prediction tasks, yet their intricate internal workings pose challenges for interpretability, especially in comprehending how various aspects of mobility behavior affect predictions. In this study, we introduce a causal intervention framework to assess the impact of mobility-related factors on neural networks designed for next location prediction -- a task focusing on predicting the immediate next location of an individual. To achieve this, we employ individual mobility models to generate synthetic location visit sequences and control behavior dynamics by intervening in their data generation process. We evaluate the interventional location sequences using mobility metrics and input them into well-trained networks to analyze performance variations. The results demonstrate the effectiveness in producing location sequences with distinct mobility behaviors, thus facilitating the simulation of diverse spatial and temporal changes. These changes result in performance fluctuations in next location prediction networks, revealing impacts of critical mobility behavior factors, including sequential patterns in location transitions, proclivity for exploring new locations, and preferences in location choices at population and individual levels. The gained insights hold significant value for the real-world application of mobility prediction networks, and the framework is expected to promote the use of causal inference for enhancing the interpretability and robustness of neural networks in mobility applications.
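
A minimal sketch of the intervention idea follows: a toy exploration-and-return generator produces synthetic visit sequences, one behavioural parameter (the exploration probability) is intervened on, and a trained next-location predictor is evaluated on both versions. The generator and the `predictor` callable are illustrative assumptions, not the individual mobility models used in the paper.

```python
# Sketch of a causal intervention on a synthetic mobility generator.
import numpy as np

def generate_sequence(n_steps, p_explore, n_locations, rng):
    seq = [int(rng.integers(n_locations))]
    visited = set()
    for _ in range(n_steps - 1):
        visited.add(seq[-1])
        if rng.random() < p_explore:
            seq.append(int(rng.integers(n_locations)))      # explore a (possibly new) location
        else:
            seq.append(int(rng.choice(list(visited))))       # return to a known location
    return seq

def accuracy_under_intervention(predictor, p_explore, n_users=200, n_steps=50, n_locations=100, seed=0):
    rng = np.random.default_rng(seed)
    hits, total = 0, 0
    for _ in range(n_users):
        seq = generate_sequence(n_steps, p_explore, n_locations, rng)
        for t in range(1, n_steps):
            hits += int(predictor(seq[:t]) == seq[t])        # trained network queried on the history
            total += 1
    return hits / total

# Comparing accuracy_under_intervention(model, 0.1) with accuracy_under_intervention(model, 0.5)
# isolates the effect of the exploration tendency on prediction performance.
```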

Leveraging Uncertainty Estimates To Improve Classifier Performance

  • paper_url: http://arxiv.org/abs/2311.11723
  • repo_url: None
  • paper_authors: Gundeep Arora, Srujana Merugu, Anoop Saladi, Rajeev Rastogi
  • for: 该论文主要探讨了如何基于模型分数和不确定性选择决策边界,以提高模型的准确率和回归率。
  • methods: 该论文提出了一种基于动态计划和iso随机函数的算法,用于选择决策边界,并进行了理论分析和实验验证。
  • results: 该论文的实验结果表明,使用模型分数和不确定性可以提高模型的准确率和回归率,并且在三个实际 dataset 上实现了25%-40%的提升。
    Abstract Binary classification involves predicting the label of an instance based on whether the model score for the positive class exceeds a threshold chosen based on the application requirements (e.g., maximizing recall for a precision bound). However, model scores are often not aligned with the true positivity rate. This is especially true when the training involves a differential sampling across classes or there is distributional drift between train and test settings. In this paper, we provide theoretical analysis and empirical evidence of the dependence of model score estimation bias on both uncertainty and score itself. Further, we formulate the decision boundary selection in terms of both model score and uncertainty, prove that it is NP-hard, and present algorithms based on dynamic programming and isotonic regression. Evaluation of the proposed algorithms on three real-world datasets yield 25%-40% gain in recall at high precision bounds over the traditional approach of using model score alone, highlighting the benefits of leveraging uncertainty.
    摘要 二分类任务根据模型对正类的打分是否超过某个阈值来预测实例标签,阈值通常依应用需求选取(例如在满足精确率约束的前提下最大化召回率)。然而,模型分数往往与真实的正类率不一致,在各类别采样比例不同或训练与测试分布存在漂移时尤为明显。本文从理论和实验两方面分析了模型分数估计偏差与不确定性及分数本身的关系,并将决策边界的选择形式化为同时依赖模型分数与不确定性的问题,证明其为 NP 困难,进而给出基于动态规划和保序回归(isotonic regression)的算法。在三个真实数据集上的评估表明,相比仅使用模型分数的传统做法,所提方法在高精确率约束下可带来 25%-40% 的召回率提升,凸显了利用不确定性的收益。
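
The decision-boundary idea, selecting thresholds jointly over model score and uncertainty subject to a precision bound, can be sketched as below. The uncertainty binning and the random search stand in for the paper's dynamic-programming and isotonic-regression algorithms; labels are assumed to be 0/1.

```python
# Sketch: per-uncertainty-bin score thresholds that maximize recall at a precision bound,
# instead of a single global score threshold.
import numpy as np

def recall_at_precision(scores, uncert, labels, precision_bound=0.9, n_bins=5, n_grid=50):
    bins = np.quantile(uncert, np.linspace(0, 1, n_bins + 1))
    bin_idx = np.clip(np.digitize(uncert, bins[1:-1]), 0, n_bins - 1)
    grid = np.linspace(scores.min(), scores.max(), n_grid)
    best_recall, best_thresholds = 0.0, None
    rng = np.random.default_rng(0)
    # Random search over per-bin thresholds (a stand-in for the exact algorithms).
    for _ in range(2000):
        th = rng.choice(grid, size=n_bins)
        pred = scores >= th[bin_idx]
        tp = np.sum(pred & (labels == 1))
        prec = tp / max(pred.sum(), 1)
        rec = tp / max((labels == 1).sum(), 1)
        if prec >= precision_bound and rec > best_recall:
            best_recall, best_thresholds = rec, th
    return best_recall, best_thresholds
```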

Unveiling the Power of Self-Attention for Shipping Cost Prediction: The Rate Card Transformer

  • paper_url: http://arxiv.org/abs/2311.11694
  • repo_url: https://github.com/lucidrains/tab-transformer-pytorch
  • paper_authors: P Aditya Sreekar, Sahil Verma, Varun Madhavan, Abhishek Persad
  • for: 这个研究是为了提高亚马逊在销售过程中的财务决策,具体来说是减少邮费估算错误的影响。
  • methods: 这个研究使用了一种新的架构,称为 Rate Card Transformer (RCT),它使用自注意力来编码包裹信息,包括包裹属性、承运商信息和路径规划。RCT 可以编码变长的一对多关系,从而更好地捕捉包裹信息。
  • results: 研究结果显示,使用 RCT 进行邮费估算可以将错误率降低 28.82%,并且超过了现有的基于 Transformer 的表格模型 FTTransformer 的性能。此外,RCT 还可以改善树状模型的性能。
    Abstract Amazon ships billions of packages to its customers annually within the United States. Shipping cost of these packages are used on the day of shipping (day 0) to estimate profitability of sales. Downstream systems utilize these days 0 profitability estimates to make financial decisions, such as pricing strategies and delisting loss-making products. However, obtaining accurate shipping cost estimates on day 0 is complex for reasons like delay in carrier invoicing or fixed cost components getting recorded at monthly cadence. Inaccurate shipping cost estimates can lead to bad decision, such as pricing items too low or high, or promoting the wrong product to the customers. Current solutions for estimating shipping costs on day 0 rely on tree-based models that require extensive manual engineering efforts. In this study, we propose a novel architecture called the Rate Card Transformer (RCT) that uses self-attention to encode all package shipping information such as package attributes, carrier information and route plan. Unlike other transformer-based tabular models, RCT has the ability to encode a variable list of one-to-many relations of a shipment, allowing it to capture more information about a shipment. For example, RCT can encode properties of all products in a package. Our results demonstrate that cost predictions made by the RCT have 28.82% less error compared to tree-based GBDT model. Moreover, the RCT outperforms the state-of-the-art transformer-based tabular model, FTTransformer, by 6.08%. We also illustrate that the RCT learns a generalized manifold of the rate card that can improve the performance of tree-based models.
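
A minimal sketch of how self-attention can encode a variable-length list of one-to-many relations (for example, the products inside one package) and pool them into a cost prediction is given below. Feature dimensions, pooling, and the head are illustrative and not the RCT architecture itself; each shipment is assumed to contain at least one non-padded item.

```python
# Sketch: self-attention over a padded list of per-item features, pooled into one cost prediction.
import torch
import torch.nn as nn

class ShipmentEncoder(nn.Module):
    def __init__(self, feat_dim=16, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, items, pad_mask):
        # items: (batch, max_items, feat_dim); pad_mask: True where an item slot is padding.
        h = self.encoder(self.proj(items), src_key_padding_mask=pad_mask)
        # Mean-pool over real (non-padded) items only.
        h = h.masked_fill(pad_mask.unsqueeze(-1), 0.0).sum(1) / (~pad_mask).sum(1, keepdim=True)
        return self.head(h).squeeze(-1)     # predicted cost per shipment
```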

Tiny-VBF: Resource-Efficient Vision Transformer based Lightweight Beamformer for Ultrasound Single-Angle Plane Wave Imaging

  • paper_url: http://arxiv.org/abs/2311.12082
  • repo_url: None
  • paper_authors: Abdul Rahoof, Vivek Chaturvedi, Mahesh Raveendranatha Panicker, Muhammad Shafique
  • for: 加速非实时 beamforming 算法在ultrasound 成像中使用深度学习架构,以提高图像质量和速度。
  • methods: 提出了一种基于视transformer的 tiny beamformer(Tiny-VBF)模型,使用 raw radio-frequency 通道数据,并使用 hybrid 量化 schemes 加速 FPGA 实现。
  • results: Tiny-VBF 模型在尺寸 368 x 128 的帧中需要0.34 GOPs/帧,与 state-of-the-art 深度学习模型相比下降8%的对比度和5%和33%的轴向和横向分辨率提升。同时,与 conventional Delay-and-Sum 扩展器相比,Tiny-VBF 模型提供了4.2%的对比度和4%和20%的轴向和横向分辨率提升。
    Abstract Accelerating compute intensive non-real-time beam-forming algorithms in ultrasound imaging using deep learning architectures has been gaining momentum in the recent past. Nonetheless, the complexity of the state-of-the-art deep learning techniques poses challenges for deployment on resource-constrained edge devices. In this work, we propose a novel vision transformer based tiny beamformer (Tiny-VBF), which works on the raw radio-frequency channel data acquired through single-angle plane wave insonification. The output of our Tiny-VBF provides fast envelope detection requiring very low frame rate, i.e. 0.34 GOPs/Frame for a frame size of 368 x 128 in comparison to the state-of-the-art deep learning models. It also exhibited an 8% increase in contrast and gains of 5% and 33% in axial and lateral resolution respectively when compared to Tiny-CNN on in-vitro dataset. Additionally, our model showed a 4.2% increase in contrast and gains of 4% and 20% in axial and lateral resolution respectively when compared against conventional Delay-and-Sum (DAS) beamformer. We further propose an accelerator architecture and implement our Tiny-VBF model on a Zynq UltraScale+ MPSoC ZCU104 FPGA using a hybrid quantization scheme with 50% less resource consumption compared to the floating-point implementation, while preserving the image quality.
    摘要 快速计算非实时射频成形算法在超音波成像中使用深度学习架构受到过去几年的推动。然而,现状的深度学习技术的复杂性使得部署在有限资源的边缘设备上具有挑战。在这项工作中,我们提出了一种基于视transformer的小型射频成形器(Tiny-VBF),它在单角扫描电磁信号中处理原始的射频通道数据。Tiny-VBF的输出具有快速的幅度检测,需要非常低的帧率,即0.34 GOPs/帧,与现状的深度学习模型相比。此外,我们的模型在射频成像 dataset 上显示了8%的对比度提高和 axial 和 lateral 分辨率的提高,分别为5%和33%,相比之下Tiny-CNN模型。此外,我们的模型还与传统的延迟和总和(DAS)成形器进行了比较,显示了4.2%的对比度提高和 axial 和 lateral 分辨率的提高,分别为4%和20%。最后,我们还提出了一种加速器架构,并在 Zynq UltraScale+ MPSoC ZCU104 FPGA 上实现了一种混合量化方案,相比于浮点实现,消耗资源量减少了50%,保持图像质量。

Unraveling the Control Engineer’s Craft with Neural Networks

  • paper_url: http://arxiv.org/abs/2311.11644
  • repo_url: None
  • paper_authors: Braghadeesh Lakshminarayanan, Federico Dettù, Cristian R. Rojas, Simone Formentin
  • for: 这篇论文旨在提出一种数据驱动的控制器调试方法,使用数字模拟器生成输入输出数据,并使用神经网络学习模型来调节控制器参数。
  • methods: 该方法使用数字模拟器生成输入输出数据,然后使用神经网络学习模型来学习控制器调节规则,从而实际替换控制工程师。
  • results: 该方法通过数字模拟器生成的输入输出数据,使用神经网络学习模型来学习控制器调节规则,可以快速和高精度地调节控制器参数。
    Abstract Many industrial processes require suitable controllers to meet their performance requirements. More often, a sophisticated digital twin is available, which is a highly complex model that is a virtual representation of a given physical process, whose parameters may not be properly tuned to capture the variations in the physical process. In this paper, we present a sim2real, direct data-driven controller tuning approach, where the digital twin is used to generate input-output data and suitable controllers for several perturbations in its parameters. State-of-the art neural-network architectures are then used to learn the controller tuning rule that maps input-output data onto the controller parameters, based on artificially generated data from perturbed versions of the digital twin. In this way, as far as we are aware, we tackle for the first time the problem of re-calibrating the controller by meta-learning the tuning rule directly from data, thus practically replacing the control engineer with a machine learning model. The benefits of this methodology are illustrated via numerical simulations for several choices of neural-network architectures.
    摘要 很多工业过程需要适合的控制器来满足其性能要求。更常见的情况是,存在一个非常复杂的数字孪生,即物理过程的虚拟表示,其参数可能没有正确地调整到物理过程的变化。在这篇论文中,我们提出了一种 sim2real、直接数据驱动控制器调整方法,其中使用数字孪生生成输入输出数据和适合的控制器,并使用现代神经网络架构学习控制器调整规则,以将输入输出数据映射到控制器参数。这种方法,至我们所知,是第一次通过直接从数据中学习控制器调整规则,实际地将控制工程师 replaced by 机器学习模型。我们通过数值仿真对几种神经网络架构进行了评估,并证明了本方法的优点。
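
The sim2real tuning-rule idea can be sketched as follows: perturb the digital twin's parameters, collect input-output data and corresponding well-tuned controller gains, and regress from crude I/O features to gains. The `simulate` and `tune` callables, the feature map, and the MLP are illustrative placeholders rather than the paper's setup.

```python
# Sketch of learning a controller-tuning rule from a digital twin.
import numpy as np
from sklearn.neural_network import MLPRegressor

def build_dataset(simulate, tune, n_samples=500, seed=0):
    rng = np.random.default_rng(seed)
    X, Y = [], []
    for _ in range(n_samples):
        params = rng.uniform(0.5, 1.5, size=3)           # perturbed digital-twin parameters
        u, y = simulate(params)                           # input-output data from the twin
        X.append(np.concatenate([u.mean(0), y.mean(0), y.std(0)]))  # crude I/O features
        Y.append(tune(params))                            # controller gains deemed good for this twin
    return np.array(X), np.array(Y)

def learn_tuning_rule(simulate, tune):
    X, Y = build_dataset(simulate, tune)
    rule = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X, Y)
    return rule   # rule.predict(features_of_new_plant) returns suggested gains
```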

Incorporating LLM Priors into Tabular Learners

  • paper_url: http://arxiv.org/abs/2311.11628
  • repo_url: None
  • paper_authors: Max Zhu, Siniša Stanivuk, Andrija Petrovic, Mladen Nikolic, Pietro Lio
  • for: 本研究旨在应对大型语言模型(LLMs)在数据序列化敏感性和偏见等方面的挑战,并将其与传统的表格数据分类技术相结合。
  • methods: 本研究提出了两种利用 LLM 的策略:对类别变量进行排序,以及为连续变量与目标之间的相关性生成先验,以提升小样本(few-shot)场景下的性能。具体而言,我们引入了 MonotonicLR,它使用非线性单调函数将序数映射为基数,同时保持 LLM 给定的顺序。
  • results: 与基线模型相比,我们的方法在低数据场景下表现更优,尤其是在小样本场景下,并且仍保持可解释性。
    Abstract We present a method to integrate Large Language Models (LLMs) and traditional tabular data classification techniques, addressing LLMs challenges like data serialization sensitivity and biases. We introduce two strategies utilizing LLMs for ranking categorical variables and generating priors on correlations between continuous variables and targets, enhancing performance in few-shot scenarios. We focus on Logistic Regression, introducing MonotonicLR that employs a non-linear monotonic function for mapping ordinals to cardinals while preserving LLM-determined orders. Validation against baseline models reveals the superior performance of our approach, especially in low-data scenarios, while remaining interpretable.
    摘要 我们提出了一种将大型语言模型(LLM)与传统表格数据分类技术相结合的方法,以应对 LLM 在数据序列化敏感性和偏见方面的问题。我们提出了两种策略:利用 LLM 对类别变量进行排序,以及为连续变量与目标之间的相关性生成先验,从而提升小样本场景下的性能。我们聚焦于逻辑回归,引入 MonotonicLR,它使用非线性单调函数将序数映射为基数,同时保持 LLM 给定的顺序。与基线模型的对比验证表明,我们的方法在低数据场景中表现出色,并且保持了可解释性。
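
A sketch of a MonotonicLR-style component is shown below: an LLM-provided ordinal ranking of categories is mapped to real values through a learned non-decreasing function (cumulative softplus increments) and fed into a logistic-regression head. Dimensions and the exact parameterization are assumptions for illustration, not the paper's implementation.

```python
# Sketch of a monotonic ordinal-to-cardinal mapping inside a logistic regression.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicOrdinalLR(nn.Module):
    def __init__(self, n_categories, n_numeric):
        super().__init__()
        self.increments = nn.Parameter(torch.zeros(n_categories))  # softplus keeps steps positive
        self.linear = nn.Linear(n_numeric + 1, 1)

    def forward(self, ordinal_rank, numeric):
        # ordinal_rank: LongTensor of LLM-assigned ranks (0 = lowest); numeric: other features.
        values = torch.cumsum(F.softplus(self.increments), dim=0)  # non-decreasing mapping
        mapped = values[ordinal_rank].unsqueeze(-1)
        logits = self.linear(torch.cat([mapped, numeric], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)
```

Training with a standard binary cross-entropy loss updates both the logistic weights and the monotonic mapping, while the LLM-determined order of categories is preserved by construction.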

Testing multivariate normality by testing independence

  • paper_url: http://arxiv.org/abs/2311.11575
  • repo_url: None
  • paper_authors: Povilas Daniušis
  • for: 本文提出了一种简单的多变量正态性测试,基于加铁-伯恩斯坦的特征化,可以通过利用现有的统计独立性测试来进行。
  • methods: 本文使用了现有的统计独立性测试来实现这种测试,并进行了Empirical investigation,发现在高维数据中,提出的方法可能更高效。
  • results: 本文的实证研究表明,在高维数据中,所提方法可能更为高效。
    Abstract We propose a simple multivariate normality test based on Kac-Bernstein's characterization, which can be conducted by utilising existing statistical independence tests for sums and differences of data samples. We also perform its empirical investigation, which reveals that for high-dimensional data, the proposed approach may be more efficient than the alternative ones. The accompanying code repository is provided at \url{https://shorturl.at/rtuy5}.
    摘要 我们提出了一种简单的多变量正态性检验,它基于 Kac-Bernstein 特征化,可通过对数据样本之和与之差应用现有的统计独立性检验来实现。我们还进行了实验研究,发现对于高维数据,所提方法可能比其他方法更高效。代码存储库见:https://shorturl.at/rtuy5 。
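
The Kac-Bernstein characterization behind the test (for i.i.d. X and Y, X+Y and X-Y are independent iff the distribution is Gaussian) suggests the following sketch: split the sample into two halves, form sums and differences, and apply any statistical independence test. Here a hand-rolled permutation test on distance covariance stands in for the independence tests referenced in the paper; the data are assumed to be an (n, d) array.

```python
# Sketch of a multivariate normality test via independence of sums and differences.
import numpy as np

def _dcov(a, b):
    # Sample distance covariance (V-statistic) between two (m, d) samples.
    def centred(m):
        d = np.linalg.norm(m[:, None, :] - m[None, :, :], axis=-1)
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    A, B = centred(a), centred(b)
    return np.sqrt(max((A * B).mean(), 0.0))

def normality_test_via_independence(x, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.permutation(x)                  # random split into two independent halves
    half = len(x) // 2
    X, Y = x[:half], x[half:2 * half]
    S, D = X + Y, X - Y
    stat = _dcov(S, D)
    null = [_dcov(S, D[rng.permutation(half)]) for _ in range(n_perm)]
    p_value = (1 + sum(s >= stat for s in null)) / (1 + n_perm)
    return stat, p_value                    # small p-value: evidence against multivariate normality
```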

A Deep-Genetic Algorithm (Deep-GA) Approach for High-Dimensional Nonlinear Parabolic Partial Differential Equations

  • paper_url: http://arxiv.org/abs/2311.11558
  • repo_url: None
  • paper_authors: Endah Rokhmati Merdika Putri, Muhammad Luthfi Shahab, Mohammad Iqbal, Imam Mukhlash, Amirul Hakam, Lutfi Mardianto, Hadi Susanto
  • for: 增加深度学习算法解决高维度偏微分方程的性能,提高解决速度和精度。
  • methods: embedding a genetic algorithm (GA) into the solver to optimize the initial guess selection, accelerating the convergence of the nonlinear PDEs on a broader interval.
  • results: 比深度BSDE更快速地解决非线性偏微分方程,并且保持相同的准确性。
    Abstract We propose a new method, called a deep-genetic algorithm (deep-GA), to accelerate the performance of the so-called deep-BSDE method, which is a deep learning algorithm to solve high dimensional partial differential equations through their corresponding backward stochastic differential equations (BSDEs). Recognizing the sensitivity of the solver to the initial guess selection, we embed a genetic algorithm (GA) into the solver to optimize the selection. We aim to achieve faster convergence for the nonlinear PDEs on a broader interval than deep-BSDE. Our proposed method is applied to two nonlinear parabolic PDEs, i.e., the Black-Scholes (BS) equation with default risk and the Hamilton-Jacobi-Bellman (HJB) equation. We compare the results of our method with those of the deep-BSDE and show that our method provides comparable accuracy with significantly improved computational efficiency.
    摘要 我们提出了一种新方法,称为深度进化算法(深度-GA),以加速深度学习算法解决高维partial differential equations(PDEs)的方法,即通过其相应的反向随机 differential equations(BSDEs)。我们认为选择初始假设对解算法的敏感性,因此我们将进化算法(GA)embed到解算法中来优化选择。我们的目标是在更广泛的区间上比deep-BSDE更快地 converges。我们的提posed方法应用于两种非线性parabolic PDEs,即黑-股(BS)方程和汉密尔-雅各布-贝尔(HJB)方程。我们比较了我们的方法与深度-BSDE的结果,并显示了我们的方法可以提供相当于准确性的同时significantly improve计算效率。
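
The deep-GA idea of optimizing the solver's initial guess with a genetic algorithm can be sketched as below, with a scalar initial value (for example, Y0) as the gene and a short deep-BSDE run as the fitness. `train_deep_bsde`, the bounds, and the GA hyperparameters are illustrative placeholders, not the paper's configuration.

```python
# Sketch: a small genetic algorithm over the deep-BSDE solver's initial guess.
import numpy as np

def genetic_initial_guess(train_deep_bsde, bounds=(0.0, 100.0), pop=20, gens=10, elite=5, seed=0):
    rng = np.random.default_rng(seed)
    population = rng.uniform(*bounds, size=pop)
    for _ in range(gens):
        # Fitness = loss of a short deep-BSDE run started from this initial guess.
        fitness = np.array([train_deep_bsde(y0, steps=200) for y0 in population])
        parents = population[np.argsort(fitness)[:elite]]
        children = []
        while len(children) < pop - elite:
            a, b = rng.choice(parents, 2)
            child = 0.5 * (a + b) + rng.normal(scale=0.05 * (bounds[1] - bounds[0]))  # crossover + mutation
            children.append(np.clip(child, *bounds))
        population = np.concatenate([parents, np.array(children)])
    fitness = np.array([train_deep_bsde(y0, steps=200) for y0 in population])
    return population[np.argmin(fitness)]
```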

Fast Controllable Diffusion Models for Undersampled MRI Reconstruction

  • paper_url: http://arxiv.org/abs/2311.12078
  • repo_url: https://github.com/ppn-paper/ppn
  • paper_authors: Wei Jiang, Zhuang Xiong, Feng Liu, Nan Ye, Hongfu Sun
  • for: 用于增强和加速控制可生成的扩散模型,以提高MRI下抽取重建的效果和速度。
  • methods: 使用Predictor-Projector-Noisor(PPN)算法,该算法可以快速生成高质量的MR图像,并且可以适应不同的MRI获取参数。
  • results: PPN算法可以生成高准确性的MR图像,并且比其他控制可生成方法更快。此外,PPN算法可以适应不同的MRI获取参数,使其在临床应用中更实用。
    Abstract Supervised deep learning methods have shown promise in Magnetic Resonance Imaging (MRI) undersampling reconstruction, but their requirement for paired data limits their generalizability to the diverse MRI acquisition parameters. Recently, unsupervised controllable generative diffusion models have been applied to MRI undersampling reconstruction, without paired data or model retraining for different MRI acquisitions. However, diffusion models are generally slow in sampling and state-of-the-art acceleration techniques can lead to sub-optimal results when directly applied to the controllable generation process. This study introduces a new algorithm called Predictor-Projector-Noisor (PPN), which enhances and accelerates controllable generation of diffusion models for MRI undersampling reconstruction. Our results demonstrate that PPN produces high-fidelity MR images that conform to undersampled k-space measurements with significantly shorter reconstruction time than other controllable sampling methods. In addition, the unsupervised PPN accelerated diffusion models are adaptable to different MRI acquisition parameters, making them more practical for clinical use than supervised learning techniques.
    摘要 有监督深度学习方法在磁共振成像(MRI)欠采样重建中展现出潜力,但其对成对数据的依赖限制了其在不同 MRI 采集参数下的泛化能力。最近,无监督的可控生成扩散模型被应用于 MRI 欠采样重建,无需成对数据,也无需针对不同采集方式重新训练模型。然而,扩散模型的采样通常较慢,而将现有的加速技术直接套用到可控生成过程中往往得到次优结果。本研究提出了一种名为 Predictor-Projector-Noisor(PPN)的新算法,用于增强并加速扩散模型在 MRI 欠采样重建中的可控生成。结果表明,PPN 能够生成与欠采样 k 空间测量一致的高保真 MR 图像,且重建时间明显短于其他可控采样方法。此外,无监督的 PPN 加速扩散模型可适应不同的 MRI 采集参数,相比有监督学习方法更具临床实用性。

Understanding Variation in Subpopulation Susceptibility to Poisoning Attacks

  • paper_url: http://arxiv.org/abs/2311.11544
  • repo_url: None
  • paper_authors: Evan Rose, Fnu Suya, David Evans
  • for: 本研究探讨了机器学习模型在不同子群体上受投毒攻击的脆弱性差异,特别是在攻击者能够控制一小部分训练数据点的情况下。
  • methods: 本研究使用了现有的最先进投毒攻击方法,并针对不同子群体进行实验研究,以探讨攻击效果的差异。
  • results: 研究发现,对于可分性较差的数据集,数据集的可分性在子群体脆弱性中起主导作用;而对于可分性较好的数据集,攻击效果更多取决于具体子群体自身的属性。此外,研究还发现了一个关键的子群体属性:干净模型与将该子群体误分类的目标模型在干净数据集上的损失差异;损失差异越小,该子群体越容易被攻击。
    Abstract Machine learning is susceptible to poisoning attacks, in which an attacker controls a small fraction of the training data and chooses that data with the goal of inducing some behavior unintended by the model developer in the trained model. We consider a realistic setting in which the adversary with the ability to insert a limited number of data points attempts to control the model's behavior on a specific subpopulation. Inspired by previous observations on disparate effectiveness of random label-flipping attacks on different subpopulations, we investigate the properties that can impact the effectiveness of state-of-the-art poisoning attacks against different subpopulations. For a family of 2-dimensional synthetic datasets, we empirically find that dataset separability plays a dominant role in subpopulation vulnerability for less separable datasets. However, well-separated datasets exhibit more dependence on individual subpopulation properties. We further discover that a crucial subpopulation property is captured by the difference in loss on the clean dataset between the clean model and a target model that misclassifies the subpopulation, and a subpopulation is much easier to attack if the loss difference is small. This property also generalizes to high-dimensional benchmark datasets. For the Adult benchmark dataset, we show that we can find semantically-meaningful subpopulation properties that are related to the susceptibilities of a selected group of subpopulations. The results in this paper are accompanied by a fully interactive web-based visualization of subpopulation poisoning attacks found at https://uvasrg.github.io/visualizing-poisoning
    摘要 机器学习容易受到投毒攻击:攻击者控制一小部分训练数据,并刻意选择这些数据,使训练得到的模型表现出开发者预期之外的行为。我们考虑一个现实的设定,即攻击者只能插入有限数量的数据点,试图控制模型在某个特定子群体上的行为。受先前关于随机标签翻转攻击在不同子群体上效果差异的观察启发,我们研究了影响最先进投毒攻击对不同子群体有效性的因素。对一族二维合成数据集,我们通过实验发现:当数据集可分性较差时,可分性在子群体脆弱性中起主导作用;而可分性较好的数据集则更多取决于各子群体自身的特性。我们进一步发现,一个关键的子群体特性可由干净模型与将该子群体误分类的目标模型在干净数据集上的损失差异刻画:损失差异越小,该子群体越容易被攻击。这一特性同样适用于高维基准数据集。在 Adult 基准数据集上,我们展示了可以找到与选定子群体的脆弱性相关、且具有语义含义的子群体特性。本文结果配有一个完整的交互式网页可视化,见 https://uvasrg.github.io/visualizing-poisoning 。
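
The key susceptibility property, the clean-data loss difference between the clean model and a target model that misclassifies the subpopulation, can be computed as in the sketch below. Logistic regression and label flipping for the target model are illustrative stand-ins for the paper's setup.

```python
# Sketch of the subpopulation susceptibility property (clean-data loss difference).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def loss_difference(X, y, subpop_mask):
    clean = LogisticRegression(max_iter=1000).fit(X, y)
    # Target model: trained as if the subpopulation's labels were flipped, so it misclassifies them.
    y_target = y.copy()
    y_target[subpop_mask] = 1 - y_target[subpop_mask]
    target = LogisticRegression(max_iter=1000).fit(X, y_target)
    # Both losses are measured on the clean (unmodified) dataset.
    return log_loss(y, target.predict_proba(X)) - log_loss(y, clean.predict_proba(X))

# A small loss difference suggests the subpopulation is easier to poison.
```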

An NMF-Based Building Block for Interpretable Neural Networks With Continual Learning

  • paper_url: http://arxiv.org/abs/2311.11485
  • repo_url: None
  • paper_authors: Brian K. Vogel
  • for: 提高预测性能和解释性的平衡
  • methods: 使用基于 NMF 的 Predictive Factorized Coupling(PFC)块,并结合有监督神经网络训练方法,在保留 NMF 可解释性的同时提高预测性能。
  • results: 在小数据集上的测试表明,PFC 块可以取得与 MLP 相当的预测性能,同时提供更好的可解释性,并在持续学习、非独立同分布数据训练以及训练后知识移除等多种场景中表现出色。
    Abstract Existing learning methods often struggle to balance interpretability and predictive performance. While models like nearest neighbors and non-negative matrix factorization (NMF) offer high interpretability, their predictive performance on supervised learning tasks is often limited. In contrast, neural networks based on the multi-layer perceptron (MLP) support the modular construction of expressive architectures and tend to have better recognition accuracy but are often regarded as black boxes in terms of interpretability. Our approach aims to strike a better balance between these two aspects through the use of a building block based on NMF that incorporates supervised neural network training methods to achieve high predictive performance while retaining the desirable interpretability properties of NMF. We evaluate our Predictive Factorized Coupling (PFC) block on small datasets and show that it achieves competitive predictive performance with MLPs while also offering improved interpretability. We demonstrate the benefits of this approach in various scenarios, such as continual learning, training on non-i.i.d. data, and knowledge removal after training. Additionally, we show examples of using the PFC block to build more expressive architectures, including a fully-connected residual network as well as a factorized recurrent neural network (RNN) that performs competitively with vanilla RNNs while providing improved interpretability. The PFC block uses an iterative inference algorithm that converges to a fixed point, making it possible to trade off accuracy vs computation after training but also currently preventing its use as a general MLP replacement in some scenarios such as training on very large datasets. We provide source code at https://github.com/bkvogel/pfc
    摘要 现有的学习方法往往难以在可解释性与预测性能之间取得平衡。最近邻和非负矩阵分解(NMF)等模型具有很高的可解释性,但其在监督学习任务上的预测性能往往有限。相比之下,基于多层感知器(MLP)的神经网络支持模块化构建表达力强的架构,识别精度通常更好,但在可解释性方面常被视为黑盒。我们的方法旨在更好地兼顾这两方面:使用基于 NMF 的 Predictive Factorized Coupling(PFC)块,并结合有监督神经网络的训练方法,在保留 NMF 良好可解释性的同时获得高预测性能。我们在小数据集上评估了 PFC 块,结果显示其预测性能与 MLP 相当,同时可解释性更好。我们在多种场景下展示了该方法的优势,包括持续学习、在非独立同分布数据上训练,以及训练后的知识移除。此外,我们还展示了用 PFC 块构建更具表达力的架构,包括全连接残差网络以及分解式循环神经网络(RNN),后者的表现可与普通 RNN 相当,同时可解释性更好。PFC 块使用收敛到不动点的迭代推断算法,因此可以在训练后在准确率与计算量之间进行权衡,但这也使其目前在某些场景(例如在超大规模数据集上训练)还不能作为 MLP 的通用替代。源代码见 https://github.com/bkvogel/pfc 。

Gaussian Interpolation Flows

  • paper_url: http://arxiv.org/abs/2311.11475
  • repo_url: None
  • paper_authors: Yuan Gao, Jian Huang, Yuling Jiao
  • for: 这个论文主要研究了基于Gaussian denoising的连续正常化流的建构,以及这些流的理论性质和正则化效果。
  • methods: 该论文使用了一种统一框架,称为Gaussian interpolation flow,来研究连续正常化流的启发性、存在和一意性、流速场的 lipschitz 连续性和时间反转流map的 lipschitz 连续性等。
  • results: 该研究发现,Gaussian interpolation flows 具有良好的启发性、存在和一意性、流速场的 lipschitz 连续性和时间反转流map的 lipschitz 连续性等特点,并且可以用于描述一些rich classes of target distributions。此外,该研究还探讨了这些流的稳定性和源分布的扰动。
    Abstract Gaussian denoising has emerged as a powerful principle for constructing simulation-free continuous normalizing flows for generative modeling. Despite their empirical successes, theoretical properties of these flows and the regularizing effect of Gaussian denoising have remained largely unexplored. In this work, we aim to address this gap by investigating the well-posedness of simulation-free continuous normalizing flows built on Gaussian denoising. Through a unified framework termed Gaussian interpolation flow, we establish the Lipschitz regularity of the flow velocity field, the existence and uniqueness of the flow, and the Lipschitz continuity of the flow map and the time-reversed flow map for several rich classes of target distributions. This analysis also sheds light on the auto-encoding and cycle-consistency properties of Gaussian interpolation flows. Additionally, we delve into the stability of these flows in source distributions and perturbations of the velocity field, using the quadratic Wasserstein distance as a metric. Our findings offer valuable insights into the learning techniques employed in Gaussian interpolation flows for generative modeling, providing a solid theoretical foundation for end-to-end error analyses of learning GIFs with empirical observations.

Towards a Post-Market Monitoring Framework for Machine Learning-based Medical Devices: A case study

  • paper_url: http://arxiv.org/abs/2311.11463
  • repo_url: None
  • paper_authors: Jean Feng, Adarsh Subbaswamy, Alexej Gossmann, Harvineet Singh, Berkman Sahiner, Mi-Ok Kim, Gene Pennello, Nicholas Petrick, Romain Pirracchio, Fan Xia
  • for: 这个研究的目的是为了制定一种系统atic的监控策略,以确保在临床实践中部署的机器学习(ML)系统的安全性和有效性。
  • methods: 这篇研究使用了 causal inference 和统计过程控制 等工具,对监控方法进行了定义、评估和比较。
  • results: 研究发现,选择实际(observational)数据或进行实验性的研究是监控策略的关键决策,但这些决策受到了各种偏见和偏向的影响。
    Abstract After a machine learning (ML)-based system is deployed in clinical practice, performance monitoring is important to ensure the safety and effectiveness of the algorithm over time. The goal of this work is to highlight the complexity of designing a monitoring strategy and the need for a systematic framework that compares the multitude of monitoring options. One of the main decisions is choosing between using real-world (observational) versus interventional data. Although the former is the most convenient source of monitoring data, it exhibits well-known biases, such as confounding, selection, and missingness. In fact, when the ML algorithm interacts with its environment, the algorithm itself may be a primary source of bias. On the other hand, a carefully designed interventional study that randomizes individuals can explicitly eliminate such biases, but the ethics, feasibility, and cost of such an approach must be carefully considered. Beyond the decision of the data source, monitoring strategies vary in the performance criteria they track, the interpretability of the test statistics, the strength of their assumptions, and their speed at detecting performance decay. As a first step towards developing a framework that compares the various monitoring options, we consider a case study of an ML-based risk prediction algorithm for postoperative nausea and vomiting (PONV). Bringing together tools from causal inference and statistical process control, we walk through the basic steps of defining candidate monitoring criteria, describing potential sources of bias and the causal model, and specifying and comparing candidate monitoring procedures. We hypothesize that these steps can be applied more generally, as causal inference can address other sources of biases as well.

eess.IV - 2023-11-20

Tubular Curvature Filter: Implicit Pointwise Curvature Calculation Method for Tubular Objects

  • paper_url: http://arxiv.org/abs/2311.11931
  • repo_url: None
  • paper_authors: Elifnur Sunger, Beyza Kalkanli, Veysi Yildiz, Tales Imbiriba, Peter Campbell, Deniz Erdogmus
  • for: 用于计算管状物体的本地弯曲度
  • methods: 通过计算管状强度函数的 Hessian 矩阵特征向量方向上的方向变化率,来进行局部曲率计算
  • results: 实验结果表明,Tubular Curvature Filter方法可以准确地计算管状物体任何点的本地弯曲度
    Abstract Curvature estimation methods are important as they capture salient features for various applications in image processing, especially within medical domains where tortuosity of vascular structures is of significant interest. Existing methods based on centerline or skeleton curvature fail to capture curvature gradients across a rotating tubular structure. This paper presents a Tubular Curvature Filter method that locally calculates the acceleration of bundles of curves that traverse along the tubular object parallel to the centerline. This is achieved by examining the directional rate of change in the eigenvectors of the Hessian matrix of a tubular intensity function in space. This method implicitly calculates the local tubular curvature without the need to explicitly segment the tubular object. Experimental results demonstrate that the Tubular Curvature Filter method provides accurate estimates of local curvature at any point inside tubular structures.
    摘要 CURVATURE 估计方法是重要的,因为它们捕捉了图像处理中的重要特征,特别是医疗领域中血管结构的折叠性是非常重要的。现有基于中心线或skeleton curvature的方法无法捕捉旋转管体结构中的曲线幅度跃变。本文介绍了一种管体曲线滤波器方法,它地方计算管体内部曲线的加速度,通过对管体内部曲线的平行方向进行方向差异率的检查,并通过计算管体内部曲线的HESSIAN矩阵的方向差异来计算本地管体曲线。这种方法不需要显式地分割管体对象,可以准确地估计管体内部任何点的曲线。实验结果表明,管体曲线滤波器方法可以准确地估计管体内部曲线的本地弯曲。

eess.SP - 2023-11-20

Tensor-based Space Debris Detection for Satellite Mega-constellations

  • paper_url: http://arxiv.org/abs/2311.11838
  • repo_url: None
  • paper_authors: Olivier Daoust, Hasan Nayir, Irfan Azam, Antoine Lesage-Landry, Gunes Karabulut Kurt
  • for: 避免遥感器损坏和残骸的潜在威胁,提高遥感器的安全性。
  • methods: integrate sensing and communication techniques to detect space debris, using canonical polyadic (CP) tensor decomposition method to estimate the rank of the tensor that denotes the number of paths including line-of-sight and non-line-of-sight.
  • results: simulation results show that the proposed tensor-based scheme has higher probability of detection than conventional energy-based detection scheme for space debris detection.
    Abstract Thousands of satellites, asteroids, and rocket bodies break, collide, or degrade, resulting in large amounts of space debris in low Earth orbit. The presence of space debris poses a serious threat to satellite mega-constellations and to future space missions. Debris can be avoided if detected within the safety range of a satellite. In this paper, an integrated sensing and communication technique is proposed to detect space debris for satellite mega-constellations. The canonical polyadic (CP) tensor decomposition method is used to estimate the rank of the tensor that denotes the number of paths including line-of-sight and non-line-of-sight by exploiting the sparsity of THz channel with limited scattering. The analysis reveals that the reflected signals of the THz can be utilized for the detection of space debris. The CP decomposition is cast as an optimization problem and solved using the alternating least square (ALS) algorithm. Simulation results show that the probability of detection of the proposed tensor-based scheme is higher than the conventional energy-based detection scheme for the space debris detection.
    摘要 大量卫星、小行星和火箭残骸发生破碎、碰撞或退化,在低地球轨道上产生了大量空间碎片。空间碎片对巨型卫星星座和未来的航天任务构成严重威胁。若能在卫星安全范围内探测到空间碎片,就可以对其进行规避。本文提出了一种集成感知与通信技术,用于巨型卫星星座的空间碎片检测。我们利用 THz 信道在有限散射下的稀疏性,采用典范多元(CP)张量分解方法来估计张量的秩,该秩对应包括视距与非视距在内的路径数目。分析表明,THz 的反射信号可用于空间碎片检测。CP 分解被建模为一个优化问题,并用交替最小二乘(ALS)算法求解。仿真结果表明,所提出的基于张量的方案比传统基于能量的检测方案具有更高的检测概率。
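
A self-contained sketch of estimating the number of paths by CP decomposition is given below: alternating least squares fits models of increasing rank to a 3-way channel tensor, and the rank is read off where the reconstruction error stops improving. The stopping threshold and the hand-rolled ALS are illustrative; libraries such as TensorLy provide equivalent decompositions.

```python
# Sketch of rank estimation via CP decomposition with alternating least squares (ALS).
import numpy as np

def cp_als(T, rank, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    A = [rng.standard_normal((T.shape[m], rank)) for m in range(3)]
    for _ in range(n_iter):
        for m in range(3):
            # Khatri-Rao product of the other two factor matrices.
            others = [A[i] for i in range(3) if i != m]
            kr = np.einsum('ir,jr->ijr', others[0], others[1]).reshape(-1, rank)
            unfolded = np.moveaxis(T, m, 0).reshape(T.shape[m], -1)
            A[m] = unfolded @ kr @ np.linalg.pinv(kr.T @ kr)   # least-squares update of mode m
    recon = np.einsum('ir,jr,kr->ijk', A[0], A[1], A[2])
    return A, np.linalg.norm(T - recon) / np.linalg.norm(T)

def estimate_num_paths(T, max_rank=6, tol=1e-2):
    prev_err = np.inf
    for r in range(1, max_rank + 1):
        _, err = cp_als(T, r)
        if prev_err - err < tol:        # adding a component no longer helps much
            return r - 1
        prev_err = err
    return max_rank
```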

Movable-Antenna Array-Enabled Wireless Communication with CoMP Reception

  • paper_url: http://arxiv.org/abs/2311.11814
  • repo_url: None
  • paper_authors: Guojie Hu, Qingqing Wu, Jian Ouyang, Kui Xu, Yunlong Cai, Naofal Al-Dhahir
  • for: 实现高效的无线通信,通过允许移动天线(MA)阵列实现均衡多点接收(CoMP)技术,并且允许多个目标组合共同解读从传送器装备MA阵列的通信讯号。
  • methods: 使用最大比率组合技术来实现多个目标的共同解读,并且将传送器的天线组合优化为最大化受到信号噪音比。
  • results: 这种方法可以实现高效的无线通信,并且比过去的参考模型表现出更好的性能。将MA阵列的位置优化可以获得更高的受到信号噪音比,并且可以实现更好的均衡多点接收。
    Abstract We consider the movable antenna (MA) array-enabled wireless communication with coordinate multi-point (CoMP) reception, where multiple destinations adopt the maximal ratio combination technique to jointly decode the common message sent from the transmitter equipped with the MA array. Our goal is to maximize the effective received signal-to-noise ratio, by jointly optimizing the transmit beamforming and the positions of the MA array. Although the formulated problem is highly non-convex, we reveal that it is fundamental to maximize the principal eigenvalue of a hermite channel matrix which is a function of the positions of the MA array. The corresponding sub-problem is still non-convex, for which we develop a computationally efficient algorithm. Afterwards, the optimal transmit beamforming is determined with a closed-form solution. In addition, the theoretical performance upper bound is analyzed. Since the MA array brings an additional spatial degree of freedom by flexibly adjusting all antennas' positions, it achieves significant performance gain compared to competitive benchmarks.
    摘要 我们考虑了可移动天线(MA)数组启用的无线通信,使多个目标使用最大比率组合技术同时解码发送者配备MA数组的共同消息。我们的目标是 Maximizing the effective received signal-to-noise ratio,通过同时优化发射扫描和MA数组的位置来实现。虽然问题是非常不对称,但我们发现,最大化 hermite channel 矩阵的主要特征值是 MA 数组位置的函数。相应的子问题仍然是非对称的,我们开发了 computationally efficient 算法。然后,最佳发射扫描是通过关闭式解决方案确定的。此外,我们还分析了理论性能Upper bound。由于 MA 数组通过自由调整所有天线的位置,增加了空间学度的自由度,因此实现了比竞争benchmarks更高的性能。

Robust Multidimentional Chinese Remainder Theorem for Integer Vector Reconstruction

  • paper_url: http://arxiv.org/abs/2311.11804
  • repo_url: None
  • paper_authors: Li Xiao, Haiye Huo, Xiang-Gen Xia
  • for: 该 paper targets the problem of robustly reconstructing an integer vector from its erroneous remainders in multidimensional (MD) signal processing.
  • methods: 该 paper 使用了robust MD Chinese remainder theorem (CRT),which is recently proposed for a special class of moduli, but with a strict constraint on the moduli. The paper investigates the robust MD-CRT for a general set of moduli and presents a necessary and sufficient condition on the difference between paired remainder errors, as well as a simple sufficient condition on the remainder error bound.
  • results: 该 paper presents a closed-form reconstruction algorithm and generalizes the results of the robust MD-CRT from integer vectors/matrices to real ones. The paper validates the robust MD-CRT for general moduli through numerical simulations and applies it to MD sinusoidal frequency estimation based on multiple sub-Nyquist samplers.
    Abstract The problem of robustly reconstructing an integer vector from its erroneous remainders appears in many applications in the field of multidimensional (MD) signal processing. To address this problem, a robust MD Chinese remainder theorem (CRT) was recently proposed for a special class of moduli, where the remaining integer matrices left-divided by a greatest common left divisor (gcld) of all the moduli are pairwise commutative and coprime. The strict constraint on the moduli limits the usefulness of the robust MD-CRT in practice. In this paper, we investigate the robust MD-CRT for a general set of moduli. We first introduce a necessary and sufficient condition on the difference between paired remainder errors, followed by a simple sufficient condition on the remainder error bound, for the robust MD-CRT for general moduli, where the conditions are associated with (the minimum distances of) these lattices generated by gcld's of paired moduli, and a closed-form reconstruction algorithm is presented. We then generalize the above results of the robust MD-CRT from integer vectors/matrices to real ones. Finally, we validate the robust MD-CRT for general moduli by employing numerical simulations, and apply it to MD sinusoidal frequency estimation based on multiple sub-Nyquist samplers.
    摘要 “多维度(MD)信号处理中,有问题 robustly 从其不准确余值中重建整数 вектор。为解决这问题,一种 robust MD Chinese remainder theorem(CRT)最近被提出,但这问题的紧张限制了实际应用。本文研究robust MD-CRT 的一般模仿,包括对于一般模仿的必要和充分条件、对于不同对象的充分条件、以及一个关于这些体积生成的最小距离的封闭式重建算法。然后,我们将这些结果扩展到实数 вектор/矩阵上。最后,我们透过数字实验验证robust MD-CRT 的可靠性,并将其应用到多 sub-Nyquist 探针中的 MD 谐波频率估计。”

A Zero-Forcing Approach for the RIS-Aided MIMO Broadcast Channel

  • paper_url: http://arxiv.org/abs/2311.11769
  • repo_url: None
  • paper_authors: Dominik Semmler, Michael Joham, Wolfgang Utschick
  • for: 这 paper 是为了最大化多用户智能面带有多输入多输出广播通道的含义 спектраль效率 (SE) 的算法。
  • methods: 这 paper 使用了零做法,包括用户分配,以确保计算独立于智能面元素的数量。特别是, paper 提出了两种算法,利用基站和智能面之间的直线结构来提高 SE 性能。
  • results: simulations 表明,这些算法可以与其他线性 precoding 算法相比,具有更高的 SE 性能,但具有较低的复杂性。
    Abstract We present efficient algorithms for the sum-spectral efficiency (SE) maximization of the multi-user reconfigurable intelligent surface (RIS)-aided multiple-input multiple-output (MIMO) broadcast channel based on a zero-forcing approach. These methods conduct a user allocation for which the computation is independent of the number of elements at the RIS, that is usually large. Specifically, two algorithms are given that exploit the line-of-sight (LOS) structure between the base station (BS) and the RIS. Simulations show superior SE performance compared to other linear precoding algorithms but with lower complexity.
    摘要 我们提出了高效的算法,用于多用户智能表面受助多输入多输出广播频道上的总spectral efficiency(SE)最大化,基于零强制方法。这些方法在RIS中元素数量约为大的情况下,实现了用户分配,计算独立于RIS元素数量。 Specifically,我们提出了两种算法,利用基站(BS)与RIS之间的直线结构。模拟结果表明,我们的算法在其他线性 precoding 算法比较高,但具有较低的复杂度。

AIaaS for ORAN-based 6G Networks: Multi-time scale slice resource management with DRL

  • paper_url: http://arxiv.org/abs/2311.11668
  • repo_url: None
  • paper_authors: Suvidha Mhatre, Ferran Adelantado, Kostas Ramantas, Christos Verikoukis
  • for: 本研究旨在基于开放无线接入网(ORAN)架构和人工智能(AI)技术,解决 6G 网络中不同时间尺度下的切片资源管理问题。
  • methods: 所提方案在网络边缘引入 AI,并采用两级控制环路,以获得优于其他技术的性能。ORAN 提供可编程的网络架构,从而支持这种多时间尺度的管理。
  • results: 所提算法从切片性能出发分析资源的最大利用率,在切片间(inter-slice)层面做出决策;切片间智能代理工作在非实时层面,对各切片内的资源进行重新配置。结果表明,该方法可以提升 eMBB、URLLC 和 mMTC 等切片类型的性能。
    Abstract This paper addresses how to handle slice resources for 6G networks at different time scales in an architecture based on an open radio access network (ORAN). The proposed solution includes artificial intelligence (AI) at the edge of the network and applies two control-level loops to obtain optimal performance compared to other techniques. The ORAN facilitates programmable network architectures to support such multi-time scale management using AI approaches. The proposed algorithms analyze the maximum utilization of resources from slice performance to take decisions at the inter-slice level. Inter-slice intelligent agents work at a non-real-time level to reconfigure resources within various slices. Further than meeting the slice requirements, the intra-slice objective must also include the minimization of maximum resource utilization. This enables smart utilization of the resources within each slice without affecting slice performance. Here, each xApp that is an intra-slice agent aims at meeting the optimal QoS of the users, but at the same time, some inter-slice objectives should be included to coordinate intra- and inter-slice agents. This is done without penalizing the main intra-slice objective. All intelligent agents use deep reinforcement learning (DRL) algorithms to meet their objectives. We have presented results for enhanced mobile broadband (eMBB), ultra-reliable low latency (URLLC), and massive machine type communication (mMTC) slice categories.
    摘要 本文研究在基于开放无线接入网(ORAN)的架构下,如何在不同时间尺度上管理 6G 网络的切片资源。所提方案在网络边缘引入人工智能(AI),并采用两级控制环路,以获得优于其他技术的性能。ORAN 的可编程网络架构使得基于 AI 的多时间尺度管理成为可能。所提算法从切片性能的角度分析资源的最大利用率,并在切片间层面做出决策;切片间智能代理在非实时层面重新配置各切片内的资源。除满足切片需求外,切片内的目标还包括最小化最大资源利用率,从而在不影响切片性能的前提下智能地利用各切片内的资源。每个作为切片内代理的 xApp 旨在满足用户的最优服务质量(QoS),同时需要纳入一定的切片间目标,以协调切片内与切片间代理,且不以牺牲切片内主要目标为代价。所有智能代理均使用深度强化学习(DRL)算法来实现其目标。我们给出了增强移动宽带(eMBB)、超可靠低时延通信(URLLC)和大规模机器类通信(mMTC)三类切片的结果。

RIS-Parametrized Rich-Scattering Environments: Physics-Compliant Models, Channel Estimation, and Optimization

  • paper_url: http://arxiv.org/abs/2311.11651
  • repo_url: None
  • paper_authors: Philipp del Hougne
  • for: 这个论文旨在探讨如何使用可 Configurable 智能面 (RIS) 来控制复杂的吸收物 scattering 环境,以实现智能无线环境。
  • methods: 论文使用 physics-compliant 模型来描述 RIS-parametrized 吸收物 scattering 环境,并使用 open-loop 控制来优化 RIS 配置。
  • results: 论文提出了一种 physics-compliant 模型,可以用来 parametrically 模拟 RIS-parametrized 吸收物 scattering 环境,并且可以在不知道实验环境的情况下优化 RIS 配置。
    Abstract The tunability of radio environments with reconfigurable intelligent surfaces (RISs) enables the paradigm of smart radio environments in which wireless system engineers are no longer limited to only controlling the radiated signals but can in addition also optimize the wireless channels. Many practical radio environments include complex scattering objects, especially indoor and factory settings. Multipath propagation therein creates seemingly intractable coupling effects between RIS elements, leading to the following questions: How can a RIS-parametrized rich-scattering environment be modelled in a physics-compliant manner? Can the parameters of such a model be estimated for a specific but unknown experimental environment? And how can the RIS configuration be optimized given a calibrated physics-compliant model? This chapter summarizes the current state of the art in this field, highlighting the recently unlocked potential of frugal physical-model-based open-loop control of RIS-parametrized rich-scattering radio environments.
    摘要 利用可重构智能表面(RIS)对无线环境进行调控,催生了"智能无线环境"这一范式:无线系统工程师不再只能控制发射信号,还可以对无线信道本身进行优化。许多实际无线环境(尤其是室内和工厂场景)包含复杂的散射体,其中的多径传播会在 RIS 单元之间产生看似难以处理的耦合效应。由此引出以下问题:如何以符合物理规律的方式对 RIS 参数化的富散射环境进行建模?能否针对某个具体但未知的实验环境估计这种模型的参数?在标定好的物理一致模型之上,又该如何优化 RIS 配置?本章总结了该领域的最新进展,并着重介绍了近期基于轻量物理模型、对 RIS 参数化富散射无线环境进行开环控制的新进展。

High-performance cVEP-BCI under minimal calibration

  • paper_url: http://arxiv.org/abs/2311.11596
  • repo_url: None
  • paper_authors: Yining Miao, Nanlin Shi, Changxing Huang, Yonghao Song, Xiaogang Chen, Yijun Wang, Xiaorong Gao
  • for: 提高 code-modulated visual evoked potential-based BCIs(cVEP-BCIs)的高速性和可用性。
  • methods: 使用宽带白噪声(WN)调制,并提出了两种方法:基于刺激序列的线性建模方法和利用跨被试数据的迁移学习技术。
  • results: 实现了最高的信息传输率(ITR)250 bits per minute(bpm),与当前的稳态视觉诱发potential-based BCIs(SSVEP-BCIs)相当。
    Abstract The ultimate goal of brain-computer interfaces (BCIs) based on visual modulation paradigms is to achieve high-speed performance without the burden of extensive calibration. Code-modulated visual evoked potential-based BCIs (cVEP-BCIs) modulated by broadband white noise (WN) offer various advantages, including increased communication speed, expanded encoding target capabilities, and enhanced coding flexibility. However, the complexity of the spatial-temporal patterns under broadband stimuli necessitates extensive calibration for effective target identification in cVEP-BCIs. Consequently, the information transfer rate (ITR) of cVEP-BCI under limited calibration usually stays around 100 bits per minute (bpm), significantly lagging behind state-of-the-art steady-state visual evoked potential-based BCIs (SSVEP-BCIs), which achieve rates above 200 bpm. To enhance the performance of cVEP-BCIs with minimal calibration, we devised an efficient calibration stage involving a brief single-target flickering, lasting less than a minute, to extract generalizable spatial-temporal patterns. Leveraging the calibration data, we developed two complementary methods to construct cVEP temporal patterns: the linear modeling method based on the stimulus sequence and the transfer learning techniques using cross-subject data. As a result, we achieved the highest ITR of 250 bpm under a minute of calibration, which has been shown to be comparable to the state-of-the-art SSVEP paradigms. In summary, our work significantly improved the cVEP performance under few-shot learning, which is expected to expand the practicality and usability of cVEP-BCIs.
    摘要 BCIs based on visual modulation paradigms 的最终目标是实现高速性表现,无需耗费庞大的调整。基于宽频白噪(WN)的 cVEP-BCI 具有许多优点,包括提高的交通速率、扩展的编码目标能力和提高的编码flexibility。然而,在宽频刺激下,需要进行广泛的调整以便有效地识别目标。因此, cVEP-BCI 的信息传输率(ITR)在有限的调整下通常在 100 位/分钟(bpm),远远落后于当前的稳态 visual evoked potential(SSVEP)基于 BCIs,它们的 ITR 超过 200 bpm。为了提高 cVEP-BCI 的性能,我们设计了一个高效的快速单目标抖摆calibration阶段,持续时间低于 1 分钟,以EXTRACT GENERALIZABLE SPATIAL-TEMPORAL PATTERNS。通过利用 calibration 数据,我们开发了两种COMPLEMENTARY METHODS TO CONSTRUCT cVEP temporal patterns:基于刺激序列的直线模型方法和使用交叉主题数据的转移学习技术。因此,我们实现了最高的 ITR 250 bpm ,在有限的 calibration 下,这个成果已经被证明可以与当前 SSVEP 方法相比。总之,我们的工作在几个shot学习中提高了 cVEP 性能,这可能扩大 BCIs 的实用性和可用性。

Joint Design of ISAC Waveform under PAPR Constraints

  • paper_url: http://arxiv.org/abs/2311.11594
  • repo_url: None
  • paper_authors: Yating Chen, Cai Wen, Yan Huang, Le Liang, Jie Li, Hui Zhang, Wei Hong
  • for: 本文解决了集成感知通信(ISAC)波形的预编码问题,并提出了一种基于幂函数约束的非对称射频规划(QCQP)方法。
  • methods: 本文提出了一种基于交叉方向法(ADMM)的高效算法,可以减少多变量的coupling,并提供一个封闭式解决方案。
  • results: 实验结果表明,提出的双功能波形可以提供良好的通信质量服务(QoS)和感知性能。
    Abstract In this paper, we formulate the precoding problem of integrated sensing and communication (ISAC) waveform as a non-convex quadratically constrainted quadratic program (QCQP), in which the weighted sum of communication multi-user interference (MUI) and the gap between dual-use waveform and ideal radar waveform is minimized with peak-to-average power ratio (PAPR) constraints. We propose an efficient algorithm based on alternating direction method of multipliers (ADMM), which is able to decouple multiple variables and provide a closed-form solution for each subproblem. In addition, to improve the sensing performance in both spatial and temporal domains, we propose a new criteria to design the ideal radar waveform, in which the beam pattern is made similar to the ideal one and the integrated sidelobe level of the ambiguity function in each target direction is minimized in the region of interest. The limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm is applied to the design of the ideal radar waveform which works as a reference in the design of the dual-function waveform. Numerical results indicate that the designed dual-function waveform is capable of offering good communication quality of service (QoS) and sensing performance.
    摘要 在本文中,我们将 интеграted sensing and communication(ISAC)波形的预处理问题设为非对称quadratically constrained quadratic program(QCQP),其中Weighted sum of communication multi-user interference(MUI)和理想 radar 波形与实际 radar 波形之间的差异是最小化的,同时受限于峰值至平均功率比(PAPR)的约束。我们提出了一种高效的 alternating direction method of multipliers(ADMM)算法,可以分解多个变量并提供每个子问题的关闭式解。此外,为了提高探测性能在空间和时域两个领域,我们提出了一新的标准来设计理想 radar 波形,其中扫描方式与理想波形类似,并在兴趣区域内尽量减少了各个方向的混合副幅水平。我们使用了有限记忆 Broyden-Fletcher-Goldfarb-Shanno(L-BFGS)算法来设计理想 radar 波形,该波形作为双功能波形的参照。numerical results表明,设计的双功能波形能够提供良好的通信质量服务(QoS)和探测性能。
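
The ADMM splitting used for the PAPR-constrained design can be illustrated on a simplified surrogate problem, minimizing ||Hx - s||^2 under a PAPR limit, as in the sketch below. The closed-form x-update and dual update follow the standard ADMM pattern, while a simple magnitude-clipping heuristic stands in for the exact projection onto the PAPR set; H, s, and the objective are not the paper's actual formulation.

```python
# Sketch of ADMM for a PAPR-constrained least-squares waveform design surrogate.
import numpy as np

def papr(x):
    p = np.abs(x) ** 2
    return p.max() / p.mean()

def clip_to_papr(x, papr_max):
    # Heuristic projection: clip the largest magnitudes toward the allowed peak level.
    limit = np.sqrt(papr_max * np.mean(np.abs(x) ** 2))
    scale = np.minimum(1.0, limit / np.maximum(np.abs(x), 1e-12))
    return x * scale

def admm_papr_design(H, s, papr_max=3.0, rho=1.0, n_iter=200):
    n = H.shape[1]
    x = np.zeros(n, dtype=complex)
    z = x.copy()
    u = x.copy()
    lhs = H.conj().T @ H + rho * np.eye(n)
    for _ in range(n_iter):
        x = np.linalg.solve(lhs, H.conj().T @ s + rho * (z - u))   # closed-form x-update
        z = clip_to_papr(x + u, papr_max)                           # approximate projection step
        u = u + x - z                                               # dual update
    return z, papr(z)
```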

Asymptotic CRB Analysis of Random RIS-Assisted Large-Scale Localization Systems

  • paper_url: http://arxiv.org/abs/2311.11582
  • repo_url: None
  • paper_authors: Zhengyu Wang, Hongzheng Liu, Rujing Xiong, Fuhai Wang, Robert Caiming Qiu
  • for: 这 paper 研究了一种随机 RIS 协助的多 Target 本地化系统的性能,其中 RIS 的配置采用随机设置,以避免高复杂度优化。
  • methods: 我们首先关注了 RIS 元素数量很多的场景,然后 obtaint 了 CRB 的扩展征具下降规律,发现 CRB 在 RIS 维度增加第三或第四阶时下降。然后,我们推广我们的分析至大规模系统,其中包括多个 Target 和感知器。
  • results: 我们发现,在提posed 随机配置 RIS 系统中, asymptotic 公式可以准确地 approximate exact CRB。numerical 结果表明,随机配置 RIS 对 CRB 的影响是非常显著的。
    Abstract This paper studies the performance of a randomly RIS-assisted multi-target localization system, in which the configurations of the RIS are randomly set to avoid high-complexity optimization. We first focus on the scenario where the number of RIS elements is significantly large, and then obtain the scaling law of Cram\'er-Rao bound (CRB) under certain conditions, which shows that CRB decreases in the third or fourth order as the RIS dimension increases. Second, we extend our analysis to large systems where both the number of targets and sensors is substantial. Under this setting, we explore two common RIS models: the constant module model and the discrete amplitude model, and illustrate how the random RIS configuration impacts the value of CRB. Numerical results demonstrate that asymptotic formulas provide a good approximation to the exact CRB in the proposed randomly configured RIS systems.
    摘要 这篇论文研究了随机RIS协助的多Target定位系统的性能,其中RIS配置随机设置以避免高复杂性优化。我们首先关注了RIS元素数量相对较多的场景,然后获得CRB下降的幂级关系,显示CRB在RIS维度增加的第三或第四阶幂下降。其次,我们扩展我们的分析至大规模系统,其中目标和感知器的数量都是substantial。在这种设置下,我们探讨了两种常见RIS模型:常数模块模型和随机振荡模型,并解释了随机RIS配置对CRB的影响。数值结果表明,幂级公式可以准确地 aproximate exact CRB在我们提议的随机配置RIS系统中。

On the Effective throughput of Shadowed Beaulieu-Xie fading channel

  • paper_url: http://arxiv.org/abs/2311.11521
  • repo_url: None
  • paper_authors: Manpreet Kaur, Sandeep Kumar, Poonam Yadav, Puspraj Singh Chauhan
  • for: investigate data rate performance across various fading scenarios
  • methods: PDF-based approach, low-SNR and high-SNR approximation
  • results: effective throughput and impact of system parameters and delay parameter on effective throughput
    Abstract Given the imperative for advanced wireless networks in the next generation and the rise of real-time applications within wireless communication, there is a notable focus on investigating data rate performance across various fading scenarios. This research delved into analyzing the effective throughput of the shadowed Beaulieu-Xie (SBX) composite fading channel using the PDF-based approach. To get the simplified relationship between the performance parameter and channel parameters, the low-SNR and the high-SNR approximation of the effective rate are also provided. The proposed formulations are evaluated for different values of system parameters to study their impact on the effective throughput. Also, the impact of the delay parameter on the EC is investigated. Monte-Carlo simulations are used to verify the facticity of the deduced equations.
    摘要 随着下一代无线网络的需求和实时应用在无线通信中的增长,研究数据传输速率在不同的抑噪场景下的性能已经得到了广泛的关注。本研究利用PDF基本方法对shadowed Beaulieu-Xie(SBX)复杂抑噪通道的有效传输率进行分析。为了获得简化关系 между性能参数和通道参数,本研究还提供了低SNR和高SNR的近似值。通过不同系统参数的调整,研究对效果传输率的影响。此外,延迟参数对EC的影响也被调查。使用Monte Carlo仿真来验证推导出的公式。